TDWI Upside - Where Data Means Business

Data Stories: Learn Text Analysis from Pop Lyrics, The Federalist Papers, and Jane Austen

Learn how to analyze text with compression, statistics, and n-grams using examples from pop culture, history, and literature.

Language Compression and Pop Lyrics

Did you know that compression can be used to analyze language? These fun charts and interactive visualizations from The Pudding show how, using compression to measure repetition in song lyrics.
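To make the idea concrete, here is a minimal sketch of compression as a repetition score. It assumes Python's standard zlib module as the compressor, which is not necessarily the algorithm The Pudding used, and the function name and both sample strings are invented for illustration. The more a text repeats itself, the more a compressor can shrink it.

```python
import zlib


def compression_reduction(text: str) -> float:
    """Return the fraction by which zlib shrinks the text's UTF-8 bytes.

    Higher values suggest more repetition. Very short inputs can score
    below zero because of the compressor's fixed overhead.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    compressed = zlib.compress(raw, 9)
    return 1 - len(compressed) / len(raw)


# Made-up samples, not real lyrics.
repetitive = "baby baby baby oh " * 40
varied = (
    "The committee postponed its quarterly review because several junior "
    "analysts had not yet finished verifying the figures from overseas."
)

print(f"repetitive: {compression_reduction(repetitive):.2f}")
print(f"varied:     {compression_reduction(varied):.2f}")
```

Scores like this are only comparable between texts of similar length, since longer texts give a compressor more chances to find repeats.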


Bayesian Statistics and the Federalist Papers


The Priceonomics blog presents this in-depth article on using statistics to determine the authors of anonymous works, tracing the methods from counting words by hand on paper slips to today’s automatic systems.
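A toy version of the statistical approach might look like the sketch below. It is not the method from the article: it assumes a simple naive Bayes model over a handful of common marker words, and the word list, the two "known" samples, and the disputed text are all made up. The idea is that each author's known writing gives word frequencies, and Bayes' rule turns those into a probability for each candidate author.

```python
import math
from collections import Counter

# Hypothetical marker words; a real study would choose these carefully.
MARKER_WORDS = ["upon", "while", "whilst", "on", "by", "to"]


def marker_counts(text: str) -> Counter:
    """Count only the marker words in a text."""
    return Counter(t for t in text.lower().split() if t in MARKER_WORDS)


def author_posteriors(disputed: str, samples: dict) -> dict:
    """Posterior probability of each author, assuming equal priors."""
    doc = marker_counts(disputed)
    log_scores = {}
    for author, text in samples.items():
        known = marker_counts(text)
        # Add-one smoothing so unseen words never get zero probability.
        total = sum(known.values()) + len(MARKER_WORDS)
        log_scores[author] = sum(
            n * math.log((known[w] + 1) / total) for w, n in doc.items()
        )
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    weights = {a: math.exp(s - top) for a, s in log_scores.items()}
    norm = sum(weights.values())
    return {a: w / norm for a, w in weights.items()}


# Invented sentences standing in for each author's known writing.
samples = {
    "author_a": "upon the whole it depends upon the people while they rely upon the law",
    "author_b": "whilst the states act on the matter the power rests on the people whilst they wait",
}
disputed = "the decision rests on the states whilst the people wait"

result = author_posteriors(disputed, samples)
print(result)
```

With so little text the numbers mean nothing in practice; the sketch only shows how word counts become a probability.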


N-grams and Jane Austen 

How would you identify patterns of words that occur together? Data scientist Julia Silge uses n-grams to track differences in how male and female characters are described in Austen’s novels, and she explains every step of her process in this blog post.
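As a rough illustration of the n-gram idea, the sketch below counts bigrams (two-word sequences) that begin with "he" or "she". It is written in Python with only the standard library, so it is not the code from the blog post, and the sample sentences and function names are invented rather than taken from Austen.

```python
import re
from collections import Counter


def bigrams(tokens):
    """Pair each token with the one that follows it."""
    return zip(tokens, tokens[1:])


def words_after(text: str, pronouns=("he", "she")) -> dict:
    """For each pronoun, count the words that immediately follow it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {p: Counter() for p in pronouns}
    for first, second in bigrams(tokens):
        if first in counts:
            counts[first][second] += 1
    return counts


# Invented sentences, not quotations from the novels.
sample = (
    "She remembered the letter. He stopped at the door. "
    "She felt tired and he remembered nothing."
)

following = words_after(sample)
print(following["she"].most_common())
print(following["he"].most_common())
```

Run over a full novel, counts like these can be compared between the two pronouns to see which words lean toward one or the other.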

About the Author

Lindsay Stares is a production editor at TDWI. You can contact her here.
