Data Stories: Learn Text Analysis from Pop Lyrics, The Federalist Papers, and Jane Austen

Learn how to analyze text with compression, statistics, and n-grams using examples from pop culture, history, and literature.

Language Compression and Pop Lyrics


Do you know how you could use compression to analyze language? These fun charts and interactive visualizations from The Pudding show how, using compressibility as a measure of repetition in song lyrics.
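To get a feel for the idea, here is a minimal Python sketch. It assumes that the standard library's zlib (DEFLATE) is a reasonable stand-in for whatever compressor the article uses, and both lyric samples are made up for illustration. The more a text repeats itself, the smaller it gets when compressed.

```python
import zlib

def compression_ratio(text: str) -> float:
    """Compressed size divided by original size; lower means more repetitive."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must not be empty")
    return len(zlib.compress(raw, 9)) / len(raw)

# Made-up samples, not real lyrics from the article.
repetitive = "na na na na hey hey hey goodbye\n" * 20
varied = (
    "Every line here says something different, with fresh words and no "
    "chorus to lean on: quick brown foxes, lazy dogs, jagged cliffs, and zephyrs."
)

print(f"repetitive: {compression_ratio(repetitive):.2f}")
print(f"varied:     {compression_ratio(varied):.2f}")
```

Very short texts carry a fixed compression overhead, so this kind of comparison works best on texts of similar length.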


Bayesian Statistics and the Federalist Papers


The Priceonomics blog presents this in-depth article on using statistics to determine the authors of anonymous works -- from counting words by hand on paper slips to today’s automatic systems.
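As a rough illustration of the statistical idea, the sketch below applies Bayes' rule to counts of a few marker words. It is not the method from the article: the authors "A" and "B", the marker word list, the writing samples, and the equal priors are all hypothetical, and the word model is a simple smoothed frequency table.

```python
import math
from collections import Counter

# Hypothetical marker words; a real study would choose these from the data.
MARKERS = ["upon", "whilst", "while", "on"]

def marker_counts(text):
    """Count how often each marker word appears in a text."""
    counts = Counter(text.lower().split())
    return {w: counts[w] for w in MARKERS}

def log_likelihood(doc, counts, alpha=1.0):
    """Log-probability of the doc's marker words under one author's
    smoothed word frequencies (alpha is the smoothing constant)."""
    denom = sum(counts.values()) + alpha * len(MARKERS)
    return sum(
        math.log((counts[w] + alpha) / denom)
        for w in doc.lower().split()
        if w in counts
    )

def posterior(doc, samples, priors):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    logs = {
        author: math.log(priors[author]) + log_likelihood(doc, marker_counts(text))
        for author, text in samples.items()
    }
    top = max(logs.values())  # subtract the max for numerical stability
    weights = {a: math.exp(v - top) for a, v in logs.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Invented writing samples for two hypothetical authors.
samples = {
    "A": "upon the whole we rely upon the people while the states act on this",
    "B": "whilst the states act on the matter we rely on the people whilst they agree",
}
priors = {"A": 0.5, "B": 0.5}

result = posterior("whilst we wait on the states", samples, priors)
print(result)
```

With these toy samples, the disputed sentence leans toward author B because it contains "whilst", a word that only B uses.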


N-grams and Jane Austen 

How would you identify patterns of words that occur together? Data scientist Julia Silge uses n-grams to track differences in how male and female characters are described in Austen’s novels, and she explains every step of her process in this blog post.
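If you have not met n-grams before, they are simply runs of n consecutive words. The short Python sketch below counts two-word n-grams (bigrams) and then looks at which words follow "he" and "she". It is an illustration only, not the code from the blog post; the helper names and the sample sentence are made up.

```python
import re
from collections import Counter

def ngrams(text, n=2):
    """Return every run of n consecutive words as a tuple.
    Note: this simple version ignores sentence boundaries."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return list(zip(*(tokens[i:] for i in range(n))))

def words_after(text, first):
    """Count the words that immediately follow a given word."""
    return Counter(second for f, second in ngrams(text, 2) if f == first)

# Made-up sample text, not a quotation from the novels.
sample = "She felt happy. He said nothing. She felt tired, and he said so."

print(words_after(sample, "she"))
print(words_after(sample, "he"))
```

On a full novel, comparing the two counters would show which words most often follow each pronoun.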

