TDWI Upside - Where Data Means Business

Using Graph Analytics to Predict Outcomes in Game of Thrones

Data scientist Clement Fredembach used a combination of graph analytics and machine learning to predict the deaths of major characters in “Game of Thrones.” Of course, algorithms can only take you so far.

Spoiler alert: This article discusses major plot developments in both “Game of Thrones” (the HBO series) and A Song of Ice and Fire, the books on which the HBO series is based.

The death of family patriarch Eddard Stark is one of the most shocking moments in A Game of Thrones, the first book in George R. R. Martin’s A Song of Ice and Fire chronicle.

New fans are almost reliably tripped up by it, whether they’re queuing up episode nine of “Game of Thrones” on HBO or turning to Chapter 60 in the book of the same name. It’s hard to see coming.

It probably shouldn’t be, however. Data scientist Clement Fredembach used a combination of graph analytics and machine learning to predict the deaths of Eddard Stark, Khal Drogo, Renly Baratheon, Catelyn Stark, Robb Stark, and even King Joffrey, the Justin Bieber of Westeros.

Fredembach, a data scientist with Teradata’s Australia/New Zealand unit, discussed his predictions during a presentation at September’s Teradata Partners 2016 Conference in Atlanta.

The Prediction Method and Its Limitations

First, he used graph analysis (implemented in Teradata’s Aster Discovery appliance) to identify and graph the relationships among all the major characters in the A Song of Ice and Fire universe. His source was the text of all five books in the series, but he did not employ text analytics in his predictive analytics work, only graph analysis.
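The article does not say how the character relationships were extracted. As a rough illustration only, one simple way to turn text into a social graph is to link two characters whenever their names appear in the same paragraph. The Python sketch below assumes that approach; the function name `cooccurrence_edges`, the placeholder names, and the sample sentences are hypothetical and are not Fredembach's Aster workflow.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(paragraphs, names, min_weight=1):
    """Count how often two known names appear in the same paragraph.

    Returns a dict mapping (name_a, name_b) -> co-occurrence count,
    with each pair stored in alphabetical order.
    """
    counts = Counter()
    for text in paragraphs:
        # Whole-word match so that "Rob" would not match inside "Robb".
        present = sorted(
            n for n in names
            if re.search(r"\b" + re.escape(n) + r"\b", text)
        )
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return {pair: w for pair, w in counts.items() if w >= min_weight}


# Hypothetical sample text, not taken from the books.
sample = [
    "Alice spoke with Bob.",
    "Bob and Carol rode north.",
    "Alice wrote to Bob again.",
]
edges = cooccurrence_edges(sample, ["Alice", "Bob", "Carol"])
print(edges)
```

The edge weights give a crude measure of how closely two characters are tied together; a real pipeline would also need to handle nicknames, titles, and shared family names.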

Fredembach then used a machine learning algorithm called Loopy Belief Propagation (LBP) -- basically, an approximate inference algorithm that passes messages between connected nodes of a graph -- to predict which characters would live and which would die. He quickly discovered that in the “Game of Thrones” universe, almost all death is untimely -- but most deaths are predictable.
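To make the idea concrete, here is a minimal Python sketch of loopy belief propagation on a graph where every node is either alive (state 0) or dead (state 1). It is a textbook-style toy, not the Aster implementation: the `coupling` parameter (how strongly linked characters are assumed to share a fate), the prior values, and the four-node graph with placeholder names are all assumptions made for illustration.

```python
from collections import defaultdict


def loopy_belief_propagation(edges, priors, coupling=0.7, iterations=20):
    """Return an approximate P(dead) for every node in the graph.

    edges:    iterable of undirected (a, b) pairs
    priors:   dict node -> prior P(dead); nodes not listed default to 0.5
    coupling: assumed probability that two linked nodes share a state
              (must be strictly between 0 and 1)
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Node potentials as (weight for alive, weight for dead).
    phi = {n: (1.0 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in neighbors}
    # Edge potential: psi[s][t] favors neighbors being in the same state.
    psi = ((coupling, 1.0 - coupling), (1.0 - coupling, coupling))

    # messages[(i, j)] is what node i currently "tells" node j.
    messages = {(i, j): (0.5, 0.5) for i in neighbors for j in neighbors[i]}

    for _ in range(iterations):
        updated = {}
        for (i, j) in messages:
            out = []
            for t in (0, 1):
                total = 0.0
                for s in (0, 1):
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:  # exclude the recipient's own message
                            incoming *= messages[(k, i)][s]
                    total += phi[i][s] * psi[s][t] * incoming
                out.append(total)
            norm = out[0] + out[1]
            updated[(i, j)] = (out[0] / norm, out[1] / norm)
        messages = updated

    beliefs = {}
    for n in neighbors:
        alive, dead = phi[n]
        for k in neighbors[n]:
            alive *= messages[(k, n)][0]
            dead *= messages[(k, n)][1]
        beliefs[n] = dead / (alive + dead)
    return beliefs


# Hypothetical toy graph: "A" is known to be dead, "D" is known to be alive.
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
toy_priors = {"A": 0.95, "D": 0.05}
for name, p_dead in sorted(loopy_belief_propagation(toy_edges, toy_priors).items()):
    print(name, round(p_dead, 2))
```

In this toy run, the node linked only to the known-dead node and to an undecided neighbor ends up leaning toward dead, which is the kind of "limited ground truth" inference the talk describes. Because the graph contains a cycle, the result is an approximation rather than an exact probability.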

At least they are at first. Fredembach’s experiment illustrates both the strengths and the limitations of predictive analytics. Because of the bias built into George R. R. Martin’s fictional universe, the long-term trend line tends toward death.

More characters die -- and are gone for good -- than are replaced by new characters. The upshot is that the accuracy of Fredembach’s model begins to break down with the third book (A Storm of Swords) at right about the time Joffrey -- the petulant, megalomaniacal, psychopathic king of Westeros -- meets his end.

Fredembach spun this as a feature, not a bug.

Analytics Models Lose Accuracy

In production usage, too, the accuracy or power of a predictive analytics forecast tends to diminish over time as conditions and circumstances change. Analytics models need to be retrained -- or, in some cases, scrapped altogether.

In Fredembach’s model, for example, each character’s status is scored on a scale from 0 to 1: a value of 0 represents absolute confidence a character is alive; a value of 1, absolute confidence they’re dead. Values in between express the model’s estimated probability that the character is dead, so a value of 0.50 is non-indicative.

What happened with Fredembach’s analytics model is pretty much what happens with all such models: it started out strong, nailing the deaths of Eddard Stark, the fearsome Khal Drogo, and the tragic Renly Baratheon. By the third book, however, the model had lost a good bit of its predictive power.

It did predict the death of Joffrey -- albeit with weak (0.56 probability) confidence. The model also predicted the death of Bran Stark -- a major character who is still very much alive -- with the same 0.56 likelihood. It determined that matriarch Catelyn Stark (one of the victims of the notorious “Red Wedding”) was dead with a value of 0.78, but missed on Barristan Selmy (still alive in the books), predicting him dead with a value of 0.77. Finally, Robb Stark, murdered along with his mother in the Red Wedding’s climactic massacre, is only weakly dead (0.54) according to the model.
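The five scores quoted above can be tallied in a few lines of Python. The numbers and outcomes below are exactly those reported in this article; the 0.5 cutoff for calling a character "dead" and the helper name `evaluate` are assumptions for the sake of the example.

```python
# (model score, actual outcome) as reported in the article; 1 = dead, 0 = alive.
reported = {
    "Joffrey": (0.56, 1),
    "Bran Stark": (0.56, 0),
    "Catelyn Stark": (0.78, 1),
    "Barristan Selmy": (0.77, 0),
    "Robb Stark": (0.54, 1),
}


def evaluate(scores, threshold=0.5):
    """Split characters into correct and incorrect calls at a given cutoff."""
    hits, misses = [], []
    for name, (p_dead, outcome) in scores.items():
        predicted_dead = p_dead > threshold
        (hits if predicted_dead == bool(outcome) else misses).append(name)
    return hits, misses


hits, misses = evaluate(reported)
print("correct:", hits)    # ['Joffrey', 'Catelyn Stark', 'Robb Stark']
print("wrong:", misses)    # ['Bran Stark', 'Barristan Selmy']
```

At this cutoff the model calls all five characters dead and is right on three of them, which shows the skew toward death Fredembach describes.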

“Joffrey’s death [reveals] a certain limitation [with the algorithm],” Fredembach told attendees, noting that, over time, his model skews toward death. “There’s a reality of doing these things. The reality is we have something that doesn’t work, so we need to understand [why that’s the case].”

Changing Model Accuracy Can Give Insight into Source Data

The model reveals something else, too. The first three books in the series were all released between 1996 and 2000; the publication of A Storm of Swords, the third book in the series, followed the second by a mere 21 months. Then book four (A Feast for Crows) didn’t appear for five years, with book five (A Dance with Dragons) following six years after that.

The breakdown of Fredembach’s model begins to accelerate when the story Martin is telling starts to lose focus: the tight-knit plotting of the first two books gives way to the relatively rushed -- but still concise -- structure of the third book. Thereafter, Martin struggled for more than a decade to write books four and five.

You can actually see this in the graphs, Fredembach said. “Where the story becomes less structured, [the] graphs are less dense, making [loopy belief] propagation more futile,” he said. Referring to books four and five: “The last two books are very disjointed works -- [Martin] really doesn’t understand where the story is going anymore.”

Beyond the Case Study -- Applying Social Graphs

Fredembach’s experiment had both a pro-Teradata bent -- Aster’s built-in SQL-GR graph analysis engine makes this kind of analysis relatively easy to do, he claimed -- and universal applicability. It’s comparatively simple to transform text into a social graph, he noted, and LBP is “a unique algorithm [with which] to infer likelihoods from limited ground truth and no hindsight.”

As the breakdown of his model demonstrates, however, even the most advanced analytics tools must be complemented with domain knowledge, facts-on-the-ground experience, and research.

“You need to know what the data is and where it comes from,” he said. “Working with the data and getting familiar with it gives you hints and insights on what you can do with the data afterward.”

As for Martin’s forthcoming sixth book, The Winds of Winter, Fredembach says things aren’t looking good for Walder Frey -- good riddance to bad rubbish! -- Brienne of Tarth, Samwell Tarly, Cersei, Jaime, and Sansa. In the “Game of Thrones” world, virtually no one’s left standing when the curtain falls.

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at evets@alwaysbedisrupting.com.

