TDWI Upside - Where Data Means Business

Fake News: Analytics Only Scratches the Surface

Current attempts to weed out fake news focus on analytics and algorithmic approaches. They miss more intractable social and psychological issues that must be addressed.

The Web is awash with worry. Fake news may have swung the U.S. election to Donald Trump. Fake news is distorting social media discourse toward more radical and polarized positions. Fake news is boosting the economy of a small town in Macedonia and diverting American advertising dollars to scammers. We must fix Facebook and tidy up Twitter.

So the stories go.

A December 6th news story reports that Facebook may be testing a new tool to combat fake news. It's not the first such report in recent weeks. Mark Zuckerberg said in a November 18 post: "The bottom line is: we take misinformation seriously. Our goal is to connect people with the stories they find most meaningful, and we know people want accurate information." His message was somewhat tarnished by the ironic presence of two fake news advertisements beside it on the Facebook page. Readers were quick to poke fun, but there are more serious underlying issues. How does Zuckerberg's post itself differ from fake news? How could one distinguish information from misinformation even within this post?

Zuckerberg offers no proof that people actually do want accurate information, so we have to ask: do they? Really? The idea that one knows what other people want is associated with a psychological concept known as MasterTalk, the purpose of which is "to ensure the dominance of a single view with the concomitant extinguishing of other viewpoints." In politics, particularly in recent years, we have seen many examples of MasterTalk.

In the same 2003 article that defines the term, the author, Al Turtle, provides an example that uncannily presages some recent MasterTalk in the election campaign (emphasis mine):

Client: "George Bush is a crook!"

Therapist: "So, you believe George Bush is a crook. Did I get that?"

Client: "No. It is fact that he's a crook!"

Turtle classifies the response here as indicative that the client is high on the Tyrant scale, may intend to be threatening, and is unlikely to be willing to shift into any dialogue or negotiation about the reality of the statement. The first statement is, in essence, no more than a piece of information. Its veracity can be proven or disproven, at least within the limits of jurisprudence. However, its truth is largely irrelevant to its speaker; it's a belief. Its value is in its emotional effect.

Turning to Twitter, the problem is immensely exacerbated by the 140-character limit for posts. A recent opinion piece in the Washington Post suggests it's time to ban Donald Trump from Twitter. Although (in my opinion) somewhat tongue-in-cheek, the article touches obliquely on the question of how fake news works: "Like Pavlov's dogs salivating over every ding, we [the press corps] cannot help ourselves when it comes to the president-elect's Twitter feed. ... [W]hatever dumb thing Trump tweets is sure to dominate the headlines and cable chyrons every day."

The information content (true or false) of fake news is irrelevant. Its power comes from its ability to engage the reader at an emotional level. Whether you respond or retweet in total agreement or horrified outrage, its author has gained the attention s/he craves. The objective is achieved.

Debates about clearing fake news off Facebook or Twitter, therefore, miss several key points.

Information vs. misinformation is not a black-and-white issue; we're looking at shades of gray (perhaps fifty of them). Artificial intelligence and analytics can certainly distinguish between the more extreme ends of this spectrum, but the middle is murky. Where does advertising fall? Which religious beliefs will be deemed true or false? What about political promises or attacks during election campaigns? The learning algorithms will need a large set of training data to sort out our dysfunction. Unfortunately, the selection and tagging of this training set will be subject to the same dysfunction driving serious debate and significant bias.

Nevertheless, organizations from all areas of media and technology are attempting to apply algorithms to automate the process of weeding out fake news. One simple starting point is to build a blacklist of sites that generate fake news and a whitelist of verified, trusted sources. Other approaches try to identify suspicious poster behaviors or build context around news items and use that to cross-check claims. It's still too early to judge the success of such approaches, especially given concerns among the major social media players about curbing free speech or becoming "arbiters of truth".
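As a rough illustration of the blacklist/whitelist starting point described above, here is a minimal sketch in Python. The domain names, the list contents, and the function name classify_source are all hypothetical, invented for this example; a real system would need curated, maintained lists and would still face the gray areas discussed earlier.

```python
from urllib.parse import urlparse

# Hypothetical lists for illustration only; they do not assess any real site.
BLACKLIST = {"fake-news.example"}
WHITELIST = {"trusted-news.example"}


def classify_source(url: str) -> str:
    """Return 'blocked', 'trusted', or 'unknown' based on the URL's domain."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check the full host and every parent domain, so that
    # "www.fake-news.example" matches the entry "fake-news.example".
    candidates = {".".join(parts[i:]) for i in range(len(parts))}
    if candidates & BLACKLIST:
        return "blocked"
    if candidates & WHITELIST:
        return "trusted"
    # Most of the Web lands here: the murky middle that lists cannot settle.
    return "unknown"


if __name__ == "__main__":
    for link in (
        "https://www.fake-news.example/shocking-story",
        "https://trusted-news.example/report",
        "https://somewhere-else.example/post",
    ):
        print(link, "->", classify_source(link))
```

Note that the sketch judges only the source, never the claim itself, which is why such lists can be no more than a starting point.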

Killing fake news will be -- like eliminating e-mail spam and virus propagation -- an unending Whac-a-Mole game. Catching and deleting the initial occurrence before the chain reaction ignites will be very difficult, and once exponential growth begins, removing all cross-references is impossible. As already seen in the use of analytics and algorithmic approaches to tackling spam, an arms race can quickly develop as fake news providers up their game or change attack vectors. A Bloomberg article 12 days before the U.S. election described the use of "Facebook 'dark posts' -- nonpublic posts whose viewership the campaign controls," a tactic that effectively circumvents broad analytics action.

Social media has created an extremely efficient, but deeply disjointed, information propagation mechanism. Its core characteristics of equality of speech and access have deeply devalued the concept of specialized expertise in any topic as a way of judging the truth of a proposition. Facts have likewise been devalued, as discussions on climate change and creationism clearly illustrate. As communities fragment and turn inwards, diatribe displaces discourse within siloed echo chambers; derision substitutes for reasoned discussion between rival viewpoints.

Moreover, with an advertisement-funded business model, sensational stories are proven to drive clicks. Analytics and algorithms will need to balance likely vague "truth-seeking" goals against an easily measured profit motive.

The fake news panic is only a symptom of a much deeper problem. In social media and always-on mobile connectivity, we have created an environment where pausing for thought, nuanced opinions, and rational discourse are displaced and overwhelmed by instant, emotive reactivity. Facts fail us. Information becomes impotent. Only manipulation matters.

The results of these trends became clear in 2016. A fundamental re-evaluation of social media, in its drivers and effects, its control and its use, is urgently needed.

About the Author

Dr. Barry Devlin defined the first data warehouse architecture in 1985 and is among the world’s foremost authorities on BI, big data, and beyond. His 2013 book, Business unIntelligence, offers a new architecture for modern information use and management.

