TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

Can AI Fix Fake Facts? Maybe Not

Evaluating text with AI and machine learning may not be sufficient to identify "fake news."

By Brian J. Dooley
May 8, 2017

Fact-checking is extraordinarily complicated. Brought to prominence during the 2016 U.S. election, "fake news" now travels fast and furiously through social and news media alike.

The issue is not limited to the news; widespread rumors and misinformation are inherent in social media discussions and in the growing information overload faced by professionals. The speed, volume, and complexity of information needed for decision making are thwarting our ability to gauge data accuracy. It is a familiar big data problem, but there is no big data solution.

The interest in "fake news" has driven a number of initiatives that use artificial intelligence (AI) and machine learning to vet news stories. This has revealed some of the shortcomings of the current state of AI and has expanded the conversation about data quality issues.

The Scope of the Problem

In traditional fact-checking, individuals look at data sources (such as social feeds) and score their contents for accuracy. This has been done on the Web with varying success, most notably by Snopes.com, the familiar debunking site for hoaxes and illegitimate news. Human judgment, even when crowdsourced, is too slow to keep up with the speed and ferocity of today's social-media-driven conversations. An audience might be bombarded with a cloud of statements, each of which must be evaluated. By the time the fact-checking is complete, the next cloud has arrived and the audience has moved on.

Combating the rise of fake news is difficult for a number of reasons. Volume and velocity are certainly important; at the same time, the definition of what constitutes fake news (versus disputed accounts, biased information, satire, or exaggeration) becomes subject to interpretation. The principal issue is the impossibility of submitting every claim to a human vetting process. As a result, attention has turned to artificial intelligence.

Numerous attempts have been made to establish AI-based news checking. Perhaps the most prominent have been Facebook's repeated efforts to clear up issues in its "Trending" news service. In its ongoing struggle against "clickbait" and news curation bias, the company has used machine learning with some success, although the results are reportedly uneven.

British fact-checking organization Full Fact offers a composite service that includes machine learning as well as services that check statistics and real-time reporting. Another service, ClaimBuster, focuses specifically on spotting factual claims and uses machine learning to sort through massive amounts of text to find items to fact-check.
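To make the idea of claim spotting concrete, here is a minimal sketch in Python. It is not how any of the services named above work; they rely on trained models, whereas this toy version uses a few hand-written cues. The cue lists, the weights, and the function names `check_worthiness` and `rank_claims` are all assumptions made for illustration.

```python
import re

# Hypothetical cue patterns chosen for illustration only; a real claim
# spotter would learn such signals from labeled examples.
NUMBER = re.compile(r"\d")
COMPARATIVE = re.compile(
    r"\b(more|less|fewer|higher|lower|increased|decreased|rose|fell|doubled)\b",
    re.IGNORECASE,
)
OPINION = re.compile(
    r"\b(i think|i believe|in my opinion|should|must)\b", re.IGNORECASE
)


def check_worthiness(sentence: str) -> int:
    """Return a crude score: higher means more likely a checkable factual claim."""
    score = 0
    if NUMBER.search(sentence):
        score += 2  # figures and dates can usually be verified
    if COMPARATIVE.search(sentence):
        score += 1  # trends and comparisons imply measurable facts
    if OPINION.search(sentence):
        score -= 2  # opinions and calls to action are not checkable
    return score


def rank_claims(text: str) -> list[str]:
    """Split text into sentences and order them from most to least check-worthy."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return sorted(sentences, key=check_worthiness, reverse=True)


if __name__ == "__main__":
    speech = (
        "I think we should act now. "
        "Unemployment fell to 4 percent last year. "
        "Thank you all."
    )
    for sentence in rank_claims(speech):
        print(check_worthiness(sentence), sentence)
```

Even this crude ranking shows the division of labor: the machine narrows a large body of text down to a short list of candidates, and a person decides whether each one is true.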

As with many things AI-related, it has become clear that identifying false information is a nuanced issue, and AI capabilities need to be exercised in a more complicated way than merely applying a set of algorithms to a stream of tweets.

Need for a Hybrid Solution

The problem with a purely AI approach is that machine evaluation based on content alone would require a general artificial intelligence capable of human-like understanding. However, there are approaches that demand only that the surrounding environment be checked for indicators of falseness, just as human fact-checkers have always done.

You can look at a wide variety of indicators, such as the sources of the statement and their past trustworthiness; the way the information is stated; the responses that it elicits online; who supports, favorites, or retweets it; and, in a more sophisticated approach, how it fits within a network of similar statements and narratives. Machine learning or AI can be applied in such an approach, but human involvement is inevitably required.
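One way to picture such a hybrid workflow is as a triage step: the machine combines the surrounding indicators into a rough risk score, handles the clear cases, and sends everything in between to a human fact-checker. The sketch below assumes each indicator has already been reduced to a number between 0 and 1; the `Signals` fields, the weights, and the thresholds are hypothetical values picked for illustration, not a published method.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Contextual indicators for one statement, each scaled to 0..1 (assumed inputs)."""
    source_trust: float        # past reliability of the source (1 = very reliable)
    sensational_style: float   # how sensational the wording is
    disputed_replies: float    # share of online responses disputing the statement
    suspicious_sharers: float  # share of sharers previously tied to misinformation


def risk_score(s: Signals) -> float:
    """Weighted sum of indicators; the weights are illustrative, not tuned."""
    return (
        0.4 * (1.0 - s.source_trust)
        + 0.2 * s.sensational_style
        + 0.2 * s.disputed_replies
        + 0.2 * s.suspicious_sharers
    )


def route(s: Signals, low: float = 0.3, high: float = 0.7) -> str:
    """Decide automatically only at the extremes; defer the rest to a person."""
    score = risk_score(s)
    if score < low:
        return "pass"
    if score > high:
        return "flag"
    return "human_review"


if __name__ == "__main__":
    print(route(Signals(0.9, 0.1, 0.1, 0.1)))  # reliable source, calm reception
    print(route(Signals(0.1, 0.9, 0.9, 0.9)))  # unreliable source, heavy dispute
    print(route(Signals(0.5, 0.5, 0.5, 0.5)))  # ambiguous case
```

The width of the band between the two thresholds is the design choice that matters: widening it sends more items to people and fewer to the algorithm, which is the trade-off any hybrid system has to set.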

Many groups are taking on the search for this hybrid solution. The Fake News Challenge is a competition to find the best way to vet news stories with a hybrid machine/human approach. At the same time, Google, Facebook, Twitter, YouTube, Instagram, Periscope, and a growing number of media companies have joined the First Draft coalition to confront the problem of online misinformation, though they have not specified a methodology.

For decades, fact-checking has been done largely by hand. Sites that still rely on primarily manual methods, like Full Fact, will now be forced to incorporate more hybrid and AI-based technology.

Consequences and Conclusions

Fact-checking of structured or semistructured information in a limited context by an algorithmic or AI approach is already possible. Statcheck provides a routine to check statistics in academic papers, and jEugene runs a machine learning process to check a variety of issues in legal documents.
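Checks of this structured kind are tractable because the claim can be recomputed. As a minimal sketch in the spirit of a statistics checker (not the code of any tool named above), the Python below recomputes a two-sided p-value from a reported z statistic and compares it with the p-value the paper states. The function names and the rounding tolerance are assumptions made for illustration.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_consistent(z: float, reported_p: float, decimals: int = 2) -> bool:
    """True if the reported p-value matches the recomputed one to within
    half a unit in the last reported decimal place (an assumed tolerance)."""
    tolerance = 0.5 * 10 ** (-decimals)
    return abs(two_sided_p_from_z(z) - reported_p) <= tolerance


if __name__ == "__main__":
    # A paper reporting "z = 1.96, p = .05" is internally consistent.
    print(is_consistent(1.96, 0.05))
    # The same statistic reported with "p = .01" is not.
    print(is_consistent(1.96, 0.01))
```

Nothing here requires understanding what the paper argues. The check works only because the reported numbers are bound by a known formula, and that is exactly what general news claims lack.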

Checking broader news and political statements is a much more difficult problem. It is more complex than spam detection or any of the limited-domain problems to which AI is now being successfully applied. As such, it defines the current boundary between problems that AI can solve alone and those that require human intervention.

About the Author

Brian J. Dooley is an author, analyst, and journalist with more than 30 years' experience in analyzing and writing about trends in IT. He has written six books, numerous user manuals, hundreds of reports, and more than 1,000 magazine features. You can contact the author at [email protected].
