TDWI Upside - Where Data Means Business

Lessons from Facebook: Can We Defeat Databuse?

We in the information technology and data management disciplines must step up to an engaged and active role in educating the public and businesspeople who are not fully aware of the dangers of abuse of personal and related data.

Do you consider yourself to be in the big data business? Of course, we all are -- from data harvesters to hackers, from data scientists to the unknowing subjects whose data is the source of all analytics. We are all part of the giant sausage-making machine that grinds the grist of daily life into the holy grail of modern society: data-driven, convenient consumption.

Among those of us who build and tend the massive data mincer, the recent news headlines about Facebook and Cambridge Analytica should come as no surprise. If you haven't been following journalist Carole Cadwalladr's exceptional year-long exposé that culminated in the revelation that more than 50 million Facebook users had unwittingly contributed to vast psycho-sociological experiments on citizens of multiple democracies, not least the U.K. Brexit referendum and the 2016 U.S. presidential election, now might be a good time to catch up.

Pay particular attention to Cadwalladr's profile of whistleblower Christopher Wylie: "He was clever, funny, bitchy, profound, intellectually ravenous, compelling. A master storyteller. A politicker. A data science nerd." It was he, allegedly, who conceived and implemented the data gathering and analytics behind the social media campaigns in the U.K. and U.S. that possibly (or maybe probably) influenced the outcomes of both votes. Do we not admire and promote similar skills and behaviors in our industry?

Still in his early twenties when he mastered data science, Wylie was a most unlikely alt-right conspirator. A self-described gay, Canadian vegan, he says he somehow ended up creating "Steve Bannon's psychological warfare mindf**k tool." For Wylie, the data was easily obtainable, the analytics possible, the processing power available, the intellectual stimulation irresistible. Others provided the finance and the ideology. Here, surely, is a clear warning for young, clever but naïve data scientists and the companies offering them huge incentives.

Those Who Forget History...

Reading Wylie's story reminded me of my own naïveté 30 years ago when I was tasked to architect the first data warehouse, and a few years later when I consulted for an early data mining loyalty-card program. The simple, straightforward objectives of those projects -- improve management reporting and drive customer retention -- gave me little cause for ethical concern. Nonetheless, in the intervening years, the burgeoning growth in data and near-exponential increase in processing power have combined to leave such modest objectives flailing about for relevance in modern business.

In effect, data and technology have been weaponized. From tracking business measures such as sales or profit margins, we have moved to surveilling the entire population seeking signs of possible interest, potential purchase, or probable attrition. From attempting to improve internal processes and employee productivity, we now seek to nudge the behaviors of each and every potential or existing customer -- just about every last soul -- to improve the bottom line of every business. As recognized by Cambridge Analytica, among many others, the techniques of data mining and advanced analytics can be applied to influencing social and political behavior just as well as to consumer activity.

It seems that the social and political uses to which China is reportedly putting big data and analytics are also being attempted, albeit nefariously, in Western democracies.

In a December 2016 Upside article, I described a future environment where manipulation of opinion on an individual level was possible. I and, indeed, Peter Diamandis (whose predictions underpinned the article) were mistaken. It was not the future. Much of it was already past. I opined that the result "amounts to a complete subversion of the electoral process via big data and AI." It would appear that outcome has already been almost achieved.

...Are (Not Necessarily) Condemned to Repeat It

In the Cambridge Analytica affair, Facebook CEO Mark Zuckerberg apologized via full-page advertisements in multiple U.K. and U.S. newspapers over the March 24th weekend: "This was a breach of trust and ... I promise to do better for you."

He misses the point. "Investigating every single app that had access to large amounts of data" ignores the underlying issue. Facebook's business model is built entirely on harvesting as much personal data as possible, using it internally, and offering it -- or insights from it -- to as many paying advertisers, intermediaries, or other interested parties as possible. It is not in Facebook's interest to turn off or significantly limit that data flow. Google, despite the company's AI researcher François Chollet's Tweet stream attacking Facebook's behavior, has a similar business model.

This problem can be tackled only through economic and societal action on a broad front. The advertising-based business model of most Internet businesses must be radically changed. As Tim Berners-Lee says in an open letter marking 29 years of the World Wide Web, there are two myths: "[T]hat advertising is the only possible business model for online companies, and ... that it's too late to change the way platforms operate. ... [W]e need to be a little more creative." He proposes a legal or regulatory framework that takes account of social objectives. That's easier said than done.

We must dismantle -- or, at least, partially disarm -- the giant data sausage-making machine we have created in four distinct but related areas:

1. The ultimate customers of this sausage-making machine. These are the organizations -- commercial, governmental, political, etc. -- that believe they have the right to mine and manipulate public opinion and behaviors. This is the most difficult area to influence, because it is underpinned by an ideology of entitlement to power or profit. However, there is a feedback loop in operation here from the fourth area which can help.

2. The mungers and mincers of data. Their role is to turn raw data into useful and actionable insights for their customers. Many of these companies operate almost invisibly in the data ecosystem. In the commercial field, the likes of Acxiom and Experian, Equifax and Oracle "aggregate, combine, and trade massive amounts of information collected from diverse online and offline sources on entire populations." Cambridge Analytica performs a similar role in the political sphere. Such operations must be subjected to rigorous, worldwide regulation of what data they access, how they combine it, and the purposes for which they use it.

3. The data harvesters. This group includes Facebook, Apple, Google, Twitter, Amazon, and more. As Marie Wallace, IBM Technical Strategist, said almost a year ago, "Collecting data is a byproduct of living in a digitally interconnected world and it's going to be impossible, in practical terms, to stop this." Here, a clear definition of digital rights and the enactment of data control legislation are required. In addition, in each of the areas where the above companies dominate, anti-monopoly actions will likely be needed to rein in their power.

4. The general public -- the grist to the mill. Here, the requirement is for education in the real implications for both commerce and society in general of unrestricted collection and use of personal data. As we've seen with the #DeleteFacebook campaign, there is at least a nascent appetite among the public for action in support of privacy. Note also that it is the public who are the customers and supporters of those companies and organizations in the first category above. There is leverage here that can and should be exploited.

Legislation and Regulation

The concern, of course, is that the current Western political establishment is unwilling or unable to legislate or regulate in any of the areas just mentioned. This is where we in the information technology and data management disciplines must step up to an engaged and active role. Taking an informed and ethical stance will mean educating the public and businesspeople who are less aware of the dangers of databuse, embedding privacy principles in all products, declining to build or maintain dubious tools or data stores, and ultimately -- like Wylie -- coming forward as whistleblowers when all else fails.

Are you ready for the challenge?
