Big Data: What Your Colleagues Are Doing

How are enterprises adopting big data and what challenges do they face?

I recently completed a research report on how enterprises are using big data. I co-authored the study, Big Data in Big Companies, with Tom Davenport; we talked to more than 20 large firms that were doing exciting things with big data. Discussing why these companies are acquiring big data solutions, what they’re doing with them, and what they plan to do next ranged from exciting to downright inspirational.

It wasn’t what I expected. I thought we’d hear a lot of groaning from these executives about the cost and value proposition of big data, such as “We have 563 terabytes of data on our data warehouse. Do you mean to tell me that’s not big data?” I actually did have someone say that to me, but he was the exception, not the rule.

In general, we found that these executives were either doing new things with big data or doing things they’ve already been doing, only faster.

To take the latter point first, we talked to several companies that were busy redoing their data transformations, using big data technologies such as Hadoop, MapReduce, and Impala to accelerate those transformations and sunset some legacy ETL code. Technologies like these are optimized for highly computational, rules-based processing. Why juggle all those crazy ETL transforms if you can shove them into Hadoop and store the results in a relatively low-cost storage system like HDFS (Hadoop Distributed File System)? Between the cost savings on the ETL jobs, the labor, and the storage systems, these uses for big data fund themselves quickly, and they position big data solutions for other, newer uses.
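To make the idea of moving a rules-based transform into a map-and-reduce job concrete, here is a minimal sketch in plain Python. It only imitates the two phases in memory and is not Hadoop code; the record layout ("customer_id,amount,currency"), the cleansing rules, and all function names are assumptions made up for illustration, not anything the companies in the study described.

```python
from collections import defaultdict

# Hypothetical raw records in the form "customer_id,amount,currency".
RAW_LINES = [
    "c1,10.00,USD",
    "c2,5.50,usd",
    "c1,bad,USD",   # unparseable amount, dropped by the cleansing rule
    "c1,2.25,USD",
]


def map_record(line):
    """Map phase: apply cleansing rules and emit (key, value) pairs."""
    customer_id, amount, currency = line.strip().split(",")
    try:
        value = float(amount)
    except ValueError:
        return  # rule: skip records whose amount is not a number
    # rule: normalize currency codes to upper case
    yield (customer_id, currency.upper()), value


def reduce_totals(pairs):
    """Reduce phase: sum the values for each key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines):
    """Run the map phase over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_record(line))
    return reduce_totals(pairs)


if __name__ == "__main__":
    for key, total in sorted(run_job(RAW_LINES).items()):
        print(key, total)
```

On a real cluster the map and reduce steps would run in parallel across many nodes, which is where the speedup over a legacy ETL job would come from; the rules themselves stay this simple.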

Which brings me to companies using big data technologies to drive new capabilities. These companies are turning their attention to unstructured and semi-structured data for the first time. Suddenly social media activity, Web logs, data from mobile devices, and sensor data can play a role in smarter business decisions. For instance, a bank I spoke to is using Hadoop to keep social media threads in context.

Suppose someone named George tweets: “Bank X should win an award for the slowest tellers ever!” Then George’s friend Gretchen responds with the tweet, “I agree Bank X should win that award!” Sentiment analysis on Gretchen’s tweet alone might indicate that she’s a raving fan, but in the context of George’s tweet, she’s clearly not a fan. In fact, she has a high propensity to defect.
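The George and Gretchen exchange can be sketched in a few lines of Python. This is a toy illustration of scoring a reply in the context of its parent tweet, not the bank’s actual method: the word weights, the list of agreement words, and the function names are all hypothetical, and a production system would use a trained sentiment model rather than a hand-built lexicon.

```python
import re

# Hypothetical word weights; negative words are weighted heavily so that
# George's sarcastic tweet scores as negative overall.
LEXICON = {"win": 1, "award": 1, "slowest": -3}

# Hypothetical words that signal agreement with the parent tweet.
AGREEMENT = {"agree", "agreed", "exactly"}


def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_score(text):
    """Score a tweet in isolation by summing the weights of its words."""
    return sum(LEXICON.get(word, 0) for word in tokens(text))


def contextual_score(reply, parent):
    """Score a reply, flipping its sign to match the parent when it agrees."""
    parent_score = naive_score(parent)
    reply_score = naive_score(reply)
    agrees = any(word in AGREEMENT for word in tokens(reply))
    if agrees and parent_score != 0:
        magnitude = max(abs(reply_score), 1)
        return magnitude if parent_score > 0 else -magnitude
    return reply_score


if __name__ == "__main__":
    george = "Bank X should win an award for the slowest tellers ever!"
    gretchen = "I agree Bank X should win that award!"
    print("Gretchen alone:", naive_score(gretchen))
    print("Gretchen in context:", contextual_score(gretchen, george))
```

Scored alone, Gretchen’s tweet comes out positive; scored against George’s tweet, it comes out negative, which is the distinction the bank needs the thread context to see.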

Big Data Challenges

The other thing that we’re learning is that big data is bigger than customer data. We can use emerging big data solutions to monitor liquid levels underground, predict seismic events, track the proliferation of cancer cells, and detect crime hot spots before criminal behavior occurs. Big data has tremendous societal promise—and risks if we misuse it.

That’s where the challenges lie. As companies and their websites, advertisers, loyalty programs, cloud service providers, and smartphone apps collect ever more data about purchases, preferences, and interests, they can combine that data and individualize it more easily than ever. Add the fluid data security policies at many companies and you risk revealing personal data about tens of millions of unwitting consumers.

Industry-specific regulations, such as HIPAA in healthcare or Solvency II in insurance, only go so far. Broader attempts at protecting consumer privacy are being debated in the European Union, which is attempting to give E.U. citizens more legal control over their own personal data, including the right to transfer personal information between companies (much as phone number portability is enforced for wireless carriers). In the United States, pressure is mounting for Congress to approve a “consumer privacy bill of rights” that would expand the rights of Americans to not only control but administer their own data.

The irony is that the big data crowd is talking more about the “big” than the “data.” Go to any big data conference and you’ll find two popular session tracks: One track will be about the platform, with conference rooms teeming with technologists eager to learn more about Hadoop and its many related open source projects. The other track will inevitably be about analytics, usually the focus of managers intent on upping their game in digital marketing or social media analytics.

In both cases, the emphasis seems to be more on the “big” and less on the “data.” As the collective consciousness shifts from “Wow, we’ve got a ton of non-traditional data here!” to “How do we ingest and use all that data?”, the skills, processes, and tools needed to manage exploding amounts of non-standard data will be in demand and, arguably, scarce. That means data skills will be a hot commodity. It may also mean career advancement for BI professionals like us, who have lived the data drama for a long time and have many stories to tell.
