Big Data's Big Themes at Strata Conference
Big data took New York by storm at the annual Strata conference, which coincidentally fell on the anniversary of Superstorm Sandy.
By Cindi Howson, BI Scorecard
Just a few years ago, the Strata conference in New York drew only 700 people. This year, attendance rose to 3,400, and the conference sold out weeks in advance, a testament to the interest, enthusiasm, and innovation in big data. Key trends from this year's conference included a dose of big data reality, solutions that help business analysts get to all their data, easier search, and new ways to visualize all that data.
Big Data Grows Up
A few years ago, some people in the big data world were predicting that Hadoop and NoSQL technologies would bring about the end of the data warehouse as we know it. Slowly, a more pragmatic view of Hadoop's role in the enterprise is emerging. Facebook chief analytics officer Ken Rudin, formerly of Zynga, declared that "Big data isn't about the technology; it's about the business needs." He described how Facebook started its analytics with Hadoop but has since added a traditional data warehouse to its analytics portfolio. The point is to match the technology to the particular data and type of analytics: Hadoop may be great for storing data, but it is not so good for analytics.
Along the same lines, MongoDB gave a great presentation on why it suits real-time big data and why the company believes Hadoop is better suited to offline data. Two case studies in my just-published book (Successful Business Intelligence 2/E: Unlock the Value of BI and Big Data) reflect exactly these use cases. FlightStats, which tracks thousands of flights in real time, uses MongoDB to serve that information to millions of customers. The University of California, Irvine Health, on the other hand, uses Hadoop to store patient medical records, saving hundreds of thousands of dollars compared with storing the data in its electronic medical records system, which runs on a relational database.
Cloudera's chief strategy officer Mike Olson announced Cloudera 5, which acts as an enterprise data hub from which the data warehouse can draw granular data on demand. It includes improved security and data lineage. Cloudera also announced Cloudera in the Cloud, offered through partners, initially Verizon and IBM.
Business Analysts Unleashed
Much of the big data community initially catered to statisticians and data scientists, who are skilled in math, statistics, programming, and the business. The role requires a broad range of skills and expertise, and people who have them are in short supply. To address that talent shortage, several vendors are bringing these capabilities to business power users without requiring them to write SQL or MapReduce jobs. The original BI module that first helped unlock data in relational databases in the 1990s was the business or ad hoc query tool. A semantic view or business metadata layer let these power users get to their data without knowing SQL, but only once that data was loaded into a data warehouse or mart.
Over time, the semantic layer and the data warehouse, both maintained by IT, have become increasingly bogged down. At the same time, power users want to reach a broader range of data sources (from Hadoop to relational databases to XML files) and cleanse, transform, and mash that data together on their own terms. These power users are not data scientists who want to code in MapReduce or SQL; they still want to point and click, wherever the data lives. I'm dubbing this new category of tools "Business ELT."
New solutions in this category include Microsoft Power BI. The company touted its cloud-based self-service BI solution, currently in community preview, in its keynote presentation. The Power Query module allows a business user to connect to multiple data sources, transform and cleanse the data, and create a view that others can readily consume in native Excel or in Power View, Microsoft's visual data discovery solution. [Editor's note: A detailed review of its capabilities, strengths, and weaknesses can be purchased at BI Scorecard.]
Meanwhile, Dell last year acquired Quest Software, best known for TOAD, a tool that helps DBAs write and optimize SQL queries. TOAD BI, by contrast, is aimed at business power users, allowing them to mash together multiple data sources and rapidly create subject areas. A good differentiator is that TOAD BI can access an SAP BusinessObjects universe or an Oracle BI EE data model, leveraging existing semantic models.
Start-up vendor Paxata showcased what it describes as the "industry's first adaptive data preparation platform for business users." The tool uses built-in algorithms to detect both join relationships between tables and potential data quality issues. What's nice about this solution is that Paxata has also partnered with QlikTech and Tableau, so once business power users have extracted and transformed their data, they can use an established visualization and dashboard front end.
BI Search 2.0
The concept of bringing Google's ease of use to BI has been attempted a few times, going back to 2006, when Google first released Google OneBox for enterprise customers. A couple of BI vendors were quick to embrace this approach, feeding BI report metadata and, in some cases, cube structures to the search engine. It all sounded like a great idea, yet few customers adopted it, for a variety of reasons such as poor marketing, high licensing costs, and difficult implementations. We now seem to be on the cusp of a second wave of search-based BI. Here, Microsoft showcased Power Q&A, a module of Power BI that allows users to ask simple questions such as "What are my sales in New Jersey this year?" It returns a dynamically created, interactive visualization in which the casual user can refine the question and criteria.
Also at Strata was start-up DataRPM, currently in beta and expected to be generally available in the first half of 2014. DataRPM also uses natural language processing (NLP), not just indexed keywords, and has two key differentiators. First, it works on top of both structured and unstructured data. Second, it embeds search within existing processes rather than offering it as a separate BI solution. I met with DataRPM at the end of a long day, and the team still managed to get a wow out of me. I look forward to seeing how this vendor fares in the market.
Although it was not at Strata, I've also been tracking a start-up out of the UK, Neutrino BI, that brings NLP and search to its dashboard solution. (Catch them live at my Cool BI class at the TDWI World Conference in Orlando.)
With BI adoption still stuck at a paltry 24 percent of all employees, the ease of use of BI search has the potential to bring BI to more mainstream users. It might also unlock data that users currently struggle to find and access.
Visualizing Big Data
There are still two main camps in big data visualization: traditional BI vendors that can access relational data sources as well as Hadoop, and vendors that access primarily Hadoop. Tableau, QlikTech, TIBCO Spotfire, and MicroStrategy fall into the first category; Datameer, Platfora, and Karmasphere fall into the second.
Datameer has differentiated itself through its ability to generate MapReduce jobs directly, without going through the slower Hive interface that other BI solutions may rely on. It has tripled its customer base in the last year and sees four key use cases: customer analytics, Web log file analytics for IT operations, fraud detection, and lowering the cost of data storage. With Datameer, the data scientist interacts with the Hadoop data through a spreadsheet interface and can then present the results as simple, appealing infographics.
Platfora, in beta at last year's conference, is now generally available. The company announced version 3 of its product, due in Q1 2014. Platfora creates a type of view onto data in Hadoop that it calls a lens. The data is loaded into Platfora's own in-memory engine, where business users can visualize and interact with it. New capabilities in version 3 allow users to organize the data into events (such as a store visit versus a Web visit) and to perform iterative customer segmentation. Early users of Platfora include Disney, Netflix, and Shopify.
Cindi Howson is the founder of BI Scorecard, an independent analyst firm that advises companies on BI tool strategies and offers in-depth business intelligence product reviews.