The Intersection of Big Data and Advanced Analytics
I recently started work on a new TDWI Best Practices Report with the working title: Deep Analytics with Big Data. The report is a tad schizophrenic, in that it’s really about two things – big data and analytics – plus how the two have teamed up to create one of the most profound trends in business intelligence (BI) today. Let me share some of the thinking behind the schizophrenia. Please reply to this blog to tell me whether this makes sense or not.
According to a recent TDWI survey, 38% of organizations surveyed are practicing advanced analytics today. But 85% say they’ll do it within 3 years!
Why the rush to advanced analytics? First, change is rampant in business; we’ve been through multiple “economies” in recent years. And analytics helps us discover what changed plus how we should react. Second, there are still many business opportunities to leverage -- even in the recession -- and more will come as we finally crawl out of the recession. To that end, advanced analytics is the best way to discover new customer segments, identify the best suppliers, associate products of affinity, understand sales seasonality, and so on. For these reasons, TDWI has seen an explosion of user organizations implementing analytics in recent years.
But note that user organizations are implementing specific forms of analytics, particularly what is sometimes called advanced analytics. This is a collection of related techniques and tools, usually including predictive analytics, data mining, statistical analysis, and complex SQL. We might also extend the list to cover data visualization, artificial intelligence, natural language processing, and database methods that support analytics.
All these techniques have been around for years, many of them appearing in the 1990s. The thing that’s different now is that far more user organizations are actually using them. That’s because most of these techniques adapt well to very large, multi-terabyte datasets, with minimal data preparation. And that brings us to big data.
Big data can be defined simply as multi-terabyte datasets. And this makes sense, given that corporations, government agencies, and other user organizations are generating and retaining more data than ever before. Soon enough, big data will involve petabytes, not terabytes. Yet, big data also involves big complexity, namely many diverse data sources (both internal and external), data types (structured, unstructured, semi-structured), and indexing schemes (relational, multidimensional, NoSQL).
Occasionally, I hear a user complain about the problems of storing and managing big data. Much more often, however, I hear people talk about what an extraordinary opportunity big data is. That’s because, for the kinds of discovery and prediction that most advanced analytic techniques enable, big data is truly a treasure trove of information that merits leverage for business advantage. And that brings us to the intersection mentioned in the title of this blog.
Advanced Analytics and Big Data: Why put them together?
Here are a few reasons:
Big data yields gigantic statistical samples. Most tools designed for data mining or statistical analysis tend to be optimized for large datasets. In fact, the general rule is that the larger the data sample, the more accurate are the statistics and other products of the analysis. Instead of mining and statistical tools, I regularly find users generating or hand-coding complex SQL, which parses big data in search of just the right customer segment, churn profile, or excessive operational cost. The newest generation of data visualization tools and in-database analytic functions likewise operate on big data.
Analytic tools and databases can now handle big data. And they can execute big queries and parses in record time. Recent generations of vendor tools and platforms have raised us onto a new plateau of performance that’s very compelling for applications involving big data.
There’s a lot to learn from messy data, as long as it’s big. Most modern tools and techniques for advanced analytics and big data are very tolerant of raw source data, with its transactional schema, non-standard data, and poor-quality data. That’s a good thing, because discovery and predictive analytics depend on lots of details, even questionable data. For example, analytic applications for fraud detection often depend on outliers and non-standard data as indications of fraud. If you apply ETL and DQ processes to big data, as you do for a data warehouse, you’ll strip out the very nuggets that make big data a treasure trove for advanced analytics.
Big data is a special asset that merits leverage. And that’s the real point of Deep Analytics with Big Data. The new technologies and new best practices are fascinating, even mesmerizing. And there’s a certain macho coolness to working with dozens of terabytes. But don’t do it for the technology. Put Big Data and Advanced Analytics together for the new insights they give the business.
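To make the first reason above a bit more concrete, here is a minimal Python sketch of the "larger sample, more accurate statistics" rule. It is an illustration only, not something taken from the TDWI report: the function names, the simulated population (mean 50, standard deviation 10), and the sample sizes are all assumptions chosen for the example. It draws random samples of increasing size and computes the standard error of the sample mean, which shrinks roughly in proportion to one over the square root of the sample size.

```python
import math
import random
import statistics


def standard_error(sample):
    """Standard error of the sample mean: sample std dev / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Unbiased sample variance (divides by n - 1).
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return math.sqrt(variance) / math.sqrt(n)


def demo(sizes=(100, 10_000, 100_000), seed=42):
    """Return (n, sample mean, standard error) for each sample size.

    The population here is simulated (hypothetical): normally
    distributed with mean 50 and standard deviation 10.
    """
    rng = random.Random(seed)
    results = []
    for n in sizes:
        sample = [rng.gauss(50.0, 10.0) for _ in range(n)]
        results.append((n, statistics.fmean(sample), standard_error(sample)))
    return results


if __name__ == "__main__":
    for n, mean, se in demo():
        print(f"n={n:>7,}  mean={mean:7.3f}  standard error={se:.4f}")
```

One caveat worth keeping in mind: this only shows random sampling error getting smaller. A bigger sample does not by itself correct for data that is biased or unrepresentative.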
So, what do you think? Does the intersection of Big Data and Advanced Analytics make sense to you? Let me know. Thanks!
To learn more, register to attend a TDWI Webinar on this topic: “The Intersection of Big Data and Analytics,” May 5, 2011, at noon Eastern time. http://bit.ly/eh5YA9
Posted by Philip Russom, Ph.D. on April 25, 2011