What is your e-mail address?

My e-mail address is:

Do you have a password?

Forgot your password? Click here

Return to the Previous Page

Philip Russom

By Philip Russom

Blog archive

The Intersection of Big Data and Advanced Analytics

I recently started work on a new TDWI Best Practices Report with the working title: Deep Analytics with Big Data. The report is a tad schizophrenic, in that it’s really about two things – big data and analytics – plus how the two have teamed up to create one of the most profound trends in business intelligence (BI) today. Let me share some of the thinking behind the schizophrenia. Please reply to this blog to tell me whether this makes sense or not.

Advanced Analytics

According to a recent TDWI survey, 38% of organizations surveyed are practicing advanced analytics today. But 85% say they’ll do it within 3 years!

Why the rush to advanced analytics? First, change is rampant in business; we’ve been through multiple “economies” in recent years. And analytics helps us discover what changed plus how we should react. Second, there are still many business opportunities to leverage -- even in the recession -- and more will come as we finally crawl out of the recession. To that end, advanced analytics is the best way to discover new customer segments, identify the best suppliers, associate products of affinity, understand sales seasonality, and so on. For these reasons, TDWI has seen an explosion of user organizations implementing analytics in recent years.

But note that user organizations are implementing specific forms of analytics, particularly what is sometimes call advanced analytics. This is a collection of related techniques and tools, usually including predictive analytics, data mining, statistical analysis, and complex SQL. We might also extend the list to cover data visualization, artificial intelligence, natural language processing, and database methods that support analytics.

All these techniques have been around for years, many of them appearing in the 1990s. The thing that’s different now is that far more user organizations are actually using them. That’s because most of these techniques adapt well to very large, multi-terabyte datasets, with minimal data preparation. And that brings us to big data.

Big Data

Big data can be defined simply as multi-terabyte datasets. And this make sense, given that corporations, government agencies, and other user organizations are generating and retaining more data than ever before. Soon enough, big data will involve petabytes, not terabytes Yet, big data also involves big complexity, namely many diverse data sources (both internal and external), data types (structured, unstructured, semi-structured), and indexing schemes (relational, multidimensional, no-SQL).

Occasionally, I hear a user complain about the problems of storing and managing big data. Much more often, however, I hear people talk about what an extraordinary opportunity big data is. That’s because, for the kinds of discovery and prediction that most advanced analytic techniques enable, big data is truly a treasure trove of information that merits leverage for business advantage. And that brings us to the intersection mentioned in the title of this blog.

Advanced Analytics and Big Data: Why put them together?

Here are a few reasons:

Big data yields gigantic statistical samples. Most tools designed for data mining or statistical analysis tend to be optimized for large datasets. In fact, the general rule is that the larger the data sample, the more accurate are the statistics and other products of the analysis. Instead of mining and statistical tools, I regularly find users generating or hand-coding complex SQL, which parses big data in search of just the right customer segment, churn profile, or excessive operational cost. The newest generation of data visualization tools and in-database analytic functions likewise operate on big data.

Analytic tools and databases can now handle big data. And they can execute big queries and parses in record time. Recent generations of vendor tools and platforms have raised us onto a new plateau of performance that’s very compelling for applications involving big data.

There’s a lot to learn from messy data, as long as it’s big. Most modern tools and techniques for advanced analytics and big data are very tolerant of raw source data, with its transactional schema, non-standard data, and poor-quality data. That’s a good thing, because discovery and predictive analytics depend on lots of details, even questionable data. For example, analytic applications for fraud detection often depend on outliers and non-standard data as indications of fraud. If you apply ETL and DQ processes to big data, as you do for a data warehouse, you’ll strip out the very nuggets that make big data a treasure trove for advanced analytics.

Big data is a special asset that merits leverage. And that’s the real point of Deep Analytics with Big Data. The new technologies and new best practices are fascinating, even mesmerizing. And there’s a certain macho coolness to working with dozens of terabytes. But don’t do it for the technology. Put Big Data and Advance Analytics together for the new insights they give the business.

So, what do you think? Does the intersection of Big Data and Advance Analytics make sense to you? Let me know. Thanks!

To learn more, register to attend a TDWI Webinar on this topic. “The Intersection of Big Data and Analytics,” May 5, 2011 at noon eastern time. http://bit.ly/eh5YA9

Posted by Philip Russom on April 25, 2011


Wed, May 11, 2011 Guy Bayes

It's pretty hard these days to not take web activity into account when doing advanced analytics around customer behavior and buying patterns. However a lot fo time I think what the Advanced Analytics people really want is someone to shield them from the big data, summarize, bin, cleanse it for them. That being said I kind of chuckle when I read "Analytic tools and databases can now handle big data". Which Analytical Tools would that be, then?

Sat, May 7, 2011 James Taylor

I agree with Bill - I think it is sometimes unhelpful for these to be discussed together. Plenty of people are doing big data without doing advanced analytics (using hadoop to support fairly basic queries for instance) and almost everyone doing advanced analytics (predictive analytics or data mining) is NOT using big data and frankly have no particular reason to. There may well be good opportunities at the intersection but folks need to be focused on their business and on the decisions that drive it. If they "begin with the decision in mind" they can establish what kinds of analytics will help and whether or not they need big data too. Starting with the technologies will just get them in trouble.

Fri, May 6, 2011 Bill Hewitt United States

Philip, I agree that many people equate Big Data and analytics- but you don't have to be doing Big Data to be a user of advanced analytics. For the most part, the large amount of data people store is single fact table data- POS transactions, prescriptions, financial transactions, etc. The advanced part is developing the complex business relationships between data- transaction, location, social, seasonal, etc- one of our customers has a model with over 3,000 relationships which give them insight into customer buying patterns- and they are most successful in their field, doubling their market value in three years. Most Big Data failures start with the data without asking "what business problem are we trying to solve?" I agree with Stephen Brobst, the CTO of Teradata, who said in a recent article “It’s the diversity of data that’s more interesting, as a data scientist, than the size”. By understanding the problem first, businesses can better assess how and when they use Big Data and analytics for competitive advantage.

Add a Comment

Your Name:(optional)
Your Email:(optional)
Your Location:(optional)
Please type the letters/numbers you see above
Back to Top

Channels by Topic

  • Agile BI »
    • Agile
    • Scoping
    • Principles
    • Iterations
    • Scrum
    • Testing
  • Big Data Analytics »
    • Advanced Analytics
    • Diverse Data Types
    • Massive Volumes
    • Real-time/Streaming
    • Hadoop
    • MapReduce
  • Business Analytics »
    • Advanced Analytics
    • Predictive
    • Customer
    • Spatial
    • Text Mining
    • Big Data
  • Business Intelligence »
    • Agile
    • In-memory
    • Search
    • Real-time
    • SaaS
    • Open source
  • BI Leadership »
    • Latest Trends
    • Technologies
    • Thought Leadership
  • Data Analysis and Design »
    • Business Requirements
    • Metrics
    • KPIs
    • Rules
    • Models
    • Dimensions
    • Testing
  • Data Management »
    • Data Quality
    • Integration
    • Governance
    • Profiling
    • Monitoring
    • ETL
    • CDI
    • Master Data Management
    • Analytic/Operational
  • Data Warehousing »
    • Platforms
    • Architectures
    • Appliances
    • Spreadmarts
    • Databases
    • Services
  • Performance Management »
    • Dashboards, Scorecards
    • Measures
    • Objectives
    • Compliance
    • Profitability
    • Cost Management
  • Program Management »
    • Leadership
    • Planning
    • Team-Building
    • Staffing
    • Scoping
    • Road Maps
    • BPM, CRM, SCM
  • Master Data Management »
    • Business Definitions
    • Sharing
    • Integration
    • ETL, EAI, EII
    • Replication
    • Data Governance

Sponsored Links