Big Data Analytics: The View from Kognitio
Blog by Philip Russom
Research Director for Data Management, TDWI
I recently chatted with Paul Groom, the VP of Business Intelligence at Kognitio. Among other things, Paul had some great tips for moving beyond common barriers to analytics with big data. I’d like to share some of those tips with you.
Philip Russom: I’ve encountered several user companies that are hoarding big data – especially log data from Web sites – but they don’t know how to get started with analyzing it. Are you seeing this, too?
Paul Groom: Yes. I call it “data car parking.” Over time, the data car park gets so big that it’s a psychological barrier to taking any kind of action. For some reason, many data warehouse professionals think they have to process the entire data car park all at once – with the usual ETL, data quality, and data modeling techniques – before analytics can commence. That particular mindset is a show-stopper for big data analytics.
Philip Russom: In data warehousing, we’re taught that transforming, cleansing, and modeling data are requirements, because reports require squeaky clean, auditable data. But analytics and big data have different requirements. Right?
Paul Groom: Right. OLAP aside, most analytic methods require large data samples of highly detailed data drawn straight from operational sources. That’s because a business analyst is trying to discover unknown business facts in previously untapped data, which differs from data warehousing, where reports present known business facts based on well-understood data. Careful data preparation is desirable in data warehousing for reports, but it’s actually a problem with analytics, because data prep strips out the details and granularity that analytics depends on. Oddly enough, when users figure out that they should forgo most of the data prep they’re used to in data warehousing, it removes a barrier so they can proceed to analytics with the big data they’ve been caching away.
Philip Russom: I’ve been talking up the perils of data prep for analytics for about two years now. Even when users get the point, they’re still skeptical about the next step, namely complex analytic queries against non-optimized big data.
Paul Groom: We get a lot of that, too. The skepticism is natural, because data warehouse pros have been using hand-me-down database management systems designed for transaction processing, and these don’t perform well with complex analytic queries. But the newest generation of analytic databases does. Assuming you have one of these, such as Kognitio WX2, then the rules of the analytic game just changed.
Our mantra is: “Trust the database.” A modern analytic database can quickly execute any query you come up with, without need for time-consuming data prep or repetitive tweaking of queries and data models. Once users build confidence in new database performance, it removes another barrier to analytics.
So, what do you think, folks? Let me know. Thanks!
Posted by Philip Russom, Ph.D. on May 31, 2011