BI and Data Warehousing 2012: A Memorable Year
This year might be remembered as one of the most interesting in the history of business intelligence (BI) and data warehousing (DW).
- By Stephen Swoyer
- December 18, 2012
In 2012, several long-simmering forces emerged -- and in the case of big data, exploded -- to challenge the data management (DM) status quo. Collectively, the changes we saw this year make for a perfect storm of challenges to data management's orthodoxy.
Data Management at a Crossroads
In 2012, the selection pressures acting on data management came into sharp relief.
For starters, the year served to confirm the BI discovery trend, with new discovery-oriented offerings from SAS Institute Inc. (Visual Analytics Explorer) and SAP AG (Visual Intelligence). At this point, all of the BI heavyweights -- Information Builders Inc., IBM Cognos, Microsoft Corp., MicroStrategy Inc., and Oracle Corp., along with SAP and SAS -- market discovery-themed solutions aimed at blunting the success of the original discovery players, such as TIBCO Spotfire and Tableau Software Inc.
There's another wrinkle: in addition to adopting the visual discovery metaphor, the big BI players are building enhanced self-service and collaborative features into their products. In this respect, they've taken a page from another successful (and successfully disruptive) vendor: workgroup BI specialist QlikTech Inc.
The upshot is that insurgent BI has forced the BI Powers That Be to adapt. Half a decade ago, this would've been unthinkable. "I'm an IT person who was an operational manager for 20 years. I bought those [BI] technologies, and I was a customer of Tableau's, too," says Dan Murray, director of business intelligence services and COO of InterWorks Inc., an Atlanta-based integration and services firm that specializes in Tableau.
"I don't view Tableau as a replacement for any BI or [analytic] database company," Murray continued. "It's just going to make any existent [analytic] database or BI deployment better -- make it more accessible [and] more useful to the average information consumer."
Change isn't confined to the outer reaches of data management (i.e., BI tools). Analytic databases, for example, have been with us for a decade, but 2012 introduced a new variation on this theme: the analytic discovery platform, marketed by the likes of ParAccel Inc. and Teradata Corp. ParAccel pitches its platform as an alternative to the overwhelmed enterprise data warehouse (EDW). Teradata's positioning is more complicated: the EDW is its bread and butter, and at its October Partners conference, Teradata unveiled its architecture (with supporting software and services) for Unity, a DM ecosystem that accords its Aster platform primacy of place as a complementary analytic discovery platform.
Regardless of how you position it, one thing is clear: DM is changing, and 2012 was the year the implications of this change first started coming into focus.
Big Data Explosion
If "discovery" gained critical mass in 2012, big data, by any objective standard, went supernova.
At this year's TDWI BI Executive Summit on big data in San Diego, for example, two TDWI regulars -- Mark Madsen and colleague Marc Demarest -- teamed up for an electrifying seminar on the pros and cons of big data. Both agree that big data is the Real Deal. Demarest, a principal with information management consultancy Noumenal, Inc., makes his case with characteristic bluntness. "The change is upon us and there is no way back," he argues.
Madsen, for his part, sees big data as a paradigm-shifting event. "We're in the midst of a paradigm shift, but the thing about paradigm shifts is that they take a long time. [T]he new paradigm is evident but not yet manifested," says the veteran data warehouse architect and a principal with BI and data management consultancy Third Nature, Inc. "We are in a market state where nobody has written the definitive architecture for the new world."
Marketers love a paradigm shift. In the case of big data, however, BI marketers at first failed to recognize what it is that makes a paradigm shift so special. Instead of promoting big data on the basis of its potential as a transformative force -- as something that, fully realized, can radically reshape how we understand the world -- the industry this year glommed on to the idea of promoting big data as a function of the volume, velocity, or variety of the information that constitutes it.
Of course, the three Vs aren't new. They've always been with us. In fact, Gartner Inc. analyst Doug Laney first coined the volume-variety-velocity triptych more than a decade ago. If the term big data simply describes the volume, variety, and velocity of the information that constitutes it, our existing data management practices are still arguably up to the task. Defining big data this way also pigeonholes it as a data-management-specific event; paradigm shifts, by contrast, are sweeping, cutting across disciplines and domains.
If big data is as big a shift as Madsen, Demarest, and others believe, it must mean something more than volume, velocity, and variety. Savvy industry watcher and TDWI contributor Ted Cuzzillo, who blogs about BI at DataDoodle.com, uses the analogy of television to describe both what big data is and what it could ultimately be.
"Right now, big data's analogous to early TV. Skeptics called it 'radio with pictures,' and some of it was little more than that. But its new dimensions developed, with ever-higher resolution. Soon enough we had 'living color,' then HD, and now some of it's in 3-D," he explains.
That's the endgame of the big data paradigm shift: 3-D context. When, if ever, we'll achieve this is anybody's guess.
A New Role for the Data Warehouse?
Discovery and especially big data are putting pressure on DM practitioners to retest, re-evaluate, and (in some cases) discard core organizing or operating assumptions.
Even the data warehouse itself is coming under scrutiny.
We started to get a sense of this in 2012. Even after a year of big data hoopla, few data management practitioners can conceive of a world that doesn't have a data warehouse at its center. Try telling that to the average attendee at October's inaugural Strata + Hadoop World conference, however. Such folk aren't part of the data management mainstream; in fact, they're used to viewing the data warehouse as an obstacle -- or as an archaism. It isn't that they can't conceive of a world that doesn't have a data warehouse at its center; it's that in many cases they're actively anticipating the emergence of just such a world.
That's the point: the era of the data warehouse as its own isolated fiefdom is ending; DM, and the data warehouse along with it, are being coaxed -- or dragged -- out into the open.
Big data is placing selection pressure on the DW in a number of ways. Some vendors inside and outside the data management industry envision the Hadoop framework -- which by the end of 2012 had become virtually synonymous with big data -- as an information management platform residing alongside (and possibly displacing) the traditional data warehouse.
There was David Inbar, senior director of big data products with data integration (DI) specialist Pervasive Software Inc., who eloquently described Hadoop as "a beautiful platform for all kinds of computation." At this summer's Pacific Northwest BI Summit, held in Grants Pass, Oregon, Yves de Montcheuil, vice president of marketing with open source software (OSS) DI vendor Talend, outlined a vision of Hadoop as the central site of enterprise information integration. In 2012, Pervasive and Talend, along with competitors Informatica Corp. and Syncsort Inc., all trumpeted Hadoop-centered, big data-focused product or service announcements. Pervasive and Syncsort both announced dedicated ETL libraries for Hadoop, and Talend promoted the idea of Hadoop- and MapReduce-powered ETL. Informatica, for its part, announced a "Big Data Edition" of its PowerCenter ETL platform.
DI is a natural fit for Hadoop, which (when paired with MapReduce) has been described as the equivalent of a brute-force, massively parallel ETL platform. In 2012 we saw a new (more ambitious) variation on this theme thanks to a slew of BI-like offerings based on Hadoop.
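To see why Hadoop-plus-MapReduce gets described as a brute-force, massively parallel ETL platform, it helps to look at the pattern itself. The sketch below is a toy, single-process Python illustration (the record format, field names, and the `map_phase`/`shuffle`/`reduce_phase` helpers are all hypothetical, not any vendor's API); in a real Hadoop cluster the same three phases run in parallel across many nodes over files in HDFS.

```python
from collections import defaultdict

# Hypothetical raw "source" records: CSV-style web log lines
# (date, region, HTTP status) standing in for files in HDFS.
RAW_LINES = [
    "2012-11-01,us-east,200",
    "2012-11-01,us-east,500",
    "2012-11-01,eu-west,200",
    "2012-11-02,us-east,200",
]

def map_phase(line):
    """Extract: parse one raw line and emit (key, value) pairs."""
    date, region, status = line.split(",")
    yield (region, 1)

def shuffle(pairs):
    """Group values by key, as Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Transform/aggregate: total the per-region counts."""
    return (key, sum(values))

pairs = [pair for line in RAW_LINES for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(result)  # {'us-east': 3, 'eu-west': 1}
```

The "brute force" in the analogy is that nothing here is indexed or incremental: every job re-reads the raw inputs end to end, which is exactly how a batch ETL pass behaves.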
At Strata + Hadoop World, for example, Cloudera Inc. unveiled "Impala," a real-time, interactive query engine that runs inside Hadoop. (Hadoop itself is a batch-centric processing environment -- one reason it's promoted for ETL rather than for interactive queries.) Impala enables OLAP-driven discovery in a Hadoop environment, along with other BI-like use cases. Cloudera competitor MapR Inc. uses a different approach (basically mounting the Hadoop file system as an NFS share) to achieve a similar end; Datameer Inc., also at Strata + Hadoop World, touted a BI-like analytic discovery environment that it likewise implements on top of Hadoop. Other upstart players, such as Platfora Inc., take similar Hadoop-centric approaches that de-emphasize the DW.
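The kind of workload these engines target is the ad hoc, OLAP-style aggregation that a batch MapReduce job makes painfully slow. As a rough illustration only -- sqlite3 here is a stand-in for an interactive SQL-on-Hadoop engine, and the `clicks` table and its data are invented -- the query shape is ordinary SQL:

```python
import sqlite3

# sqlite3 plays the part of the interactive query engine;
# the in-memory table plays the part of a data set in Hadoop.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE clicks (region TEXT, hits INTEGER)")
con.executemany("INSERT INTO clicks VALUES (?, ?)",
                [("us-east", 3), ("eu-west", 1), ("us-east", 2)])

# An ad hoc, OLAP-style rollup: the sort of question an analyst
# asks interactively rather than scripting as a batch job.
rows = con.execute(
    "SELECT region, SUM(hits) FROM clicks GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('eu-west', 1), ('us-east', 5)]
```

The point of engines like Impala is that queries of this shape return in seconds against data already sitting in Hadoop, with no extract step into a separate warehouse first.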
From Redshift to Paradigm Shift
Perhaps the most intriguing news item of 2012 was Amazon's November announcement of Redshift, its data-warehouse-in-the-cloud offering for Amazon Web Services (AWS).
Redshift is based on a respected massively parallel processing (MPP) engine: the ParAccel analytic database. This technology would count for little, however, if Amazon hadn't also addressed the Achilles' heel of data warehousing in the cloud: unpredictable I/O performance. There's good reason to believe that AWS, which uses local solid-state disk (SSD) storage, does just that. This is what makes Redshift so intriguing.
After all, the concept of the DW in the cloud isn't new. Analytic database stalwart Kognitio first announced a data-warehouse-as-a-service (DaaS) offering half a decade ago; shortly thereafter, the former Vertica Inc. (now part of Hewlett-Packard Co.) followed suit.
These offerings -- and others -- scale out by provisioning new "instances" of a database engine, which means running (and managing) additional copies of a database. In the public cloud -- specifically in the case of I/O-sensitive parallel processing -- this invariably involves trade-offs in I/O performance and elasticity. That's because the traditional DW-as-a-service model transplants technology that was designed and perfected for a well-defined paradigm (i.e., a distributed, physical client-server topology) into a kind of alien context: viz., that of an elastic, multi-tenanted, inescapably virtual topology.
This doesn't mean that Amazon has delivered a Redshift DW-as-a-service that rivals on-premises platforms from Actian, HP, EMC Greenplum, IBM, Kognitio, Oracle, Teradata, and SAP AG, or ParAccel, for that matter. "Amazon fixed this [I/O] problem way back when, but databases want to own the entire machine, not little bits of several [machines]. You can't run a query database with unpredictable I/O or unpredictable interconnects. This is why MPP databases in Amazon traditionally haven't been very successful: one node always trails for some reason," says Madsen, who argues that Redshift's target customer isn't a Teradata or Netezza shop; it's a company that's maxed out on an RDBMS-powered data warehouse. "To run SQL Server and other [data warehouse] platforms, you need a DBA to configure and run it in the cloud, just like a regular server. This is a service. You set it up and go."
A NoSQL Armistice
Love it or hate it, NoSQL is here to stay. If nothing else, it's established. This year, for example, MongoDB turned five; next year, the Cassandra project will celebrate its fifth birthday. On top of this, NoSQL-like players such as MarkLogic Inc. and RainStor Inc. have been around even longer.
Something anticlimactic happened in 2012: the NoSQL wars came to an end without so much as a bang and with barely even a whimper. Whatever its problems or shortcomings from a DM perspective, NoSQL is now accepted as a legitimate product category. At all of TDWI's industry conferences this year, NoSQL exhibitors shared floor space with the likes of IBM, Oracle, and Teradata. There's evidence of acceptance in other contexts, too: SAP, for example, embeds NoSQL as one of three in-memory engines in its HANA platform.
"When I present on Hadoop and NoSQL, I always point out that it's [the product of a] programmer-centric world. It was built to solve a class of problems that these Web [application developers] were encountering for the first time," says veteran data warehouse architect and TDWI presenter John O'Brien, a principal with information management consultancy Radiant Advisors. "Whatever we [data management practitioners] might think of it, NoSQL was really built to exploit the next generation of [Web] apps, not the SQL-driven [class of] business intelligence and analytic [tools]. It has a kind of expedient purpose."
Although BI-in-the-cloud has something of a checkered history, it's possible that Amazon Redshift, in combination with other upstart projects, might finally push BI and DW vendors to get more serious about taking on the cloud. Not on their own terms (by transplanting on-premises tools to a cloud context) but on terms appropriate to the multi-tenanted, virtual turf that's characteristic of the cloud, and which is integral to its promise.
In addition to Amazon and Redshift, a pair of cloud-focused start-ups -- Akiban Technologies Inc. and NuoDB -- emerged this year to tout their own, built-from-scratch takes on a DB in the cloud. Akiban in early February announced Akiban Server, which it describes as a cloud-based, ACID-compliant ("NewSQL") DBMS platform. NuoDB iterated throughout 2012 on its tiered, SQL-compliant cloud DBMS platform that uses a mix of redundancy and probabilism to achieve ACID compliance. By December, NuoDB 1.0 was approaching general availability.
This year we also saw a big splash in the U.S. by Australian BI specialist Yellowfin, which -- in spite of its cloud underpinnings -- touts a retro (reporting-centric) take on business intelligence. Yellowfin positions its "mass-production" BI platform in the cloud as an alternative to kitchen-sink suites that, according to CEO Glen Rabie, "don't do anything particularly well."
Another BI player that made a series of major cloud moves in 2012 was Jaspersoft Inc., which signed partnerships with Red Hat Inc. and VMware Inc. to deliver platform-as-a-service (PaaS) cloud BI. (Prior to 2012, Jaspersoft had an existing arrangement with Amazon for SaaS BI.) "This data-driven world ... is increasingly going to be cloud hosted, [and] the analytics piece is going to be vital. We expect that [an] analytic reporting service is going to be a ... de facto component that PaaS providers are going to offer," Karl Van den Bergh, Jaspersoft's vice president of product and alliances, told BI This Week in July.
This year was full of other innovations -- or modifications -- of the BI status quo. For example, a trio of vendors -- Armanta Inc., Cirro Inc., and Quest -- delivered what amounted to self-contained analytic platforms: products that combine a self-service analytic tool set with an underlying data virtualization (DV) layer and leave the DI and data preparation heavy lifting to us. Like DV players such as Composite Software Inc. and Denodo Technologies Inc., they use DV to create canonical representations (or "views") of data in source systems. Unlike Composite and Denodo, their respective DV technologies lack the refinement and hardening that accrue from extensive production use.
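The core idea behind those DV-backed analytic platforms -- mapping heterogeneous source systems onto one canonical "view" at query time, without physically copying the data -- can be sketched in a few lines. Everything below is hypothetical (the two source schemas, the field names, and the `canonical_view` helper are invented for illustration, not any vendor's product):

```python
# Two hypothetical source systems with divergent schemas:
# a CRM and an ERP that name the same concepts differently.
CRM_ROWS = [{"cust_name": "Acme", "rev": 1200}]
ERP_ROWS = [{"customer": "Globex", "revenue_usd": 800}]

def canonical_view():
    """A virtual 'view': translate each source's fields into one
    canonical schema lazily, at query time -- no data is copied
    or staged, which is the essence of data virtualization."""
    for row in CRM_ROWS:
        yield {"customer": row["cust_name"], "revenue": row["rev"]}
    for row in ERP_ROWS:
        yield {"customer": row["customer"], "revenue": row["revenue_usd"]}

# A self-service tool queries the view, not the sources.
total = sum(r["revenue"] for r in canonical_view())
print(total)  # 2000
```

The design trade-off the article hints at is visible even here: the view is only as good as its source mappings, and hardening those mappings against messy, changing production systems is where the established DV vendors have the advantage.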
Acquisition-wise, 2012 was a strange year. Past years have been characterized by consolidation waves: 2003, for example, with its BI reporting consolidation; 2005, with its DI-focused consolidation wave; and 2007, with its BI suite extinction event. This year, things were comparatively quiet on that front. There was acquisition activity, to be sure: QlikTech acquired veteran DI player Expressor Software Corp. in June, Dell Inc. bought veteran DM player Quest Software, and Oracle bought DataRaker (a machine-generated data specialist). What the year lacked was a focused consolidation wave -- and a spate of related acquisitions (in reporting, say, or DI, or BI suites) is a sure sign of a red-hot market. Expect plenty of consolidation in 2013 -- with big data being a likely vector.
The truth is, 2012 was an extremely eventful year, and 2013 promises to be at least as eventful.