The Year in BI and Data Warehousing
It was a year like no other for BI professionals. We highlight a few of the major events and trends of 2013.
- By Stephen Swoyer
- December 15, 2013
The year that was 2013 doesn't fit any obvious pattern. It was like no other. Applications such as social media sentiment analysis and big data analytics are hard, costly, and time-consuming. In 2013, enterprises began to come to terms with this inescapable fact.
Along the way, marketing shifted into overdrive, cloud BI at long last took off, NoSQL grew exponentially, and a co-creator of the data warehouse -- which, by the by, turned 25 this year -- served up a trenchant assessment of the state of BI. Such was the year that was 2013.
Self-Service Salvation
This year, vendor marketers celebrated self-service as a fix for the usability, inflexibility, and adoption issues that have long bedeviled BI. Pitching it as something new, however, was a tougher sell. Self-service BI isn't new; it isn’t even sort of new. A decade ago, the former Business Objects, the former Cognos, and the former Hyperion -- along with BI stalwarts such as Information Builders Inc. and MicroStrategy Corp. -- also championed self-service as a solution for what ailed BI. In painful point of fact, self-service is almost as old as BI itself.
“Self-service first became a mantra for BI back in the early 1990s. The problem was that data was locked in a database and [users] didn’t want to have to go to IT to ask them to write SQL queries whenever they needed something,” industry luminary Cindi Howson, a principal with BIScorecard, told BI This Week in an interview last year. “That first form of ‘self-service’ was generating SQL through a semantic layer, a business view, or whatever you want to call it.”
One of the things that ailed BI in 1993 and 2003 is the same thing that ails BI today: anemic adoption. This is in spite of the fact that today’s BI tools incorporate a staggering array of “self-service” features, along with (often genuinely helpful) data visualization capabilities. They likewise address new usage paradigms (such as visual BI discovery) and, accordingly, are less rigid (as regards both usability and information access) than were their predecessors.
BI vendors in 2013 talked about use cases and feature bundles that couldn't have been anticipated a decade ago, but has any of this actually made BI better, more usable, or more pervasive? Are companies having more “success” with BI -- and does this “success” actually correlate with “value”?
Judging by BI adoption rates, you might not think so. According to Howson, BI adoption has hovered at or around 25 percent for the better part of a decade. In the 2013 installment of BI Scorecard's “Successful BI Survey,” sizeable percentages of respondents reported that BI tools still aren't easy enough to use (an issue cited by 24 percent of survey respondents) or aren't able to answer complex questions (23 percent).
In Search of a Silver Bullet
Self-service is a time-tested prescription for the BI Blues. Information search is a new(er) solution.
The case for search goes something like this: Thanks to the “inflexibility” of the data warehouse and its “rigid” data model, information from multi-structured sources (e.g., machine or sensor logs; blog postings and documents; videos and photos) can't easily be prepared and schematized for SQL query access. Information search promises to bridge the structured and multi-structured worlds, situating business facts in a “rich” semantic context.
Search, too, is by no means new: Google Inc., for example, introduced an enterprise search appliance more than a decade ago. That being said, 2013 produced some genuinely interesting developments on the information search front, such as IBM Corp.'s still-incubating “Project Neo” natural-language search technology (NLS).
Elsewhere, Information Builders introduced a new version of its WebFOCUS Magnify information search offering and Microsoft touted NLS as a major part of PowerBI, the BI and analytic component of its Office365 cloud service. Also this year, vendors such as Cloudera Inc. (Cloudera Search), MarkLogic Inc., and DataRPM Corp. -- along with established players such as NeutrinoBI Ltd. and Oracle Corp. (with its Endeca product line) -- likewise touted search as a differentiating technology.
True, search might be marketed as a silver bullet, but there's no disputing that it's a genuinely intriguing technology. What's more, it's becoming increasingly commoditized. Tools such as Cloudera Search, DataRPM, and WebFOCUS Magnify leverage open source software (OSS) components such as Solr (an OSS search platform) and Lucene (an OSS indexing library).
The upshot is that it's increasingly possible to build a serviceable information search platform using free OSS tools. For example, a savvy organization could use a combination of Solr, the Apache unstructured information management architecture (UIMA) project, the R statistical programming environment, and other technologies to build and deploy an analytic search platform that addresses both faceted search (a non-taxonomic scheme for classifying information in multiple dimensions) and NLS requirements. Look for information search to play a more prominent role in 2014.
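To make the notion of faceted search concrete, here is a minimal sketch in plain Python. It is not how Solr or any of the products named above implement faceting; the documents, field names, and helper functions are hypothetical and exist only to show the idea of classifying the same items along several independent dimensions and counting the matches in each.

```python
from collections import Counter

# Hypothetical mini-corpus: every document carries a value in each of
# several independent dimensions (facets) rather than one taxonomy slot.
DOCS = [
    {"id": 1, "source": "blog", "year": 2013, "topic": "hadoop"},
    {"id": 2, "source": "sensor_log", "year": 2013, "topic": "redshift"},
    {"id": 3, "source": "blog", "year": 2012, "topic": "hadoop"},
    {"id": 4, "source": "document", "year": 2013, "topic": "hadoop"},
]


def filter_docs(docs, **selected):
    """Keep only the documents that match every selected facet value."""
    return [d for d in docs if all(d.get(k) == v for k, v in selected.items())]


def facet_counts(docs, facets):
    """Count how many documents fall under each value of each facet."""
    return {facet: Counter(d[facet] for d in docs) for facet in facets}


# Narrow by one facet, then summarize the remaining hits along the others.
hits = filter_docs(DOCS, topic="hadoop")
counts = facet_counts(hits, ["source", "year"])
```

A user who narrows by topic sees how the remaining hits break down by source and by year, and can narrow again along either dimension in any order.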
At Long Last Cloud?
In 2013, we saw BI marketing shift to the cloud, with Microsoft's PowerBI for Office365, a new cloud offering from Tableau, a new “Active Data Warehouse Private Cloud” service from Teradata, a cloud BI platform-as-a-service (PaaS) offering from start-up RedRock BI, and a data management splash by relative newcomer Treasure Data Inc., which markets a hosted big data analytic service (this last mixes OSS pieces of Hadoop with proprietary bits), among other entries.
There's been no shortage of SaaS BI offerings -- including solutions from Birst Inc., Domo Inc. (a relative newcomer), and GoodData Corp., among others -- but for a long time, prevailing wisdom held that enterprise BI and data warehousing just wouldn't “take” in the cloud. BI information is “too sensitive” and data warehousing workloads “too demanding” for cloud environments, some argued.
There's some truth to both claims: Some workloads or applications simply can't be shifted to the cloud, owing chiefly to regulatory requirements. Give Mark Madsen, a research analyst with IT strategy consultancy Third Nature Inc., a few hours and he'll exhaustively tally the many and varied reasons data warehousing workloads aren't a great fit for the highly virtualized, loosely coupled cloud.
That said, Madsen himself expects that a clear majority of data warehousing (DW) workloads will shift to the cloud over the next decade. As 2013 draws to a close, in fact, it's fair to say that cloud BI has real and verifiable momentum. Several vendors -- RedRock BI, but also MicroStrategy Corp. and Yellowfin International Pty Ltd. -- even market BI PaaS offerings, which shift BI (i.e., platform infrastructure, workloads, data, and development) entirely to the cloud.
MicroStrategy hosts its own PaaS service, while Yellowfin BI can be deployed on Amazon Web Services (AWS) and used in conjunction with Amazon's Redshift massively parallel processing (MPP) cloud data warehouse service. Offerings from other BI vendors -- such as Actuate Corp., JasperSoft Inc., and Talend -- are available as PaaS packages, too.
Then there's AWS, which is a bona fide cloud powerhouse. One year ago this month, Amazon announced Redshift, an MPP cloud data warehouse for AWS. Let's not sugar-coat this: There are significant challenges involved in shifting data warehouse workloads into the cloud. With Redshift, Amazon seems to have licked many of them.
Steve Dine, managing partner with DataSource Consulting and a frequent instructor at TDWI educational events, says he's worked with Redshift in a few client engagements.
“It scales well. Just like any MPP system, it scales based on how well you parallelize your workloads, how well you partition your data, and how many nodes you spin up,” he explains.
“[Redshift is] just like any columnar database: if you're isolating it to a subset of attributes, it's great; if you're trying to do very wide queries, as you would in many retail situations, you are likely to see better performance from a row-based MPP database.”
Dine doesn't think of Redshift as a silver bullet -- e.g., even though it's inexpensive, Redshift's per-TB pricing can quickly add up -- but sees it (1) as a compelling option for smaller companies looking to build data warehouses in the cloud, and (2) as a proof-of-concept for large companies concerned about shifting MPP workloads to the cloud.
“It just democratizes [MPP analytic databases],” he points out. “What's nice about it is that you can spin it up and set it to automatically take snapshots. You can bring it up and take it down whenever you want. Will it work for everybody? As with any [MPP platform], it just depends on what your workload is.”
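Dine's point about columnar storage can be illustrated with a toy model. The sketch below says nothing about Redshift's actual internals; the sample rows, column names, and the "values touched" counter are all assumptions made for illustration. It simply contrasts how many stored values must be read to aggregate one attribute when data is laid out row by row versus column by column.

```python
# Hypothetical retail rows; the fields are invented for illustration.
ROWS = [
    {"store": "A", "sku": 1, "qty": 2, "price_cents": 950, "region": "east"},
    {"store": "B", "sku": 2, "qty": 1, "price_cents": 400, "region": "west"},
    {"store": "A", "sku": 3, "qty": 4, "price_cents": 125, "region": "east"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_row_store(rows, column):
    """Sum one attribute in a row layout; each whole row is read to reach it."""
    total, touched = 0, 0
    for row in rows:
        touched += len(row)  # every attribute in the row is read
        total += row[column]
    return total, touched


def sum_column_store(columns, column):
    """Sum one attribute in a column layout; only that column is read."""
    values = columns[column]
    return sum(values), len(values)


COLUMNS = to_columns(ROWS)
row_total, row_touched = sum_row_store(ROWS, "qty")
col_total, col_touched = sum_column_store(COLUMNS, "qty")
```

Both layouts return the same total, but the column layout reads one value per row rather than five. The advantage shrinks as a query asks for more attributes per row, which is the "very wide queries" case Dine describes.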
Hadoop, NoSQL, and Google F1
As a combined market/technology segment, NoSQL continues to grow like a flowering kudzu plant. At O'Reilly Inc.'s Strata conference in February, Pivotal, a big data spin-off formed by EMC Corp. and VMware Inc. last December, announced “Hawq,” an ANSI SQL, ACID-compliant database system (based on EMC's Greenplum MPP database) for Hadoop.
Elsewhere this year, the OSS community (aided by Cloudera, Hortonworks Inc., MapR Inc., and other commercial software vendors) focused on bolstering Hadoop's security and disaster recovery bona fides. One of the biggest deliverables of 2013 was version 2.2 of the Hadoop framework, which went live in early October. Hadoop 2.2 bundles YARN (a backronym for “yet another resource negotiator”), which promises to make it easier to monitor and manage non-MapReduce workloads in Hadoop clusters. (Prior to YARN, Hadoop's JobTracker and TaskTracker jointly managed resource negotiation. Both daemons were built with the MapReduce compute engine in mind.) Now that YARN's available, users should finally be able to manage, monitor, and scale mixed workloads in the Hadoop environment.
Nor is Hadoop the last word in NoSQL. Vendors such as Basho Technologies Inc. (which develops the Riak distributed DBMS), Cloudant (which bases its distributed NoSQL database on the Apache CouchDB project), DataStax (which markets a commercial distribution of Apache Cassandra), FoundationDB, MarkLogic Inc., and RainStor Inc., among others, would beg to differ. This October, Basho previewed a version 2 release of Riak that claims to support strong consistency -- i.e., strong transactions, or the ACID that's known and loved by DM-types. Most distributed DBMSs (such as NuoDB and Splice Machine) support what's known as “eventual consistency.”
An altogether new entrant was F1, the ANSI SQL- and ACID-compliant database platform that Google Inc. announced in September. Google, which uses F1 to power its AdWords service, claims it can function as a single platform for both OLTP and analytic workloads. Unlike Hadoop, F1 addresses classic data management requirements (with support for strong transactions and row-level locking) -- and does so at Google scale. Google's push behind F1 underscores an important consideration: We conflate the terms “NoSQL,” “big data,” and “Hadoop” at our own risk.
Final Thoughts: Business unIntelligence and the Data Warehouse at 25
This year was a milestone annum, too. The data warehouse itself was born 25 years ago, in 1988, when Dr. Barry Devlin and Paul Murphy published their seminal paper, “An architecture for a business and information system,” in the IBM Systems Journal.
Devlin was back in 2013 with a new book, Business unIntelligence: Insight and Innovation beyond Analytics and Big Data. In many ways, Devlin's book is a wry assessment of his prodigal creation, which -- a quarter century on -- is at once dominant and besieged.
Over the last quarter century, organizations have invested hundreds of billions of dollars in data warehouse systems, to say nothing of the BI tools that are the DW's raison d'etre. There probably isn't a Global 2000 organization that doesn't have at least one “enterprise” data warehouse.
The net “net” of this ubiquity, as Devlin demonstrates with perspicacity and humor, is a kind of muddling through. (The “unIntelligence” in Devlin's title speaks to precisely this problem.) In this regard, Devlin could well say of BI what philosopher Immanuel Kant famously said of humankind: “Out of the crooked timber of [a BI implementation project], no straight thing was ever made.”
The Business of BI: Comings, Goings, and IPOs
This year, we bade adieu -- chiefly by way of acquisition -- to several stalwart vendors. Composite Software Inc., Kalido, KXEN Inc., ParAccel Inc., and Pervasive Software Inc., among others, were acquired this year. Cisco Systems Inc. -- which seems poised to make a big push into data management in 2014 and beyond -- snapped up Composite in June; Silverback Enterprise Group, an Austin-based holding company, acquired Kalido in October; SAP nabbed KXEN -- a long-time Teradata partner -- in September; and Actian Corp. acquired both ParAccel and Pervasive. Will these technologies survive and thrive -- or will they vanish (as with the former Brio Software, the former Celequest, and the former DecisionPoint Software, to name just a few) into the void of the industry's memory hole?
Elsewhere this year, Tableau's long-rumored IPO finally (and successfully) took place. Dell Inc. pulled off a kind of reverse-IPO: in early February, its shares were de-listed from both the NASDAQ and the Hong Kong Stock Exchange. At the same time, founder Michael Dell (bolstered by VC giant Silver Lake Partners, and with an additional $2 billion in financing from Microsoft) came back to take Dell private. Prior to its de-listing, Dell had managed to cobble together a large information management portfolio, anchored by its Toad assets, which it acquired from the former Quest Software. Its execution in 2014 will bear watching.
So, too, will that of Teradata, which this year became acquainted with the downside of being a public company. Teradata missed its earnings in Q2 and -- just ahead of its Partners conference in October -- cut its earnings outlook for the year. As a result, the DW giant's stock was repeatedly buffeted by the market. Slings and arrows, indeed.
Buzzwords: A Plea
A new year brings new buzzwords. By “buzzword,” we mean those once-unique coinages that -- when used sparingly or in isolation -- have imaginative, conceptual, and/or thematic power. As poet A.R. Ammons once aptly put it, “A word too much repeated falls out of being.”
And how. This year, terms such as “disruptive,” “transformative,” “self-service,” “high-value,” “advanced” and/or “predictive” analytic -- along with the term “analytic” itself -- were taken up as adjectives or adverbs by marketers everywhere. Even the word “cloud” was used and misused by marketers.
All of these terms passed into a lexicon of descriptive words -- sprinkled with a few scant nouns and verbs -- that includes marketing mainstays such as “game-changing,” “unprecedented,” and “patent-pending,” as well as more innocuous descriptors such as “visual,” “intuitive,” “market-leading,” or “innovative.”
These words constitute the noise that we must filter out if we're to meaningfully assess products and technologies, or (more important) to make buying decisions about them. It isn't that these words aren't useful or don't mean anything. Rather, it's that the contexts in which they're employed are so general (or so inapposite) as to dilute their meanings. They're all noise and no -- or very little -- signal.
This year, and with shocking frequency, BI and DM vendors delivered “disruptive” tools that “surface” (a popular alternative is “expose”) “innovative” and/or “intuitive” capabilities -- almost always in a “rich visual” and/or “self-service” context -- and promise to “transform” a typically “dysfunctional” status quo. In many cases, tools tout “advanced analytic” or “predictive” capabilities and (increasingly) have “cloud” components or attributes, too.
This author is as guilty of diluting these terms as anyone else. For the New Year, he's resolved to strike them from his lexicon -- at least in contexts where they're inappropriate.
If only industry vendors would do the same.