TDWI Upside - Where Data Means Business

Year in Review: The Year of DIY in BI

Highlights of the major business intelligence events in 2015, the year when do-it-yourself business intelligence took off.

What do ride-sharing services such as Uber and Lyft, 3D printing, urban farming, and do-it-yourself everything have to do with business intelligence (BI) and analytics?

They're all about individual enfranchisement and, to varying degrees, personal empowerment. They're also about disintermediation: getting rid of intervening layers (top-down hierarchies of production, distribution, management, and regulation) and freeing people up to do stuff.

With their growing emphasis on self-service everything, BI and analytics technologies have been trending toward DIY for years. In fact, 2015 was the year the DIY vision crystallized, not in the form of any one seminal or archetypal product but as a kind of conventional wisdom.

Any and every kind of BI and analytics offering, from the oldest to the newest, now has some kind of self-service story. In an encouraging number of cases, old and new alike, BI and analytics technologies boast creditable self-service stories. From self-service front-end tools to self-service data prep and data engineering tools to a new crop of self-service statistical, predictive analytics, and machine-learning tools, BI and analytics have become so much more self-serviceable.

Even if self-service isn't a panacea (it isn't), and even if many of the longest-standing and most intractable of problems are as long-standing and intractable as ever (they are), we're all the better for it. Read on to find out why as we recap the year that was -- 2015.

DIY Comes to BI

Self-service BI, as a category, isn't new. BI vendors have worked to equip information consumers to customize (and in some cases to build) front-end business views for some time. Over the last 24 months, however, we've seen a very different push to outfit data scientists and business analysts with DIY technologies for data engineering and data preparation.

Vendors such as Alteryx, Paxata, and Trifacta aren't new, either. Paxata and Trifacta launched in 2012, Alteryx even earlier, in 2010. It wasn't until this year, however, that do-it-yourself data prep seemed to become the stuff of conventional wisdom, in this case as a guided, visual counterpart to self-service BI. Before data can be consumed by self-service BI users, it must first be prepared for analysis. Data prep, proponents argue, is the proverbial "last mile" of any analytics effort, and self-service data-prep tools accelerate an otherwise tedious process that squanders the bulk of the self-serving analyst's time.

"In the self-service business intelligence and analytics space, the missing gap [is that] for every analytical exercise, you're spending as much as 80 percent or more of your time in data prep. That's a key, key, key point," Paxata CEO Prakash Nanduri told BI This Week.

Self-service data prep (or, more generally, self-service data integration, DI) was huge in 2015, with product, feature, or functional announcements from best-of-breed vendors (Alteryx, Paxata, and Trifacta) and non-traditional players alike. Cisco Systems, Informatica, Logi Analytics, Qlik, SnapLogic, and Tableau all got into the mix; Cisco, for example, cinched a deal to resell Paxata's data-prep solution as a complement to its Cisco Composite Information Server.

Never before has the ostensibly boring been so sexy, either. Tableau's announcement of new self-service DI capabilities (including Union, a new feature designed to automate the integration of spreadsheet files, and new automated support for cross-database joins) prompted cheers and applause at its annual Tableau Customer Conference (TCC) this November. "Before ... I'd have to ask IT to build me a data warehouse just to join those two databases, but now Tableau is giving me the ability to [join data from multiple sources]," Tableau's Roger Hau told attendees during his keynote presentation at TCC.
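For readers who haven't worked with these operations, the following is a minimal, tool-agnostic Python sketch of what a union of spreadsheet files and a join across sources actually do. It is not Tableau's implementation. The file contents, the `REGION_MANAGERS` lookup (standing in for a table in a second database), and the helper names `union_files` and `join_on_region` are all hypothetical.

```python
import csv
import io

# Two hypothetical monthly spreadsheet exports that share the same columns.
JAN_CSV = """region,product,units
West,Widget,120
East,Widget,95
"""
FEB_CSV = """region,product,units
West,Widget,130
East,Gadget,40
"""

# Hypothetical lookup table that, in practice, would live in a different database.
REGION_MANAGERS = {"West": "Ana", "East": "Raj"}


def union_files(*texts):
    """Stack the rows of files that share a header (a UNION ALL)."""
    rows = []
    for text in texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


def join_on_region(rows, lookup):
    """Inner-join each row to the lookup table on the 'region' key."""
    return [
        {**row, "manager": lookup[row["region"]]}
        for row in rows
        if row["region"] in lookup
    ]


combined = union_files(JAN_CSV, FEB_CSV)
joined = join_on_region(combined, REGION_MANAGERS)
for row in joined:
    print(row["region"], row["product"], row["units"], row["manager"])
```

The union simply stacks rows from files with a common layout; the join then widens each row with a column matched by key from another source. Rows whose region has no match are dropped, as in an inner join. The point of the self-service features was that analysts could do this without asking IT to build a warehouse first.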

BI front-end tools also became more self-serviceable in 2015. A slew of familiar names (IBM, Microsoft, MicroStrategy, Oracle, Qlik, SAP, SAS Institute, and Tableau Software, to name just a few) incorporated additional self-service capabilities into their front-end offerings, with a particular emphasis on self-service analytic discovery. The trend shows no sign of slowing down.

The next frontier for do-it-yourself self-service could well be predictive analytics and machine learning. Several vendors, including Alpine Data Labs, Dell, IBM, Microsoft, Predixion, Qlik, SAS Institute, and Tableau, are testing the bounds of what's possible in this area.

Back to Basics

It isn't yet clear just how we're going to square this new do-it-yourself ethos in BI and analytics with the requirements of good governance, but square it we must. Even as vendors and users are of one mind with respect to self-service and DIY-ness, their corporate masters -- i.e., the organizations that tend to buy and use BI and analytics software -- are of quite another mind.

Governance probably won't ever be sexy, but it's back. At September's Strata + Hadoop World Conference, for example, many of the exhibitors wanted to talk about governance. This was at the very epicenter of the Hadoop and NoSQL world, whose proponents have tended to downplay (or to avoid altogether) discussion of access control, auditing, data lineage, traceability, and other governance-oriented amenities.

By the end of 2015, big data proponents could no longer avoid talking about this stuff. "The subtext of this is that data management is coming to the big data environment. It's no longer about making [that] environment actually work anymore. That's what it used to be. The focus used to be on monitoring, scheduling, workflow -- all stuff that makes the environment work," said Mark Madsen, a research analyst with consultancy Third Nature. "Now [people] want to do stuff on them. Now it's becoming a data management problem."

Big data platforms won't become magically governable overnight. By the standards of traditional data management (DM), big data management is still a governance-challenged practice.

What's changed is that big-data vendors are at least willing to talk about such things as lineage and metadata management, even if the capabilities of the extant offerings -- be they open source software (OSS) projects or commercial products -- are still limited. Help is coming from traditional sources, too. Big DI and DM vendors such as IBM, Informatica, and Talend are focusing on governance in the context of big data. In November, for example, Informatica unveiled a new Informatica Big Data Management offering that it claims addresses the three-fold challenge of integrating, governing, and securing data in tandem with big data platforms such as Hadoop.

The new appetite for governance has implications for self-service BI, too. Big data might be the poster child of bad governance, but self-service products have taken their lumps for giving short shrift to niceties (or necessities, depending on one's perspective) such as data lineage and metadata management. This, too, is changing. Self-service players such as Qlik and Tableau are increasingly eager to talk about governance. That said, they're still filling in the substance.

Be on the lookout for much more substantive talk in 2016.

Alive and Kicking

Reports of the death of big data aren't so much exaggerated as over-hyped, much like the phenomenon of big data itself. Not a year -- or, for that matter, an end-of-year recap -- goes by without someone predicting the soon-and-inevitable demise of big data.

In 2015, we discovered that big data as a phenomenon is simultaneously over- and under-hyped.

The most visible movers and shakers in the big data space, including all of the best-of-breed Hadoop vendors, actually generate relatively little big data-related revenue.

"Hadoop revenues [came to] less than $150 million last year [in 2014], not including whatever money Teradata got out of this [or] Oracle got out of this," industry luminary Merv Adrian, an analyst with Gartner Inc., told attendees at the 14th annual Pacific Northwest BI Summit this July.

The overall market for big data products and services is probably much bigger thanks to vendors such as IBM, Microsoft, Oracle, and Teradata, which also sell Hadoop-based big data systems. At this point, however, sales of Hadoop- and NoSQL-branded big data technologies are still a drop in the bucket compared with the $33 billion DBMS market.

"We know Teradata has sold a bunch of big data appliances. We know IBM has sold some of theirs. We know Oracle has sold some of theirs. Those guys play in a [bigger overall] market that's $33 billion, so this [their share of big data-related revenues] is not even a rounding error at this point," Adrian said.

Adrian isn't a big data pessimist, however. He understands that the phenomenon of big data actually reflects an unprecedented change in how we as human beings imagine and make sense of our world. In the old information economy, the multidimensional OLAP cube was seen as sufficient to capture and model the world of human events. Increasingly, however, we've come to know this world as probabilistic, or at least as nondeterministic; as a context for interdependent events, interactions, and transactions that can't be adequately captured and modeled in the too-limited dimensional context of an OLAP cube. "Big data" is a catch-all term for the ongoing shift to this new information economy. It's an economy that uses statistical, numerical, and analytical technologies and methods to capture and model a richer, more contextualized -- an n-dimensional -- world.

It's an economy that requires more data, of different types and varieties, at greater rates, than ever before. It's an economy that constantly requires data -- that expects to consume data as it streams and pulses. It's an economy, then, for which the old tools -- such as the batch-driven data integration and data warehouse model, with its static production reports and its OLAP-based dashboards and analytics -- are insufficient. Not outmoded but insufficient. The new information economy needs more: data, tools, technologies, and methods. That's what the umbrella term "big data" is all about.

That's why big data-as-a-phenomenon has legs, Adrian argued. "We're getting a chance to do [what we've done for decades] at scale, with speed, and with data we weren't using before, and [big data is] about that combinatorial explosion of data," he said.

This year, big data was going through its "trough" period, Adrian argued, referring to the Trough of Disillusionment in Gartner's Hype Cycle. Ultimately, he predicted, most organizations will gear up for big data and the new information economy, though this will likely take longer than expected. "Once this stuff has demonstrated its value in the early-adopter community, the mainstream tends to respond by asking its strategic partner vendors, 'I'm thinking about this thing, what have you got to offer?'"

Very Hard Problems

The emergence of self-service and DIY belies an inconvenient truth: much of this -- particularly the data acquisition, preparation, modeling, and human oversight involved in advanced analysis -- is Very Hard. Advanced analytics in the context of big data platforms, with their polyglot programming requirements and their command line-centric user experience, is likewise Very Hard.

Over the last decade, analytically savvy organizations cleared away a good bit of the low-hanging analytical fruit, starting with basic (but potentially lucrative) applications such as customer segmentation analysis and predictive maintenance. This year, however, we started to come to terms with the fact that the stuff higher up on the tree is going to be a lot more difficult and considerably more costly to get at.

"There's one large organization that we work with ... and their premise was basically 'I've already picked the lowest hanging fruit off the tree. In the beginning, I could show 20 times the return [on investment], 20 dollars of value created for every dollar I invested in analytics, but those days are over. I have to go up higher in the tree now and I'm going after bigger and bigger projects in terms of driving performance.' What [analytics advocates are] finding is that they're now competing with other functional leaders for mindshare and funding," Jack Phillips, CEO and co-founder of the International Institute for Analytics (with industry luminary Tom Davenport), told BI This Week.

Expect more such soul-searching in 2016. Self-service innovation will help to some extent. (As noted earlier, Alpine Data Labs, Dell, IBM, Microsoft, Predixion, Qlik, SAS, and Tableau are among the vendors making noises about self-service-enabling statistical analysis and predictive analytics.) New automation features, including machine learning-driven automation routines, will help, too, especially in connection with big data platforms. However, advanced analytics is fundamentally a highly specialized domain. Human knowledge, imagination, and perspicacity will continue to make or break the success of analytical efforts.

An Unscientific Postscript

Finally, 2015 was the year we seemed to arrive at a consensus regarding the strengths and weaknesses of our many and varied data management tools. In effect, this involved a cessation of hostilities -- of outright warring -- between rival factions of SQL and NoSQL advocates.

We in data management are coming to terms with the fact that we misuse the RDBMS when we ask it to ingest and store non-relational data as non-relational data, as distinct from the relational data for which it was designed and optimized. It took half a decade, but data management practitioners now recognize that NoSQL platforms are actually well suited for storing and managing the JavaScript Object Notation (JSON) documents that make up the data interchange backbone of the RESTful cloud; the text droppings that are the stuff of the social Web; and the audio, video, image, and binary files that we encounter in any and all contexts.

At the same time, we've accepted that all NoSQL platforms are very much works in progress, with constantly improving (or, at any rate, constantly changing) functions, libraries, dependencies, application programming interfaces (APIs), and the like. In other words, we in data management are no longer fighting Hadoop and NoSQL, dismissing them as unsuitable, ungovernable, immature, or unreliable. They've been brought into the data management fold and have become part of the DM toolset. That's a welcome, if overdue, development.

Gradually, too, we've come around to the idea that data management isn't something that must always be centrally managed; that DM itself can and should be democratized; and that priorities such as governance and security must be reconciled with the idea that democratizing DM is both necessary and possible.

This is, after all, the age of DIY, of the enfranchised disenfranchised, of the have-nots suddenly, gloriously becoming haves. It's the age of BYOD, of cloud, of 3D printing, of self-service this, self-service that. It's the age of Uber, of Airbnb, of boutique food carts, of "artisan" donuts, and of other, similar wonders. As punk-rock poet Patti Smith puts it, "This is the era where everybody creates."

We aren't there yet, and we might not get there in 2016, either, but we're certainly headed in that direction.