Self-Service BI: What's Not So Great
Self-service BI is helping enterprises spread analytics among more users, but it's not without its problems.
- By Steve Swoyer
- September 8, 2016
We can't overstate the importance and popularity of self-service BI. Self-service tools can get rid of top-down hierarchies and free people up to explore and analyze data on their own terms.
However, there are more than a few not-so-good things about self-service -- as it currently stands, anyway.
Qlik, Tableau, and other players (including established vendors such as IBM, Microsoft, MicroStrategy, Oracle, SAP, and SAS, which likewise market self-service tools) are taking concrete steps to address some of the self-service model's most salient shortcomings. The good news is that the story is slowly but surely getting better.
Self-Service Currently Difficult to Scale
In practice, the use of self-service discovery tools hasn't been associated with especially high levels of reuse and repeatability. Furthermore, if traditional approaches to governance are overly restrictive -- and they are -- the self-service take on governance is overly laissez-faire.
For these and other reasons, it's proving difficult to scale some kinds of self-service practices, such as discovery and data prep. It's one thing to champion something like self-service analytics discovery for a small cadre of power consumers; it's quite another thing to roll it out at enterprise scale to hundreds, thousands, or even tens of thousands of consumers.
Organizations must find a happy balance between their need for pragmatic governance on the one hand and the analyst's or business person's need for maximum agility on the other.
Self-service vendors, to their credit, are working to help them. They're still getting there, however.
Data Integration By Any Other Name
The price you pay for self-service convenience is that someone first has to load data into a box somewhere -- be it a database-like server or a business analyst's desktop.
This might sound straightforward enough, but to oversimplify what's involved here is to ignore just why data integration (DI) is such a complex and often frustrating discipline. Integrating data isn't as simple as downloading a tool such as Tableau, installing it on your desktop, and going out and grabbing data from SAP or PeopleSoft.
Somehow, someway, you have to build some plumbing. Even if it were easy to obtain and configure access -- to get the appropriate credentials, to configure a connection to an SAP system, and to begin siphoning data -- it isn't at all easy to join data.
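To make the point concrete, here is a minimal sketch -- with entirely hypothetical data and column names -- of why joining data across systems is harder than it looks. Two systems often key the same entities differently, so before any join works, someone has to build the plumbing that reconciles them:

```python
import pandas as pd

# Hypothetical extracts: the same customers keyed differently in two systems.
sap = pd.DataFrame({"cust_id": ["0001", "0002"], "credit": [500, 900]})
crm = pd.DataFrame({"customer": [1, 2], "segment": ["SMB", "ENT"]})

# The keys disagree in type and format ("0001" vs. 1), so a join is
# meaningless until someone normalizes them to a common form.
sap["key"] = sap["cust_id"].astype(int)

joined = sap.merge(crm, left_on="key", right_on="customer", how="inner")
```

Multiply this small reconciliation step across dozens of sources and hundreds of columns, and you have the day-to-day reality of data integration.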
"There is a big myth in self-service which is that you can just somehow open a big database and the users are going to go make sense out of it," said Philip Russom, TDWI's director of research for data management, during a recent panel discussion. "Typically, there has to be a lot of prep work to get something in place. It doesn't matter what kind of self-service it is. IT, data people, whoever -- technical people -- have to do a fair amount of prep work to get at it and make it available."
This is where self-service discovery breaks down -- and why self-service data prep technologies (such as those marketed by Alteryx, Paxata, Trifacta, and others) have received so much attention.
Traditionally, the only way to join data when using self-service tools such as Qlik or Tableau was to use the vendor's back-end scripting facility. Unlike their easy-to-use front-end offerings, these tools were ETL-like in terms of their technical complexity.
In other words, you needed both business and technical expertise to use them effectively. Not necessarily development expertise, but technical expertise. This situation is getting better: Qlik (with its acquisition of former ETL best-of-breed Expressor) has improved and simplified its built-in DI and data prep feature sets.
Tableau, meanwhile, announced several new self-service-oriented data prep or data integration capabilities at last November's Tableau Customer Conference. New features include Tableau Union -- a feature that simplifies and automates the process of integrating data from CSV and spreadsheet files -- and support for cross-database joins.
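What a union-style feature automates is, at bottom, a row-wise append of like-structured files. A rough sketch of the idea -- using pandas and made-up monthly extract files -- looks like this:

```python
import pandas as pd
from io import StringIO

# Two hypothetical extracts with the same schema (e.g., monthly CSV drops).
jan = pd.read_csv(StringIO("region,sales\nEast,100\nWest,80\n"))
feb = pd.read_csv(StringIO("region,sales\nEast,120\nWest,90\n"))

# A "union" in this sense stacks like-structured files into one table.
combined = pd.concat([jan, feb], ignore_index=True)
```

The value of a feature such as Union is that the tool detects the shared structure and does this stacking for the user, rather than requiring a script.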
The challenge for these vendors is to transform the features they've traditionally exposed as part of the end-user-oriented self-service experience (e.g., Tableau's Union or the classic self-service prep model) into enterprise-grade services. An enterprise-scale alternative to Union (or the ability to perform cross-database joins) must emphasize reuse, repeatability, and manageability.
Metadata? We Don't Need No Stinkin' Metadata!
Self-service tools are primarily designed for looking at or transforming data -- not for managing it. In the self-service model, there's nothing analogous to the logical metadata layer that (for better or worse) is a fixture of enterprise business intelligence (BI) tools.
Self-serving consumers (or the IT departments charged with serving them) have to resort to kludges to enforce consistent metadata definitions or to create the equivalent of a single, authoritative view of business information.
Traditionally, tools such as Tableau haven't offered much in the way of enterprise reporting, scheduling, or alerting capabilities either. (QlikView is another matter.) The point is that it's extremely challenging to roll out and support self-service discovery or data prep at enterprise scale.
Until recently, self-service vendors downplayed the value of portable, standardized business definitions. That said, some vendors, such as Qlik, have taken metadata management far more seriously for far longer.
Established vendors (IBM, Microsoft, MicroStrategy, Oracle, SAP, and SAS Institute, among others) have introduced self-service discovery tools that complement -- and for all intents and purposes integrate with -- their enterprise BI offerings. (In the same way, Qlik's Sense discovery tool is designed to complement -- and supersede -- its QlikView offering.)
Self-service vendors have been consistently dinged by Forrester Research, Gartner, and other industry watchers for their shortcomings with respect to metadata management and other enterprise amenities. Consequently, this is an issue they've started to pay a lot more attention to.
The Tragedy of the Commons
In the classic self-service discovery model, individual users create point-to-point data flow pipes between source and (server or desktop) target. This can and does result in upstream performance problems. Imagine that three or four self-serving users, working individually, decide to schedule batch extracts from the same upstream system at the same time -- bringing that system to its knees.
Data warehouse architecture, although far from perfect, was conceived with just this kind of problem in mind. It consolidates information into a single system and staggers batch extraction jobs to mitigate the impact on upstream data sources.
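The staggering idea is simple enough to sketch. The following toy scheduler -- with hypothetical job names -- assigns each extract job an offset start time so that no two jobs hit the same upstream source at once:

```python
from datetime import datetime, timedelta

def stagger_schedule(jobs, start, gap_minutes=30):
    """Offset each extract job by a fixed gap so that batch extracts
    against the same upstream source never run simultaneously."""
    return {
        job: start + timedelta(minutes=i * gap_minutes)
        for i, job in enumerate(jobs)
    }

schedule = stagger_schedule(
    ["sales_extract", "hr_extract", "finance_extract"],
    start=datetime(2016, 9, 8, 1, 0),
)
```

Real warehouse schedulers are far more sophisticated, of course, but the principle -- centralized coordination of extraction windows -- is the one self-service practices currently lack.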
Over time, self-service practices -- discovery and data prep -- will likely converge on something like a data warehouse (a central repository for persisting, managing, and reusing data visualizations, data flows, and so on) to address this problem.
A Spreadmart in Disguise
There's also a sense in which self-service tools -- discovery tools, in particular -- are a new spin on the spreadmarts of old. This isn't to diminish the impact of self-service technologies. It's rather to observe that their use has a similar effect in practice.
Even though self-service discovery tools presume a high degree of information sharing and collaboration, most of the same issues remain. For example, how do you know which numbers came from which calculations in which workbook? You don't, because each metric is redone in each workbook.
In the self-service model, as with spreadmarts, there's no scalable, reliable, built-in way to control for data provenance, to track data lineage, or even to ensure that common business definitions ("customer," "product") mean the same thing from workbook to workbook.
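One common remedy is to maintain a single, shared source of business definitions that every workbook draws from, instead of each analyst re-deriving "revenue" or "active customer" locally. A minimal sketch, with hypothetical metric names and rules:

```python
# A shared registry of governed business definitions. Every workbook
# looks metrics up here rather than redefining them. (Illustrative only;
# the metric names and rules are hypothetical.)
SHARED_METRICS = {
    "revenue": lambda row: row["quantity"] * row["unit_price"],
    "active_customer": lambda row: row["orders_last_90_days"] > 0,
}

def compute(metric, row):
    """Apply a metric by its governed name, so every workbook
    computes it the same way."""
    return SHARED_METRICS[metric](row)

row = {"quantity": 3, "unit_price": 10.0, "orders_last_90_days": 2}
```

This is, in effect, a homegrown stand-in for the logical metadata layer that enterprise BI tools provide and self-service tools have historically lacked.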
Working Out the Kinks
An absence of visibility and auditability is a problem with self-service in all of its incarnations. It's something that self-service vendors are just beginning to address. There's a whole host of other potential kinks -- especially of the regulatory kind -- that will need to be worked out, too.
If history is any indication, the industry (vendors, integrators, customers, regulators, and other interested parties) will coalesce around a mix of technological, process, and behavioral interventions. These interventions should -- at the very least -- improve three key areas: data integration, metadata management, and auditability.