TDWI Upside - Where Data Means Business

The How and the Why of Emerging Technologies and Methods

At their best, emerging technologies and methods potentially extend, complement, or enhance the BI, analytics, and DW status quo. In all cases, they also address the core needs -- for agency, empowerment, and perceived competency -- of frustrated users and IT groups.

A new Best Practices report from TDWI Research explores the role of so-called "ETMs" in relation to business intelligence (BI), analytics, and data warehousing (DW).

ETM is shorthand for "emerging technologies and methods." Basically, an ETM isn't so much a what as a how. Hadoop is an ETM, as is the larger category of NoSQL databases. Self-service is an ETM, and so are data visualization, cloud, the Internet of Things (IoT), mobile BI, advanced analytics, and multi-structured data.

According to TDWI, the ETM-liness of a technology or method has to do with the ways in which it potentially extends, complements, or enhances the BI, analytics, and DW status quo.

"[T]he innovations and excitement of ETMs can make BI, DW, and analytics more appealing, pervasive, insightful, and actionable," write co-authors Fern Halper, Philip Russom, and David Stodder. (Halper is director of advanced analytics for TDWI Research; Russom, director of data management; Stodder is director of BI.) "There are many ETM types," the trio continues; "[T]his report focuses on those that currently enable real-world use cases in BI, analytics, and DW."

Needful Things

Another way of looking at this is to see that ETM-liness is a function of needfulness.

The more pressing or painful the needs of a group of people, the more likely they are to be using emerging technologies and methods -- such as self-service BI discovery or Not-only-SQL (NoSQL) databases. In many cases, they're using ETMs because their needs aren't being addressed by IT and, specifically, data management (DM). In still others, need is a function of pressure from without: perceived pressure from competitors, or a perceived inability to cope with the pace of change, with growing volumes of data, or with the demands of customers, suppliers, etc.

The TDWI report bears this out. The bulk of its findings are based on a survey of more than 400 respondents (303 of whom completed every question) conducted this year by TDWI.

It suggests that even though ETMs typically lack maturity and aren't always clearly recognized or understood by those who use them -- one survey respondent told TDWI that "We always use 'big data' as the single term to describe ETMs" -- their use strongly correlates with need.

"Users who[m] TDWI interviewed for this report regularly spoke of how having ETMs that competing firms don't have can typically provide a competitive edge. For example, using real-time ETMs to review and approve new loans or new insurance policies faster than competitors helps to both get new customers and retain old ones," Halper, Russom, and Stodder write.

"[K]nowledge workers from logistics companies have spoken at TDWI conferences about how adding more sensors and devices to vehicles, shipping pallets, and other mobile assets ... has enabled them to innovate with geospatial and near-time data, thereby remaining competitive."

Ironically, there's no necessary relation between the thematic reasons people give for adopting ETMs and their measurable effects in practice. This is because the people or groups using ETMs don't always know what those effects are. Among respondents, the most frequently cited reason for using emerging technologies and methods was that they provide a "competitive advantage." More than half (56 percent) of all survey participants gave just this reason. Respondents were not asked if they'd made any attempt to substantiate their claims of competitive advantage, however.

We "know" that ETMs give people new and different capabilities and that ETMs extend existing practices and use cases, but does the use of ETMs actually translate into competitive advantage in practice? It's more likely that ETMs make people feel as if they're acting substantively -- doing something -- to respond to perceived needs and pressures. That's a good thing.

Other reasons respondents gave for using ETMs are that they open up new sales channels (self-service apps, monitoring, social media) or add new platforms (Hadoop, specifically; NoSQL technologies in general). In all cases, there's the belief that the value -- e.g., competitive advantage, new revenue-generating apps or services -- is there. Again, it isn't clear that users have made any follow-up effort to substantiate this value, however. This could lead to the pessimistic conclusion that ETMs are used blindly: i.e., people are adopting them because they're new, because they claim to address cutting-edge use cases, or because they purport to extend existing use cases.

There's probably a little truth to this, but it's also true that people take up and use ETMs because they're frustrated -- by the strictures of governance; by the unresponsiveness of IT and, especially, of data management; by the shortcomings, limitations, and unsuitability of existing technologies, tools, and methods: in short, by feelings of powerlessness -- by a felt lack of agency.

Among other things, ETMs permit people to self-medicate -- usually harmlessly, sometimes efficaciously. Business users, for example, tend to experience the use of cloud and self-service ETMs as freeing because both technologies permit them to do stuff. (Speak to any new user of Tableau, for example, and you'll get exactly this sense.) Traditionally, one big reason for this was that they weren't being deployed, managed, and policed by IT. In practice, they tended to go out-of-band around IT, around competency centers, and around different kinds of governance mechanisms.

Line-of-business users aren't the only beneficiaries, either. NoSQL technologies permit IT to address new or historically neglected use cases and to work around the limitations of existing data management technologies. NoSQL technologies have also given IT slightly more leverage in dealing with traditional suppliers: some vendors -- particularly BI and data integration players -- have slashed prices, tweaked pricing models, or offered more licensee-friendly terms in response to pressure from NoSQL, cloud, and other ETM disrupters.

Finally, NoSQL technologies enable IT to respond to the needs of users instead of, for example, responding with an inflexible "No" or "That's not possible" to user requests. (It's worth noting, too, that the adoption and use of NoSQL was primarily driven by general IT -- not by data management. Seven years ago, DM was blithely indifferent to NoSQL; five years ago, it was dismissive. DM's grudging acceptance of NoSQL is a recent phenomenon.)

Just What I Needed

Take ETMs in the BI space, for example. Nearly four-fifths (79 percent) of respondents said that self-service data discovery technology was at least "Somewhat important"; overall, almost half of respondents (47 percent) deemed it "Very important." Ditto for data visualization for query and analysis: 78 percent deemed it at least "Somewhat important" and 44 percent of all respondents said it was "Very important." Self-service dashboard authoring was also recognized as critical, with 76 percent of respondents rating it at least "Somewhat important" (43 percent of all respondents deemed it "Very important"). Respondents also cited search and exploration, analytic application platforms, self-service data prep, analytic appliances, self-service data mapping and transformation tools, in-memory computing technologies, storytelling and collaboration, cloud-based BI and analytics, and software-as-a-service in general as important ETMs.

If this sounds like a smorgasbord, it is. As Halper, Russom, and Stodder note, however, most of these technologies fit into just a few critical categories: the biggest, by far, is self-service BI, which includes data visualization for ad hoc query, self-service dashboard authoring, and, of course, self-service data discovery. Another critical category is self-service data prep, which -- thanks to "data wrangling" software from Alteryx Inc., Paxata Inc., Trifacta Inc., and others -- has come on strong. Both self-service categories address the same core need. Which is what, exactly?

"Users would also like to see more rapid development and deployment of new BI and analytics applications as well as activation of new features for their existing applications, according to our research. More than half (55%) of research participants said that users in their organizations are dissatisfied with the amount of time development and deployment is taking, with 23% not satisfied at all. Significantly fewer 42% are satisfied," Halper, Russom, and Stodder explain in the report.

Take the profusion of Hadoop-oriented ETMs. Survey respondents cited a raft of open source ETMs, including Apache Drill, Apache Flume, Apache Hive, Apache Spark, Apache Storm, Apache Tez, and Apache Kafka, along with Presto. Talk about a smorgasbord!

To recap, Drill is a distributed query technology, Flume is a service for collecting logs and streaming them into Hadoop, and Hive is a SQL interpreter for Hadoop that compiles SQL queries into MapReduce or Tez jobs. (Tez is an alternative framework or engine for Hadoop that -- unlike MapReduce -- supports interactive processing.) Spark is a cluster computing framework that includes a library (Spark SQL) that enables SQL query processing. Storm is a stream-processing and analysis engine that is usually paired with Kafka, a message-queuing engine. Presto, finally, is an alternative ANSI-SQL-on-Hadoop project. These technologies variously complement, overlap with, or basically duplicate one another. On the one hand, this preponderance of OSS ETMs speaks to the needs that Not-only-SQL (NoSQL) platforms such as Hadoop address.

"With these technologies, users can gain new perspectives on data relationships, examine context around BI reports and key performance indicators [KPIs], and search and analyze unstructured big data generated by customer behavior, social media, and more," Halper, Russom, and Stodder write.

Why so many? Because NoSQL as a category is itself immature. Organizations adopted NoSQL technologies to address requirements that couldn't be (or weren't being) met by existing data management technologies. These NoSQL offerings were (and are) works in progress, however.

"Although Hadoop and NoSQL ETMs are important for organizations to consider deploying to store and analyze huge volumes of multi-structured data, they have presented challenges in terms of accessing them from BI, data discovery, and visual analytics tools and applications that are built to work with relational data structures," the TDWI Research report points out.

"Connecting these tools and applications to Hadoop and NoSQL data sources has typically required specialized development of custom MapReduce code and scripts and, even if commercial software is used, customization for queries as well as the ETL routines needed for each source."
