TDWI Upside - Where Data Means Business

2016: An Uncanny Year in Review

This year will go down as a strange, even uncanny, kind of year.

It was a year in which data and analytics failed catastrophically, with polls and data-driven models failing to capture the substantial base of support for President-elect Donald Trump. It was also a year in which mainstream culture discovered data, analytics, and, above all, artificial intelligence (AI).

In 2016, AI became an explicit -- and, in a sense, unavoidable -- topic of popular discourse.

Elsewhere, 2016 was a year in which the big data market (or, at any rate, its most visible players) got a comeuppance of sorts. It was a year in which bread-and-butter business intelligence (BI) all but officially died -- with a "modern," and arguably less capable, BI platform emerging in its stead.

Let's plunge into the year that was -- 2016.

AI As Object of Cultural Fascination

Way back in 2005, noted futurist Ray Kurzweil predicted that a renaissance in AI was all but imminent. Kurzweil's book, The Singularity Is Near: When Humans Transcend Biology, argued that the rapid development of AI technologies would wholly transform human existence. Kurzweil didn't think we'd experience the Singularity until 2045.

Amazon, Apple, Facebook, Google, IBM, Microsoft, Oracle, and others are doing their best to accelerate this event. AI -- or a certain kind of AI, at least -- went mainstream this year.

From the ubiquitous -- and now-normalized -- personal digital assistant to the still-jarring prospect of the self-driving car, AI was everywhere in 2016. Just this month, Amazon teased us with "Go," its vision of a grab-and-go, checkout-free, AI-managed convenience store. Terms such as "deep learning" and "neural networks" began appearing -- alongside other, more banal concepts (e.g., "machine learning") -- in mainstream news publications, including the New York Times and the Washington Post.

Entertainment, too, warmed to AI: films such as Her (2013) and Ex Machina (2015) proved there was an audience for AI-themed topics. In 2015 and 2016, AI made the transition to TV, with Channel 4 and AMC's Humans (2015) and HBO's Westworld (2016) serving up two very different takes on the topic. Westworld, in particular, offers a new and notable spin on the venerable AI horror genre.

Compared to the more-human-than-human robot hosts of Westworld, the AI of the present, our AI, is a rather mundane thing. This isn't to say it isn't capable of breathtaking achievements, however.

This year, Google's AlphaGo program achieved a Singularity-like coup, besting a human opponent (5-0) in the ancient game of Go. AI watchers believed it would be a decade or more before machine intelligence could consistently beat a top-flight human opponent in Go, which has many more possible positions (2.082 × 10^170) than there are atoms in the universe (approximately 10^80).
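
To put those two figures side by side, here is a rough back-of-the-envelope sketch in Python. It takes the numbers quoted above at face value (about 2.082 × 10^170 Go positions and roughly 10^80 atoms); the variable names are purely illustrative.

```python
# Back-of-the-envelope comparison using the figures quoted above.
# Exact integers avoid floating-point rounding at this scale.
GO_POSITIONS = 2082 * 10**167   # about 2.082 x 10^170 positions
ATOMS_IN_UNIVERSE = 10**80      # rough, commonly cited estimate

# How many Go positions there are for every atom in the universe.
positions_per_atom = GO_POSITIONS // ATOMS_IN_UNIVERSE

# Order of magnitude: number of digits minus one.
exponent = len(str(positions_per_atom)) - 1

print(f"Roughly 10^{exponent} positions per atom")
```

In other words, even if every atom in the universe stood for a distinct board position, the gap would still span about 90 orders of magnitude.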

The thing is, AlphaGo is an example of a specific, limited kind of AI. It's created and trained to do one thing -- and one thing only. For most AI enthusiasts, the Holy Grail of AI is so-called "artificial general intelligence," or AGI, a posited higher AI that's able to perform -- or experience -- any human intellectual task, including self-awareness. This is what they mean by "Singularity."

The AI technologies Amazon, Apple, Facebook, IBM, Microsoft, Oracle, and other companies are working on are no less function-specific. What they're doing with them, however, suggests how Kurzweil's looked-for Singularity might come to pass. Late this year, for example, Amazon released a trio of new AI offerings -- Lex, Rekognition, and Polly -- for its Amazon Web Services (AWS) platform. These services provide natural-language conversation, image recognition, and text-to-speech capabilities, respectively. Individually, they're all instances of function-specific artificial intelligence. Because they're exposed as microservices, however, developers can call and embed them into their apps. As a result, a single app can exploit multiple function-specific AI services. Combine enough AI services and you get closer to an approximation of human intelligence.
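
The composition idea can be sketched in a few lines of Python. The three functions below are hypothetical local stand-ins for a conversation service, an image-recognition service, and a text-to-speech service. They are not the AWS APIs and they do no real AI; the point is only the shape of an app that chains several narrow services together.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reply:
    text: str
    audio: bytes


# Hypothetical stand-ins for three function-specific services.
# Real services would be remote calls; these stubs return canned data.
def recognize_image(image_bytes: bytes) -> List[str]:
    """Pretend image recognition: return labels for the picture."""
    return ["dog", "park"] if image_bytes else []


def understand_request(utterance: str) -> str:
    """Pretend conversation service: map an utterance to an intent."""
    return "describe_photo" if "photo" in utterance.lower() else "unknown"


def synthesize_speech(text: str) -> bytes:
    """Pretend text-to-speech: return fake audio bytes."""
    return text.encode("utf-8")


def assistant(utterance: str, image_bytes: bytes) -> Reply:
    """One app that chains the three narrow services together."""
    intent = understand_request(utterance)
    if intent == "describe_photo":
        labels = recognize_image(image_bytes)
        text = "I see: " + ", ".join(labels) if labels else "I see nothing."
    else:
        text = "Sorry, I did not understand that."
    return Reply(text=text, audio=synthesize_speech(text))


reply = assistant("What is in this photo?", b"\x89PNG")
print(reply.text)
```

Each stub does exactly one thing; whatever looks like broader competence comes only from the glue code in `assistant`, which is the sense in which combining services "approximates" something more general.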

That's the theory, anyway. What happens in practice is anybody's guess.

Look for AI to be hyped even more in 2017. Just don't look for Kurzweil's Singularity anytime soon.

IoT: Time to Believe the Hype?

IoT was huge in 2016, too. It's unclear exactly for whom it was huge, however. In the last 18 to 24 months, a slew of vendors and services firms introduced IoT-themed offerings. Several of them -- Cisco Systems, Hewlett Packard Enterprise, IBM, Microsoft, Oracle, SAP, SAS, and Teradata, to name a few -- made IoT-related announcements in 2016, some of them major. Gartner, International Data Corp., and lesser market watchers also project a future in which IoT is a very big deal indeed.

The key words in that last sentence are "project" and "future." Almost all IoT-related discussion is focused on future possibility -- not present reality. We're told IoT will be a force for massive cultural change and that it will transform how people consume and pay for products. We're told IoT will be a no less powerful engine for economic change, transforming how companies manufacture and distribute their products, with implications for everything from the factory floor to the logistics of shipping and receiving. We're told IoT will open up new markets, shake up long-standing partner and supplier arrangements, and bring about completely new competitive relations.

Still, how much of this is actually happening right now? Sure, most vendors can point to a textbook IoT customer example or two -- organizations they say are meeting the "disruptive" challenge of IoT head on. The claims vendors and analysts are making about IoT go beyond this, however: IoT is supposed to be a force for large-scale transformation. We're just not seeing that yet.

According to a Gartner report from earlier this year, for example, almost half of all organizations don't yet have a clear picture of IoT-related business benefits. Nearly half say they're worried about hiring employees with IoT-related skills -- or, no less challenging, training existing employees for IoT. Almost 40 percent say there's no clear market leadership for IoT right now. (See "User Survey Analysis: Findings Point Clear Path Forward for IoT Solution Providers," March, 2016.)

In a separate report, more than half (51 percent) of respondents cited IoT-related security as a top technological challenge, and nearly half (43 percent) named IoT-related integration challenges. Many organizations are still struggling with IoT basics such as determining requirements or retooling their data management infrastructures for IoT. (See "Survey Analysis Users Cite Ambitious Growth -- and Formidable Technical Challenges -- in IoT Adoption," March, 2016.)

The upshot is that organizations are currently cooling their heels when it comes to IoT.

There's every reason to believe IoT will be a bona-fide force for transformation -- probably sooner than most of us think. The potential stakes are just too great. Companies that don't get on Team IoT risk being outmaneuvered -- and, ultimately, outmoded -- by traditional rivals and new competitors alike. To put it bluntly: IoT isn't the kind of paradigm shift you can sit out.

Big Data Battleground Shifts to the Cloud, Spark vs. Hadoop Isn't a Thing

Big data isn't dead, although it isn't quite the irresistible force it used to be, either.

In 2016, in fact, the vendors most closely associated with hyping and selling big data technologies -- and, in particular, the Hadoop environment -- seemed to run smack up against that most immovable of objects: Amazon and its formidable AWS cloud stack -- but we're getting ahead of ourselves.

At this summer's Pacific Northwest BI Summit, Gartner analyst Merv Adrian predicted that Hadoop as we know it will soon cease to exist. Adrian wasn't predicting the end of the Hadoop ecosystem nor of the technologies that collectively comprise the Hadoop stack. He was anticipating the obsolescence of Hadoop as a marketing differentiator -- and with it, the end of an era.

Cloudera, Hortonworks, and MapR used to trumpet their Hadoop platform underpinnings in their marketing efforts. That's changed, according to Adrian. The Hadoop pure plays aren't running away from identifying with the Hadoop platform, he argued -- but they aren't playing it up anymore, either.

If and when Hadoop ceases to exist as a marketing differentiator, it will probably be because the largest known purveyor of Hadoop-based services, Amazon, doesn't explicitly promote it as such.

Amazon sells more Hadoop capacity and hosts more Hadoop instances than every other vendor combined, Adrian said. "There is a helluva lot more Amazon out there than most of us thought. We all kind of knew it was going on but [Amazon was] so opaque in their financial reporting," he said.

"Today, [Amazon] has more users of Hadoop than all of the other vendors in the market combined. Period. There are thousands of active users with [Amazon's] EMR [Elastic MapReduce]," Adrian pointed out. "The most any of the indies will tell you is 'we're getting close to 1,000 [commercial users] now.' We have some 800 to 900 [users] numbers from some of the pure-play guys, and IBM, Microsoft, none of them have given us specific numbers publicly about what they have."

EMR provides capacity for Hadoop, Spark, and other compute engines. Amazon uses EMR services extensively -- and usually transparently -- throughout AWS. In November of this year, for example, it announced Athena, an interactive SQL query facility for its Simple Storage Service (S3). Athena uses a SQL interface called Presto, which requires a separate compute engine. S3 is a storage-only service. It doesn't have a baked-in compute engine. Behind the scenes, Amazon uses Presto running in conjunction with a Hadoop instance in EMR to process Presto queries.

AWS and EMR aren't the only options for Hadoop and Spark in the cloud. Microsoft, which offers Hadoop and Spark capacity via its HDInsight service for Azure, is getting a good bit of traction with enterprises, too. Most Hadoop-related client inquiries to Gartner's Data Team had to do with Cloudera, Hortonworks, IBM, and MapR Technologies -- in that order -- but Microsoft was number five, Adrian said. Google this year also introduced its own EMR-like service, Google Cloud Dataproc. All indications are that the big data battleground -- for both Hadoop and Spark -- is shifting to the cloud.

Speaking of Spark, it was even hotter, if that's possible, in 2016. This prompted speculation that Spark had supplanted Hadoop in the imagination -- if not the production data centers -- of enterprise customers. This is wrong-headed, however. As Abraham Lincoln might put it, Hadoop and Spark need not be enemies, but, rather, friends. The two technologies can and do complement one another. Unlike Hadoop, Spark wasn't conceived as an all-in-one platform for distributed compute and scalable distributed storage. Spark was designed as a high-performance, in-memory compute engine. Hadoop itself isn't a database, but Spark is even less of a database than Hadoop, which (via HBase and Hive) provides hierarchical database and relational database-like services.

Spark is popular because its programming model is much more flexible and intuitive than the map and reduce model used in versions of Hadoop prior to 2.0. Because it's a compute engine, Spark doesn't implement a persistence layer comparable to the Hadoop distributed file system (HDFS) or to post-HDFS successors such as Kudu. In most configurations, Spark runs on top of Hadoop and uses HDFS as a persistence layer. It's managed via Hadoop's native cluster manager (YARN) and uses the Hive Metastore for managing metadata and schema information.
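
To see why the difference in programming models matters, consider a simple word count. The sketch below is plain Python, not Hadoop or Spark code: the first function spells out separate map, shuffle, and reduce phases by hand, while the second expresses the same job as one chained pipeline, which is closer in spirit to how Spark programs tend to read. The function names and sample data are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import chain

LINES = ["spark and hadoop", "hadoop and yarn", "spark on yarn"]


def word_count_map_reduce(lines):
    """Word count written as explicit map, shuffle, and reduce phases."""
    # Map: emit a (word, 1) pair for every word.
    pairs = [(word, 1) for line in lines for word in line.split()]
    # Shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    # Reduce: sum the values for each key.
    return {word: sum(ones) for word, ones in groups.items()}


def word_count_pipeline(lines):
    """The same job as a single chained expression."""
    words = chain.from_iterable(line.split() for line in lines)
    return dict(Counter(words))


counts = word_count_pipeline(LINES)
print(counts)
```

Both functions return the same counts; the difference is how much of the plumbing the programmer has to write and think about.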

There's nothing either/or about this. It is, instead, a mostly complementary win-win.

Bye Bye, Bread and Butter BI

We've said little about business intelligence so far, but BI was a hot-button issue in 2016.

This year, we wrangled over everything from the definition and scope of BI to -- soberingly -- its future. It felt as if BI as we've known it died in 2016 -- only to be reborn as a kind of afterthought.

Market watcher Gartner kicked off a tempest early in the year when it tweaked the criteria it uses to include and rank vendors in its annual Magic Quadrant for BI tools and platforms.

"We have different gating criteria. You have to meet the definition of the 'modern' BI platform because that's where the buying is going. An easy example is Microsoft's SQL Server Reporting Services [SSRS]. Reporting Services is not a modern BI platform, it's an IT-centric, reporting-centric BI platform," Cindi Howson, a research vice president with Gartner, explained to Upside.

In Gartner's 2016 Magic Quadrant for Business Intelligence and Analytics Platforms, some vendors -- Oracle foremost among them -- completely disappeared. Oracle Business Intelligence Enterprise Edition was nowhere to be found in Gartner's report. In the same way, several significant players (IBM, Information Builders, MicroStrategy, SAP, and SAS) found themselves bumped to the ranks of also-rans -- i.e., "Visionaries" or "Challengers."

Gartner is by no means the last or most authoritative word in what makes BI, well, BI. With its revised Magic Quadrant report, however, it formally recognized the clear distinction in the market between do-it-yourself BI discovery tools and traditional IT-centric BI technologies. Gartner's revised Magic Quadrant reflects a reality in which the control and ownership of information has been wrested from IT. The long-dominant IT culture of centralized planning and control is dead, Gartner seemed to be saying -- and, with it, the traditional, IT-centric BI paradigm.

In its place, the market watcher came up with a new category: the "modern" BI platform, the "Leaders" of which are Tableau, Qlik, and Microsoft, with its Power BI platform. Ironically, however, the gating criteria for Gartner's "modern" BI platform explicitly exclude the features and capabilities that used to make BI, well, BI. Our BI. In fact, Gartner shunted traditional BI -- are we supposed to start calling it "legacy" BI now? -- into a new category: "Enterprise-Reporting-Based Platforms."

Gartner now produces a separate publication (Market Guide for Enterprise-Reporting-Based Platforms) to track what it calls "IT-centric" BI and reporting platforms. BI practitioners will recognize in the criteria for these platforms most of the elements of a traditional production reporting system. Production reporting was the engine that used to power -- and, for all intents and purposes, defined the limits of -- "enterprise BI." As the new Gartner report explained: "[Enterprise-reporting-based platforms are] most often deployed against a well-modeled data warehouse and/or data mart, including an optimization layer featuring online analytical processing (OLAP) cubes."

It's hard to avoid the conclusion that BI as we've known it has effectively been demoted, isn't it?

A Plea for 2017 and Beyond: A Recognition of Core Principles

Not so fast. Yes, the ongoing information renaissance is exciting and promising, even awe-inspiring. However, the core problems of integrating, managing, facilitating access to, and -- most important -- disseminating information at scale are as challenging as ever.

They will likely pose significant challenges for some time to come.

By themselves, BI discovery tools do little to address these challenges. Ditto for self-service data prep, machine learning, or statistics tools. The availability of faster, easier-to-program parallel processing compute engines, such as Hadoop and Spark, isn't the answer.

Nor are the new Mathematica-like "notebook" tools (such as Jupyter and Apache Zeppelin) that have become popular among data scientists. Advanced machine learning technologies, function-specific AI, and other products of the information renaissance aren't the answer -- yet, anyway.

Something like traditional BI, in combination with data warehouse architecture, is.

This isn't to pit old versus new technologies, "modern" versus "legacy" platforms. It's to recognize that BI -- as a tool both for discovery and for the collection, management, and dissemination of information and insights -- is a critical foundation stone of the ongoing information renaissance. The relational data that is grist for most BI-analytic use cases will also be used -- as one among several sources of data -- in and for advanced analytical applications. Organizations will continue to depend on reports and dashboards to support operational business decision making. In the same way, BI reports and dashboards will increasingly be enriched with insights derived from new technologies and practices, such as machine learning and exploratory analytics. The shift to the so-called "modern" BI platform doesn't set up an either-or choice between "modern" and traditional BI. It's both-and.

The mainstreaming of advanced machine learning technologies and exploratory analytics does nothing to invalidate your existing BI programs. The (mostly) business-user-driven success of Tableau-, Qlik-, or Power BI-like discovery tools doesn't either. BI platforms won't be as IT-centric as they were in the past but they will retain most of the same features and capabilities. Whether Gartner recognizes it or not, bread-and-butter reports, dashboards, and (something like) ad hoc query are non-negotiable requirements for day-to-day business decision-making. These same capabilities will, over time, find their way into the new, "modern" BI platforms, too.

That's it for 2016. It's been a heck of an annum. We'll see what 2017 has in store for us next year.
