Big Data Strata Hadoop World

At Strata + Hadoop World, vendors scrambled to stake their claims in a big data market most analysts believe is still riding a rising tide of hype.

The recent Strata + Hadoop World conference had a throwback feel to it -- a throwback to the 1889 Oklahoma Land Rush. The idea, then as now, was to stake one's claim, work it, and ultimately prosper in a greenfield market.

In 1889, this involved a pell-mell rush to claim one's piece of the newly-opened Oklahoma Territory. In 2012, it involved a similar pell-mell rush: in this case, to claim one's piece of (or connection to) Hadoop -- and, by extension, one's claim to big data riches.

"Strata" is the brand name for a global series of events sponsored by O'Reilly Media Inc., a publishing enterprise. The event marked its combination with Hadoop World, which was once heavily associated with Cloudera Inc.

The atmosphere was frantic -- frequently frenetic. Industry veteran Wayne Eckerson, a principal with BI Leader Consulting, likened the experience to another throwback -- in this case, the early days of The Data Warehousing Institute (TDWI).

"It felt like TDWI in 1997 when it had its largest conference ever [of approximately 1,400 attendees] in a ramshackle hotel outside San Diego. Lots of energy and buzz," says Eckerson. A veteran of Hadoop Worlds past, Eckerson says things were different with this year's event.

"This is the third Hadoop World I've attended. The first year, the attire consisted of pony tails, T-shirts, and dungarees -- i.e. open source-inclined app developers," he explains. "We're now seeing a mix of pony tails and suits, as commercial vendors and more mainline companies -- i.e. not Internet and media companies -- jump on the big-data bandwagon."

This year's combined Strata + Hadoop World still had plenty of casual attire. It also featured plenty of data scientists.

Talk to any vendor at Strata + Hadoop World and you'd hear data scientists being extolled as at once invaluable and endangered. Hadoop analytics upstart Datameer Inc. even made the Vanishing Data Scientist the centerpiece of its new AppStore-like exchange for analytic applications.

Plenty of flesh-and-blood data scientists were in attendance.

One of them, Joseph Turian, Ph.D., president of predictive analytic and natural language processing (NLP) consultancy MetaOptimize LLC, has himself been a featured speaker at past Strata events. Dubbing himself the "Data Doctor," Turian looked the part of a geeky Lucy Van Pelt -- of Peanuts fame -- and offered free, five-minute consultations on data modeling, predictive analytics, NLP, and other issues.

Decked out in a bow tie and with a precisely waxed mustache, Turian used a Peanuts-like backdrop, complete with a "The Dr. is IN" sign. His makeshift booth -- which consisted of a stool for himself and an easel for his sign -- was reliably mobbed.

Data scientists come in more conventional-looking packages, too.

Take Deborah Cooper, a principal with Deborah M. Cooper Consulting, who presented a session on Linking Census and Enterprise Data Sets. Cooper, a veteran of both Fidelity Investments and Putnam Investments, discussed methods and practices for using census data to enrich enterprise analytics and drive market strategy. Dressed in a suit and with her hair well-coiffed, Cooper looked like a familiar boardroom figure. In a former professional life, however, she specialized in molecular genetics and medical demand forecasting.

Yes, Strata + Hadoop World had something for everyone.

Geeks Galore

It was odd to see many of the same vendors exhibiting at both Partners and Strata + Hadoop World.

IBM Corp., Alteryx Inc., Hortonworks, Informatica Corp., RainStor, SAP AG, SAS Institute Inc., Tableau, Teradata Aster, and Talend were all there, just to name a few.

Strata + Hadoop World even had a business intelligence (BI) and data warehousing (DW) feel: in addition to Eckerson, prominent industry thought-leader Cindi Howson, a principal with BIScorecard.com, was on site, as were TDWI faculty Marc Demarest (a principal with information management consultancy Noumenal Inc.) and Mark Madsen (a principal with consultancy Third Nature Inc.), who co-presented a session on big data.

Many familiar faces in the BI and DW worlds were on hand, too, including companies such as Actian Corp., Birst, Cirro Inc., Cloudera Inc., Greenplum (a division of EMC Corp.), Hewlett-Packard Co., IBM, Informatica, Kognitio, Microsoft Corp., Oracle Corp., ParAccel Inc., and Pervasive Software Inc.

This first-ever combination of Strata and Hadoop World was also sprinkled with familiar landmarks. It featured the strange, the bizarre, the bazaar-like, and the audacious.

One of the biggest news releases was Cloudera's Impala, which -- in spite of its branding -- has nothing to do with an automobile. Impala is, instead, Cloudera's take on a real-time query execution engine for Hadoop. One of Hadoop's perceived limitations is its batch-iness: barring kludges or proprietary extensions -- one of which is touted by Hadoop distributor MapR -- vanilla Hadoop is a batch-only proposition. Impala promises to change that. Used in combination with Cloudera's Enterprise Real-time Query (RTQ), officials claim it can support low-latency use-cases, such as interactive querying.

Cloudera's symmetrical competitors -- Hortonworks and MapR -- were likewise in attendance. Only MapR had teed up a news release, however: in this case, version m7 of its Hadoop distribution. According to Jack Norris, vice president of marketing with MapR, the revamped m7 boasts improved HBase performance, along with enhancements that effectively eliminate several existing Hadoop-specific limitations (such as constraints on the numbers of files and the sizes of BLOB cells). MapR teamed with Google Inc. to tout a world record performance in the 1 TB TeraSort benchmark (which MapR and Google broke at the Strata + Hadoop World show) by shaving 13 percent off the previous best mark.

Perhaps of most interest to BI This Week readers were product or service introductions from Datameer Inc., Kognitio, Platfora Inc., ParAccel Inc., TIBCO Spotfire, Tableau Software Inc., and Talend, in addition to SAP. Platfora unveiled what it describes as an in-memory BI platform for Hadoop, complete with an HTML5-based UI. If its technology is as good -- or as bold -- as its marketing, it could be a disruptive force. (We'll dive deeper into Platfora and several other newcomers in a future article.)

Datameer, too, is ambitious: it positions its flagship offering, Datameer Analytic Solution (DAS), as a "business integration platform" for Hadoop.

It is the logical application of a still-coalescing theme: that of Hadoop as the site for enterprise integration, connecting not just the worlds of structured and semi-structured data but the event- or messaging-centric world of application integration, too. At Strata + Hadoop World, Datameer trumpeted a new release of DAS (version 2.1) along with a new AppStore-like market for analytic apps, the aptly-titled Datameer Analytic Applications Market.

With its Hadoop-centric take on integration, Datameer might be on to something. Yves de Montcheuil, vice president of marketing with Talend, outlined a similar vision at this summer's Pacific Northwest BI Summit. At Strata + Hadoop World, Talend staked out its own claim to enterprise integration bragging rights, announcing expanded support for NoSQL repositories in the form of native connectors for MongoDB, HBase, and Cassandra.

Elsewhere, Tableau announced that its discovery environment is certified for Cloudera Impala and announced partnerships with a raft of big data- or Hadoop-centric vendors. Kognitio announced a capacity-limited free version of its analytic database software (it's free up to 128 GB, in a scheme similar to that of HP's Vertica, which offers up to 1 TB of free capacity). ParAccel announced upcoming enhancements for its On Demand Integration modules, including bi-directional support for Hadoop and trigger-driven UDF transformations. Spotfire announced a new industry-specific visualization capability for energy and gas. Finally, SAP announced a new "big data bundle" that centers on SAP HANA and involves partnerships with Cloudera, Hitachi Data Systems (HDS), Hortonworks, HP, and IBM.

In upcoming articles, we'll have much more to say about the products, messages, and trends presented at Strata + Hadoop World.
