New, Updated Products Abound at TDWI World Conference
An unprecedented number of vendors unveiled new products at last month's TDWI World Conference in Las Vegas.
- By Stephen Swoyer
- March 5, 2013
In a sense, last month's TDWI World Conference in Las Vegas was overshadowed by another, bigger industry event: the Strata 2013 conference in Santa Clara, Calif., held just one week later. Nevertheless, an unprecedented tally of vendors unveiled new products at the conference, including several players that planned to exhibit at Strata.
Take open source software (OSS) specialist Hortonworks, which announced three new products or enhancements. Although it saved one of its biggest announcements -- the availability of a version of its Hadoop distribution for Microsoft Windows -- for Strata, Jim Walker, director of product marketing with the company, had plenty of news for TDWI's business intelligence (BI)- and data warehousing (DW)-savvy audience.
"We look at it as a way to fine tune our message and who we want to be -- particularly for [the world of] enterprise data management. When you think about Hadoop and where it's at, it's already being adopted by the enterprise: we're past the stage where it's just [proof of concepts]. We're seeing adoption in the enterprise," Walker told BI This Week. "If you think about Hadoop and the way it's going to be adopted, there's this huge world of [BI and DW] ecosystems that's built around it. That's why we're announcing what we did here."
Hortonworks announced "Stinger," a new interactive query feature for Hive, the RDBMS-like overlay for Hadoop; a proposed security gateway specification (dubbed "Knox") for Hadoop; and "Tez," an Apache incubator project that aims to boost the performance of Hadoop and MapReduce -- including Apache Hive -- by developing a next-generation runtime based on the Apache YARN project.
"The idea with Stinger is to improve Hive to become more interactive," he explained, adding that Hortonworks claims a 100x increase in performance. Walker distinguishes between Hortonworks' approach with Stinger and that of competitor Cloudera (which at last year's Strata + Hadoop World announced Impala, its own take on interactive-query-for-Hive). "We haven't done this in private with a team of guys. It isn't proprietary. We're committing the [Stinger] code back to the [Hadoop] community."
Teradata and SAP Unveil Major New Releases
Data warehousing powerhouse Teradata announced version 5.10 of its Aster Discovery Platform, which officials modestly described as a "data-science-in-a-box" solution.
The revamped version of the Aster Discovery Platform ships with what might be called an "app studio" for big data, says Manan Goel, senior director of product marketing for Teradata Aster.
"What we did was we introduced the industry's first visual SQL MapReduce functions," he explained. There's a four-stage workflow associated with big data discovery, Goel pointed out; most of the stages of this workflow are siloed -- or discrete -- from one another. As a result, developers or data scientists must use multiple tools to build big data apps.
"It takes a long time to implement a big data project. 80 percent of [a data scientist's] time is spent on trivial tasks like acquiring and profiling data," Goel continued.
"We're introducing a new visualization module ... [in which are embedded] multiple SQL MapReduce functions for doing a lot of things -- but a customer [has only to] write one single SQL statement and go from data to visualization." The revamped Aster Discovery Platform ships with 20 new visualizations, according to Goel. "There are three main areas of visualization: there's a flow visualizer, a hierarchy visualizer, and an affinity visualizer. These live in-database, they are in-process, and they are analytics aware."
Also at the TDWI World Conference, application software giant SAP unveiled version 16 of its IQ columnar analytic platform. IQ 16 is two years in the making, according to Tom Traubitz, director of product marketing and analytics product strategy with SAP.
Instead of tabulating IQ 16's dozens of new features, Traubitz used another metric -- namely, manual (or "man") pages -- to describe the new IQ's impact: with some 300 new man pages, he argued, IQ 16 is nothing less than a milestone release.
"Thematically, what it's really about is being able to help our customers go from terabyte to petabyte scale. I know that's almost become a cliché in the marketplace, but when you look at columnar systems, there were a lot of things that needed to be done [to enable petabyte-scale]," he explained. "First and foremost was how to deal with loading that amount of data, so we did a number of things that allow start-able/stoppable loads, drip loading, [and] continuous loading technologies."
IQ 16 also boasts improved query handling capabilities, Traubitz said.
"We did a number of things in the columnar engine to improve performance. We now allow significantly different ways of partitioning the columnar indexes, so the load when you're loading 100 TB of data is the same performance as the load on an empty database," he continued, explaining that -- owing to the constraints of the classic columnar architecture -- columnar data stores tend to bog down as data volumes increase.
"Generally new attributes or rows are added on to the end [of the existing data], and this generally becomes a cost. Loading 10 GB a day is extremely fast when the system is brand new and happy; when you have these larger volumes, loading 10 GB can take 10 to 15 hours. We've leveled that off by reorganizing the columnar engine."
SAP enabled this by implementing what Traubitz called a "row version concurrency control system" that (by design) resides in IQ's in-memory cache. As a result, he indicated, "users can actually query against the data as it's loading into IQ."
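The general idea behind row-version concurrency control can be sketched in a few lines: readers query a committed snapshot while a load appends new row versions that become visible only on commit. This is a generic illustration of the technique Traubitz names, with hypothetical class and method names; it is not SAP IQ's internals.

```python
# Generic row-version concurrency sketch: an append-only table where
# each row carries a version, and readers see only committed versions.
class VersionedTable:
    def __init__(self):
        self.rows = []       # (version, row) pairs, append-only
        self.committed = 0   # highest committed version

    def load(self, batch, version):
        """Append a batch as new row versions, then commit them."""
        for row in batch:
            self.rows.append((version, row))
        self.committed = version  # visible to readers only from here on

    def snapshot(self, as_of=None):
        """Return the rows visible at a given (or the latest) committed version."""
        v = self.committed if as_of is None else as_of
        return [row for ver, row in self.rows if ver <= v]

t = VersionedTable()
t.load(["a", "b"], version=1)
t.rows.append((2, "c"))            # an in-flight, uncommitted load...
print(t.snapshot())                # ...does not disturb concurrent readers
```

The point of the design is that a query never has to wait on (or see) a half-finished load, which is what lets users "query against the data as it's loading."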
Another big new feature is robust multiplex failover, which has been a long time coming to IQ. "We really wanted to harden [this] so that if a system within the multiplex failed, its workload would be restartable while it also would not derange the workload in other parts of the system," Traubitz noted.
Talend, Vitria, and WhereScape Make Waves
When it debuted late last year, Amazon's Redshift data-warehouse-as-a-service platform captured both the attention and the imagination of the BI and DW industry.
Amazon wasn't at the TDWI World Conference, but OSS data integration (DI) specialist Talend was.
What's more, Talend tapped the TDWI event as an occasion to unveil a new DI connector for Redshift. According to vice president of marketing Yves de Montcheuil, Talend and Amazon started talking early.
"They approached us pretty early in the game. They wanted to have a full-fledged data integration platform to support Redshift," he explained. "Their positioning [with Redshift] is to democratize data warehousing by making it inexpensive and easy to deploy, with zero management. Their price point is extremely aggressive: they charge $1,000 a year per TB. As everyone knows, the traditional ETL tools are extremely expensive."
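That quoted price point lends itself to back-of-the-envelope math. The sketch below uses only the $1,000-per-TB-per-year figure de Montcheuil cites; the warehouse sizes are illustrative, not from the article.

```python
# Rough annual Redshift storage cost, per the $1,000/TB/year figure quoted.
PRICE_PER_TB_PER_YEAR = 1_000  # USD, as quoted in the article

def annual_cost(terabytes: float) -> float:
    """Estimated yearly cost in USD at the quoted per-TB rate."""
    return terabytes * PRICE_PER_TB_PER_YEAR

for tb in (1, 10, 100):
    print(f"{tb:>4} TB -> ${annual_cost(tb):,.0f}/year")
```

At this rate a 100 TB warehouse prices out at $100,000 a year, which is the "extremely aggressive" positioning the quote describes relative to traditional per-seat ETL and warehouse licensing.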
Talend plans to offer Redshift connectors in both its pay-for-use (Enterprise) and free (Community Edition) DI software. De Montcheuil says Talend's Redshift connector is by no means exclusive or unprecedented. To put it another way, Talend isn't betting solely on the Redshift platform.
"Last summer, we did a similar thing with Google and their BigQuery platform," he explained, adding that Google's rationale was similar to that of Amazon's. "They wanted to have enterprise-grade ETL that would be able to load queries."
Application integration vendor Vitria Technology Inc. also unveiled version 4.0 of its Operational Intelligence (OI) platform at the TDWI conference.
Vitria's specialty -- the real-time processing and analysis of streaming or sensor data -- has suddenly become The Next Big Thing, at least if several vendors (including OSS data integration specialist Talend, OSS analytic database player Infobright, as well as IBM Corp. and start-up complex analytics specialist Paradigm4 Inc.) are right.
Vitria CTO Dr. Dale Skeen points to his company's longevity -- it was founded in 1994 -- and says that OI has been Vitria's bread-and-butter for nearly a decade. "Operational intelligence is continuous, real-time analytics over streaming data: data in motion as opposed to static data. The term 'real-time' is thrown around a lot, but [the term] 'continuous real-time' [describes] the load latency as the information is coming in: it's [the practice of] continuously analyzing this stream of information so that you can detect problems as they arrive."
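Skeen's definition of continuous real-time analytics -- analyzing a stream as it arrives so problems are detected immediately -- can be illustrated with a minimal sliding-window sketch. This is a generic example of the concept, with made-up function names and data; it is not Vitria's implementation.

```python
# Minimal "data in motion" sketch: a sliding-window average computed as
# each reading arrives, with a threshold alert raised mid-stream.
from collections import deque

def sliding_alerts(stream, window=3, threshold=100.0):
    """Yield (value, window_avg, alert) for each incoming reading."""
    buf = deque(maxlen=window)          # keeps only the last `window` readings
    for value in stream:
        buf.append(value)
        avg = sum(buf) / len(buf)
        yield value, avg, avg > threshold   # evaluated while data is in motion

readings = [90, 95, 120, 130, 140, 80]      # hypothetical sensor values
for value, avg, alert in sliding_alerts(readings):
    if alert:
        print(f"alert: window average {avg:.1f} after reading {value}")
```

The contrast with traditional BI is that nothing here waits for a batch load into a warehouse: each reading updates the analysis the moment it arrives.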
Operational intelligence applies "to problems that traditionally BI cannot address," Skeen explained, arguing that traditional BI and data warehousing are bound -- hidebound, even -- to rigid schematic definitions. Unfortunately, most streaming or event data doesn't conform to these definitions.
"We're schema-less. Regardless of whether [the data we're consuming is in the form of] flexible schemas, richer schemas, [or] information that's partially structured -- we use third-party tools to help mine it for meaning. If it's partially structured or richly structured, all of the structure and semantics are inherently exposed in the data itself," Skeen indicated.
Also at the Las Vegas event, WhereScape Inc. made it official: the party -- or the free lunch -- is over. As of the TDWI World Conference, WhereScape's 3D prototyping tool -- which first entered open beta 18 months ago -- is no longer free. More to the point, said CEO Michael Whitehead, WhereScape already has several paying customers (at $20,000 per seat) in the U.S.
"We've made multiple sales into Fortune 500 accounts. We tried to get press releases here [to promote a recent sale to] a prominent networking vendor, but they declined to be a public reference," he commented.
WhereScape has always promoted 3D as a prototyping solution: a tool for planning, scoping out, and designing data warehouses. To this end, it's able to capture the workflow involved in warehouse planning and design, generate documentation (including data source mappings), and perform other prototyping tasks.
One thing it doesn't (officially) do is integrate with WhereScape's RED data warehouse generation tool. That's in part because WhereScape positions 3D as a general-purpose DW prototyping tool -- and not as a RED-specific offering, said Whitehead.
"We want it to be a planning product for your data warehouse. Originally we were quite strict on that, [such that] the output on that was a document that you could then take and develop any way you want. What we did in the end was we added the ability to output tables with data in them," he said. "Through user feedback from the agile community ... we [added the ability to] go and discover a source system, profile that source system, and then populate it with a sample set of data. This gives front-end developers sample reports and other things before they start building the actual data warehouse."
Others Get into the Act, Too
Predictive analytics specialist Predixion announced the general availability of version 3.0 of its predictive analytic platform, as well as a new "Community Edition" download-and-go license for its software. In this last respect, Predixion seems to be taking its cue from information discovery specialist Tableau Software Inc., which offers a free (for a single user) downloadable version of its software.
Its focus has changed over the last two years, but one thing hasn't changed, maintained CEO Simon Arkell: Predixion still aims to demystify predictive analytics. This means shifting the focus from predictive analytics as a pure discipline -- i.e., its esotericism, its complexity, its relation to and derivation from machine learning -- to predictive analytics as an applied practice, he said.
"We try and stay away from selling the concept of predictive analytics and focus more on selling solutions to expensive problems," Arkell explained.
"We absolutely do do predictive analytics, and we have a fantastic platform that can be used to create highly accurate predictive models, but we're always focusing on [determining the] concrete problem that the accuracy of our predictive models [is supposed to] solve. The point isn't necessarily to have the most accurate models; it's to use these [models in combination] with solutions that are able to effectively address problems."
Coming into the TDWI World Conference, ParAccel Inc. was riding high, thanks to its role as what it says is the enabling database technology for Amazon's Redshift data warehousing service. The company touted a new "right-to-deploy" licensing option that vice president of marketing Rich Ghiossi said will take some of the punitive guesswork out of capacity sizing.
Right to deploy eschews traditional per-terabyte licensing in favor of use-case-specific or per-node licensing terms. Customers can license ParAccel for specific use cases -- e.g., as a massive data archive or analytic sandbox -- or on a per-node basis, and deploy as much compute capacity as they want.
"What we're saying [with right to deploy] is that you ought to be able to use our system to get whatever value you want out of that. Whatever you're doing, you pay us a [fixed] price for a year or two or three and [this gives you] the unconstrained ability to expand it any way you want," Ghiossi said.