Cisco Takes on Data Prep in Manhattan
The old barriers keep breaking down, as Cisco demonstrates with its forthcoming Cisco Data Preparation product. The networking behemoth positions its foray into data prep as a complement to its more traditional data virtualization technology.
- By Stephen Swoyer
- November 3, 2015
At the Strata + Hadoop World Conference in New York City, Cisco Systems Inc. trumpeted its new partnership with Paxata Inc., a respected purveyor of self-service data preparation software. Also at Strata + Hadoop World, Cisco announced Cisco Data Preparation, a new data-prep solution that's the first fruit of that partnership.
With its spreadsheet-like interface, built-in data visualization capabilities, and library of machine learning (ML) algorithms under the hood, Cisco Data Preparation looks an awful lot like Paxata's eponymous flagship offering. Cisco's first foray into data prep isn't just a rebranded version of Paxata, however, officials insist.
Instead, Cisco Data Preparation will integrate with Cisco's best-of-breed data virtualization (DV) technology. Just what form the integration between Cisco Data Preparation and Cisco's DV technology will ultimately take is unknown: Cisco's new data prep offering isn't yet generally available and, more to the point, it's a version 1.0 product. Still, Cisco's own efforts with self-service data virtualization suggest at least one way in which the two can come together.
Cisco vaulted into the DV business more than two years ago when it acquired DV pioneer Composite Software. At the time, Composite had been working on the equivalent of a self-service capability for data virtualization with its "Composite Collage" offering, a business directory that's built on top of Composite Information Server. Composite Collage permits IT practitioners and savvy business analysts to build "curated assets" that can in turn be consumed by non-technical users. In most cases, Cisco officials say, these curated assets will still need work. At the very least, they'll need to be adapted to the requirements of data scientists or business analysts.
That's one way in which Cisco Data Preparation can and will be used. In this scheme, said a Cisco official we spoke to, consumers could browse Cisco's data virtualization Business Directory, select the curated data sets that are of interest to them, and ingest those sets into Cisco Data Preparation. From there, they can explore this data, as well as profile, blend, and transform it as needed. (In the data prep space, this is sometimes called "data wrangling." However, the term "wrangling" was coined by a Paxata competitor.)
The potential for integration isn't just one-way, this official said. On their own, self-service data preparation offerings -- much like their front-end kith, self-service business intelligence (BI) discovery tools -- don't provide a feedback loop for engineered or wrangled data. They likewise don't provide robust data governance and security facilities. (With respect to self-service BI discovery, this is a problem that Tableau Software Corp., Qlik, and other vendors have worked hard to address.) This is where integration in the other direction comes in: assets that have been wrangled or customized in Cisco Data Preparation could be fed back into Cisco Data Virtualization's Business Directory as well as into the Information Server engine that powers Cisco's DV technology.
"Certain result sets from that [wrangling process] should be shared in an industrialized model. We'll bring those back through the Information Server and share those with everybody," said this Cisco official, who was not authorized to speak on the record. "Also, this kind of [encourages] that better partnership between business and IT. [Companies can] blend the integration approach that's appropriate to the specific need. If [the need is] a quick ad hoc data set for single analysts, they can use [Cisco Data Preparation]. If it's a more industrialized need, if it's going to be [consumed by] dozens or hundreds of users, they can use data virtualization."