Paxata Riding High as Data-Prep Complement to Tableau
Paxata is riding high. In September, it signed a partnership and reseller agreement with Cisco. In October, it announced improved integration with Tableau's self-service data visualization software.
- By Stephen Swoyer
- November 10, 2015
Paxata Inc. is riding high. In September it signed a partnership and reseller agreement with Cisco Systems Inc. At October's Tableau Customer Conference (TCC), held in Las Vegas, Paxata announced improved integration with Tableau Software Corp.'s self-service data visualization software. The company also unveiled a new "ClicktoPrep" feature that it says will permit analysts to invoke Paxata's "edit" or "filter" mode from within Tableau itself and to switch between the two environments. In this way, analysts can explore, standardize, prepare, and -- if necessary -- clean data before blending it, all while continuing to work in Tableau.
Paxata likes to position itself as a natural complement to Tableau. It's one of several self-service data-prep offerings that enjoy wide use among Tableau's customer base. (Paxata's competitors include Alteryx Inc. and Trifacta Inc., among others.) Tableau and similar offerings -- products such as TIBCO Software Inc.'s Spotfire or Qlik Inc.'s Qlik Sense -- provide built-in facilities for "blending" data from different sources. Before data can be blended, however, it must be prepared: explored, profiled, and standardized. In some cases it must be cleansed, too.
If this data is sourced from the warehouse or from data marts, it may require little, if any, data preparation. Business intelligence (BI) discovery is interested in more than just curated warehouse data, however. BI discoverers seek out raw data from OLTP systems; personal data from spreadsheets; open data from publicly available data sets; JSON, YAML, and other file objects from Web, cloud, or social sources; and so on. This is the logic of self-service data preparation, according to Paxata CEO Prakash Nanduri: it exposes a guided, visual user experience in order to simplify and accelerate the time-consuming process of preparing data for blending.
"In the self-service business intelligence and analytics space, the missing gap [is that] for every analytical exercise you're spending as much as 80 percent or more of your time in data prep. That's a key, key, key point," he stresses.
Nanduri doesn't cite an empirical source for this 80-percent figure. In fairness, hardly anyone does. The claim is uncontroversial -- so uncontroversial, in fact, that it's taken as gospel that anywhere from 60 to 80 percent -- or more -- of a would-be analyst's time is spent preparing data. This isn't to pick on Nanduri and Paxata: virtually every vendor with any play in the data integration or data preparation space likes to trumpet the 80-percent claim.
The figure itself isn't plucked out of thin air, however, he says.
"I've been in enterprise software for close to 25 years now, in 1999 I left Booz Allen and founded my first company," he says, citing subsequent work with Tibco and SAP AG, among others.
"My background is very much in data management and BI. On the basis of this experience, I am convinced -- very, very clearly convinced -- that what's needed is an explosion in innovation on the self-service data preparation side, and that this [explosion] isn't going to come from IT. IT is only going to give us more of the same. The existing [data integration] is not designed for non-IT people. The self-service BI users have kind of hit the wall [with self-service tools], and now they're going into [data] prep because they need to do things more at scale and in a more consistent fashion."
Scale is the issue. Traditionally, self-service BI discoverers have used a spreadsheet (typically Microsoft Excel) to explore, profile, and transform data. In working with OLTP data, CSV files, or smaller data volumes, this is still a workable approach, Nanduri concedes. In preparing multi-structured data -- and, moreover, in doing so at anything like big-data scale -- it's much less workable, he argues.
"Basically, 80 percent of the people who do data prep in a company are actually non-technical business-analyst types. These are the folks who know Pivot Tables, they know vlookups. More and more they're having to struggle with the larger and larger amounts of data they're dealing with. They can't do it in Excel, and they have to have a tool for self-service data prep," Nanduri explains.
"In an enterprise, there are personal data sources, such as Excel spreadsheets. There are external data sources, such as Thomson-Reuters, Bloomberg, etc., and there's a whole lot of other sources of data that need to be harnessed. The challenge is to drive [the production of] clean, consumable information that can be blended and analyzed."
This, says Nanduri, is the rationale behind Paxata's partnership with Cisco, which markets a best-of-breed data virtualization (DV) product, Cisco Data Virtualization.
"The combination is very natural, very powerful. It's about making sure that you're able to access as many sources of data, wherever they come from, as possible. The Paxata value proposition is about addressing not just IT, not just the business, but the combination of both," he concludes.
"We are going towards a world of hyper-converged environments. Cisco has been one of the leaders of hyper-convergence" -- Narduri here cites Cisco's Unified Computing System (UCS) strategy -- "and this combination is hugely important, because the stack is changing."