
DataRPM Seeks to Heat Up BI and Analytic Natural Language Search

The search for a clear front-runner in the business intelligence and analytic search market is far from over.

The search for a clear front-runner in the market for business intelligence (BI) and analytic search is far from over. DataRPM Corp., a vendor specializing in natural language search (NLS), hopes to have something to say about this.

The last 18 months have seen a swarm of activity focused on NLS, not least of which was Microsoft Corp.'s Q&A technology, which officially debuted with Redmond's Power BI suite in February. Also last year, UK-based NeutrinoBI stepped up its own efforts, shipping version 4 of its natural language processing search platform; Oracle Corp. shipped version 3.0 of its Endeca NLS offering; and Information Builders Inc. (IBI) announced a new NLS-capable version of its trusty WebFOCUS Magnify search offering.

IBI's approach with WebFOCUS Magnify differs from those of Microsoft, NeutrinoBI, and Oracle Endeca in that it's based on a pair of open source technologies -- namely, the Apache Software Foundation's Lucene search library and Solr, a full-fledged enterprise search platform built on Lucene.
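Lucene's central data structure is an inverted index: a map from each term to the documents that contain it, so a query only has to look up its terms instead of scanning every document. The toy Python sketch below illustrates the idea under simplifying assumptions (a crude lowercase tokenizer, AND-only queries, and hypothetical sample documents). It is not Lucene's or Solr's API, which add relevance scoring, configurable analyzers, and on-disk index segments on top of this basic structure.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Crude analyzer: lowercase, then keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Map each term to the set of document IDs that contain it (its postings).
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    # AND semantics: intersect the postings of every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample documents.
docs = {
    1: "Quarterly sales by region",
    2: "Sales pipeline for the EMEA region",
    3: "Customer churn by segment",
}
idx = build_index(docs)
print(sorted(search(idx, "sales region")))  # [1, 2]
```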

DataRPM, too, makes use of Apache Lucene and Solr to deliver what it claims is a Google-like, point-and-click take on NLS. You don't have to aggregate, prepare, model, or index data yourself, claims CEO Sundeep Sanghavi -- you just use DataRPM's Web-based app to navigate to a source and search against it.
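DataRPM hasn't published how it turns a question into a query, but the general NLS problem is to map the words in a question onto known measures and dimensions in a source's schema. The following is a deliberately naive Python sketch of that mapping, assuming a hypothetical `sales` table and a simple keyword matcher; real NLS products typically also handle synonyms, dates, and ambiguity, which this sketch ignores.

```python
import re

# Hypothetical schema metadata: which columns are measures vs. dimensions.
SCHEMA = {
    "table": "sales",
    "measures": {"revenue", "units"},
    "dimensions": {"region", "product", "year"},
}

def to_sql(question: str, schema: dict) -> str:
    """Map a question such as 'total revenue by region' to a GROUP BY query.

    Toy keyword matcher: every measure mentioned is summed, and every
    dimension mentioned after the word 'by' becomes a grouping column.
    """
    words = re.findall(r"[a-z]+", question.lower())
    measures = [w for w in words if w in schema["measures"]]
    if not measures:
        raise ValueError("no known measure in question")
    group_by: list[str] = []
    if "by" in words:
        after_by = words[words.index("by") + 1:]
        group_by = [w for w in after_by if w in schema["dimensions"]]
    select = group_by + [f"SUM({m}) AS total_{m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {schema['table']}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql

print(to_sql("Total revenue by region", SCHEMA))
# SELECT region, SUM(revenue) AS total_revenue FROM sales GROUP BY region
```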

"Our mission is to provide visualization from machines to humans in a human way. We see this as totally different from every BI solution, [where] you have to learn how to use their software, where the software gets in the way of data," Sanghavi told BI This Week late last year.

DataRPM's marketing collateral describes the product as "schema-less," but this isn't strictly true. (There's no structure without schema; semantics is schema.) What actually happens is that DataRPM constructs a just-in-time schema -- i.e., a kind of schema-on-access approach, according to Sanghavi. For example, when a user accesses a source for the first time (or when an IT department configures access to that source), the DataRPM software indexes it.
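To make the schema-on-access idea concrete, here is a minimal Python sketch, assuming CSV input and a simple in-memory cache: the first read of a source samples some rows, infers a type for each column, and caches the result so later reads reuse it. The function names and the source identifier are hypothetical; this is not DataRPM's implementation, which, per Sanghavi, relies on its own indexing algorithms.

```python
import csv
import io

# Cache of inferred schemas, keyed by a (hypothetical) source identifier.
_schema_cache: dict[str, dict[str, str]] = {}

def infer_type(values: list[str]) -> str:
    """Return the narrowest of int, float, or string that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"

def schema_on_access(source_id: str, raw_csv: str, sample_rows: int = 100) -> dict[str, str]:
    """Infer a column -> type schema the first time a source is read, then reuse it."""
    if source_id in _schema_cache:
        return _schema_cache[source_id]
    reader = csv.DictReader(io.StringIO(raw_csv))
    # Sample at most `sample_rows` rows rather than scanning the whole source.
    rows = [row for _, row in zip(range(sample_rows), reader)]
    columns = reader.fieldnames or []
    # Short rows yield None for missing fields; treat those as empty.
    schema = {col: infer_type([row[col] or "" for row in rows]) for col in columns}
    _schema_cache[source_id] = schema
    return schema

data = "region,year,revenue\nEMEA,2013,1200.50\nAPAC,2013,980\n"
print(schema_on_access("sales.csv", data))
# {'region': 'string', 'year': 'int', 'revenue': 'float'}
```

Because the schema is cached per source, a second call with the same identifier returns the stored schema without re-reading any data; invalidating the cache when a source changes is left out of this sketch.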

There's a lot going on under the covers, Sanghavi said: even though DataRPM uses Lucene and is based on Solr, it implements its own functions and algorithms to index a source (preferably source metadata, if available), identify entities and relationships, and generate schema.

"We don't care if you want to see this information today or if you might want it tomorrow, we're going to index all of it," he said. "Our index is based on metadata from the RDBMS or from the streaming database. It's based on the importance of the data itself."
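Indexing from metadata means reading a database's catalog (its tables, columns, and declared keys) rather than its rows. As an illustration only, the Python sketch below uses SQLite's built-in catalog (`sqlite_master`, `PRAGMA table_info`, and `PRAGMA foreign_key_list`) as a stand-in for an RDBMS catalog, treating tables as entities and foreign keys as relationships. The `customers` and `orders` tables are hypothetical, and DataRPM's own entity and relationship detection is not public.

```python
import sqlite3

def read_metadata(conn: sqlite3.Connection) -> dict:
    """Collect tables (entities), their columns, and foreign keys (relationships)
    from SQLite's catalog, without scanning any row data."""
    entities: dict[str, dict[str, str]] = {}
    relationships: list[tuple[str, str, str, str]] = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        entities[table] = {c[1]: c[2] for c in cols}
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})").fetchall():
            relationships.append((table, fk[3], fk[2], fk[4]))
    return {"entities": entities, "relationships": relationships}

# Hypothetical example schema in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         amount REAL);
""")
meta = read_metadata(conn)
print(meta["relationships"])  # [('orders', 'customer_id', 'customers', 'id')]
```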

DataRPM uses the Hadoop Distributed File System (HDFS) for storage. It uses Apache Sqoop to get at data stored in RDBMS platforms and Hadoop, and Shark, the Hive-compatible SQL engine built on Spark, to query against Hive. DataRPM can also access Apache Spark clusters, which are typically layered over HDFS.

"We'll hook into any RDBMS, we'll hook into Hive, [with] Shark, we'll hook into Spark. All of that information goes through our HDFS framework," Sanghavi explained.
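For reference, a standard Sqoop 1 import copies one RDBMS table into an HDFS directory. The Python sketch below only assembles the command-line arguments for such an import and executes nothing; the JDBC URL, username, table, and HDFS path are hypothetical, and how DataRPM actually drives Sqoop is not described here.

```python
import shlex

def sqoop_import_args(jdbc_url: str, table: str, target_dir: str,
                      username: str, mappers: int = 1) -> list[str]:
    """Build the argument list for a Sqoop 1 'import' that copies one
    RDBMS table into an HDFS directory. Nothing is executed here."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",                         # prompt for the password at run time
        "--table", table,
        "--target-dir", target_dir,   # HDFS directory that receives the files
        "--num-mappers", str(mappers),
    ]

args = sqoop_import_args(
    "jdbc:mysql://db.example.com/crm",  # hypothetical host and database
    "customers",                        # hypothetical table
    "/data/raw/crm/customers",          # hypothetical HDFS path
    "etl_user",                         # hypothetical account
)
print(shlex.join(args))
```

The resulting list could be passed to `subprocess.run` on a host with Sqoop installed; building it as a list rather than a single shell string sidesteps quoting problems.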

On the other hand, DataRPM's competitors, such as IBI, NeutrinoBI, and Oracle Endeca, claim to be able to get at the same (or, in IBI's case, even more) sources. They also expose NLS in an intuitive, visual context. How does DataRPM expect to differentiate itself?

For one thing, Sanghavi pointed out, DataRPM's user experience (UX) is explicitly Google-like. He noted that industry luminary Cindi Howson -- who regularly assesses BI and analytic tools for BIScorecard.com -- has praised DataRPM's Google-like UX. (At last year's Strata+Hadoop World conference in New York City, Howson tweeted: "datarpm got a wow out of me with their google like viz BI on top of any data.")

Second, he maintains that DataRPM's competitors bring a traditional BI/data warehouse-focused perspective to search. "We believe the data warehouse is where data goes to die. We've seen it over and over again. They [the data warehouse team] say, 'We'll sit with it, we'll get your requirements, we'll go away for months or years, and come back with something.' What this is, without asking what data you're going to look for, you can drop it into the search index and search [against it]."

Both NeutrinoBI -- which explicitly does not require a data warehouse, and which implements the equivalent of a data virtualization layer to get at disparate data -- and IBI, which (after all) shares two key open source technologies with DataRPM, would certainly take issue with this characterization.

According to Sanghavi, however, there's another, more significant difference: a product such as NeutrinoBI wants to be a general-purpose search tool; DataRPM -- like the former Oracle Endeca technology (but also like IBI's WebFOCUS Magnify) -- emphasizes a targeted app-dev model: organizations embed (in this case) DataRPM NLS capabilities into business process-specific or domain-specific apps. You can use DataRPM as a general-purpose NLS technology, he said, but it's designed to be embedded and effectively productized.

"We're focusing on any enterprise that wants to embed natural language analytics as part of [exposing search] to their customers, regardless of whether it's [i.e., data] sitting in a CRM app, an ERP app, or whatever. DataRPM lets them pull information [from these systems] back to their customers in the simplest way possible," Sanghavi indicated.
