ThoughtSpot Touts a Data Warehouse-less Take on Ad Hoc Query -- and Dashboards
ThoughtSpot dubs itself a "relational search appliance."
- By Stephen Swoyer
- May 19, 2015
Start-up ThoughtSpot Inc. touts a Google-like take on business intelligence (BI) search and analysis.
Sound familiar? There has been no shortage of Google-themed BI search offerings, starting with Google Inc.'s seminal enterprise search appliance, which famously garnered support from established BI players such as Information Builders Inc. (IBI) and the former Business Objects SA.
Since then, major powers such as IBI, Microsoft Corp., MicroStrategy Inc., SAS Institute Inc., and SAP AG -- along with newcomers Neutrino Concepts Ltd., DataRPM Inc., and the former Endeca (acquired by Oracle Corp. in 2011) -- introduced BI search offerings of their own, too. At one point or another, most of these vendors have used the term "Google-like" to describe their products. It's just good marketing.
ThoughtSpot says it has something its more established rivals don't: a pedigree that's deeply steeped in search. Co-founder and director of engineering Shashank Gupta logged time at both Yahoo! Inc. and Amazon Inc. (Gupta was a senior engineering manager at Yahoo!, and at Amazon he led a data mining team.) Similarly, co-founder and CTO Amit Prakash helped develop Google's ingenious (and highly lucrative) AdSense program. (Prakash helped Microsoft with its Bing search service, too.) In ThoughtSpot, Gupta, Prakash, and their other co-founders (seven in all, including CEO Ajeet Singh, co-founder of virtualization specialist Nutanix) say they've created an altogether new take on BI: think of it as ad hoc query for the world of big data, or -- even better -- as data warehouse-less ad hoc query or dashboards. Make that a data warehouse-less ad hoc query or dashboards appliance.
ThoughtSpot's own marketing calls its product -- Analytical Search Appliance -- a "relational search appliance."
"What we've built is an appliance. We ship a box. It's standard hardware but it's pre-configured with our software. Each node supports around a terabyte of data," says Anand Raghavan, ThoughtSpot's director of product marketing. The number of nodes will depend on a customer's concurrency requirements, Raghavan points out: the more active concurrent users, the more nodes, the more "our product looks like a database to the external world, but the performance that we can get from a [pre-configured/optimized] appliance is far superior to any database or database appliance," he adds.
ThoughtSpot doesn't actually eschew a data warehouse. It can run against DWs, RDBMSs, spreadsheets, cloud services, and other relational sources, provided they expose a SQL interface or can be accessed using third-party data integration tools or services. "The EDW is one potential data source, the second is any cloud [service]. People can use [ThoughtSpot's Connector for] Informatica Cloud to extract data from those [cloud] sources, and if you have a bunch of spreadsheets, you can do a mash-up of them in one place," Raghavan explains.
ThoughtSpot doesn't work as you'd expect a search engine or appliance to work. For one thing, it doesn't cache search results: instead, it compiles search terms down to SQL, such that it can actually store or persist compiled queries. ThoughtSpot doesn't pre-compute averages, Raghavan points out.
"Once [data] comes in, our relational search engine starts to work [and] starts indexing data. We don't pre-compute averages. ... We actually just store where the different data is located so that when you type the query, we know where to get it," he says.
"For the appliance, the node piece, we ... can do a lot of parallel loading, so we can get data into the system quickly. If you did the data load in the morning, in the afternoon your business users are ready to start searching."
The company explicitly cultivates comparisons with Google, but the comparison holds only up to a point. "We don't want you to be able to do this on your desktop with a small set of data. We want you to be able to scale across billions of rows with terabytes of data -- and [we want to] still deliver sub-second response times," Raghavan indicates. "Google is the one product in the world that billions of people have been 'trained' on and have used. The bar at Google is to return an answer for you in a matter of seconds. [For] our engineering team, the bar [to do this] is milliseconds. Anything that returns something in more than 200 milliseconds does not meet our expectations."