TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

RESEARCH & RESOURCES

Aerospike Accelerates Specialty Analytics

Upstart player Aerospike is betting there's a market for a highly specialized analytics platform.

By Stephen Swoyer
February 3, 2015

Upstart player Aerospike Inc. is betting that there's a market for a highly specialized analytics platform. How highly specialized? Aerospike's NoSQL database -- dubbed, appropriately enough, "Aerospike" -- is optimized for so-called "transactional analytic applications."

Just what those apps are and why they're important is a bit complicated.

"The problem we started out with was that it's hard to build a high-scale app on the Internet that would actually stay up and available. The primary problem in our view is the data-store layer, and we felt that the best solution [to this problem] was to focus on key-value [pairs]," explains Brian Bulkowski, Aerospike's CTO. "In our speak, a 'database,' is really a primary key. If you need to go beyond that [with user joins and other things], three's usually going to be an amorphous penalty that you don't understand. Key-value pair is where predictability is."

What does this mean? Why key-value pairs? What kinds of problems is Aerospike trying to address?

Part of the answer has to do with how advertising analytics works. Starting around 2010, the ad analytics market switched from storing session information on the client (as "cookies") to storing it on both the client and the server. There's a lot more going on in the background, but from a data management (DM) perspective -- and to Bulkowski's point -- most of this server-side session information is written as key-value pairs. Most Web-serving workloads (like most non-analytic workloads, for that matter) are read-intensive. Generally speaking, you're going to be reading from disk (or Flash or memory) far more often than you're going to be writing to it. Not so with server-side sessionization, which requires scalable, 50/50 read and write performance.
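To make the 50/50 read-write pattern concrete, here is a minimal sketch of server-side sessionization over a key-value store. It is plain Python with an in-memory dictionary, not Aerospike's API; the names `SessionStore` and `handle_request` and the session fields are hypothetical, chosen only for illustration. The point is that every page view reads the session record and writes it back, so reads and writes stay balanced.

```python
from collections import Counter


class SessionStore:
    """Toy in-memory key-value store keyed by session ID (hypothetical)."""

    def __init__(self):
        self._data = {}
        self.ops = Counter()  # counts "read" and "write" operations

    def get(self, session_id):
        self.ops["read"] += 1
        # Unknown sessions start from an empty default record.
        return self._data.get(session_id, {"page_views": 0, "last_url": None})

    def put(self, session_id, session):
        self.ops["write"] += 1
        self._data[session_id] = session


def handle_request(store, session_id, url):
    """Each page view reads the session, updates it, and writes it back."""
    session = store.get(session_id)
    updated = {"page_views": session["page_views"] + 1, "last_url": url}
    store.put(session_id, updated)
    return updated


store = SessionStore()
for i in range(1000):
    handle_request(store, f"user-{i % 50}", f"/page/{i}")

read_share = store.ops["read"] / sum(store.ops.values())
print(store.ops["read"], store.ops["write"], read_share)  # 1000 1000 0.5
```

A typical content-serving workload would call `get` many times per `put`; here the ratio is one to one by construction.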

"The advertising guys said, 'We need something just like this that does 50/50 read-write workloads, and something that has a baked-in level of reliability.' When they thought about it some more, they [decided they] wanted something that could use shared, attached storage, too," Bulkowski explains, noting that he and co-founder Srini Srinivasan had built just a system in "Citrusleaf," a distributed, fault-tolerant, NoSQL key-value store optimized for Flash (i.e., solid-state disk, or SSD) storage. When they thought more about the problem, however, Bulkowski and Srinivasan realized they'd need more than just a Flash-optimized data-store layer.

"We said, 'We really do think it makes sense to do a little analytics on that front-side store without ETL-ing [it out to an analytics platform], so we built out capabilities to do that," he explains.

Enter "Aerospike," which is a hybrid or melding of the former Citrusleaf Flash-optimized key-value store and an open source software (OSS) database called AlchemyDB. AlchemyDB, which Aerospike acquired in 2012, is based on the popular Remote Dictionary Server, or Redis, an OSS key-value pair data store.

AlchemyDB is much more than just another key-value store, however. As a NewSQL eventual consistency database, it implements a SQL-like language that permits it to process SQL queries. AlchemyDB can also implement graph functions using both SQL (for indexing) and an open source scripting language called Lua, which is used to express graphing logic. Add it all up and you have a database engine -- Aerospike -- that Bulkowski says can process "real-time analytics workloads for applications that require millisecond or sub-second response times.

"We're the hot data store, the thing on the front side of an application server. There are going to be a whole bunch of databases behind us, and they're going to be used for all sorts of different analytics [i.e., workloads]. If you think about the kinds of queries you want to run on the front-batch [i.e., Aerospike], they're usually pretty simple. They usually don't have a lot of complicated join information. For example, I know we have 30 days of data in that front-end store, but in the last day or two, what happened to this audience or this advertising campaign or this pool of users?"

Think of this as "single-column" (but non-columnar) analytics. Because Aerospike is trying to solve a highly specialized problem, it does things -- or makes assumptions -- that would make traditional DM practitioners uncomfortable. For example, it doesn't do data validation -- at least not on ingest. "There's the data validation portion of schema, and then there's the I-have-to-know-how-to-index-if-I-have-an-index portion," Bulkowski notes, explaining that Aerospike builds secondary indexes and can also index on column values. "For the first side, we don't do data validation on input -- but the second side, which is schema management for the purposes of indexing, we have it. We have a SQL-like tool that maintains a catalog table. There's contention resolution."
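The split Bulkowski draws between validating data and knowing what to index can be illustrated with a small sketch. It is an assumption-laden toy in plain Python, not Aerospike's implementation: the class `IndexedKVStore`, its methods, and the sample records are hypothetical. Writes accept any record without checking it, while a small catalog tracks which fields carry a secondary index.

```python
from collections import defaultdict


class IndexedKVStore:
    """Toy key-value store: no validation on ingest, catalog-driven indexes."""

    def __init__(self):
        self._records = {}     # primary key -> record (any dict)
        self._catalog = set()  # names of fields that have a secondary index
        self._indexes = {}     # field -> {value -> set of primary keys}

    def create_index(self, field):
        """Register a field in the catalog and index existing records."""
        self._catalog.add(field)
        index = defaultdict(set)
        for key, record in self._records.items():
            if field in record:
                index[record[field]].add(key)
        self._indexes[field] = index

    def put(self, key, record):
        """Store the record as-is; only index maintenance looks inside it."""
        old = self._records.get(key)
        if old is not None:
            self._unindex(key, old)
        self._records[key] = record
        for field in self._catalog:
            if field in record:
                self._indexes[field][record[field]].add(key)

    def _unindex(self, key, record):
        for field in self._catalog:
            if field in record:
                self._indexes[field][record[field]].discard(key)

    def get(self, key):
        return self._records.get(key)

    def find(self, field, value):
        """Look up primary keys by an indexed field value."""
        if field not in self._catalog:
            raise KeyError(f"no index on {field!r}")
        return sorted(self._indexes[field].get(value, set()))


store = IndexedKVStore()
store.put("u1", {"segment": "sports", "age": 31})
store.put("u2", {"segment": "sports", "age": "unknown"})  # no type check
store.put("u3", {"region": "west"})                       # no required fields
store.create_index("segment")
store.put("u4", {"segment": "travel"})
store.put("u1", {"segment": "travel"})                    # overwrite moves u1

print(store.find("segment", "sports"))  # ['u2']
print(store.find("segment", "travel"))  # ['u1', 'u4']
```

The sketch assumes indexed values are hashable and leaves out the contention resolution mentioned in the quote.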

Aerospike also exposes a SQL query interface. Bulkowski argues that SQL's usefulness is radically underappreciated, at least among traditional or Web application developers. "We think SQL is the most natural way of expressing a lot of different queries, including streaming."

Two Different Visions of the Future

Bulkowski positions Aerospike as a kind of point solution. It's designed to address a very specific problem -- namely, real-time or transactional analytics -- which imposes hard requirements with respect to performance and availability. Web advertising is a good example -- e.g., Aerospike works in the background to figure out which ads to serve up based on a person's browsing history -- but Bulkowski cites similar requirements in utilities, financial services, manufacturing, and other markets.

To this end, Aerospike last year significantly ramped up its sales and marketing, appearing at several industry trade events, including O'Reilly Inc.'s Strata + Hadoop World in New York. BI This Week caught up with Aerospike at last summer's O'Reilly Open Source Convention (OSCon), in Portland, Ore. At OSCon, Bulkowski answered lots of questions about his company's embrace of OSS, which had occurred just one month earlier. "We're pretty happy having satisfied this performance-intensive niche in advertising that we have a code base that's really, really hardened. We want to go wide with this, [and] that requires an open source model," he said, explaining the move.

Aerospike's specialty pitch flies in the face of the irrepressible human demand for an all-in-one fix for all possible workloads (or for all workloads in a related domain). What we're seeing with some takes on the Hadoop platform -- e.g., Cloudera's Enterprise Data Hub vision -- or with megafauna-like systems from Oracle Corp. (Exadata) and SAP AG (HANA) are articulations of this all-in-one fixation. These and other technologies are still too shortsighted, however, Bulkowski argues. After all, he points out, we're still figuring out what we're going to do with data. How are we to know what the data architecture of the future -- or of 10 years from now, for that matter -- will look like?

Why, then, should BI or data management practitioners care about Aerospike? On the one hand, Bulkowski argues, it's optimized for a problem -- viz., real-time analytics, be it in the context of sales or marketing campaign optimization -- that few other offerings can credibly address. On the other hand, he says, it's a fast, in-memory engine that can accelerate certain kinds of NoSQL and traditional SQL analytics. Insofar as it exposes a SQL interface, it can be accessed and queried by traditional BI tools. Above all, he claims, Aerospike is just one of several critical components of a next-generation data "layer."

His is a vision of a layer of optimized engines for specialty processing and of one-size-fits-most engines for general-purpose processing. Data is vectored to where it needs to go -- to a platform such as Aerospike for certain kinds of transactional analytics, to Hadoop for long-term storage in HDFS, to the data warehouse -- and data movement is itself minimized.

"We're not going up against Hadoop. I think my view of the back-end analytic space is [that] HDFS [i.e., the Hadoop distributed file system] is going to eat the world but that Hadoop is not. Hadoop is just one style, so to speak. Sometimes I want key value [storage] on my ... petabyte data set, and then I go through HBase. Sometimes I want to use something like a Spark-style streaming that's going to mate nicely with HDFS. There's not going to be one query layer and one app layer. There must be one data storage layer because we can't ETL anymore, and it's driving everybody crazy."
