In-Memory NoSQL Brings Real-Time Big Data to More Businesses
By Monica Pal, CMO, Aerospike
- This article was first published in TDWI’s What Works in Big Data. For a free white paper on this topic from
Aerospike, download “The Omnichannel Superstore,” or you can view the full list of free white papers from What Works sponsors.
The Internet and mobile computing have
paved the way to new applications that interact
with consumers based on what they are
interested in, what they are doing, and where
they are located.
Early pioneers of interactive applications—
dominated by companies in Internet marketing
and advertising—turned to databases
utilizing in-memory and NoSQL technology
to deliver contextual big data in real time.
Many of these early deployments have been
marked by the ability to handle extremely
large volumes of information within milliseconds.
Typical examples are Internet ad and
marketing companies that rely on the
Aerospike in-memory NoSQL database:
- Each of The Trade Desk’s data centers
processes millions of transactions per
second (TPS) for hundreds of application
servers and manages billions of records
and terabytes of data across multiple
clusters
- eXelate runs its databases on Internap
bare-metal cloud servers in four Internap
data centers to manage 700 million
unique user profiles across 60 billion
transactions per month, while accessing
20 TB of real-time context
- Federated Media has databases in five
data centers to manage more than 180
million monthly ad impressions across
145,000 sites
At the same time, companies are realizing
substantial gains in efficiency. Among the
most notable is AppNexus, which has a trading
desk that handles up to 50 million ad
impressions per day, but has been able to
reduce its cluster size by 84 percent, from
50 servers to 8 servers, using the Aerospike
database and Intel solid-state drives.
More Businesses Rely on Real-Time Big Data
Databases built for real-time big data can
efficiently use server and storage resources,
so engaging in customer interactions based
on large volumes of contextual data is no longer
restricted to businesses with large
multi-data-center deployments.
A case in point is Snapdeal, India’s largest
online marketplace, which has a network of
more than 20,000 sellers serving over 20
million members—one out of every six Internet
users in India.
The Snapdeal.com platform enables sellers
to list products for sale on the site, manage
inventory, and make pricing changes in real
time based on what is happening in the
marketplace. High volume—for example, a
pair of shoes sells every 30 seconds—means
that thousands of sellers are making dynamic
price adjustments. This results in Snapdeal’s
inventory and pricing management system
processing more than 500 writes per
second.
“In two years, we have scaled more than
200 times—the number of products
listed, the number of sellers, the amount
of business they do, the number of servers,
storage, and the technology team—
everything has grown 200 times,” said
Amitabh Misra, Snapdeal vice president of
engineering.
From a seller standpoint, Misra explained,
“An efficient marketplace requires that
sellers be able to push their updates in
real time. As more sellers sign up and
more products are listed and more price
changes are made, we knew we needed
to scale the system.”
Faster Responses from Fewer Database Servers
To support its inventory and pricing
system, Snapdeal initially deployed 10
MongoDB NoSQL database servers with
5 GB of data in DRAM as a cache in front
of MySQL. However, as the business scaled
and more sellers made price adjustments
on more products, the MongoDB response
times shot up from five milliseconds to more
than a full second. This not only compromised
the consumers’ shopping experience,
but it also led to lost revenue opportunities.
As a result, Misra said, “We decided to
evaluate a variety of SQL and NoSQL technologies,
including Aerospike’s in-memory NoSQL
database.”
Today, the Java-based Snapdeal inventory
and pricing management system uses Aerospike
to provide predictable sub-millisecond
responses while managing 100 million-plus
objects stored in 32 GB of DRAM. The data
stored includes seller and product IDs, inventory,
seller rankings, and pricing attributes.
Product and price changes are made to both
Aerospike and a MySQL database while seller
rankings and product details are read from
Aerospike.
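
The write and read paths described above can be illustrated with a minimal Java sketch. It is not Snapdeal's code: the class and method names (PriceStore, MapStore, PricingService, updatePrice, currentPrice) are hypothetical, both stores are in-memory stand-ins rather than real MySQL or Aerospike clients, and the write order (relational store first, in-memory store second) is an assumption the article does not state.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a dual-write path: each price change is written to
// both a relational store and a low-latency store; reads use only the latter.
// Both stores are in-memory stand-ins, not real MySQL or Aerospike clients.
class DualWriteSketch {

    /** Minimal key-value contract shared by both stand-in stores (hypothetical). */
    interface PriceStore {
        void put(String key, long price);
        Optional<Long> get(String key);
    }

    /** In-memory stand-in used for both stores in this sketch. */
    static final class MapStore implements PriceStore {
        private final Map<String, Long> data = new ConcurrentHashMap<>();

        @Override
        public void put(String key, long price) {
            data.put(key, price);
        }

        @Override
        public Optional<Long> get(String key) {
            return Optional.ofNullable(data.get(key));
        }
    }

    static final class PricingService {
        private final PriceStore relationalStore; // stands in for MySQL
        private final PriceStore fastStore;       // stands in for the in-memory NoSQL store

        PricingService(PriceStore relationalStore, PriceStore fastStore) {
            this.relationalStore = relationalStore;
            this.fastStore = fastStore;
        }

        private static String key(String sellerId, String productId) {
            return sellerId + ":" + productId;
        }

        /** Writes the new price to both stores (order is an assumption). */
        void updatePrice(String sellerId, String productId, long price) {
            if (price < 0) {
                throw new IllegalArgumentException("price must be non-negative");
            }
            String k = key(sellerId, productId);
            relationalStore.put(k, price);
            fastStore.put(k, price);
        }

        /** Reads are served from the fast store only. */
        Optional<Long> currentPrice(String sellerId, String productId) {
            return fastStore.get(key(sellerId, productId));
        }
    }

    public static void main(String[] args) {
        PriceStore relational = new MapStore();
        PriceStore fast = new MapStore();
        PricingService service = new PricingService(relational, fast);

        service.updatePrice("seller-42", "shoe-7", 149_900L);
        service.updatePrice("seller-42", "shoe-7", 139_900L);

        // Prints 139900: the latest price, read from the fast store.
        System.out.println(service.currentPrice("seller-42", "shoe-7").orElse(-1L));
    }
}
```

The sketch omits failure handling on purpose. A production dual-write path would also need a policy for the case where one write succeeds and the other fails, which the article does not describe.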
The implementation runs on two Linux servers
on the Amazon Elastic Compute Cloud
(EC2) and it takes advantage of Amazon
Elastic Block Store (EBS) for persistent block-level
cloud storage.
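
As a rough sanity check on the figures quoted earlier (100 million-plus objects in 32 GB of DRAM), the back-of-envelope sketch below estimates the average memory budget per object. The class and method names are hypothetical, and the result is only an upper bound on the average: the article gives neither an exact object count nor whether "GB" is decimal or binary, so both readings are shown, and index and metadata overhead are not separated out.

```java
// Hypothetical back-of-envelope estimate; the inputs are the round figures
// quoted in the article, not measured values.
class MemoryBudgetSketch {

    /** Average bytes available per object, rounded down. */
    static long bytesPerObject(long totalBytes, long objectCount) {
        if (objectCount <= 0) {
            throw new IllegalArgumentException("objectCount must be positive");
        }
        return totalBytes / objectCount;
    }

    public static void main(String[] args) {
        long objects = 100_000_000L;                 // "100 million-plus" objects
        long decimalGb = 32L * 1_000_000_000L;       // 32 GB read as 10^9 bytes each
        long binaryGib = 32L * 1024L * 1024L * 1024L; // 32 GB read as 2^30 bytes each

        System.out.println(bytesPerObject(decimalGb, objects)); // prints 320
        System.out.println(bytesPerObject(binaryGib, objects)); // prints 343
    }
}
```

Under either reading, the budget is a few hundred bytes per object, which is consistent with records that hold IDs, inventory counts, rankings, and pricing attributes rather than large documents.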
“Aerospike has really come out with flying
colors, and between 90 to 99 percent of the
time, we have been getting the same consistent
numbers,” Misra said.
“With our past database, whenever there was
a surge in concurrent price updates from
many services, we saw degradation in the
buyer experience,” Misra continued. “Now
with Aerospike, we can push through huge
price changes while maintaining the same
response time experience on the buyer’s
side—even with millions of buyers. That has
been the biggest advantage.”