AMD Improves Yield Predictions with Cloudera-Powered Engineering Data Warehouse
By Karina Babcock, Senior Manager, Customer Advocacy, Cloudera
Advanced Micro Devices (AMD) is a multinational
semiconductor manufacturer that
designs and builds graphics cards and
microprocessors that power millions of the
world’s personal computers, tablets, gaming
consoles, embedded devices, and cloud
servers. The world’s leading PC and video
game console manufacturers have AMD
technology inside.
AMD relies on manufacturing test data
to ensure product quality and perform
engineering analysis to improve upon its
world-class product designs. The company
decided to migrate from a relational database
management system (RDBMS) to an
Apache Hadoop-powered Cloudera system
that can process and store more detailed
and historical data at low cost.
The result:
- Up to 300 percent faster query
performance over larger volumes of data
- Ability to store more than 90 percent of
available data elements over 1.5-plus
years of history
- An environment that requires less
support and enables cost savings to the
company
- Fewer integration tools and steps
The Challenge
The company wanted to empower its
engineers by giving them access to larger
data sets at faster speeds. However, the
incumbent environment stored less than 30
percent of available data elements, was built
with several different integration tools, had
many integration steps, used a traditional
RDBMS, and required a large IT team
with varied skill sets to support and
maintain it.
An environment outage in 2011 took weeks
to recover from, so AMD initiated an engineering
data warehouse (EngDW) project to
find a more agile, cost-effective solution and
a simpler, more robust way to store, process,
and fetch larger amounts of data for AMD’s
engineers.
A couple of engineers experimentally downloaded
raw Apache Hadoop and saw the
potential of the platform and the capabilities
it could enable. After spending a few months
proving the technology internally, AMD
decided to move forward with Hadoop as
its new EngDW platform. The team realized,
however, that configuring all the components
and making them work together would be a
challenge.
Cloudera offered a bundled, open source
configuration of Hadoop that had already
been tried and tested (CDH), and was the
only commercial big data vendor that would
deliver on-site, customized training through
Cloudera University. AMD knew that training
would be essential to adoption of the
platform, and that once the EngDW was
ready for production, the team would need
enterprise-grade support. This drove the
organization to sign up for a Cloudera Enterprise
subscription.
The project progressed from start to finish
as follows:
- Downloaded Apache Hadoop onto three
development nodes from Apache.org
- Rebuilt environment with CDH 3.x
- Engaged Cloudera for on-site developer
training
- Purchased Dell servers powered by
AMD Opteron processors
- Signed up for a Cloudera Enterprise
subscription
- Development team finished data
application and integration while infrastructure
team integrated Cloudera with
existing configuration management,
security, and monitoring
- Went live with a small user base (parallel
rollout)
- Purchased more Dell servers powered
by AMD Opteron processors
- Upgraded to CDH 4.x
- Rolled out application to full user set
- Retired and shut down previous environment
Since moving into full-speed production
with the Cloudera environment, AMD has
upgraded its version of CDH four times, and
the project is still going strong.
The Solution
AMD deployed the Dell Cloudera Solution
for Apache Hadoop. AMD runs a 34-node
production cluster today (with a 5-node
development cluster), which collects data
throughout the manufacturing process.
Hundreds of millions of new digital and parametric
test readings are loaded to the cluster
every day. At the heart of the EngDW project
are CDH and Apache HBase (the non-relational,
distributed open source data store). A
custom query engine reads from HBase to
put the test measurements into the hands of
the company’s engineers.
AMD’s Cloudera environment consists of
28 data nodes, 2 master nodes, and 4
edge nodes, adding up to 276 terabytes.
By running open source Hadoop projects
including Apache Hive, Apache ZooKeeper,
HBase, HDFS, httpfs, LZO compression,
and MapReduce on top of the core CDH
platform, AMD has achieved significant performance
improvements over the previous
traditional relational database environment.
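As a quick sanity check on the cluster figures above, the short Python sketch below tallies the node counts and estimates capacity per data node. The `ClusterSpec` class is hypothetical and exists only for illustration. The per-node figure assumes that all 276 terabytes sit on the 28 data nodes, which the case study does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical container for the production cluster figures quoted above."""
    data_nodes: int
    master_nodes: int
    edge_nodes: int
    total_tb: float

    @property
    def total_nodes(self) -> int:
        # Sum of all node roles in the cluster.
        return self.data_nodes + self.master_nodes + self.edge_nodes

    def tb_per_data_node(self) -> float:
        # Assumption: all capacity is spread evenly across data nodes only.
        return self.total_tb / self.data_nodes


prod = ClusterSpec(data_nodes=28, master_nodes=2, edge_nodes=4, total_tb=276)
print(prod.total_nodes)                    # 34, matching the 34-node cluster
print(f"{prod.tb_per_data_node():.1f}")    # 9.9 (terabytes, under the assumption above)
```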
Impact: Better, Faster Analytics Helps
Improve Business Visibility
AMD’s decision to move from an RDBMS to
a Hadoop platform that uses Cloudera on
Dell servers powered by AMD Opteron processors
has resulted in orders of magnitude
performance improvement, in terms of both
data loads and analytics. Queries run up to
300 percent faster, on larger data sets
than before. Ninety-nine
percent of all queries execute in 15 minutes
or less, with a median execution time of just
23 seconds. (See graph, below.)
Queries on hundreds of thousands of units
execute two orders of magnitude faster than
before.
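To relate these speed claims to elapsed time, the Python sketch below converts a speed-up into the percentage of query time saved. It assumes that "300 percent faster" means four times the original speed and that "two orders of magnitude" means a factor of 100. Both readings are interpretations, and the function names are hypothetical.

```python
def factor_from_percent_faster(percent_faster: float) -> float:
    """Convert 'N percent faster' into a speed-up factor (assumed reading: 1 + N/100)."""
    return 1 + percent_faster / 100


def time_reduction_pct(speedup_factor: float) -> float:
    """Percent of elapsed time saved when work runs speedup_factor times faster."""
    if speedup_factor <= 0:
        raise ValueError("speedup_factor must be positive")
    return (1 - 1 / speedup_factor) * 100


four_x = factor_from_percent_faster(300)        # 4.0
print(f"{time_reduction_pct(four_x):.0f}%")     # 75% less time per query
print(f"{time_reduction_pct(100):.0f}%")        # 99% less time at a 100x speed-up
```

Under these assumptions, elapsed time can never fall by more than 100 percent, so a speed-up factor is the clearer way to express large gains.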
Data now reloads at a rate of three months’
worth per day, whereas it used to take a full
day to reload 1.5 days’ worth of data. That is
60 times faster.
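The 60-times figure follows from simple arithmetic, sketched below in Python. The 30-day month is an assumption made here to reproduce the number, and the function name is hypothetical.

```python
DAYS_PER_MONTH = 30  # assumption: the case study does not define a month


def reload_speedup(new_days_per_day: float, old_days_per_day: float) -> float:
    """Ratio of new to old reload throughput, both in days of data per day."""
    return new_days_per_day / old_days_per_day


new_rate = 3 * DAYS_PER_MONTH   # three months of data reloaded per day
old_rate = 1.5                  # 1.5 days of data reloaded per day
print(reload_speedup(new_rate, old_rate))  # 60.0
```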
“Hadoop technologies have allowed us
to develop a solution that empowers our
engineers to access larger product data
sets in drastically reduced times.”
—Jonathan Jarvis, Product Development Engineer, AMD
Not only has AMD’s EngDW project brought
significant performance benefits, but it
also delivers greater functionality and value.
Query results on EngDW now have no
row limit, compared to the previous limit
of just 100,000 rows (which had been set
to ensure queries would return results in a
given period of time). The EngDW project’s
Hadoop-based cluster allows AMD to store
more than 90 percent of available data
elements spanning 1.5-plus years’ history,
whereas the previous system stored less
than 30 percent of data available for only
three to four months’ history.
AMD engineers can now access greater
amounts of test data in higher detail and
at faster speeds. They apply these insights
to debugging and continuous improvement,
helping ensure that the company delivers
world-class products that
enable today and inspire tomorrow.
Impact: Greater Simplicity, Lower Cost
AMD has not only realized performance
benefits through its Hadoop-based platform,
but has also significantly reduced the total
cost of ownership (TCO) through:
- Lower vendor support costs for relational
database management software
- Less vendor support for data integration
tools and software
- Fewer steps and tools needed for data
integration
- Less vendor support for high-end storage
arrays (external SAN storage)
- Smaller IT support staff needed for end-to-end management
All in all, by migrating its engineering data
warehouse to a big data platform powered
by Cloudera, AMD’s product testing workflow
benefits from analytics that are faster, more
comprehensive, and more flexible, in an
environment that delivers significant cost
savings and greater operational efficiency.