AMD Improves Yield Predictions with Cloudera-Powered Engineering Data Warehouse

By Karina Babcock, Senior Manager, Customer Advocacy, Cloudera


Advanced Micro Devices (AMD) is a multinational semiconductor manufacturer that designs and builds graphics cards and microprocessors that power millions of the world’s personal computers, tablets, gaming consoles, embedded devices, and cloud servers. The world’s leading PC and video game console manufacturers have AMD technology inside.

AMD relies on manufacturing test data to ensure product quality and perform engineering analysis to improve upon its world-class product designs. The company decided to migrate from a relational database management system (RDBMS) to an Apache Hadoop-powered Cloudera system that can process and store more detailed and historical data at low cost.

The result:

  • Up to 300 percent faster query performance over larger volumes of data
  • Ability to store more than 90 percent of available data elements over 1.5-plus years of history
  • An environment that requires less support and enables cost savings to the company
  • Fewer integration tools and steps

The Challenge

The company wanted to empower its engineers by giving them access to larger data sets at faster speeds. However, the incumbent environment stored less than 30 percent of available data elements, was built with several different integration tools, involved many integration steps, used a traditional RDBMS, and required a large IT team with varied skill sets to support and maintain it.

An environment outage in 2011 took weeks to recover from, so AMD initiated an engineering data warehouse (EngDW) project to find a more agile, cost-effective solution and a simpler, more robust way to store, process, and fetch larger amounts of data for AMD’s engineers.

A couple of engineers experimentally downloaded raw Apache Hadoop and saw the potential of the platform and the capabilities it could enable. After spending a few months proving the technology internally, AMD decided to move forward with Hadoop as its new EngDW platform. The team realized, however, that configuring all the components and making them work together would be a challenge.

Cloudera offered CDH, a bundled, open source distribution of Hadoop that had already been tried and tested, and was the only commercial big data vendor that would deliver on-site, customized training through Cloudera University. AMD knew that training would be essential to adoption of the platform, and that once the EngDW was ready for production, the team would need enterprise-grade support. This drove the organization to sign up for a Cloudera Enterprise subscription.

The project progressed from start to finish as follows:

  1. Downloaded Apache Hadoop onto three development nodes from Apache.org
  2. Rebuilt environment with CDH 3.x
  3. Engaged Cloudera for on-site developer training
  4. Purchased Dell servers powered by AMD Opteron processors
  5. Signed up for a Cloudera Enterprise subscription
  6. Development team finished data application and integration while infrastructure team integrated Cloudera with existing configuration management, security, and monitoring
  7. Went live with small user base (parallel rollout)
  8. Purchased more Dell servers powered by AMD Opteron processors
  9. Upgraded to CDH 4.x
  10. Rolled out application to full user set
  11. Retired and shut down previous environment

Since moving into full-speed production with the Cloudera environment, AMD has upgraded its version of CDH four times, and the project is still going strong.

The Solution

AMD deployed the Dell Cloudera Solution for Apache Hadoop. AMD runs a 34-node production cluster today (with a 5-node development cluster), which collects data throughout the manufacturing process. Hundreds of millions of new digital and parametric test readings are loaded to the cluster every day. At the heart of the EngDW project are CDH and Apache HBase (the non-relational, distributed open source data store). A custom query engine reads from HBase to put the test measurements into the hands of the company’s engineers.

AMD’s Cloudera environment consists of 28 data nodes, 2 master nodes, and 4 edge nodes, adding up to 276 terabytes. By using open source Hadoop ecosystem components on top of the core CDH platform, including Apache Hive, Apache ZooKeeper, HBase, HDFS, HttpFS, LZO compression, and MapReduce, AMD has achieved significant performance improvements over the previous traditional relational database environment.
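
As a quick sanity check on the figures above, the node counts can be tallied and the 276 terabytes put in per-node terms. This is a rough sketch only: the article does not say whether 276 terabytes is raw or usable capacity, or how it is spread across nodes, so the even split across data nodes and the replication factor of 3 (the HDFS default, not a confirmed AMD setting) are assumptions.

```python
# Node counts and total capacity as quoted in the article.
DATA_NODES = 28
MASTER_NODES = 2
EDGE_NODES = 4
TOTAL_TB = 276

# Assumptions (not stated in the article):
# - the 276 TB is raw disk spread evenly across the data nodes only
# - HDFS keeps 3 copies of each block (the HDFS default replication factor)
ASSUMED_REPLICATION = 3

total_nodes = DATA_NODES + MASTER_NODES + EDGE_NODES
tb_per_data_node = TOTAL_TB / DATA_NODES
unique_tb_if_raw = TOTAL_TB / ASSUMED_REPLICATION

print(f"Total production nodes: {total_nodes}")
print(f"TB per data node (assumed even split): {tb_per_data_node:.1f}")
print(f"Unique data capacity if 276 TB is raw: {unique_tb_if_raw:.0f} TB")
```

The node tally matches the 34-node production cluster described earlier. The per-node and replicated figures hold only under the stated assumptions.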

Impact: Better, Faster Analytics Helps Improve Business Visibility

AMD’s decision to move from an RDBMS to a Hadoop platform that uses Cloudera on Dell servers powered by AMD Opteron processors has resulted in orders-of-magnitude performance improvements in both data loads and analytics. Queries run up to 300 percent faster, on larger data sets than before. Ninety-nine percent of all queries execute in 15 minutes or less, with a median execution time of just 23 seconds. (See graph, below.)
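
Percentage speedups are easy to misread, since a running time cannot shrink by more than 100 percent. The short sketch below converts a "percent faster" figure into a speed ratio and the equivalent cut in query time. It assumes that "300 percent faster" means four times the original speed; the article does not define the term, and the function names are illustrative only.

```python
def speedup_from_percent_faster(percent_faster: float) -> float:
    """Speed ratio (new / old), reading 'N percent faster' as 1 + N/100."""
    return 1.0 + percent_faster / 100.0


def time_reduction_percent(speedup: float) -> float:
    """Percent by which running time shrinks for a given speed ratio."""
    return (1.0 - 1.0 / speedup) * 100.0


speedup = speedup_from_percent_faster(300)
reduction = time_reduction_percent(speedup)
print(f"Speed ratio: {speedup:.1f}x")
print(f"Query time reduced by: {reduction:.0f} percent")
```

Under this reading, a query that once took 60 seconds would take 15 seconds, which is a 75 percent reduction in running time.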

Queries on hundreds of thousands of units execute two orders of magnitude faster than before.

Data now reloads at a rate of three months per day, whereas it used to take a full day to reload 1.5 days’ worth of data. That is 60 times faster.
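
The 60-times figure follows from comparing the two reload rates. A minimal sketch of the arithmetic, assuming a 30-day month (the article says only "three months") and using illustrative names:

```python
DAYS_PER_MONTH = 30  # assumption: the article does not specify month length


def reload_rate(days_of_data: float, elapsed_days: float) -> float:
    """Days of data reloaded per elapsed day."""
    return days_of_data / elapsed_days


old_rate = reload_rate(1.5, 1.0)                 # previous environment
new_rate = reload_rate(3 * DAYS_PER_MONTH, 1.0)  # Hadoop-based EngDW
reload_speedup = new_rate / old_rate

# Elapsed days needed to reload 1.5 years of history at each rate.
history_days = 1.5 * 365
old_elapsed = history_days / old_rate
new_elapsed = history_days / new_rate

print(f"Reload speedup: {reload_speedup:.0f}x")
print(f"Reloading 1.5 years of history: {old_elapsed:.0f} days before, "
      f"{new_elapsed:.1f} days now")
```

At these rates, reloading the full 1.5 years of history would drop from roughly a year of elapsed time to about six days, again under the 30-day-month assumption.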

“Hadoop technologies have allowed us to develop a solution that empowers our engineers to access larger product data sets in drastically reduced times.”
—Jonathan Jarvis, Product Development Engineer, AMD

Not only has AMD’s EngDW project brought significant performance benefits, but it also delivers greater functionality and value. Query results on EngDW no longer have a row limit, compared to the previous limit of just 100,000 rows (which had been set to ensure queries would return results in a given period of time). The EngDW project’s Hadoop-based cluster allows AMD to store more than 90 percent of available data elements spanning 1.5-plus years of history, whereas the previous system stored less than 30 percent of available data elements and only three to four months of history.

Now that AMD engineers can access more test data, in greater detail and at faster speeds, they can apply data insights, debug, and make continuous improvements to ensure the company delivers world-class products that enable today and inspire tomorrow.

Impact: Greater Simplicity, Lower Cost

AMD has not only realized performance benefits through its Hadoop-based platform, but has also significantly reduced the total cost of ownership (TCO) through:

  • Lower vendor support costs for relational database management software
  • Less vendor support for data integration tools and software
  • Fewer steps and tools needed for data integration
  • Less vendor support for high-end storage arrays (external SAN storage)
  • Smaller IT support staff needed for end-to-end management

All in all, by migrating its engineering data warehouse to a big data platform powered by Cloudera, AMD’s product testing workflow benefits from analytics that are faster, more comprehensive, and more flexible, in an environment that delivers significant cost savings and greater operational efficiency.
