CASE STUDY - ACNielsen Selects a High-Performance Data Transformation Solution for Europe’s Largest Retail Sales Data Factory
Commentary by Craig Abramson, Technical Analyst, Syncsort Incorporated
ACNielsen, a VNU business, is building what is to become the largest retail sales data factory in Europe. The new system, called New Factory, is designed to analyze sales data from many retail channels and countries throughout the region. The analysis is intended to provide insight into questions such as:
- How much impact did a specific promotion have on product sales?
- How well did brands perform in comparison with other brands?
- How successful was the launch of a new product?
It is through innovations like New Factory that ACNielsen has become the world’s leading marketing information provider. Offering services in more than 100 countries, the unit provides measurement and analysis of marketplace dynamics and consumer attitudes and behavior. Clients rely on ACNielsen’s market research, proprietary products, analytical tools, and professional service to understand competitive performance, to uncover new opportunities, and to raise the profitability of their marketing and sales campaigns.
When we started developing our data factory application, called New Factory, we knew that performance was going to be an issue.
—Michael Benillouche, Technical Director, ACNielsen
The need to complete a large number of complex aggregations represented a potential performance bottleneck for New Factory. Moreover, because the application must compute non-additive distribution facts, simple roll-up or cube functions would not suffice. These are among the reasons ACNielsen began a thorough search for a powerful, high-performance data transformation solution that could complete the aggregations. DMExpress from Syncsort Incorporated proved to be that solution.
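To see why distribution facts defeat a simple roll-up, consider a common retail measure: the number of distinct stores selling an item. The sketch below (data and names are invented for illustration) shows that summing per-product store counts up to the brand level over-counts stores that stock several of the brand's products, so the brand-level figure must be recomputed from the detail rows.

```python
# Hypothetical example: "distribution" = count of distinct stores selling
# an item. This measure is non-additive across the product hierarchy.

sales = [
    # (store, brand, product)
    ("S1", "BrandA", "P1"),
    ("S1", "BrandA", "P2"),
    ("S2", "BrandA", "P1"),
    ("S2", "BrandB", "P3"),
]

# Correct brand-level distribution: distinct stores per brand,
# computed directly from the detail rows.
brand_stores = {}
for store, brand, _product in sales:
    brand_stores.setdefault(brand, set()).add(store)
correct = {b: len(s) for b, s in brand_stores.items()}

# Naive roll-up: sum the per-product distribution figures.
product_stores = {}
for store, brand, product in sales:
    product_stores.setdefault((brand, product), set()).add(store)
naive = {}
for (brand, _product), stores in product_stores.items():
    naive[brand] = naive.get(brand, 0) + len(stores)

print(correct)  # {'BrandA': 2, 'BrandB': 1}
print(naive)    # {'BrandA': 3, 'BrandB': 1} -- store S1 counted twice
```

BrandA is sold in only two stores, but the roll-up reports three because store S1 carries two BrandA products. Aggregations like this must go back to the base facts at every hierarchy level, which is what makes the workload so demanding.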
DMExpress was installed by ACNielsen in a proof of concept to aggregate an initial 2.7 billion facts over four different dimensions, varying in hierarchy depth from two to nine levels. According to Technical Director Michael Benillouche, “When we started developing our data factory application, called New Factory, we knew that performance was going to be an issue. We searched for a solution that could handle the high volume of data we were processing in the shortest amount of time. After considering ETL software from major vendors, we selected DMExpress. DMExpress easily integrated into New Factory’s distributed computing framework and provided us with the outstanding results we needed.”
ACNielsen tested DMExpress in New Factory, running it on a large-scale UNIX server with 16 CPUs, 32 GB of memory, and terabytes of disk arrays. The server is capable of delivering data at a sustained rate of 600 MB/sec. Once in production, data will be processed continuously in this carefully designed factory. It is estimated that 12 billion sales facts will be aggregated along four different dimensions each week to populate the thousands of data scopes accessed through the New Factory Web applications.
ACNielsen discovered that as data volumes grow, so do the performance advantages of DMExpress. With the ability to process in parallel, DMExpress speeds through data-intensive applications. Application development is also much faster with the advanced, easy-to-use graphical user interface (GUI): instead of hand-coding the data processing, developers can spend their time building the application itself. Discussing ACNielsen's use of DMExpress for aggregation, Andrew Coleman, Syncsort's director of software engineering, said:
More and more, we see the aggregation step being the critical performance issue in our customers’ data warehouse applications. The hardware capacity is typically available, provided that the software can fully exploit it. Our combination of proprietary aggregation algorithms and relentless pursuit of parallelization across multiple processors and multiple servers allows DMExpress to achieve the maximum from the hardware.
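The general pattern behind parallelized aggregation can be sketched as follows. This is not Syncsort's proprietary algorithm, just a minimal illustration of the partition-then-merge approach: each worker builds a local hash table of partial sums over its slice of the facts, and the partial tables are merged at the end. Threads stand in here for the multiple processors and servers mentioned above; all names and data are invented.

```python
# Minimal partition-then-merge parallel aggregation sketch (illustrative
# only; not DMExpress's actual implementation).
from concurrent.futures import ThreadPoolExecutor

def local_aggregate(facts):
    """Hash-aggregate one partition: sum amounts per (product, region) key."""
    partials = {}
    for product, region, amount in facts:
        key = (product, region)
        partials[key] = partials.get(key, 0) + amount
    return partials

def parallel_aggregate(facts, workers=4):
    # Split the fact stream into roughly equal partitions, one per worker.
    chunks = [facts[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_tables = list(pool.map(local_aggregate, chunks))
    # Merge the per-worker partial sums into the final aggregate.
    totals = {}
    for table in partial_tables:
        for key, amount in table.items():
            totals[key] = totals.get(key, 0) + amount
    return totals

facts = [("P1", "EU", 10), ("P1", "EU", 5), ("P2", "EU", 7), ("P1", "US", 3)]
print(parallel_aggregate(facts, workers=2))
# {('P1', 'EU'): 15, ('P2', 'EU'): 7, ('P1', 'US'): 3}
```

Because the local tables hold partial sums rather than raw rows, the merge step touches only one entry per distinct key, which is what lets this pattern scale across processors (and, with a shuffle step, across servers).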
DMExpress™ is a high-performance data transformation product for UNIX, Linux, and Windows environments. It extracts data at very high speed from any source database or flat file, applies any record-level and/or field-level transformation, and then loads the data into any target database or flat file.