
The New Model of Retail Sales Analytics

If you want a textbook example of a big data problem, look at retail sales analytics.

Retailers were among the first companies to grasp the importance of data; Teradata has touted WalMart Inc. as a reference customer for more than a decade.

Retailers also pioneered one of the most successful voluntary data collection programs in business -- the loyalty card -- which collects a staggering amount of information about what, when, and where we buy products and services. When paired with advanced analytic technologies, that information can also disclose -- or suggest -- why we buy them.

Recently, business intelligence (BI) vendors have been paying more attention to loyalty cards. This year alone, for example, a pair of BI vendors (analytic database specialist Kognitio and BI veteran Information Builders Inc., or IBI) touted partnerships with vendors that provide loyalty card services for retailers.

It's part of a trend: from old-guard vendors such as SAP AG to avant-garde players like Tableau Software Inc., BI and analytic players are doubling down on retail sales analytics. The focus makes sense: retail is a market where big data problems abound, big data analytics can pay big dividends, and the perceived misuse of analytic technology (or its scarily prescient predictions) can pose big problems.

Loyalty cards used to be the primary source of data for retailers, but this data is being supplemented and enriched by information from other sources, such as social media. One upshot of this is that retail's New Model effectively inverts its Old Model.

"The Old Model attempted to extract individual buying behaviors or to infer buying trends by looking at an aggregate of all behaviors," says industry veteran Mark Madsen, a principal with information management consultancy Third Nature Inc. The New Model, he says, analyzes data about individual shopping behaviors and uses it both to identify trends and -- more significantly -- to extrapolate (so to speak) to an aggregate.

In other words, the New Model attempts to structure the shopping experience such that it comports with the preferences of customers. This means catering to majority preferences (i.e., the products or services favored by a majority of a store's customers) and micro-catering to address more specific (and, typically, more lucrative) customer preferences.

Another aspect of the New Model is the use of retail sales analytics to customize layouts and assortments across store locations. This means that store layouts and product assortments can vary considerably from one location to the next.

Some retailers (e.g., WalMart) might have only minor deviations in layout (with greater variation in assortment); others -- e.g., high-end supermarkets or department stores -- tend to have significant variation in layout and assortment across locations.

Madsen uses the example of UK retailer Tesco, which he says uses behavioral profiling to look at where its customers live, where they shop, and what they're shopping for. He points out that Tesco identifies its customers' shopping patterns and actually optimizes its local store assortments to reflect the preferences of customers in a given location or region.

"The way [a retailer such as] Tesco does is it is what you could call the 'New Model.' It's [a question of] analyzing shopping behaviors and trying to identify patterns or trends that [a retailer] can use to custom-tailor the shopping experience for customers," he says. "They don't even do it for all of their stores. They'll focus on a specific region, or stores in certain demographic areas. The way they reset their [store] assortments will differ across regions."

This is part of a trend that Madsen calls "micro-segmentation." Tesco is a supermarket chain, but micro-segmentation isn't confined to supermarkets or to purveyors of consumer packaged goods (CPG). For example, micro-segmented "boutiques" are popping up in major department stores: at locations across the United States, Saks Fifth Avenue, Neiman Marcus, and other high-end retailers promote an in-store "boutique" experience that caters to certain customer segments or promotes the products of specific designers.

Micro-segmentation goes this trend one better: a Neiman Marcus in Austin, Texas might unveil a boutique catering to a designer popular with customers in the Southwest region or just the Lone Star State; it might likewise use information it's collected about its customers to try to optimize the shopping experience.

"Optimization" in this sense means using details harvested from loyalty card programs, past customer interactions, social media, and other sources to personalize the shopping experience. Some stores attempt to tailor in-store amenities -- such as music or refreshments -- to the preferences of customers.

Loyalty card programs are just a part of this effort. Traditionally, the information collected from loyalty card programs permitted retailers to make strong inferences about a customer's socio-economic status, about life changes, or about behaviors, preferences, or characteristics (i.e., "affinities") customers might not even know they have. This information is being combined or blended with data from other sources, such as data gleaned from sales promotions, from customer relationship management systems, from social media, or from non-traditional sources such as geographic information systems (GIS) or census data.

A Data Management Headache?

The purpose of retail sales analytics at this level, says Madsen, is to profile, model, and understand customers. In the background, this can involve massive data integration and data management issues. Insights from sales analytics must be combined with and circulated back into BI systems -- e.g., marketing BI, sales BI, or customer service BI. All of this data must be blended together, but ingesting everything into a data warehouse isn't practicable, particularly in the case of semi-structured social media data.

Using Hadoop is more practicable, but it entails other problems -- chiefly, that of getting data back out again. Hadoop projects such as Hive -- an RDBMS-like layer for the Hadoop Distributed File System (HDFS) -- and HCatalog (a metadata catalog service for HDFS data) suffer, respectively, from poor performance and immaturity. Hive is likewise handicapped by its SQL-like but non-SQL query language, HQL. Human beings can quickly learn HQL; many tools, however -- automated and manual alike -- don't yet support it.
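For illustration, here's a rough sketch of what getting data back out of Hadoop by way of Hive might look like from Python, assuming the open source PyHive client; the host, database, and table names are hypothetical. The query reads like SQL, but it executes as a batch job over files in HDFS.

```python
# A minimal sketch of querying Hive from Python, assuming the PyHive client.
# The host, database, and table names below are hypothetical.
from pyhive import hive

conn = hive.Connection(host="hive.example.local", port=10000, database="retail")
cursor = conn.cursor()

# HQL looks like SQL but isn't: this statement compiles to a batch job that
# scans files in HDFS rather than querying indexed relational tables.
cursor.execute(
    """
    SELECT store_id, product_id, COUNT(*) AS units_sold
    FROM pos_transactions
    WHERE txn_date >= '2013-01-01'
    GROUP BY store_id, product_id
    """
)

for store_id, product_id, units_sold in cursor.fetchall():
    print(store_id, product_id, units_sold)

conn.close()
```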

A more expedient approach involves what might be called "collating" data from dispersed sources. This can mean using Hadoop as a landing zone for semi-structured or non-relational data -- the purpose for which it was first designed -- and maintaining the data warehouse as a repository for structured data. This can also mean extracting information from live operational systems, operational data stores, or other repositories.
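As a rough sketch of how that division of labor might look in practice -- Hadoop as a landing zone for semi-structured data, the warehouse as the home for structured data -- consider the following snippet, which assumes pandas and SQLAlchemy; the file path, connection string, and table names are hypothetical.

```python
# A sketch of "collating" data from dispersed sources, assuming pandas and
# SQLAlchemy. Nothing here is ingested wholesale into a single repository.
import pandas as pd
from sqlalchemy import create_engine

# Semi-structured data: newline-delimited JSON staged in (or exported from)
# the Hadoop landing zone. The path is hypothetical.
clickstream = pd.read_json("/landing/web_clickstream/2013-06-01.json", lines=True)

# Structured data: sales facts queried from the relational warehouse.
# The connection string and table name are hypothetical.
warehouse = create_engine("postgresql://analyst@dw.example.local/edw")
sales = pd.read_sql(
    "SELECT customer_id, store_id, txn_date, amount FROM fact_sales",
    warehouse,
)

# Each source stays where it lives; only the result sets are brought together
# for downstream analysis.
print(clickstream.head())
print(sales.head())
```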

There are different ways of collating data. A classic approach involves using data federation technology; today, several BI and analytic tools implement a data virtualization (DV) layer, which (like federation) is an attempt to create a single, logical view of data. DV requires considerable upfront work, however; its business views (i.e., canonical representations of data) must first be codified; DV views must also be maintained over time.
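A toy sketch of what that upfront codification might look like follows; the sources, field mappings, and queries are hypothetical, and a production DV layer would do far more (query optimization, security, caching).

```python
# A toy sketch of the data virtualization idea: a canonical "business view"
# is codified once, up front, as a mapping from logical fields to per-source
# queries, then resolved at request time. All names here are hypothetical.
BUSINESS_VIEWS = {
    "customer": {
        "crm":     "SELECT cust_id AS customer_id, email FROM contacts",
        "loyalty": "SELECT member_id AS customer_id, segment FROM members",
    }
}

def resolve_view(view_name, run_query):
    """Resolve a logical view by running its per-source queries.

    run_query(source, sql) is supplied by the caller and handles connectivity;
    joining the per-source rows on the canonical key (customer_id) is the
    virtualization layer's job. Keeping these mappings current as sources
    change is the maintenance burden noted above.
    """
    return {
        source: run_query(source, sql)
        for source, sql in BUSINESS_VIEWS[view_name].items()
    }
```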

Other tools support what's called "data blending," which is a scheme for combining information from multiple sources. Proponents say data can effectively be "blended" on an as-needed basis. At a basic level, "data blending" is a kind of on-demand ELT: it involves the extraction and loading of source data into a destination system -- typically, a client tool or analytic discovery product -- where it's transformed or manipulated.
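In rough terms, a blending operation might look something like the sketch below, which assumes pandas and two hypothetical extracts; the point is that the join and the aggregation happen in the client tool's memory, on demand, after the data has been extracted and loaded.

```python
# A minimal sketch of "data blending" as on-demand ELT, assuming pandas.
# The file and column names are hypothetical.
import pandas as pd

# Extract and load: each source arrives as-is, with no upfront modeling.
loyalty = pd.read_csv("loyalty_members.csv")   # member_id, segment, home_store
pos = pd.read_csv("pos_transactions.csv")      # member_id, store_id, amount

# Transform in memory: blend the sources on the shared key, then summarize
# spend by loyalty segment.
blended = pos.merge(loyalty, on="member_id", how="left")
spend_by_segment = blended.groupby("segment")["amount"].sum()

print(spend_by_segment)
```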

Some tools claim to offer robust data blending capabilities. Tableau, for example, makes "data blending" an explicit part of its marketing; it argues that its in-memory-like model (i.e., the Tableau engine loads data sets into and runs them out of physical memory) particularly lends itself to data blending, especially for the interactive use cases that are typical of analytic discovery. Tableau was among the first vendors to introduce a connector for HCatalog, which permits it to extract data from Hadoop and HDFS; Tableau also markets optimized connectors for Oracle and SQL Server, as well as a connector for SAP HANA.

In data blending, as with data integration of any kind, connectivity is key. Most tools use ODBC and JDBC to get at relational data; others (such as Tableau) offer a range of DBMS-specific, application-specific or use-case-specific adapters.
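To make the connectivity point concrete, a generic ODBC connection from Python (assuming the pyodbc driver; the DSN, credentials, and query are hypothetical) might look like this:

```python
# A brief sketch of generic ODBC connectivity, assuming the pyodbc driver.
# The DSN, credentials, and query below are hypothetical.
import pyodbc

conn = pyodbc.connect("DSN=SalesWarehouse;UID=analyst;PWD=example")
cursor = conn.cursor()
cursor.execute("SELECT store_id, SUM(amount) FROM fact_sales GROUP BY store_id")
for store_id, total in cursor.fetchall():
    print(store_id, total)
conn.close()
```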

Industry luminary Colin White, president and founder of BI Research, describes data blending as a means to provide "fast, easy, and interactive" access to data.

"I think of data blending as the ability to quickly and interactively access multiple sources spread across multiple systems. The results are then blended or mashed together [and are] ready for analysis. In some cases, the retrieved data is always cached in memory to improve the performance of interactive processing," White writes.

"Other products support both caching and live data access to avoid the constraints imposed by trying to fit all of the result data into memory," he adds, concluding: "Care is required then when enabling data access in ... client-based products to avoid such performance issues."

About the Author


Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at [email protected].
