
The New Model of Retail Sales Analytics

The New Model of retail sales analytics aims to optimize and personalize the retail shopping experience for customers by integrating non-traditional data sources.

If you want a textbook example of a big data problem, look at retail sales analytics.

Retailers were among the first companies to grasp the importance of data; Teradata has touted WalMart Inc. as a reference customer for more than a decade.

Retailers also pioneered one of the most successful voluntary data collection programs in business -- the loyalty card -- which collects a staggering amount of information about what, when, and where we buy products and services. When paired with advanced analytic technologies, that information can also disclose -- or suggest -- why we buy them.

Recently, business intelligence (BI) vendors have been paying more attention to loyalty cards. This year alone, for example, a pair of BI vendors (analytic database specialist Kognitio and BI veteran Information Builders Inc., or IBI) touted partnerships with vendors that provide loyalty card services for retailers.

It's part of a trend: from old-guard vendors such as SAP AG to avant-garde players such as Tableau Software Inc., BI and analytic players are doubling down on retail sales analytics. The focus makes sense: in retail, big data problems abound, big data analytics can pay big dividends, and the perceived misuse of analytic technology (or its scarily prescient predictions) can pose big problems.

Loyalty cards were once retailers' primary source of customer data, but that data is now being supplemented and enriched with information from other sources, such as social media. One upshot is that retail's New Model effectively inverts its Old Model.

"The Old Model attempted to extract individual buying behaviors or to infer buying trends by looking at an aggregate of all behaviors," says industry veteran Mark Madsen, a principal with information management consultancy Third Nature Inc. The New Model, he says, analyzes data about individual shopping behaviors and uses it both to identify trends and -- more significantly -- to extrapolate (so to speak) to an aggregate.

In other words, the New Model attempts to structure the shopping experience such that it comports with the preferences of customers. This means catering to majority preferences (i.e., the products or services favored by a majority of a store's customers) and micro-catering to address more specific (and, typically, more lucrative) customer preferences.
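To make the contrast concrete, here is a minimal sketch in Python (using pandas); the transactions, column names, and segment labels are hypothetical and are not drawn from any retailer discussed here. The Old Model ranks products across all shoppers in aggregate; the New Model starts from individual or segment-level behavior and rolls it up.

```python
import pandas as pd

# Hypothetical point-of-sale transactions: one row per item purchased.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3, 3, 4],
    "segment":     ["young_urban", "young_urban", "suburban_family",
                    "suburban_family", "young_urban", "young_urban",
                    "young_urban", "suburban_family"],
    "product":     ["kombucha", "granola", "diapers", "cereal",
                    "kombucha", "cold_brew", "granola", "cereal"],
})

# Old Model: infer trends from the aggregate of all behaviors.
print(transactions["product"].value_counts().head(3))

# New Model: start from individual (or micro-segment) behaviors,
# then roll them up; assortment decisions can differ by segment.
per_segment = (transactions
               .groupby(["segment", "product"])
               .size()
               .rename("purchases")
               .sort_values(ascending=False)
               .groupby(level="segment")
               .head(2))
print(per_segment)
```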

Another aspect of the New Model is the use of retail sales analytics to customize layouts and assortments across store locations. This means that store layouts and product assortments can vary considerably from one location to the next.

Some retailers (e.g., WalMart) might have only minor deviations in layout (with greater variation in assortment); others -- e.g., high-end supermarkets or department stores -- tend to have significant variation in layout and assortment across locations.

Madsen uses the example of UK retailer Tesco, which he says uses behavioral profiling to look at where its customers live, where they shop, and what they're shopping for. He points out that Tesco identifies its customers' shopping patterns and actually optimizes its local store assortments to reflect the preferences of customers in a given location or region.

"The way [a retailer such as] Tesco does is it is what you could call the 'New Model.' It's [a question of] analyzing shopping behaviors and trying to identify patterns or trends that [a retailer] can use to custom-tailor the shopping experience for customers," he says. "They don't even do it for all of their stores. They'll focus on a specific region, or stores in certain demographic areas. The way they reset their [store] assortments will differ across regions."

This is part of a trend that Madsen calls "micro-segmentation." Tesco is a supermarket chain, but micro-segmentation isn't confined to supermarkets or to purveyors of consumer packaged goods (CPG). For example, micro-segmented "boutiques" are popping up in major department stores: at locations across the United States, Saks Fifth Avenue, Neiman Marcus, and other high-end retailers promote an in-store "boutique" experience that caters to certain customer segments or promotes the products of specific designers.

Micro-segmentation goes this trend one better: a Neiman Marcus in Austin, Texas might unveil a boutique catering to a designer popular with customers in the Southwest region or just the Lone Star State; it might likewise use information it's collected about its customers to try to optimize the shopping experience.
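In practice, micro-segmentation is often implemented with ordinary clustering over per-customer features. The sketch below uses scikit-learn's KMeans on invented spend features; the feature names, data, and cluster count are illustrative assumptions, not a description of how Tesco, Neiman Marcus, or any vendor actually does it.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-customer features derived from loyalty card history.
rng = np.random.default_rng(42)
customers = pd.DataFrame({
    "annual_spend":     rng.gamma(2.0, 500.0, size=200),
    "visits_per_month": rng.poisson(4, size=200),
    "designer_share":   rng.beta(2, 5, size=200),  # share of spend on designer labels
})

# Standardize features so no single scale dominates the distance metric.
X = StandardScaler().fit_transform(customers)

# Partition customers into a handful of micro-segments.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
customers["segment"] = kmeans.labels_

# Profile each segment; a segment with a high designer_share might
# justify a designer boutique in the stores it frequents.
print(customers.groupby("segment").mean().round(2))
```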

"Optimization" in this sense means using details harvested from loyalty card programs, past customer interactions, social media, and other sources to personalize the shopping experience. Some stores attempt to tailor in-store amenities -- such as music or refreshments -- to the preferences of customers.

Loyalty card programs are just one part of this effort. The information they collect has long permitted retailers to make strong inferences about a customer's socio-economic status, about life changes, or about behaviors, preferences, and characteristics (i.e., "affinities") customers might not even know they have. That information is now being combined or blended with data from other sources: sales promotions, customer relationship management (CRM) systems, social media, and non-traditional sources such as geographic information systems (GIS) or census data.
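As a concrete, if simplified, illustration of that kind of enrichment, the sketch below joins hypothetical loyalty card records against hypothetical census-style demographics on a postal code. All of the data and column names are invented for illustration.

```python
import pandas as pd

# Hypothetical loyalty card records keyed by customer and home ZIP code.
loyalty = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "zip_code":    ["73301", "78701", "10001"],
    "ytd_spend":   [1250.00, 430.50, 2890.75],
})

# Hypothetical census-style demographics at the ZIP code level.
census = pd.DataFrame({
    "zip_code":      ["73301", "78701", "10001"],
    "median_income": [68000, 81000, 94000],
    "median_age":    [31.5, 29.8, 37.2],
})

# Enrich individual loyalty records with neighborhood-level context.
enriched = loyalty.merge(census, on="zip_code", how="left")
print(enriched)
```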

A Data Management Headache?

The purpose of retail sales analytics at this level, says Madsen, is to profile, model, and understand customers. In the background, this can involve massive data integration and data management issues. Insights from sales analytics must be combined with and circulated back into BI systems -- e.g., marketing BI, sales BI, or customer service BI. All of this data must be blended together, but ingesting everything into a data warehouse isn't practicable, particularly in the case of semi-structured social media data.

Using Hadoop is more practicable, but it entails other problems -- chiefly, that of getting the data back out again. Hadoop projects such as Hive -- an RDBMS-like layer for the Hadoop Distributed File System (HDFS) -- and HCatalog (a metadata catalog service for HDFS data) suffer, respectively, from poor performance and immaturity. Hive is likewise handicapped by its SQL-like but non-SQL query language, HQL. Human beings can quickly learn HQL, but many tools -- automated and manual alike -- don't yet support it.
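For readers who haven't seen HQL, the sketch below issues a simple HiveQL query from Python via the community PyHive driver. The host, database, table, and columns are hypothetical, and a reachable HiveServer2 instance is assumed.

```python
# Requires the third-party PyHive package and a running HiveServer2 instance.
from pyhive import hive

# Connection details are hypothetical placeholders.
conn = hive.Connection(host="hadoop-gw.example.com", port=10000,
                       username="analyst", database="retail")
cursor = conn.cursor()

# HiveQL reads like SQL, which is why analysts pick it up quickly,
# even though many BI tools don't yet generate it natively.
cursor.execute("""
    SELECT product_id, COUNT(*) AS mentions
    FROM   social_clickstream
    WHERE  event_date >= '2013-01-01'
    GROUP  BY product_id
    ORDER  BY mentions DESC
    LIMIT  10
""")
for product_id, mentions in cursor.fetchall():
    print(product_id, mentions)

cursor.close()
conn.close()
```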

A more expedient approach involves what might be called "collating" data from dispersed sources. This can mean using Hadoop as a landing zone for semi-structured or non-relational data -- the purpose for which it was first designed -- and maintaining the data warehouse as a repository for structured data. This can also mean extracting information from live operational systems, operational data stores, or other repositories.

There are different ways of collating data. A classic approach involves using data federation technology; today, several BI and analytic tools implement a data virtualization (DV) layer, which (like federation) is an attempt to create a single, logical view of data. DV requires considerable upfront work, however: its business views (i.e., canonical representations of data) must first be codified, and those views must be maintained over time.
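To show where that upfront work comes from, here is a minimal sketch, in Python with pandas, of what a codified business view might look like: the canonical schema is declared first, and every source must be mapped onto it (and remapped whenever a source schema changes). The sources, fields, and mappings are hypothetical.

```python
import pandas as pd

# Canonical "business view" of a customer, codified up front.
CANONICAL_COLUMNS = ["customer_id", "name", "zip_code", "lifetime_value"]

# Per-source field mappings that must be defined up front and maintained
# whenever a source schema changes.
SOURCE_MAPPINGS = {
    "crm":     {"cust_no": "customer_id", "full_name": "name",
                "postal": "zip_code", "ltv": "lifetime_value"},
    "loyalty": {"member_id": "customer_id", "member_name": "name",
                "home_zip": "zip_code", "total_spend": "lifetime_value"},
}

def customer_view(source_frames):
    """Present one logical view over physically dispersed sources.

    source_frames maps a source name (e.g., "crm") to its DataFrame.
    """
    mapped = [frame.rename(columns=SOURCE_MAPPINGS[name])[CANONICAL_COLUMNS]
              for name, frame in source_frames.items()]
    return pd.concat(mapped, ignore_index=True)

# Hypothetical source extracts.
crm = pd.DataFrame({"cust_no": [1], "full_name": ["A. Shopper"],
                    "postal": ["78701"], "ltv": [512.0]})
loyalty = pd.DataFrame({"member_id": [2], "member_name": ["B. Browser"],
                        "home_zip": ["10001"], "total_spend": [87.5]})

print(customer_view({"crm": crm, "loyalty": loyalty}))
```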

Other tools support what's called "data blending," which is a scheme for combining information from multiple sources. Proponents say data can effectively be "blended" on an as-needed basis. At a basic level, "data blending" is a kind of on-demand ELT: it involves the extraction and loading of source data into a destination system -- typically, a client tool or analytic discovery product -- where it's transformed or manipulated.
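Here is a minimal sketch of that pattern, assuming nothing more than the Python standard library and pandas: data is extracted from two sources, loaded into the client's memory, and only then transformed (joined and aggregated). The tables, columns, and values are hypothetical.

```python
import sqlite3
import pandas as pd

# Extract and load: pull raw data from two sources into client memory.
# Source 1: a relational store (an in-memory SQLite database stands in here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (customer_id INTEGER, product TEXT, amount REAL);
    INSERT INTO sales VALUES (1, 'espresso maker', 149.99),
                             (2, 'stand mixer',    349.00),
                             (1, 'grinder',         89.50);
""")
sales = pd.read_sql_query("SELECT * FROM sales", conn)

# Source 2: a flat extract (e.g., exported from a CRM or social listening tool).
crm = pd.DataFrame({
    "customer_id": [1, 2],
    "region":      ["Southwest", "Northeast"],
})

# Transform: the blend happens on demand, inside the client tool.
blended = (sales.merge(crm, on="customer_id", how="left")
                .groupby("region")["amount"].sum())
print(blended)
```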

Some tools claim to offer robust data blending capabilities. Tableau, for example, makes "data blending" an explicit part of its marketing; it argues that its in-memory-like model (i.e., the Tableau engine loads data sets into and runs them out of physical memory) particularly lends itself to data blending, especially for the interactive use cases that are typical of analytic discovery. Tableau was among the first vendors to introduce a connector for HCatalog, which permits it to extract data from Hadoop and HDFS; Tableau also markets optimized connectors for Oracle and SQL Server, as well as a connector for SAP HANA.

In data blending, as with data integration of any kind, connectivity is key. Most tools use ODBC and JDBC to get at relational data; others (such as Tableau) offer a range of DBMS-specific, application-specific, or use-case-specific adapters.
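In Python, reaching a relational source over ODBC usually means a bridge such as the third-party pyodbc package; the sketch below is a bare-bones example in which the DSN, credentials, and table are hypothetical.

```python
# Requires the third-party pyodbc package and a configured ODBC data source.
import pyodbc

# "RetailDW" is a hypothetical ODBC DSN pointing at the warehouse.
conn = pyodbc.connect("DSN=RetailDW;UID=analyst;PWD=secret")
cursor = conn.cursor()

# Push filtering and aggregation down to the source rather than
# pulling whole tables across the wire.
cursor.execute(
    "SELECT store_id, SUM(amount) AS revenue "
    "FROM daily_sales GROUP BY store_id"
)
for store_id, revenue in cursor.fetchall():
    print(store_id, revenue)

cursor.close()
conn.close()
```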

Industry luminary Colin White, president and founder of BI Research, describes data blending as a means to provide "fast, easy, and interactive" access to data.

"I think of data blending as the ability to quickly and interactively access multiple sources spread across multiple systems. The results are then blended or mashed together [and are] ready for analysis. In some cases, the retrieved data is always cached in memory to improve the performance of interactive processing," White writes.

"Other products support both caching and live data access to avoid the constraints imposed by trying to fit all of the result data into memory," he adds, concluding: "Care is required then when enabling data access in ... client-based products to avoid such performance issues."
