CASE STUDY - Clickstream Analytics in a Competitive World

By Pete Benesh, Product Marketing Manager, Syncsort

Clickstream analytics is a powerful tool for businesses with an online presence. For online retailers, however, it’s more than a tool—it’s a requirement. In this $60 billion market, one giant has combined extreme data integration with clickstream analytics, further securing its position at the top.

The Need

This leading online retailer (LOR) makes more than $1 billion in revenue annually through its online channels. To meet its revenue goals, the company must obtain new business partners interested in selling inventory through its site. LOR must also quickly transform clickstream data into analytics that support strategies for planning, budgeting, determining product selections and offerings, and developing campaigns aimed at increasing Web site traffic. Key to this is maintaining a rich data inventory of customer-centric information, transactions, and Web site activity.

Using Omniture’s SiteCatalyst to perform Web analytics, LOR can effectively report on Web site visits, page views, and conversions. But as with all SaaS Web analytics products, this page-centric reporting cannot segment Web site activity based on individual user sessions. This type of segmentation would allow LOR to connect multiple sessions over time to individual users, and link all of a user’s online activities with activities from various operational systems, including the customer support and provisioning systems.
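As a rough illustration of the session-level segmentation described above, the following Python sketch groups individual page hits into per-visitor sessions. It is not LOR's or Omniture's implementation. The field names (`visitor_id`, `timestamp`, `page`), the sample data, and the 30-minute inactivity timeout are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a session ends after 30 minutes of inactivity. This is a
# common convention, not a value taken from the case study.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(hits):
    """Group (visitor_id, timestamp, page) hits into per-visitor sessions.

    Returns a dict mapping each visitor_id to a list of sessions, where
    each session is a list of pages in time order.
    """
    by_visitor = defaultdict(list)
    for visitor_id, timestamp, page in hits:
        by_visitor[visitor_id].append((timestamp, page))

    sessions = {}
    for visitor_id, visitor_hits in by_visitor.items():
        visitor_hits.sort(key=lambda hit: hit[0])
        visitor_sessions = []
        last_seen = None
        for timestamp, page in visitor_hits:
            # Start a new session on the first hit or after a long gap.
            if last_seen is None or timestamp - last_seen > SESSION_TIMEOUT:
                visitor_sessions.append([])
            visitor_sessions[-1].append(page)
            last_seen = timestamp
        sessions[visitor_id] = visitor_sessions
    return sessions


# Hypothetical sample data, for illustration only.
sample_hits = [
    ("u1", datetime(2010, 1, 1, 9, 0), "/home"),
    ("u2", datetime(2010, 1, 1, 9, 5), "/home"),
    ("u1", datetime(2010, 1, 1, 9, 10), "/product"),
    ("u1", datetime(2010, 1, 1, 14, 0), "/cart"),
]
result = sessionize(sample_hits)
```

Once hits are keyed by visitor and session in this way, they can be joined to records from other operational systems that share the same visitor identifier.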

Responding to this need, the company developed its own clickstream data warehouse to augment the reporting available with Omniture’s SiteCatalyst. Each day, Omniture delivers the previous day’s complete page-tag data to the customer. These daily Omniture files range between 18 and 30 GB and require cleansing and processing before they can be loaded into the company’s Teradata enterprise data warehouse (EDW).
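The kind of cleansing step such files need before a warehouse load might look like the minimal Python sketch below. It assumes a tab-delimited feed with three hypothetical columns (`visitor_id`, `timestamp`, `page_url`) and a few illustrative rules. The actual file layout and business rules are not given in the case study.

```python
import csv
import io

# Hypothetical column layout; a real feed defines its own columns.
COLUMNS = ["visitor_id", "timestamp", "page_url"]


def cleanse(lines):
    """Yield cleaned records as dicts, skipping malformed rows."""
    # QUOTE_NONE treats quote characters in the raw data as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Rule 1: drop rows with the wrong number of fields.
        if len(row) != len(COLUMNS):
            continue
        record = {name: value.strip() for name, value in zip(COLUMNS, row)}
        # Rule 2: drop rows with no visitor or a non-numeric timestamp.
        if not record["visitor_id"] or not record["timestamp"].isdigit():
            continue
        # Rule 3: normalize types and casing for loading.
        record["timestamp"] = int(record["timestamp"])
        record["page_url"] = record["page_url"].lower()
        yield record


# Hypothetical raw input: one valid row followed by three malformed rows.
raw = (
    "u1\t1262336400\t/Home \n"
    "\t1262336460\t/cart\n"
    "u2\tnot-a-time\t/home\n"
    "u3\t1262336520\n"
)
cleaned = list(cleanse(io.StringIO(raw)))
```

Because `cleanse` is a generator, it processes one row at a time instead of holding a multi-gigabyte file in memory.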

The Challenge

Initially, LOR relied on lengthy, complex custom shell, AWK, and Perl scripts, together with UNIX utilities, to parse, cleanse, and structure the Omniture data before loading it into the EDW. Running these scripts caused performance problems.

In addition, the scripts were tedious to develop, modify, and maintain. Their inflexibility prevented the company from onboarding new business partners with unique business rules or processes in a timely manner. And because a single developer had created all of the scripts, and was the only employee who could maintain or extend them, the company faced an unacceptable level of risk.

With the added pressure of rising data growth, LOR recognized the need to improve its file-based processing methodologies. “We were processing 30 to 60 GB of raw clickstream data each day,” said an IT supervisor at LOR. “But our systems and processes were reaching their resource limits. Since our management was projecting significant year-over-year growth well into the next decade, we needed to shore up our ability to scale and grow.”

The Solution

LOR sought a product to rapidly transform raw clickstream data and prepare it for EDW deployment. The company participated in a Syncsort proof of concept (POC), which was hosted within LOR’s infrastructure and simulated actual production environments and data sets. Shortly after, LOR selected Syncsort’s DMExpress Clickstream Data Integration Solution to pull the Web analytics files, perform complex transformations, and quickly pass the data off to the Teradata load utility. The solution allowed LOR to replace its custom shell scripts and successfully decrease processing time by 65 percent. The business eliminated its reliance on a single developer for maintaining and extending the code, and improved its ability to incorporate new business partners.

The company affirms that its new development environment is more agile, and the clickstream processes now complete faster and more efficiently than before the implementation. More important, as the IT supervisor at LOR remarked, “With DMExpress, our business managers reliably receive their deliverables on a timely basis, and we are confident they’ll continue to do so well into the future regardless of data growth.”

A free Syncsort white paper on this topic is available under the title “Addressing the Destructive Business Impact of Data Performance Problems: Nondisruptive Strategies for Eliminating Performance Problems in Existing Data Integration Environments.”