Q&A: Cloud Service Provider Provides Quick Big Data Solution
A global retailer with growing big data needs turns to a cloud-based managed service provider.
- By Linda L. Briggs
- August 12, 2014
Faced with quantities of incoming clickstream data that were too large and rapid for data warehousing, global retailer Muji turned to a cloud-based managed service provider to handle its big data needs.
Muji is a Japan-based company whose focus on quality and minimalism resonates with consumers. In a recent interview with the Los Angeles Times, company president Masaaki Kanai explained the thinking behind that focus: "From a business standpoint, you would think that customers buying something new every six months is a good idea, but as the human population is heading to 10 billion, we need to stop and think calmly about what things we really need as an operating system for life. At Muji, we have a fixed catalog of basic products and keep selling the same designs for a long time. So if a customer breaks the lid on their teapot, we can sell them a new one."
The company has over 600 stores across Asia, the U.S., and Europe, and is operated by Tokyo-based Ryohin Keikaku Co., Ltd.
In this interview with BI This Week, Takashi Okutani, the general manager of the Web business division at Muji, talks about the creation of a data warehouse, along with deployment of a cloud-based solution from Treasure Data to handle its big data. Okutani has worked for Muji for over a decade in a variety of positions, from store manager to retail category manager.
As Muji discovered, Treasure Data can complement a data warehouse and work alongside an existing SQL system, keeping the warehouse from being overwhelmed by raw big data.
BI This Week: Muji is a retail brand based in Japan with stores around the world. Can you give some more facts about the company in terms of size and scope?
Takashi Okutani: Muji was established in 1980 with a "no-brand" ethos, meaning that we spend little on advertising and traditional marketing and pursue an ethos of simplicity. (In fact, Muji means "no label" in English.) Although there is definitely a distinct Muji brand in terms of look and feel, Muji products are not branded. We attribute our success instead to word of mouth and a simple shopping experience.
We drew upon experience with in-house product development at Seiyu, where Ryohin Keikaku first began as a store brand. The basis for our product development is to take products that are truly necessary for daily life and create them in their most essential form.
For that reason, we need to review our raw material choices carefully, curtail labor within the production process whenever possible, and use simple packaging. This policy fits the aesthetic sensibility of the age, and our simple yet beautiful products are well-loved by our customers.
We have 385 domestic stores, 269 of which are directly owned. Overseas, Muji has 255 stores -- eight of them in the U.S., nearly 40 in the UK and Europe, and 100 in China. Annual operating revenue for our last fiscal year, which ended in February 2014, was projected to be US $175.8 billion.
What data challenge was the company facing that led it to consider the cloud data service provider Treasure Data?
We were working on a new loyalty app called "Muji Passport," which links three data sources: Muji's online store, its credit card, and data from our main website and various landing pages. The Muji Passport project generates data that can be analyzed across departments. We needed an efficient way to analyze the clickstream data and provide targeted promotions in our Passport loyalty app. We looked specifically for a data management platform with data accumulation and analysis capabilities, but could not find a product that fit our needs, so we decided to build our own.
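Linking the three sources comes down to keying every record on a shared customer ID. The sketch below is hypothetical: the field names (`customer_id`, `order_total`, `card_spend`, `page_views`) and the in-memory sample lists are assumptions standing in for whatever Muji's actual systems hold, not a description of the Passport app's internals.

```python
from collections import defaultdict

# Hypothetical sample records from the three kinds of source named in the interview.
online_store = [
    {"customer_id": "C1", "order_total": 3200},
    {"customer_id": "C2", "order_total": 1500},
]
credit_card = [
    {"customer_id": "C1", "card_spend": 8000},
]
web_activity = [
    {"customer_id": "C1", "page_views": 12},
    {"customer_id": "C3", "page_views": 4},
]


def link_sources(*sources):
    """Merge records from several sources into one profile dict per customer ID.

    If the same field appears more than once for a customer, the later
    record overwrites the earlier one (a real pipeline might sum instead).
    """
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            # Copy every field except the join key into that customer's profile.
            fields = {k: v for k, v in record.items() if k != "customer_id"}
            profiles[record["customer_id"]].update(fields)
    return dict(profiles)


profiles = link_sources(online_store, credit_card, web_activity)
print(profiles["C1"])  # {'order_total': 3200, 'card_spend': 8000, 'page_views': 12}
```

Customers who appear in only one source (C2, C3 above) still get a profile, which is what makes cross-source analysis possible later.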
At the beginning, there was no guarantee that we would see results, so we needed an elastic analytics infrastructure that would let us start small. We decided on a combination of Redshift, Amazon's cloud data warehouse, and Tableau for ad hoc data exploration.
To take in unstructured data produced by the website and app, we chose Treasure Data, a cost-effective solution that allowed us to deploy our program quickly and to leverage the resources and systems we already had. At the same time, it allowed us to capture and process new types of big data that were critical to our new programs.
What exactly does the Treasure Data app do? How were those tasks performed before?
Raw data logs produced by the website and app are delivered to Treasure Data by way of Adobe's analytics software, SiteCatalyst. We use SQL-like HiveQL to convert the cookie IDs in the raw logs to customer IDs and to load only the required columns, at the required data granularity, into Redshift. We then use Tableau to match Web logs and app logs with purchase data and other sources for analysis.
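The cookie-to-customer conversion is essentially a join against a mapping table followed by a projection onto the needed columns; in HiveQL that would typically be a `JOIN` plus a `SELECT` list. Muji's actual queries are not shown in the interview, so the Python sketch below only mirrors that logic in memory. The mapping table, column names, and sample rows are all hypothetical.

```python
# Hypothetical mapping table: cookie ID -> customer ID (e.g. captured at login).
COOKIE_TO_CUSTOMER = {"ck-001": "C1", "ck-002": "C2"}

# Hypothetical subset of columns worth loading into the warehouse.
REQUIRED_COLUMNS = ("customer_id", "timestamp", "page")


def decode_and_project(raw_rows, cookie_map, columns):
    """Replace cookie IDs with customer IDs and keep only the required columns.

    Rows whose cookie ID has no known customer are dropped, which mirrors
    an inner join against the mapping table.
    """
    result = []
    for row in raw_rows:
        customer_id = cookie_map.get(row["cookie_id"])
        if customer_id is None:
            continue  # anonymous visitor: no matching customer
        decoded = dict(row, customer_id=customer_id)
        result.append({col: decoded[col] for col in columns})
    return result


raw_log = [
    {"cookie_id": "ck-001", "timestamp": "2014-08-01T10:00:00",
     "page": "/teapot", "browser": "Safari"},
    {"cookie_id": "ck-999", "timestamp": "2014-08-01T10:05:00",
     "page": "/", "browser": "Chrome"},
]
print(decode_and_project(raw_log, COOKIE_TO_CUSTOMER, REQUIRED_COLUMNS))
# [{'customer_id': 'C1', 'timestamp': '2014-08-01T10:00:00', 'page': '/teapot'}]
```

Dropping unused columns such as `browser` before the load is what keeps the warehouse from filling up with raw clickstream detail.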
Prior to the Treasure Data implementation, we were using SiteCatalyst, but we could only analyze customer behavior data from our website. Now, we are analyzing behavior data from the website, from our online credit card, and from customer loyalty programs.
What size and volume of data was involved for this project? How much data is being processed by Muji Passport?
Using Treasure Data, Muji processes logs for behavioral analysis on 4.3 million registered Web users and 1.4 million Muji Passport mobile app users. We then aggregate and export a subset of the clickstream data, integrating it with other data sources in Amazon Redshift.
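Aggregating before export is what shrinks the clickstream to a warehouse-sized subset. As one possible shape, assumed here for illustration, the sketch below counts events per customer per day and writes the result as CSV, a format Redshift's `COPY` command can load. The field names and the daily granularity are hypothetical choices, not Muji's documented ones.

```python
import csv
import io
from collections import Counter


def aggregate_daily(rows):
    """Count events per (customer_id, date) pair."""
    counts = Counter()
    for row in rows:
        date = row["timestamp"][:10]  # 'YYYY-MM-DD' prefix of an ISO timestamp
        counts[(row["customer_id"], date)] += 1
    return counts


def export_csv(counts):
    """Write the aggregates as CSV text, sorted so the output is stable."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["customer_id", "date", "events"])
    for (customer_id, date), events in sorted(counts.items()):
        writer.writerow([customer_id, date, events])
    return buffer.getvalue()


decoded = [
    {"customer_id": "C1", "timestamp": "2014-08-01T10:00:00"},
    {"customer_id": "C1", "timestamp": "2014-08-01T11:30:00"},
    {"customer_id": "C2", "timestamp": "2014-08-02T09:15:00"},
]
print(export_csv(aggregate_daily(decoded)))
```

Three raw events become two summary rows here; at millions of events per day the reduction is what makes the Redshift side manageable.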
Treasure Data handles more than 2 million entries per day, or about 900 million entries per year for us. Those numbers take into account both Web logs and app logs.
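For scale, the yearly figure can be converted into average daily and per-second rates. This back-of-the-envelope calculation assumes entries are spread evenly over a 365-day year; real clickstream traffic is bursty, so peak rates will be higher.

```python
ENTRIES_PER_YEAR = 900_000_000
DAYS_PER_YEAR = 365
SECONDS_PER_DAY = 24 * 60 * 60

# Average rates, assuming a perfectly even load across the year.
per_day = ENTRIES_PER_YEAR / DAYS_PER_YEAR
per_second = per_day / SECONDS_PER_DAY
print(f"{per_day:,.0f} entries/day, {per_second:.1f} entries/second")
# 2,465,753 entries/day, 28.5 entries/second
```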
What is the existing data warehouse structure at Muji? Were any changes made to the data warehouse (for example, cleaning up the data or adding new data sources) for the Treasure Data deployment?
Because we did not have an existing data warehouse, we constructed one with Redshift.
What was it about a cloud-based service that appealed to Muji? What features in particular made Treasure Data the right solution?
We had more than 400 columns in the Web logs and app logs that we were acquiring via SiteCatalyst. Treasure Data was appealing because it provided a schema-less system with flexible column definitions, which meant we did not need to define data structures ahead of time.
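A schema-less store works on a "schema on read" basis: records are written as they arrive, and the set of columns is derived when the data is queried. The sketch below illustrates the idea with JSON lines in a plain Python list; it is a hypothetical illustration, not Treasure Data's actual storage format or API.

```python
import json


def ingest(store, record):
    """Append a record as a JSON line; no schema is declared up front."""
    store.append(json.dumps(record, sort_keys=True))


def columns(store):
    """Derive the current set of columns from the stored records."""
    seen = set()
    for line in store:
        seen.update(json.loads(line).keys())
    return sorted(seen)


def select(store, *cols):
    """Read selected columns; records missing a column yield None."""
    return [tuple(json.loads(line).get(c) for c in cols) for line in store]


store = []
ingest(store, {"cookie_id": "ck-001", "page": "/teapot"})
# A later log carries a column that was never defined ahead of time.
ingest(store, {"cookie_id": "ck-002", "page": "/", "campaign": "summer"})
print(columns(store))                      # ['campaign', 'cookie_id', 'page']
print(select(store, "page", "campaign"))   # [('/teapot', None), ('/', 'summer')]
```

With 400-plus columns that change as tracking evolves, skipping an up-front `CREATE TABLE` for every variation is the practical benefit.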
With Treasure Data, we've found that our software engineers can manipulate data in a SQL-like way without being Hadoop engineers, which makes Treasure Data relatively easy to master. It's not necessary to hire a full-time Hadoop engineer, which makes it a great choice for retailers.
Because we need to decode the cookie ID in every record into a customer ID, an architecture able to distribute that processing across vast amounts of data was ideal.
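Per-record decoding parallelizes well because no record depends on any other: the data can be split into partitions and each partition decoded independently. The sketch below shows that data-parallel structure, with threads standing in for the separate machines a Hadoop-style cluster would use (in CPython, threads will not actually speed up CPU-bound work). The mapping table and partition count are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

# Hypothetical cookie-to-customer mapping table.
COOKIE_TO_CUSTOMER = {"ck-001": "C1", "ck-002": "C2", "ck-003": "C3"}


def partition(records, n):
    """Split records into n buckets by a stable hash of the cookie ID.

    crc32 is used instead of hash() because Python randomizes str hashes
    per process, which would make bucket assignment non-reproducible.
    """
    buckets = [[] for _ in range(n)]
    for record in records:
        buckets[crc32(record["cookie_id"].encode()) % n].append(record)
    return buckets


def decode_partition(records):
    """Decode one bucket on its own; it never needs data from other buckets."""
    return [
        {**r, "customer_id": COOKIE_TO_CUSTOMER.get(r["cookie_id"])}
        for r in records
    ]


def decode_all(records, workers=3):
    """Decode every partition concurrently and concatenate the results."""
    buckets = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [row for bucket in pool.map(decode_partition, buckets) for row in bucket]


records = [{"cookie_id": f"ck-00{i}"} for i in range(1, 5)]
print(sorted((r["cookie_id"], r["customer_id"]) for r in decode_all(records)))
# [('ck-001', 'C1'), ('ck-002', 'C2'), ('ck-003', 'C3'), ('ck-004', None)]
```

Adding workers (or machines) splits the same job into smaller pieces, which is why this kind of workload scales with a distributed architecture.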
Because there were no startup costs with Treasure Data -- they use a monthly payment system and we were able to set up a cost plan per department -- the decision to implement was easy.
What did the implementation look like? How long did it take and what was involved?
The proof of concept took one staff member one week. Development then took two weeks: storing the data from the SiteCatalyst daily log files in Treasure Data, and creating a batch program to decode cookie IDs and load only the necessary columns into Redshift.
What kinds of skills were required of the Muji team during the rollout? What skills are needed going forward, to maintain the app?
We need these skills: an understanding of how distributed processing works, knowledge of which tasks Treasure Data handles versus which ones Redshift handles, a working knowledge of SQL, and familiarity with batch-processing scripts written in languages such as bash.
The skills to maintain Hadoop, as well as knowledge of the Java programming language, aren't required. Treasure Data takes care of that.
Who are typical users of the new app? What kind of training have they required?
We have one data scientist and one data warehousing engineer who provides that scientist with data. The engineer learned the necessary commands in an initial face-to-face session with a Treasure Data representative that lasted only a few hours; after that, he simply used the online documentation. Because he already knew batch processing and Linux, little training was required. The session was an efficient way to grasp an outline of the app, but really, the online documentation alone could have sufficed.
What are future plans for use of Treasure Data at Muji?
We are anticipating services that go beyond executing simple queries, such as path analysis and machine learning.
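Path analysis looks at the order in which customers move through pages rather than at single page views. One of its simplest forms, shown here as a hypothetical sketch with made-up session paths, counts how often each page leads directly to another page within a session.

```python
from collections import Counter


def transitions(sessions):
    """Count page-to-page transitions across all session paths."""
    counts = Counter()
    for path in sessions:
        # zip pairs each page with the page visited immediately after it.
        counts.update(zip(path, path[1:]))
    return counts


# Hypothetical session paths, one list of pages per visit.
sessions = [
    ["/", "/kitchen", "/teapot", "/cart"],
    ["/", "/kitchen", "/teapot"],
    ["/", "/stationery"],
]
for (src, dst), n in transitions(sessions).most_common(2):
    print(f"{src} -> {dst}: {n}")
```

Frequent transitions (and where sessions end) suggest where a targeted promotion might be placed in the flow.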