TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing


Treasure Data: Not Your Typical Take on Hadoop or Data Warehouses

Think of SaaS newcomer Treasure Data as a kind of Hadoop-as-a-service offering, albeit one with an emphasis on data warehousing workloads.

By Stephen Swoyer
November 5, 2013

Think of SaaS newcomer Treasure Data Inc. as a kind of Hadoop-as-a-service offering, albeit one with an emphasis on data warehousing (DW) workloads.

Treasure Data's SaaS offering mixes open source software (OSS) technologies (including Hadoop's MapReduce compute engine) with some special sauce of its own.

In place of the Hadoop Distributed File System (HDFS), for example, Treasure Data substitutes its own object-based columnar file system, which it calls "Plazma." Plazma decouples MapReduce from HDFS, Hadoop's baked-in storage layer. Treasure Data claims that Plazma's columnar object storage format is better suited to analytical queries and therefore delivers superior I/O performance.
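A columnar layout helps analytical queries because a query that aggregates one field only has to read that field. The Python sketch below is purely illustrative (the article does not describe Plazma's on-disk format): it stores three hypothetical viewing records row-wise and column-wise, and counts how many stored values a SUM over one field touches in each layout.

```python
# Hypothetical viewing records; field names are invented for illustration.
rows = [
    {"video_id": "v1", "country": "SG", "seconds_watched": 120},
    {"video_id": "v2", "country": "JP", "seconds_watched": 45},
    {"video_id": "v1", "country": "US", "seconds_watched": 300},
]

# Column-wise layout: one list per field, built from the row-wise records.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def total_seconds_row_store(rows):
    """Row layout: each whole record is read just to get one field."""
    values_touched = 0
    total = 0
    for row in rows:
        values_touched += len(row)  # every field of the record is read
        total += row["seconds_watched"]
    return total, values_touched


def total_seconds_column_store(columns):
    """Column layout: only the single column the query needs is read."""
    col = columns["seconds_watched"]
    return sum(col), len(col)


print(total_seconds_row_store(rows))         # (465, 9)
print(total_seconds_column_store(columns))   # (465, 3)
```

Both layouts return the same total, but the row layout reads every field of every record, while the column layout reads only `seconds_watched`. On wide tables with many rows, that difference is where the I/O savings come from.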

In addition, argues CTO and co-founder Kaz Ohta, Plazma itself incorporates I/O optimizations such as parallel pre-fetch and background decompression. Ohta says Plazma is likewise a better fit for fluentd, the OSS technology Treasure Data uses to power its td-agent software. Td-agent is a tool that collects, transforms, and repackages relational data in JSON format before replicating it to the Treasure Data service, which uses Amazon S3 for storage. Once the data is in the cloud, Plazma extracts the row-based JSON files and saves them as columnar objects.
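The row-to-columnar step described above can be sketched in a few lines of Python. This is a hypothetical illustration, not td-agent's or Plazma's actual code: the function name `json_rows_to_columns` and the sample event fields are invented, and fields missing from a record are simply filled with `None`.

```python
import json


def json_rows_to_columns(lines):
    """Pivot newline-delimited JSON records into a dict of column lists.

    Fields missing from a record are filled with None, so every column
    ends up with exactly one entry per record.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    # Union of field names across all records, in first-seen order.
    fields = list(dict.fromkeys(key for rec in records for key in rec))
    return {field: [rec.get(field) for rec in records] for field in fields}


# Hypothetical event records of the kind a log collector might emit.
events = [
    '{"time": 1383609600, "event": "play", "video_id": "v1"}',
    '{"time": 1383609660, "event": "ad_start", "ad_id": "a7"}',
    '{"time": 1383609720, "event": "stop", "video_id": "v1"}',
]
columns = json_rows_to_columns(events)
print(columns["event"])     # ['play', 'ad_start', 'stop']
print(columns["video_id"])  # ['v1', None, 'v1']
```

Each resulting list can then be stored (and compressed) on its own, which is what lets a later query read only the columns it needs.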

In place of Hive, Hadoop's MapReduce-powered query engine, Treasure Data uses another OSS offering: Impala, which was largely developed by Cloudera Inc. This gives it what Ohta calls a "responsive" interactive query facility, which he distinguishes from the batch-centric Hive.

"Hive is more robust [than Impala] in terms of batch processing. If you have a nightly batch, it takes five hours or six hours, and if you run [with that much utilization for] six hours, there's a high probability of node failure. Hive is more tolerant of that [kind of failure] than Impala is," says Ohta, "but Impala supports the interactive query."
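Ohta's point about long batch runs can be made concrete with a back-of-the-envelope model. Assuming, purely for illustration, that each node fails independently in any given hour with a small fixed probability, the chance that at least one node fails during a job grows with both cluster size and run time. A six-hour job on a hypothetical 100-node cluster is exposed to 600 node-hours of risk, versus 100 node-hours for a one-hour query. The failure rate below is an invented figure, not one from the article.

```python
def prob_any_node_failure(nodes, hours, hourly_failure_prob):
    """Probability that at least one node fails during the job.

    Assumes each node independently fails in any given hour with
    probability `hourly_failure_prob` (a simplifying assumption).
    """
    survive_all = (1.0 - hourly_failure_prob) ** (nodes * hours)
    return 1.0 - survive_all


# Hypothetical 100-node cluster, 0.1% chance per node per hour.
print(round(prob_any_node_failure(100, 6, 0.001), 3))  # 0.451
print(round(prob_any_node_failure(100, 1, 0.001), 3))  # 0.095
```

Under these assumptions, a node failure during the six-hour run is closer to a coin flip than a rare event. An engine that reruns only the failed tasks, as MapReduce does, pays a small penalty when that happens; an engine that must rerun the whole query pays for the entire run again, which is why fault tolerance matters most for long batch jobs.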

Industry veteran Rich Ghiossi, who signed on with Treasure Data as its vice president of marketing in July, says Treasure Data addresses a pair of use cases: first, greenfield Hadoop adopters (for whom its SaaS model is ideal, Ghiossi argues), and second, underperforming, problematic, or floundering Hadoop deployments. With respect to greenfield Hadoopers, Ghiossi says most of them aren't interested in the technicalities of HDFS or MapReduce, let alone the phenomenon of Hadoop itself. They have nontraditional problems that they're trying to solve.

"The people who are coming up to speak with us don't care about that," he says. "Their motto is, 'If you have a method and mechanism to get data into the cloud, and you're going to charge a relatively inexpensive fee compared to doing it onsite, and you're going to allow me to tie Tableau or another reporting environment to it, why should I care how it works?'"

When it comes to existing Hadoop deployments, Ghiossi argues, Treasure Data brings deep experience: principals Ohta and Hiro Yoshikawa (its co-founder and CEO) have been working with Hadoop for more than half a decade. Both are self-described OSS advocates. Ohta helped found Japan's Hadoop User Group, which claims to be the world's largest; Yoshikawa worked for Red Hat Inc. for almost six years.

Nevertheless, Ghiossi says, Ohta and Yoshikawa recognize that vanilla Hadoop has a variety of shortcomings, especially from a data management perspective; they founded Treasure Data specifically to address these. "What they saw was that Hadoop was just way too difficult for most people to successfully deploy because of all of the expertise required," he explains.

"That's why [Yoshikawa] said, 'Let's build this company around a service, but let's hide all of the complexity [involved] in making it work.' Their idea was to make [Hadoop] really work like MPP [i.e., a data warehousing platform], make it easily scalable, and make it truly multi-tenant. They figured there would be a lot of frustration out there [among Hadoop adopters]."

Viki, a video streaming website based in Singapore, was one such frustrated Hadooper. Jason Grendus, director of analytics for Viki, describes his company as a kind of "Asia-Pacific Hulu."

When Grendus came onboard at Viki, he says he inherited a Hadoop project that was going nowhere. "When I came in, the preexisting team had built its own Hadoop cluster. I was brought in because I had a background in analytics and [because] they were getting some of what they needed [from Hadoop], but not to the extent that they wanted. They were having problems with instability in reporting -- [with] inconsistency in reporting. The numbers were unreliable," says Grendus.

"I found Treasure Data because we were already using fluentd as the end point for [data] collection. Treasure Data offered us a free trial where we could try them out, and at that point, I was just trying to get things working. So I figured, 'Why not give it a try?' I started using Treasure Data more and more and more because it was reliable, and [Viki's existing Hadoop solution] wasn't. For the first time, I was getting consistent numbers out of it."

Viki continuously tracks which videos are watched, when each viewing session begins and ends, each session's country of origin, which ads run, and so on. Grendus says the company uses Treasure Data-powered MapReduce to extract all of this information from system and event logs. "If we tried to take all of our logging data and put it in a database directly rather than pre-formatting it and summarizing it -- it just wouldn't be economical to do [on a conventional RDBMS]," he explains.
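The pre-summarization Grendus describes follows the classic MapReduce pattern: a map step turns each raw log line into key/value pairs, and a reduce step aggregates the values for each key. The single-process Python sketch below assumes a hypothetical tab-separated log format (timestamp, event, video ID, country) and counts plays per video and country; a real Hadoop job would distribute both steps across the cluster and shuffle the pairs by key in between.

```python
from collections import Counter
from itertools import chain


def map_event(line):
    """Map step: parse one hypothetical tab-separated log line and emit
    ((video_id, country), 1) for each 'play' event; other events emit nothing."""
    timestamp, event, video_id, country = line.split("\t")
    if event == "play":
        yield (video_id, country), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


log = [
    "2013-11-05T10:00:00\tplay\tv1\tSG",
    "2013-11-05T10:00:05\tad_start\tv1\tSG",
    "2013-11-05T10:01:00\tplay\tv1\tSG",
    "2013-11-05T10:02:00\tplay\tv2\tJP",
]
summary = reduce_counts(chain.from_iterable(map_event(line) for line in log))
print(summary)  # {('v1', 'SG'): 2, ('v2', 'JP'): 1}
```

Only the compact summary, rather than every raw log line, would then need to be loaded into a database for reporting.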

He particularly praises Treasure Data's capacity licensing scheme, which exploits multi-tenancy and time-zone differences to offer free capacity headroom. (For example, Ohta says that Treasure Data co-locates Japanese and EU customers on the same Hadoop cluster. When one region is at peak, the other is off-peak.)

"One thing we really like is that we have a dedicated number of cores -- [this means] a guaranteed number of cores that [Treasure Data gives] us. This is a measure of how many processors you have for Hadoop [workloads]. Right now we're on 24, but it can scale up to four times this number because of the way they balance their [Hadoop] clusters," he comments.

"They guarantee you a minimum, but they're also able to give you excess [capacity] when you need it -- up to a certain point -- at no additional charge."
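The figures Grendus quotes (24 guaranteed cores, bursting to as much as four times that) suggest a simple allocation rule. The Python sketch below is one hypothetical way to express it; the article does not describe Treasure Data's actual scheduler, and the `idle_cores` input is an assumption standing in for capacity left unused by co-located tenants in another time zone that are off-peak.

```python
def cores_granted(demand, guaranteed, idle_cores, burst_multiple=4):
    """Cores a tenant receives under a hypothetical
    'guaranteed minimum plus free burst' policy.

    - A tenant asking for no more than its guarantee gets what it asks for.
    - Beyond the guarantee, it may borrow cores left idle by other tenants,
      capped at burst_multiple * guaranteed cores in total.
    """
    if demand <= guaranteed:
        return demand
    ceiling = burst_multiple * guaranteed
    return min(demand, ceiling, guaranteed + idle_cores)


# Other region off-peak: plenty of idle cores, so the burst hits the 4x cap.
print(cores_granted(demand=200, guaranteed=24, idle_cores=100))  # 96
# Other region at peak: little idle capacity, so only a small burst.
print(cores_granted(demand=200, guaranteed=24, idle_cores=10))   # 34
# Light workload: the tenant simply gets what it asks for.
print(cores_granted(demand=10, guaranteed=24, idle_cores=0))     # 10
```

In this model the guarantee never depends on other tenants, while the burst does, which is why pairing customers whose peaks fall at different times of day leaves more headroom for each.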
