A Framework for Understanding the Big Data Revolution

This Information Life Cycle Framework helps you navigate the big data technology landscape and see big data in light of a complete, end-to-end solution that can tackle the entire information life cycle.

By Mehul Shah and Sachin Sinha

We are in the midst of a big data revolution that holds significant potential to disrupt most industries and sectors. The 3Vs of big data have been touted many times at conferences and in journals, blogs, and periodicals. A few early adopters and winners, mostly in the Internet industry, have successfully leveraged big data for tangible top- and bottom-line growth. Apart from this elite few, most organizations are still in their big data infancy, and multiple challenges must be addressed before big data analytics goes mainstream.

The big data boom has spawned a variety of technology companies eager to help realize the dream. Existing incumbents have bolstered their product portfolios and revamped their marketing machines around big data, and quite a few "disruptors" have emerged, built on open source frameworks or spun off from big Internet companies. The resulting buzz has produced a plethora of vendors, each claiming that its technology is the best at lowering total cost of ownership (TCO).

With all these tools at your disposal, you might think it is easy to get started with your first project. The reality, however, is that most people still find it difficult to start a big data analytics project. Impediments and challenges include:

  • Lack of a single vendor technology stack that handles big data from cradle to grave
  • Disparate tools and a lack of integration among tools from different vendors
  • Differing approaches and complex architectural choices
  • Complicated licensing in an open source/subscription model resulting in big costs (or at least the fear of them)
  • Lack of the right skills and clarity about the responsibility within the team or organization
  • An inability to identify a use case that can provide a quick "win" to gain broader executive buy-in
  • Lack of a robust data governance and quality framework to handle both traditional and big data

Information Life Cycle Framework for the Big Data Industry Landscape

To provide clarity and direction about some of these challenges, we have created an Information Life Cycle Framework for the big data landscape. We'll first define the framework, then discuss how it addresses some of the challenges we've described.

The Information Life Cycle Framework comprises two information life cycle categories: Incumbents (traditional players) and Disruptors (newcomers). We further subdivide each category into three traditional information life cycle stages: data acquisition (ETL, ELT, etc.); data storage and warehousing (DBMSes -- columnar, row, MPP, data warehouse optimized, hybrid, and Hadoop-based); and data analysis (business intelligence and advanced analytics).

Data Acquisition

  • Incumbents: Informatica, Ab Initio, IBM (InfoSphere), Microsoft, Oracle, SAP, SAS (DataFlux)
  • Disruptors: Talend, Pentaho, Cloudera (Sqoop, Flume), Scribe (developed at Facebook), Syncsort

Data Storage and Warehousing

  • Incumbents: Teradata (Aster Data), IBM (Netezza), Oracle (Exalytics), SAP (HANA, Sybase IQ), EMC (Greenplum), Microsoft, HP (Vertica)
  • Disruptors: Cloudera, Hortonworks, MapR, Amazon (Elastic MapReduce), MongoDB, Cassandra, ParAccel

Data Analysis

  • Incumbents: MicroStrategy, IBM (Cognos, SPSS), SAP (BusinessObjects), Oracle (BI), SAS, Microsoft
  • Disruptors: Datameer, Splunk, Revolution R, Karmasphere, Alpine Miner, Automated Insights, Tableau, Jaspersoft, TIBCO Spotfire, Clarabridge
The "Incumbents" comprise the old guard of the data industry, including pure-play vendors such as Informatica, MicroStrategy, and Teradata, which operate in a single segment, as well as the Big 4 (IBM, Oracle, SAP, and Microsoft), which tend to operate in all three segments and claim to provide a complete stack. Vendors in the Incumbents category have provided data-handling solutions for some time, focused primarily on two of the three big data Vs -- volume and velocity -- and are now extending their support to the third V -- variety. Most Incumbents have products rooted in the RDBMS arena; with MPP, columnar storage, and compression, they tend to support structured analytics and reporting on large volumes of data.

The "Disruptors" include up-and-coming vendors and startups built on a new paradigm for handling big data. The majority base their products on the Hadoop/MapReduce ecosystem, which is undergoing rapid evolution and emerging as a primary platform for big data. The Disruptors category also encompasses pure-play NoSQL database vendors such as MongoDB and Cassandra, which fill the gaps the Incumbents leave in handling variety at large volumes. These new-age tools support both variety and volume but are still largely evolving on the velocity front, that is, in providing real-time analytics.
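To make the MapReduce paradigm concrete, here is a minimal sketch of the map/shuffle/reduce pattern that Hadoop distributes across a cluster -- a word count written in plain Python over in-memory data. The function names and sample input are our own illustration, not any vendor's API; a real Hadoop job would run the same logic across many machines.

```python
from collections import defaultdict
from itertools import chain

# Map phase: emit a (key, value) pair for each word in an input record.
def map_fn(line):
    for word in line.split():
        yield (word.lower(), 1)

# Shuffle phase: group all emitted values by key.
# (Hadoop performs this grouping across the cluster's nodes.)
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine the grouped values for each key.
def reduce_fn(key, values):
    return (key, sum(values))

lines = ["big data big wins", "big data tools"]
pairs = chain.from_iterable(map_fn(line) for line in lines)
counts = dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'big': 3, 'data': 2, 'wins': 1, 'tools': 1}
```

Because the map and reduce steps touch each record independently, the framework can split the input across thousands of nodes, which is what lets Hadoop-based Disruptors handle volume and variety so well.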

Benefits of the Information Life Cycle Framework

The Information Life Cycle Framework highlights that none of the upcoming vendors provides an end-to-end solution for handling and leveraging big data. You need to ensure that you pick the right set of vendors who can operate across the entire information life cycle.

For example, by procuring Cloudera's Hadoop distribution, you take care of big data storage and processing needs, but you still need an analytics/BI layer on top to provide the real business benefit. The vendors you select should also be able to interoperate. For example, an analytics solution you procure from Revolution R should integrate well with, and harness the full power of, a Hadoop-based data store.

You might already have made significant investments in technologies from the Incumbents. For example, you might be an SAP or an Oracle shop and want to build on those existing investments. As the framework depicts, the Incumbents have extended their product lines to support big data, so it is smart to investigate your current vendors before procuring new disruptive technology. For example, if you currently run an Oracle stack, you might consider Oracle Exadata and Exalytics to expand your big data program.

As we've explained, the Incumbents and Disruptors take different architectural approaches to handling big data, so you should dig into your unique business needs to decide which fits better. Not all three Vs of big data may be equally important in your environment. A majority of enterprises in non-Internet industries still collect large volumes of transaction data from both internal and external sources. These companies should look at tuning their existing architectures to handle and analyze large volumes of fairly structured data. They might have only a small percentage of their data in multi-structured formats (Twitter feeds and Facebook interaction data), which can be handled in the cloud via MicroStrategy or Clarabridge to perform textual analytics and feed structured results back in-house for holistic analysis with the rest of the data. If your needs are greater, consider SPSS, SAS, or R.

Summary

The Information Life Cycle Framework helps you navigate the big data technology landscape and see big data in light of a complete, end-to-end solution that can tackle the entire information life cycle. It helps you compare apples to apples (comparing MapR with Cloudera, for example), not apples to oranges (comparing Cloudera with SAS).

To get started with big data, you need to evaluate your current architecture, identify the gaps, and define a future target state. This framework can aid you in that exercise.

Mehul Shah is a senior manager focusing on information management and data governance for a top 10 financial services company. He is an accomplished IT manager with over 12 years of experience in information management and in managing large, complex programs and projects related to enterprise-wide business intelligence and data warehouse implementations, architecting and building dashboards and BI applications, and working with cross-functional teams. Mehul has an MBA in marketing and analytics and an MS in computer science from the University of Maryland and is a PMP-certified practitioner. You can contact the author at scorpmel@gmail.com.

Sachin Sinha is director of business intelligence and analytics at ThrivOn, where he is responsible for designing innovative architectures, developing methodologies, and delivering solutions in analytics, business intelligence, and data warehousing that help clients realize maximum value from their data assets. For over a decade, Mr. Sinha has designed, architected, and delivered data integration, data warehousing, analytics, and business intelligence solutions. Specializing in information management, Mr. Sinha's domestic and international consulting portfolio includes organizations in the financial services, insurance, health-care, pharmaceutical, and energy industries. You can contact the author at ssinha@thrivon.com.