TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

RESEARCH & RESOURCES

Making Real-Time Analytics a Reality

In-memory analytics processing and access to live data are finally achievable, making active live analytics a real possibility and opening up a new area of information processing and exchange.

July 15, 2014

By MC Brown, Senior Information Architect, Continuent

Analytics is evolving at an alarming rate. For many years, analytics has largely been a one-way transfer of information: data is loaded into an existing, often transactional, database, and ETL then transfers it into an analytics environment where it can be processed and analyzed. When data volumes were low and the time required (and expected) for the data processing was measured in weeks or even days, these methods were more than adequate.

Times are changing.

Traditional ETL offers some specific advantages for the transfer of information, particularly in the "T" of the acronym (transformation). ETL tools extract full datasets from the database and then offer basic transformation, which may run the gamut from simple sums and totals to complete translation into different formats or compound documents. However, as the power of data warehouse engines increases, particularly in clustered solutions (such as Vertica or Hadoop), it is becoming easier and more practical just to copy all of the information and perform the transformation within the data warehouse. There are two drivers behind this shift. First, SQL-based tools and environments such as Hive and Impala, which provide an SQL interface layer over Hadoop's distributed processing and replicated storage, have improved considerably. Second, live transactional databases have often outgrown a single database server. Instead, data is now loaded into the analytics engine from multiple database servers based on sharding (or other splitting) mechanisms and then combined within the analytics engine.

This use of sharded information often makes loading from ETL environments more difficult because the data may be sharded at the application level, not the database level. The sharding configuration is then harder to translate into the form the ETL application needs to extract, transform, and load the data into Hadoop. Furthermore, data that has been sharded and separated in complex ways may be costly to import and merge during the ETL phase.

The result is that it is now easier to directly transfer raw transactional table data into your data warehouse, including Hadoop, and perform whatever transformations are required within the target analytics environment. That process can take into account the merging of data from multiple sources, and through the SQL interface, perform the required join process to combine the data into a format suitable for analytics. Using suitable workflow management, the entire process can be automated and simplified.

To put it simply, Hadoop now can take raw transactional data, execute SQL statements to build a suitable view of the data, and provide analytics output. The use of raw transactional tables within the data warehouse also opens up the possibility of further queries and analytics based on the raw data without having to repeat an explicit load. For example, when looking at sales information, if you have the raw tables for prices, sales, and receipts, you can perform analytics across all three within the more powerful Hadoop environment from a single load.
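The prices, sales, and receipts example can be sketched in a few lines. This is only an illustration under stated assumptions: Python's built-in `sqlite3` module stands in for a SQL-on-Hadoop layer such as Hive or Impala, and the table layouts, column names, and sample rows are hypothetical rather than taken from any real system. The point is that the raw tables are loaded once and every transformation afterwards is a SQL statement run inside the target environment.

```python
import sqlite3


def load_raw_tables(conn):
    """Single load of raw transactional tables (hypothetical schema and data)."""
    conn.executescript("""
        CREATE TABLE prices   (product_id INTEGER PRIMARY KEY, unit_price REAL);
        CREATE TABLE sales    (sale_id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER);
        CREATE TABLE receipts (receipt_id INTEGER PRIMARY KEY, sale_id INTEGER, amount_paid REAL);
    """)
    conn.executemany("INSERT INTO prices VALUES (?, ?)", [(1, 2.50), (2, 10.00)])
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                     [(100, 1, 4), (101, 2, 1), (102, 1, 2)])
    conn.executemany("INSERT INTO receipts VALUES (?, ?, ?)",
                     [(900, 100, 10.00), (901, 101, 10.00), (902, 102, 4.00)])
    conn.commit()


def revenue_by_product(conn):
    """Transformation done in SQL inside the target: billed vs. paid per product."""
    return conn.execute("""
        SELECT s.product_id,
               SUM(s.quantity * p.unit_price) AS billed,
               SUM(r.amount_paid)             AS paid
        FROM sales s
        JOIN prices   p ON p.product_id = s.product_id
        JOIN receipts r ON r.sale_id    = s.sale_id
        GROUP BY s.product_id
        ORDER BY s.product_id
    """).fetchall()


def underpaid_sales(conn):
    """A second, different question answered from the same load."""
    return conn.execute("""
        SELECT s.sale_id, s.quantity * p.unit_price - r.amount_paid AS shortfall
        FROM sales s
        JOIN prices   p ON p.product_id = s.product_id
        JOIN receipts r ON r.sale_id    = s.sale_id
        WHERE r.amount_paid < s.quantity * p.unit_price
        ORDER BY s.sale_id
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_raw_tables(conn)
    print(revenue_by_product(conn))
    print(underpaid_sales(conn))
```

Because the raw tables stay in place, the second query needs no further extract or load step; it is simply another SQL statement against data that is already there.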

Returning to the original issue, loading transactional data into Hadoop in this way requires a different approach from the intermittent ETL process. With the turnaround times required of modern analytics environments, data needs to be constantly loaded into Hadoop so that analytics can be performed daily, or even hourly, on the incoming transactional data.

One solution is to use technology that extracts information from the transactional logs of modern databases and provides a list of changes that can be replicated into Hadoop. Log information can then be translated from the transactional format into carbon-copy tables that match the original tables from the transactional database. Within a sharded environment, data can be replicated into Hadoop from each shard and then merged and compounded into a single table to be used within the analytics environment.
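As a rough sketch of that idea, the code below replays a list of change events into a carbon-copy table for each shard and then merges the shards into one combined table. The event format (`op`, `pk`, `row`) and the function names are assumptions made for illustration; real replication tools define their own log formats, and this is not the interface of any particular product.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]
Table = Dict[int, Row]  # primary key -> current row


def apply_changes(table: Table, events: Iterable[dict]) -> Table:
    """Replay change events, in log order, onto a carbon-copy table."""
    for ev in events:
        if ev["op"] in ("insert", "update"):
            table[ev["pk"]] = dict(ev["row"])  # copy so later edits don't alias
        elif ev["op"] == "delete":
            table.pop(ev["pk"], None)
        else:
            raise ValueError(f"unknown operation: {ev['op']!r}")
    return table


def merge_shards(shards: Dict[str, Table]) -> List[Row]:
    """Combine per-shard copies into one table, tagging each row with its shard.

    Primary keys may repeat across shards, so the shard name is kept
    as part of each merged row.
    """
    merged: List[Row] = []
    for shard_name in sorted(shards):
        for pk in sorted(shards[shard_name]):
            merged.append({"shard": shard_name, "pk": pk, **shards[shard_name][pk]})
    return merged


if __name__ == "__main__":
    shard_a_log = [
        {"op": "insert", "pk": 1, "row": {"name": "ann", "balance": 10}},
        {"op": "update", "pk": 1, "row": {"name": "ann", "balance": 25}},
        {"op": "insert", "pk": 2, "row": {"name": "cal", "balance": 3}},
        {"op": "delete", "pk": 2, "row": None},
    ]
    shard_b_log = [
        {"op": "insert", "pk": 1, "row": {"name": "bob", "balance": 7}},
    ]
    copies = {
        "a": apply_changes({}, shard_a_log),
        "b": apply_changes({}, shard_b_log),
    }
    for row in merge_shards(copies):
        print(row)
```

Keeping the shard name on every merged row is one simple way to avoid collisions when the same primary key exists on more than one shard; a real deployment might instead rely on globally unique keys.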

There are a number of benefits to using the log extraction method. One is that it is an external, lightweight process; it does not require an active query connection to the database to obtain the changes. Because the database engine generates the logs automatically, the information is accessible without using the database's query engine. More important, particularly for a transactional database, extracting the data this way does not fill the RDBMS query and data caches with extraction traffic, which could otherwise degrade the performance of the source database.

There is another advantage to loading the transactional data this way: we also gain the history of the changes to that information. Transactional databases hold the "current" information, but by processing the log of changes, the individual data points that led to the "current" value can be tracked and identified. This provides a very powerful mechanism within the analytical process, allowing both the current and historical values to be analyzed. Furthermore, we get this information for free from the replicated data rather than having to regularly extract and load the information into the data warehouse, which is expensive and may miss changes to frequently updated tables.
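To make the point about history concrete, here is a minimal sketch that rebuilds the sequence of values a column has held from a list of change events, and answers "what was the value at a given point in the log?". The event fields (`seq`, `op`, `pk`, `row`) and both function names are hypothetical, chosen only for this example.

```python
from typing import List, Optional, Tuple


def value_history(events: List[dict], pk: int, column: str) -> List[Tuple[int, object]]:
    """Return (sequence number, value) for each change to one column of one row."""
    history: List[Tuple[int, object]] = []
    for ev in sorted(events, key=lambda e: e["seq"]):
        if ev["pk"] != pk:
            continue
        value = None if ev["op"] == "delete" else ev["row"].get(column)
        # Record only genuine changes in the column's value.
        if not history or history[-1][1] != value:
            history.append((ev["seq"], value))
    return history


def value_as_of(events: List[dict], pk: int, column: str, seq: int) -> Optional[object]:
    """Value of the column as it stood after log position `seq` (None if absent)."""
    current = None
    for change_seq, value in value_history(events, pk, column):
        if change_seq > seq:
            break
        current = value
    return current


if __name__ == "__main__":
    change_log = [
        {"seq": 1, "op": "insert", "pk": 1, "row": {"price": 9.99}},
        {"seq": 2, "op": "insert", "pk": 2, "row": {"price": 4.00}},
        {"seq": 3, "op": "update", "pk": 1, "row": {"price": 12.49}},
        {"seq": 5, "op": "update", "pk": 1, "row": {"price": 11.00}},
    ]
    print(value_history(change_log, pk=1, column="price"))
    print(value_as_of(change_log, pk=1, column="price", seq=4))
```

A periodic snapshot taken only at positions 1 and 5 would have missed the intermediate price entirely, which is the kind of lost change the paragraph above describes for frequently updated tables.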

Loading information into your data warehouse this way also enables more frequent analytics, cutting turnaround time from weeks or days down to a live analytics view of the data. This will make the next leap in data analytics easier to achieve. With the advent of in-memory and stream processing engines such as Spark and Storm, access to live data, rather than information loaded days or even weeks beforehand, is finally achievable. This, in turn, makes active live analytics a real possibility, and that opens up a new area of information processing and exchange.

MC Brown is a senior information architect at Continuent, a provider of open source database clustering and replication solutions. To learn more, contact Continuent at [email protected] or visit http://www.continuent.com.
