TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

RESEARCH & RESOURCES

Beyond ETL: On Demand ETL and the Big Shift in BI and Analysis

An emerging trend -- "on-demand" ETL -- augurs a big shift in the way analysis and BI are performed and results disseminated.

By Stephen Swoyer
April 2, 2013

ETL's batch-based conceptual model lends itself to a very different kind of timeliness, what many call "on-demand" ETL, where ETL jobs are triggered as needed.

This isn't real-time ETL -- not as marketed by real-time specialists such as IBM Corp. and Informatica Corp., for example -- but it does have some real-time characteristics.

In the on-demand model, a user or application might not be consuming time-sensitive information from operational systems in real time, but nonetheless is triggering an ETL process in real time. For some industry veterans, it augurs a big shift in how analysis and, with it, business intelligence (BI) are performed and results disseminated.
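As a rough illustration of that distinction, the sketch below runs a complete extract-transform-load pass at the moment a caller asks for it, against data that is itself only a snapshot. Everything here is hypothetical (the `OPERATIONAL_SNAPSHOT` rows, the `run_etl_on_demand` name, the in-memory target list); it is a minimal sketch of the idea, not any vendor's implementation.

```python
from datetime import datetime, timezone

# Hypothetical snapshot of an operational system; NOT a live, real-time feed.
OPERATIONAL_SNAPSHOT = [
    {"order_id": 1, "status": " shipped "},
    {"order_id": 2, "status": "PENDING"},
]


def extract(snapshot):
    """Copy rows out of the source so later steps cannot mutate it."""
    return [dict(row) for row in snapshot]


def transform(rows):
    """Apply a light cleanup: trim whitespace and lowercase the status."""
    return [{**row, "status": row["status"].strip().lower()} for row in rows]


def load(rows, target):
    """Replace the contents of the target table with the fresh rows."""
    target.clear()
    target.extend(rows)


def run_etl_on_demand(snapshot, target):
    """Run the whole job when a user or application asks, not on a schedule."""
    triggered_at = datetime.now(timezone.utc)
    load(transform(extract(snapshot)), target)
    return triggered_at


analysis_table = []
triggered_at = run_etl_on_demand(OPERATIONAL_SNAPSHOT, analysis_table)
```

The trigger is immediate, but the data is only as fresh as the snapshot it reads, which is the sense in which this is on-demand rather than real-time ETL.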

Beyond ETL: a Quick Primer

Over the last five years, there's been a push to recast ETL as a more agile or iterative technology -- or to eliminate it altogether.

Data warehousing specialist Kalido, for example, has pursued the latter goal. It uses an ELT-like technology (its Unified Load Controller, or ULC) that it says can be used to bypass traditional ETL. ULC can "land" data into the Kalido Information Engine, where it can be conformed to DW standards. From Kalido's perspective, ETL -- with its protracted development process and batch-based underpinnings -- is the problem.

Competitor WhereScape Inc. comes at the same problem from an ETL-centric perspective. It touts an iterative approach to ETL that emphasizes rapid development, testing, and optimization. WhereScape concedes that the latency and inertia associated with DW development and management are problematic, but it doesn't consider either to be a problem intrinsic to ETL itself.

Rather, WhereScape claims, the problem lies with the process or methodology by which data warehouses are developed and managed.

Even though WhereScape's approach is predicated on a build-in-advance model, its focus on highly iterative ETL -- on, in effect, rapid ETL development (from which the name of its flagship product, RED, derives) -- gets close to a kind of on-demand ETL. It's likewise consistent with an analytic discovery model that emphasizes access to information, regardless of its cleanliness, consistency, or standardization. Both WhereScape's and Kalido's schemes promise to reduce ETL batch windows, automate (or in Kalido's case, eliminate) common ETL tasks, and accelerate access.

On-Demand ETL

The on-demand ETL visions touted by ParAccel Inc. (On Demand Integration, or ODI) and Pentaho Inc. go further.

Dave Henry, senior vice president of engineering with Pentaho Inc., describes a big data use case that involves blending data from Hadoop with data from operational sources, which he calls "on demand" ETL.

"If you can think about [a scenario in which you're] doing a query against Hadoop and getting some data out of it -- maybe you're going through Hive, maybe you're reading Hadoop files or supporting something like Impala from Cloudera -- you're going to get a stream of data out of that [query]. As that data comes out, you'd like to do look ups [i.e., comparing or blending it] against your operational data. It's a kind of on-the-fly enrichment."

Pentaho markets an ETL tool in Pentaho Data Integration (PDI), which is based on the Kettle open source software (OSS) ETL project. Nevertheless, Pentaho conceives of ETL as a prerequisite for discovery and analysis: its focus isn't on PDI as a general-purpose ETL product but on PDI as an enabling technology for Pentaho Analysis.
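The on-the-fly enrichment Henry describes, in which rows stream out of a Hadoop query and are looked up against operational data as they arrive, might look something like the sketch below. It does not use PDI, Hive, or Impala; the row generator, the `OPERATIONAL_CUSTOMERS` table, and the `enrich` function are all hypothetical stand-ins chosen for illustration.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical stand-in for rows streaming out of a Hadoop query.
def stream_query_rows() -> Iterator[dict]:
    yield {"customer_id": 1, "page": "/pricing"}
    yield {"customer_id": 2, "page": "/docs"}
    yield {"customer_id": 3, "page": "/signup"}


# Hypothetical operational data, keyed by customer_id for fast lookups.
OPERATIONAL_CUSTOMERS: Dict[int, dict] = {
    1: {"name": "Acme", "segment": "enterprise"},
    2: {"name": "Globex", "segment": "smb"},
}


def enrich(rows: Iterable[dict], lookup: Dict[int, dict]) -> Iterator[dict]:
    """Blend each streamed row with its operational record as it arrives."""
    for row in rows:
        match = lookup.get(row["customer_id"])
        # Unmatched rows are kept with empty fields rather than rejected,
        # an assumption made here to keep the data flowing to the analyst.
        yield {
            **row,
            "name": match["name"] if match else None,
            "segment": match["segment"] if match else None,
        }


enriched = list(enrich(stream_query_rows(), OPERATIONAL_CUSTOMERS))
```

Because `enrich` is a generator, nothing is staged or persisted between the query and the consumer; each row is blended and handed on as soon as it is read.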

It's a distinction worth emphasizing: ETL is a product of the data warehouse-driven BI model. It's an ingestion technology; its purpose is to transform data such that it conforms to the requirements of the DW. As a result, the ETL model traditionally emphasizes the cleanliness, consistency, and standardization of data. The priorities of this model are fundamentally at odds with the more relaxed constraints of discovery. They're likewise at odds with the ways in which people want and (increasingly) expect to consume information.

Pentaho's new InstaView product -- which is a kind of real-time "blender" for data from Hadoop, other NoSQL repositories, and Web applications -- is an exemplar of this, Henry maintains. "Increasingly, we have more and more information that's more distributed than ever, and people want to be able to mash it up on the fly. Salesforce[.com] doesn't want to be your data warehouse, [nor are] most people ... going to stuff all of their non-CRM corporate information into Salesforce," he says.

"People who are creating line-of-business applications that may be more specialized than Salesforce don't want to get into that [putting everything into a data warehouse]: it's all one-off. You'd have to have a really complex [DW] schema to handle all of that ingestion, so what we're seeing is [the creation of] these kinds of 'data marts on demand,' particularly for highly aggregated stuff."

On-Demand ETL in Practice

At last year's Strata + Hadoop World conference, analytic database specialist ParAccel Inc. discussed a range of similar scenarios. Today, ParAccel offers On Demand Integration (ODI) modules for several database platforms or standards (e.g., Teradata, ParAccel itself, and ODBC) in addition to Hadoop. ODI connectivity can be used to directly import data from these platforms into ParAccel.

At Strata + Hadoop World, officials discussed the idea of embedding user-defined functions (UDFs) in the ODI layer. This is less an ad hoc ETL capability -- e.g., importing data from another platform into ParAccel -- than a kind of user-initiated batch ETL, said vice president of marketing John Santaferraro. A user or application doesn't initiate a completely new ETL job nor consume data from a new (i.e., external) source. Instead, the user (or the application used) consumes data that's already in ParAccel. The embedded UDFs transform this data into a format that can be consumed by the initiating application. This happens on-the-fly.
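The pattern Santaferraro describes, a user-defined function that reshapes data already in the database at the moment it is read, can be sketched with Python's built-in `sqlite3` module standing in for the analytic platform. This is an assumption made purely for illustration: it is not ParAccel's ODI interface, and the `sales` table and `to_usd_label` function are hypothetical.

```python
import sqlite3


def to_usd_label(cents: int) -> str:
    """Hypothetical UDF: render integer cents as a dollar string."""
    return f"${cents / 100:.2f}"


# In-memory SQLite database standing in for the analytic platform.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 125000), ("west", 98050)],
)

# Register the Python function so SQL queries can call it by name.
conn.create_function("to_usd_label", 1, to_usd_label)


def fetch_formatted_sales(connection: sqlite3.Connection) -> list:
    """Read stored rows, transforming them in flight via the UDF."""
    cursor = connection.execute(
        "SELECT region, to_usd_label(amount_cents) FROM sales ORDER BY region"
    )
    return cursor.fetchall()


formatted = fetch_formatted_sales(conn)
```

No new data is imported and no separate ETL job runs; the stored values stay as they are, and the transformation happens only in the result handed to the requesting application.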

ParAccel, like Pentaho, doesn't position its ODI connectivity as a replacement for robust ETL. Instead, Santaferraro explained, it sees ODI-powered information access and on-the-fly transformations as consistent with the relaxed constraints of the discovery model. Discovery, Santaferraro argued, emphasizes data access above all; it's more tolerant of consistency or quality issues.

"We're not trying to become a transformation engine; we are an analytics company. Everything we do has to do with analytics," he said.

"On-demand integration is about making data available for analysis. When you take the [ParAccel analytics] platform and add ODI services, you've now become sort of the service center for analytic services for people and applications. You can now use the power of the analytic platform to do the heavy-lifting processing of the analytics."

Harbinger of Things to Come?

ParAccel isn't alone. At the TDWI World Conference in Las Vegas, Teradata Corp. trumpeted a new version 5.10 release of its Aster Discovery Platform, which it claims can automate the time-consuming or repetitive aspects of data access and preparation. Teradata's pitch with Aster Discovery 5.10 isn't unlike ParAccel's with ODI; there's likewise a sense in which both efforts -- along with Pentaho's on-demand ETL vision -- might be seen as harbingers of an industry-wide shift to come.

Industry luminary Colin White, president of BI Research, says the DW-driven BI model is being supplanted by a more diverse schema in which multiple platforms or disciplines cohabitate in a kind of information ecosystem. An analog to discovery in this new ecosystem is what White calls "investigative computing." This describes a methodology that emphasizes rapid, test-driven iteration -- of hypotheses, analytic models, or other artifacts. Investigative computing is an umbrella paradigm for practices such as predictive analytics (PA) or analytic discovery. The point, says White, is that the traditional DW-driven model can't keep pace with -- can't feed -- the information requirements of PA, discovery, or other investigative practices.

"At the moment, getting data into the data warehouse has become a bottleneck; typically, with operational intelligence, by the time we have it in there, it's already changed," White observes. In the DW-driven status quo, analysis has largely been a retrospective practice. Analytic discovery and PA change this. Information heterogeneity -- i.e., more data, from more sources -- is a prerequisite for both practices; so, too, is rapid iteration. PA, in particular, works by optimizing models, and models are optimized iteratively: data scientists hypothesize, test, and tweak. The DW-driven status quo is inimical to this kind of "investigative" model.

"Up until now, [the way] we've produced analytics [is a lot like] 'what's-happened-in-the-past' and 'what's-happening-now.' When we get into predictive analytics ... we have to change what we're doing. We have to take advantage of [new] technology to blend in data faster, run models faster, and get results much faster. This isn't possible in the [store and analyze] model we have today," White says.
