By using tdwi.org website you agree to our use of cookies as described in our cookie policy. Learn More

RESEARCH & RESOURCES

LESSON - Intelligent Data Warehousing with Extract Transact Predict Load

By Keith Laker, Oracle Warehouse Builder Product Management, Oracle

Every organization is struggling to ensure the provision of intelligent data warehouse (iDW) environments that contain as much embedded value as possible.
However, any embedded value is usually constrained by a number of typical data-related issues:

  1. Missing or incorrect data
  2. Lack of relevant business attributes
  3. Lack of focus on most relevant attributes, hierarchies, and levels
  4. Ability to located critical data in a timely fashion

Performing simple extract, transform, and load (ETL) operations as part of the data warehouse process is no longer sufficient.

Adding Value To The ETL Process

It is possible to add significant value to the data warehouse environment by incorporating predictive analytics directly within the normal data loading process. The whole ETL process simply expands to include a new, and critically important, stage—“P,” or predict.

By incorporating a predict stage as part of the upstream processing, all business users can benefit from the results, no matter which reporting tool they are using.

The key to adding value within a data warehouse is to do as much as possible,as close to the data as possible, as earlyin the process as possible.

Imagine if, as part of your ETL procedures, it were possible to predict new values for missing data points. That would help reduce the amount of revenue allocated to the ever-popular “unknown value” bucket. Imagine if it were possible to actually validate existing values to determine if they were in fact correct or, as happens most of the time, just garbage values. This elevates the whole ETL process to a whole new level and adds real business value.

By using predictive analytics, it is possible to automate the discovery of patterns using functions that can sweep through tables and identify previously hidden patterns. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. This information can be used to drive the creation of data-driven hierarchies that allow business users to directly monitor key buying patterns over time.

Other pattern discovery problems include detecting clusters within a column that can then be used to drive completely new hierarchies and drill paths.

Succeeding with ETPL

The following are key requirements for successfully extending existing ETL processes
to include a new predict stage:

  1. Integrated database engine
  2. Industry standard APIs
  3. Extensible ETL

Integrated Database Engine
To leverage the opportunities of ETPL, you must start with an integrated database engine that supports both storage and analysis of relational, spatial, multi-dimensional models, predictive analytics, and optimized ETL capabilities. Cross-fertilization of analysis then comes as standard, and because these objects are stored directly within the database, all the processing associated with these features (such as predictive analytics) can be performed directly within the ETL process as close to the data as possible.

Industry Standard APIs
The industry standard for interrogating and analyzing data stored in an RDBMS is SQL. By providing a SQL and PL/SQL interface to key features such as ETL and predictive analytics, it is possible to seamlessly integrate these powerful features using industry standard languages.

Extensible ETL
The ETL framework must be extensible and make use of the previous two key features. By using industry standard APIs to generate ETL scripts, additional capabilities such as predictive analytics can be integrated seamlessly. By leveraging an integrated database engine, you can safely implement ETPL today, without stretching your existing batch windows to the breaking point.

Delivering Intelligent Data Warehousing

The key to adding value within a data warehouse is to do as much as possible, as close to the data as possible, as early in the process as possible. This is only possible if your database vendor is ETPL enabled.

Only ETPL can help you deliver iDW.

This article originally appeared in the issue of .

TDWI Membership

Get immediate access to training discounts, video library, research, and more.

Find the right level of Membership for you.