LESSON - Intelligent Data Warehousing with Extract Transact Predict Load
By Keith Laker, Oracle Warehouse Builder Product Management, Oracle
Every organization is struggling to ensure the provision of intelligent data warehouse (iDW) environments that contain as much embedded value as possible.
However, any embedded value is usually constrained by a number of typical data-related issues:
- Missing or incorrect data
- Lack of relevant business attributes
- Lack of focus on most relevant attributes, hierarchies, and levels
- Ability to located critical data in a timely fashion
Performing simple extract, transform, and load (ETL) operations as part of the data warehouse process is no longer sufficient.
Adding Value To The ETL Process
It is possible to add significant value to the data warehouse environment by incorporating predictive analytics directly within the normal data loading process. The whole ETL process simply expands to include a new, and critically important, stage—“P,” or predict.
By incorporating a predict stage as part of the upstream processing, all business users can benefit from the results, no matter which reporting tool they are using.
The key to adding value within a data warehouse is to do as much as possible,as close to the data as possible, as earlyin the process as possible.
Imagine if, as part of your ETL procedures, it were possible to predict new values for missing data points. That would help reduce the amount of revenue allocated to the ever-popular “unknown value” bucket. Imagine if it were possible to actually validate existing values to determine if they were in fact correct or, as happens most of the time, just garbage values. This elevates the whole ETL process to a whole new level and adds real business value.
By using predictive analytics, it is possible to automate the discovery of patterns using functions that can sweep through tables and identify previously hidden patterns. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. This information can be used to drive the creation of data-driven hierarchies that allow business users to directly monitor key buying patterns over time.
Other pattern discovery problems include detecting clusters within a column that can then be used to drive completely new hierarchies and drill paths.
Succeeding with ETPL
The following are key requirements for successfully extending existing ETL processes
to include a new predict stage:
- Integrated database engine
- Industry standard APIs
- Extensible ETL
Integrated Database Engine
To leverage the opportunities of ETPL, you must start with an integrated database engine that supports both storage and analysis of relational, spatial, multi-dimensional models, predictive analytics, and optimized ETL capabilities. Cross-fertilization of analysis then comes as standard, and because these objects are stored directly within the database, all the processing associated with these features (such as predictive analytics) can be performed directly within the ETL process as close to the data as possible.
Industry Standard APIs
The industry standard for interrogating and analyzing data stored in an RDBMS is SQL. By providing a SQL and PL/SQL interface to key features such as ETL and predictive analytics, it is possible to seamlessly integrate these powerful features using industry standard languages.
Extensible ETL
The ETL framework must be extensible and make use of the previous two key features. By using industry standard APIs to generate ETL scripts, additional capabilities such as predictive analytics can be integrated seamlessly. By leveraging an integrated database engine, you can safely implement ETPL today, without stretching your existing batch windows to the breaking point.
Delivering Intelligent Data Warehousing
The key to adding value within a data warehouse is to do as much as possible, as close to the data as possible, as early in the process as possible. This is only possible if your database vendor is ETPL enabled.
Only ETPL can help you deliver iDW.
This article originally appeared in the issue of .