LESSON - Extract, Transform, and Load and the Batch Window Challenge: A Practical Approach to BI Process Efficiency
By Derek Evan, Solutions Architect, Cisco
Despite the rise in real-time operational business intelligence (BI), the majority of BI implementations still depend heavily on batch processes. Creating an efficient process is key to meeting the challenge of timely information delivery within ever-shrinking batch windows.
With potentially hundreds of data sources, large volumes of data, and complex dependencies in the ETL process, there are many points where delays or errors can occur. Even when delays and errors are not factors, large variations in job running times break processes built on traditional time-based scheduling. These variations force longer-than-needed “wait” times between steps and can make it impossible to complete processing in the available window.
In addition, significant manual effort is required to “babysit” processes, monitoring every step to ensure it completes in the manner expected, managing process step dependencies, and rescheduling or manually triggering steps when the process doesn’t follow previously scheduled timelines. This is true even when using a packaged ETL tool; although these tools do a good job of managing the steps within the ETL process, they are unable to coordinate the ETL process with predecessor jobs in other applications, or with reporting solutions on the tail end.
Centralized and standardized event-based job scheduling eliminates several of these issues. A job scheduling solution that connects to different parts of the BI and ETL environment as well as other applications and data sources outside of the environment can be used to build an automated process that eliminates wait states, manages complex dependencies, and sends alerts when things don’t happen as expected.
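To make the cost of wait states concrete, the following is a minimal sketch comparing a fixed time-slot schedule with an event-based one. The function names, the slot lengths, and the job durations are illustrative assumptions, not figures from any real deployment or product.

```python
def simulate_time_based(slot_minutes, actual_minutes):
    """Model a schedule where each step owns a fixed clock slot.

    Returns (window_used, idle_minutes, overrun_steps). A step that runs
    longer than its slot is reported as an overrun, because the next step
    would start before its predecessor has finished.
    """
    idle = 0
    overruns = []
    for step, (slot, actual) in enumerate(zip(slot_minutes, actual_minutes)):
        if actual <= slot:
            idle += slot - actual   # padding that nobody used
        else:
            overruns.append(step)   # time-based schedule breaks here
    return sum(slot_minutes), idle, overruns


def simulate_event_based(actual_minutes):
    """Model a schedule where each step starts when its predecessor ends."""
    return sum(actual_minutes)


# Hypothetical numbers: three steps with padded slots and one slow night.
slots = [60, 90, 45]
actuals = [40, 95, 30]
window, idle, overruns = simulate_time_based(slots, actuals)
event_window = simulate_event_based(actuals)
```

With these assumed numbers, the time-based schedule reserves 195 minutes, wastes 35 of them as padding, and still breaks at the second step, while the event-based chain finishes in 165 minutes.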
For instance, consider the arrival of a file on an FTP server that needs to trigger a job in the ERP system. On completion, the ERP job needs to trigger an ETL workflow that has other conditions that must be met. The data loaded by the ETL is then used to build a cube, upon which reporting and interactive analysis are performed. This entire process can be completely automated by an enterprise distributed job scheduler that connects to all of the required business process components: FTP, ERP, ETL, and BI. No manual intervention is required to move from one part of the process to the next.
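The chain described above can be sketched as a dependency graph in which each job runs as soon as its predecessors complete. This is only an illustration using Python's standard `graphlib` module (Python 3.9 or later); the job names, the extra `reference_data_ready` condition, and the `run_job` callback are hypothetical stand-ins, not the interface of any particular scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the set of jobs it must wait for.
PIPELINE = {
    "ftp_file_arrival": set(),
    "reference_data_ready": set(),   # assumed "other condition" for the ETL
    "erp_job": {"ftp_file_arrival"},
    "etl_workflow": {"erp_job", "reference_data_ready"},
    "cube_build": {"etl_workflow"},
    "reporting": {"cube_build"},
}


def run_pipeline(pipeline, run_job):
    """Run every job once all of its predecessors have completed.

    No clock times are involved: completion of one job is the event
    that releases its successors.
    """
    sorter = TopologicalSorter(pipeline)
    sorter.prepare()                 # raises CycleError on circular dependencies
    completed = []
    while sorter.is_active():
        for job in sorter.get_ready():
            run_job(job)             # a real scheduler would dispatch remotely
            completed.append(job)
            sorter.done(job)         # signals completion, unblocking successors
    return completed


order = run_pipeline(PIPELINE, lambda job: None)
```

A production scheduler would also handle failures, retries, and alerting; the sketch shows only the dependency-driven hand-off between steps.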
Centralized job scheduling also helps manage resources. For instance, it can prevent two CPU-intensive jobs (with different dependencies) from running concurrently and creating resource contention, which could cause jobs to fail or run too long.
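One simple way to express this kind of resource rule is a shared limit on how many CPU-intensive jobs may run at once. The sketch below uses a semaphore from Python's standard library within a single process; the function name, job names, and the limit of one are assumptions for illustration, and a distributed scheduler would enforce the limit across machines instead.

```python
import threading
import time


def run_with_cpu_limit(jobs, max_concurrent=1):
    """Run (name, work) pairs in threads, capping how many run at once.

    Returns a log of ("start", name) and ("end", name) events in the
    order they occurred.
    """
    slots = threading.Semaphore(max_concurrent)
    log = []

    def worker(name, work):
        with slots:                      # wait here until a slot is free
            log.append(("start", name))
            work()
            log.append(("end", name))

    threads = [threading.Thread(target=worker, args=job) for job in jobs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


# Hypothetical CPU-heavy jobs, simulated with a short sleep.
heavy_jobs = [
    ("cube_build", lambda: time.sleep(0.01)),
    ("etl_aggregation", lambda: time.sleep(0.01)),
]
events = run_with_cpu_limit(heavy_jobs, max_concurrent=1)
```

With a limit of one, the log always shows one job ending before the other starts, even though both were submitted at the same time.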
Standardizing on a cross-platform, cross-application job scheduler to manage the ETL and BI processes makes the process quicker and less error prone. It also provides a single console from which to view and monitor progress, and it frees BI operational staff to focus on strategic tasks that support business needs and increase the value of BI for the business.
Look for a job scheduling solution that is easy to use and has wide platform coverage and built-in integration for your chosen BI tools, and a vendor that has experience setting up BI and ETL integration to get the maximum value in the least amount of time.
A free white paper on this topic is available from Cisco: “BI and ETL Process Management Pain Points: A Look at the Most Pressing Pain Points, and Strategies for Addressing Them.”