Relieving the Pain of the BI Back Room with Data Warehouse Automation
Data warehouse automation is more than simply automation of ETL development. It automates the entire data warehousing life cycle from planning, analysis, and design through development and extending into operations, maintenance, and change management. Dave Wells explains the benefits of the technology.
By Dave Wells
[Editor's note: Dave Wells will discuss the productivity gains, cost savings, and quality improvement possible with data warehouse automation in his session Data Warehouse Automation: Better, Faster, Cheaper ... You Can Have It All at the TDWI World Conference in Chicago (May 11-16, 2014).]
As the business-facing capabilities of BI -- dashboards, scorecards, analytics, etc. -- become more demanding and the data sources more diverse and complex, it is increasingly difficult for the data warehouse to keep pace with new and changing requirements. The common and long-standing problems with data warehouses are that they take too long to build, they cost too much to build, and they're too hard to change after deployment.
Increasing expectations on the front-end of BI combine with expanding data sources at the back-end to amplify these problems. The data warehouse is caught in the middle and simply can't respond quickly enough using traditional methods of development, operation, and maintenance. Fortunately there is an alternative to conventional methods. Data warehouse automation -- a relatively mature but underutilized technology -- is a proven and effective way to resolve the top three challenges of data warehousing.
What is Data Warehouse Automation?
Data warehouse automation uses technology to gain efficiencies and improve effectiveness in data warehousing processes. Data warehouse automation is more than simply automation of ETL development. It automates the entire data warehousing life cycle from planning, analysis, and design through development and extending into operations, maintenance, and change management.
Adoption of data warehouse automation changes the way we think about building data warehouses. The widely accepted practice of extensive up-front analysis, design, and modeling is left behind as the mindset changes from "get it right the first time" to "develop fast and develop frequently." This approach fits neatly with agile development practices, yet data warehouse automation doesn't demand that you go agile. You can achieve substantial speed, quality, and cost savings without fully embracing and implementing an agile methodology.
Automation in data warehousing has many of the same benefits as in manufacturing:
- Increased productivity and speed of production
- Reduction of manual effort
- Improved quality and consistency
- Better controls and process optimization opportunities
The manufacturing parallel holds true when building a data warehouse; we can think of it as an information factory. However, data warehousing is more complex than product manufacturing. Manufactured products are typically delivered to a consumer and the job is done. Data warehouses must be sustained through a long life cycle where changes in source data, business requirements, and underlying technologies are ongoing considerations. Automation helps to implement the right changes in the right ways and as quickly as they are needed.
The Foundation
Design patterns, standards, metadata, and reuse are the cornerstones of data warehouse automation. Data warehousing offers the opportunity to apply many design patterns including:
- Architectural patterns such as hub-and-spoke, bus, and hybrid architectures
- Data structure patterns including normalized, de-normalized, and multi-dimensional
- Data management patterns such as those for key management and time variance
- Data integration patterns including ETL, ELT, virtualization, and federation
Design patterns can be coupled with architecture, design, and implementation standards and best practices to build reusable data warehousing components. The patterns, standards, and components are captured and described as metadata. Then each use of components and the relationships among them are described with more detailed and implementation-specific metadata. A good data warehouse automation platform makes it easy to cross the gap from reusable to reused.
The Technology
There are several data warehouse automation tools in the marketplace -- some embedding automation functions into a broader tool suite and some built specifically for pure-play automation. I occasionally meet people who have built their own data warehouse automation functions -- an approach that is certainly possible but often impractical and not the best use of developer time and skills. In general, the pure-play automation tools fall into two categories:
- Open tools where the generated data warehouse schema and processing logic uses standard SQL and scripting languages that can be viewed and even directly changed by developers
- Proprietary tools where the generated data warehouse components are "black box" and capabilities to view and change are provided entirely and exclusively through the tool
If you want to learn more about data warehouse automation, consider joining me at TDWI's World Conference in Chicago for a full day of data warehouse automation education and an opportunity to take a look at some of the leading products: BIReady, Kalido, timeXtender, and WhereScape.
Dave Wells is a consultant, mentor, and teacher in the field of business intelligence. You can contact the author at [email protected].