Managing System Change
BI environments are like personal computers: after a year or two, performance starts to degrade and you are never quite sure why. The best explanation is that these systems start accumulating a lot of “gunk” that is hard to identify and difficult to eliminate.
Personal computers, for example, become infected with viruses, spyware, and other malware that wreak havoc on performance. But we cause many problems ourselves by installing lots of poorly designed software, adding too many memory-resident programs, accidentally deleting key system files, changing configuration settings, and failing to perform routine maintenance. And when the system finally freezes up, we execute unscheduled (i.e., three-finger) shutdowns, which usually compound performance issues. Many of us quickly get to the point where it’s easier and cheaper to replace our personal computers than to try to fix them.
Unfortunately, BI environments are much harder and more expensive to return to a pristine state. Over time, many queries become suboptimal because of changes we make to logical models, physical schemas, or indexes, or because we create incompatibilities when we upgrade or replace drivers and other software. Each time we touch any part of the BI environment, we create a ripple effect of problems that makes IT averse to making any changes at all, even to fix known problems! One data architect recently confessed to me, “I’ve been trying 10 years to get permission to get rid of one table in our data warehousing schema that is adversely affecting performance, but I haven’t succeeded.”
But when IT is slow to make changes and maintenance efforts begin to dwarf development initiatives, the business revolts and refuses to work with IT or fund its projects.
The architect quoted above said the solution is “better regression testing.” The idea is that if we perform continuous regression testing, IT will be less hesitant to change things because it will quickly see whether the impact is deleterious or not. However, this is like using a hammer and chisel to chop down a tree. It will work, but it’s not very efficient.
The better approach is to implement end-to-end metadata so you can see what impact any change in one part of the BI environment will have on every other part. Of course, a metadata management system has been an elusive goal for many years. But we are starting to see new classes of tools emerge that begin to support impact analysis and data lineage. ETL vendors, such as Informatica and IBM, have long offered metadata management tools for the parts of the BI environment they touch. And a new class of tools that I call data warehouse automation tools, which automatically generate star schemas and semantic layers for reporting, also provides a glimmer of hope for easier change management and reporting. These tools include Kalido, BI Ready, WhereScape, and Composite Software with its new BI Accelerator product. You’ll hear more about these tools from me in future blogs.
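To make the idea of impact analysis concrete, here is a minimal sketch of what end-to-end metadata buys you. It is not how any of the products named above work; it simply assumes that lineage metadata can be represented as a graph in which each object lists the objects it feeds. The object names (`src.orders`, `dw.fact_sales`, and so on) and the `downstream_impact` function are hypothetical, invented for illustration.

```python
from collections import deque

# Hypothetical lineage metadata: each key feeds the objects listed in its value.
# A real BI environment would load this from a metadata repository.
LINEAGE = {
    "src.orders": ["stg.orders"],
    "stg.orders": ["dw.fact_sales"],
    "src.customers": ["dw.dim_customer"],
    "dw.dim_customer": ["semantic.sales_model"],
    "dw.fact_sales": ["semantic.sales_model"],
    "semantic.sales_model": ["report.monthly_revenue", "dashboard.exec_summary"],
}


def downstream_impact(lineage, changed):
    """Return every object downstream of `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    impacted = []
    while queue:
        node = queue.popleft()
        # Objects with no entry in the mapping feed nothing further.
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


if __name__ == "__main__":
    # Which objects are affected if we alter the customer dimension table?
    for obj in downstream_impact(LINEAGE, "dw.dim_customer"):
        print(obj)
```

With a map like this in hand, the architect who wants to drop a table can list every semantic model, report, and dashboard that depends on it before touching anything, rather than discovering the ripple effect afterward through regression tests.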
Posted by Wayne Eckerson on May 6, 2009