EXCERPT - Introduction to Next Generation Data Warehouse Platforms
By Philip Russom
If you’re a data warehouse professional—or you work closely with one—you’ve probably noticed the many new options for data warehouse platforms that have appeared this decade.
We’ve seen the emergence of new categories of data warehouse (DW) platforms, such as data warehouse appliances and software appliances. A new interest in columnar databases has led to several new vendor products and renewed interest in older ones. Open source Linux is now common in data warehousing, and open source databases, data integration tools, and reporting platforms have come out of nowhere to establish a firm foothold. In the hardware realm, 64-bit computing has enabled larger in-memory data caches, and more vendors now offer MPP architectures. Leading database vendors have added more features and products conducive to data warehousing.
Those are mostly features within the data warehouse platform, especially its database. There are also growing practices that are demanding support from the platform, including real-time integration between the data warehouse platform and operational applications, various types of advanced analytics, and reusable interfaces exposed through Web services or service-oriented architecture (SOA). Furthermore, a number of data warehouse platforms and other business intelligence platforms are now readily available through software-as-a-service (SaaS) and cloud computing.
The good news is that the options for data warehouse platforms have recently become far more numerous. The bad news is that it’s difficult for data warehouse professionals and their business sponsors to keep track of these advancements and select the ones that are appropriate for their needs.
To help organizations understand the many new options available to them, this report catalogs the new data warehouse platform products, features, and techniques that have appeared this decade, plus notable advances in more established data warehouse platforms. As examples, the report mentions many vendors and their products. From the survey data cited here, you’ll see that many organizations are planning the next generation of their data warehouse, and this report provides information that can be instrumental for such planning. The focus is on technology, but this report also explains how technology’s adoption in next generation data warehouse platforms is driven by real-world business and organizational needs and requirements.
Definitions of Terms and Concepts
DATA WAREHOUSE PLATFORM
For the purposes of this report, a data warehouse platform consists of one or more hardware servers, an operating system, a database management system (DBMS), and data storage. These communicate via a LAN or WAN, although a multi-node data warehouse platform may have its own specialized network. Note that a data warehouse platform manages a data warehouse, defined as a collection of metadata, data model, and data content, designed for the purposes of reporting, analyzing information, and making decisions. But the data warehouse is not part of the platform per se. (See Figure 1.) All these components and more have seen generational advances in recent years.
GENERATIONS OF DATA WAREHOUSES
TDWI’s position is that certain relatively new technologies, techniques, and business practices are driving the majority of data warehouses and their platforms toward a redesign, major retrofit, or even replacement that we can recognize as a generation. TDWI takes the term literally, meaning that the current generation of a data warehouse will beget the next generation. In many cases, generational change is an evolutionary process that adapts the resulting data warehouse to changing business and technology requirements. In fact, generational change is often driven by these requirements, as is explained in detail in the next section of this report. In other cases, generational change is more of a maturation process that steps a data warehouse through multiple stages of a lifecycle.
NEXT GENERATION DATA WAREHOUSE PLATFORMS
What’s next for a given organization’s data warehouse platform can vary tremendously. For example, a next generation data warehouse platform may tap into leading-edge features, such as appliances, open source, and cloud computing. It may simply get you caught up with somewhat more established practices for real-time operation, advanced analytics, and services. Sometimes, the next generation addresses administrative issues, such as hardware upgrades (from 32-bit to 64-bit), data migrations (from one DBMS to another), or architectural changes (from SMP to MPP).
So, let’s keep in mind that a next generation data warehouse platform is a relative concept, because it depends on where you’re starting, what new requirements you must address, and how many resources you have.
WHY CARE ABOUT DATA WAREHOUSE PLATFORMS NOW?
- Businesses face change more often than ever before. Recent history has seen businesses repeatedly adjusting to boom-and-bust economies, a recession, financial crises, and shifts in global dynamics or competitive pressures. Increasingly, businesses rely on the data warehouse and related business intelligence infrastructure to understand change and react.
- DW platforms need updating to support changing business requirements. In fact, many of the technologies associated with the next generation DW relate to change in some way, such as advanced analytics, scalable architectures, virtualization methods, reusable services, real-time integration with operational applications, and so on.
- Successful DWs mature through multiple lifecycle stages. This usually provokes changes in the underlying DW platform and elsewhere in the business intelligence (BI)
- There’s probably a new generation in your near future. TDWI survey data shows that almost half of respondents are planning a data warehouse platform replacement in 2009–2012. Many others anticipate keeping their current platforms, but updating them significantly.
USER STORY: MANAGEMENT REQUIREMENTS OFTEN DICTATE THE DESIGN OF A NEXT GENERATION DW AND ITS PLATFORM.
“We pulled together our current data warehouse a couple of years ago,” said Karl Mikula, the data and BI manager at Hagerty Insurance Agency, America’s leading provider of products and services for collectors of classic cars and boats. “Now that the company sees the value, we’re building our next generation data warehouse and BI solution atop a platform that’ll do what the company needs. In a nutshell, upper management wants to adapt a performance management methodology with scorecards. And they want self-service BI, where they can search a repository and pull data into reports or spreadsheets of their own design, presented through a corporate portal. To support this, we’re designing a data warehouse that stores metrics and KPIs in a searchable repository. For the next generation platform, we have a database management system, a data integration tool, a reporting tool, a search engine, and an enterprise portal. All these come from Microsoft, and they’re all tightly integrated out of the box.”
Philip Russom is the senior manager of TDWI Research at The Data Warehousing Institute, where he oversees many of TDWI’s research-oriented publications, services, and events.
This article was excerpted from the full, 32-page report, Next Generation Data Warehouse Platforms. You can download this and other TDWI Research free of charge at tdwi.org/research/reportseries. The report was sponsored by Aster Data Systems, HP, IBM, Infobright, Kognitio, Microsoft, Oracle/Intel, Sybase, and Teradata.