Data Warehouse Architecture: Anticipating Future Needs
Even if their existing data warehouse architecture satisfies current requirements, organizations need to be prepared to address their future needs.
- By Mike Schiff
- September 4, 2012
Although no one can totally predict tomorrow's data warehouse requirements, there are some aspects that we can accurately anticipate. For example, we can expect new data sources, greater data volumes, the need for faster analyses, and new data warehouse database engines. To better address their future needs, organizations need to take steps to prepare for them today.
Examples of Change
Decades ago, most operational systems used network or hierarchical databases (e.g., IDMS and IMS/DB, respectively) or even flat files, while most analytic databases were relational or multidimensional (e.g., Express, Essbase). Over the years, relational databases continued to evolve with extensions such as bit-mapped indices and column-based architectures that greatly enhanced performance in analytic (but not necessarily operational) environments. Advances such as massively parallel processing and in-memory technology combined with 64-bit addressing have benefited both operational and analytic applications.
Databases have evolved and will continue to evolve. Although it is currently debated whether the "no" in "NoSQL" stands for "not" or "not only," we can be certain that these databases represent additional data structures that were, for the most part, not present a few years earlier. Furthermore, although we don't know which specific data structures we will need to integrate, access, and analyze in the future, we can be certain that they will be a superset of those we access today. Legacy databases are not likely to serve as the underlying data warehouse database, but the data they contain will continue to be a data source, at least until the applications that use them are retired.
Consider the attention currently given to unstructured big data. Organizations that a year or two ago were showing cursory interest in big data and associated enabling technologies (such as Hadoop and MapReduce) now recognize the need to integrate and analyze vast volumes of both structured and unstructured data. Furthermore, many of these organizations wish to directly and quickly analyze data stored in non-relational data structures without having to first extract data from these sources and load the data into a relational database. This may require modifying their existing data warehouse architectures to accommodate their latest needs.
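To make the idea of analyzing data where it lives more concrete, here is a minimal sketch of the map-and-reduce pattern applied directly to semi-structured records, with no relational load step. It is plain Python rather than Hadoop, and the record layout, the field names (customer, event, amount), and the function names are all hypothetical, chosen only for illustration.

```python
import json
from collections import defaultdict
from functools import reduce

# Hypothetical raw records, as they might sit in a log file or document store.
# The fields vary from record to record, and one line is not valid JSON at all.
RAW_RECORDS = [
    '{"customer": "acme", "event": "order", "amount": 120.0}',
    '{"customer": "acme", "event": "return", "amount": 20.0, "reason": "damaged"}',
    '{"customer": "globex", "event": "order", "amount": 75.5}',
    'not valid json',
    '{"customer": "globex", "event": "order", "amount": 24.5, "channel": "mobile"}',
]


def map_record(line):
    """Map step: emit a (customer, amount) pair for each usable order record."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return []  # skip lines that cannot be parsed
    if not isinstance(record, dict) or record.get("event") != "order":
        return []  # skip non-order events and unexpected shapes
    if "customer" not in record or "amount" not in record:
        return []  # skip orders missing the fields we aggregate on
    return [(record["customer"], record["amount"])]


def reduce_pairs(totals, pair):
    """Reduce step: fold one (customer, amount) pair into the running totals."""
    customer, amount = pair
    totals[customer] += amount
    return totals


def order_totals(lines):
    """Run the map step over every line, then reduce the pairs to totals."""
    pairs = (pair for line in lines for pair in map_record(line))
    return dict(reduce(reduce_pairs, pairs, defaultdict(float)))


if __name__ == "__main__":
    print(order_totals(RAW_RECORDS))
```

The point of the sketch is that the schema is applied at read time: records with extra fields, missing fields, or malformed content are handled by the analysis itself rather than by an upfront extract-and-load job. A framework such as Hadoop distributes the same two steps across many machines.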
Cloud technology is no longer just a low-cost solution for organizations that lack an IT infrastructure or don't want to invest in on-premises hardware and software. Cloud technology (which I believe derives from the timesharing days of the late 1960s and early 1970s, but with connections made over the Internet rather than over telephones and 120-baud acoustic couplers!) is now gaining wide acceptance as a component of an organization's overall data warehousing architecture.
Consider a Vendor's Track Record of Responding to Change
One way to be better prepared for dealing with future requirements is for organizations to consider, as part of their product selection criteria, a vendor's track record of responding to new technology.
For example, when evaluating data integration vendors, consider how quickly the vendor was able to work with new data sources. For database vendors, consider how rapidly the vendor incorporated new data structures (e.g., XML), enhanced its offering with features such as column-orientation, or extended its SQL syntax to incorporate MapReduce functions. For BI vendors, consider how soon a vendor was able to directly work with new database offerings (e.g., Hadoop). Organizations should also seek out business intelligence vendors that have embraced mobile technology across a variety of portable devices; vendors with a proven track record of working with today's devices will likely be able to work with tomorrow's devices as well. Organizations should also consider a vendor's current or planned cloud-based offerings.
One of the hallmarks of an agile data warehousing environment is its ability to respond quickly to changing user requirements. Organizations need to create data warehouse architectures that address their current requirements and that are flexible enough to respond quickly to their future needs as well. One way to make this possible is to work with vendors that have a proven track record of rapidly responding to, and working with, the continuing evolution of data warehouse technologies.