Q&A: Extending Your Data Warehouse for Next-Generation BI
Does your enterprise need a more powerful data warehouse? Industry analyst Claudia Imhoff looks at what's driving the need for next-generation BI and the best practices your enterprise can follow to update your environment.
- By James E. Powell
- February 18, 2014
Is your data warehouse out of date? Has it stopped meeting your enterprise's needs? If so, you need not rip and replace your current environment. In her keynote with Colin White at the TDWI World Conference in Las Vegas, Claudia Imhoff will explore the new, extended DW architecture and the need to support robust and easy-to-maintain access to all data for effective decision making. Here she explains what is driving the need for an expanded data warehouse and the best practices for getting there.
BI This Week: The title of your keynote address is "Extending the Data Warehouse for the Next Generation of BI". Does this mean our current-generation data warehouses aren't yet obsolete? How much life do they have left?
Claudia Imhoff: Data warehouses certainly have a big role still to play in BI. However, they are no longer the only source of analytics for organizations. Our talk will cover the other areas ("extensions") where analytics are accruing. These include the operational environment and the investigative computing platform areas (the "Hadoop thingy" area!).
What are the major trends you see driving the need to extend DWs?
One factor is the availability of new technologies such as Hadoop, MapReduce, modern data warehouse appliances, and the cloud, along with new and unusual sources of data such as social media, sensor data, text, and more. These do not fit gracefully into traditional data warehouse architectures, hence the need to expand or extend the architecture to encompass these new areas.
Another factor is the rise of big data and, of course, data scientists. They are putting enormous demands on our data warehouses that may not be met. They need their own environments -- such as the investigative computing platform or sandboxes -- so they can analyze, experiment, and discover to their hearts' content without impacting the production BI occurring in the data warehouse.
What does an extended DW look like? How different is it from the DWs we use today?
There is still a data warehouse, but in addition there are areas for experimentation, as mentioned, and for truly real-time operational analytics such as complex event processing (CEP) and streaming analytics.
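To make the streaming analytics idea concrete, here is a minimal sketch of an average computed over a sliding time window as events arrive, rather than after they are loaded into a warehouse. The class name, the 10-second window, and the sample readings are hypothetical and chosen only for illustration; a production system would more likely use a dedicated CEP or stream-processing engine.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average over the most recent `window_seconds` of events."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Record one event and return the average over the current window."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


# Hypothetical sensor readings as (seconds, value) pairs.
window = SlidingWindowAverage(window_seconds=10)
readings = [(0, 20.0), (5, 22.0), (12, 30.0)]
averages = [window.add(t, v) for t, v in readings]
print(averages)  # [20.0, 21.0, 26.0]
```

Each result is available the moment its event arrives, which is the kind of workload a batch-loaded warehouse was not designed for.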
What are the biggest mistakes enterprises make when they try to extend their data warehouse?
Assuming that the traditional data warehouse architecture -- hub and spoke or bus -- can still handle all forms of new data, technologies, and analytics. It simply cannot. For example, it was never intended to perform real-time analytics on real-time data.
What best practices do you recommend to overcome these problems?
I would recommend two practices. First, understand the business problem you are trying to solve, then go after the appropriate technologies and techniques. Second, augment your existing traditional DW architecture with the new environments, but don't throw the baby out with the bath water.
If we extend our data warehouse, does that mean we have to update or extend all the tools we use to work with it?
No, but you may need to augment your tool box with the newer technologies. Again, this is not a rip and replace. It is an augmentation of the existing environment.
What if there are two or more trends pushing a data warehouse extension? Should these projects be done in tandem or serially?
They should be looked at together to see if there are synergies between them. It may be that they both require the investigative computing environment or streaming analytics. Why duplicate the technology and requirements gathering if you don't need to?
Just as with an older car, at some point we have to decide whether it's worth repairing or if it would be wiser to buy a newer car. How do we weigh the trade-offs, and at what point should an enterprise admit that it's time to replace the current DW with a brand new warehouse to handle today's requirements?
I suggest that this process is more of an evolution than a revolution. You can begin by adding new technologies onto your existing architecture. Bring in Hadoop for the investigative computing area. Once the data has been analyzed and certain pieces become valuable for standard analysis, they may be moved through the ETL grist mill and into the data warehouse. There are many ways to salvage your existing environment while extending it to take advantage of the features and functions available in the new technologies.
Also, vendors have made great strides in upgrading their existing databases: in-memory caches, data compression, data skipping, and columnar storage are now available in the relational database workhorses. You can upgrade from one version to the next fairly easily and get a lot of the benefits without tearing everything apart.
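The promotion path described in this answer, in which data explored in the investigative area is later moved through ETL into the warehouse, can be sketched roughly as follows. This is an illustrative sketch only: the in-memory SQLite database stands in for an existing warehouse, and the record fields, table name, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical raw records as they might sit in an investigative sandbox.
RAW_EVENTS = [
    {"customer": " alice ", "channel": "social", "sentiment": "0.8"},
    {"customer": "bob", "channel": "social", "sentiment": "n/a"},
    {"customer": "Carol", "channel": "sensor", "sentiment": "0.1"},
]


def transform(record):
    """Clean one raw record; return None if it cannot be conformed."""
    try:
        score = float(record["sentiment"])
    except ValueError:
        return None  # reject rows whose score is not numeric
    return (record["customer"].strip().title(), record["channel"], score)


def promote(records, conn):
    """Transform raw records and load the conformed rows into the warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_sentiment "
        "(customer TEXT, channel TEXT, score REAL)"
    )
    rows = [row for row in map(transform, records) if row is not None]
    conn.executemany("INSERT INTO customer_sentiment VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


warehouse = sqlite3.connect(":memory:")  # stand-in for the existing warehouse
loaded = promote(RAW_EVENTS, warehouse)
print(loaded)  # 2
```

The existing warehouse schema and tools stay in place; only the data that has proven its value in the sandbox is cleaned and added.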