Big Data is All the Rage, But Dimensional Data Warehouses Get the Job Done
Although there's considerable potential in big data, overinvesting in the technology can starve what produces consistent value and more predictable ROI: dimensional data warehousing.
By Aaron Fuller, Principal Consultant and Owner, Superior Data Strategies
"Big data" is pervasive in the headlines nowadays. The entire world is enthralled by the implications of this enormous, constant flow of variably structured information, as well as by the great and terrible things that can be and are being done with its power. There are mountains of data in every industry, from manufacturing to finance, retail to security, and the possibilities for what we can do with this information are endless. Big data may be all the buzz, but we must not let this excitement distract us from the reason we want data in the first place: to make better decisions.
Dimensional modeling -- the process of organizing an enterprise's information into facts (the things that we measure) and dimensions (the things that describe the measurements) -- isn't new anymore and it isn't generating big headlines. Great visionaries of data management laid out the approach two decades ago and although it has been refined and expanded in the time since, the basic idea has remained the same. Yet this fairly simple approach to building data marts and data warehouses has continuously proven its incredible value over time.
The concept of a dimensional model is beautiful in its simplicity: organize the data the way the business people imagine it. When a business decision maker asks a question of her data, she usually says, "I want to see such-and-such a measurement by dimension x, dimension y, and dimension z." For example, "I want to see the average sales transaction amount by customer demographic group, by store, and by month." When we build the database along these lines we make it easy for the user to understand and quick for the database engine to answer the question.
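To make the example question concrete, here is a minimal sketch of a dimensional (star) schema and the query that answers it, using Python's built-in sqlite3 module. The table names, column names, and sample rows are all hypothetical and chosen only for illustration; a real warehouse would have far richer dimensions and a different database engine.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table and three dimension tables (all names hypothetical)."""
    conn.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, demographic_group TEXT);
        CREATE TABLE dim_store    (store_key    INTEGER PRIMARY KEY, store_name TEXT);
        CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, month TEXT);
        -- The fact table holds the measurement plus a key into each dimension.
        CREATE TABLE fact_sales (
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            store_key    INTEGER REFERENCES dim_store(store_key),
            date_key     INTEGER REFERENCES dim_date(date_key),
            sales_amount REAL
        );
    """)
    # Made-up sample data, just enough to show the grouping.
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(1, "18-34"), (2, "35-54")])
    conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                     [(1, "Downtown"), (2, "Mall")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(20240105, "2024-01"), (20240120, "2024-01"), (20240210, "2024-02")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (1, 1, 20240105, 20.0),
        (1, 1, 20240120, 40.0),
        (2, 1, 20240105, 100.0),
        (1, 2, 20240210, 10.0),
    ])


def average_sale_by_group_store_month(conn):
    """Average sales amount by demographic group, by store, and by month."""
    return conn.execute("""
        SELECT c.demographic_group, s.store_name, d.month, AVG(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        JOIN dim_store    AS s ON s.store_key    = f.store_key
        JOIN dim_date     AS d ON d.date_key     = f.date_key
        GROUP BY c.demographic_group, s.store_name, d.month
        ORDER BY c.demographic_group, s.store_name, d.month
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
for row in average_sale_by_group_store_month(conn):
    print(row)
```

The query reads almost like the business question itself: the measurement comes from the fact table, and each "by" in the question becomes a join to a dimension and an entry in the GROUP BY clause.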
There are still major investments being made in dimensional data marts and data warehouses. They haven't lost their relevance or popularity, only their mindshare. Big data is the sexy new concept and everyone is pushing their money that way, but sometimes your best investment isn't in the next big thing; it's in the proven thing. While companies and executives are focusing on big data, they're often underinvesting in dimensional data warehouses.
Think about it. You're an executive. You have a certain amount of money to spend in data management and analytics. You could spend those dollars enhancing your existing data warehouse so the company is capturing more information about products or purchases and enhancing the warehouse's capability to answer questions, or you can invest in a "big data" project and start mining social media feeds for mentions of the company and its products.
There are many mistaken reasons to pull funding from traditional data warehousing and push it toward big data. Data warehousing is difficult and expensive, and if a company hasn't built the underlying capabilities (such as strong metadata management; a productive, well-trained team; well-reasoned and right-weight development and support processes; and a powerful storage and server infrastructure), the company may not feel that its investments in data warehousing are producing the desired results.
A new "big data" environment, such as a Hadoop cluster, may seem like a much more productive area for investment in new capabilities, and it's likely the company has its best and brightest people working on it, has made a large initial allocation of storage and server resources, and lacks legacy systems and processes to slow down progress. By the time the new platform loses its new-car smell, it'll begin suffering from the same issues the data warehouse experienced. This initial burst of productivity won't ultimately be worth it when the company ends up with two completely separate and very different data architectures being used to accomplish the same sorts of business purposes.
The truth is that despite the significant differences in data architectures, tools, and development processes, the challenges in succeeding at data warehousing and at big data are similar. How will we govern the acquisition, cleansing, and use of information? How can we measure the quality of the data? How can we make the information produced easy to consume for our user community? What is the right amount of storage and server resources to allocate to data platforms that seem to have an almost infinite demand? What enterprises really need are common approaches to these fundamental data management challenges and an enterprise data architecture that uses each methodology to provide the capabilities it is best suited for.
Big data technologies are not in any way a replacement for the dimensional data warehouse. Instead, they're another new enhancement to the enterprise data architecture, just like dimensional data warehouses were 20 years ago. Successful data warehouses have not been developed in isolation from the rest of enterprises' data architectures or with the goal of replacing the rest of the architecture. Rather, they've been tightly integrated over time into the architecture and business processes of companies, focusing on doing what they do best. Big data platforms should do the same.
We need to consider the real reasons why big data has taken up so much of the attention of business and technology leaders over the last several years. There are many legitimate innovations and improvements in capability that have been achieved. There's no denying that for certain types of advanced analytics purposes, the improvements have been significant.
However, there are several other less-compelling reasons it has so much mindshare right now.
One is that software vendors have put resources into pushing big data as a reason companies should increase their software spending, upgrade to new versions, and switch tools and platforms. The industry has a clear reason to encourage any type of hype that increases its revenue. No one should blame vendors for wanting to make money, but we have to keep their motives in mind when we hear their messaging, and we need to recognize that much of the IT-related press is sponsored in both direct and indirect ways by the vendors.
The more insidious reason for all of the attention is that underperforming business and technology leaders love to look for the magic bullet that will make their jobs easy. They would rather blame the old data warehousing platform and believe that a shiny new toolset will make them successful than admit that they mismanaged the old platform and never gave it a chance to succeed.
Although big data often provides valuable insights, for some companies it's a technology looking for a need rather than a means of collecting information to answer the right questions. There's a lot of potential in big data, but overinvestment can starve the thing you know produces consistent value and more predictable ROI: dimensional data warehousing.
You have to find the right balance between exploring big data and analyzing the information in the data warehouse to make business decisions. The whole point of doing anything with data is to make better decisions, and too often we get hung up on the hot technology and lose sight of what's going to provide us with the most value in our decision making.
Although concepts such as big data are all the buzz, let's not forget where the honey is, where it's been for the last 20 years, and where we see no end to that supply of value.
Aaron Fuller is the principal consultant and owner at Superior Data Strategies and is responsible for guiding clients toward reliable and valuable business solutions for their data warehousing, business intelligence, and enterprise architecture programs. Fuller is skilled in dozens of software packages, databases, standards, and methodologies and serves as a faculty member at TDWI. You can reach him at firstname.lastname@example.org.