Q&A: Data Warehouses and the Demands of Big Data
- By James E. Powell
- December 3, 2013
TDWI: Is big data the inevitable future for data professionals?
Jonas Olsson: Of course not. Big data and data warehousing are two different technologies solving two different sets of challenges. Big data focuses on volume and unstructured data; data warehousing focuses on structured data and traceability. Which technology will suit your organization best depends on many factors. Using traditional data warehouse technology to analyze sensor-generated data is probably not a good idea because of the high volume of data, just as using big data technology to perform regulatory reporting is not a good idea due to poor traceability.
Where does big data fall short?
It's not so much that big data falls short; it's more that people make the mistake of moving on to this complex thing without knowing whether there's a positive ROI. It's like building an addition to your house before you're done building the house. Companies still have a lot to do before they've fully capitalized on the potential of the data they already have, so why get so distracted by the shiny object?
Big data is a great technology for pattern recognition when mining large volumes of unstructured data. This is something that can be of great value, driving both tactical and strategic decisions. However, I don't think unstructured data will ever be the main source of strategic or tactical data for most enterprises. As the CEO or CFO of a Fortune 500 company, I would much rather have my operational data under control and be able to analyze it daily (with traceability), and that's what data warehouses are designed to do.
Is there too much big data hyperbole?
There are some statements that don't make any sense to me. Some big data proponents are trying to argue that causality (the why) is becoming irrelevant, and that with a big enough data set, correlation (the what) is sufficient. A simple example shows that this is incorrect.
Suppose I told you that there is a type of building where people are 100 times more likely to die than in an average apartment building. If you disregard the question of causality (why people are dying in this kind of building) and just focus on correlation (how many people die in different kinds of buildings), the decision will be to demolish the buildings where people are dying at high rates. Get rid of those buildings, and we get rid of deaths. The problem? The buildings being closed are hospitals.
What do data warehouses do better than big data?
The vast majority of data that companies need to analyze is structured data, and that's where data warehouses simply can't be beat.
Apart from analyzing structured data, I would say that traceability is the other key differentiator. If one needs to defend actions based on an analysis of data or to explain the numbers in a report -- true for virtually any company, particularly those in regulated industries like healthcare, energy or financial services -- then certainly the data warehouse is the answer.
How can data warehouses address the demands that only big data seems to answer?
One of the things people love about big data is the ability to simply dump a ton of data into storage and then figure out what to do with it later. From our experience, there's no reason data warehouses can't do this. The old extract, transform, and load (ETL) paradigm can just as easily be extract, load, and transform (ELT).
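To make the ETL/ELT distinction concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for a warehouse. It is an illustration only, not a description of any particular product: the table names (raw_sales, sales_by_region), the helper functions, and the sample rows are all hypothetical. The raw records are loaded untouched first, and the cleanup and aggregation are decided afterwards and run inside the database.

```python
import sqlite3

# Hypothetical raw records; the inconsistent formatting is deliberate.
RAW_ROWS = [
    ("2013-12-01", " north ", "100.50"),
    ("2013-12-01", "South", "20"),
    ("2013-12-02", "NORTH", "9.50"),
]


def load_raw(conn, rows):
    """Extract + load: store the records exactly as received."""
    conn.execute(
        "CREATE TABLE raw_sales (sale_date TEXT, region TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform, decided later: clean and aggregate inside the database."""
    conn.execute(
        """
        CREATE TABLE sales_by_region AS
        SELECT lower(trim(region)) AS region,
               SUM(CAST(amount AS REAL)) AS total
        FROM raw_sales
        GROUP BY lower(trim(region))
        """
    )
    return conn.execute(
        "SELECT region, total FROM sales_by_region ORDER BY region"
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_ROWS)   # E and L happen up front
result = transform(conn)   # T happens afterwards, on the stored data
print(result)
```

Because raw_sales is kept as loaded, a different transformation can be written later without re-extracting anything from the source systems. In an ETL arrangement, the cleanup would instead run before the insert, so only the cleaned rows would be stored.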
What are people not understanding when it comes to big data?
People talk about big data like we talk about space – as if it's the final frontier. I would argue that there are still entire continents to discover using the data we already have. It's as if the United States had given up the real benefits of westward migration to focus instead on the potential of space travel.
Sometimes people talk about big data as if it is simply data warehousing for large data sets, and it's not. There are fundamental differences between the technologies that make them suitable for different purposes.
What do you think the future looks like for data warehouses?
Because they've been in corporate environments for so long, data warehouses have become a bit stagnant and lost some of the innovative spirit that made them so successful in the first place. In comparison, big data becomes exciting in part simply because it's something new.
I think this is waking up a lot of people in data warehousing, and as a result, I think we'll see data warehouses start looking a little like big data, particularly in a transition from ETL to ELT. The old way may require our customers to do too much planning, whereas ELT allows some important decisions to be made later in the process.
In addition, companies simply cannot keep spending on data technology as they have done in the past; something has to give. The cost is becoming too great, and we as an industry need to find ways to predictably lower the cost of implementing and maintaining data technology. The traditional approach of creating your solution based on a generic platform and then spending countless hours filling in the gaps for your organization is becoming obsolete. It's simply too expensive.
To lower the costs, along with the overall risks of data warehouse projects, we will also see a rise in industry-specific platforms that act as software appliances as compared to today's generic platforms.
Jonas Olsson is the CEO and founder of Graz Sweden AB, a data warehouse software company. He can be contacted at firstname.lastname@example.org.