Can the New Thing Replace the Old Thing?
This is an important question to ask about many new IT systems and tools. The answer depends on many factors -- including forming the question properly.
- By Philip Russom
- July 20, 2020
A common question we hear at TDWI is: Can [the new thing] replace [my old thing]? We hear variations of this question a lot because when it comes to IT, data management, and analytics, there are so many new data platforms, tool types, design patterns, and methodologies -- and new things just keep coming.
We're lucky to have so many innovations and options, but we're also a bit confused as to how and when we should adopt the new things.
Answering the question is problematic because of the many factors involved and the challenge of asking the question in a way that makes sense. Here are some general factors you should consider:
New technology adoptions tend toward accretion, not displacement. In other words, in IT, new systems coexist with and complement older systems more often than they replace them.
Displacement is messy, takes time, and evades consensus. Traditionally, a "server on the side" (SOS) is easier for a vendor to sell and for IT to get approved. That is because an SOS adds business value in a non-invasive manner, whereas migration and consolidation projects are long, risky, and kill off mature but valuable systems.
Even when displacement occurs, it takes time to play out. For example, forklifting data from one platform to another has become easier and more reliable in recent years because of new tools, practices, and the power of cloud data platforms. Even so, the complexity of data domains, structures, platforms, applications, and user communities can easily require a multiphase, multiyear project.
Many replacement questions include flawed logic. Expecting an orange to replace an apple is not unreasonable because both are fruits and both supply the same nutritional components (although at different levels). However, expecting a building's foundation to replace its roof is ludicrous because the two are not functionally equivalent despite being part of the same structure. Likewise, it is a false equivalency to equate the data of a data warehouse with the database platform that manages that data.
IT system replacement decisions should be based on requirements. Don't be seduced by new shiny objects, the allure of being "modern," or your need for a new item on your resume. Adopting a new technology should address business requirements first, technology requirements second.
Question Variations (and Answers)
At TDWI, this question takes a number of forms relative to the data warehouse and analytics. Here are some common versions (with answers).
Can Hadoop replace my data warehouse?
This question doesn't make sense because the data warehouse is the data (plus data models, architecture, and metadata), whereas Hadoop is a data platform that can be used to manage and leverage a warehouse's data.
A more logical question (involving two data platforms that are roughly equivalent) would be: Can Hadoop replace my relational database? Similarly, you might ask: Can I successfully migrate my data warehouse to Hadoop?
The short answer is: Probably not, especially if your data warehouse or other use case has substantial relational or SQL requirements. Although relational support can be retrofitted onto Hadoop (via Apache tools, for example), the support is disappointing for people spoiled by mature relational databases.
However, the answer is different when a so-called data warehouse turns out to be a bucket of operational data stores and other simple table structures. Hadoop extended with Apache tools for row stores can handle this kind of data just fine.
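The distinction drawn above, between the warehouse as data and Hadoop as a platform, can be sketched in a few lines of Python. This is only an illustration: the class names, fields, and sample values are hypothetical, and the `supports_mature_sql` flag simply encodes the opinion stated in this article rather than a measured fact.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Platform:
    """A data platform that manages data (hypothetical model)."""
    name: str
    supports_mature_sql: bool  # illustrative flag, not a benchmark result


@dataclass(frozen=True)
class DataWarehouse:
    """The warehouse is the data plus models, architecture, and metadata."""
    datasets: Tuple[str, ...]
    data_models: Tuple[str, ...]
    metadata: Tuple[str, ...]
    platform: Platform  # where the warehouse currently lives


def migrate(warehouse: DataWarehouse, target: Platform) -> DataWarehouse:
    """Swap the platform; the warehouse content itself is unchanged."""
    return replace(warehouse, platform=target)


rdbms = Platform("relational database", supports_mature_sql=True)
hadoop = Platform("Hadoop", supports_mature_sql=False)

warehouse = DataWarehouse(
    datasets=("sales", "customers"),
    data_models=("star schema",),
    metadata=("business glossary",),
    platform=rdbms,
)

migrated = migrate(warehouse, hadoop)
print(migrated.platform.name, migrated.datasets)
```

In this sketch, only a platform can take the place of another platform. The warehouse is what gets migrated, which is why "Can Hadoop replace my relational database?" is the better-formed question.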
Can a cloud data platform replace Hadoop?
This boils down to a data lake issue because the data lake design paradigm evolved on Hadoop and many lakes are still on Hadoop. However, due to current dissatisfaction with the costs of HDFS on premises and the weak relational support in Hadoop tools, TDWI sees many of its members migrating their data lakes from on-premises Hadoop to cloud data platforms. In other words, lake users remain committed to the lake design pattern, but their data lakes need a scalable platform with deep relational functionality at a reasonable cost -- and that's what they get from a cloud data platform.
Can a cloud data platform replace my on-premises relational database?
We indirectly answered "yes" to this question in our answer to the previous question. Vendor products in the latest generation of cloud data platforms are all database management systems (DBMSs) that support SQL and other accoutrements of the relational paradigm. In essence, these are all cloud RDBMSs. Furthermore, all cloud data platforms have already achieved impressive maturity in their support of relational functionality, plus they support common use cases in data warehousing, data lakes, analytics, and operational data.
Can a data lake replace my data warehouse?
Lakes and warehouses are easily confused because they have similar requirements for scalability, use cases, platforms, data integration, and the relational paradigm. However, they fulfill distinctly different functions.
Most data warehouses focus on well-known data that is aggregated, transformed, cleansed, documented, and structured for set-based analytics (reporting, OLAP, and ad hoc queries). The data lake, on the other hand, is a massive repository mostly of detailed source data (unaltered from its original state) built for algorithm-based analytics (data exploration, discovery, mining, statistics, and machine learning).
Design patterns and use cases for data warehouses and data lakes tend to be complementary and synergistic; organizations increasingly have both, with significant integration between them. Instead of one replacing the other, a single, unified data warehouse and data lake architecture can address a very broad range of analytics use cases, from traditional business reporting to the leading edge of predictive analytics.
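The complementary relationship described above can be sketched with a small Python example. It assumes a handful of hypothetical order events kept in the lake in their original, uncleansed state, and a hypothetical `build_warehouse_table` function that cleanses and aggregates them into the kind of structure a report would query. The field names and values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical detailed source data, kept in the lake exactly as it arrived
# (inconsistent casing, stray whitespace, amounts as strings, a missing value).
lake_events = [
    {"order_id": 1, "region": "east ", "amount": "120.50"},
    {"order_id": 2, "region": "West", "amount": "80.00"},
    {"order_id": 3, "region": "EAST", "amount": "39.50"},
    {"order_id": 4, "region": "west", "amount": None},
]


def build_warehouse_table(events):
    """Cleanse, transform, and aggregate raw events into sales by region."""
    totals = defaultdict(float)
    for event in events:
        if event["amount"] is None:
            continue  # cleansing: skip records with no amount
        region = event["region"].strip().lower()  # standardize the key
        totals[region] += float(event["amount"])  # aggregate for reporting
    return dict(totals)


sales_by_region = build_warehouse_table(lake_events)
print(sales_by_region)
```

The raw events stay in the lake for exploration and machine learning, while the aggregated table serves set-based reporting. Neither structure replaces the other.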
A Final Word
There are many other replacement questions to consider, but for reasons of time and space, let's stop here. Keep asking and answering questions because that's how we learn and adapt to change. As you do, however, please keep in mind the following when asking IT system replacement questions:
- Coexistence of old and new is more likely than the new replacing the old
- Replacement is disruptive (in a bad way), so avoid it or manage it carefully
- When replacement occurs, you'll need additional time, money, and human resources
- Form replacement questions carefully to avoid false equivalencies and other flaws in logic
- All IT system replacement decisions should hinge on whether the new thing satisfies current business requirements first, technology requirements second
Philip Russom is director of TDWI Research for data management and oversees many of TDWI’s research-oriented publications, services, and events. He is a well-known figure in data warehousing and business intelligence, having published over 600 research reports, magazine articles, opinion columns, speeches, Webinars, and more. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with leading IT magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at [email protected], @prussom on Twitter, and on LinkedIn at linkedin.com/in/philiprussom.