Executive Summary | Building the Unified Data Warehouse and Data Lake
Executive summary for the TDWI Best Practices Report: Building the Unified Data Warehouse and Data Lake
- By Fern Halper, Ph.D., James G. Kobielus
- May 28, 2021
For years, TDWI research has tracked the modernization and evolution of data warehouse architectures as well as the emergence of the data lake design pattern for organizing massive volumes of analytics data. The two have recently converged to form a new and richer data architecture. The architecture is fairly new, and not many organizations have embraced it yet. The majority of respondents to this survey see it as an opportunity because it provides more options for managing an increasingly diverse range of data structures, end user types, and business use cases.
Within this evolved environment, data warehouses and data lakes can be deployed as distinct but integrated, overlapping, and interoperable architectures that share standard functional layers. These unifying layers include data storage, mixed workload management, data virtualization, content ETL, and data governance and protection. This unified DW/DL architecture continues to evolve, blurring the architectural distinctions between these formerly discrete approaches to deploying, processing, and managing analytics data.
In this study, 64% of respondents stated that the point of the unified data warehouse/data lake is to get more business value from data, whether in operations or analytics. Top value drivers include unifying silos (53%), providing a better foundation for analytics against new and traditional data types (49%), and storage and cost considerations (28%). Eighty-four percent of respondents to the survey stated that the unified DW/DL was either extremely important (48%) or moderately important (36%).
Organizations are accomplishing unification in different ways, including physical consolidation as well as semantic layers and virtualization. They are making use of tools such as modern data pipelines and data catalogs and applying disciplines such as data governance, master data management, and metadata management. Unification also brings challenges, and data governance ranks at the top of the list for the unified DW/DL environment.
This TDWI Best Practices Report examines the convergence of the data warehouse and data lake. It looks at how organizations are currently using their data warehouse and data lake environments and how they are bringing the two together. It examines the drivers, challenges, and opportunities for the unified DW/DL and provides best practices for moving forward.
Denodo, Dremio, Hitachi, Matillion, SAP, Snowflake Computing, Trifacta, and Vertica are Platinum Sponsors of the research and writing of this report. Qlik is a Gold Sponsor.
About the Authors
Fern Halper, Ph.D., is vice president and senior director of TDWI Research for advanced analytics. She is well known in the analytics community, having been published hundreds of times on data mining and information technology over the past 20 years. Halper is also co-author of several Dummies books on cloud computing and big data. She focuses on advanced analytics, including predictive analytics, text and social media analysis, machine learning, AI, cognitive computing, and big data analytics approaches. She has been a partner at industry analyst firm Hurwitz & Associates and a lead data analyst for Bell Labs. Her Ph.D. is from Texas A&M University. You can reach her on Twitter (twitter.com/fhalper) and on LinkedIn (linkedin.com/in/fbhalper).
James Kobielus is senior director of research for data management at TDWI. He is a veteran industry analyst, consultant, author, speaker, and blogger in analytics and data management. At TDWI he focuses on data management, artificial intelligence, and cloud computing. Previously, Kobielus held positions at Futurum Research, SiliconANGLE Wikibon, Forrester Research, Current Analysis, and the Burton Group. He has also served as senior program director, product marketing for big data analytics for IBM, where he was both a subject matter expert and a strategist on thought leadership and content marketing programs targeted at the data science community.