Evolving Data Warehouse Architectures: An Overview in 35 Tweets
By Philip Russom
Research Director for Data Management, TDWI
To help you better understand the ongoing evolution of data warehouse architectures and why you should care, I’d like to share with you the series of 35 tweets I recently issued on the topic. I think you’ll find the tweets interesting because they provide an overview of data warehouse architectures and how they are evolving in a form that’s compact, yet amazingly comprehensive.
Every tweet I wrote was a short sound bite or stat bite drawn from my recent TDWI report Evolving Data Warehouse Architectures in the Age of Big Data. Many of the tweets focus on a statistic cited in the report, while other tweets are definitions stated in the report.
I left in the arcane acronyms, abbreviations, and incomplete sentences typical of tweets, because I think that all of you already know them or can figure them out. Even so, I deleted a few tiny URLs, hashtags, and repetitive phrases. I issued the tweets in groups on related topics, so I’ve added some headings to this blog to show that organization. Otherwise, these are raw tweets.
Basic Components of the Average Data Warehouse Architecture
- Most DW Arch’s have 4 layers: logical, physical, hardware topology, data standards.
- DW logical architecture is mostly about data models, entity models & relationships.
- DW logical arch also defines standards for data models, dev practices, interfaces, etc.
- DW physical architecture is mostly a plan for data deployment on servers.
- DW physical arch also defines topology for hardware & software servers plus interfaces.
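One way to picture the layers named in these tweets is as a small data structure. The following is only a minimal sketch: the class name, field names, and overall shape are hypothetical and do not come from the TDWI report. Only the layer names and the descriptions of the logical and physical layers are taken from the tweets; the other two layers are left undescribed because the tweets do not detail them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Layer:
    """One layer of a data warehouse architecture (illustrative structure only)."""
    name: str
    primary_focus: Optional[str] = None  # None where the tweets give no detail
    also_defines: List[str] = field(default_factory=list)


# Layer names and contents follow the tweets; the structure itself is an assumption.
DW_ARCHITECTURE = [
    Layer(
        name="logical",
        primary_focus="data models, entity models and relationships",
        also_defines=["data model standards", "development practices", "interfaces"],
    ),
    Layer(
        name="physical",
        primary_focus="plan for data deployment on servers",
        also_defines=["hardware and software server topology", "interfaces"],
    ),
    Layer(name="hardware topology"),
    Layer(name="data standards"),
]

layer_names = [layer.name for layer in DW_ARCHITECTURE]
print(layer_names)
```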
Users’ Views of Architectural Components
- #TDWI SURVEY SEZ: Data standards & rules are highest priority (71%) of #EDW architecture.
- #TDWI SURVEY SEZ: Logical design (66%) is the starting point of an #EDW architecture.
- #TDWI SURVEY SEZ: Physical plan (56%) locates logical pieces in an #EDW architecture.
- #TDWI SURVEY SEZ: Only 12% have #EDW that’s “collection of data & platforms without a plan.”
- #TDWI SURVEY SEZ: Only 12% feel Inmon vs Kimball argument is priority for #EDW architecture.
The Evolution of Data Warehouse Architectures
- #TDWI SURVEY SEZ: 79% say their #DataWarehouse has an architecture.
- #TDWI SURVEY SEZ: #EDW arch is evolving dramatically (22%), moderately (54%) or slightly (22%)
- #TDWI SURVEY SEZ: Driving #EDW arch evolution: #Analytics 57%, #BigData 56%, #RealTime 41%.
- #TDWI SURVEY SEZ: Driving #EDW arch evolution: BizPerfMgt 38%, OLAP 30%, UnstrucData 25%.
- #TDWI SURVEY SEZ: Driving #EDW arch evolution: competition 45%, compliance 29%, dep’ts 29%.
The Importance of Data Warehouse Architectures
- #TDWI SURVEY SEZ: Architecture extremely (79%) or moderately (19%) important to #EDW success.
- #TDWI SURVEY SEZ: #EDW Architecture is an opportunity (84%), not a problem (16%).
Benefits and Barriers for Data Warehouse Architecture
- #TDWI SURVEY SEZ: Stuff that benefits from #DWarch: #analytics, biz value, data breadth.
- #TDWI SURVEY SEZ: Barriers to #DWarch success: skills gap, sponsorship, #DataMgt, funding.
Multi-Platform Data Warehouse Environments
- #EDWarch trend: more standalone platforms: #analytics DBMSs, columnar, appliances, #Hadoop, etc.
- As #EDW workloads get more diverse, so do types of standalone data platforms in #EDW environment.
- As types and numbers of data platforms grow in DW environs, architecture gets ever more distributed.
- Distributed #EDWarch is good&bad: provides workload optimized platforms. But may spawn data silos.
- Logical layer of #EDWarch more important than ever to unite big design across multi data platforms.
Single-Platform versus Multi-Platform DW Architectures
- #TDWI SURVEY SEZ: Totally pure #EDWarchs are rare. Only 15% have central monolithic #EDW.
- #TDWI SURVEY SEZ: Hybrid #EDWarchs are most common today = central #EDW + a few other data platforms (37%).
- #TDWI SURVEY SEZ: 2nd most common Hybrid #EDWarch = central #EDW + many other data platforms (16%).
- #TDWI SURVEY SEZ: Sometimes #EDW plays small role in #EDWarch compared to workload platforms (15%).
- #TDWI SURVEY SEZ: Some organizations (15%) have many workload-specific data platforms, but no true DW.
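The five percentages in this group can be tallied to see how common multi-platform arrangements are overall. The short sketch below does that arithmetic. The percentages are the ones quoted in the tweets; the dictionary labels are shorthand invented for this example, and counting every category other than the central monolithic EDW as "multi-platform" is an assumption of the sketch, not a grouping stated in the report.

```python
# Shares of respondents per architecture type, as quoted in the tweets.
# The labels are shorthand chosen for this sketch.
architecture_shares = {
    "central monolithic EDW": 15,
    "central EDW + a few other platforms": 37,
    "central EDW + many other platforms": 16,
    "EDW in a small role": 15,
    "workload platforms, no true DW": 15,
}

# The two "hybrid" categories: a central EDW plus other data platforms.
hybrid_labels = [
    "central EDW + a few other platforms",
    "central EDW + many other platforms",
]
hybrid_total = sum(architecture_shares[label] for label in hybrid_labels)

# Assumption: everything except the monolithic EDW counts as multi-platform.
multi_platform_total = sum(
    share
    for label, share in architecture_shares.items()
    if label != "central monolithic EDW"
)

reported_total = sum(architecture_shares.values())

print(f"Hybrid (central EDW plus other platforms): {hybrid_total}%")
print(f"Any multi-platform arrangement: {multi_platform_total}%")
print(f"Total covered by the five tweets: {reported_total}%")
```

Under these assumptions, the hybrid categories account for 53% of respondents and multi-platform arrangements of any kind for 83%. The five figures add up to 98%; the tweets do not say what the remaining 2% represents.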
Big Data’s Influence on Evolving DW Architectures
- #TDWI SURVEY SEZ: 41% will extend existing core #EDW to handle #BigData.
- #TDWI SURVEY SEZ: 25% will deploy new data platforms to handle #BigData.
- #TDWI SURVEY SEZ: 23% have no strategy for their #EDW’s architecture, though they need one.
- #TDWI SURVEY SEZ: Only 6% feel they don’t need a strategy for their #EDW’s architecture.
Reports and Analytics Have Different DW Architecture Needs
- Many users preserve #EDW for reporting, BizPerfMgt & OLAP, but take #analytics data elsewhere.
- Data prep for reports differs from same for #analytics. So, many users prep data on separate platforms.
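The split described in these two tweets can be sketched as a simple placement rule. This is an illustration under stated assumptions, not a method from the report: the platform labels, the function name platform_for, and the decision to reject unknown workloads are all hypothetical. Only the workload-to-platform split itself (reporting, business performance management, and OLAP on the EDW; analytics elsewhere) comes from the tweets.

```python
# Hypothetical platform labels; the tweets name no specific products.
EDW = "enterprise data warehouse"
ANALYTICS_PLATFORM = "separate analytics platform"

# Workloads the tweets say many users keep on the EDW.
EDW_WORKLOADS = {"reporting", "business performance management", "olap"}


def platform_for(workload: str) -> str:
    """Return where a workload would land under this simplified split."""
    normalized = workload.strip().lower()
    if normalized in EDW_WORKLOADS:
        return EDW
    if normalized == "analytics":
        return ANALYTICS_PLATFORM
    # The tweets cover only the workloads above, so anything else is rejected.
    raise ValueError(f"No placement rule for workload: {workload!r}")


for name in ["Reporting", "OLAP", "Analytics"]:
    print(f"{name} -> {platform_for(name)}")
```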
Want to learn more about evolving data warehouse architectures?
For a more detailed discussion (in a traditional publication!), get the TDWI Best Practices Report, titled Evolving Data Warehouse Architectures in the Age of Big Data, which is available as a free PDF download.
You can also register for and replay my TDWI Webinar, where I present the findings of the TDWI report Evolving Data Warehouse Architectures in the Age of Big Data.
Posted by Philip Russom, Ph.D. on April 15, 2014