Agile Alternatives for a Modernized Data Warehouse Environment
The appeal of the agile technology alternatives goes beyond the potential for improved performance. They all represent approaches to augmenting the data warehouse environment in ways that reduce restrictions.
- By David Loshin
- February 24, 2016
After assessing the current state of the reporting and analytics infrastructure and deciding that your organization’s data warehouse environment is ripe for renovation, your next step is to consider how to address your current environment’s gaps in meeting business expectations. In some cases those gaps are related to functionality, while in other cases the gaps reflect inefficiencies in the processes that have been organized around the defaults associated with the existing system platform choices. In either case, one key objective is to consider modern platform choices that reduce or eliminate the impediments posed by the existing environment.
One approach involves selecting platform components that increase agility, whether that term is used loosely to convey quickness and coordination or used in the context of agile development, implying close interaction between programmer teams and business experts for rapid delivery of value. In a recent study performed by our research firm, DecisionWorx, LLC, we concentrated on technologies that we believed would enhance corporate agility by speeding design and development, improving cycle times in producing reports and analyses, and strengthening the collaboration between developers and business analysts.
The following technologies are alternatives to consider as part of your data warehouse environment modernization effort:
Columnar databases, designed to align data in ways that speed loading and query response time.
In-memory databases, which load the entire database (or at least the most frequently accessed tables) directly into the main memory of systems configured with large amounts of memory. Because hot data stays in main memory, queries run significantly faster, which speeds the delivery of application results.
Hadoop, an open source ecosystem of tools for data distribution and parallel execution. To scale performance, Hadoop is often deployed on platforms built with commodity components. Although Hadoop is not a database and at this point is not likely to replace a data warehouse, it is an environment that is well suited for developing predictive and prescriptive analytics applications that can consume and take advantage of massive data volumes.
Data lake, which, according to TechTarget, is “a large object-based storage repository that holds data in its native format until it is needed.” It provides a place for collecting data sets in their original format, making those data sets available to different consumers, and allowing data users to consume that data in ways specific to their needs.
Data warehouse automation, which provides tools to facilitate many aspects of the development and production of a data warehouse. Data warehouse automation supports all aspects of the development life cycle: source system analysis, design, development, generation of data integration scripts, building, deployment, generation of documentation, and testing, as well as support for ongoing operations, impact analysis, and change management.
Cloud-based or hosted data warehousing, including data warehouses deployed on a cloud platform or with a service provider who hosts hardware and software. In these cases, hardware and software acquisition, operations, and maintenance efforts and costs are reduced, if not mostly eliminated. In addition, service providers can provide consulting and guidance in the design, development, and deployment of the data warehouse.
Data warehouse appliance, which is a specialty hardware configuration engineered for high-performance reporting and analytics. An appliance typically combines multiple processing nodes, multiple storage nodes, and high-speed interconnectivity, and it can be tuned to specific data warehousing and business intelligence needs.
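To make the columnar idea from the list above a little more concrete, here is a minimal Python sketch of why a column-oriented layout can speed analytic queries. It is an illustration only, not how any particular product is implemented: the sample records and the function names (`to_columns`, `total_from_rows`, `total_from_columns`) are hypothetical, and a real columnar database adds compression, indexing, and on-disk storage that this toy example ignores.

```python
# Hypothetical sample data; field names and values are illustrative only.
rows = [
    {"order_id": 1, "region": "East", "amount": 120.0},
    {"order_id": 2, "region": "West", "amount": 75.5},
    {"order_id": 3, "region": "East", "amount": 42.25},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(records, field):
    # Row layout: every complete record is visited to read a single field.
    return sum(record[field] for record in records)


def total_from_columns(columns, field):
    # Column layout: only the one requested column is read.
    return sum(columns[field])


columns = to_columns(rows)
print(total_from_rows(rows, "amount"))
print(total_from_columns(columns, "amount"))
```

Both functions return the same total. The difference is how much data each must touch: the row-oriented version walks every record, while the column-oriented version reads one contiguous list, which is the property that typically benefits reporting queries that aggregate a few columns over many rows.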
The appeal of these agile alternatives goes beyond the potential for improved performance. They all represent approaches to augmenting the data warehouse environment in ways that reduce restrictions, whether those restrictions are imposed by the constraints of mainframe implementations or by adherence to relational database structure. Combine different architectural approaches that are aligned with the business objectives (as I discussed in one of my prior articles) and suited to the types of applications that address the business, technical, and strategic perspectives underlying the motivations for change.
David Loshin is a recognized thought leader in the areas of data quality and governance, master data management, and business intelligence. David is a prolific author regarding BI best practices via the expert channel at BeyeNETWORK and numerous books on BI and data quality. His valuable MDM insights can be found in his book, Master Data Management, which has been endorsed by data management industry leaders.