Modernizing a Data Warehouse with Real-Time Functions
Move your business closer to real-time operation by implementing new technologies in the data warehouse and related systems.
- By Philip Russom
- April 22, 2016
For decades, BI professionals have been pushing the refresh and delivery of reports and analyses closer and closer to real time. Today, a number of common BI practices handle data in near time (minutes or hours), including operational BI, dashboarding, and metrics-driven performance management. These practices enable managers to make tactical and operational decisions based on very fresh information.
However, for some fast-paced, time-sensitive business processes, near time isn't fast enough. They need true real time, where data is handled within seconds, preferably microseconds. Examples include applications for financial trading systems, business activity monitoring, utility grid monitoring, e-commerce product recommendations, and facility surveillance.
Organizations that need to modernize their data warehouse environments to handle data in near time or real time (sometimes called "real-time data warehousing") have many technologies to consider.
The list includes data federation and virtualization, data replication and synchronization, intraday micro batches, columnar database management systems, data warehouse appliances, MPP computing architectures, elastic clouds, in-database analytics, in-memory functions, and solid-state drives. The bar has been raised for these technologies: not only must they operate in a variety of short time frames (sometimes called "right time"), they must also handle a wider range of data structures in unprecedented volumes.
Current solutions that can provide speedy processing include:
Complex event processing (CEP) for streaming data. One form of big data is streaming data. Data streams into an organization more or less continuously as a series of data records, each describing a business event. For example, streams come online when users add sensors to their machines, products, vehicles, and mobile devices or when users log in to Web or enterprise applications.
In CEP, streaming data is captured, triaged, and processed to determine a reaction; then an automated response is executed by software or a user is alerted, all within seconds or milliseconds. Standalone CEP tools have arisen to handle streams, and users are adding CEP tools to their data warehouse environments as they modernize for true real-time operations.
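To make the capture, triage, process, and react cycle concrete, here is a minimal sketch of a single CEP-style rule in plain Python. It is not the API of any CEP product; the `Event` schema, the `detect_overheating` function, and its threshold, count, and window parameters are all hypothetical, and the sketch assumes events arrive in timestamp order. The rule fires when one sensor exceeds a threshold several times within a sliding time window.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Event:
    """One business event in the stream (hypothetical schema)."""
    timestamp: float  # seconds since some epoch
    sensor_id: str
    value: float


def detect_overheating(events: Iterable[Event],
                       threshold: float = 90.0,
                       count: int = 3,
                       window_s: float = 10.0) -> Iterator[Tuple[str, float]]:
    """Yield (sensor_id, timestamp) whenever a sensor exceeds `threshold`
    at least `count` times within a sliding `window_s`-second window.

    Assumes events arrive in timestamp order (no late-data handling)."""
    recent = defaultdict(deque)  # sensor_id -> timestamps of recent readings over threshold
    for ev in events:                  # capture: consume events one at a time
        if ev.value <= threshold:      # triage: ignore uninteresting events
            continue
        hits = recent[ev.sensor_id]
        hits.append(ev.timestamp)
        # Evict readings that have fallen out of the sliding window.
        while hits and ev.timestamp - hits[0] > window_s:
            hits.popleft()
        if len(hits) >= count:         # process: the pattern has matched
            yield ev.sensor_id, ev.timestamp  # react: hand off to an alert or action
            hits.clear()               # avoid re-alerting on the same readings


stream = [
    Event(0.0, "pump-1", 95.0),
    Event(4.0, "pump-1", 97.0),
    Event(5.0, "pump-2", 99.0),
    Event(8.0, "pump-1", 96.0),
    Event(30.0, "pump-1", 99.0),
]
for sensor, ts in detect_overheating(stream):
    print(f"ALERT: {sensor} overheating at t={ts}")
```

Because the detector is a generator, each alert is emitted as soon as the matching event arrives rather than after the whole stream has been read, which is the property that distinguishes this style of processing from batch loading. Commercial CEP engines and the Hadoop streaming projects add distribution, fault tolerance, and declarative rule languages on top of this basic idea.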
Hadoop for streaming data. Early versions of Hadoop lacked near-time and real-time capabilities. This situation has improved considerably with the introduction of open-source projects for capturing and analyzing streaming data (Samza, Spark, and Storm). These promise to handle both the speed of real time and the massive data volumes we expect in Hadoop.
TDWI anticipates that Hadoop will become a preferred real-time platform because of its low cost (compared to commercial CEP platforms) and its massive storage capabilities. After all, streaming data adds up to large volumes in a hurry.
Interactive SQL on Hadoop. Many users work with HiveQL on Hive and HBase, which attests to the value of these tools, yet data management professionals are calling for better support of standard SQL on Hadoop so they can leverage their SQL skills and SQL-based tools. Likewise, data analysts need near-real-time query responses to support analytics practices such as data exploration and ad hoc queries.
The open-source projects Drill and Impala provide these and other functions. In addition, some vendor distributions of Hadoop support file-system enhancements for fast ingestion of data streams, so that ingested data is available immediately for both analytics and operational workloads.
Streaming ETL on Hadoop. Hadoop's capabilities for handling and analyzing streaming data can also be used for streaming ETL, which aggregates, transforms, and otherwise processes data as it arrives. Streaming ETL avoids the overhead and latency of applying structure before load time, and by speeding up the ETL process, it greatly accelerates downstream decision making and other business processes.
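The following is a minimal, single-process sketch of what "transform and aggregate as data arrives" means. It does not use Samza, Spark, or Storm; the comma-separated record format, the `parse` and `streaming_etl` functions, and the 60-second tumbling window are hypothetical choices for illustration, and the sketch assumes records arrive in timestamp order. Each raw record is parsed and cleaned in flight, malformed records are dropped, and per-store totals are emitted as soon as each window closes instead of after a nightly batch load.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Optional, Tuple


def parse(line: str) -> Optional[Tuple[float, str, float]]:
    """Transform step: parse a raw 'timestamp,store_id,amount' record.

    Returns None for malformed input (hypothetical record format)."""
    parts = line.strip().split(",")
    if len(parts) != 3:
        return None
    try:
        return float(parts[0]), parts[1].strip(), float(parts[2])
    except ValueError:
        return None


def streaming_etl(lines: Iterable[str],
                  window_s: int = 60) -> Iterator[Tuple[int, str, float]]:
    """Yield (window_start, store_id, total_amount) for each closed tumbling window.

    Assumes records arrive in timestamp order (no late-data handling)."""
    current_window = None
    totals = defaultdict(float)  # store_id -> running total for the open window
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue  # a production pipeline might route this to an error table
        ts, store, amount = rec
        window = int(ts // window_s) * window_s
        if current_window is not None and window != current_window:
            # A newer window has started: emit the finished one right away.
            for s, total in sorted(totals.items()):
                yield current_window, s, total
            totals.clear()
        current_window = window
        totals[store] += amount
    if current_window is not None:  # flush the last open window at end of stream
        for s, total in sorted(totals.items()):
            yield current_window, s, total


raw = ["0,store-a,10.0", "30,store-b,5.5", "45,store-a,2.5",
       "garbage", "61,store-a,7.0", "125,store-b,1.0"]
for row in streaming_etl(raw):
    print(row)
```

In this sketch the aggregates for the first minute are available as soon as the first record of the second minute arrives. Real streaming engines add what is missing here: parallelism across nodes, checkpointing for recovery, and handling of late or out-of-order records.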
To learn more about common data warehouse modernization techniques, read TDWI's recent Checklist Report, "Eight Tips for Modernizing a Data Warehouse" at http://bit.ly/1GBImZX.
Philip Russom is director of TDWI Research for data management and oversees many of TDWI’s research-oriented publications, services, and events. He is a well-known figure in data warehousing and business intelligence, having published over 600 research reports, magazine articles, opinion columns, speeches, Webinars, and more. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with leading IT magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at @prussom on Twitter and on LinkedIn at linkedin.com/in/philiprussom.