LESSON - Key Considerations When Implementing Real-Time Data Integration Solutions
By Chris McAllister, Senior Director of Product Management, GoldenGate Software
For years, companies have relied on the integration of data to measure results and improve strategic planning. Forward-thinking companies, however, are not only looking at the past to plan the future, but are also looking at what is happening right now to influence current, dynamic business activity. Data can now be leveraged to empower operational decision making, to answer questions such as “Which products will need to be restocked in the next two hours?” or “Which customers making purchases right now would be likely to accept a cross-sell offer?” That means putting data into the hands of front-line employees across the enterprise, such as those in call centers, on the manufacturing floor, and in store operations.
A crucial requirement is the ability to integrate—and provide access to—real-time or near-real-time operational data. There are several approaches and technologies that may be evaluated to achieve real-time data integration, but they differ in important ways. Companies should make the following considerations part of their evaluation checklist.
Data Timeliness and IT Impact
Understanding the data latency requirements of the business is an important first step. Some business processes may need only hourly data, while others need data delivered within minutes or seconds. The IT team should weigh the complexity of implementing mini batches or intraday batches against designing for continuous real-time feeds. They may discover that four-times-a-day or hourly batch architectures are far more complicated to implement and also impose too much load on source systems throughout the business day. A longer-term view should also be taken: hourly updates today may become requests for real-time updates next year. Continuous data feeds can be implemented with negligible system impact, and architecting a solution where technology isn’t the constraint may save time and money in the long run.
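The trade-off can be made concrete with simple arithmetic. The sketch below (Python; the intervals, runtimes, and lag figures are illustrative assumptions, not measurements) estimates the worst-case staleness of data on the target under batch architectures versus a continuous feed:

```python
def worst_case_staleness_seconds(batch_interval_s: float, batch_runtime_s: float) -> float:
    """A record committed just after an extraction starts waits a full
    interval for the next batch, plus that batch's runtime, before it
    is visible on the target."""
    return batch_interval_s + batch_runtime_s

# Hourly batches that take 10 minutes to run:
hourly = worst_case_staleness_seconds(3600, 600)         # 4200 s (70 min)
# Four-times-a-day batches that take 30 minutes to run:
intraday = worst_case_staleness_seconds(6 * 3600, 1800)  # 23400 s (6.5 h)
# A continuous feed with roughly 2 s of capture/apply lag:
continuous = worst_case_staleness_seconds(0, 2)          # 2 s
```

Seen this way, even an aggressive intraday batch schedule leaves hours-old data on the target in the worst case, while a continuous feed keeps staleness bounded by its capture-and-apply lag.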
Data Volumes and Performance
The optimal data integration solution should be capable of keeping up with the volume of new and changed data at the determined latency. In many cases, this could mean moving thousands of transaction operations per second. Spikes during peak times should also be handled while adding little or no additional latency.
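As a rough sizing sketch (Python; the rates and burst length are hypothetical figures, not benchmarks), one can estimate how a peak burst that exceeds apply capacity turns into a backlog, and how long that backlog takes to drain once load returns to normal:

```python
def spike_backlog(peak_ops_per_s: float, capacity_ops_per_s: float, spike_s: float) -> float:
    """Operations that queue up while a burst exceeds apply capacity."""
    return max(0.0, (peak_ops_per_s - capacity_ops_per_s) * spike_s)

def drain_time_s(backlog_ops: float, capacity_ops_per_s: float, normal_ops_per_s: float) -> float:
    """Seconds to clear the backlog using the spare capacity left over
    once arrivals fall back to the normal rate."""
    spare = capacity_ops_per_s - normal_ops_per_s
    if spare <= 0:
        raise ValueError("no spare capacity: the backlog never drains")
    return backlog_ops / spare

# A 60 s burst of 5,000 ops/s against 4,000 ops/s of apply capacity:
backlog = spike_backlog(5000, 4000, 60)       # 60,000 queued operations
# With a normal load of 2,000 ops/s, 2,000 ops/s of capacity is spare:
recovery = drain_time_s(backlog, 4000, 2000)  # 30 s to catch up
```

The point of the exercise is headroom: a solution sized only for the sustained rate will accumulate latency during every peak, so peak rates, not averages, should drive capacity planning.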
In a 24/7 world, batch windows are shrinking rapidly. Higher data volumes combined with more frequent integration make batch-oriented technologies a limiting choice when moving toward a lower-latency architecture. The optimal data integration solution should not rely on a batch window.
The most common method for acquiring data from source systems is to use extract, transform, and load (ETL) tools. ETL tools, however, process data in batches, and bulk extraction places a heavy load on OLTP systems, often requiring activity to be quiesced while it runs. At the same time, the transformation and data-cleansing capabilities of ETL tools may still be necessary for some subsets of the data. A real-time data capture solution should therefore integrate easily with ETL tools when more extensive transformation and cleansing are required.
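One way to picture that hand-off is a router that sends most captured changes directly to the target and stages only the subset needing heavier cleansing for the ETL tool. The sketch below is a minimal illustration; the names (`Change`, `route_changes`, `needs_cleansing`) are hypothetical and do not correspond to any vendor's API:

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Change:
    table: str
    op: str    # "INSERT", "UPDATE", or "DELETE"
    row: dict

def route_changes(changes: Iterable[Change],
                  apply_direct: Callable[[Change], None],
                  stage_for_etl: Callable[[Change], None],
                  needs_cleansing: Callable[[Change], bool]) -> None:
    """Send most captured changes straight to the target; stage only
    the subset that needs heavier transformation for the ETL tool."""
    for change in changes:
        if needs_cleansing(change):
            stage_for_etl(change)
        else:
            apply_direct(change)
```

The predicate keeps the policy in one place: widening or narrowing the set of data that takes the ETL path is a one-line change, not a pipeline redesign.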
Flexibility and Heterogeneity
A data integration solution by definition should support a variety of heterogeneous environments, but beyond that it should support myriad topologies and be easy to implement, manage, and scale as needs change. For example, adding new data sources and/or targets should be straightforward and should not require a major overhaul of the pipeline.
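A minimal sketch of that idea, assuming a hypothetical adapter interface rather than any specific product's API: each target implements one small contract, and new targets plug in through a single `register()` call, so the pipeline itself never changes.

```python
class TargetAdapter:
    """Contract every target (warehouse, ODS, message queue, ...) implements."""
    def apply(self, change: dict) -> None:
        raise NotImplementedError

class InMemoryTarget(TargetAdapter):
    """Toy target that just records what it receives."""
    def __init__(self):
        self.rows = []
    def apply(self, change: dict) -> None:
        self.rows.append(change)

class Distributor:
    """Fan each captured change out to every registered target.
    Adding a target is one register() call, not a pipeline overhaul."""
    def __init__(self):
        self.targets = []
    def register(self, target: TargetAdapter) -> None:
        self.targets.append(target)
    def dispatch(self, change: dict) -> None:
        for target in self.targets:
            target.apply(change)
```

The same pattern extends to sources: a capture adapter per source system feeding a common change format keeps one-to-many, many-to-one, and bidirectional topologies out of the core pipeline logic.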
Data Integrity and Recoverability
When moving and integrating data at the transaction level, the solution should maintain the referential integrity of each transaction between source and target systems. The architecture should also be built to easily recover from unexpected interruptions, such as hardware failures, network issues, and human errors, without losing or corrupting the transactional data.
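A simplified sketch of the recovery idea in Python (the transaction-log and checkpoint structures are hypothetical stand-ins): each source transaction is applied as a unit, and a checkpoint of the last applied transaction lets the apply process resume after an interruption without losing or duplicating data.

```python
def apply_transactions(txn_log, target, checkpoint):
    """Apply each captured source transaction as a single unit,
    advancing the checkpoint only after the whole transaction is
    on the target, so a crash never leaves a partial transaction.

    txn_log: iterable of (txn_id, operations) in commit order.
    checkpoint: {"last_applied": txn_id}; a real system would persist
    this atomically with the apply so it can never be lost.
    """
    for txn_id, operations in txn_log:
        if txn_id <= checkpoint["last_applied"]:
            continue  # applied before the interruption; skip to avoid duplicates
        for op in operations:
            target.append(op)  # stand-in for executing the operation
        checkpoint["last_applied"] = txn_id
```

On restart, replaying the log from the beginning is harmless: everything at or below the checkpoint is skipped, so the target sees each transaction exactly once.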
Real-time data integration solutions are getting a lot of attention, but they are not all alike. Unless the solution delivers real-time data without compromising the performance of OLTP systems, and unless this information is continuously available, organizations will not be able to realize the full benefits of leveraging operational data across the enterprise.
For a free white paper on this topic, see “Real-Time Data Integration for Data Warehousing and Operational BI.”