LESSON - Just Change Data Capture It! Five Use Cases for Change Data Capture
By Itamar Ankorion, Director of Marketing, Attunity
Over the last few years, we’ve observed two clear and present trends: data volumes are growing rapidly and data latencies are quickly shrinking as users expect fresh and accurate data to be available in near real time, where and when needed.
As these trends evolve, many companies are realizing that their traditional bulk processing approach is doomed to fail. The solution of choice is to change the processing paradigm and only work with data that actually changed, which in many cases is a fraction of the source data (e.g., 5 percent). This paradigm is based on change data capture (CDC), a technology that reduces costs and enables improvement of data timeliness, quality, and consistency. The following applications can benefit from CDC, typically used in conjunction with ETL tools.
ETL for Data Warehousing
The most common case for CDC is in loading data warehouses, where processing changes can dramatically reduce load time, required resources (e.g., CPUs, memory), and associated costs (e.g., software licenses). In many cases, daily changes represent a fraction of the total data volume, so CDC has a big impact on efficiency and provides a solution for the continued and accelerating growth in data volumes.
CDC also enables processing small batches of data at higher frequencies (e.g., every hour, every minute, continuously), thus supporting lower delivery latencies for real-time data warehousing. Finally, CDC makes downtime and batch windows shrink or disappear, mitigating the risk of failure in long-running ETL jobs.
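The micro-batch pattern described above can be sketched in a few lines. This is an illustrative sketch only, not any vendor's implementation: it assumes changes arrive as records tagged with a monotonically increasing change-sequence number (CSN), and all names are made up for the example.

```python
# Minimal micro-batch CDC load: instead of reloading the full source table,
# pull only rows whose change-sequence number (CSN) is newer than the last
# batch's high-water mark, then merge them into the warehouse copy.
# All names here are illustrative, not from any specific product.

def extract_changes(change_log, last_csn):
    """Return changes captured after last_csn, plus the new high-water mark."""
    batch = [c for c in change_log if c["csn"] > last_csn]
    new_csn = max((c["csn"] for c in batch), default=last_csn)
    return batch, new_csn

def apply_changes(warehouse, batch):
    """Merge a batch of change records into the warehouse table (dict by key)."""
    for change in batch:
        if change["op"] == "DELETE":
            warehouse.pop(change["key"], None)
        else:  # INSERT and UPDATE both upsert the latest row image
            warehouse[change["key"]] = change["row"]

# One micro-batch cycle: only the changed rows cross the wire,
# however large the source table is.
change_log = [
    {"csn": 101, "op": "INSERT", "key": 1, "row": {"amount": 50}},
    {"csn": 102, "op": "UPDATE", "key": 1, "row": {"amount": 75}},
    {"csn": 103, "op": "DELETE", "key": 2},
]
warehouse = {2: {"amount": 10}}
batch, last_csn = extract_changes(change_log, last_csn=100)
apply_changes(warehouse, batch)
print(warehouse)   # {1: {'amount': 75}}
print(last_csn)    # 103
```

Running this cycle every minute (or continuously) is what shrinks the batch window: each run touches only the handful of rows that changed since the previous run.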
Slowly Changing Dimensions
Any data warehouse team needs to address slowly changing dimensions (SCD), which requires identifying the records and attributes that changed. For large dimension tables, this is a demanding and inefficient process, typically done by joining staging and production tables.
CDC delivers the changes to records and attributes right out of the box, reducing SCD processing time and enabling it to run more often, thus improving timeliness and accuracy.
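As an illustration, a Type 2 slowly changing dimension can be maintained directly from change records, with no join between staging and the full dimension table. The sketch below is a simplified assumption of how such a step might look; the field names (`customer_id`, `city`, the effective-date columns) are invented for the example.

```python
from datetime import date

# Type 2 SCD maintenance driven by CDC: each change record already identifies
# the changed business key and its new attribute values, so there is no need
# to diff staging against the full dimension table. Names are illustrative.

def apply_scd2(dimension, change, effective_date):
    """Expire the current version of the changed row and append a new one."""
    for row in dimension:
        if row["customer_id"] == change["customer_id"] and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = effective_date
    dimension.append({
        "customer_id": change["customer_id"],
        "city": change["city"],
        "start_date": effective_date,
        "end_date": None,
        "is_current": True,
    })

dim = [{"customer_id": 7, "city": "Boston", "start_date": date(2020, 1, 1),
        "end_date": None, "is_current": True}]
apply_scd2(dim, {"customer_id": 7, "city": "Austin"}, date(2021, 6, 1))
current = [r for r in dim if r["is_current"]]
print(current[0]["city"])  # Austin
print(len(dim))            # 2 -- both versions kept for history
```

Because each run processes only the delivered changes, the same logic can run hourly instead of nightly without rescanning the dimension.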
Data Replication for BI
As reporting and BI become more pervasive in supporting daily operations, more users require access to timely information from their production systems. A common solution is to offload production data to a secondary database that is then used by operational reporting applications.
CDC enables the replication of changes made to production tables with low latency and low impact on the source databases, and can be used with existing ETL tools to avoid the need to purchase expensive replication software.
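One way to picture this is as a translation step: captured change events become ordinary SQL statements applied to the reporting replica. The sketch below assumes a log-based change feed with an `op` code, a key, and a row image; the event shape and table names are assumptions for illustration, not any product's actual format.

```python
# Sketch: turning captured change events into SQL statements to apply
# against a reporting replica. The event structure is an assumed example.

def event_to_sql(event, table):
    """Render one change event as an INSERT, UPDATE, or DELETE statement."""
    if event["op"] == "INSERT":
        cols = ", ".join(event["row"])
        vals = ", ".join(repr(v) for v in event["row"].values())
        return f"INSERT INTO {table} ({cols}) VALUES ({vals})"
    if event["op"] == "UPDATE":
        sets = ", ".join(f"{c} = {v!r}" for c, v in event["row"].items())
        return f"UPDATE {table} SET {sets} WHERE id = {event['key']}"
    return f"DELETE FROM {table} WHERE id = {event['key']}"

print(event_to_sql({"op": "UPDATE", "key": 3, "row": {"status": "shipped"}},
                   "orders"))
# UPDATE orders SET status = 'shipped' WHERE id = 3
```

Because only these small statements travel to the replica, the production source sees little extra load, which is the low-impact property the article describes. (A real implementation would use parameterized statements rather than string interpolation.)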
Master Data Management
Key objectives of any MDM initiative are to improve and ensure the quality and consistency of master data, whether stored in a single repository or distributed across many. This requires timely responses to master data changes.
CDC makes it possible to capture and process master data changes efficiently and quickly, so quality and consistency can be ensured.
Data Quality
Improving data quality in source systems has become a common requirement, typically implemented by periodically scanning the data. By capturing and processing only changes, CDC enables the ETL jobs used to cleanse data to run more efficiently and more frequently. As a result, errors are corrected faster, improving decisions and operations.
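The efficiency gain comes from running the cleansing rules over the day's changes rather than rescanning the whole table. A minimal sketch, with an invented normalization rule and field names chosen purely for illustration:

```python
# Sketch: applying a cleansing rule only to changed rows delivered by CDC,
# rather than periodically rescanning the full table. The rule and the
# field names are illustrative assumptions.

def clean_row(row):
    """Normalize a single record (trim and title-case the name field)."""
    row = dict(row)
    row["name"] = row["name"].strip().title()
    return row

def clean_changed(changed_rows):
    """Cleanse only the rows that changed since the last run."""
    return [clean_row(r) for r in changed_rows]

# Only the day's changes -- a fraction of the table -- pass through the rules.
changes = [{"id": 1, "name": "  alice SMITH "}, {"id": 2, "name": "bob jones"}]
print(clean_changed(changes))
# [{'id': 1, 'name': 'Alice Smith'}, {'id': 2, 'name': 'Bob Jones'}]
```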
These use cases, and others that CDC enables, demonstrate the strategic nature of this technology. Mature solutions exist today, including independent CDC software that is not tied to a single vendor and can feed many tools, so changes can be captured once and used anywhere. For all these reasons and more, it's time to "just CDC it!"
Attunity offers a free white paper on this topic, "Efficient and Real-Time Data Integration with Change Data Capture."