Taming Cloud Data Integration Complexity
These three steps can help your organization rein in its fractured facts and manage data across multiple systems.
- By David Torre
- April 20, 2017
As business applications continue their steady exodus to the cloud, enterprise data is following a precarious trajectory of erosion, one in which information becomes increasingly fragmented as the number of cloud systems grows. The need for cohesive, comprehensive cloud data integration is more prominent than ever.
From One Source to Many Systems
Enterprises must understand that today's data elements no longer have a single "source of truth" and a single "system of reference," terms that refer, respectively, to where an object was created and where it was copied.
In simpler times, systems were fewer in number and data was often authored in one system exclusively. Fast forward to today and such paradigms feel downright medieval. The days of one system owning an entire business object are long gone, and the holistic records of yesterday have given way to the fractured facts we see today.
Modern business objects are now subdivided into tiny pieces; dissimilar systems own and author micro subsets of the original logical data object. A common example is a logical employee record being cut up into fragments, in which:
- The global HR system edits the employee's legal name and salary details
- The company's cloud productivity suite assigns and maintains the user's email address and telephone number
- Yet another cloud platform manages the employee's performance metrics
Eventually all these systems must come together so their record fragments can be joined and each system can accurately depict the holistic employee record at all times.
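As a rough illustration of joining such fragments, the sketch below assembles one employee record by taking each field only from the system assumed to own it. The system names, field names, and the `FIELD_OWNERS` map are all hypothetical, chosen to mirror the example above rather than any particular product.

```python
from typing import Any, Dict

# Hypothetical ownership map: which system is allowed to author which field.
FIELD_OWNERS = {
    "legal_name": "hr",
    "salary": "hr",
    "email": "productivity",
    "phone": "productivity",
    "performance_rating": "performance",
}


def assemble_employee(fragments: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build one record, reading each field only from its owning system."""
    record: Dict[str, Any] = {}
    for field, owner in FIELD_OWNERS.items():
        fragment = fragments.get(owner, {})
        if field in fragment:
            record[field] = fragment[field]
    return record


# Each system contributes only a fragment of the logical employee record.
fragments = {
    # The HR system also holds a stale copy of the email; it is ignored
    # because HR does not own that field.
    "hr": {"legal_name": "Ada Example", "salary": 100000, "email": "stale@example.com"},
    "productivity": {"email": "ada@example.com", "phone": "555-0100"},
    "performance": {"performance_rating": 4},
}

employee = assemble_employee(fragments)
```

Declaring ownership per field, rather than per record, is what lets a copied value in a non-owning system be ignored instead of overwriting the authoritative one.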
Whether we label today's SaaS adoption as the result of shadow IT, technology decentralization, or business empowerment, the result is unequivocally the same: cloud is king and we can only expect increased cloud adoption in the coming years. Data professionals must, therefore, embrace data disjointedness (and not just for business intelligence reporting). Cross-pollination of data fragments among the various members of the cloud ecosystem must occur in a seamless, timely fashion if an enterprise is to maintain operational data integrity.
Stitching together subsets of a business object across a motley patchwork of SaaS solutions brings modern challenges to the table, one of which is timeliness. Unlike the batch processes of yesterday, operational cloud data flows often occur in real time. If HR terminates an employee, that new representation of the employee needs to be updated in all relevant systems as soon as possible.
Another challenge relates to the use of third-party APIs. Gone are the days of unfettered database access to on-premises systems. Data now remains at arm's length, reachable only via APIs that operate at varying levels of maturity.
Three Building Blocks for Success
These challenges may sound daunting, but the good news is that many aspects of cloud data movement are within our control. With the right building blocks, modern solutions can quickly extract, manipulate, and disseminate data with surgical precision. Although there are many options for addressing cloud integration complexity, I find myself embracing three architecture building blocks each time I tackle this problem: canonical data models, real-time data flows, and intermediate data stores.
Building Block #1: Canonical Data Models
The topic of canonical data models is well documented, yet melding data objects from dissimilar cloud systems requires an overarching nomenclature -- a vernacular that references objects by their business purpose instead of their system of origin.
For instance, if we want to analyze employee retention, we naturally yearn to examine a universal employee object. We don't want a Frankenstein-esque view of multiple HR systems, especially one hastily cobbled together. Essentially the canonical data model provides a business-friendly interface for what would otherwise be a confusing alphabet soup of system-specific data fields.
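One way to picture a canonical model is as a thin translation layer between system-specific field names and business-friendly ones. The sketch below is only an illustration: the `CanonicalEmployee` class, the two system names, and every field name in `FIELD_MAPS` are invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CanonicalEmployee:
    """Business-facing employee object, independent of any source system."""
    employee_id: str
    legal_name: Optional[str] = None
    work_email: Optional[str] = None


# Hypothetical mappings from each system's native field names to canonical names.
FIELD_MAPS = {
    "hr_system": {"EMP_NO": "employee_id", "LGL_NM": "legal_name"},
    "productivity_suite": {"userId": "employee_id", "primaryEmail": "work_email"},
}


def to_canonical(system: str, raw: Dict[str, Any]) -> Dict[str, Any]:
    """Rename mapped fields to canonical names; unmapped fields are dropped."""
    mapping = FIELD_MAPS[system]
    return {
        canonical: raw[native]
        for native, canonical in mapping.items()
        if native in raw
    }


hr = to_canonical("hr_system", {"EMP_NO": "1001", "LGL_NM": "Ada Example", "COST_CTR": "42"})
mail = to_canonical("productivity_suite", {"userId": "1001", "primaryEmail": "ada@example.com"})

# Both fragments now speak the same vocabulary and can be merged directly.
employee = CanonicalEmployee(**{**hr, **mail})
```

Consumers of `CanonicalEmployee` never see `EMP_NO` or `primaryEmail`, so replacing a source system means updating one mapping rather than every downstream report.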
Building Block #2: Wrangling Real-Time Data
Designing for real-time data requires a shift in how we model data flows. Systems no longer sit around waiting for an extract-transform-load (ETL) engine to shuffle data to and fro. Instead, systems take the initiative by pushing data to APIs, typically as a result of triggers or events.
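To make the push model concrete, here is a minimal in-process sketch in which a source system emits an event and interested systems react immediately, with no batch job in between. The `EventBus` class, the event name, and the handlers are hypothetical stand-ins; in practice each handler would call a remote API rather than update a local set.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventBus:
    """Tiny publish/subscribe dispatcher used to illustrate event-driven pushes."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def emit(self, event_type: str, payload: Dict[str, Any]) -> None:
        # Push the event to every subscriber as soon as it happens.
        for handler in self._handlers[event_type]:
            handler(payload)


# Stand-ins for downstream systems that must learn about a termination.
deactivated_mailboxes = set()
closed_review_cycles = set()

bus = EventBus()
bus.subscribe("employee.terminated", lambda e: deactivated_mailboxes.add(e["employee_id"]))
bus.subscribe("employee.terminated", lambda e: closed_review_cycles.add(e["employee_id"]))

# The HR system triggers the flow at the moment the change occurs.
bus.emit("employee.terminated", {"employee_id": "1001"})
```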
Once systems have been retrofitted with the ability to push data to or pull data from API endpoints at will, hiccups inevitably occur. Endpoints go down for a variety of reasons and real-time events (such as an employee termination) can be lost due to transient technical issues. Whether you implement a store-and-forward queue or something else entirely, a "retry capability" is a must-have.
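A store-and-forward queue with retries might look something like the sketch below. It is deliberately simplified: the `StoreAndForwardQueue` class and the flaky endpoint are hypothetical, the queue lives in memory (a production version would presumably persist events so they survive a restart), and a real sender would wait between attempts rather than retrying immediately.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Event = Dict[str, Any]


class StoreAndForwardQueue:
    """Holds events until the endpoint accepts them, retrying on failure."""

    def __init__(self, send: Callable[[Event], None], max_attempts: int = 5) -> None:
        self._send = send
        self._max_attempts = max_attempts
        self._pending: Deque[Tuple[Event, int]] = deque()
        self.dead_letters: List[Event] = []  # events that exhausted their retries

    def publish(self, event: Event) -> None:
        # Store first; delivery happens later in flush().
        self._pending.append((event, 0))

    @property
    def pending_count(self) -> int:
        return len(self._pending)

    def flush(self) -> int:
        """Attempt each pending event once; return how many were delivered."""
        delivered = 0
        for _ in range(len(self._pending)):
            event, attempts = self._pending.popleft()
            try:
                self._send(event)
                delivered += 1
            except ConnectionError:
                attempts += 1
                if attempts >= self._max_attempts:
                    self.dead_letters.append(event)
                else:
                    self._pending.append((event, attempts))  # keep for the next pass
        return delivered


# Hypothetical endpoint that is unavailable for its first two calls.
calls = {"count": 0}
received: List[Event] = []


def flaky_endpoint(event: Event) -> None:
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionError("endpoint unavailable")
    received.append(event)


queue = StoreAndForwardQueue(flaky_endpoint)
queue.publish({"type": "employee.terminated", "employee_id": "1001"})
while queue.pending_count:
    queue.flush()  # a real system would back off between passes
```

The termination event survives two failed deliveries and arrives on the third attempt; events that never succeed end up in `dead_letters` for manual follow-up instead of disappearing silently.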
Building Block #3: Intermediate Data Stores
Like information-age surgeons, we're expected to reassemble data by piecing together various extremities into a unified whole. Real-time APIs represent our emergency room, and canonical data models, our sutures that stitch everything together. Another integral component in the data doctor's repertoire is, of course, the operating table: the area where we stage constituent parts and perform surgical operations. This is where intermediate data stores come into play.
Braiding two or more streams of data into a new, composite flow may be performed entirely in memory, but that's difficult at best, especially when intermediate steps (such as aggregations) must occur. I find that intermediate stores, such as traditional operational data stores (ODS), can kill three birds with one stone.
First, you have the convenience of staging data within a nonvolatile container that likely utilizes a familiar database access paradigm. Second, the ODS can serve double duty as an operational reporting data reservoir. Finally, the ODS can be used as a staging area or "launch pad" into the traditionally subject-oriented and historically focused enterprise data warehouse, assuming one is employed within your organization.
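The staging idea can be sketched with an in-memory SQLite database standing in for the ODS. The table names, columns, and sample rows are invented for illustration; the point is only that once both streams are staged, joining and aggregating them becomes an ordinary SQL query.

```python
import sqlite3

# An in-memory SQLite database stands in for the operational data store.
ods = sqlite3.connect(":memory:")
ods.executescript(
    """
    CREATE TABLE stg_hr (employee_id TEXT PRIMARY KEY, legal_name TEXT, department TEXT);
    CREATE TABLE stg_reviews (employee_id TEXT, review_year INTEGER, rating INTEGER);
    """
)

# Stage the stream from the (hypothetical) HR system.
ods.executemany(
    "INSERT INTO stg_hr VALUES (?, ?, ?)",
    [
        ("1001", "Ada Example", "Engineering"),
        ("1002", "Bo Sample", "Engineering"),
        ("1003", "Cy Test", "Finance"),
    ],
)

# Stage the stream from the (hypothetical) performance platform.
ods.executemany(
    "INSERT INTO stg_reviews VALUES (?, ?, ?)",
    [("1001", 2016, 4), ("1001", 2017, 5), ("1002", 2017, 3), ("1003", 2017, 4)],
)

# Braid the two staged streams into one composite, aggregated result.
rows = ods.execute(
    """
    SELECT h.department,
           COUNT(DISTINCT h.employee_id) AS headcount,
           AVG(r.rating) AS avg_rating
    FROM stg_hr AS h
    JOIN stg_reviews AS r ON r.employee_id = h.employee_id
    GROUP BY h.department
    ORDER BY h.department
    """
).fetchall()
```

The same staged tables could also feed operational reports or a downstream load into the data warehouse, which is the triple duty described above.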
Unlike surgery, cloud integration need not be painful. Understanding the challenges of cloud integration is the first step in mapping out the process, and leveraging these three building blocks can smooth out your learning curve.
David Torre is the owner of Center Mast. With nearly 20 years of experience and advanced degrees in information systems and business intelligence, David's unique combination of skills has enabled him to deliver cybersecurity and business intelligence solutions to some of the most well-known companies in the world. You can contact the author at [email protected].