Bring Your Data Back to Earth: The Cloud's Impact on Data Quality

As growth of cloud-based applications outpaces growth of on-premise apps, enterprises must address how best to integrate data between the cloud and on-premise storage.

By Konstantin Polukhin

Data is the lifeblood that enables organizations to make accurate strategic decisions and ensure the services they provide remain relevant. If data quality deteriorates, all of an organization's business processes will suffer, so enterprises place a premium on ensuring that their data is as timely, accurate, and consistent as possible.

The growing relevance of the cloud adds a further dimension to managing data quality -- and yet, astoundingly few organizations are taking this into account. According to a Ventana Research study, only 15 percent of organizations have completed a quality initiative for their cloud data. That number drops to 5 percent for master data management. It is no surprise, then, that only 21 percent of organizations fully trust their cloud data, compared with 46 percent for data from on-premise applications.

That's not to say cloud applications pose an immediate danger to data quality. It is in moving the data from cloud applications back on-premise that the problems begin to occur; namely, when companies are unable to extend their painstakingly developed data quality management processes to the data that is produced by the cloud application and has to be integrated back into the on-premise data store.

The reason for this is that more often than not, cloud applications are provided by companies whose business models are based on providing functionality and ease of use, more so than quality control. The content, after all, is the customer's concern -- or so goes their thinking. Therefore, it's not realistic to expect the application provider to control the quality of the information stored in its database.

Obviously, cloud application providers offer service-level agreements (SLAs) that outline their data management practices to mollify general concerns such as data recovery issues. However, the reality remains that when surrendering data to the cloud, the information's owner is also surrendering data oversight and maintenance in exchange for flexibility and elasticity.

Data integration is also an issue. Consider, for instance, a hypothetical bank that already collects and manages large amounts of customer data, and has, most likely, made considerable investments in building a reliable master database and ensuring its data quality. What would happen, however, if the bank introduced, for example, a cloud-based campaign management and execution platform to automate and enhance its direct marketing? Simply creating a purpose-built database for such a highly involved function is a serious project in and of itself. Furthermore, maintaining the quality of the in-house data will now require ongoing, rigorous integration with the cloud to keep the structure and the unique identifiers of the core database intact.

Essentially, the bank would be faced with significant data duplication, serious integration overhead, and related data quality risks, not to mention the extra work. The dilemma remains: the cloud application provides significant business value but severely complicates data management. Considering this, wouldn't the ideal scenario be for the bank to have the option of keeping the master data store on-premise, where it is governed by its internal data quality and management policies, rather than have data duplicated in the cloud? For instance, the bank could rework its computing process so as to retain control of the data, enabling the application to "borrow" the relevant data for processing.

Such situations are at the heart of the data management/cloud conundrum. For the cloud to "borrow" only the necessary data requires a rules-based mechanism that maps the communications between the cloud application and the on-premise data storage, automating quality management practices and extending them to the data the cloud uses. One may think of it as a "cloud-to-earth" connector that implements reliable communication across an unstable network, with comprehensive mapping between the cloud-based business application and the on-premise data storage, combined with a queuing mechanism on both ends. A separate data access layer would work with the "cloud-to-earth" mapping process, producing a logical representation of the physical data structures.
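
To make the idea concrete, here is a minimal Python sketch of such a connector. It is an illustration only, not a description of any particular product: the names FIELD_MAP, MASTER_STORE, Connector, and basic_rules, as well as the sample customer record and the validation rules, are all assumptions made for the example. In-memory queues stand in for the durable queues and retry logic that reliable communication over a real network would need.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical mapping rules: logical field seen by the cloud -> physical column on-premise.
FIELD_MAP = {
    "customer_id": "CUST_ID",
    "email": "EMAIL_ADDR",
    "segment": "MKT_SEGMENT",
}

# Stand-in for the on-premise master data store, keyed by customer ID.
MASTER_STORE = {
    1001: {
        "CUST_ID": 1001,
        "EMAIL_ADDR": "a@example.com",
        "MKT_SEGMENT": "retail",
        "ACCT_BALANCE": 5200.0,  # has no mapping rule, so it never leaves the premises
    },
}


def basic_rules(logical, value):
    """Illustrative quality rules applied to data coming back from the cloud."""
    if logical == "email":
        return isinstance(value, str) and "@" in value
    if logical == "segment":
        return value in {"retail", "premium"}
    return False  # customer_id and any other field are treated as read-only


@dataclass
class Connector:
    outbound: deque = field(default_factory=deque)  # on-premise -> cloud
    inbound: deque = field(default_factory=deque)   # cloud -> on-premise

    def lend(self, customer_id, fields):
        """Queue a logical view of one record containing only the requested mapped fields."""
        unknown = [f for f in fields if f not in FIELD_MAP]
        if unknown:
            raise KeyError(f"no mapping rule for: {unknown}")
        row = MASTER_STORE[customer_id]
        view = {f: row[FIELD_MAP[f]] for f in fields}
        self.outbound.append(view)
        return view

    def queue_update(self, customer_id, changes):
        """Record changes proposed by the cloud application for later validation."""
        self.inbound.append((customer_id, changes))

    def apply_updates(self, validate):
        """Drain the inbound queue; apply valid changes and return the rejected ones."""
        rejected = []
        while self.inbound:
            customer_id, changes = self.inbound.popleft()
            for logical, value in changes.items():
                column = FIELD_MAP.get(logical)
                known = column is not None and customer_id in MASTER_STORE
                if not known or not validate(logical, value):
                    rejected.append((customer_id, logical, value))
                    continue
                MASTER_STORE[customer_id][column] = value
        return rejected
```

In this sketch the bank would call lend() to hand the campaign platform only the fields it needs, and apply_updates(basic_rules) to run its own quality checks before anything the cloud sends back touches the master record.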

This approach is obviously not without its trade-offs -- some automation, speed, and cost savings may be lost in exchange for data quality. However, it would also include an element of "having your cake and eating it too." Your software can reside in the cloud -- reaping all the benefits of running virtual applications -- and the data is treated as though it resides there, even though it is on-premise with only the necessary information shuttled to and from the cloud.

Analysts predict that over the next few years, cloud-based applications will grow faster than on-premise applications. It is, therefore, vitally important for companies to address the gap between needs and capabilities of integrating data between the cloud and on-premise. Those who succeed at this will avoid potentially costly errors stemming from inconsistencies or inefficiencies resulting from reconciling different sets of data.

Konstantin Polukhin is a managing partner at First Line Software, where he focuses on architecture, implementation, and integration of cloud-based applications. You can contact the author at
