4 Data Management Best Practices for Cloud Computing
Data originating in the cloud or migrating to the cloud demands best practices just as other valuable data assets do.
- By Philip Russom
- March 6, 2017
An increasing number of organizations are committing to the cloud as a computing platform, especially for use cases in data management and analytics. For example, the survey for TDWI's recent Emerging Technologies Best Practices Report revealed that many enterprises already have cloud-based solutions for data warehousing (35% of respondents), analytics (31%), sandboxes (29%), data integration (24%), and Hadoop (19%). (See Figure 16 in the report, available at http://tdwi.org/bpreports.)
As more organizations begin their journey to the cloud, they need to plan how they will apply the best practices of data management to ensure that cloud-based, data-driven use cases are successful for end users and comply with enterprise governance and data standards. The good news is that existing best practices work well in cloud environments, although adjustments are usually needed. Here are several examples of data management best practices for cloud computing.
Best Practice #1: Manage data across all platforms, including cloud
This practice applies whether data exists on premises, in the cloud, or both (as is common in today's multi-platform hybrid data architectures). It also applies whether data migrates to a cloud, originates there, migrates off a cloud, or does some combination of these. Enterprise-scale data and application architectures that involve clouds can be complex, but this is not a showstopper. TDWI regularly sees organizations succeed with clouds by extending or augmenting existing teams, skills, governance policies, business sponsorship, data management practices, and data integration infrastructure.
Best Practice #2: Deploy substantial data management infrastructure before journeying to the cloud
In complex scenarios such as those just described, you will need substantial tools and architecture for data integration -- and sometimes application integration, too. This infrastructure is required to regularly migrate and move data among platforms. Put this infrastructure in place before starting your journey to the cloud because retrofitting it later is risky and disruptive.
If you have pre-existing infrastructure for data integration, you may be able to simply extend it to cloud platforms. You should also be open to additional tools that are built and optimized specifically for the kind of cloud and use case you need. As with your on-premises best practices, cloud best practices and tools need to address data quality, metadata, master data, and varying data speeds. Be sure these are baked into your infrastructure and team skills.
Best Practice #3: Give priority to data integration requirements for clouds
As you design and revise data integration solutions, give careful thought to where specific processing should occur -- in the cloud versus on premises. Likewise, you will most likely need to adjust your approach to data landing and staging.
Be sure your data integration toolset supports the interfaces and protocols of popular cloud-based applications and platforms, not just common on-premises enterprise sources. TDWI sees users increasingly adopting cloud-based Hadoop, which involves multiple interface points (such as MapReduce, Pig, Hive, HBase, Spark, Drill, and Presto). Similarly, look for support for APIs that are proprietary to the cloud provider you have selected.
Data coming from or going to clouds is trending toward real time, so your data integration tools and data management infrastructure should address multiple "right-time" interfaces, ranging from offline batch and microbatch to real time and on-demand.
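To make the range of "right-time" interfaces concrete, here is a minimal Python sketch of microbatching, which sits between offline batch and real time. The function name micro_batches and the batch_size parameter are illustrative assumptions, not part of any particular data integration tool. A production pipeline would typically also flush on a time interval, which this sketch leaves out.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream of records into lists of at most batch_size items.

    Hypothetical helper for illustration only. The check below runs when
    iteration starts, because this function is a generator.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # Pull up to batch_size records without reading the whole stream.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # The stream is exhausted.
        yield batch


if __name__ == "__main__":
    # Stand-in for records arriving from a cloud source.
    incoming = (f"event-{i}" for i in range(5))
    for batch in micro_batches(incoming, batch_size=2):
        print(batch)
```

In this sketch a batch_size of 1 approximates record-at-a-time delivery, while a very large value behaves like a traditional batch load, so a single setting spans much of the range described above.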
For years, TDWI has seen organizations depend on their data integration tools and platforms for broad metadata management, and this trend continues with clouds. Be sure your strategy supports multiple metadata types (technical, business, and operational) that can be accessed by many application and user types. Finally, many clouds are capturing big data and other new data types (IoT and sensor data). Because these tend to be "metadata-poor," look for tools that help you deduce, develop, and inject metadata, perhaps on the fly at read time.
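As a rough illustration of deducing metadata at read time, the following Python sketch scans newline-delimited JSON records (a common shape for sensor data) and reports which fields appear and which value types each field carries. The function name infer_schema and the sample field names are assumptions made for this example. Commercial tools do far more, such as profiling values and attaching business metadata.

```python
import json
from typing import Dict, Iterable, Set


def infer_schema(json_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Deduce technical metadata (field names and observed Python types)
    from JSON records as they are read. Hypothetical helper for illustration.
    """
    schema: Dict[str, Set[str]] = {}
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        record = json.loads(line)
        if not isinstance(record, dict):
            continue  # Only JSON objects carry named fields.
        for field, value in record.items():
            # Record every type seen, since sensor feeds are often inconsistent.
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


if __name__ == "__main__":
    # Made-up sample readings, not real data.
    sample = [
        '{"sensor_id": "a1", "temp_c": 21.5}',
        '{"sensor_id": "a2", "temp_c": 22, "status": null}',
    ]
    for field, types in sorted(infer_schema(sample).items()):
        print(field, sorted(types))
```

A field that shows more than one type, or that appears in only some records, is a natural candidate for data quality rules or steward review.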
Best Practice #4: Govern data holistically, regardless of the data's platform or location
Organizations with a pre-existing data governance program (or similar program for stewardship or curation) can most likely revise existing policies designed for on-premises data usage, and thereby assure compliance for data that is traveling in and out of clouds. Organizations without such a program should leverage their journey to the cloud as a driver for initiating governance.
TDWI's view is that data governance is a critical success factor for most data initiatives because it avoids the non-compliant use of data, and it aligns data management work with business goals. When governance extends beyond compliance issues to data standards, it also elevates data's quality, usability, and trust. To get consistency and enterprise control across these desirable benefits, data governance should apply to all datasets, whether on premises, in the cloud, or strewn across hybrid architectures.
A Final Word
For more information on this topic, replay a recent Informatica Virtual Summit, in which TDWI's Philip Russom discusses these data management best practices for cloud computing. The recording is available here.
About the Author
Philip Russom is director of TDWI Research for data management and oversees many of TDWI’s research-oriented publications, services, and events. He is a well-known figure in data warehousing and business intelligence, having published over 600 research reports, magazine articles, opinion columns, speeches, Webinars, and more. Before joining TDWI in 2005, Russom was an industry analyst covering BI at Forrester Research and Giga Information Group. He also ran his own business as an independent industry analyst and BI consultant and was a contributing editor with leading IT magazines. Before that, Russom worked in technical and marketing positions for various database vendors. You can reach him at @prussom on Twitter and on LinkedIn at linkedin.com/in/philiprussom.