Executive Q&A: Data Management and the Cloud
Moving data to the cloud poses several challenges. Datometry CEO Mike Waas explains how to make the move smoother.
- By Upside Staff
- February 16, 2022
Moving data (and databases) to the cloud can be stressful. Datometry’s CEO, Mike Waas, explains what you need to think about, weighs the pros and cons of three different approaches, and offers best practices to make the move less taxing.
Upside: There’s been an increasing movement of data management from on-premises environments to the cloud. What do you see driving your customers to make the move? What benefits are they expecting?
Mike Waas: We see primarily two driving forces. First, many companies are executing on a C-level mandate to become cloud-native, often on an accelerated timeline. In this case, moving all data management to the cloud is part of a larger initiative. Second, IT relocates data assets into the cloud independently to overcome the limitations of on-premises systems and lower operational expenses.
Underlying either path is a careful cost-benefit analysis that has to justify what is often the single most complex transformation project within the organization.
Not surprisingly, the primary benefit is the anticipation of drastic savings. Compared to an on-premises data warehouse appliance, cloud-native systems can achieve a cost reduction of up to 75 percent. The savings are a combination of lower fees and a reduction in personnel because managed cloud systems liberate enterprises from having to staff large support teams.
A close second when it comes to benefits is the prospect of generating new revenue. In the cloud, companies expect to be able to tap into a bevy of integrated processing capabilities, ranging from flexibility and rapid scaling to complete vertical-specific solutions.
What are the biggest problems when adopting a new data management strategy in the cloud?
Understanding which type of cloud database is the right fit is often the biggest challenge. It’s helpful to think of cloud-native databases as being in one of two categories: platform-native systems (i.e., offerings by cloud providers themselves) or in-cloud systems offered by third-party vendors.
Platform-native solutions include Azure Synapse, BigQuery, and Redshift. They offer deep integration with the provider’s cloud. Because they are highly optimized for their target infrastructure, they offer seamless and immediate interoperability with other native services.
Platform-native systems are a great choice for enterprises that want to go all-in on a given cloud and are looking for simplicity of deployment and interoperability. In addition, these systems offer the considerable advantage of dealing with only a single vendor.
In contrast, in-cloud systems tout cloud independence. This seems like a great advantage at first. However, moving hundreds of terabytes between clouds has its own challenges. In addition, customers inevitably end up using other platform-native services that are only available on a given cloud, which further reduces the perceived advantage of cloud independence.
However, in-cloud systems are a great choice if an enterprise needs specific features not available on other systems. They also suit buyers who are less sensitive to cost: because these systems have to pass the cost of the underlying cloud infrastructure on to customers in full, they have less pricing flexibility than their platform-native counterparts.
What approaches have enterprises used to move their databases? Which approaches are more successful?
There are three major approaches to moving a database: conventional database migration, lift-and-shift, and virtualization. Each comes with its own advantages and disadvantages.
1. Database migrations: high cost and high failure rates?
The classical approach of migration involves replacing all drivers and connectors and then rewriting all SQL across the enterprise. Most IT leaders are horrified when they learn just how pervasive a given SQL dialect has become within their enterprise. This affects not only custom-built applications but also third-party applications, despite best efforts to keep them “clean.”
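To make this concrete, here is a small, hypothetical illustration (the table and column names are invented) of the kind of rewrite a conventional migration multiplies across thousands of statements. Teradata’s QUALIFY clause filters directly on a window function; a destination engine that does not accept QUALIFY forces a subquery rewrite:

```python
# Hypothetical example: one Teradata-specific construct and its ANSI rewrite.
# A conventional migration repeats this exercise for every statement in every
# report, ETL job, and third-party tool that emits the old dialect.

# Teradata dialect: QUALIFY filters on a window function directly.
teradata_sql = """
SELECT customer_id, order_ts, amount
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id
                           ORDER BY order_ts DESC) = 1
"""

# ANSI rewrite for engines without QUALIFY: wrap the window function
# in a subquery and filter on its result.
ansi_sql = """
SELECT customer_id, order_ts, amount
FROM (
    SELECT customer_id, order_ts, amount,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY order_ts DESC) AS rn
    FROM orders
) latest
WHERE rn = 1
"""
```

Multiply that by every dialect-specific clause, date function, and stored procedure in the estate, and the scale of the rewrite becomes clear.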
Database migrations of large systems are daunting. Enterprises have often amassed a diverse range of data sources and applications over the years, which complicates the process substantially. For a mid-range data warehouse appliance, a migration is typically budgeted at $20 million to $30 million and expected to take up to five years. Unfortunately, the majority of those projects fail due to their complexity.
Surprisingly, this approach can actually be perfect for some systems. Data marts or small data warehouses that host only a few applications can dodge most of these complications. In particular, department-specific systems may be great candidates for conventional migrations.
2. Lift-and-shift: quick but incomplete?
Moving from an on-premises system to the in-cloud system of the same vendor is a straightforward solution. The compatibility between the on-premises and the in-cloud version means many of the painful rewrites of the manual migration can be avoided. Some system integrators even specialize in this move.
However, as with most things that are easy, the results often leave much to be desired. When you’re done, you’re still on the same database, you’re still overpaying, and you’re still no better integrated with the cloud. In general, this approach simply defers the problem and should be considered a stopgap measure at best.
Still, there are clear use cases. Lift-and-shift can be the right solution when the enterprise needs to vacate a data center on an accelerated timeline. It is also an option when IT leaders do not have executive support for a true transformation.
3. Database virtualization: best of both worlds?
With database virtualization (not data virtualization), enterprises get to move their existing applications onto a different database -- without rewriting SQL or changing APIs. Database virtualization inserts an abstraction layer between the application and the database. This eliminates the incumbent’s vendor lock-in. [Editor’s note: the speaker works for a company that specializes in database virtualization.]
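As a rough sketch of the mechanism -- all names below are hypothetical, and real products operate at the wire-protocol level with a full SQL parser rather than string substitution -- the layer sits in the query path, translates the incumbent’s dialect on the fly, and forwards the result to the cloud warehouse:

```python
# Minimal sketch of the database virtualization idea: the application keeps
# emitting its original dialect; an intermediary translates and forwards.
# Real systems emulate the incumbent's wire protocol and parse SQL properly;
# the string replacement below merely stands in for that machinery.

def translate(sql: str) -> str:
    """Map incumbent-dialect constructs onto the destination's dialect."""
    return sql.replace("SEL ", "SELECT ")  # e.g., Teradata's SEL shorthand

def run_through_virtualization_layer(sql: str, destination_conn):
    """Execute an unmodified application query against the new warehouse.

    destination_conn is assumed to be any DB-API 2.0 connection to the
    cloud system.
    """
    cursor = destination_conn.cursor()
    cursor.execute(translate(sql))  # the application code never changed
    return cursor.fetchall()
```

The point of the sketch is where the work happens: in the layer, once, rather than in every application, one rewrite at a time.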
The primary use case for database virtualization is adopting a cloud data warehouse rapidly when rewriting a complex application environment would be too costly and too risky. Database virtualization makes enterprises cloud-native at roughly 10 percent of the cost and time of a conventional migration. Because it does not require rewriting applications, it also prevents new vendor lock-in, which in turn achieves cloud independence.
To be fair, virtualization requires the destination system to offer performance and scalability similar to the original system’s. Where that isn’t the case, building a bespoke solution based on a conventional rewrite approach can paper over these deficiencies.
What are some of the problems enterprises experience when making the move? What haven’t they thought of?
The biggest challenges are around planning and driving the execution. Most IT leaders are doing this for the first time. Therefore, plans might be based on unrealistic estimates. The effort to rewrite applications is almost always underestimated by a wide margin.
The next problem is usually transferring the content of the database. The data pipelines need to account for many different situations along the way. Surprisingly, egress from the existing system is frequently the single most severe bottleneck.
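One common mitigation -- sketched below with invented table names and key ranges; this is an assumed pattern, not a description of any particular product -- is to partition the unload by key range and run the extracts in parallel, so several moderate streams keep the constrained egress link busy rather than one enormous unload serializing everything:

```python
# Sketch: partitioned, parallel extraction from the source system to a
# staging area. The table, key ranges, and paths are hypothetical.

from concurrent.futures import ThreadPoolExecutor

KEY_RANGES = [(0, 1_000_000), (1_000_000, 2_000_000),
              (2_000_000, 3_000_000), (3_000_000, 4_000_000)]

def extract_range(connect, lo, hi):
    """Unload one key range from the source and stage it for upload."""
    conn = connect()  # fresh connection per worker
    cur = conn.cursor()
    cur.execute(
        "SELECT * FROM orders WHERE order_id >= %s AND order_id < %s",
        (lo, hi),
    )
    path = f"/staging/orders_{lo}_{hi}.csv"
    # ... write cur.fetchmany() batches to path, then upload to cloud storage
    return path

def parallel_unload(connect, workers=4):
    """Run the per-range extracts concurrently and collect staged paths."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: extract_range(connect, *r), KEY_RANGES))
```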
Once the project is underway, line-of-business users will come forward with lists of modifications they’d like to see. They sense this project is a once-in-a-lifetime opportunity to make these changes. However, giving in to these demands means the project may balloon in scope, which in turn puts it at risk of failure.
Finally, every major IT project is an endurance sport. This one tests everybody’s mettle. Because these projects have high visibility, not only within the organization but in many cases industry-wide, the team members will find themselves under constant siege from vendors, including the incumbent. This calls for strong leadership to ensure your team stays laser-focused.
What best practices can you suggest to make the move as seamless as possible?
In addition to the technical decisions I’ve mentioned, IT leaders need to remind themselves that people and processes are the most critical elements of any migration project. Making staff feel comfortable -- and excited about the direction -- ensures the project gathers momentum. We have found that frequent show-and-tell sessions let leaders demonstrate progress and accountability continuously.
Finally, constantly reminding everybody of the why behind the project helps maintain momentum. This goes beyond economic benefits and cost savings. Project leaders can expect a successful completion of the project not only to raise their enterprise’s competitive posture but also to elevate their status as trailblazers.
What makes Datometry unique?
Datometry pioneered the concept of database virtualization in the data warehousing space. Using Datometry, leading enterprises have become cloud-native in record time while saving considerable funds as well as time compared to conventional rewrite-based migrations.
Datometry is constantly expanding the range of database systems it supports. To date, Datometry supports many of the most commonly used databases, including Teradata, Oracle, Azure Synapse, Amazon Redshift, Google BigQuery, SQL Server, and PostgreSQL.
Based on the core principle of making databases and applications interoperable, Datometry liberates enterprises from vendor lock-in and lets IT leaders achieve true cloud independence.
[Editor’s Note: Mike Waas is the founder and CEO of [Datometry](https://datometry.com/), a SaaS database virtualization platform enabling existing applications to run natively on modern cloud data management systems without being rewritten. Mike has held senior engineering positions at Microsoft, Amazon, EMC, and Pivotal and is the architect of Greenplum’s ORCA query optimizer. He has authored over 40 peer-reviewed scientific publications in various areas of database research and holds over 50 patents. You can reach the author via LinkedIn.]