How Replatforming to a Cloud Data Warehouse Could End Your Career
Examining three replatforming misconceptions and learning how to overcome them may save you from a resume-generating event.
The pandemic has been a wake-up call of enormous proportions for many IT departments. To compete effectively in a post-COVID-19 economy, companies realized they needed to embrace the cloud, urgently. How they lay the foundation in the cloud will matter for many years to come. As always, data is at the heart of the matter.
The most critical first step is to devise and implement a strategy for replatforming the on-premises enterprise data warehouse (EDW) to a counterpart in the cloud. That strategy must cover every department connected to the EDW and every major revenue source that depends on it. Its importance cannot be overstated.
If IT executes the replatforming strategy well, it will accelerate revenue, reduce cost, and ensure the company achieves compliance. If it executes poorly, replatforming may turn into a quagmire -- and end careers. Others' mistakes in this space offer vital lessons. Let's examine three mistakes IT leaders often make and how to overcome each of them.
Mistake #1: Not Knowing What's in Your EDW
Few enterprise IT executives have sufficient clarity about the operations of their on-premises EDW. Over the years, more and more business users have tapped the EDW for its incredibly valuable data, so it's no surprise the system has become home to a myriad of workloads. Some are well designed; others reflect the ad hoc nature of the projects from which they originated.
Many service providers offer assessments that promise to help. However, most assessments stop at the number of statements run each day and the total volume of data on disk. This rather primitive level of insight does not help much when preparing for a migration. If anything, it may provide a false sense of security up-front and lead to disaster down the road.
A successful replatforming strategy needs to take many inputs into account. These include, among others, statistics about the use of specific features, the complexity of statements, and data dependencies. These parameters are critical for prioritizing work. In some cases, they may also reveal workload patterns that will pose challenges for the destination data warehouse.
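As a rough illustration of the inputs named above, the following minimal sketch profiles a handful of SQL statements for feature use, a crude complexity score, and table dependencies. Everything in it is an assumption made for illustration: the feature list, the scoring rule (joins plus subqueries), and the sample statements are hypothetical, and the regular expressions only approximate what a real assessment would derive from a full SQL parser and actual query logs.

```python
import re
from collections import Counter

# Hypothetical features to track; a real assessment would cover far more.
FEATURE_PATTERNS = {
    "qualify": re.compile(r"\bQUALIFY\b", re.IGNORECASE),
    "merge": re.compile(r"\bMERGE\s+INTO\b", re.IGNORECASE),
    "recursive_cte": re.compile(r"\bWITH\s+RECURSIVE\b", re.IGNORECASE),
}
JOIN = re.compile(r"\bJOIN\b", re.IGNORECASE)
SUBQUERY = re.compile(r"\(\s*SELECT\b", re.IGNORECASE)
# Approximation: treats the identifier after FROM/JOIN/INTO/USING as a table.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN|INTO|USING)\s+([A-Za-z_][\w.]*)", re.IGNORECASE
)


def profile_statement(sql):
    """Return feature use, a crude complexity score, and referenced tables."""
    features = sorted(
        name for name, pattern in FEATURE_PATTERNS.items() if pattern.search(sql)
    )
    # Assumed scoring rule: one point per join and per subquery.
    complexity = len(JOIN.findall(sql)) + len(SUBQUERY.findall(sql))
    tables = sorted({name.lower() for name in TABLE_REF.findall(sql)})
    return {"features": features, "complexity": complexity, "tables": tables}


def profile_workload(statements):
    """Aggregate per-statement profiles into workload-level statistics."""
    profiles = [profile_statement(sql) for sql in statements]
    feature_counts = Counter(f for p in profiles for f in p["features"])
    table_counts = Counter(t for p in profiles for t in p["tables"])
    # Statement indexes, most complex first, as a starting point for prioritization.
    by_complexity = sorted(
        range(len(profiles)), key=lambda i: -profiles[i]["complexity"]
    )
    return {
        "feature_counts": feature_counts,
        "table_counts": table_counts,
        "by_complexity": by_complexity,
    }


# Hypothetical sample statements, not taken from any real workload.
SAMPLE = [
    "SELECT a.id FROM sales a JOIN stores b ON a.store_id = b.id "
    "WHERE a.amt > (SELECT AVG(amt) FROM sales)",
    "MERGE INTO inventory USING staging ON inventory.sku = staging.sku "
    "WHEN MATCHED THEN UPDATE SET qty = staging.qty",
    "SELECT id FROM sales QUALIFY ROW_NUMBER() OVER "
    "(PARTITION BY id ORDER BY ts DESC) = 1",
]

report = profile_workload(SAMPLE)
print(report["feature_counts"])
print(report["table_counts"])
```

Even a toy profile like this goes beyond statement counts and data volume: it shows which features the workload relies on and which tables many statements share, which is the kind of detail a project plan can be built on.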
To avoid flying blind, enterprises must get a full workup of their EDW and have their vendor produce detailed insights that translate directly into a comprehensive project plan.
Mistake #2: Going Full Monty and Rewriting Everything
Replatforming is all too often viewed as an opportunity to clean house. Combining migration with modernization can be tempting. Especially for decision-makers who are unfamiliar with the history of the EDW, a Big Bang approach may be seen as a strategic career move. However, it may be one they come to regret.
It is not only the ego of an IT leader that is to blame. Database vendors are equally guilty. They often push the notion that large-scale rewriting is both easy and desirable, and the belief that rewrites are always better is stubbornly held. Quite often, this is an excuse to gloss over the lack of a specific feature on the destination system.
As tempting as it is, the clean slate approach is almost always a bad idea. Combining migration with modernization is probably the number one reason projects fail. Overly ambitious business users pile additional requests onto an already oversubscribed IT team. Ultimately, the project balloons in scope until it collapses under its own weight.
Instead of boiling the ocean, separate the migration from modernization. Migrate first and start tapping into the benefits of the cloud immediately. Then, and only then, start looking at modernizing.
Mistake #3: Misunderstanding What's Needed
Not surprisingly, few IT organizations have ever replatformed their EDW. At most, they moved individual workloads or rewrote select applications. Almost always, queries have been customized across both ETL workflows and analytics. In a complete replatforming, a staggeringly large body of SQL needs to be reworked.
Many tools have emerged recently to address this problem. They promise to automatically convert SQL from existing applications and generate syntax for the new destination data warehouse. In practice, syntax conversion achieves success rates below 75 percent. The approach erroneously assumes that every SQL construct used on the original system has an equivalent in the destination system.
Although 75 percent may sound great at first, its impact is surprisingly limited. Replatforming is an 80/20 problem if there ever was one. That is to say, solving even 80 percent of the problem reduces the overall cost and effort by only 20 percent. One can quickly see why syntax conversion makes great headway at the beginning yet can never finish the job.
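To make the limitation concrete, here is a minimal sketch of a one-to-one syntax converter. The rename table, the list of constructs without a destination equivalent, and the sample statements are all hypothetical and do not describe any real product or dialect. The converter also ignores string literals and comments, which a real tool would have to handle.

```python
import re

# Hypothetical one-to-one rewrites from a source dialect to a destination dialect.
RENAMES = {"SEL": "SELECT", "SRC_LEN": "LENGTH"}
# Hypothetical source constructs with no equivalent in the destination.
NO_EQUIVALENT = {"SRC_RESAMPLE", "SRC_SESSION_MACRO"}

WORD = re.compile(r"[A-Za-z_]\w*")


def convert(sql):
    """Rewrite known keywords and report constructs that cannot be mapped."""
    unresolved = set()

    def swap(match):
        word = match.group(0)
        key = word.upper()
        if key in NO_EQUIVALENT:
            unresolved.add(key)
            return word  # left untouched: no term-for-term replacement exists
        return RENAMES.get(key, word)

    return WORD.sub(swap, sql), sorted(unresolved)


def conversion_rate(statements):
    """Fraction of statements converted without any unresolved construct."""
    clean = sum(1 for sql in statements if not convert(sql)[1])
    return clean / len(statements)


# Hypothetical workload: three statements map cleanly, one does not.
WORKLOAD = [
    "SEL id FROM customers",
    "SEL SRC_LEN(name) FROM customers",
    "SEL region, SUM(amt) FROM sales GROUP BY region",
    "SEL SRC_RESAMPLE(ts, 5) FROM readings",
]

print(convert(WORKLOAD[1]))
print(convert(WORKLOAD[3]))
print(conversion_rate(WORKLOAD))
```

In this sketch, the statements that only need renamed keywords convert immediately, which accounts for the early headway. The statement that uses a construct with no counterpart stays unresolved no matter how many rename rules are added, and that remainder has to be reworked by other means.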
Avoid this time-delayed blow-up by having your vendor demonstrate they can truly deal with 95 percent or more of the workload. More important, make sure they can demonstrate this in a proof-of-concept or pilot implementation in the course of just a few weeks.
Get Smart Before You Get Going
Replatforming your EDW to the cloud is every IT leader's biggest challenge. To be successful, your approach needs to incorporate three things:
- Get clarity from the beginning. A vendor that cannot provide this insight, along with a detailed plan that accounts for the intrinsic details of your workloads, is probably not qualified.
- Pick your battles and your workloads. Do not succumb to the temptation to reinvent the business when resources can be put to better use.
- Insist on a timely pilot implementation of a representative workload including ETL. Do not accept vague projections and hearsay of successful replatforming initiatives.
In our experience, any major data warehouse can be replatformed to the cloud in 12 to 18 months. There is no need for projects that take 3 to 5 years. Remember, though, that there are also no shortcuts for completing this kind of work in just a couple of months. With the guidelines above, IT leaders can accomplish a move within a reasonable timeframe while controlling the risk.
Mike Waas is the founder and CEO of Datometry, a SaaS database virtualization platform enabling existing applications to run natively on modern cloud data management systems without being rewritten. Mike has held senior engineering positions at Microsoft, Amazon, EMC, and Pivotal and is the architect of Greenplum’s ORCA query optimizer. He has authored over 40 peer-reviewed scientific publications in various areas of database research and holds over 50 patents. You can reach the author via LinkedIn.