To Migrate or Not to Migrate Your Data Warehouse
When it comes to migrating a data warehouse, resist the temptation to lift-and-shift and do it right. It might take more time upfront, but it'll save you time in the long run.
- By Stephen Swoyer
- December 11, 2012
Just how easy is to transition from one data warehouse (DW) platform to another? It depends whom you ask.
All of the large relational database vendors have targeted DW migration programs. Ask them about transitioning and you'll likely be told: there's a migration plan for that.
Ask an analytic database vendor, on the other hand, and you might get a different answer: No problem, everything runs faster on an MPP engine!
Ask data warehouse architects, integrators, and even some data integration (DI) vendors the same question, however, and you'll get a less sanguine take on the DW migration progress.
Take veteran data warehouse architect Mark Madsen, for example. Madsen, a principal with information management consultancy Third Nature Inc., says most DW migrations take place in a less than ideal context: they're either performed under duress or (even in the cases of orderly or planned transitions) they still take shortcuts.
Even if the tools are available, customers won't necessarily take advantage of them.
"A lot of people who do a data mart or data warehouse conversion, are usually doing it under duress. It's kind of the last-ditch effort to save the warehouse or the mart, either [because of] performance problems or because [of] some form of cost-performance argument, such as an upgrade to Exadata," he explains.
"When I talk to the people who've done the conversions, probably 80 percent say the same thing: 'We didn't have enough time to retool a lot of the stuff, so we ended up just picking up the whole thing and dropping it onto the new platform.'"
It's the ubiquitous "forklift" approach to transitioning from one DW platform to another.
In some cases, forklift migrations make sense. In the case of a large DW deployment on one platform to a large DW deployment on another platform, however, forklifting is generally a very bad idea, says Madsen. "It just doesn't work very well because you have [for example] a model for Oracle being deployed on Teradata or Netezza," he explains.
"So [the customer] get[s] some problems alleviated -- even the performance problems (they may not be perfect, but they're alleviated [to some extent]) -- but somewhere down the road, they have to refactor everything," Madsen continues. "They say, 'I really wish that we had done that when we moved the data.' Then someone else says, 'But we didn't have the time.'"
The Forklift Approach
Consider a hypothetical Oracle migration project. Last year, both IBM Corp. and Teradata Corp. kicked off aggressive Oracle migration programs that typically bundled products -- e.g., software and services -- which, they said, helped IT complete Oracle migrations faster.
According to Jim Dietz, senior product and business development executive with Teradata, his company offers customers several options -- including dedicated, Oracle-specific tooling -- to help customers accelerate their migrations. "What we've done is introduce some new [migration] technology: Teradata Migration Accelerator. This is a brand new toolset that enables companies ... to be able to convert a lot of the code from an Oracle environment -- [e.g.,] things like PL-SQL over to [Teradata] standard SQL," Dietz told BI This Week during an interview at TDWI's World Conference in Chicago.
Teradata, Dietz explains, has "migrated over 300+ Oracle" data warehouse deployments.
Of these, roughly half were forklifts, he says. "It's probably 50-50. Customers will [either] just take their thing and forklift it over and just run it more efficiently on Teradata. The other 50 percent will take the time right now to just take it and optimize," he says. "Even the 50 percent that do the forklift, they do the forklift first, then they refactor secondly."
IBM, for its part, developed a technology called "Blueprint Director" that supports drag-and-drop data movement capabiliities. Blueprint Director is less a migration tool than a quick way to move data from one environment (typically, a large data warehouse or EDW) to another, smaller environment (e.g., an application-specific or analytic data mart).
IBM has bigger plans for Blueprint Director, however.
"If I wanted to offload my deep analytics from DB2 to Netezza, how would I be able to do that? The codename of this technology is 'Data Bridge,' and it's part of Blueprint Director. The goal is the ability to easily move data from one system to another," said Nancy Kopp-Hensley, program director for PureData and Netezza product marketing and strategy with IBM, in an interview at the TDWI World Conference in Chicago. "We're also looking at sources being Teradata or Oracle, because we know that we're not going to move from IBM to IBM all of the time."
The idea, explained Kopp-Hensley, is to enable intelligent drag-and-drop migrations, at least for subsets of data. In this regard, Blueprint Director will make recommendations as to how to best construct a data model to better "marry the data with the anlaytics."
In most cases, Kopp-Hensley suggested, customers aren't moving an entire data warehouse en bloc. Instead, they're doing so iteratively, spinning out new analytic marts or warehouses in a phased approach to address performance problems. "I go into a lot of customers that have large Teradata or Oracle [systems]. These are not easy to migrate, so what we see is that a lot of customers have started to build out around it," she explains.
The Holy Grail of a data warehouse migration is the ability to optimize or refactor a warehouse on-the-fly: to effectively forklift a data model out of one environment and shift it into another, refactoring it (en route) such that it's optimized for its new home.
On-the-fly lift-and-shift is still a consummation devoutly to be wished. Nevertheless, data warehousing specialists Kalido and WhereScape Inc. claim to enable more rapid DW migration experiences. Both tout two very different, but notionally "agile," takes on the theme.
John Evans, director of product marketing with Kalido, believes that forklifting is almost never a good idea. "Putting something on a high-performance platform, like a [database] appliance-- sure, you end up with a faster-running hairball, but it's still a hairball. The effort isn't put into reconfiguring what's in that warehouse, but the migration [process], even under duress, is an opportunity to do it right. It doesn't have to be [prohibitively] time consuming."
Kalido offers two modeling tools: its Common Warehouse Modeler (CWM) and its Business Information Modeler (BIM). Each tool targets a distinctly different audience: CWM is aimed at data architects, data modelers, and other data heroes; BIM, on the other hand, is a kind of business semantic translation layer for the CWM modeling tool.
In the case of a hypothetical Oracle migration, a customer would first import an Oracle data model -- Kalido is certified by Oracle as an Exadata-optimized platform, Evans points out -- into CWM, which does perform some on-the-fly tweaking or refactoring.
From there, however, the data management team would switch to the BIM to coordinate and collaborate with business stakeholders. "You can suck those [models] down into the Business [Information] Modeler. The other thing that [BIM] will do is to manage all of the names and labels, if you have CUST_ID and all of those keys, so that you maintain your integrity and your technical naming standards," Evans explains. "If you've got all of that stuff in your logical model … we can do the reverse conversion, too: you can sit down with your business people and say here's what your product hierarchy is going to look like."
WhereScape, on the other hand, does offer a dedicated Oracle migration facility.
In fact, WhereScape itself isn't the only one promoting it.
"We've had a lot of Oracle customer wins with Wherescape using their migration tool," Kim Dossey, Teradata's manager of third-party industry consultants, told us at the TDWI World Conference in Chicago.
That said, WhereScape positions its Oracle migration facility as just part of an effective DW transition project. Yes, says CEO Michael Whitehead, RED -- WhereScape's flagship DW development tool -- can automatically move data from an Oracle database and remap it without hand coding. Yes, it can perform certain optimizations on-the-fly. If you wanted, Whitehead concedes, you could use RED to "migrate" an Oracle database in one pass.
However, this would be stupid, says Whitehead. In this scenario, for example, you'd be refactoring and optimizing your existing model. The problem, he argues, is that your existing model is almost certainly flawed. A DW migration gives you a chance to start from scratch; if you have an approach that enables you to quickly and iteratively start from scratch, he argues, you should take advantage of it. "No model that you start with is the model that you're going to end [up] with. For example, Teradata has a logical data model for [the] Telco [vertical]. The reason they have a logical data model and not a physical data model is that you're not going to implement that model; it doesn't survive contact with data. Your model won't, either."
WhereScape emphasizes iterative data warehouse development, says Whitehead: prototype, build, test, tweak, test, tweak again, test, and so on.
"Forklifting is just taking someone's screwed up platform ... and moving it on to a new platform and ending up with yesterday's mistakes," he stresses. "You would never do that if I it were just as quick to rebuild it as it was to move it -- or if it was possible to rebuild it in such a way that it uses the new architectures, techniques, and tools. The advantage we bring is that [with RED], you'll have something you're proud of after you migrate it."