
Making the Case for Enterprise Extract, Transform, and Load

Because of several outside drivers, organizations could soon face a come-to-enterprise-ETL moment—whether they want to or not

Last week, we examined the surprising abundance of ad hoc ETL solutions that still accomplish much of the day-to-day data integration legwork in many enterprises. There are a number of reasons why organizations typically embrace ad hoc approaches to ETL (e.g., scripting, programmatic SQL) over bona fide enterprise ETL tools—not the least of which is pricing.

Because of several outside drivers, however, organizations could soon face a come-to-enterprise-ETL moment—whether they want to or not.

For starters, experts say, it’s no longer a batch world. ETL is no longer a once-, twice-, or three-times-a-day proposition, and (more importantly) data integration involves a lot more than simply extracting information from one source, reformatting it in some fashion, and loading it into another. To be sure, ETL is still synonymous with data integration—but that’s starting to change, too.

“ETL was, and still is, the biggest data-integration technology that there is,” agrees Tony Fisher, president and general manager of data quality specialist DataFlux. “But the growth area in data integration is not ETL. It is things like CDI [customer data integration] and PDI [product data integration], where you’re taking information from a lot of sources, creating a reference master, and also creating the linkage mechanisms out to all of the other sources.”
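Fisher's reference-master-plus-linkage pattern can be sketched in a few lines. The source layouts and the match rule (a lowercased e-mail address) below are hypothetical stand-ins, not anything DataFlux prescribes:

```python
# Minimal sketch of customer data integration (CDI): fold records from
# several sources into one reference master, while keeping "linkage
# mechanisms" back to every contributing source record.

def build_master(sources):
    """sources maps a source name to a list of customer records."""
    master = {}
    for source_name, records in sources.items():
        for rec in records:
            key = rec["email"].strip().lower()  # naive match rule
            entry = master.setdefault(key, {"name": rec["name"], "links": []})
            # Linkage: remember where each contributing record lives.
            entry["links"].append((source_name, rec["id"]))
    return master

crm = [{"id": 101, "name": "Ann Lee", "email": "ann@example.com"}]
billing = [{"id": "B-7", "name": "A. Lee", "email": "Ann@Example.com"}]
master = build_master({"crm": crm, "billing": billing})
```

A real CDI hub would use far more sophisticated matching (fuzzy names, addresses, survivorship rules), but the shape—one golden record, many source links—is the same.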

One upshot of this, Fisher says, is a trend toward real-time (or near-real-time) access to data sources. “The real trend is away from batch-oriented data movement toward much more real-time and on-demand, so you can integrate data into your operational systems, not just your analytic systems.”

This is an issue on which ETL vendors, not surprisingly, have long been ahead of the curve. As early as 2001, for example, Informatica Corp. started pushing its flagship ETL offering, PowerCenter, for use in real-time or near-real-time integration scenarios. Ascential Software Corp. followed suit, introducing a suite of real-time integration services for its own DataStage tool. Four years on, it appears as if a market for such solutions has started to materialize.

What kinds of applications are driving the demand for real-time access to operational data? Regulatory compliance, for one. Sections 404 and 409 of the Sarbanes-Oxley Act of 2002 encouraged CEOs and CFOs to demand more timely access to financial data, along with greater visibility into their business processes. More generally, SOX and other compliance measures simply demand more traceability and accountability from IT.

This has ramifications for companies using low-tech approaches to ETL, EAI, EII, and other integration disciplines. While it’s possible to custom-code logging, auditing, and failover capabilities for a script-driven ETL batch process, for example, it’s probably prohibitively expensive to do so. And yet, under SOX, the Health Insurance Portability and Accountability Act (HIPAA), or some other regulatory measure, you have a responsibility to ensure that the batch operation which extracts data from your mainframe system, transforms it (i.e., merges fields and concatenates names), and loads it into your data warehouse executes reliably, doesn’t expose the data in question to unauthorized personnel, protects the integrity of that data, and can be effectively monitored.
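To see why this gets expensive, consider how much scaffolding even a toy batch job needs once logging, failure handling, and integrity checking enter the picture. The record layout and the name-concatenating transform below are hypothetical stand-ins for the mainframe extract described above:

```python
# Sketch of what an ad hoc ETL script must hand-roll to satisfy an
# auditor: a log trail, per-row failure handling, and an integrity
# checksum over what was actually loaded.
import hashlib
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("nightly_etl")

def transform(row):
    # Transform step: concatenate the name fields into one.
    first, last, balance = row
    return {"name": f"{last}, {first}", "balance": balance}

def run_batch(rows):
    loaded, digest = [], hashlib.sha256()
    for row in rows:
        try:
            out = transform(row)
        except Exception:
            log.exception("transform failed for %r", row)  # audit trail
            continue  # a bad row must not silently kill the batch
        digest.update(repr(out).encode())  # integrity check over the load
        loaded.append(out)
    log.info("loaded %d rows, checksum %s", len(loaded), digest.hexdigest())
    return loaded, digest.hexdigest()

extract = [("Ann", "Lee", 120.0), ("Bo", "Ng", 75.5)]
loaded, checksum = run_batch(extract)
```

Enterprise tools bundle exactly this plumbing—plus access control and monitoring hooks the sketch omits—so it doesn't have to be rebuilt for every script.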

Enterprise ETL tools from Informatica, Ascential, SAS Institute Inc., and other vendors promise to handle all of that for you. In many cases, these tools can work with the script resources you already have to add reliability, security, and auditing features. Of course, ETL vendors aren’t the only ones singing this tune.

“By running the file transfer over [WebSphere] MQ, … you can get the advantage of the assured delivery, you can take advantage of the logging, and also you can take advantage of all of the configuration and monitoring tools that you get as standard with MQ,” says Leif Davidsen, a product manager for IBM’s WebSphere MQ EAI solution. Like ad hoc ETL, Davidsen says ad hoc EAI (e.g., script-driven FTP transfer) is a fact of life in many companies. “We believe this is attractive enough [as a selling point], but with so much of the concern over compliance, it is something customers should really respond to.”

Enterprise data integration proponents point to a slew of other drivers, including the inevitability of merger-and-acquisition-type scenarios, RFID, application and platform consolidation efforts, and even outsourcing.

“If you’re trying to consolidate multiple instances of SAP down to one or two, you need to ensure data consistency across those instances, so synchronization of the data and making sure you get the data consistent is also critical,” argues Arvind Parthasarathi, director of horizontal solutions with Informatica.
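The consistency check Parthasarathi describes boils down to reconciling the same business key across instances before consolidating them. A minimal sketch, with a hypothetical material-master layout standing in for real SAP records:

```python
# Sketch of a pre-consolidation consistency check across two application
# instances: find business keys whose attributes disagree and therefore
# must be reconciled before the instances can be merged.

def find_conflicts(instance_a, instance_b):
    """Each instance maps a shared business key to an attribute dict."""
    conflicts = []
    for key in instance_a.keys() & instance_b.keys():  # keys in both
        if instance_a[key] != instance_b[key]:
            conflicts.append(key)
    return sorted(conflicts)

us = {"MAT-1": {"desc": "Bolt M6", "uom": "EA"},
      "MAT-2": {"desc": "Washer", "uom": "EA"}}
eu = {"MAT-1": {"desc": "Bolt M6", "uom": "EA"},
      "MAT-2": {"desc": "Washer", "uom": "BOX"}}
conflicts = find_conflicts(us, eu)  # keys needing reconciliation
```

Here the two instances disagree on MAT-2's unit of measure—precisely the kind of inconsistency that must be resolved before "multiple instances of SAP" can become one.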

On the outsourcing front, Parthasarathi says, customers must frequently transfer large amounts of data to service providers, a practice that requires support for heterogeneous connectivity as well as security (i.e., encryption) over the wire—in other words, not something you’d want to script using FTP. “And in the case of [a merger or acquisition], suppose a customer standardizes on a single platform for everything. They think they’re all set, and then they’re acquired, or they merge with another company, and this other company is standardized on a different [platform]. What are they going to do then?”

And then there’s IBM’s billion-dollar acquisition of Ascential, which competitors such as Informatica spin as a validation of the importance—indeed, of the emerging centrality—of enterprise data integration.

The Commoditization of Data Integration?

ETL stalwarts such as Ascential and Informatica have long talked up the commoditization of ETL, which they say places additional emphasis on what they do best—that is, full-blown data integration between and among application- or database-specific ETL tools, in addition to a host of other data sources. It’s true that ETL has to a large extent become commoditized: Microsoft, for example, first introduced an ETL capability (Data Transformation Services, or DTS) in its SQL Server 7.0 database seven years ago; Oracle followed suit shortly thereafter with its Warehouse Builder product.

On top of this, of course, Business Objects, Cognos, and SAS Institute Inc. all market branded ETL products of their own.

But if the path of the ETL market has trended increasingly toward commoditization, there’s a chance the data integration space is on a similar trajectory. Microsoft, after all, is prepping a significant update to DTS in its forthcoming SQL Server 2005 database. In fact, Microsoft changed the name of SQL Server’s ETL facility to “Integration Services” to reflect that product’s revamped focus. The reason, says Microsoft’s Tom Rizzo, is that the revamped SQL Server data integration facility does a lot more than just ETL.

“We built it from the ground up to compete with the Ascentials and the Informaticas of the world. So I think you’re seeing a lot of consolidations in that space because I think there’s a lot of realization from the other vendors that we’re coming hard and heavy,” he suggests. “We’re definitely leading the forefront of developer productivity. But in the business intelligence space, a lot of the things we’re doing with Reporting Services and Integration Services are industry-leading stuff.”

And then there’s Oracle, which traditionally hasn’t touted its own Warehouse Builder ETL solution as a tool for accessing data in non-Oracle environments. One prominent analyst, who spoke on condition of anonymity, says this may change.

“Oracle has been quiet about it, but in one of the briefings I had, they swore to me, and it seems to be somewhat general knowledge, that they do plan to let Warehouse Builder populate other databases,” he says. “When they came out with the last Warehouse Builder as part of 10g, they never really spelled that out. It’s almost like common knowledge that isn’t public. But supposedly in a forthcoming release, I think they called it Paris, they will support this.”

About the Author


Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at [email protected].
