LESSON - The Four Cornerstones of Enterprise Data Integration Performance
By James Markarian, Senior Vice President and Chief Technology Officer, Informatica Corporation
The four basic cornerstones of enterprise data integration performance are throughput, the rate at which rows or bytes of data can be processed; scalability, the ability to handle increasingly large and complex data integration scenarios; availability, the resilience of the environment; and performance manageability, the ability to manage and maintain performance levels, which is the most important of the four.
Throughput Delivers Fresher Information, Faster
The first cornerstone is throughput, which determines how much computing power is needed to process a given workload. High throughput matters because it results in fresher data for the business, faster response to customers, and increased operational efficiency. But hardware is only part of the throughput equation. Multi-threading and data partitioning let you break processing tasks into pieces and spread them across hardware resources. Another option is to capture only changed data and process it in real time.
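To make partitioning concrete, here is a minimal sketch, assuming rows arrive as Python dictionaries and that a hypothetical `to_cents` function stands in for a real transformation. Rows are hash-partitioned on a key, and each partition is handed to its own worker thread; throughput is then measured as rows per second, matching the definition above. In CPython, threads mainly help when the work waits on I/O (database reads, network calls); for CPU-bound transforms a `ProcessPoolExecutor` would be the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List
import time

Row = Dict[str, object]


def partition_rows(rows: List[Row], key: str, n_partitions: int) -> List[List[Row]]:
    """Hash-partition rows on `key` so each partition can be processed independently.

    Python's str hash is salted per process, so string-key assignment is stable
    within one run only; integer keys always map the same way.
    """
    partitions: List[List[Row]] = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def to_cents(row: Row) -> Row:
    """Hypothetical per-row transformation: decimal amount string to integer cents."""
    return {**row, "amount_cents": int(round(float(row["amount"]) * 100))}


def process_partitioned(rows: List[Row], key: str, n_partitions: int,
                        transform: Callable[[Row], Row] = to_cents) -> List[Row]:
    """Run `transform` over each partition on its own worker thread."""
    partitions = partition_rows(rows, key, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        results = pool.map(lambda part: [transform(r) for r in part], partitions)
        # Flatten; output order follows the partitions, not the original input.
        return [row for part in results for row in part]


if __name__ == "__main__":
    rows = [{"order_id": i, "amount": "19.99"} for i in range(100_000)]
    start = time.perf_counter()
    out = process_partitioned(rows, "order_id", n_partitions=4)
    elapsed = time.perf_counter() - start
    print(f"{len(out)} rows in {elapsed:.3f}s ({len(out) / elapsed:,.0f} rows/s)")
```

Partitioning on a key keeps all rows for the same key together, which matters when a downstream step aggregates or deduplicates per key.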
Scalability Ensures Predictable Growth
Scalability, the second cornerstone, enables you to accurately predict how much computing power will be required as data volumes grow. Software with good scalability allows precise estimation of project windows, flexible configuration, and optimal resource utilization. Scalability prepares you to deal effectively with growth and is a foundation for long-term performance.
Depending on growth expectations, you might want to consider adding capabilities that enhance scalability, such as server grid and massively parallel processing (MPP) architectures and 64-bit support. Both support growth without requiring extensive hardware purchases.
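The kind of estimate that good scalability makes possible can be sketched as simple arithmetic. This is an illustration under stated assumptions, not a sizing method from the article: it assumes a measured per-node throughput, a fixed batch window, and a hypothetical `efficiency` factor, where 1.0 means perfectly linear scaling across grid or MPP nodes. All volumes and rates below are made up.

```python
import math


def project_rows(current_rows: int, growth_rate: float) -> int:
    """Projected volume after one growth period, e.g. growth_rate=0.25 for +25 percent."""
    return math.ceil(current_rows * (1 + growth_rate))


def nodes_required(rows: int, rows_per_second_per_node: float, window_hours: float,
                   efficiency: float = 1.0) -> int:
    """Nodes needed to process `rows` inside a fixed batch window.

    `efficiency` (an assumption) scales every node's throughput: 1.0 assumes
    perfectly linear scaling; lower values model coordination overhead.
    """
    capacity_per_node = rows_per_second_per_node * efficiency * window_hours * 3600
    return math.ceil(rows / capacity_per_node)


if __name__ == "__main__":
    today = 500_000_000
    later = project_rows(today, 0.25)
    for label, rows in (("today", today), ("after 25% growth", later)):
        print(label, rows, nodes_required(rows, 20_000, window_hours=4))
```

With these hypothetical numbers, a 4-hour window needs 2 nodes today and 3 after 25 percent growth. If measured efficiency falls below 1.0, the extra node is needed sooner, which is why predictable scaling behavior matters for planning.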
Availability Reflects the Organization’s Priorities
The third cornerstone is the heartbeat of a system: availability. If a resource becomes unavailable after a component failure, it delivers no throughput, scalability, or manageability.
To assess your organization’s need for availability, quantify the costs associated with downtime for each system or application. Sometimes an outage means users are unable to do their jobs, reducing productivity. It can cost revenue if customers are forced to take their business elsewhere. Compare these costs to the cost of providing increased availability.
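The cost comparison described above can be written out as a small calculation. This is a simplified sketch, assuming that downtime cost is linear in outage hours and consists of idle staff time plus lost revenue; the function names and all dollar figures are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def downtime_cost_per_hour(idle_users: int, loaded_hourly_wage: float,
                           lost_revenue_per_hour: float) -> float:
    """Lost productivity plus revenue that goes elsewhere while the system is down."""
    return idle_users * loaded_hourly_wage + lost_revenue_per_hour


def annual_downtime_hours(availability: float) -> float:
    """Expected outage hours per year for an availability fraction such as 0.999."""
    return (1.0 - availability) * HOURS_PER_YEAR


def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    return annual_downtime_hours(availability) * cost_per_hour


def upgrade_pays_off(current: float, target: float, cost_per_hour: float,
                     annual_upgrade_cost: float) -> bool:
    """True if the downtime cost avoided per year exceeds the yearly cost of the upgrade."""
    avoided = (annual_downtime_cost(current, cost_per_hour)
               - annual_downtime_cost(target, cost_per_hour))
    return avoided > annual_upgrade_cost


if __name__ == "__main__":
    per_hour = downtime_cost_per_hour(idle_users=200, loaded_hourly_wage=50,
                                      lost_revenue_per_hour=5_000)
    for a in (0.99, 0.999):
        print(a, round(annual_downtime_hours(a), 2), round(annual_downtime_cost(a, per_hour)))
    print(upgrade_pays_off(0.99, 0.999, per_hour, annual_upgrade_cost=500_000))
```

Here, moving from 99 to 99.9 percent availability cuts expected outage time from about 87.6 to about 8.8 hours a year. Whether that justifies the spend depends entirely on the per-hour cost your organization actually measures.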
Of the four cornerstones, manageability deserves the most attention. You can’t consistently achieve good performance if you have little ability to manage or monitor across the environment.
It pays to invest in scheduling tools, which help ensure that your warehouses and operational data stores remain up to date and accurate. Monitoring capabilities are also essential—you want to know about any potential bottlenecks or outages. Developers require tools to work effectively across different environments. Template-driven design, object reuse, and inheritance enable developers to make changes more quickly and easily in a multi-developer environment.
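As one illustration of what bottleneck monitoring computes, the sketch below derives per-stage throughput from row counts and busy time, then flags the slowest stage and any stage below a threshold. The stage names, counts, and threshold are hypothetical; in practice these figures would come from whatever session logs or metrics your integration tool exposes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StageStats:
    name: str
    rows: int
    busy_seconds: float  # time the stage spent working, excluding idle waits

    @property
    def rows_per_second(self) -> float:
        return self.rows / self.busy_seconds if self.busy_seconds > 0 else float("inf")


def find_bottleneck(stages: List[StageStats]) -> StageStats:
    """The stage with the lowest throughput limits the whole pipeline."""
    return min(stages, key=lambda s: s.rows_per_second)


def slow_stages(stages: List[StageStats], min_rows_per_second: float) -> List[str]:
    """Names of stages running below an alerting threshold."""
    return [s.name for s in stages if s.rows_per_second < min_rows_per_second]


if __name__ == "__main__":
    run = [
        StageStats("read", 1_000_000, 50),
        StageStats("transform", 1_000_000, 80),
        StageStats("write", 1_000_000, 200),
    ]
    print("bottleneck:", find_bottleneck(run).name)
    print("below 10,000 rows/s:", slow_stages(run, 10_000))
```

With these sample figures the write stage is the bottleneck. Adding threads to the transform stage would not shorten the run; tuning the target load would.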
In a recent survey at TDWI’s Winter 2005 World Conference, 55 percent of those surveyed said they expected their company’s data volumes to rise more than 25 percent in the next 18 months.
Also, separating transformation logic from physical execution means the logic does not have to be modified when the execution environment changes. This is useful for migrating and deploying between environments, and for providing flexibility in heterogeneous environments so that processing can be done wherever hardware resources are available.
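The separation of logic from execution can be sketched as two layers, assuming a pipeline is an ordered list of row-level functions. The step functions, the environment names `dev` and `prod`, and the executor mapping are all hypothetical. The point is that switching environments changes only which executor runs, never the steps themselves.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[Row], Row]


# Logical layer: what happens to each row, with no reference to where it runs.
def trim_name(row: Row) -> Row:
    return {**row, "name": str(row["name"]).strip()}


def upper_country(row: Row) -> Row:
    return {**row, "country": str(row["country"]).upper()}


PIPELINE: List[Step] = [trim_name, upper_country]


def apply_steps(row: Row, steps: List[Step]) -> Row:
    for step in steps:
        row = step(row)
    return row


# Physical layer: interchangeable executors chosen by configuration.
def run_serial(rows: List[Row], steps: List[Step]) -> List[Row]:
    return [apply_steps(r, steps) for r in rows]


def run_threaded(rows: List[Row], steps: List[Step], workers: int = 4) -> List[Row]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results match run_serial.
        return list(pool.map(lambda r: apply_steps(r, steps), rows))


EXECUTORS = {"dev": run_serial, "prod": run_threaded}


def run(rows: List[Row], environment: str) -> List[Row]:
    return EXECUTORS[environment](rows, PIPELINE)


if __name__ == "__main__":
    data = [{"name": "  Ada ", "country": "uk"}, {"name": "Lin", "country": "cn"}]
    print(run(data, "dev"))
    print(run(data, "prod"))
```

Because `PIPELINE` never mentions threads or hosts, the same definition can be promoted from a development setup to production, or pointed at different hardware, without edits.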
The biggest payoff of performance manageability becomes apparent at runtime. Manageability enables you to process data in batch, on-demand, and real-time modes. You can fine-tune source and target system interaction and handle process requirements such as profiling and cleansing.
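Profiling and cleansing, the process requirements mentioned above, can be illustrated on a single column. This minimal sketch assumes rows are dictionaries and uses a hypothetical alias table to standardize country codes; real profiling covers many more statistics, such as patterns, ranges, and referential checks.

```python
from collections import Counter
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def profile_column(rows: List[Row], column: str) -> Dict[str, object]:
    """Basic profile: row count, null/empty count, distinct values, most common value."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_value": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }


def cleanse_country(row: Row, aliases: Dict[str, str]) -> Row:
    """Trim, upper-case, and map known aliases; empty values become None."""
    raw = (row.get("country") or "").strip().upper()
    return {**row, "country": aliases.get(raw, raw) or None}


if __name__ == "__main__":
    aliases = {"USA": "US", "U.S.": "US"}  # hypothetical standardization rules
    raw_rows = [{"country": "US"}, {"country": "usa "}, {"country": None},
                {"country": "U.S."}, {"country": "CA"}]
    print("before:", profile_column(raw_rows, "country"))
    cleaned = [cleanse_country(r, aliases) for r in raw_rows]
    print("after: ", profile_column(cleaned, "country"))
```

Profiling first shows four spellings where there should be two values; re-profiling after cleansing confirms the rule worked.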
Managing Performance for Business Advantage
Hardware and network advances have created the potential for far greater performance than has been possible before. But to get from “potential” to “actual” requires greater throughput, scalability, availability, and manageability than most businesses have built into their environments. These four cornerstones of enterprise data integration performance need to be considered at the outset and reconsidered frequently during the operating life of a system.
This article originally appeared in the issue of .