LESSON - The Importance of Metadata for Extract Transform and Load
By Yves de Montcheuil, Director of Product Marketing, Sunopsis
According to Wikipedia.org, metadata (Greek meta + Latin data), is literally data about data: “information that describes another set of data.” Metadata is arguably one of the cornerstones of data warehousing and all its associated processes, especially the ETL processes that are used to load it.
Picking Your Metadata
Despite its widespread use, the word metadata is not precisely defined. There is no industry standard, and the scope of what is needed is sometimes perceived differently by various people, depending on their requirements. However, there is some common ground. Industry analysts and IT professionals agree that metadata should provide at least some information about the structure of the databases, element names, and the relationship between the different components, including cross-references and impact analysis.
Who Benefits from Metadata?
The benefits of metadata are huge for all those involved in the data warehouse project. For business users, absent or poor-quality metadata means that they have to figure out the data warehouse’s structure and the rules used to load it. Without metadata, this learning curve is steep, and the usage of the data warehouse is restricted. Metadata assists users in understanding the origin of the data: which source systems it came from and which transformations were applied before it was made available in the data warehouse. In this sense, metadata is an essential element of the entire flow, and constitutes an interface between IT developers and business users. Business users are able to drill through metadata, find which rules were developed against production data, and decide to reuse them as templates, instead of trying to create new ones.
The benefits ofmetadata arehuge for all thoseinvolved in the datawarehouse project.
For IT professionals, metadata is the basis for efficient change management. The number one benefit is reaped during the maintenance of the data warehouse. Metadata helps to evaluate the impact of changes to the data structures and the loading processes. Indeed, cross-references help the IT team find out which components are accessing a set of data and then analyze the impact of any structural change to existing data transformation processes. Notifying all users of a change that can impact their processes—and which part of these processes is affected— is also fundamental to avoiding execution problems. Metadata dramatically reduces the maintenance workload of the IT team and assists with a quick and smooth evolution of the data warehouse.
And for all parties involved, proper metadata management is the key to compliance audits and makes it possible to understand data lineage, data flows, how fields are calculated, and to track the proper execution of ETL processes.
Finding the Right Solution
Finding the right solution to implement metadata may be a challenge. Most ETL tools on the market include some level of metadata management. The key issue here is for this metadata management to be at the core of the ETL design and execution—as opposed to being a way to consolidate metadata afterwards. The ETL tool should most notably be able to:
- Examine the structures and access methods of databases and applications of the information system, and store all the descriptions of these structures inside the metadata repository.
- Allow the definition of data mappings and transformations at an abstract level—using business rules, for example—so that these definitions are de-correlated from the physical implementation of data.
- Provide visualization of all metadata through a user-friendly interface.
- Provide easy navigation among the metadata: dependencies, lineage, data flow maps, and cross-references.
With its business-rules-driven approach to ETL and integration design, along with its centralized metadata repository accessible through a Web-based interface, Sunopsis Data Conductor leverages your metadata and greatly improves the productivity and quality of data warehousing.
This article originally appeared in the issue of .