RESEARCH & RESOURCES

Data Management Defined

To help you make your way through the many powerful case studies and “lessons from the experts” articles in What Works in Data Management, we have arranged them into specific categories: data integration, data management, data modeling, data quality, and data warehouse architecture. What do these terms mean, and how do they apply to your organization?

Data Integration

Data integration (DI) is a family of techniques and best practices that repurpose data by transforming it as it’s moved. ETL (extract, transform, and load) is the most common form of DI found in data warehousing. There are other techniques, including data federation, database replication, data synchronization, and so on. Solutions based on these techniques may be hand coded, based on a vendor’s tool, or a mix of both. DI breaks into two broad practice areas. Analytic DI supports business intelligence (BI) and data warehousing (DW), and operational DI is applied outside BI/DW to the migration, consolidation, and synchronization of operational databases, as well as in exchanging data in a business-to-business context.

Data Management

Data management (DM) and information management, a synonym, are broad terms that encompass several data-oriented technical disciplines, such as data integration, data quality, master data management, data architecture, database administration, metadata management, and so on. DM may also include practices that rely heavily on DM, such as business intelligence, data warehousing, and data governance. By extension, enterprise data management (EDM) is a high-level practice that seeks to coordinate DM disciplines, align them with business-oriented goals, and give them consistency and quality through shared data standards and policies for data usage. Synonyms for EDM include unified data management (UDM) and enterprise information management (EIM).

Data Modeling

Data modeling is the process of creating a data model for an information system. Professional data modelers work closely with business stakeholders, as well as potential users of the information system. There are three common types of data models produced during a data modeling process. The data requirements are initially recorded as a conceptual data model, which is essentially a set of technology-independent specifications about the data; it’s used to discuss initial requirements with the business stakeholders. The conceptual model is then translated into a logical data model, which documents structures of the data that can be implemented in databases. Implementation of one conceptual data model may require multiple logical data models. The last step transforms the logical data model into a physical data model that organizes the data into tables, plus accounts for access, performance, and storage details. Data modeling defines not just data elements, but their structures and the relationships between them.

Data Quality

Data quality is a complex concept that encompasses many data management techniques and business quality practices, applied repeatedly over time as the state of quality evolves, to achieve levels of quality that vary per data type and seldom aspire to perfection. The most common technique is name-and-address cleansing, whereas the least common is the internationalization of data for quality purposes. Between these two extremes are numerous data quality techniques, including data standardization, verification, profiling, monitoring, matching, merging, householding, geocoding, postal standards, enrichment, and so on.

Data Warehouse Architecture

In general, a data warehouse’s platform expresses architecture. A data warehouse platform’s architecture consists of one or more hardware servers, an operating system, a database management system (DBMS), and data storage. These communicate via a LAN or WAN, although a multi-node data warehouse platform may have its own specialized network. There are many possible combinations of these platform components, hence many possible architectures. Note that a data warehouse platform manages a data warehouse, defined as a collection of metadata, schema, and physical data. Whereas a platform has an architecture, a warehouse’s database has one or more data models.

The architecture of single platforms aside, most DW environments today consist of several separate platforms. The patterns made by the number and combination of warehouse platform types constitute a kind of meta-architecture for multi-system environments. For example, a hub-and-spoke architecture may result from a warehouse at the hub that has interfaces to related data marts, operational data stores, analytic platforms, and data staging areas—which, in turn, may have their own architectures. Finally, a warehouse architecture has subsume computing architectures (MPP, SMP), resource architectures (shared nothing), and storage architectures (row-based or columnar).

TDWI Membership

Get immediate access to training discounts, video library, research, and more.

Find the right level of Membership for you.