Data Management Defined
To help you make your way
through the many powerful case
studies and “lessons from the
experts” articles in What Works
in Data Management, we have
arranged them into specific categories:
data integration, data
management, data modeling,
data quality, and data warehouse
architecture. What do these terms
mean, and how do they apply to
your organization?
Data Integration
Data integration (DI) is a family of techniques
and best practices that repurpose data by
transforming it as it’s moved. ETL (extract,
transform, and load) is the most common
form of DI found in data warehousing.
There are other techniques, including
data federation, database replication, data
synchronization, and so on. Solutions based
on these techniques may be hand coded,
based on a vendor’s tool, or a mix of both.
DI breaks into two broad practice areas.
Analytic DI supports business intelligence
(BI) and data warehousing (DW), and
operational DI is applied outside
BI/DW to the migration, consolidation, and
synchronization of operational databases, as
well as in exchanging data in a business-to-business
context.
Data Management
Data management (DM) and information
management, a synonym, are broad terms
that encompass several data-oriented
technical disciplines, such as data integration,
data quality, master data management,
data architecture, database administration,
metadata management, and so on. DM
may also include practices that rely heavily
on DM, such as business intelligence, data
warehousing, and data governance. By
extension, enterprise data management
(EDM) is a high-level practice that seeks
to coordinate DM disciplines, align them
with business-oriented goals, and give
them consistency and quality through
shared data standards and policies for data
usage. Synonyms for EDM include unified
data management (UDM) and enterprise
information management (EIM).
Data Modeling
Data modeling is the process of creating
a data model for an information system.
Professional data modelers work closely with
business stakeholders, as well as potential
users of the information system. There
are three common types of data models
produced during a data modeling process.
The data requirements are initially recorded
as a conceptual data model, which is
essentially a set of technology-independent
specifications about the data; it’s used to
discuss initial requirements with the business
stakeholders. The conceptual model is then
translated into a logical data model, which
documents structures of the data that can be
implemented in databases. Implementation
of one conceptual data model may require
multiple logical data models. The last step
transforms the logical data model into a
physical data model that organizes the
data into tables, plus accounts for access,
performance, and storage details. Data
modeling defines not just data elements,
but their structures and the relationships
between them.
Data Quality
Data quality is a complex concept that
encompasses many data management
techniques and business quality practices,
applied repeatedly over time as the state of
quality evolves, to achieve levels of quality
that vary per data type and seldom aspire
to perfection. The most common technique
is name-and-address cleansing, whereas
the least common is the internationalization
of data for quality purposes. Between these
two extremes are numerous data quality
techniques, including data standardization,
verification, profiling, monitoring, matching,
merging, householding, geocoding, postal
standards, enrichment, and so on.
Data Warehouse Architecture
In general, a data warehouse’s platform
expresses architecture. A data warehouse
platform’s architecture consists of one or
more hardware servers, an operating system,
a database management system (DBMS),
and data storage. These communicate
via a LAN or WAN, although a multi-node
data warehouse platform may have its
own specialized network. There are many
possible combinations of these platform
components, hence many possible
architectures. Note that a data warehouse
platform manages a data warehouse,
defined as a collection of metadata, schema,
and physical data. Whereas a platform has
an architecture, a warehouse’s database has
one or more data models.
The architecture of single platforms aside,
most DW environments today consist of
several separate platforms. The patterns
made by the number and combination
of warehouse platform types constitute a
kind of meta-architecture for multi-system
environments. For example, a hub-and-spoke
architecture may result from a warehouse
at the hub that has interfaces to related
data marts, operational data stores, analytic
platforms, and data staging areas—which, in
turn, may have their own architectures. Finally,
a warehouse architecture has subsume
computing architectures (MPP, SMP),
resource architectures (shared nothing), and
storage architectures (row-based or columnar).