Question and Answer: Identity Resolution Technology Corrects Customer Data

How identity resolution technology can help enterprises correct the increasing incidence of errors in customer data

One strong indication of the growing importance of a technology called identity resolution is Informatica Corp.'s acquisition in May of Identity Systems, a leading company offering the technology. Identity resolution goes beyond standard data quality processes to help resolve customer data accurately.

Resolving data errors is taking on growing importance as companies step up their efforts to address risk and compliance issues, weed out fraud, and maintain more accurate records in the face of globalization, near-real-time data demands, and mushrooming data volumes. The Identity Systems solution works across various languages, structures, and formats to resolve problems such as duplications, omissions, and errors.

In this interview, Informatica's marketing director for identity resolution, James Jarvie, describes how identity resolution technology can help enterprises correct the increasing incidence of errors in customer data. Identity resolution, Jarvie says, works like an expert user would to recognize and rank records, helping an organization clean up its data and discover valuable hidden connections.

BI This Week: What does the term "identity resolution" mean?

Jim Jarvie: Let's first define what's meant by "identity data." It's information that can uniquely identify a client, prospect, customer, supplier, taxpayer, voter, patient, or product. Unfortunately, this type of information suffers from more error and variation than just about any other class of data. Data quality processes can correct a fair amount of the error, but much of it is unavoidable, unpredictable, and difficult to fix. In addition, some of the error is deliberate, as in the case of fraudulent or criminal activity.

Identity resolution technology addresses these challenges by emulating an expert user's ability to recognize, rate, and rank matching records. This can help an organization discover the connections between people, accounts, and products -- connections that might otherwise remain hidden in the data.
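To make "rate and rank" concrete, here is a minimal sketch in Python. It is an illustration only, not the method used by Identity Systems or Informatica: it assumes a generic string-similarity score from the standard library's difflib module, and the function names, sample records, and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(query, candidate):
    """Rate a candidate against the query: 0.0 (no overlap) to 1.0 (identical)."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def rank_matches(query, records, threshold=0.6):
    """Return (score, record) pairs at or above the threshold, best first.

    The threshold is an arbitrary illustrative value, not a recommended setting.
    """
    scored = [(similarity(query, record), record) for record in records]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    records = ["Jon Smith", "Maria Lopez", "Joan Smythe", "John Smith"]
    for score, record in rank_matches("John Smith", records):
        print(f"{score:.2f}  {record}")
```

A plain character-similarity score like this one knows nothing about nicknames, word order, or non-Latin scripts, which is part of why the interview distinguishes identity resolution from simpler matching.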

How is that different from "identity management"?

Typically, identity management refers to access control, the means by which users are granted access to a computer system's resources. For example, logging in to your iTunes account is an identity management issue. A typo in your iTunes billing address is an identity resolution issue.

Why is working with identity-related data such a challenge for companies?

One simple example is customer names. Accurate, high-performance identity searching must work on uncommon as well as common names. This is an increasingly difficult challenge, since a database of a million names may contain 100,000 John Smiths, Juan Rodriguezes, or Main Streets. In addition, there are different types of errors and variations that occur "naturally" in name-related data: word order, misspellings, typos, phonetic errors, synonyms, nicknames, initials, truncation, abbreviations, prefix and suffix variations, concatenation, and splitting. Add in the growing need to deal with international names and addresses, and you begin to see the scope of the problem.
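A few of the variations listed above (word order, punctuation, nicknames, and initials) can be sketched in a short Python example. This is a simplified illustration under stated assumptions, not a production approach: the nickname table is a tiny hypothetical sample, the helper names are invented for this example, and misspellings, phonetic errors, concatenation, and international names are not handled at all.

```python
import re

# Hypothetical, deliberately tiny nickname table; a real one would be far larger.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}


def name_key(raw):
    """Lowercase, drop punctuation, expand nicknames, and ignore word order."""
    tokens = re.findall(r"[a-z]+", raw.lower())
    return tuple(sorted(NICKNAMES.get(token, token) for token in tokens))


def tokens_match(a, b):
    """Two tokens match if equal, or if one is a one-letter initial of the other."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def names_compatible(raw_a, raw_b):
    """Heuristic: pair every token in one name with a distinct token in the other."""
    a, remaining = name_key(raw_a), list(name_key(raw_b))
    if len(a) != len(remaining):
        return False
    # Place full words before initials so an initial cannot claim a word
    # that an exact match needs.
    for token in sorted(a, key=len, reverse=True):
        exact = [other for other in remaining if other == token]
        loose = [other for other in remaining if tokens_match(token, other)]
        candidates = exact or loose
        if not candidates:
            return False
        remaining.remove(candidates[0])
    return True


if __name__ == "__main__":
    print(names_compatible("Smith, Bob", "Robert Smith"))
    print(names_compatible("J. Smith", "John Smith"))
    print(names_compatible("John Smith", "Juan Rodriguez"))
```

Each rule added to cover one kind of variation tends to admit new false matches (for example, "J. Smith" is compatible with every Smith whose first name starts with J), which illustrates the trade-off between reliability and recall discussed in this interview.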

How have companies solved these issues in the past?

Matching records based on identity data, such as first and last name, is deceptively difficult. It's a constant challenge to balance the performance of a search or matching process against its reliability. Some companies have tried to tackle the problem using the search-and-match function built into a database, or by writing their own code using exact name searches, wild-card searches, match codes, or open-source algorithms such as Soundex. Many matching solutions use match codes built from data that has been through a cleaning process.
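For readers unfamiliar with Soundex, the following Python sketch implements the standard American Soundex coding and shows how it behaves as a match code. The function name and the sample names are chosen for illustration; this is a home-grown example of the kind of approach described above, not code from any vendor's product.

```python
# Standard American Soundex consonant groups.
_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
           ("L", "4"), ("MN", "5"), ("R", "6"))
CODES = {letter: digit for letters, digit in _GROUPS for letter in letters}


def soundex(name):
    """Return the four-character Soundex code for a name ('' if no A-Z letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    previous = CODES.get(letters[0], "")
    for letter in letters[1:]:
        if letter in "HW":
            continue                    # H and W do not separate repeated codes
        code = CODES.get(letter, "")    # vowels (and Y) have no code
        if code and code != previous:
            result += code
        previous = code                 # a vowel resets the repeat check
    return (result + "000")[:4]         # pad or truncate to four characters


if __name__ == "__main__":
    for name in ("Smith", "Smyth", "Robert", "Rupert", "Catherine", "Katherine"):
        print(name, soundex(name))
```

With this coding, "Smith" and "Smyth" share a code, which is the intended behavior. "Robert" and "Rupert" also share a code (a false positive), while "Catherine" and "Katherine" do not, because the first letter is always kept (a missed match). Soundex was also designed around English pronunciation, so it carries over poorly to other languages.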

Most of these methods, however, tend to return too many false positives, miss too many real matches, perform slowly, or fail to handle multiple languages without a significant dollar investment. Organizations with critical search and matching needs quickly realize the need for a specialized set of tools to achieve reliable and scalable results.

Are there specific events that are making organizations look at identity resolution now?

I see four key industry drivers:

  • The exponential growth rate of data, either organically or via mergers and acquisitions
  • The increase in indirect channels of data capture that are not under the organization's data quality controls, such as Internet self-service and outsourced sales forces
  • Increasing regulatory mandates that require trusted data matching to reduce business risks and improve compliance
  • Globalization, in which more international names and addresses enter the data stream

What are some strategic areas within an enterprise that might benefit from better identity data?

Customer relationship management (CRM) and master data management (MDM) are examples of systems that require a single view of the customer, along with real-time search and match functionality to support the core processes of customer management. In addition, a range of fraud control, law enforcement, and security applications require high-performance, real-time searching, matching, and screening built into their core applications.

What should organizations look for in selecting identity resolution technology?

There are a number of key factors to consider. They include:

  • Will the matching algorithms produce results that accurately emulate a human expert's ability to determine a match based on a number of attributes?
  • Are online search results ranked intuitively?
  • Is the solution flexible and configurable enough to cater to different business areas with varying risk and performance requirements?
  • Can the solution deliver speed and scale in order to search large data volumes in real time and perform high-volume matching quickly against large record sets?
  • Can the solution perform accurate searching and matching on non-Latin script and multi-language data to accommodate the trends in a globalized market?
