LESSON - New Frontiers in Identity Resolution
By Jim Jarvie, Marketing Director, Identity Systems, Inc.
Identity resolution—the process of determining which data representations refer to the same entity—traditionally focused on matching name and address records from company systems. But today’s businesses face a new set of challenges:
- Matching against external lists. These may be watch lists from government authorities, fraud lists from industry consortia, shared customer lists from business partners, or data acquired through mergers and acquisitions. All are produced by organizations whose formats, standards, and processes are beyond your control, so the traditional strategy of improving data quality at the source is not an option.
- Unstructured sources. Information may be gathered from Internet search strings, field reports, telephone transcripts, e-mail and text messages, Web pages, and other unstructured formats. Parsing these to extract useful information and dealing effectively with the inevitable ambiguities is a major new challenge.
- Multiple geographies. Companies must increasingly integrate data from different countries, languages, and even character sets. Each region has its own formats, rules, and local information. Data from different regions is often mingled within a single input file, forcing the identity resolution system to determine on a record-by-record basis which set of rules should apply.
- More types of information. Individuals may be matched using an e-mail address, Internet cookies, Web sites, device IDs, product serial numbers, GPS coordinates, and other identifiers beyond the traditional name and postal address. And it’s not just individuals: companies increasingly use identity resolution to track products, materials, vehicles, equipment, and even legal documents.
- More applications. Identity resolution results now feed applications ranging from compliance to marketing to customer service. Each application has its own ideal balance between cost, speed, and accuracy. Privacy and regulatory rules often mean that the data available for different purposes will vary as well.
- Need for quick response. Today’s systems increasingly must react—in real time—to input from a telephone agent, Web site, kiosk, or retail associate. Adding to the challenge, the data from these sources is often user-entered, meaning it is less accurate and consistent than entries from trained company employees.
No one technique can meet all of these new requirements. Some approaches that help include:
- Using multiple keys. In a perfect world, a single match key could combine different data elements in a fixed sequence to associate related records. But missing data, ambiguous meanings, alternative identifiers, and inconsistent formats make this impossible. Multiple keys make it far more likely that related information is spotted even when it sits in different locations on the input records.
- Selectable search levels. Because different applications have different requirements for accuracy, cost, and response time, the identity resolution system must make it easy to change the balance among these on the fly.
- Extensive local reference sets. Sophisticated string matching by itself can never substitute for standardization based on local rules and reference data. This becomes increasingly critical as each system handles a broader range of geographies and data types.
- Efficient adjustment to new needs. New data types, formats, geographies, and applications are added at an ever-increasing rate. The identity resolution system must accommodate these quickly and effectively. This implies functions for importing and evaluating the new data, training the system to use it, testing the results for accuracy and confirming that they don’t cause problems elsewhere, and providing simple mechanisms for making the results accessible.
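The multiple-key, search-level, and reference-data ideas above can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not Identity Systems' method: the record fields (`name`, `postal`, `email`, `phone`), the tiny `NICKNAMES` reference table, the three key types, and the `SEARCH_LEVELS` mapping are all hypothetical. Each record yields several independent keys, any shared key makes two records a candidate pair for closer comparison, and the chosen search level decides which key types are built at all.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical local reference set: nickname -> standard given name.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}

# Hypothetical search levels: which key types each level builds.
SEARCH_LEVELS = {
    "quick": {"email", "phone"},
    "thorough": {"email", "phone", "name_postal"},
}


def standardize_name(name):
    """Lowercase tokens, drop punctuation, and map a nickname given name."""
    parts = [p for p in (t.strip(".,").lower() for t in name.split()) if p]
    if parts:
        parts[0] = NICKNAMES.get(parts[0], parts[0])
    return parts


def match_keys(record, level="thorough"):
    """Build several independent keys; missing fields simply yield no key."""
    wanted = SEARCH_LEVELS[level]
    keys = []
    name = standardize_name(record.get("name", ""))
    postal = record.get("postal", "").replace(" ", "").upper()
    if "name_postal" in wanted and len(name) >= 2 and postal:
        # Given-name initial + surname (assumes "Given Surname" order) + postal code.
        keys.append(("name_postal", f"{name[0][0]}|{name[-1]}|{postal}"))
    email = record.get("email", "").strip().lower()
    if "email" in wanted and email:
        keys.append(("email", email))
    phone = "".join(ch for ch in record.get("phone", "") if ch.isdigit())
    if "phone" in wanted and len(phone) >= 7:
        keys.append(("phone", phone[-10:]))
    return keys


def candidate_pairs(records, level="thorough"):
    """Return sorted index pairs of records that share at least one key."""
    index = defaultdict(list)
    for i, rec in enumerate(records):
        for key in match_keys(rec, level):
            index[key].append(i)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(set(ids)), 2))
    return sorted(pairs)


records = [
    {"name": "Bob Smith", "postal": "02139"},
    {"name": "Robert Smith", "postal": "02139"},
    {"name": "R. Smith", "email": "rsmith@example.com", "phone": "(617) 555-0100"},
    {"name": "Rob Smith", "postal": "02139", "phone": "617-555-0100"},
    {"name": "Jane Doe", "postal": "10001"},
]

print(candidate_pairs(records))           # [(0, 1), (0, 3), (1, 3), (2, 3)]
print(candidate_pairs(records, "quick"))  # [(2, 3)]
```

Record 2 has no postal code, so the name key cannot reach it, but its phone number links it to record 3; no single key would have connected all four Smith records. Record 0 joins the group only because the reference table standardizes "Bob" to "Robert". The "quick" level builds only the exact email and phone keys, trading some recall for fewer comparisons. A real system would go on to score each candidate pair rather than treat sharing a key as a match.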
Not every system can meet these new requirements. But the cost of acquiring suitable technology is the price of entry to a world where identity resolution adds new value to enterprise systems, providing benefits that vastly exceed the cost of the ticket itself.