The Evolution of Data Masking

Data masking just might be the most important technology you've never heard of.

Late last year, Gartner Inc. published its assessment of the leaders, visionaries, contenders, and niche players in the still-gestating data masking technology segment.

It identified a trio of well-known data integration (DI) players -- IBM Corp., Informatica Corp., and Oracle Corp. -- as leaders. This raises a question: What's data masking?

According to Ash Parikh, senior director of emerging technologies with DI powerhouse Informatica, it just might be the most important technology you've never heard of. It all has to do with the increasing velocity of DI.

"The problem [with traditional] business intelligence is the … huge time lag between when the business comes to IT and requests a new report and when that report is actually delivered," Parikh suggests. "What we're focused on [doing] is eliminating this lag: automating as much as possible, exposing connections to the business user so they can do some of that [DI] stuff as self-service. Realistically, you can't do this if you don't have good data masking [technology] in the background."

Data masking works by obfuscating or suppressing sensitive information; as a category, this can include credit card numbers, Social Security numbers, medical information, and intellectual property, among other assets. For a host of ethical, legal, or regulatory reasons, certain people in certain roles shouldn't (or can't) be permitted to see this information.

In certain scenarios -- e.g., a prototype, test, or proof-of-concept (PoC) on a live data set -- the validity or accuracy of this information is superfluous. For a PoC, what matters is that the test data conforms to existing business rules and metadata standards.
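To make "conforms to existing business rules" concrete, here is a minimal sketch in Python. It assumes a hypothetical rule that a payment card number must keep its length and pass the standard Luhn checksum; the function names are illustrative and are not taken from any vendor's product. The masked value carries no real information, yet it would still pass the same validation as the original.

```python
import random


def luhn_check_digit(partial: str) -> str:
    """Return the Luhn check digit for a digit string that lacks one."""
    total = 0
    # Walk right to left; the digit next to the (future) check digit is doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """True if the digit string ends in the correct Luhn check digit."""
    return (
        number.isdigit()
        and len(number) >= 2
        and luhn_check_digit(number[:-1]) == number[-1]
    )


def mask_card_number(real: str, rng: random.Random) -> str:
    """Replace a card number with random digits of the same length
    that still satisfy the Luhn rule (an assumed business rule)."""
    if not real.isdigit() or len(real) < 2:
        raise ValueError("expected a string of at least two digits")
    body = "".join(str(rng.randrange(10)) for _ in range(len(real) - 1))
    return body + luhn_check_digit(body)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    masked = mask_card_number("4111111111111111", rng)
    print(masked, luhn_valid(masked))
```

A real test-data tool would likely preserve more than this (issuer prefixes, referential integrity across tables, and so on); the sketch only shows the idea of keeping the format valid while discarding the content.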

"Data masking aims to prevent the abuse of sensitive data by hiding it from users," write analysts Joseph Feiman and Carsten Casper in Gartner's "Magic Quadrant for Data Masking Technology." "[V]endors offer multiple data masking techniques, such as replacing some fields with similar-looking characters, replacing characters with masking characters ... replacing real last names with fictional last names and reshuffling data in the database columns." Most DI vendors concur with Gartner in dubbing this "data masking"; on the other hand, Feiman and Casper note, it's "also known as data obfuscation, data privacy, data sanitization, data scrambling, data deidentification, data anonymization[,] and data deauthentication."
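Three of the techniques Feiman and Casper list can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's implementation: the function names, the list of fictional surnames, and the salt are all hypothetical.

```python
import hashlib
import random

# Hypothetical pool of replacement surnames.
FICTIONAL_LAST_NAMES = ["Avery", "Blake", "Carver", "Dalton", "Ellis", "Foster"]


def mask_characters(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Replace all but the last `visible` letters/digits with a masking
    character, leaving separators such as dashes in place."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - visible, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def substitute_last_name(real_name: str, salt: str = "demo-salt") -> str:
    """Map a real last name to a fictional one. Hashing makes the mapping
    repeatable, so the same person gets the same fake name every time."""
    digest = hashlib.sha256((salt + real_name.lower()).encode("utf-8")).digest()
    return FICTIONAL_LAST_NAMES[digest[0] % len(FICTIONAL_LAST_NAMES)]


def shuffle_column(rows, column, seed=None):
    """Reshuffle one column's values across rows. The set of values is
    preserved, but each value is detached from its original row."""
    rng = random.Random(seed)
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]


if __name__ == "__main__":
    print(mask_characters("123-45-6789"))  # ***-**-6789
    print(substitute_last_name("Smith"))
    staff = [{"id": 1, "salary": 50}, {"id": 2, "salary": 60}]
    print(shuffle_column(staff, "salary", seed=1))
```

Each function returns new values rather than altering its input, which mirrors the static case: the masked copy goes to the non-production database while the source stays intact.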

Traditionally, "data masking" denoted what Gartner calls "static data masking" (SDM): non-real-time masking on non-production databases.

That's changing, according to Feiman and Casper, who suggest that "the market is heading toward a consolidation of [SDM and dynamic data masking, or DDM] capabilities."

Gartner has IBM and Informatica alone in its "Leaders" quadrant; it has Oracle sitting athwart the "Y" axis of its "Leaders" and "Challengers" quadrants. That's it. It lists nine other players -- none of them prominent names in DI or BI -- in its "Visionaries" and "Niche Players" quadrants. To a degree, this makes sense: Parikh, who's been evangelizing Informatica's data masking products for several years, describes the practice as highly complex -- i.e., the DI equivalent of what in the academic world is called an "interdisciplinary" problem or project.

Static data masking was a comparatively subdued affair, but dynamic data masking ups the ante considerably, argues Parikh: it involves automation and orchestration between and among traditional ETL tools, data federation technologies (chiefly in the form of data virtualization (DV), with its emphasis on data quality and cleansing), standalone data quality tools, and other DI technologies.

"In the data virtualization space, you can apply masking in real time, while that [information] is in-flight," he explains, noting that Informatica Dynamic Data Masking and Informatica Data Quality can perform in-flight cleansing and masking. "If you ask a data quality vendor, 'Do you do data masking?' they'll say, 'Oh yeah, we do.' However, what they're talking about is a very basic kind. Data masking ... is not something that can be done overnight."
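The difference between static and in-flight masking can be illustrated with a short Python sketch. It assumes a hypothetical per-role policy table and is not a description of how Informatica's products work: the stored rows are never changed, and masking is applied to each row as it streams to the requester.

```python
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, str]


def redact(value: str) -> str:
    """Suppress the value entirely."""
    return "REDACTED"


def last_four(value: str) -> str:
    """Show only the last four characters of the value."""
    return "*" * max(len(value) - 4, 0) + value[-4:]


# Hypothetical policy: which columns each role sees masked, and how.
POLICY: Dict[str, Dict[str, Callable[[str], str]]] = {
    "analyst": {"ssn": last_four, "diagnosis": redact},
    "auditor": {},  # sees everything unmasked
}


def masked_stream(rows: Iterable[Row], role: str) -> Iterator[Row]:
    """Yield rows one at a time, masking columns according to the role.
    The source rows are left untouched; unknown roles are refused."""
    rules = POLICY.get(role)
    if rules is None:
        raise PermissionError(f"no masking policy defined for role {role!r}")
    for row in rows:
        yield {
            col: rules[col](val) if col in rules else val
            for col, val in row.items()
        }


if __name__ == "__main__":
    patients = [{"name": "Pat", "ssn": "123-45-6789", "diagnosis": "flu"}]
    for row in masked_stream(patients, "analyst"):
        print(row)
```

Because the policy is evaluated per request, the same production row can look different to an analyst and an auditor, which is the property that separates dynamic masking from a one-time masked copy.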

No Overnight Sensation

Over the last four years, IBM and Informatica have fleshed out their masking portfolios using a combination of in-house development and best-of-breed acquisition.

IBM catapulted to masking market leadership with its 2009 acquisition of the former Princeton Softech. It now markets Princeton Softech's Optim data masking technology as IBM InfoSphere Optim Data Privacy. Gartner likewise points to IBM's acquisition of the former Exeros (also in 2009) as boosting its data masking feature set. (Used in combination with Princeton Softech's Optim data masking tool, Exeros gave IBM a means to effectively identify potentially sensitive information.) Informatica markets two data masking products: Dynamic Data Masking and Persistent Data Masking. Much of its SDM technology is homegrown; two years ago, however, it acquired ActiveBase, a start-up provider of DDM technology.

At this point, DDM is still relatively primitive, according to Gartner: in addition to IBM and Informatica, the only other credible DDM player is MENTISoftware, a New York, NY-based vendor that specializes in "sensitive information management." It markets a product called iMask for DDM and data redaction. For SDM, it markets a product called iScramble.
