TDWI Articles

How to Let Your Data Lake Flow Without Fear in a World of Privacy Regulations

Data lakes pose compliance challenges. Here's how to overcome them.

Today's biggest companies are facing a deluge of data breaches, from social media giants to credit card companies and healthcare organizations. In fact, the first six months of 2019 saw more than 3,800 publicly disclosed breaches and 4.1 billion compromised personal records. These breaches, along with the misuse and abuse of private information, continue to erode consumer trust. In response, companies are implementing privacy and security controls that track, block, and restrict access to personal data.


As the public becomes increasingly aware of data breaches and how personal information is being stolen, organizations and their customers are asking how and why personal data is being used. These inquiries arrive in the form of data subject requests (DSRs). Data might be king, but privacy compliance now rules the kingdom. It has become more important than ever to be able to answer these questions and handle the ever-increasing volume of DSRs.

The Surge of Data Privacy Concerns

Regulations such as the General Data Protection Regulation (GDPR) and the upcoming California Consumer Privacy Act (CCPA) are forcing companies to respond to DSRs and answer consumer concerns about privacy (and rightfully so). However, achieving compliance with these regulations requires that companies understand what personal information they have across every ecosystem, where it's located, and how it's being used.

Data lakes are useful repositories for gathering massive amounts of data in its original format, with the idea that the data will eventually be subject to analysis, but privacy risks lurk within these systems. These huge storage repositories can pose serious problems when a customer submits a DSR. Data lakes are continuously ingesting disparate pieces of customer data from a variety of sources, so organizations often have no clue what sensitive information they hold or how it is being combined.

Individual pieces of data can be safe on their own, but when combined they can increase compliance risk. For example, gender, ZIP code, and date of birth are individually benign fields, but in combination they can uniquely identify 87 percent of the United States population.
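To make the combination risk concrete, here is a minimal sketch of a quasi-identifier check (the field names and records are hypothetical). It flags any record whose combination of gender, ZIP code, and date of birth appears fewer than k times in the dataset -- the k-anonymity idea in miniature.

```python
from collections import Counter

def k_anonymity_violations(records, quasi_identifiers, k=2):
    """Flag records whose quasi-identifier combination appears fewer than
    k times. Such records are re-identifiable even though each field
    alone looks benign."""
    combos = Counter(
        tuple(r[f] for f in quasi_identifiers) for r in records
    )
    return [
        r for r in records
        if combos[tuple(r[f] for f in quasi_identifiers)] < k
    ]

people = [
    {"name": "A", "gender": "F", "zip": "98101", "dob": "1980-01-01"},
    {"name": "B", "gender": "F", "zip": "98101", "dob": "1980-01-01"},
    {"name": "C", "gender": "M", "zip": "98102", "dob": "1975-06-15"},
]
# Only "C" has a unique (gender, zip, dob) combination
risky = k_anonymity_violations(people, ["gender", "zip", "dob"], k=2)
```

A real privacy platform would run this kind of check across joins in the lake, not a single table, but the principle is the same: risk lives in combinations, not individual columns.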

Using Automation to Monitor Data Lakes

To know and understand exactly what information is in their data lakes, enterprises need to inspect their data down to the data-element level and not rely on what's implied by their metadata. When operating at that level, enterprises can also identify highly sensitive combinations of data across their ecosystem to protect against security risks and remain in compliance.

To protect themselves from data lake compliance issues, organizations should implement automated data privacy management solutions to quickly identify where personal information is located across their systems. If organizations continue to use outdated manual processes, they risk human error caused by the constant stream of data being poured in and privacy teams working long hours to manually organize each piece of information.
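As an illustration of element-level inspection, the sketch below scans actual field values rather than trusting column names or metadata. The regex patterns are deliberately simplistic stand-ins for a real detection engine.

```python
import re

# Illustrative detectors only -- a production scanner would use far more
# robust patterns plus validation (checksums, context, ML classifiers).
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_record(record: dict) -> dict:
    """Return {field: [detected PII types]} for one ingested record,
    based on the values themselves rather than the field names."""
    hits = {}
    for field, value in record.items():
        found = [name for name, rx in PATTERNS.items() if rx.search(str(value))]
        if found:
            hits[field] = found
    return hits

row = {"notes": "contact jane@example.com", "id_number": "123-45-6789"}
```

Note that the scanner catches the email address hiding in a free-text "notes" field -- exactly the kind of element metadata alone would miss.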

Enterprises also need to monitor all data that enters and exits their systems -- continuously checking, scanning, and classifying data in motion. An automated data inventory and privacy solution can help in this effort and use de-identification or anonymization to prevent data analysts from connecting individuals to their personal information. In this way, data can still be used to drive business innovation without compromising privacy.
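One common de-identification approach is keyed pseudonymization: replace a direct identifier with a stable keyed hash so analysts can still join and count records without seeing the raw value. A minimal sketch follows; the key handling is hypothetical, and a real deployment would pull the key from a secrets manager and support rotation.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key"  # hypothetical; store and rotate via a secrets manager

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a stable keyed hash. The same
    input always yields the same token, so joins still work, but the
    original value cannot be read back without the key."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

record = {"email": "jane@example.com", "purchase_total": 42.50}
safe = {**record, "email": pseudonymize(record["email"])}
```

Under regulations such as the GDPR, pseudonymized data may still count as personal data; full anonymization requires stronger guarantees, so treat this as one layer of protection rather than a compliance endpoint.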

Protecting Data Use to Remain in Compliance

Regulations such as the GDPR and CCPA also require that data be encrypted to preserve the confidentiality and integrity of sensitive information. With the massive volume of data in a company's data lake and even more live data continuously streaming in, most traditional encryption validation tools quickly become obsolete.

Manual tools can't track data in motion: as soon as you finish tracking one piece of data, new data has already entered the system. This approach creates only a snapshot of a single moment in time and fails to account for new information moving through the system. To remain in compliance, data needs to be classified, labeled, and mapped back to the encryption requirements dictated by both these regulations and the organization's internal use policies.
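The classify-label-map step can be as simple as a policy table tying each classification label to required encryption controls. The labels and controls below are illustrative, not drawn from the text of any regulation:

```python
# Hypothetical mapping of classification labels to handling requirements,
# loosely modeled on GDPR/CCPA obligations plus internal use policy.
ENCRYPTION_POLICY = {
    "public":    {"at_rest": False, "in_transit": False},
    "internal":  {"at_rest": False, "in_transit": True},
    "personal":  {"at_rest": True,  "in_transit": True},
    "sensitive": {"at_rest": True,  "in_transit": True},
}

def required_controls(label: str) -> dict:
    """Look up the encryption controls a classified field must satisfy.
    Unknown labels fail closed to the strictest treatment."""
    return ENCRYPTION_POLICY.get(label, {"at_rest": True, "in_transit": True})
```

Failing closed on unknown labels matters in a data lake: newly ingested fields the classifier hasn't seen yet get the strongest protection by default rather than slipping through unencrypted.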

A Final Word

In an era of increasing data privacy regulations, it's more important than ever for organizations to know what sensitive data is held within their data lakes and repositories as well as what's traveling throughout their systems. Consumers now demand transparency, accuracy, and expediency when asking companies what data they have and are collecting. It's imperative for companies to have the proper tools in place to accurately and responsibly handle data while responding to DSRs in a timely and efficient manner.

About the Author

Drew Schuil is the VP of global BD & EMEA operations for Integris Software. For the last 19 years, he has held key leadership positions with enterprise software and cybersecurity companies. Prior to Integris Software, he was VP of global product strategy at Imperva, a data-centric audit and protection software firm, meeting with companies and speaking at industry events in 43 countries. That exposure to global privacy sentiment and the GDPR led him to join data privacy innovator Integris Software just as regulations such as the California Consumer Privacy Act (CCPA) began driving heightened privacy awareness in the U.S.

