How Collaborators Can Safely Share Sensitive Data
Billions of dollars’ worth of sensitive data is sitting unused in companies around the world for lack of a way to process it securely. What if there were a solution? These best practices can help you tackle a seemingly untouchable challenge.
- By Raluca Ada Popa
- November 3, 2022
Machine learning (ML) has become a key technique for improving work efficiency, achieving better technical outcomes, and advancing academic research. Sophisticated ML models can streamline analysis of large data pools, letting teams focus on business value rather than setup. Despite this, more than $300 billion worth of the world’s most valuable data remains untapped because enterprises lack a secure environment for processing sensitive, encrypted data.
The inability to securely share this data, or to analyze confidential data owned by multiple parties, has caused organizations to restrict data access, eliminate data sets or mask specific data fields, and outright prevent any level of data sharing. These restrictions keep organizations from pursuing significant use cases that involve sensitive data, despite an urgent need for data-driven results. Examples include collaborating to identify and prevent money laundering in financial services, confidentially sharing patient information for clinical trials, and sharing sensor data and manufacturing information to perform preventive maintenance.
Confidential computing is an emerging approach to this problem because it protects data while it’s in use and being analyzed, not just at rest or in transit as standard encryption does. However, significant challenges remain that still hold back multi-party collaboration on sensitive data.
Through my extensive research at UC Berkeley’s RISELab, I’ve discovered how organizations can enable secure multi-party collaboration at scale.
Combine Software and Hardware Solutions
Confidential computing technologies include hardware solutions, such as secure enclaves, and software solutions, such as cryptography. On its own, each has limitations when it comes to multi-party analytics.
Secure enclaves provide an isolated environment that is inaccessible to other applications, users, or processes co-located on the same system. But hardware alone is very difficult to scale and is susceptible to side-channel attacks, in which an attacker steals data through an indirect channel rather than by reading it directly.
Cryptography encodes information and communications so that only their intended recipients can read them. Encrypting data at rest and in transit is effective for security, but computing directly on encrypted data in the cloud is slow, expensive for complex computations, and inefficient for modern production workloads.
Combining secure enclaves with cryptography lets systems take advantage of cloud scale and the efficiency of hardware while adding the protection of cryptography. Here are two key strategies for mitigating the drawbacks of each and getting the most out of confidential computing technologies.
Strategy #1: Supply each organization with a cryptographic key
Each organization has its own cryptographic key, which it uses to encrypt its confidential data so that the data remains private from the other collaborators. Every participating group should hold a secret key for its own encrypted data set; that key is shared only with the secure enclave responsible for running the training algorithm on the pooled data. Keeping these keys secret preserves the security of the encryption during multi-party collaboration and protects the privacy of each organization’s users and customers.
To train a model on the encrypted data, the secure enclave needs access to the keys. With them, it can decrypt the data only inside its protected memory, perform ML on the pooled data, and then encrypt the resulting model with each organization’s key before releasing it.
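To make this flow concrete, here is a minimal Python sketch of Strategy #1, assuming two hypothetical organizations (`bank_a`, `bank_b`) and a toy one-parameter linear model. The function `enclave_train` is a hypothetical stand-in for code running inside a secure enclave: a plain Python function provides none of an enclave’s hardware isolation, and the sketch omits remote attestation, which a real deployment would use to verify the enclave before releasing keys to it. The `encrypt`/`decrypt` helpers are an illustration-only construction built from the standard library’s HMAC-SHA256; in practice you would use a vetted authenticated cipher such as AES-GCM from an audited library.

```python
import hashlib
import hmac
import json
import secrets

# ILLUSTRATION ONLY: dependency-free stand-in for authenticated encryption
# (HMAC-SHA256 keystream in counter mode, then encrypt-then-MAC).
# A real system would use a vetted AEAD such as AES-GCM.
NONCE_LEN, TAG_LEN = 16, 32


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate encryption and MAC keys from one organization key.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ct = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered ciphertext")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ct))
    return bytes(c ^ s for c, s in zip(ct, stream))


def enclave_train(uploads: dict, keys: dict) -> dict:
    """Hypothetical enclave entry point (illustrative names, not a real API).
    Plaintext exists only inside this function, standing in for enclave memory."""
    pooled = []
    for org, blob in uploads.items():
        pooled += json.loads(decrypt(keys[org], blob))
    # Stand-in "training": least-squares fit of y = w * x over the pooled rows.
    w = sum(x * y for x, y in pooled) / sum(x * x for x, _ in pooled)
    model = json.dumps({"w": w}).encode()
    # Package the model separately under each organization's key.
    return {org: encrypt(key, model) for org, key in keys.items()}


# Each organization generates its own secret key and encrypts its own rows.
keys = {"bank_a": secrets.token_bytes(32), "bank_b": secrets.token_bytes(32)}
rows = {"bank_a": [[1, 2], [2, 4]], "bank_b": [[3, 6]]}
uploads = {org: encrypt(keys[org], json.dumps(r).encode()) for org, r in rows.items()}

results = enclave_train(uploads, keys)
print(json.loads(decrypt(keys["bank_a"], results["bank_a"])))  # {'w': 2.0}
```

Each organization decrypts the shared model with its own key alone; in this sketch no party ever receives another party’s key or plaintext rows, and a ciphertext presented with the wrong key is rejected rather than silently mis-decrypted.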
Strategy #2: Take extra preventive measures against side-channel attacks
Side-channel cyberattacks don’t need unencrypted data to uncover sensitive information. Attackers can use indirect methods to extract information, such as the pattern of page accesses in memory, the pattern of processor cache accesses, or the memory addresses sent over the memory bus. The system leaks this information unintentionally, and it provides clues to the encrypted data. With the right tools and experience, an attacker can pick out subtle differences in the leaked information and use them to decipher the encrypted data.
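As a rough illustration of such a leak, the sketch below models memory as a Python object that records every address it reads. The class `TracedMemory` and the function `lookup_rate` are hypothetical, and the recorded trace is a simplified stand-in for what an attacker might infer from page faults, cache timing, or bus traffic. The lookup never exposes the value it returns, yet the trace alone reveals the secret index.

```python
class TracedMemory:
    """Toy memory that logs each address read, standing in for a side channel."""

    def __init__(self, values):
        self._values = list(values)
        self.trace = []

    def read(self, addr):
        self.trace.append(addr)
        return self._values[addr]


def lookup_rate(memory, customer_index):
    # Data-dependent access: the address read is the secret input itself.
    return memory.read(customer_index)


rates = TracedMemory([0.10, 0.25, 0.40, 0.55])
secret_customer = 2
lookup_rate(rates, secret_customer)
print(rates.trace)  # [2]: the access pattern alone reveals the secret index
```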
Employing cryptographic techniques, such as oblivious algorithms, ensures that the memory patterns mentioned above are devoid of sensitive information. Oblivious algorithms make the memory access patterns leaked to an attacker either fixed in advance (independent of the data content) or random-looking.
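One classic oblivious building block is a sorting network. The Python sketch below implements bitonic sort, a standard sorting network chosen here purely for illustration; the article does not say which oblivious algorithms a given system uses. The sequence of position pairs it reads and rewrites depends only on the input length, never on the values, so the recorded trace is identical for any two inputs of the same size. Python itself offers no constant-time guarantees, so this demonstrates the access-pattern idea rather than a hardened implementation. The input length must be a power of two; implementations typically pad with dummy values to meet that requirement.

```python
def bitonic_sort(values):
    """Obliviously sort a list whose length is a power of two.

    Returns (sorted_values, trace), where trace records every (i, j) pair
    of positions that was read and rewritten, in order.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two (pad with dummies)")
    trace = []
    k = 2
    while k <= n:
        j = k // 2
        while j > 0:
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    trace.append((i, partner))
                    # Always read and rewrite both slots, whether or not they swap.
                    lo, hi = min(a[i], a[partner]), max(a[i], a[partner])
                    if (i & k) == 0:  # direction depends on positions only, not data
                        a[i], a[partner] = lo, hi
                    else:
                        a[i], a[partner] = hi, lo
            j //= 2
        k *= 2
    return a, trace


sorted_a, trace_a = bitonic_sort([5, 1, 4, 2, 8, 7, 3, 6])
sorted_b, trace_b = bitonic_sort([1, 2, 3, 4, 5, 6, 7, 8])
print(sorted_a)            # [1, 2, 3, 4, 5, 6, 7, 8]
print(trace_a == trace_b)  # True: same access pattern for different data
```

The price of obliviousness is extra work: bitonic sort performs O(n log² n) comparisons, compared with O(n log n) for a typical data-dependent sort such as merge sort.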
The Future of Confidential Computing
As more businesses move to the cloud, secure collaboration using encrypted data becomes increasingly important, especially in industries that collect highly sensitive information. Gartner predicts that by 2025, over 50 percent of organizations will adopt privacy-enhancing computation to process sensitive data and conduct multi-party analytics. The sooner businesses get started on implementation, the better prepared they’ll be for a cloud-based future riddled with cyber threats.
Raluca Ada Popa is the president and co-founder of Opaque Systems and an associate professor of computer science at UC Berkeley. Popa recently won the ACM Grace Murray Hopper Award for her work building secure systems focused on protecting the confidentiality of data. Prior to founding Opaque, she co-created MC2, a popular open source platform for secure analytics and machine learning, which later became the basis for Opaque Systems. You can reach the author via email or on Twitter and LinkedIn.