Securing Big Data (Part 1 of 2)
Big data presents several data- and technology-based challenges to data security. In the first part of a two-part discussion, Raghuveeran Sowmyanarayanan at Accenture discusses the problems raised by data volumes, variety, and velocity.
By Raghuveeran Sowmyanarayanan, Vice President at Accenture
Introduction
Business insights from big data analytics promise major benefits to organizations, but they can also be risky for an enterprise. Managing massive amounts of data increases the risk and magnitude of a potential data breach. Sensitive personal and confidential data can be exposed, violating compliance and data-security regulations, and aggregating data across borders can break data-residency laws.
Security and privacy challenges are magnified by the velocity, volume, and variety of big data, yet organizations are looking to apply big data capabilities to sensitive data. Although existing and emerging security capabilities apply to big data, many organizations lack the tools to protect, monitor, and manage access to data processed at high rates and in a variety of formats. Organizations need to enable secure access to data for analytics in order to extract maximum value from information. Finding a solution that secures sensitive data while still enabling analytics for meaningful insights is necessary for any big data initiative.
Data Challenges
Existing security approaches are still valid, but the tools are just catching up with the volume, variety, and velocity of big data.
In terms of data volume, consider data sensitivity. Some datasets should not be commingled, and some sensitive data elements should be masked or encrypted before being processed, because certain data combinations are governed by compliance regulations, legal agreements, and an organization's own policies. Accomplishing this at high volumes is challenging. Enterprises must also think about the economics of large data volumes, because many existing enterprise security products are licensed per user or per CPU. That model makes sense in traditional systems but is problematic in distributed big data systems, which span many CPUs and serve a much larger user base.
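As a minimal sketch of masking sensitive elements before records are commingled (the field names and the salted-hash policy here are hypothetical illustrations, not a specific product's approach):

```python
import hashlib

# Hypothetical policy: fields that must be masked before datasets are combined.
SENSITIVE_FIELDS = {"ssn", "email"}

def mask_record(record: dict, salt: str = "per-dataset-salt") -> dict:
    """Replace sensitive field values with a truncated, salted SHA-256 digest."""
    masked = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS and value is not None:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            masked[field] = digest[:12]  # irreversible token; still joinable per dataset
        else:
            masked[field] = value
    return masked

record = {"name": "Jane Doe", "ssn": "123-45-6789", "email": "jane@example.com"}
print(mask_record(record))
```

Hashing (rather than reversible encryption) is shown because it lets analysts join and count on the masked field without ever recovering the original value; where the raw value must later be restored, format-preserving encryption or tokenization would be used instead.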
When considering data variety, it's important to classify your data. Understanding what data you have (or don't have) in any dataset is made more difficult when you must manage and enforce access to, and usage of, unstructured data from a variety of sources. Authorization may need to be enforced at the field level, but many data sources do not capture data with this precision, or the relevant fields are optional.
Data velocity raises issues of access and performance. To reliably handle fast-moving data at scale, a system of dozens to hundreds of nodes is typical. With no fixed path through the system, controlling and monitoring access is very challenging in a big data environment. With hundreds of nodes serving different uses and purposes, how do you know who has access to what data, and whether that access is appropriate? Your enterprise must be able to quickly and accurately grant and revoke access to the data, and it must be able to recover data after a disaster; the more data there is and the faster it arrives, the harder it is to back up and restore. Throughput is also challenging, because traditional security tools are designed and optimized for traditional system architectures and may not be equipped for the throughput required to keep up with big data.
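The grant/revoke/review requirement above can be sketched as a central access registry (a toy model with hypothetical dataset and principal names, not a specific product): one authoritative record of who can read which dataset makes revocation immediate and periodic access reviews answerable.

```python
from collections import defaultdict

class AccessRegistry:
    """Central, auditable record of which principals may read which datasets."""

    def __init__(self):
        self._grants = defaultdict(set)  # dataset name -> set of principals

    def grant(self, principal: str, dataset: str) -> None:
        self._grants[dataset].add(principal)

    def revoke(self, principal: str, dataset: str) -> None:
        # discard() is a no-op if the grant never existed
        self._grants[dataset].discard(principal)

    def can_access(self, principal: str, dataset: str) -> bool:
        return principal in self._grants[dataset]

    def review(self, dataset: str) -> set:
        """Answer 'who has access to what' for a periodic access review."""
        return set(self._grants[dataset])

registry = AccessRegistry()
registry.grant("analyst1", "claims_2024")
registry.revoke("analyst1", "claims_2024")
print(registry.can_access("analyst1", "claims_2024"))  # False after revocation
```

In practice each node would consult (or cache) this registry rather than keeping its own copy, so a single revocation takes effect cluster-wide.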
Technology-related Security Challenges
Security drives business value by enabling safe handling of regulated data, but doing so for big data requires new technologies; tools built for traditional system architectures often cannot keep pace with big data's scale and speed.
For big data, security issues must be addressed on both process and technical levels. This is where enterprise infrastructure and big data systems come together for monitoring, assessment, and access control.
In addition, big data solutions have broad attack surfaces and multiple custom interfaces. An in-depth knowledge of the vulnerabilities present in these systems is needed, and the risk of their exploitation must be analyzed. Big data systems now offer basic authentication capabilities, but by integrating them with existing enterprise identity and access management (I&AM) systems, organizations can perform regular access reviews, reduce help desk and administrative costs, quickly provision and de-provision access, align big data permissions with organizational roles, and add capabilities such as strong authentication.
The approach to securing big data varies by platform: Hadoop distributed file systems, NoSQL databases, and cloud-based services.
- Hadoop is a software framework for processing computationally intensive workloads -- often batch processes -- against large amounts of unstructured data, built on a large, multi-node distributed file system for parallel computing. Its distributed nature creates a broad attack surface and underscores the need for automated, validated, and consistent application of security controls.
- NoSQL databases such as Cassandra are distributed columnar data platforms designed for scalability and performance, often for online transaction processing. Cassandra nodes do not have distinct roles, which requires a well-planned, layered approach to evaluating and applying security controls.
- Cloud-based services, whether operating as high-level analytics services or foundational platform services, address some security capabilities while introducing new challenges. The service provider may address platform and network security to a high degree of assurance yet offer little visibility into who has accessed what data.
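As one concrete illustration for the Hadoop case above, strong authentication is typically enabled cluster-wide through Kerberos settings in core-site.xml. This is a sketch using the standard Apache Hadoop property names; the exact steps (keytab distribution, principal mapping) vary by distribution and version:

```xml
<!-- core-site.xml: switch from the default "simple" (trusted) mode to Kerberos -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<!-- Also enable service-level authorization checks -->
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
```

Because every node reads this configuration, it is one example of the automated, consistent application of security controls that a distributed platform demands.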
In the next part of this discussion, we will look at enterprise approaches to securing big data.
Raghuveeran Sowmyanarayanan is a vice president at Accenture and is responsible for designing solution architecture for RFPs/opportunities. You can reach him at [email protected].