Q&A: Big Data Uses and Issues

How big data is being used and why big data security issues must be addressed.

Big data is a big deal in analytics, but what is it, exactly, and what is the biggest issue we should be aware of? To learn more, we turned to Todd Thiemann, senior director for product marketing at Vormetric, a security specialist.

BI This Week: There seems to be a brewing religious war over the definition of "big data." You have the NoSQL/Hadoop crowd saying one thing and the traditional database vendors saying another. Can you explain how and why these groups are so far apart, and what is your definition of big data?

Todd Thiemann: Big data is the sexy IT topic du jour and everyone is jumping on the bandwagon. The executive suite is buzzing with big data talk and budgets are going toward big data projects, so IT vendors want to tout their goods as big data enablers. This is resulting in some different definitions for the term. Big data is similar to where the term "cloud" was a few years ago.

In terms of definitions, you could look at things in terms of scale or in terms of technology plus scale. If you take the scale approach, any large structured data repository is a big data environment. This would be the definition typically favored by the SQL database players. From a technology-plus-scale perspective, the NoSQL players such as Hadoop, MongoDB, CouchDB, and Cassandra would say that big data equals a NoSQL environment.

It will probably take a few years for a more precise "big data" definition to shake out. I am partial to the definition that I have seen from some analysts that lists the characteristics of volume, velocity, and variety of data. From the perspective of Vormetric (the company I work for), we are Switzerland in the "what is big data" argument as we protect information in all big data environments, regardless of whether they are SQL or NoSQL environments.

What are the biggest misconceptions enterprises have about big data?

In the enthusiasm to leverage big data, security is largely an afterthought. Although SQL databases are surrounded by a strong security ecosystem and IT security policies, NoSQL databases often have negligible security and NoSQL projects are frequently deployed unbeknownst to the security and risk management teams. Unfortunately, it will probably take a data breach or two to bring awareness to the security issues around big data.

What sort of big data use cases do you see cropping up? How do you think such usage will change over the coming year or so?

There is no big data cookie cutter -- the use cases are all over the map. Customers are using large aggregated data stores to drive faster, better business decisions. An October 2012 article in the Harvard Business Review ("Big Data: The Management Revolution") advocated using big data for making better decisions, so I suspect that it will be coming to a server near you. Our customers run the gamut from one using MongoDB to analyze multi-terabyte repositories of healthcare information to government agencies looking at huge volumes of data.

New technologies often introduce new security issues. For example, cloud storage and cloud applications have raised concerns and, for many enterprises, the concern is so high they are putting cloud initiatives on hold for the time being. Are there new security challenges with "big data"?

There are security issues around any environment that touches sensitive data, but the net new security challenges are coming from big data NoSQL environments. For example, Hadoop has negligible security compared to the protection ecosystem that exists around SQL databases.

It is important to note that most of the big data deployments that we see run in private clouds rather than public clouds. This may be because we deal in protecting sensitive data and our customers are somewhat leery about putting this type of information in the public cloud. Furthermore, big data NoSQL environments involve large volumes of data that can be costly to move into the public cloud.

What security best practices can enterprises follow to avoid unauthorized data disclosure, theft, and fraud specifically related to big data?

The first order of business is getting a handle on where and what sensitive/regulated data is stored in big data environments. Although the security team may know this for existing environments, some big data projects can be skunkworks initiatives where security is an afterthought. In SQL environments, some best practices include managing privileged users through a combination of encryption of data at rest and database activity monitoring (DAM). For NoSQL environments, some of the steps to consider include adding Kerberos for authentication, file-level encryption to protect data at rest, and key management to separate keys from the data.
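
As a rough illustration of that first step (finding where sensitive data lives), the sketch below scans a few records and reports which fields contain values that look like regulated data. Everything in it is an assumption made for illustration: the two patterns, the `find_sensitive_fields` helper, and the sample records are hypothetical and are not drawn from Vormetric or any product mentioned in this interview. Real discovery rules would be far broader.

```python
import re

# Hypothetical patterns for illustration only; real regulated-data
# rules cover many more formats than these two.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_sensitive_fields(records):
    """Return {field_name: sorted pattern labels seen in that field}."""
    hits = {}
    for record in records:
        for field, value in record.items():
            # Only string values are scanned in this sketch.
            if not isinstance(value, str):
                continue
            for label, pattern in PATTERNS.items():
                if pattern.search(value):
                    hits.setdefault(field, set()).add(label)
    return {field: sorted(labels) for field, labels in hits.items()}


# Made-up sample documents, shaped like records in a document store.
sample = [
    {"patient": "A. Example", "note": "SSN 123-45-6789 on file", "visits": 3},
    {"patient": "B. Example", "note": "contact: b@example.org", "visits": 1},
]

report = find_sensitive_fields(sample)
print(report)
```

A report like this would only be a starting inventory; the fields it flags are candidates for the encryption and key management steps described above.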

What products or services does Vormetric offer to secure big data?

Vormetric provides enterprise data protection. Specifically, Vormetric secures and controls access to big data in SQL and NoSQL data repositories across the enterprise and reports on that access. Vormetric Encryption supports all major SQL and NoSQL platforms on Linux, Unix, and Windows in physical, virtual, and cloud environments. Customers like the extensibility of Vormetric Data Security -- they can avoid the cost and headaches of encryption silos by starting with one use case and expanding to other use cases as their business requires. Vormetric provides a single data protection solution that delivers compliance, enables a consistent data security posture, and minimizes administrative costs.
