TDWI Articles

Q&A: Look to Data-Centric Security to Truly Protect Data

A security expert from Teradata explains how and why to focus security specifically on your data.

In the second installment of a two-part interview, we talk further with security evangelist Jay Irwin about data-centric security -- what it is and how to implement various layers of security that work together. Irwin, who directs the Teradata InfoSec Center of Expertise, speaks and writes often on cyber security, security architecture, international privacy, and information assurance. "Security is top-of-mind everywhere right now," Irwin says. "It's the hottest thing I've seen in my entire career, yet the problems are so pervasive. ... The percentage of companies that have done what they need to is fairly small. There's a lot of work left to do." [Editor's note: Read part 1 here.]

Upside: You've been working in this area for a while. As you talk to both business and technical managers, do they realize the importance of data-centric security? Are most companies working hard on this, or is there a lot of education still to be done?

Jay Irwin: Security is top of mind everywhere right now. It's the hottest thing I've seen in my entire career. Yet the problems are so pervasive. So many companies are so far behind the curve that they would not be in a good position if they were breached. The workload in security is huge, and the percentage of companies that have done substantially what they need to do is fairly small. There's a lot of work left to do, but I do commend companies for diving in and making security top of mind right now. There's just a lot that needs to be done, and unfortunately budgets are not huge.

Let's drill down on some of the layers of security protection that you brought up in the first part of the interview. You mentioned data classification -- how should companies implement that?

When companies are classifying data, the first things they need to do are inventory the data and assign it an owner. If you have 10 lines of business -- HR, finance, sales, service, and so forth -- you should have 10 data owners. Some executive needs to own each of those areas of data. That owner needs to know what the data is and how much of it there is, and needs a current inventory of it that they can readily access.

Beyond that, those owners need to have the people who work for them administer their policies regarding that data on a day-to-day basis, including controlling access rights. On the IT side, they need a gatekeeper who puts the owners' access-control decisions into effect.
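To make the ownership model concrete, here is a minimal Python sketch of a data inventory in which every data set has one accountable owner, and the IT gatekeeper applies an access grant only when that owner approved it. The class names, fields, and user IDs are hypothetical illustrations, not part of any Teradata product.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    name: str
    business_area: str  # e.g. "HR" or "Finance"
    owner: str          # the accountable executive for that area
    row_count: int      # "how much there is"


@dataclass
class Inventory:
    datasets: dict = field(default_factory=dict)  # dataset name -> DataSet
    grants: set = field(default_factory=set)      # (user, dataset name) pairs

    def register(self, ds: DataSet) -> None:
        self.datasets[ds.name] = ds

    def owners(self) -> dict:
        """Map each business area to its data owner (assumes one owner per area)."""
        return {ds.business_area: ds.owner for ds in self.datasets.values()}

    def grant(self, approver: str, user: str, dataset: str) -> bool:
        """Gatekeeper step: apply a grant only if the dataset's owner approved it."""
        ds = self.datasets.get(dataset)
        if ds is None or approver != ds.owner:
            return False
        self.grants.add((user, dataset))
        return True

    def can_read(self, user: str, dataset: str) -> bool:
        return (user, dataset) in self.grants


inv = Inventory()
inv.register(DataSet("payroll", "HR", "vp_hr", 12_000))
inv.register(DataSet("general_ledger", "Finance", "cfo", 500_000))
inv.grant("cfo", "analyst_1", "general_ledger")        # approved by the owner
inv.grant("analyst_2", "analyst_2", "general_ledger")  # self-approval is refused
print(inv.owners())
print(inv.can_read("analyst_1", "general_ledger"), inv.can_read("analyst_2", "general_ledger"))
```

Keeping the approval check inside the gatekeeper step means IT never decides on its own who sees the data; it only executes what the owner approved.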

Here's one place where classification often goes wrong: business owners fail to classify the data according to its sensitivity. They need to use a couple of different metrics to measure that sensitivity -- one is value and the other is the risk of exploitation. The more valuable the data, the more likely it is that someone will come after it, and the higher the probability that certain groups will use various attack vectors against it.

Given that, you have to know the sensitivity of the data in terms of what you would lose if it got away. How much would you be penalized by regulators? What would your customers think of you? What legal recourse could they pursue?

Perhaps the most important items of all to protect -- and ones that many companies miss -- are trade secrets. Consider that the entire foundation of your business is built on the fact that you do something better than, or differently from, anyone else. You have business processes, you have trade secrets, you have formulas, you have something unique about your business that you would not want your competitors to have. For each piece of data, you must ask yourself, "Is this something I would want my competition to have?" If the answer is no, it's intellectual property. How highly do you value it? Do you want a competitor to know your buying prices for certain classes of goods, for example? If not, then that's valuable, sensitive data.
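One way to turn the two sensitivity metrics and the competitor question into a repeatable classification is sketched below. The 1-to-5 scales, the product score, the tier names, and the cutoffs are all assumptions chosen for illustration; each data owner would set their own.

```python
from dataclasses import dataclass

TIERS = ["public", "internal", "confidential", "restricted"]  # lowest to highest


@dataclass
class Assessment:
    name: str
    value: int                # 1-5: what you would lose if the data got away
    exploitation_risk: int    # 1-5: how likely someone is to come after it
    ok_for_competitors: bool  # answer to "Would I want my competition to have this?"


def classify(a: Assessment) -> str:
    for score in (a.value, a.exploitation_risk):
        if not 1 <= score <= 5:
            raise ValueError(f"{a.name}: scores must be between 1 and 5")
    score = a.value * a.exploitation_risk  # 1..25; hypothetical scoring rule
    if score >= 15:
        tier = "restricted"
    elif score >= 8:
        tier = "confidential"
    elif score >= 3:
        tier = "internal"
    else:
        tier = "public"
    # Data you would not want a competitor to have is treated as intellectual
    # property, so it is never classified below "confidential".
    if not a.ok_for_competitors and TIERS.index(tier) < TIERS.index("confidential"):
        tier = "confidential"
    return tier


for item in [
    Assessment("marketing_brochures", 1, 1, True),
    Assessment("internal_memos", 2, 2, True),
    Assessment("supplier_buying_prices", 2, 2, False),
    Assessment("customer_ssn", 5, 5, False),
]:
    print(item.name, classify(item))
```

Note how the buying-price data scores low on the raw numbers but is lifted to "confidential" by the competitor question, which is exactly the trade-secret case that is easy to miss.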

Classification breaks down when companies don't identify all of their trade secrets. A company might say, oh, our memos are for internal use; our HR data is already confidential; our passwords and network topologies are hidden; our marketing brochures are already public; and so forth. They leave it at that. There's no organized process and no valuation of the property, nor do they come up with a process to keep that data protected with the right controls in their data warehouse.

All of the data is assigned the same level of protection, correct?

That question takes us back to data-centric security. You put data in your warehouse that has various levels of value to the company, as we've discussed here. You shouldn't give all of it the same level of protection. You put the most critical -- and expensive -- protection on the highest value data and you put fewer protections on the data that you don't really care about. Those protections are wide and varied.
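The tiered idea can be sketched as a lookup in which each tier inherits the protections of the tiers below it, so the most (and most expensive) controls accumulate on the highest-value data. The four tier names and the control names here are hypothetical examples, not a prescribed set.

```python
TIER_ORDER = ["public", "internal", "confidential", "restricted"]

# Controls introduced at each tier (hypothetical choices). A tier also
# inherits every control from the tiers below it.
ADDED_AT = {
    "public": ["access_control"],
    "internal": ["audit_logging"],
    "confidential": ["tokenize_or_encrypt"],
    "restricted": ["real_time_alerting", "quarterly_access_review"],
}


def controls_for(tier: str) -> list:
    """Return the cumulative list of controls for a tier (ValueError if unknown)."""
    idx = TIER_ORDER.index(tier)
    controls = []
    for t in TIER_ORDER[: idx + 1]:
        controls.extend(ADDED_AT[t])
    return controls


for tier in TIER_ORDER:
    print(f"{tier:>12}: {', '.join(controls_for(tier))}")
```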

Who should make the decisions about what data to protect at what level? Is that made on the business side by the data owners?

Yes, and the trouble is that too often they don't spend enough time on the exercise I just described. Instead, they leave it to the company's security experts to come up with policy. That's not enough. The business people have to be the ones who place a value on the data, and the only way they will take that job seriously is if someone at the board level says to them, "You're accountable for the finance data. If finance data gets compromised, guess what? Get your resume cleaned up."

That's really the only way. You have to get buy-in, whether it's forced or voluntary. Of course, it's best if it's voluntary. Ideally, the organization has business owners who are proactive about security, who believe in the company's business, who understand the need to protect its assets, and who are stakeholders who want to share in the company's profits.

Is there tension between the IT side and the business side when it comes to securing data? How do you deal with that?

IT frequently views business as disruptive, because business has a tendency to want access to everything. It's an offshoot of the personalities you see in sales -- no barriers. That frustrates IT because IT needs to be able to communicate back to the business, "Here's why it's important that we protect your stuff."

That message needs to come through roles other than just the technicians. IT management needs to say, "Here's what makes our company valuable." There has to be that layer of people at the CISO [Chief Information Security Officer] level to explain policies. They need to act as a liaison between IT and business, to say, "We're doing this for our customers and we're doing it for you, so work with us, business side."

IT frustrates the business side sometimes because it takes time to get a security control in place. Business has to understand and appreciate that there's a process to be followed when you're doing a security project and that putting a control in place in the right way involves selecting it, acquiring it, designing it, putting it in place, testing it and testing it some more, and trying to break it repeatedly. The business probably needs to learn some patience.

Another security layer you discussed in the webinar is encryption and tokenization. What do those terms mean, and what sorts of data should they be used with?

Encryption is a mathematical method, or set of methods, for scrambling data. You take a name such as "Michael," apply a mathematical algorithm to it, and it's scrambled into an unrecognizable form. Encrypted values usually take more room than the originals, so sometimes you have to widen the columns in your database a bit. You can also end up with unwieldy-looking combinations of characters, but encryption is very strong, very fast, and very hard to break. There are also good standards for it.
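The "Michael" example can be sketched in Python. Because Python's standard library has no AES, this sketch swaps in a stdlib-only construction (an HMAC-SHA256 keystream in counter mode, with an encrypt-then-MAC integrity tag); treat it as a teaching stand-in, and use a vetted standard cipher such as AES-GCM from a maintained library in practice. It also shows one reason columns widen: the stored value carries a random nonce and an integrity tag alongside the scrambled bytes, and is then text-encoded.

```python
import base64
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Pseudorandom bytes from HMAC-SHA256 over nonce || counter (a PRF in counter mode)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(enc_key: bytes, mac_key: bytes, plaintext: str) -> str:
    data = plaintext.encode("utf-8")
    nonce = os.urandom(16)  # fresh per value, so equal inputs give different ciphertexts
    ct = bytes(a ^ b for a, b in zip(data, _keystream(enc_key, nonce, len(data))))
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()  # encrypt-then-MAC
    return base64.b64encode(nonce + ct + tag).decode("ascii")


def decrypt(enc_key: bytes, mac_key: bytes, stored: str) -> str:
    raw = base64.b64decode(stored)
    nonce, ct, tag = raw[:16], raw[16:-32], raw[-32:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was altered or the key is wrong")
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, nonce, len(ct)))).decode("utf-8")


enc_key, mac_key = os.urandom(32), os.urandom(32)
cipher_text = encrypt(enc_key, mac_key, "Michael")
print(len("Michael"), len(cipher_text))  # 7 76
```

A seven-character name becomes a 76-character stored value here (16-byte nonce + 7 bytes + 32-byte tag, Base64-encoded), which is why an encrypted column usually needs a wider definition.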

Tokenization is looked upon favorably today as a good alternative, because the data appears as normal data. You don't have to stretch the columns of the database because it preserves the data type. If it's a name, it's going to be text; if it's a date, it's going to be in date format. You can make the data look normal and not have to change the design of the database. It's a bit slower, but if you test it out, it's just as secure, according to data scientists.

Tokenization produces fake data, whereas encryption produces mathematically scrambled real data. It doesn't seem as though there's a big difference there, but there is. With tokenization, once the data is tokenized and replaced with fake data, you have to have a way to get the data back. We call that maintaining referential integrity.
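A vault-based tokenizer is one common design that delivers both properties described here: tokens keep the original type and shape (names stay text, dates stay dates, separators stay in place), and the vault's lookup table is the way to get the data back. The in-memory `TokenVault` class and its method names are hypothetical; real products differ, and other designs exist.

```python
import secrets
import string
from datetime import date, timedelta


class TokenVault:
    """Hypothetical in-memory vault. In practice the vault is a separate,
    tightly secured service; only tokens reach the warehouse."""

    def __init__(self):
        self._token_for = {}  # real value -> token
        self._real_for = {}   # token -> real value

    @staticmethod
    def _fake_like(value):
        """Random value with the same type and shape as `value`."""
        if isinstance(value, date):
            # Any date from 1900-01-01 to roughly 2099: still a valid date.
            return date(1900, 1, 1) + timedelta(days=secrets.randbelow(73_000))
        chars = []
        for ch in value:
            if ch in string.digits:
                chars.append(secrets.choice(string.digits))
            elif ch in string.ascii_uppercase:
                chars.append(secrets.choice(string.ascii_uppercase))
            elif ch in string.ascii_lowercase:
                chars.append(secrets.choice(string.ascii_lowercase))
            else:
                chars.append(ch)  # keep separators such as "-" so formats stay valid
        return "".join(chars)

    def tokenize(self, value):
        if value in self._token_for:  # same input, same token
            return self._token_for[value]
        if isinstance(value, str) and not any(c.isascii() and c.isalnum() for c in value):
            raise ValueError("nothing to tokenize in this value")
        token = self._fake_like(value)
        while token == value or token in self._real_for:  # avoid leaks and collisions
            token = self._fake_like(value)
        self._token_for[value] = token
        self._real_for[token] = value
        return token

    def detokenize(self, token):
        """The way back to the real data: a vault lookup."""
        return self._real_for[token]


vault = TokenVault()
row = {"name": "Michael", "ssn": "123-45-6789", "date_of_birth": date(1980, 5, 17)}
tokens = {col: vault.tokenize(v) for col, v in row.items()}
print(tokens)
print({col: vault.detokenize(t) for col, t in tokens.items()} == row)
```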

You also have to find a way to do analytics on tokenized data. Today we're able to teach people how to extract analytical information from that scrambled token data. We have very advanced applications that keep the real data off the database and not stored anywhere, yet recoverable elsewhere through a process -- not a mathematical algorithm, but a very complex process.
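Analytics on tokens can work because a vault issues one token per distinct value, so equality is preserved: counts, group-bys, and joins on token columns give the same answers as on clear text. The sketch below assumes a minimal in-memory tokenizer and made-up sample tables.

```python
import secrets
from collections import Counter


class Tokenizer:
    """Minimal hypothetical vault: one random token per distinct value."""

    def __init__(self):
        self.vault = {}  # real value -> token (kept away from the warehouse)

    def __call__(self, value: str) -> str:
        if value not in self.vault:
            self.vault[value] = "tok_" + secrets.token_hex(8)
        return self.vault[value]


tok = Tokenizer()
orders = [("Michael", "TX"), ("Ana", "CA"), ("Michael", "TX"), ("Lee", "CA")]
tiers = {"Michael": "gold", "Ana": "silver", "Lee": "gold"}

# Both tables are tokenized with the same vault, so tokens still match across them.
tok_orders = [(tok(name), state) for name, state in orders]
tok_tiers = {tok(name): tier for name, tier in tiers.items()}

# Analytics run only on tokens; no real names are needed.
orders_per_customer = Counter(t for t, _ in tok_orders)
repeat_customers = sum(1 for n in orders_per_customer.values() if n > 1)
orders_by_tier = Counter(tok_tiers[t] for t, _ in tok_orders)  # a join on the token
print(repeat_customers, dict(orders_by_tier))  # 1 {'gold': 3, 'silver': 1}
```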

Should both tokenization and encryption be used in a data warehouse?

Typically a company will pick one, but that's not a requirement. You could do both. You probably wouldn't do both in one table of a database, though: with encryption you're stretching the columns and changing the schema, and with tokenization you're not, so your resulting report is going to look funny in some places and not in others. So you probably wouldn't do both in the same table unless you had a really good reason.

However, your business might need to meet a government security standard, in which case you might choose to protect only some of your data. Maybe you have a table with Social Security number, bank account number, name, and date of birth. You might decide that protecting just those fields reduces your risk of disclosing someone's identity enough. I'm not saying that's always true, but let's say you've made that risk-based decision, so you tokenize just those four columns. Your data looks normal, you don't have to change the table schema, the performance is fast enough, and your reports look normal.

In either case, if someone gets inside your network and steals your encrypted or tokenized data, they've got nothing. That's the key point: keep sensitive data protected with either encryption or tokenization for as much of its useful life as possible. That way you gain the greatest business value and the greatest security protection.

As encrypted or tokenized data moves in and out of the warehouse, does it move in its protected form so that it stays protected for as long as possible before you analyze it? Is that the goal?

That's generally true, except that we advocate analyzing it in its cryptographic state. In other words, we advocate analyzing encrypted data as ciphertext, which is the scrambled value. With tokenization, we advocate analyzing the tokens themselves rather than unprotecting the data.

Note that decrypting and re-encrypting carries some performance overhead. If you need to run lots of high-speed analytics, it's going to drain performance a bit. You'll notice it. You won't love it, but it doesn't keep you from doing business.

Keep in mind that every time you decrypt or detokenize data, you expose it to risk, so the ideal is to leave it protected 100 percent of the time. That's not always possible, but leaving it protected 85 to 90 percent of the time or more is considered achievable.

With that goal in mind, we as security people teach organizations how to run queries and analytics on ciphertext. Methods have been developed that let you get an enormous amount of value out of your ciphertext, just as if it were clear text. The truth is that most people don't need to see the actual clear-text value once it's been put into the database. They just need referential integrity between it and the real data.
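The specific methods aren't named here, so the sketch below swaps in one well-known technique, a blind index: a deterministic keyed hash (HMAC-SHA256) of a value, stored next to its randomized ciphertext. Analysts can filter, count, and join on the index without decrypting anything. The key handling and the email example are assumptions, and the trade-off is real: a deterministic index reveals which rows share the same value.

```python
import hashlib
import hmac
import os

# Assumption: this key is held by the security team, separate from the
# encryption key, and never given to analysts.
INDEX_KEY = os.urandom(32)


def blind_index(value: str) -> str:
    """Deterministic keyed hash: equal inputs give equal outputs, it cannot be
    reversed, and without the key nobody can even test guesses against it."""
    normalized = value.strip().lower()  # so "Ana@Example.com" matches "ana@example.com"
    return hmac.new(INDEX_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Warehouse rows hold the blind index; the randomized ciphertext of the email
# would sit in another column and is not needed for these queries.
transactions = [
    {"email_idx": blind_index(email), "amount": amount}
    for email, amount in [
        ("ana@example.com", 120),
        ("lee@example.com", 75),
        ("Ana@Example.com", 30),
    ]
]


def total_for(email: str) -> int:
    """Filter on the index only; nothing is decrypted."""
    idx = blind_index(email)
    return sum(t["amount"] for t in transactions if t["email_idx"] == idx)


print(total_for("ana@example.com"))                 # 150
print(len({t["email_idx"] for t in transactions}))  # 2 distinct customers
```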

You've said that another crucial security layer in a data-centric security scheme is auditing and monitoring. Why is that so important?

Once you've let people into your environment with a username and password -- giving them rights through access control -- and once you've encrypted sensitive data, you should still have another layer of protection. That layer tells you what your users are doing. You've taught them what they can and cannot do. You've protected your highest-value data so it's useless if it's stolen. You still want to know what's going on inside your network, though. Remember, you're never going to rely on just one control. There is no magic bullet.

To that end, you need another layer of defense. That is user auditing, logging, and monitoring. That's a security control that, on the technical side, will read all transactions as they occur in your database, from log-in to log-out, and everything in between. You can program very sophisticated electronic tools to look for patterns that might indicate nefarious activities. They can send a text message to the right people or have alarms go off -- anything you want. A situation that raised an alarm might be, "Oh, look, the sys DBA has logged on twice. I don't think that's right"; or "Oh, look, a database administrator just tried to access an encrypted column 1,400 times. That can't be right"; or "Someone just tried three of our gateway routers and ran 716,000 login attempts in 800 milliseconds. That's definitely not right."
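The three alarms described above can be expressed as simple rules over an audit event stream. This is a hypothetical sketch: the event format, the `sysdba` user name, and every threshold are assumptions, and a real deployment would feed the database's own audit logs into a monitoring tool rather than a standalone script.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass
class Event:
    ts_ms: int        # event time in milliseconds; events arrive in time order
    user: str
    action: str       # "login", "logout", "login_failed", or "read"
    target: str = ""  # column for reads, gateway for logins


# Hypothetical rule settings; real thresholds come from your own baselines.
ENCRYPTED_COLUMNS = {"customers.ssn"}
MAX_ENCRYPTED_READS = 1_000      # per user, across the events examined
MAX_FAILED_LOGINS_PER_SEC = 100  # across all users and gateways


def monitor(events):
    """Return (ts_ms, rule, detail) alerts for a stream of audit events."""
    alerts = []
    sessions = Counter()   # open sessions per user
    enc_reads = Counter()  # reads of encrypted columns per user
    failures = deque()     # timestamps of failed logins in the last second
    for e in events:
        if e.action == "login":
            sessions[e.user] += 1
            if e.user == "sysdba" and sessions[e.user] > 1:
                alerts.append((e.ts_ms, "duplicate_sysdba_session", f"{sessions[e.user]} sessions"))
        elif e.action == "logout":
            sessions[e.user] = max(0, sessions[e.user] - 1)
        elif e.action == "read" and e.target in ENCRYPTED_COLUMNS:
            enc_reads[e.user] += 1
            if enc_reads[e.user] == MAX_ENCRYPTED_READS + 1:  # alert once per user
                alerts.append((e.ts_ms, "encrypted_column_reads", e.user))
        elif e.action == "login_failed":
            failures.append(e.ts_ms)
            while failures[0] <= e.ts_ms - 1_000:  # slide the one-second window
                failures.popleft()
            if len(failures) == MAX_FAILED_LOGINS_PER_SEC + 1:  # count crosses the limit
                alerts.append((e.ts_ms, "login_burst", e.target))
    return alerts


events = [Event(0, "sysdba", "login"), Event(10, "sysdba", "login")]
events += [Event(100 + i, "dba_2", "read", "customers.ssn") for i in range(1_400)]
events += [Event(5_000 + i // 1_000, "unknown", "login_failed", "gw-1") for i in range(5_000)]
for alert in monitor(events):
    print(alert)
```

Each rule only keeps a small amount of state (session counts, per-user read counts, a one-second window), which is what lets this kind of check run as transactions occur rather than in an after-the-fact report.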

Logging and monitoring help when you have an insider threat, but they also help when an outside attacker is at work and you can discover a pattern or see something that can't be happening legitimately. You then know to take action.

As a security expert and evangelist for Teradata, where do you spend most of your time? What does that job entail?

The single biggest share of my time, other than scheduling my consultants to deliver our services, goes to visiting customer sites and meeting with stakeholders -- CISOs, business owners, or data warehouse managers -- to explain the concept of layered security defenses and the security service offerings Teradata has.

We also talk to businesses about enterprise security overall. By that I mean security as applied to the entire organization -- all things, all areas. Those face-to-face meetings with a customer somewhere in the country are some of the most productive conversations I have. That's where we change minds and that's where we help folks get on the right track, security-wise. I spend a lot of my time doing that and it's very satisfying.
