
Q&A: Don't Overlook Data Governance When It Comes to Big Data

The issue is often overlooked, but solid data governance policies are crucial for getting business value from big data, explains SAP's Dan Everett.

"Right now, most companies do data governance in an ad hoc manner across different parts of the business," says SAP's Dan Everett. That leads to fragmented governance processes that waste time and muddy the data. In this interview, he explains how better planning and coordination can help, regardless of the size of the data. Everett is senior director of information management at SAP, where he focuses on understanding how technology can fulfill business needs. He has extensive knowledge of data management technologies, along with analytic tools and applications.

BI This Week: Let's start with the relationship between data governance and big data. In a recent Webinar with TDWI, you mentioned that we tend to talk a lot about big data these days but we don't talk about data governance with the same frequency at all. How important is data governance to big data?

Dan Everett: In my opinion, data governance needs to be there regardless of whether we're talking about a terabyte or a petabyte of data. Regardless of data volume, you still need to do data management. In big data discussions, there's often Hadoop data and social media data, along with whatever data you have internally. Regardless of all that, you still need to integrate the data. You still need to cleanse and rationalize it somehow.

As companies start collecting more and more information, certainly, there is potential value from storing it all. At some point, however, they're going to say, "Where's the tradeoff between the business value I can get out of this data and my cost for storing it?" At that point, companies need to make some decisions about what data to keep, what to archive, and what to get rid of altogether.

Now you have all this content -- unstructured, semi-structured, whatever term you choose. In most organizations, the people in charge of data management and the people in charge of content management are in two different groups. With all this attention around big data, we may start to see those groups coming together and sharing policies and procedures, those types of things.

In an article you wrote last year for Forbes online, you mentioned a McKinsey Global Institute Study on big data that projected data growth of 40 percent a year. What does that kind of growth mean for companies as they wrestle with data retention and data quality issues?

Data retention is a big and complex issue for companies -- determining their retention strategy and policies, how to implement those policies, and how to create automated processes with rules that depend on what country you're doing business in. If you're a large multinational company, there are always different regulations about how long you can keep customer data, how long you have to keep HR data, when you need to destroy it, and so forth. It's a very complex area of data management and governance.
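
As a rough illustration of country-dependent retention rules, the Python sketch below looks up a retention period by country and data category and decides whether a record should be kept or destroyed. The table `RETENTION_YEARS`, the function `retention_action`, and every number in it are hypothetical placeholders chosen for the example; they are not actual legal requirements and do not describe any SAP product.

```python
from datetime import date

# Hypothetical retention periods in years, keyed by (country, data category).
# These values are placeholders for illustration, NOT real legal requirements.
RETENTION_YEARS = {
    ("US", "hr"): 7,
    ("DE", "hr"): 10,
    ("US", "customer"): 5,
    ("DE", "customer"): 3,
}


def retention_action(country, category, created, today):
    """Return 'keep', 'destroy', or 'review' for a record created on `created`."""
    years = RETENTION_YEARS.get((country, category))
    if years is None:
        # No rule is defined for this combination: route it to a person.
        return "review"
    try:
        expires = created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        expires = created.replace(year=created.year + years, day=28)
    return "destroy" if today >= expires else "keep"


today = date(2024, 1, 1)
created = date(2020, 1, 15)
de_customer = retention_action("DE", "customer", created, today)
us_customer = retention_action("US", "customer", created, today)
fr_hr = retention_action("FR", "hr", created, today)
```

Keeping the rules in a single table, rather than scattered through application code, is one way to let the same automated process apply different policies per country.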

When we talk about data governance, regardless of the amount of data, where is the ROI? How do you demonstrate that return to company leadership, both on the IT and business sides?

You need to look at what the executives' objectives are, and tie data quality back to those. For example, if the strategic objective of the company is revenue growth, focus on that. There are a couple of different ways you could do that. It could be that you'll see an increase in the response rate on a marketing campaign. Maybe a corporate strategy is to accelerate mergers and acquisitions, or reduce time to product launch, or increase cross-sell and up-sell. In each case, different line-of-business initiatives are tied to that strategic goal. You need to find those, then explain how information governance objectives further those goals.

As an example, let's look at increasing marketing campaign response rates. You can have data governance objectives around removing duplicate records, ensuring the accuracy of data -- the conformity and completeness -- and harmonizing customer and product data so you can better target your campaigns. Each one of those objectives could then have some metrics around it. You're actually starting at the top with your strategic goals, then working your way down. Then you can work your way back up, showing how what you're doing from a data governance perspective ties back to the company's strategic goals. In my opinion, those are the sort of things that are going to get executives' attention.

When you talk to customers, are most of them doing what you've just described? Are they able to draw those lines between data governance objectives and the company's strategic goals?

Most of the customers I see are measuring [data governance benefits] from an IT perspective. They are looking at reduced number of full-time employees, data management success, and so forth, rather than tying back to business objectives.

There are a couple of areas where people are linking data governance to business objectives. Procurement is a good example -- companies often do data management around vendor and material master data, then tie that to procurement or to the supply chain. In that case, they can say, "We're doing business with all these different vendors, but if we consolidated, we might get a better rate on these materials," or "If we narrowed the number of vendors that we deal with, contract compliance [might be simpler]," and so forth. That's an area where I do see some traction in regard to tying [data governance initiatives] to ROI.

As a second example -- and I'm not sure it's necessarily ROI -- regulatory compliance can help justify data governance initiatives. It depends on the industry -- financial services, health care, and life sciences all tend to be highly regulated. In those industries, regulation seems to be a driver.

Citing regulatory oversight can help make the case for better data governance then?

Even issues such as archiving and retention management, e-discovery, or legal hold issues can be a concern. There are some heavy fines around those issues, so you can sometimes make ROI justifications based on that.

In the TDWI Webinar, you made some interesting remarks about establishing a benchmark for data quality and working from there. Could you recap that for us?

I talked about going out and doing some data profiling, and then looking at the data in terms of whatever measures are meaningful to you. Maybe you want to measure number of duplicate records, or how well your data conforms to whatever standards you have for, say, customer record completeness. Are there missing fields in different records?

It's really about establishing that baseline. Once you do that, you can start to see where your biggest problems are. Is it with customer data? Financial data? Supplier data? Material data? Once you have a baseline, you can go back to those strategic goals and objectives we talked about earlier. Using those, you can start to figure out which data issues really impact your bigger strategic goals and objectives. Which of them are areas where you could work quickly to show some value -- to generate some momentum for your data governance initiative? Those are the kinds of questions you want to be asking.
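
A baseline of this kind can be sketched in a few lines of Python. The example below counts duplicate and incomplete customer records in a small sample. The required-field list, the matching rule (lowercased name plus email), and the sample records are all assumptions made for illustration; real profiling tools use far more sophisticated matching.

```python
from collections import Counter

# Hypothetical standard: fields a complete customer record must contain.
REQUIRED_FIELDS = ("name", "email", "country")


def normalize_key(record):
    """Build a crude match key from the trimmed, lowercased name and email."""
    return (
        (record.get("name") or "").strip().lower(),
        (record.get("email") or "").strip().lower(),
    )


def profile(records):
    """Return baseline counts of duplicate and incomplete records."""
    key_counts = Counter(normalize_key(r) for r in records)
    # Every record beyond the first with the same key counts as a duplicate.
    duplicates = sum(n - 1 for n in key_counts.values() if n > 1)
    incomplete = sum(
        1
        for r in records
        if any(not (r.get(f) or "").strip() for f in REQUIRED_FIELDS)
    )
    total = len(records)
    return {
        "total": total,
        "duplicates": duplicates,
        "incomplete": incomplete,
        "completeness_rate": (total - incomplete) / total if total else 1.0,
    }


# Made-up sample data for the example.
sample = [
    {"name": "Ada Park", "email": "ada@example.com", "country": "US"},
    {"name": " ada park ", "email": "ADA@example.com", "country": "US"},
    {"name": "Lee Wong", "email": "", "country": "SG"},
    {"name": "Sam Ortiz", "email": "sam@example.com", "country": None},
]
baseline = profile(sample)
print(baseline)
```

Running the same profile separately on customer, supplier, and material data gives comparable numbers for deciding where the biggest problems are.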

The other step, which software doesn't address, is looking at your policies, your procedures, and your standards. What is your standard for a customer record? What information should it include? Should all divisions be forced to conform to one standard, or should you do mapping between systems? Quite honestly -- and this is also true with balanced scorecard implementations I have been involved with -- the people-and-culture issues can be much harder than any software implementation a company might choose.

What's an example of integrating data governance into business processes?

Most companies do data governance as a sort of ad hoc, parallel process. [It's much better] to embed the governance into the business processes -- for example, to do data quality at the point of entry. Somebody is entering customer information, and if your standard for the customer record says you have to have certain fields, but the data entry person doesn't put in one of those fields, you don't allow them to save the record until they've entered all the information. That's a simple example of doing data quality checking at the point of entry -- of integrating it into the business processes instead of trying to clean it up later after it has proliferated in downstream systems.
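
The point-of-entry check described here can be sketched in Python as a save function that refuses incomplete records. The field list, the `IncompleteRecordError` class, the `save_customer` function, and the in-memory list standing in for a database are all hypothetical names introduced for this example.

```python
# Hypothetical standard for a customer record.
REQUIRED_FIELDS = ("name", "email", "postal_code")


class IncompleteRecordError(ValueError):
    """Raised when a record is missing fields the standard requires."""


def missing_fields(record):
    """List required fields that are absent or blank in the record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def save_customer(record, store):
    """Append the record to `store` only if every required field is filled in."""
    missing = missing_fields(record)
    if missing:
        raise IncompleteRecordError(
            "missing required fields: " + ", ".join(missing)
        )
    store.append(dict(record))
    return len(store) - 1  # position of the saved record


store = []  # stands in for the downstream system of record
save_customer(
    {"name": "Ada Park", "email": "ada@example.com", "postal_code": "94105"},
    store,
)

rejected = None
try:
    save_customer({"name": "Lee Wong", "email": "lee@example.com"}, store)
except IncompleteRecordError as err:
    rejected = str(err)  # the entry form would show this to the user
```

Because the incomplete record never reaches the store, nothing has to be cleaned up later in downstream systems.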

Can you talk specifically about SAP and what it brings to the discussion today?

When I look at information management in general, it seems to me that it's where business intelligence was maybe 10 years ago: companies realize they have multiple tools with overlapping functionality and are trying to figure out how to standardize. Customers started a data governance initiative because of some BI project. They bought a data modeling tool, they bought a data integration tool, they bought a data cleansing tool, and maybe they got some master data management. Some other group was doing system migration or an ERP consolidation project, and they also bought data integration, data cleansing, archiving and retention, and master data management tools. Another group was doing business process reengineering, and they bought an enterprise modeling tool, content management, process orchestration, and master data management products.

Each of these groups within the company is running separate, ad hoc projects in a sense. The result is all these different tools that don't work that well together. They don't share metadata. I think the market as a whole is at a place where the lack of interoperability is starting to drive companies to standardize on a set of integrated capabilities. I see that as one of the biggest advantages that SAP offers. It's not just the breadth of capabilities that we offer, but the integration of those capabilities into a holistic solution.

The other area where we have an advantage is integration into the business process. Obviously, as an ERP vendor, we own those business processes. We've done a lot around master data management and content management to actually embed those capabilities into the business processes and workflow. Those are two of the biggest advantages that we offer as a company.
