Q&A: Pendulum Swings Back to Emphasis on Governance

Former TDWI director Wayne Eckerson discusses industry trends, including a coming shift that will reassert the importance of data governance.

Wayne Eckerson, founder and principal consultant with The Eckerson Group, served as director of education and research at TDWI for many years. He is an internationally recognized thought leader who has worked in the business intelligence and analytics field since the early 1990s. Eckerson is the author of 2012's The Secrets of Analytical Leaders: Insights from Information Insiders and the earlier Performance Dashboards: Measuring, Monitoring, and Managing Your Business. He is currently working on a book about data governance.

In this second part of a two-part interview on trends in the industry, Eckerson talks about a current emphasis in BI on speed and autonomy over governance and predicts a coming shift as enterprises strive to get a handle on their big data. "I think the pendulum will swing back one day soon," he says. "Companies will start to emphasize governance over speed, standards and consistency over agility, individual preference, and business unit autonomy." [Editor's note: Read part 1 of our interview here.]

BI This Week: Earlier in this interview, you mentioned building a big data ecosystem. What issues are arising on the architecture side of BI and data warehousing?

Wayne Eckerson: The big data ecosystem has a lot of moving parts, so I often break it into top-down and bottom-up. The top-down part generally is the reports and dashboards that [pull from] the data warehouse. They meet the needs of casual users who just need KPIs to manage their business processes. Those reports and dashboards tend to meet 50 to 60 percent of all questions in the business units.

However, there are always new questions; the business is always changing, so that's where the bottom-up role comes in -- the analytics. The top-down world, with the data warehouse, is not always suited to meeting user needs. After all, we build the data warehouse to answer questions that we can ask in advance -- questions that we can put on a dashboard, right? We provide enough detail and dimensions in the warehouse so that if users have questions, they can drill down or across these dimensions in this report, do some root-cause analysis, and get some details. The warehouse is really designed for corporate reports and dashboards, with some root-cause analysis around the edges. It's not designed to support ad hoc analysis across any part of the data -- ad hoc meaning that you can't define it in advance. ... Putting data in a warehouse is very expensive, so you only put data in there if you know you want a report on it.

That leaves the true business analyst or data scientist out in right field. They very quickly get to the boundaries of the data warehouse and can't go any further. They get very frustrated that they can't go across other dimensions. They can't bring in other data. They can't go down to the raw data in many cases. They basically end up using the data warehouse as a big data dumper -- they use the data warehouse to dump data into Excel or Tableau (which is really just the new Excel). Then they combine it with other data they get from other source systems, or from the Web, from external sources, or from syndicated sources, and they mash it together. Now we're seeing all these data prep tools and data wrangling tools to make that job easier.

What I see, from many years in this industry, is that the warehouse was great for top-down BI, but it completely ignored -- and, in fact, it almost revolted against -- the bottom-up role. Now we're in an era where the whole focus is on the analysts and the data scientists and meeting their needs with self-service BI tools, self-service discovery tools, self-service data mashup tools, self-service advanced analytics tools. ...

Now we're helping that very specific set of power users. What issues does that raise?

Right -- now we're empowering the power user, which is great, but we still have the problem of [figuring out] how you architect for them. The warehouse isn't great for them. We can carve out partitions in the warehouse, such as Teradata does with its Data Labs, and allow users to upload their own data into those partitions, and combine it with corporate data, and do that without impacting anyone else. That's what I would call a sandbox, and I think only Teradata does that really well.

[Alternatively,] you can buy an analytic appliance and at least replicate your warehouse data in there, and then add on any other data that users want, and let them go to town, or you can buy Hadoop and create various types of Hive tables and other data structures and allow users to query those with whatever tools make sense. That's another kind of sandbox.

The architecture for power users is these sandboxes, basically. They allow users to mash data together inside the corporate ecosystem without having to do it on their desktops. If we can build some collaboration around that, and some reuse of what they've done, we could start to crowd-source governance from the bottom up in terms of, "Hey, this table, or this field, is what you really need to go to if you're doing this type of analysis." It gives [the next user] someplace to begin and starts to create some standards around metrics and analyses, which is what the company ultimately wants -- consistency in how people are defining things.
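[Editor's note: The bottom-up, crowd-sourced governance idea described above can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions, not a method Eckerson or any product prescribes. The class name `EndorsementRegistry`, its methods, and the sample analyst, table, and field names are all hypothetical.]

```python
from collections import Counter


class EndorsementRegistry:
    """Hypothetical registry in which analysts endorse the table.field
    they relied on for a given type of analysis."""

    def __init__(self):
        # analysis type -> Counter mapping "table.field" -> endorsement count
        self._votes = {}
        # (analyst, analysis_type, source) triples already counted,
        # so one analyst cannot endorse the same source twice
        self._seen = set()

    def endorse(self, analyst, analysis_type, source):
        """Record that `analyst` recommends `source` for `analysis_type`."""
        key = (analyst, analysis_type, source)
        if key in self._seen:
            return  # duplicate endorsement, ignore it
        self._seen.add(key)
        self._votes.setdefault(analysis_type, Counter())[source] += 1

    def recommend(self, analysis_type, top_n=3):
        """Return up to `top_n` (source, votes) pairs, most endorsed first."""
        return self._votes.get(analysis_type, Counter()).most_common(top_n)


# Hypothetical usage: two analysts working in a shared sandbox.
registry = EndorsementRegistry()
registry.endorse("ana", "churn", "sandbox.customer_events.cancel_date")
registry.endorse("ben", "churn", "sandbox.customer_events.cancel_date")
registry.endorse("ben", "churn", "dw.dim_customer.status")
registry.endorse("ana", "churn", "sandbox.customer_events.cancel_date")  # ignored

print(registry.recommend("churn"))
# [('sandbox.customer_events.cancel_date', 2), ('dw.dim_customer.status', 1)]
```

[The next analyst starting a churn analysis would see the most-endorsed field first, which is one simple way a shared sandbox could give users "someplace to begin." A real deployment would also need the lineage and security controls discussed later in the interview.]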

I think that sandboxes, built into various parts of the big data ecosystem, are empowering power users to work more effectively. It used to be that they spent 60 to 80 percent of their time just gathering data and mashing it together before they even did their analysis, which left little time for analysis.

Now, with these tools and collaboration, that percentage is going to go way down. That means they can spend more time analyzing and less time acting like data warehousing people -- because in that sense, they're basically human data warehouses. That's really the definition of a business analyst.

I like your phrase, "crowd-sourcing data governance."

That's a huge issue. People are talking about big data governance this year, finally. They're suddenly realizing that, yeah, you can bring all this data in, but it's garbage in, garbage out. You need to know what you're bringing in. You need to model it so people can access it in ways that make sense to them, like SQL. You need lineage, security -- it's coming back into fashion slowly.

All this seems to happen in ten-year increments. It was 2010 when big data really got hot. We're halfway through the cycle, so in another five years, the tide will turn and we'll be much more IT-oriented. Things won't be dramatically different then; we won't be going back to the good old days. There will have to be much more of a collaborative partnership between business and IT -- there has to be, because what we have hasn't worked for so long. There will have to be greater respect for and acknowledgement of the need for IT to get involved to ensure that what we need gets built, and in an effective, scalable, secure, reliable, available way -- all that good stuff -- and IT has to do it without compromising faster, better, cheaper.

I think that slowly we're getting there. The pendulum is swinging from one side to the other, from completely centralized and IT-dominated, to completely decentralized and business-dominated, to somewhere in the middle, with collaboration and federated interplay between the business and technical folks.

I saw on your website that you're working on a new book, this one on data governance. Is that because you see the pendulum swinging back, as you say, toward more control of the data?

Last year around this time, I chose that issue for the book because with every client I talk to, the biggest issue is data governance -- building that common vocabulary of terms and metrics that drive the business. Without a common vocabulary, people can't communicate. That's what so many companies lack. It's not so much about the technology as about agreeing on what things mean so that you can measure consistently. That is hard to do.

Data warehousing professionals have experience with data governance because you can't have a data warehouse without getting agreement on certain terms and metrics you're going to use to report on. In many ways, they can lead the charge for corporatewide governance, where we apply these terms and definitions across all applications, and then validate their value using master data management.
