TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Blog: Data 360

Master Data Management Can Learn from Data Quality

Blog by Philip Russom
Research Director for Data Management, TDWI

For about a month now, I’ve been interviewing users on the phone, in search of speakers for upcoming TDWI events. I need speakers who can share their organization’s best practices and strategies for data management. As you can imagine, I’ve heard a lot of great tips in these interviews, many of them concerning master data management (MDM).

A tip I’ve heard from people in multiple organizations is that MDM solutions achieve a higher level of success when they adopt some of the techniques and best practices of data quality (DQ). Let me give you some examples of DQ practices applied to MDM.

DQ techniques. For years, I’ve watched data integration solutions incorporate functions that originated with data quality tools, especially data profiling and data monitoring. In a similar trend, I’m now seeing MDM solutions incorporating DQ functions for data standardization, deduplication, augmentation, identification, and verification. After all, master and reference data benefit from these functions, just as any data domain would.
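
To make standardization and deduplication a little more concrete, here is a minimal Python sketch, assuming company names as the master data and a small, hypothetical abbreviation table. Real DQ and MDM tools use far richer matching (including fuzzy and probabilistic matching); this only illustrates the basic idea of standardizing values first and then deduplicating on the standardized key.

```python
import re

# Hypothetical abbreviation table used during standardization.
ABBREVIATIONS = {"inc": "incorporated", "corp": "corporation", "co": "company"}


def standardize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace, expand abbreviations."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    tokens = [ABBREVIATIONS.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)


def deduplicate(records):
    """Keep the first record seen for each standardized key, preserving input order."""
    survivors = {}
    for record in records:
        survivors.setdefault(standardize(record), record)
    return list(survivors.values())


print(deduplicate(["Acme Corp.", "ACME corp", "Acme Corporation", "Globex Inc."]))
```

All three Acme variants standardize to the same key ("acme corporation"), so only the first Acme record and the Globex record survive.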

Data stewardship. DQ success usually depends on the processes of data stewardship. A data steward plays a key role in linking data quality work and standards to specific business goals and business applications. The average data steward can identify and prioritize DQ work that will yield a noticeable return for the business. I’m now seeing a similar stewardship approach to prioritizing MDM work.

Collaborative data management. Note that a steward’s priority list is accurate only when it is developed in conjunction with business managers who know the impact of data’s quality on the business. Likewise, data stewardship can be a process for IT-to-business alignment and collaboration in the context of MDM, not just DQ.

Data governance (DG). I’ve seen a number of organizations take a successful data stewardship program (originally designed to support DQ) and evolve it into a data governance program. You see, a good data stewardship program will establish a process for proposing and authorizing changes to data and applications for the sake of improving data’s quality. A DG board or committee needs a similar process for the data standards and data usage policies it has to create and enforce. In fact, the first policies produced by a DG program usually govern data via quality rules. And a typical “next step” that a DG program takes is to apply said process to data standards and usage policies for MDM.

Change management. DQ and MDM share very similar goals, in that each strives to improve data, whether the data domain is master data, customer data, product data, financial data, etc. Achieving improvement almost always requires changes to data, applications, and how end users use applications. Therefore, a change management process is key to effecting improvements. DQ has long-standing change management processes via stewardship, plus new options for change management via data governance. MDM’s likelihood of effecting positive change is increased when it taps the data-oriented change management processes that evolved from DQ and stewardship.

Conclusion. Frankly, I’m not surprised that MDM solutions are absorbing DQ techniques and best practices. I’ve seen a similar absorption by DI solutions, going on for about ten years now. And I already mentioned how some data governance programs are essentially data stewardship programs, expanded into a data-standards-oriented form of data governance. So, it’s clear to me that a variety of data management disciplines can learn from DQ techniques and stewardship practices. And the discipline going through that cycle right now is MDM. You should follow this trend if you’re not already.

So, what do you think, folks? Let me know. Thanks!

Posted by Philip Russom, Ph.D.

The State of Multi-Data-Domain Master Data Management (MDM)

Blog by Philip Russom
Research Director for Data Management, TDWI

Allow me a moment to parachute into the middle of an issue that’s come up a lot this calendar year, namely multi-data-domain master data management (MDM). I assume you are familiar with MDM; if not, spend a few minutes on Wikipedia.

The issue is that most user organizations deploy single-domain MDM solutions. The most popular data domain is customer data, but other common domains for MDM are (in priority order) financials, products, partners, employees, and locations.

Here’s the problem with single-data-domain MDM. It’s a barrier to having common, consensus-based entity definitions and standard reference data that would allow you to correlate information across multiple domains. For example, single-domain MDM is great for creating a single view of customers. But it needs to federate or somehow integrate with MDM for the product-data domain if you want to extend that view to include (with a high level of accuracy and consistency) products and services that each customer has acquired or considered. Or you might include financial or location data. Someday, you’ll include data from social media. All this is easier and more accurate with multi-data-domain MDM.
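
The cross-domain correlation described above can be sketched in a few lines of Python. Everything here is hypothetical: the source systems, product codes, and IDs are invented, and a simple cross-reference table stands in for a product-domain master. The point is only that a customer view counts products accurately once each system's local product codes resolve to a shared master product ID.

```python
# Hypothetical product-domain master: each source system's local product code
# is cross-referenced to a single master product ID.
PRODUCT_XREF = {
    ("billing", "CHK-01"): "P100",
    ("crm", "checking_basic"): "P100",
    ("crm", "visa_gold"): "P200",
}
PRODUCT_MASTER = {"P100": "Basic Checking", "P200": "Gold Credit Card"}

# Holdings as (master customer ID, source system, local product code),
# assuming customer-domain MDM has already resolved the customer IDs.
HOLDINGS = [
    ("C1", "billing", "CHK-01"),
    ("C1", "crm", "checking_basic"),
    ("C1", "crm", "visa_gold"),
]


def customer_view(customer_id):
    """Return the distinct master products held by one customer, sorted by name."""
    products = {
        PRODUCT_MASTER[PRODUCT_XREF[(system, code)]]
        for cid, system, code in HOLDINGS
        if cid == customer_id
    }
    return sorted(products)


print(customer_view("C1"))  # ['Basic Checking', 'Gold Credit Card']
```

Without the product master, the same customer would appear to hold three unrelated products instead of two.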

The examples probably sound analytic to you, but they’re equally applicable to operations. And multi-data-domain MDM can improve lots of data management functions, like analytics, identity resolution, customer intimacy, data quality, data integration, deduplication, and sharing data across disparate departments and their IT systems.

I wish it weren’t true, but I still see most MDM solutions as focused on the customer data domain -- and that’s all. If MDM addresses other domains -- typically financial or product data -- that’s done in a separate solution, with little or no integration with MDM for customer data. Some user organizations have multiple customer-focused MDM solutions, say one each for marketing analytics, direct marketing, sales pipeline, customer service, and so on. So much for a single view of the customer! These organizations have their hands full consolidating customer-data-domain MDM solutions, and that delays the next step, which is multi-data-domain MDM.

Despite these dire situations, I’ve also encountered user organizations that have successfully extended MDM to span multiple data domains. And some of these spoke at TDWI’s Solution Summit on Master Data in March 2011. For example, Cathy Burrows from Royal Bank of Canada explained how they consolidated multiple MDM solutions to create a single, central, and governed MDM solution that provides a rich, accurate, and even intimate view of each customer. They’re now enriching customer views with reference data about the products these customers have.

As another example, Mark Love of the Veterans Health Administration (VHA) talked about how the VHA started with a form of MDM for patient identity, then branched out into many other domains. To keep the domains straight and to leverage hierarchical relations among domains, the VHA created a “master set of domains.”

I got to thinking about all this because, just yesterday, I was talking about multi-data-domain MDM with Ravi Shankar of Informatica. “Most of our recent MDM deals are multi-domain,” he said. Ravi talked through a list of Informatica customers who have multi-data-domain MDM in production today. I can’t tell you the customer names, but they’re in banking, high-tech manufacturing, food services, and government agencies. All began with one domain, then extended to others. Also, all deployed MDM in combination with their data integration and/or data quality solutions, which shows how MDM is interrelated with other data management disciplines. The list Ravi shared with me gives me confidence that more and more user organizations are succeeding with multi-data-domain MDM – and that’s a good thing.

But the future of multi-data-domain MDM isn’t totally rosy. At TDWI’s Solution Summit on Master Data in March 2011, we also heard from Evan Levy of Baseline Consulting (recently acquired by DataFlux). He said: “Multi-data-domain MDM is technically feasible today. But it makes no sense in terms of sponsorship, funding, or satisfying departmental and application-specific requirements.”

I agree with Evan’s second point wholeheartedly, because a number of users have explained to me over the years that sales and marketing need to own customer-data-domain MDM, even if it’s only applied within their customer-base segmentation, direct marketing, and sales contact applications. Likewise, the supply chain managers want to fund and control product and partner reference data. The financial guys have their own requirements for financial data, and HR has MDM requirements for employee data. All too often, these departments aren’t keen on sharing.

But I don’t fully agree with Evan’s first point. I think there ARE situations where multi-data-domain MDM makes perfect sense, and I noted those earlier in this post. In my experience, a common tipping point comes when technical and business people have reached maturity with customer-data MDM, and they realize they can’t get to the next level without consistent and integrated MDM about other domains.

Another way to put it is that the single view of the customer gets broader as it matures, thus demanding information from other domains. Yet another way to think of it is that multi-data-domain MDM often comes in a later life cycle stage, after single-data-domain MDM has proved the concept of MDM, in general. And much of the success of multi-data-domain MDM -- in my opinion -- is not about technology. Success depends on having a corporate culture that demands data sharing in support of cross-functional coordination.

So, folks, what do you think about the state of multi-data-domain MDM? Let me know. Thanks!

(Note that TDWI will repeat its Solution Summit on Master Data, Quality, and Governance for the fourth year, coming up March 4-6, 2012, in Savannah, Georgia. Mark your calendar!)

Posted by Philip Russom, Ph.D.

Advanced Analytics versus Online Analytic Processing (OLAP)

Blog by Philip Russom
Research Director for Data Management, TDWI

The current hype and hubbub around big data analytics has shifted our focus to what’s usually called “advanced analytics.” That’s an umbrella term for analytic techniques and tool types based on data mining, statistical analysis, or complex SQL – sometimes natural language processing and artificial intelligence, as well.

The term has been around since the late 1990s, so you’d think I’d have gotten used to it. But I have to admit that the term “advanced analytics” rubs me the wrong way for two reasons:

First, it’s not a good description of what users are doing or what the technology does. Instead of “advanced analytics,” a better term would be “discovery analytics,” because that’s what users are doing. Or we could call it “exploratory analytics.” In other words, the user is typically a business analyst who is exploring data broadly to discover new business facts that no one in the enterprise knew before. These facts can then be turned into an analytic model or some equivalent for tracking over time.

Second, the thing that chafes me most is that the way the term “advanced analytics” has been applied for fifteen years excludes online analytic processing (OLAP). Huh!? Does that mean that OLAP is “primitive analytics”? Is OLAP somehow incapable of being advanced?

I personally don’t think so. In fact, depending on how you design and implement it, OLAP can be quite advanced. For example, OLAP is very much about dimensions. In the 90s, eight dimensions was considered an advanced implementation. Nowadays I regularly talk with people who have twenty or more. I realize there’s a difference between advanced and mature. But I have to say that I’ve seen lots of mature OLAP implementations that support hundreds of cubes, hundreds of OLAP reports, and thousands of users. Over the years, different approaches to OLAP (multidimensional, relational, desktop, etc.) have consolidated into a hybrid OLAP, such that most vendor products today are quite mature, feature rich, and flexible.

Here’s another, related issue. While researching a new TDWI report on big data analytics, I ran across a few people (users, consultants, and vendors) who think that “advanced analytics” (or whatever you want to call it) will render OLAP obsolete. Therefore, user organizations should expunge OLAP from their BI portfolios. Uh, no. I don’t see that happening.

In defense of OLAP, it’s by far the most common form of analytics in BI today, and for good reasons. Once you get used to multidimensional thinking, OLAP is very natural, because most business questions are themselves multidimensional. For example, “What are western region sales revenues in Q4 2010?” intersects dimensions for geography, function, money, and time. Discoveries made in OLAP are easily “institutionalized” or “operationalized” (much more so than advanced analytics), so OLAP analyses are repeated over time with consistency. Since dimensions are easily expressed as parameters, an OLAP-based report can be as easy to use as a parameterized report, thereby putting OLAP-based analytics within the comprehension of a vast range of possible end-users.
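
The example question above can be expressed as a slice of a cube. Here is a toy Python sketch, assuming a tiny in-memory fact list (all values invented) with region, function, and quarter as dimensions and revenue as the money measure. It also shows why an OLAP report can feel like a parameterized report: each dimension filter is just a named parameter. Real OLAP engines precompute and index aggregates rather than scanning facts like this.

```python
# Toy facts: (region, function, quarter, revenue). Values are invented.
FACTS = [
    ("West", "Sales", "Q4 2010", 120.0),
    ("West", "Sales", "Q3 2010", 95.0),
    ("West", "Service", "Q4 2010", 30.0),
    ("East", "Sales", "Q4 2010", 110.0),
]
DIMENSIONS = ("region", "function", "quarter")


def query(facts, **filters):
    """Sum revenue over facts whose dimension members match every filter given."""
    unknown = set(filters) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    total = 0.0
    for *members, revenue in facts:
        row = dict(zip(DIMENSIONS, members))
        if all(row[dim] == value for dim, value in filters.items()):
            total += revenue
    return total


# "What are western region sales revenues in Q4 2010?"
print(query(FACTS, region="West", function="Sales", quarter="Q4 2010"))  # 120.0
# Dropping parameters rolls up across those dimensions: all western revenue.
print(query(FACTS, region="West"))  # 245.0
```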

The scope of discovery of an analytic method seems to be an important concern right now, as seen in the current fascination with big data analytics. In that context, a possible limitation of OLAP is that most implementations are tightly coupled to datasets called cubes. If the information someone hopes to discover is not in a cube, then that can be a problem. Even so, so-called relational OLAP can be a solution, and OLAP tools are so friendly nowadays that just about anyone can create a cube. Depending on how an OLAP implementation is designed and which vendor tools are used, a cube can limit the scope of discovery, just as any analytic dataset can – even if it’s multi-terabyte big data.

In my mind, advanced analytics is very much about open-ended exploration and discovery in large volumes of fairly raw source data. But OLAP is about a more controlled discovery of combinations of carefully prepared dimensional datasets. The way I see it: a cube is a closed system that enables combinatorial analytics. Given the richness of cubes users are designing nowadays, there’s a gargantuan number of combinations for a wide range of users to explore.
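
To put a rough number on "combinations," consider that each subset of a cube's dimensions is one possible group-by (often called a cuboid), so a cube with n dimensions has 2^n of them before dimension hierarchies multiply the count further. The Python sketch below, using invented dimension names, enumerates the subsets for a small cube and counts them for the twenty-dimension cubes mentioned earlier.

```python
from itertools import combinations


def cuboids(dimensions):
    """Enumerate every group-by: all subsets of the dimensions, from none to all."""
    return [
        combo
        for r in range(len(dimensions) + 1)
        for combo in combinations(dimensions, r)
    ]


def cuboid_count(n_dimensions):
    """Count group-bys without enumerating them (ignores hierarchy levels)."""
    return 2 ** n_dimensions


print(len(cuboids(["geography", "product", "time"])))  # 8
print(cuboid_count(20))  # 1048576
```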

So, OLAP’s not going away. Users would be nuts to abandon their large investments in such a handy technology. And it’s like most situations in IT. Few things go away. Organizations just keep adding more tool types and best practices to their portfolios. Therefore, user organizations should expect to maintain their useful investments in OLAP, while also digging deeper into other forms of exploratory and discovery analytics.

So, what do you think, folks? Let me know. Thanks!

Posted by Philip Russom, Ph.D.

Big Data Analytics: Avoid the Analytic Cul-De-Sac

Blog by Philip Russom
Research Director for Data Management, TDWI

Do you know what a cul-de-sac is? In French, it literally means “bottom of the bag.” But figuratively it means what most Americans would call a “dead-end street.” In residential real estate, a cul-de-sac is a desirable place to live. In analytics, a cul-de-sac is where the epiphanies of advanced analytics never get off a dead-end street to be fully leveraged elsewhere in the enterprise.

The current hype around big data analytics has most discussions of analytics focused on “discovery” analytics. That’s where a business analyst or similar user employs an advanced analytics tool (based on data mining, statistics, natural language processing, complex SQL, etc.) to discover facts never known before. For example, the analyst may discover the root cause for a new form of customer churn, a new partner behavior that’s potentially fraudulent, or the hidden costs that erode otherwise profitable customers.

While researching a new TDWI report on big data analytics, I’ve run across a number of business analysts who revel in the chase around the cul-de-sac but can’t be bothered with operationalizing their epiphanies. “That’s someone else’s job,” one guy told me. Here’s what I mean.

Too often analysts drive through a figurative big data “bottom of the bag,” until just the right dataset yields an epiphany. Then they share their findings with managers and move on to the next analytic project.

An analytic cul-de-sac is what you get when the analyst does not also take the findings off the dead-end street and “operationalize” them. In other words, once you discover the new form of churn, analytic models, metrics, reports, warehouse data, and so on need to be updated, so the appropriate managers can easily spot the churn and do something about it quickly if it returns. Likewise, hidden costs, once revealed, should be operationalized in analytics (and possibly reports and warehouses), so managers can better track and study costs over time, to keep them down.
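
One way to picture operationalizing a discovery is to encode the finding as a rule and compute it as a metric every reporting period, so it shows up in reports instead of living only in one analyst's workspace. This Python sketch is illustrative only: the churn pattern, its thresholds, and the field names are hypothetical, not a real finding.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    support_calls_90d: int      # support calls in the last 90 days
    months_since_upgrade: int


def matches_churn_pattern(c: Customer) -> bool:
    """Hypothetical pattern an analyst discovered; the thresholds are illustrative."""
    return c.support_calls_90d >= 3 and c.months_since_upgrade > 24


def churn_risk_share(customers) -> float:
    """Share of customers matching the pattern, recomputed each reporting period."""
    if not customers:
        return 0.0
    return sum(matches_churn_pattern(c) for c in customers) / len(customers)


this_month = [
    Customer("C1", 4, 30),
    Customer("C2", 0, 30),
    Customer("C3", 5, 6),
    Customer("C4", 1, 2),
]
print(churn_risk_share(this_month))  # 0.25
```

Once the rule lives in code (or in the warehouse and its reports), managers can watch the metric over time and react if the pattern returns.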

I think that most analysts and similar users are avoiding analytic cul-de-sacs by making sure that discovered epiphanies are operationalized by someone (whether the actual analyst or another team member). I’m just saying that the product of analytics isn’t necessarily being leveraged to the hilt in every organization.

To avoid analytic cul-de-sacs and similar squanderings of insight, you might want to review some of the processes around your use of advanced analytics. In particular, be sure the process extends beyond discovery into operationalizing the epiphanies of analytics.

So, what do you think, folks? Let me know. Thanks!

Posted by Philip Russom, Ph.D.

Agile BI and DW: Dynamic, Continuous, and Never Done

Delivering value sooner and being adaptable to business change are two of the most important objectives today in business intelligence (BI) and data warehouse development. They are also two of the most difficult objectives to achieve. “Agility,” the theme of the upcoming TDWI World Conference and BI Executive Summit, to be held together the week of August 7 in San Diego, is about implementing methodologies and tools that will shorten the distance to business value and make it easier to keep adding value throughout development and maintenance cycles.

We’re very excited about the programs for these two educational events. Earlier this week, I had the pleasure of moderating a Webinar aimed at giving attendees a preview of how the agility theme will play out during the week’s keynotes and sessions. The Webinar featured Paul Kautza, TDWI Director of Education, and two Agile experts who will be speaking and leading seminars at the conference: Ken Collier and Ralph Hughes.

Agile methodology has become a mainstream trend in software development circles, but it is much less mature in BI and DW. A Webinar attendee asked whether any Agile-trained expert could do Agile BI. “No,” answered Ken Collier. “Agile BI/DW training requires both Agile expertise as well as BI/DW expertise due to the nuances of commercial off-the-shelf (COTS) system integration, disparate skill sets and technologies, and large data volumes.” Ralph Hughes agreed, adding that “generic Agile folks can do crazy things and run their teams right into the ground.” Ralph then offered several innovations that he sees as necessary, including planning work against the warehouse’s reference architecture and pipelining work functions so everyone has a full sprint to work their specialty. He also advocated small, mandated test data sets for functional demos and full-volume data sets for loading and re-demo-ing after the iteration.
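
Pipelining work functions can be pictured as a staggered schedule: in each sprint, every specialty works on a different work package, one step behind the specialty upstream of it, so each specialist gets a full sprint for their part. The Python sketch below is one possible reading of that idea, not Ralph Hughes's own method; the stage names and work-package names are hypothetical.

```python
def pipeline_schedule(packages, stages):
    """Return one dict per sprint mapping stage -> work package.

    In sprint s, stage i works on package s - i, so every package flows through
    the stages in order, one sprint per stage.
    """
    if not packages:
        return []
    n_sprints = len(packages) + len(stages) - 1
    schedule = []
    for s in range(n_sprints):
        sprint = {}
        for i, stage in enumerate(stages):
            p = s - i
            if 0 <= p < len(packages):
                sprint[stage] = packages[p]
        schedule.append(sprint)
    return schedule


# Hypothetical stages and work packages.
for n, sprint in enumerate(pipeline_schedule(["W1", "W2", "W3"], ["design", "build", "test"]), 1):
    print(f"Sprint {n}: {sprint}")
```

After the pipeline fills (sprint 3 here), every specialty is busy on its own package in the same sprint.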

If you are just getting interested in Agile or are in the thick of implementing Agile for BI and DW projects, I would recommend listening to the Webinar, during which Ken and Ralph offered many wise bits of advice that they will explain in greater depth at the conference. The BI Executive Summit will feature management-oriented sessions on Agile, including a session by Ralph, but will also take a broader view of how innovations in BI and DW are enabling these systems to better support business requirements for greater agility, flexibility, and adaptability. These innovations include mobile, self-service, and cloud-based BI.

As working with information becomes integral to more lines of business and operations, patience with long development and deployment cycles will get increasingly thin. The time is ripe for organizations to explore what Agile methodologies as well as recent technology innovations can do to deliver business value sooner and continuously, in a virtuous cycle that does not end. In Ken Collier’s words, “The most effective Agile teams view the life of a BI/DW system as a dynamic system that is never done.”

Posted by David Stodder

Big Data Analytics: Preparing Analytic Data Differs from ETL for Data Warehousing

Blog by Philip Russom
Research Director for Data Management, TDWI

While researching a new TDWI report on big data analytics, I’ve run across a few BI professionals who are concerned about the seeming lack of data preparation that’s common with some forms of advanced analytics. Allow me a moment to sort this out.

On the one hand, all of us in BI and data warehousing are indoctrinated to believe that the data of an enterprise data warehouse (EDW), and hence the data that feeds into reports, must be absolutely pristine, integrated and aggregated properly, well-documented, and modeled for optimization. To achieve these data requirements, BI teams work hard on extract, transform, and load (ETL), data quality (DQ), metadata management, master data management (MDM), and data modeling. These data preparation best practices make perfect sense for the vast majority of the reports, dashboards, and OLAP-based analyses that are refreshed from data warehouse data. For those products of BI, we want to use only well-understood data that’s brought as close to perfection as possible. And many of these become public documents, where problems with data could be dire for a business.

On the other hand, preparing data for advanced analytics requires very different best practices – especially when big data is involved. The product of advanced analytics is insight, typically an insight about bottom-line costs or customer churn or fraud or risk. These kinds of insights are never made public, and the analytic data they’re typically based on doesn’t have the reuse and publication requirements that data warehouse data has. Therefore, big data for advanced analytics rarely needs the full range of ETL, data quality, metadata, and modeling we associate with data from an EDW.

In fact, if you bring the full arsenal of data prep practices to bear on analytic datasets, you run the risk of reducing their analytic value. This is ironic, because we usually think of ETL, DQ, and data modeling as adding value to data, not subtracting it. So, how can they harm analytic data?

To answer that question, let’s first take a look at so-called “advanced analytics.” This collection of analytic techniques would be better called “discovery analytics,” because that’s what users do with it. A business analyst or similar user applies techniques like data mining, statistical analysis, complex SQL, MapReduce, and natural language processing to discover facts about the business that no one knew before. For example, you might discover the root cause of the latest form of customer churn. Or you might find a cluster of transactions that indicate a new kind of fraud. Or you could stumble onto an untapped customer segment.

In general, you can’t discover those entities and facts from the overly studied, calculated, modeled, and aggregated data of an EDW. Instead, you need big data, with lots of granular detail, typically in the schema of the source systems it came from. Some forms of analytics actually thrive on questionable data in poor condition. For example, analytic applications for fraud detection may depend on outliers and non-standard data as indications of fraud. And the insights of discovery analytics often focus on narrow slices of the business, like an obscure customer segment, time frame, group of shipments, transaction type, or risky neighborhood. These thin slices can easily disappear in an aggregation pass. Hence, if you apply ETL and DQ processes to big data as you do for a data warehouse, you run the risk of stripping out the very nuggets that make big data a treasure trove for discovery-oriented advanced analytics. This is why the preparation of data for discovery analytics seems minimal (even slipshod) – often just extracts and table joins – compared to the full range of data prep applied to EDW data.
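
A small Python sketch can show how an EDW-style cleansing rule can erase the very outlier a fraud analyst is looking for, leaving a rollup that looks perfectly normal. The transactions, the median-based rule, and the threshold k are all assumptions for illustration; real DQ rules vary widely.

```python
from statistics import median

# Hypothetical granular transactions: (customer_id, amount). Values are invented.
TRANSACTIONS = [
    ("c1", 42.0), ("c2", 38.5), ("c3", 40.0), ("c4", 41.5),
    ("c5", 39.0), ("c6", 950.0), ("c7", 43.0), ("c8", 37.5),
]


def cleanse(rows, k=5.0):
    """Assumed EDW-style DQ rule: drop amounts above k times the median as bad data."""
    m = median(amount for _, amount in rows)
    return [(cid, amount) for cid, amount in rows if amount <= k * m]


def flag_outliers(rows, k=5.0):
    """Discovery-style check: same threshold, but outliers are kept as the signal."""
    m = median(amount for _, amount in rows)
    return [cid for cid, amount in rows if amount > k * m]


def total(rows):
    """Roll the rows up to a single number, as an aggregation pass might."""
    return sum(amount for _, amount in rows)


print(flag_outliers(TRANSACTIONS))           # ['c6']
print(flag_outliers(cleanse(TRANSACTIONS)))  # []
print(total(cleanse(TRANSACTIONS)))          # 281.5
```

After cleansing, nothing looks unusual, and the one record worth investigating is gone before any analyst sees it.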

Does this mean that we can throw out the sacrosanct best practices for ETL, DQ, metadata, MDM, and data modeling? No, of course not. Some organizations will simply need to suspend these for discovery analytics with big data, but only temporarily. Here’s a typical scenario.

After business analysts and other users have discovered what they’re looking for in big data, they need to take the discovery to the BI and DW team, so the results can be “institutionalized” in the EDW. For example, when discovery analytics reveals valuable items – like new forms of churn, customer segments, cost centers, etc. – these need to be represented by data structures in the EDW and reports, so that business people can track them regularly. At that point, the best practices of data preparation come back into play.

So, what do you think, folks? Let me know. Thanks!

Posted by Philip Russom, Ph.D.