
TDWI Upside - Where Data Means Business

It's Official: Metadata Management Is a Strategic Problem

Gartner recently published its first-ever "Magic Quadrant for Metadata Management Solutions". Managing metadata isn't a new problem but it has certainly become a strategic one.


Managing metadata isn't a new problem. Gartner's Magic Quadrant research reports aren't new, either; the market watcher has been publishing them for many years now. Why, then, did Gartner only publish its first-ever Magic Quadrant for metadata management in 2016?

Because only now does it believe that metadata management has become a strategic problem.

"The growing need for organizations to treat information as an asset is making metadata management strategic," write analysts Guido De Simoni and Roxane Edjlali in the Gartner report.

If you're an old data management (DM) hand, you might be thinking, "Metadata management? No problem!" In traditional DM, metadata management is a well-understood problem for which there is a wide range of workable technological and process-oriented solutions. Besides, you might say, even if some of the issues that come under the rubric of managing metadata do pose unique challenges (master data management, for one), they're nonetheless doable -- and you'd be right.

However, Gartner's inaugural Magic Quadrant for Metadata Management Solutions looks at the new lay of the land -- a land in which NoSQL systems such as Hadoop, Cassandra, MongoDB, and others are important and growing factors. Such NoSQL systems are typically characterized by an impoverished data management feature set. Hadoop and other platforms lack robust metadata management and data lineage tracking capabilities, as well as other critical DM features.

The projections in Gartner's Magic Quadrant for Metadata Management Solutions reflect this reality.

The report makes two strategic planning assumptions. The first is that data lakes -- a preponderance of which are based on Hadoop -- will not receive effective metadata management facilities until at least late 2018.

The second is that metadata will become ever more crucial to enterprise information governance initiatives. By 2020, Gartner projects that "50 percent of information governance initiatives will be enacted with policies based on metadata alone."

In making these assumptions, Gartner seems to be saying that managing metadata in the hybrid data architectures of today is still as tricky as ever -- and that enterprises are (or will be) more dependent on metadata standards and definitions than ever before.

These assumptions aren't necessarily in conflict with one another.

Metadata Standards Are the Thing

Research analyst Mike Ferguson, a principal with U.K.-based Intelligent Business Strategies, says companies are at an impasse with respect to the combined problems of governance and data and application integration. As evidence, Ferguson points to a confluence of problems, including the complexity of hybrid data and application architectures, and a more rigorous regulatory climate.

Most hybrid architectures are merely attempts to retrofit existing data and application architectures for new technology paradigms such as REST, microservices, and NoSQL. Meanwhile, initiatives such as the European Union's (by some measures drastic) General Data Protection Regulation (GDPR) impose stringent penalties for the misallocation or misuse of personally identifiable information.

"[T]he issue is that the complexity of data management is far greater than people first realized. There is a legitimate concern about a kind of Wild West [of competing metadata standards] because what's created in one tool may not be intelligible to another," Ferguson said, alluding to the highly distributed nature of data and applications in most enterprises.

"It's difficult if not impossible to track where a file is, how many times it's been copied, whether it's deleted -- gone for good -- or not, and so on. Fundamentally, this is a metadata problem. I've already lived through several metadata failures," he said, citing IBM's ambitious Application Development/Cycle in the late 1980s and Microsoft's Open Information Model.

Ferguson believes we might be gearing up for yet another metadata standards push. He cites the momentum behind the open source Apache Atlas project and similar projects such as WhereHows (developed by LinkedIn) and Ground, a new effort championed by data prep specialist Trifacta and Joe Hellerstein of the University of California, Berkeley. The impetus for some kind of common standard is clear, Ferguson argues.

"Apache Atlas [is] the one that's furthest along. It's the first time we've seen an open source metadata definition, [although] not everybody's buying into that," he says. "I spoke at Enterprise Data World ... in April ... [and] in talking to all of the vendors, I asked them, 'What are you doing about Atlas?' The answer was, 'We're watching it.' There aren't many who are committing to it yet."

A Process of Assimilation, Reconciliation, and Reinvention

Mark Madsen, a research analyst with information management consultancy Third Nature, has a similar take. What's happening now with Hadoop and other NoSQL systems is analogous to what happened 30 years ago when client-server UNIX systems first began to displace mainframes -- the data integration and management regimen that propped up the status quo couldn't be translated to the new paradigm.

"It's a problem of architectural change," Madsen says. "One could argue that in the mainframe world, we had the same [situation where] the [existing tools] were not suitable for the client-server world. Now we have an even more distributed environment of systems and clusters, and the existing solutions, although functionally adequate, do not fit the bill for the new environment."

In many cases, the tools, methods, and processes developed for traditional data management aren't yet available in the new paradigm. In other cases, the central precepts of traditional data management are in conflict with core features of the new paradigm, such as self-service, cloud services, and other end user-oriented usage models. Over time, working with the open source software community and proprietary vendors, enterprises will derive workable, practical solutions. However, whether this will entail another failure of an ambitious metadata standards push remains to be seen.

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about.
