Three Practical Uses for Graph Databases in MDM
Graph databases are not ready to replace relational database MDM platforms, but the use cases are growing in number. Here are three practical examples of when to supplement your MDM practice with a graph database.
There has been some buzz lately about leveraging emergent technology such as graph databases for master data management (MDM). In theory, this is appealing, but graph databases are not ready to serve as standalone MDM systems. In this article, I offer examples of useful applications where graph databases can augment -- not supplant -- a conventional relational MDM platform.
Use the Right Tool for the Job
In my consultancy, we always recommend the best tool to fit the purpose. Much of the time, relational databases "fit" master data; that is, the data looks like rows and columns. However, sometimes the relationship between data elements is, itself, master data. Note the difference between the words "relational" and "relationship."
In relational databases, you know that two entities are related when you join them together by a key. In relationship-oriented data (the bread and butter of graph databases), you know two entities are related and you know the nature of the relationship. The relationship itself even has its own attributes. Yes, you can achieve this with relational databases, but not without consequences, technical debt, and sometimes an unwieldy data model.
Then there is the issue of what constitutes relationship-oriented master data. Relationships such as "Joan is friends with Betty and both work at Acme Widgets" are relatively stable; these are clearly master-data-level relationships. If Joan orders a product online, that is clearly transactional data. If she views or reorders that product or ones like it multiple times, she may demonstrate an affinity for certain products, product categories, or brands. This begins to look like MDM again, or at least analytical MDM: master data derived from analysis of transactions, behavior, or both.
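To make the distinction concrete, here is a minimal Python sketch of the relationships in the preceding paragraph modeled as typed edges that carry their own attributes. It is an illustration only, not the API of any graph database product. The Relationship class, the related helper, and the attribute values (the "since" year and the job roles) are hypothetical and invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """A typed edge between two entities (hypothetical structure)."""
    source: str
    kind: str      # the nature of the relationship
    target: str
    attributes: dict = field(default_factory=dict)  # data about the edge itself


# Illustrative data only; the attribute values are made up.
relationships = [
    Relationship("Joan", "FRIENDS_WITH", "Betty", {"since": 2015}),
    Relationship("Joan", "WORKS_AT", "Acme Widgets", {"role": "Buyer"}),
    Relationship("Betty", "WORKS_AT", "Acme Widgets", {"role": "Analyst"}),
]


def related(entity, kind):
    """Return every relationship of the given kind that starts at entity."""
    return [r for r in relationships if r.source == entity and r.kind == kind]


for r in related("Joan", "WORKS_AT"):
    print(r.source, r.kind, r.target, r.attributes)
```

In a relational model, the same information would typically need an associative table per relationship type, which is where the extra modeling work comes from.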
Graph databases span these relationships effortlessly, so they can easily have a foot in both the master data and transactional worlds. This is a gray area for graphs and MDM. One might argue: give me a good relational master data model and a data warehouse, and I can accomplish the same thing. Again, this will likely require extra work and complexity to build and maintain (technical debt).
Here are three practical ways a graph database can be used in MDM -- where it "fits."
1. Recursive Hierarchies
In MDM, a recursive hierarchy is derived from a recursive relationship. A recursive relationship is one in which members of an entity are related to other members of the same entity. The classic (and tired) example is the manager-direct report relationship. You have an employee entity where Humphrey reports to Ingrid, and both are members of that entity. This case is no reason to stand up a graph database (unless you have one of those "fun" org charts with lots of dotted-line relationships, team orientations, or other unconventional hierarchies).
A more compelling example might be recipes in a restaurant chain. You have a recipe entity with an enchilada dinner that has ingredients of cheese, tortillas, and enchilada sauce. However, the sauce itself is made from scratch, so it is also a recipe (and thus also a member of the recipe entity). This sauce has a number of ingredients and is used in several other recipes. Those ingredients come from different suppliers and distribution centers for restaurants on the east coast versus those on the west coast, and all ingredients need to be costed correctly in the back office system.
As you can see, the relationships between the data elements are themselves master data, and a graph is a great way to manage them.
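The recipe hierarchy can be sketched in a few lines of Python to show why the recursion matters for costing. This is a rough illustration under stated assumptions, not a description of any MDM or graph product. The sauce ingredients (tomatoes, chili powder), all quantities, and all unit costs are hypothetical, and costs are kept in whole cents to avoid floating-point rounding.

```python
# Hypothetical master data: recipe -> {component: quantity}.
# A component may be a base ingredient or another recipe.
recipes = {
    "enchilada dinner": {"cheese": 2, "tortillas": 3, "enchilada sauce": 1},
    "enchilada sauce": {"tomatoes": 4, "chili powder": 1},
}

# Hypothetical unit costs for base ingredients, in cents.
unit_cost_cents = {"cheese": 50, "tortillas": 20, "tomatoes": 25, "chili powder": 10}


def recipe_cost(item, ancestors=frozenset()):
    """Roll up the cost of an item by walking the recipe hierarchy."""
    if item in ancestors:
        # A recipe that contains itself would recurse forever.
        raise ValueError(f"cycle detected at {item!r}")
    if item not in recipes:
        return unit_cost_cents[item]  # base ingredient
    return sum(
        qty * recipe_cost(component, ancestors | {item})
        for component, qty in recipes[item].items()
    )


print(recipe_cost("enchilada dinner"))  # 270
```

The sauce is costed once as a recipe (110 cents) and then reused by the dinner, which is the same reuse a graph traversal gives you when the sauce appears in several other recipes.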
2. Analytical MDM
Even in cases where a relational database easily handles all the MDM relationships operationally, a supplementary graph database can be used for analytical purposes. Although the conventional MDM platform might be perfect for managing the data from an operational perspective, it can be cumbersome to connect and analyze master data -- particularly when there are:
- More than two or three degrees of separation between entities of interest (for example, product > part > supplier > distribution center > region) requiring multiple joins
- Multiple relationships or branches to traverse requiring complex or conditional joins
In these cases, a side-by-side representation of the same master data in the form of a graph would be much easier for analysts to traverse. Even API developers could use it when the GET requirements involve combining data from two or more entities that are "far apart."
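As a sketch of what traversing "far apart" entities looks like, the following Python example walks the product > part > supplier > distribution center > region chain with a breadth-first search over an in-memory adjacency map. The node names and the find_path helper are hypothetical; a real graph database would express this in its own query language instead.

```python
from collections import deque

# Hypothetical master data as an adjacency map: node -> directly related nodes.
edges = {
    "product:widget": ["part:gear"],
    "part:gear": ["supplier:acme", "supplier:zenith"],
    "supplier:acme": ["dc:east"],
    "supplier:zenith": ["dc:west"],
    "dc:east": ["region:east coast"],
    "dc:west": ["region:west coast"],
}


def find_path(start, goal):
    """Breadth-first search; returns the shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in edges.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_path("product:widget", "region:east coast"))
```

The relational equivalent is a four-table join chain, and each additional branch or degree of separation adds another join rather than another hop in the same loop.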
3. Data Completeness
Finally, graph databases could be used to discover MDM data quality issues, particularly completeness (as opposed to accuracy and consistency). The graph can be traversed to find places in hierarchies or data relationships where a connection is missing, or where a member lacks a value that should come from a related entity. Back to the recipe example, you could use a graph to find recipes and "sub-recipes" that have ingredients with missing supplier information. You could do that with a relational database, but again, it's extra work, requiring complex joins and SQL statements.
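The completeness check described above can be sketched as a recursive walk that reports every ingredient lacking a supplier, along with the recipe or sub-recipe that uses it. Again, this is only an illustration in plain Python. The sauce ingredients, supplier names, and the missing_suppliers helper are hypothetical.

```python
# Hypothetical master data: recipe -> list of components.
recipes = {
    "enchilada dinner": ["cheese", "tortillas", "enchilada sauce"],
    "enchilada sauce": ["tomatoes", "chili powder"],
}

# Hypothetical supplier assignments. "tomatoes" has an empty value and
# "chili powder" has no entry at all; both count as incomplete.
suppliers = {"cheese": "Supplier A", "tortillas": "Supplier B", "tomatoes": None}


def missing_suppliers(recipe, seen=None):
    """Return (recipe, ingredient) pairs where supplier data is missing."""
    seen = {recipe} if seen is None else seen
    gaps = []
    for component in recipes[recipe]:
        if component in recipes:
            # Sub-recipe: descend once, skipping anything already visited.
            if component not in seen:
                seen.add(component)
                gaps.extend(missing_suppliers(component, seen))
        elif not suppliers.get(component):
            gaps.append((recipe, component))
    return gaps


for parent, ingredient in missing_suppliers("enchilada dinner"):
    print(f"{ingredient} (used in {parent}) has no supplier")
```

Starting from the top-level dinner, the walk surfaces gaps that sit inside the sauce sub-recipe, which is the kind of finding that takes a recursive query to produce in SQL.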
A Final Word
The real takeaway is that MDM in practice is a multifaceted discipline evolving to meet the demands of changing business data needs. There is rarely a one-size-fits-all solution, and sometimes enterprises need more than just an out-of-the-box relational database MDM platform. A graph doesn't replace this system, but it can help you better manage and explore the relationships embedded within the sphere of master data.
McKnight Consulting Group is led by William McKnight. He serves as strategist, lead enterprise information architect, and program manager for sites worldwide utilizing the disciplines of data warehousing, master data management, business intelligence, and big data. Many of his clients have gone public with their success stories. McKnight has published hundreds of articles and white papers and given hundreds of international keynotes and public seminars. His teams’ implementations from both IT and consultant positions have won awards for best practices. William is a former IT VP of a Fortune 50 company and a former engineer of DB2 at IBM, and holds an MBA. He is author of the book Information Management: Strategies for Gaining a Competitive Advantage with Data.