Graph Databases for Analytics (Part 2 of 4): Practical Applications
Is a graph database the solution to your business problem? In this article we'll explore the common characteristics of practical applications for graph databases.
- By David Loshin
- July 26, 2016
We live in an ever-more-connected world. Smart mobile devices are ubiquitous, and many industries are implementing a wide array of sensors, controllers, and similar devices across different domains.
These devices continuously broadcast streams of data that contain important information about connections. Relational database systems can represent these links, but capturing and querying the characteristics of the links is awkward in a relational model, so this explosion of Internet-connected devices is an opportunity for graph-oriented applications.
As discussed in Part 1, graph data processing engines can ingest and represent the qualitative characteristics of both the entities and the links among them. The captured information is embedded in the connections between things, not just the characteristics of the things themselves.
Business environments suited to a graph data processing solution share these general features:
-- Connectivity: First and foremost, the environment involves documenting and understanding connected entities.
-- Entity volume: A large number of entities can potentially be connected, such as e-commerce website visitors and the products they view.
-- Entity variety: There are entities with different characteristics, such as individuals with different job skills using a recruiting application.
-- Link attribution: There are relevant characteristics associated with the connections between entities. For example, a person may have an employment relationship with a company, and that relationship may have a title, a duration, a location, and a salary.
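To make link attribution concrete, here is a minimal sketch of a property graph in plain Python, in which the employment relationship itself carries a title, a duration, a location, and a salary. The class names, the `EMPLOYED_BY` label, and all data values are hypothetical illustrations and do not reflect the API of any particular graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """A directed link that carries its own attributes."""
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph (illustrative only)."""

    def __init__(self):
        self.nodes = {}  # node id -> dict of entity attributes
        self.edges = []  # list of Edge objects

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, target, label, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        edge = Edge(source, target, label, properties)
        self.edges.append(edge)
        return edge

    def edges_from(self, source, label=None):
        """Return outgoing links of a node, optionally filtered by label."""
        return [
            e for e in self.edges
            if e.source == source and (label is None or e.label == label)
        ]


# Hypothetical data: the attributes live on the link, not on either entity.
graph = PropertyGraph()
graph.add_node("alice", kind="person")
graph.add_node("acme", kind="company")
graph.add_edge(
    "alice", "acme", "EMPLOYED_BY",
    title="Data Engineer", years=3, location="Boston", salary=95000,
)

jobs = graph.edges_from("alice", label="EMPLOYED_BY")
titles = [e.properties["title"] for e in jobs]
```

Because the attributes are stored on the edge, a second job at the same company would simply be a second `EMPLOYED_BY` link with its own title and salary.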
Within such environments, graph databases are best used for solving business problems that involve answering questions related to the topology, or shape, of the graph. Some examples include:
-- Proximity and distance: How close are two entities to each other within the graph?
-- Centrality: How "important" is an entity within a network? For example, who is the most influential person in a social network?
-- Density: Which entities have the fewest or the most connections?
-- Communication paths: What are the best ways to propagate information between sets of entities?
-- Similarity and differentiation: How do the characteristics of the relationships expose similarities or differences among the entities? Can clustering be done using network distance metrics?
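Two of these topology questions can be sketched in a few lines of plain Python. The example below assumes a small, hypothetical, undirected social network stored as an adjacency map. It measures distance as the number of hops found by breadth-first search and uses degree centrality (the share of other entities a node is directly linked to) as one simple notion of importance. Other centrality measures exist, and a graph engine would typically provide them directly.

```python
from collections import deque

# Hypothetical undirected social network stored as an adjacency map.
network = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann"},
    "cat": {"ann", "eve"},
    "dan": {"ann"},
    "eve": {"cat"},
}


def hop_distance(graph, start, goal):
    """Breadth-first search: links on a shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    others = len(graph) - 1
    return {node: len(neighbors) / others for node, neighbors in graph.items()}


centrality = degree_centrality(network)
most_connected = max(centrality, key=centrality.get)
distance = hop_distance(network, "dan", "eve")  # path: dan, ann, cat, eve
```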
Uses for Graph Databases
If your enterprise collects connected data elements and needs to answer these types of questions, you can probably think of some applications for graph processing. Uses for graph databases include:
Social network analytics for marketing and advertising: Identifying influential individuals within a social network helps target your advertising. If you understand the structure of the connections within self-organized communities, you can customize your promotions to maximize responses and increase revenues.
Cybersecurity: The graph paradigm is a good fit for modeling the connections among millions of Internet domains, sites, and servers. You can detect telltale signs of an attempted data breach by analyzing the connectivity patterns (e.g., similar registration details, geographic proximity, or routing paths), transaction patterns (e.g., sources of denial of service attacks or phishing attempts), and links to known malicious sites. Patterns of connectivity can also identify possibly malicious sites as well as behaviors indicative of criminal activity.
Logistics: A supply chain demonstrates hierarchical relationships that can be graphed, such as items per container, containers per pallet, pallets that fit into trucks, the truck routes between delivery points, and the types and quantities of deliveries made at each location. The links contain relevant attributes such as the distance between origin and delivery or the aggregate weight of the pallets on a truck. You can use graph models to optimize travel time, improve fuel efficiency, and verify that each item was delivered to the right location.
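As a rough illustration of using link attributes for route optimization, the sketch below runs Dijkstra's shortest-path algorithm over a hypothetical delivery network in which each link stores a road distance. The location names and distances are invented for the example, and a production system would more likely rely on a graph engine's built-in path functions.

```python
import heapq

# Hypothetical delivery network: routes[a][b] is the road distance in km.
routes = {
    "depot": {"store_a": 12, "store_b": 30},
    "store_a": {"store_b": 10, "store_c": 40},
    "store_b": {"store_c": 15},
    "store_c": {},
}


def shortest_distance(graph, origin, destination):
    """Dijkstra's algorithm over non-negative link weights."""
    best = {origin: 0}
    heap = [(0, origin)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == destination:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, weight in graph[node].items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return None  # destination not reachable from origin


km = shortest_distance(routes, "depot", "store_c")
```

Here the shortest route passes through both intermediate stores (12 + 10 + 15 km) rather than taking either of the routes with fewer stops.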
Smart buildings: Understanding the interoperation of connected sensors and controls (such as thermostats) within a facility is another opportunity for graph processing. Investigating these connections could lead to reducing energy costs, improving air quality, and using predictive models for preemptive maintenance.
In this case, the relevant information is associated with the proximity of devices (such as temperature, humidity, and CO2 sensors within a room, or rooms on a building floor). The graph approach is nicely suited to modeling the dynamic relationships embedded within the hierarchies of device connections among the different facilities within a complex.
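The device hierarchy described here can be sketched as containment links that are traversed to find every device beneath a given room, floor, or building. The identifiers below are hypothetical, and the sketch assumes that any node without children is a device.

```python
# Hypothetical facility hierarchy: each key contains the listed children.
contains = {
    "building_1": ["floor_1", "floor_2"],
    "floor_1": ["room_101", "room_102"],
    "floor_2": ["room_201"],
    "room_101": ["temp_101", "co2_101"],
    "room_102": ["temp_102"],
    "room_201": ["temp_201", "humidity_201"],
}


def devices_under(graph, node):
    """Collect the leaf nodes reachable from `node` via containment links.

    Assumption: a node with no children is a device.
    """
    children = graph.get(node, [])
    if not children:
        return [node]
    found = []
    for child in children:
        found.extend(devices_under(graph, child))
    return found


floor_1_devices = devices_under(contains, "floor_1")
all_devices = devices_under(contains, "building_1")
```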
In each of these example cases we see similar features -- each contains a variety of connected things in a context where understanding the connections can lead to business opportunities. In the next article in this series, we will look at some graph analytics and algorithm basics.
David Loshin is a recognized thought leader in the areas of data quality and governance, master data management, and business intelligence. David writes prolifically on BI best practices for the expert channel at BeyeNETWORK and is the author of numerous books on BI and data quality. His MDM insights can be found in his book, Master Data Management, which has been endorsed by data management industry leaders.