TDWI Articles

How to Incorporate Graph Analytics into Your Information Strategy

As data volumes grow and business information needs change, graph analytics can be a viable alternative to traditional analytics for specific problems. To assess whether graph analytics fits into your enterprise, you need to understand what graph analytics is and where you can use it.

As data volumes grow and companies seek new ways to use that information to drive business results, the types and composition of problems become more varied. Traditional analytics methods fail to adequately address some of the new, complex problems, many of which require new technologies architected from the ground up for specific use cases. One such area is graph analysis, which generates business value by determining how data points are connected and by identifying clusters of related data points based on levels of influence, frequency of interaction, and probability.

Graph Terminology

To take advantage of graph technology to solve business challenges, you must understand the fundamental concepts on which graph analytics (and, more specifically, graph databases) is built. Graph analytics is a set of techniques for exploring relationships between entities (such as organizations, people, or transactions). These entities and relationships are mapped into a graph of nodes, edges, and properties. Once data is mapped into this structure, relationship-centric analysis that is cumbersome in a traditional database, typically because it requires many chained joins, can run efficiently.

Each node represents an entity in the graph. The parallel to a node in a relational database is a row, a record, or a tuple.

Edges represent relationships between nodes. An edge not only connects two nodes but also has a direction (either unidirectional or bidirectional) and can carry a magnitude, such as a weight reflecting the strength or frequency of the relationship. Both direction and magnitude play a role in graph analysis. Relational and document store databases do not capture edges as a native construct; relationships there must be reconstructed through joins or embedded references. Edges are what make graph databases purpose-built for graph-related problem sets.

Properties are the attributes -- the ancillary information -- that describe a node. In many graph databases, edges can carry properties as well. Properties are the equivalent of columns or attributes in a relational database.
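To make these three concepts concrete, here is a minimal sketch of nodes, edges, and properties in plain Python. The class names, field names, and sample data are all hypothetical and are not drawn from any particular graph database; a real product stores and indexes this structure very differently.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity in the graph, with free-form descriptive properties."""
    node_id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed relationship from `source` to `target`."""
    source: str
    target: str
    label: str
    weight: float = 1.0  # assumed stand-in for the magnitude of the edge


# Hypothetical sample data: two people and one organization.
nodes = {
    "alice": Node("alice", {"type": "person", "title": "CFO"}),
    "bob": Node("bob", {"type": "person", "title": "Analyst"}),
    "acme": Node("acme", {"type": "organization"}),
}
edges = [
    Edge("alice", "acme", "WORKS_FOR"),
    Edge("bob", "acme", "WORKS_FOR"),
    Edge("bob", "alice", "REPORTS_TO"),
    Edge("alice", "bob", "EMAILS", weight=12.0),
]


def outgoing(node_id, label=None):
    """Return ids of nodes reachable over one outgoing edge."""
    return [
        e.target
        for e in edges
        if e.source == node_id and (label is None or e.label == label)
    ]


print(outgoing("bob"))              # ['acme', 'alice']
print(outgoing("alice", "EMAILS"))  # ['bob']
```

Following a relationship here is a direct lookup on the edge list rather than a join between tables, which is the basic idea a graph database optimizes.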

The difference between a graph database and a relational database is that the graph database is optimized to represent and analyze networks of relationships in a relatively quick and efficient manner.

Use Cases

Graph analytics finds patterns among the relationships between nodes. If your business faces problems that fall into this space, the use of graph-oriented technology can significantly enhance your analytics team's efficiency. Here are some of the more common use cases for graph analytics.

Social Network Analysis

By analyzing social networks, you can identify influencers, decision makers, and dissuaders. In a sales organization, this information shows whom to talk to and who can do the most to help close a deal. The analysis can surface unintuitive insights that speed up persuasion and shorten the decision-making process. It can mean the difference between success and failure in moving prospects through the sales pipeline.

Social network analysis can also be leveraged as part of workforce analytics. It can isolate and identify trendsetters and social influencers among the workforce. The identification of these workers with high persuasive impact can be leveraged to expedite the adoption of workforce initiatives or to help head off problematic behavior. Targeting key influencers and ensuring their engagement and buy-in can make the difference in developing an engaged workforce.
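One simple way to approximate influence is to count how much interaction flows toward each person. The sketch below computes a weighted in-degree from a hypothetical interaction log; the names and message counts are invented, and in-degree is only one rough proxy. Other centrality measures (such as betweenness or PageRank) may suit a given problem better.

```python
from collections import Counter

# Hypothetical interaction log: (sender, receiver, number_of_messages).
interactions = [
    ("ana", "raj", 5),
    ("li", "raj", 3),
    ("raj", "ana", 1),
    ("sam", "raj", 4),
    ("sam", "li", 2),
]


def weighted_in_degree(edge_list):
    """Sum incoming interaction counts per person."""
    scores = Counter()
    for _sender, receiver, count in edge_list:
        scores[receiver] += count
    return scores


scores = weighted_in_degree(interactions)

# Treat the two highest scorers as candidate influencers
# (the cutoff of two is an arbitrary assumption).
top_influencers = [name for name, _ in scores.most_common(2)]
print(top_influencers)  # ['raj', 'li']
```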

Fraud Analysis and Identification

Fraud often relates to interactions between different actors. Being able to analyze these interactions can expose clusters of problematic entities within the system. Identifying bad actors and putting measures in place to head off their behavior before it harms your business can save your enterprise time and resources.

In the same vein, graph analysis can be used to identify illegal behavior and criminal activity. By analyzing formal and informal networks of people, it is possible for law enforcement agencies to identify money laundering and other criminal activity. Analysis can help you differentiate between malignant and benign behavior within the network.
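Finding clusters of connected entities can be sketched as a connected-components search. The example below assumes hypothetical accounts that are linked whenever they share an identifier such as a phone number or device; the account names, the links, and the review threshold are illustrative assumptions, not a production fraud rule.

```python
from collections import defaultdict, deque

# Hypothetical links: two accounts are connected when they share an
# identifier such as a phone number or device.
shared_identifier_links = [
    ("acct1", "acct2"),
    ("acct2", "acct3"),
    ("acct4", "acct5"),
]
all_accounts = ["acct1", "acct2", "acct3", "acct4", "acct5", "acct6"]


def connected_components(vertices, links):
    """Group vertices into clusters using breadth-first search."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            current = queue.popleft()
            component.append(current)
            for nxt in sorted(neighbors[current]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components


clusters = connected_components(all_accounts, shared_identifier_links)

# Flag clusters of three or more accounts for manual review
# (the threshold is an arbitrary assumption).
suspicious = [c for c in clusters if len(c) >= 3]
print(suspicious)  # [['acct1', 'acct2', 'acct3']]
```

A cluster is only a lead for investigation. Telling malignant from benign behavior still requires looking at what the connected entities actually did.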

System Resource Management

Computer and communication networks can be analyzed to optimize the configuration of system resources to balance loads and maximize system utilization. By analyzing the relationships between the components in the system, it is possible to identify which resources are overloaded, model reallocation of traffic to reduce risk, and reconfigure the topology to improve operations.

Utility companies can also use this type of analysis to load balance their resources and deliver their utilities in a manner that maximizes their effectiveness and reduces overall wear and tear on critical components.

Companies looking for route optimization can use graph analysis to identify either the fastest route or the safest route in areas such as transportation, distribution, or even foot traffic. This optimization can save time and money and ensure that the supply chain is as effective as possible.
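Route optimization of this kind is commonly framed as a shortest-path problem on a weighted graph. The sketch below uses Dijkstra's algorithm over a hypothetical network of sites with travel times in minutes; the site names and times are invented for illustration.

```python
import heapq

# Hypothetical road network: travel time in minutes between sites.
# Each edge is listed in one direction only; add the reverse entry
# for two-way roads.
travel_minutes = {
    "warehouse": {"hub_a": 10, "hub_b": 25},
    "hub_a": {"hub_b": 10, "store": 40},
    "hub_b": {"store": 15},
    "store": {},
}


def shortest_route(graph, start, goal):
    """Return (total_cost, path) using Dijkstra's algorithm, or None."""
    queue = [(0, start, [start])]
    best = {start: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor, path + [neighbor]))
    return None  # goal is unreachable from start


print(shortest_route(travel_minutes, "warehouse", "store"))
# (35, ['warehouse', 'hub_a', 'hub_b', 'store'])
```

Replacing the travel times with non-negative risk scores would turn the same search into a "safest route" calculation.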

Technologies and Languages

Once you have a basic understanding of how to structure data for graph analysis and which use cases it serves, you need to be familiar with the technologies and languages associated with it. This brief glossary covers several of the essentials.

Neo4j is the most widely used graph database. It is open source and optimized to scale both vertically and horizontally to handle vast graph networks. The Cypher language is used for querying and updating a graph store in Neo4j.

The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard for modeling graph data. Some graph databases are based on this specification; AllegroGraph and MarkLogic are examples of RDF data stores. SPARQL is the W3C-standard query language (with an accompanying protocol) for RDF data.

Gremlin, the graph traversal language of the Apache TinkerPop framework, works for both OLTP-based graph databases and OLAP-based graph processors.

Python is the most popular language associated with data analysis. It supports built-in data structures as well as dynamic typing and binding. R is the second most popular; this language and environment are optimized for statistical computing and visualization.

SQL is the language, standardized by the American National Standards Institute (ANSI), for querying relational databases. Although graph databases are optimized for graph analysis, much of the data used for graph analysis resides in relational databases, so extracting it often requires SQL.
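As a rough illustration of that extraction step, the sketch below uses Python's built-in sqlite3 module to pull rows from a relational table and reshape them into a weighted, directed edge structure. The `transfers` table, its columns, and its rows are hypothetical; a real pipeline would query the source system and load the result into a graph database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical relational source: an in-memory table of transfers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transfers (sender TEXT, receiver TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?)",
    [("a", "b", 100.0), ("a", "b", 50.0), ("b", "c", 75.0)],
)

# Aggregate rows into one weighted, directed edge per sender/receiver pair.
rows = conn.execute(
    """
    SELECT sender, receiver, SUM(amount) AS total
    FROM transfers
    GROUP BY sender, receiver
    ORDER BY sender, receiver
    """
).fetchall()
conn.close()

# Adjacency mapping: sender -> {receiver: total amount transferred}.
adjacency = defaultdict(dict)
for sender, receiver, total in rows:
    adjacency[sender][receiver] = total

print(dict(adjacency))  # {'a': {'b': 150.0}, 'b': {'c': 75.0}}
```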

Scala is a language that combines object-oriented and functional programming. It runs on the Java Virtual Machine (JVM) and is highly interoperable with Java.

A Final Word

With a solid understanding of the base concepts associated with graphs, the use cases graph analytics is ideally suited for, and available technologies, your team can decide if graph analytics is right for your business. If it is, graph analytics and its associated technologies can significantly enhance your team's efforts and provide capabilities that were either extremely difficult or impossible using traditional analytics methodologies.

 

About the Author

Troy Hiltbrand is the senior vice president of digital product management and analytics at Partner.co, where he is responsible for its enterprise analytics and digital product strategy.

