Executive Perspective: The Power of Graph Databases
In business, it’s critical to establish and nurture customer relationships. In the world of data analytics, relationships are just as important. Alicia Frame, lead product manager for data science at Neo4j, a graph database vendor, explains how this technology helps enterprises maximize data relationships.
- By James E. Powell
- November 10, 2020
Upside: What technology or methodology must be part of an enterprise’s data or analytics strategy if it wants to be competitive today? Why?
Alicia Frame: Graph technology should be a part of every enterprise’s data and analytics strategy. A graph database platform is superior to other technologies for connected-data applications for multiple technical reasons, including power, performance, flexibility, and developer productivity.
Relational methods for storing data have plagued software developers for decades with the constant need to translate from business objects into relational tables and back again. In their infancy, relational systems were more flexible than the older technologies they replaced. However, as relational systems grew in complexity, the sheer number of tables and joins often crippled system maintainability and performance.
Graph applications avoid the disconnect between the technical and business worlds by maintaining data in a rich model of entities and their relationships rather than in tables. As a result, everyone in an enterprise -- including developers, system architects, and business managers -- can use the same understandable graph concepts to create data models and system designs that transform directly into applications.
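The modeling gap described above can be sketched in plain Python. This is a hypothetical illustration, not Neo4j's API: the same "customer bought product" fact is stored once as foreign-keyed rows that must be reassembled with joins, and once as entities connected by a named relationship.

```python
# Relational style: facts live in separate tables, reassembled via joins.
customers = [{"id": 1, "name": "Ada"}]
orders = [{"id": 10, "customer_id": 1}]
order_items = [{"order_id": 10, "product_id": 100}]
products = [{"id": 100, "name": "Graph DB Guide"}]

def products_bought_relational(customer_name):
    # Three joins (nested loops here) to answer one business question.
    result = []
    for c in customers:
        if c["name"] != customer_name:
            continue
        for o in orders:
            if o["customer_id"] != c["id"]:
                continue
            for item in order_items:
                if item["order_id"] != o["id"]:
                    continue
                for p in products:
                    if p["id"] == item["product_id"]:
                        result.append(p["name"])
    return result

# Graph style: the relationship itself is first-class data.
graph = {
    ("Customer", "Ada"): [("BOUGHT", ("Product", "Graph DB Guide"))],
}

def products_bought_graph(customer_name):
    # One hop along BOUGHT edges -- the model mirrors the whiteboard.
    node = ("Customer", customer_name)
    return [target[1] for rel, target in graph.get(node, []) if rel == "BOUGHT"]

print(products_bought_relational("Ada"))  # ['Graph DB Guide']
print(products_bought_graph("Ada"))       # ['Graph DB Guide']
```

Both functions answer the same question, but the graph version reads the way the business question is asked: start at a customer, follow BOUGHT.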
What one emerging technology are you most excited about and think has the greatest potential? What’s so special about this technology?
I’m super excited about the maturation of graph techniques in data science from essentially an academic exercise into a real, enterprise-ready technology. The last decade has seen an explosion of interest in graph data science: over 20,000 peer-reviewed publications (node2vec has over 3,500 citations alone!), and Gartner’s Top 10 Trends in Data and Analytics for 2020 identified graph techniques as the foundation for modern analytics.
Graph techniques -- from algorithms to embeddings to neural networks -- are all about using the network structure of your data, or the relationships between your data points, to make better, more generalizable, more contextual predictions.
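One of the simplest examples of predicting from network structure alone is link prediction scored by common neighbors: two nodes that share many neighbors are likely to be, or become, connected. This toy sketch only illustrates the idea; production techniques such as node2vec embeddings are far richer.

```python
from itertools import combinations

# A small undirected graph given as an edge list.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]

# Build an adjacency map.
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def common_neighbor_score(u, v):
    # More shared neighbors -> more evidence for a missing/future edge.
    return len(neighbors.get(u, set()) & neighbors.get(v, set()))

# Score every non-edge pair as a link-prediction candidate.
existing = {frozenset(e) for e in edges}
candidates = [
    (u, v, common_neighbor_score(u, v))
    for u, v in combinations(sorted(neighbors), 2)
    if frozenset((u, v)) not in existing
]
# ("a", "d") shares neighbors b and c, so it scores 2.
print(candidates)  # [('a', 'd', 2)]
```

No node attributes are used at all: the prediction comes purely from the relationships between data points, which is exactly what makes these techniques generalizable across domains.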
What is the single biggest challenge enterprises face today? How do most enterprises respond (and is it working)?
The biggest challenge I’ve seen lies in translating a cool idea into something tangible that creates business value. As a data scientist, I spend a lot of my time trying to sort out how we can translate the hottest new method published in an academic journal into something that actually works. Everyone I work with is tired of hearing me say, “But will it scale?” It’s honestly the first question I ask.
There are so many amazing techniques out there that just aren’t built to run on terabytes (or petabytes!) of data: beautiful algorithms or mathematical proofs that just don’t translate into enterprise technology. I think some enterprises respond by hiring brilliant people and telling them to make it work -- that doesn’t usually work out.
Other enterprises critically evaluate what translates from a journal into business value and are willing to accept that some things won’t translate or will need to be re-engineered. I spend a lot of my time with customers who’ve read an inspiring paper and don’t understand why we don’t offer that specific technique. It often comes down to understanding what they actually want to do (and the value it provides) and finding the solution that works on the scale of data they have rather than picking the cool, new tech first.
Is there a new technology in data or analytics that is creating more challenges than most people realize? How should enterprises adjust their approach to it?
I think the volume of data we have today creates a lot of challenges for people when it comes to finding meaning and evaluating quality beyond just the broad trends or the predictions that black-box models spit out. There’ve been many examples recently of data science gone wrong: biased and unethical predictions or models based on pre-COVID data that just aren’t applicable now.
We often think more data will solve the problem, but really the solution is more context for the data we have. Where did this prediction come from? What does the data around this recent event look like? Graphs are a really powerful tool for integrating our big data so we can access the clues buried in patterns or small data and put our predictions into a broader context.
What initiative is your organization spending the most time/resources on today? In other words, what internal project(s) is your enterprise focused on so that your company (not your customers) benefits from your own data or business analytics?
Neo4j launched our cloud database-as-a-service, Neo4j Aura, last year, and just announced the availability of Aura on GCP. We’ve spent a lot of time and resources getting our cloud offering right -- but we also need to use it ourselves if we want to fully understand the user experience.
The analytics engineering team has been shifting our internal benchmarking data from files on internal servers to a Neo4j instance on Aura. We’ve learned a lot as a team about sizing databases appropriately and actually modeling our results as a graph -- so we can query results and compare versions, understanding what makes our model perform better or worse -- rather than falling back on CSV files.
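A minimal sketch of the kind of model described, assuming an invented shape for the benchmark data (the run fields, version labels, and query names below are illustrative, not Neo4j's actual schema): benchmark runs are linked to the version they measured, so comparing versions becomes a query rather than a diff of CSV files.

```python
# Hypothetical benchmark runs, each linked to a database version.
runs = [
    {"version": "4.0", "query": "shortest_path", "ms": 120},
    {"version": "4.1", "query": "shortest_path", "ms": 95},
    {"version": "4.0", "query": "pagerank", "ms": 300},
    {"version": "4.1", "query": "pagerank", "ms": 310},
]

# Index runs by query, then version -- mimicking MEASURED relationships.
by_query = {}
for r in runs:
    by_query.setdefault(r["query"], {})[r["version"]] = r["ms"]

def regressions(old, new):
    # Queries that got slower between two versions.
    return [q for q, v in by_query.items()
            if old in v and new in v and v[new] > v[old]]

print(regressions("4.0", "4.1"))  # ['pagerank']
```

Once the results are connected this way, "what makes the model perform better or worse" is answerable by traversal instead of by hand-assembling files.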
Where do you see analytics and data management headed in the rest of 2020 and into 2021? What’s just over the horizon that we haven’t heard much about yet?
During the rest of 2020 and beyond, graph technology will be used to increase AI accuracy. Graphs are designed to treat the relationships between entities as just as important as the entities themselves. Because relationships are highly predictive of behavior, organizations will start using graphs to get more predictive power out of the data they already have.
Many of today’s AI uses will be improved by adding various types of context. For example:
- adding contextual and adjacent data so we’re not over-focused in our learning (which leads to narrowly applicable AI)
- leveraging relationships and network structures in machine learning to improve model accuracy and reduce false positives
- using context to make heuristic AI smarter with a framework for probabilistic decisions
- using AI supply-chain tracking to help ameliorate human failings (such as unintentional bias)

If we want AI systems to be inherently more transparent from the outset, they need to be built in conjunction with relevant context.
Describe your product/solution and the problem it solves for enterprises.
Neo4j is a native graph database, meaning it is purpose-built from the ground up for connected data that powers AI, fraud detection, real-time recommendations, and much more. The first advantage of this technology is that the physical representation of the data more or less matches the way you would draw it on a whiteboard, making it easy to communicate, evolve, and understand. Second, it’s especially great at spidering through complex networks of data easily and efficiently, particularly as data sizes grow. Data is connected as it’s loaded, turning connected queries into pointer-chasing operations that are incredibly fast and efficient.
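The "pointer chasing" idea can be sketched in a few lines of Python. This is not how Neo4j is implemented internally; it is an assumption-labeled toy showing the principle: when each node holds direct references to its neighbors, a multi-hop query is just a walk over those references, with no join per hop.

```python
from collections import deque

# Each node stores its outgoing neighbor references directly.
adjacency = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}

def within_hops(start, max_hops):
    # Breadth-first walk: follow neighbor references up to max_hops.
    seen = {start}
    frontier = deque([(start, 0)])
    reached = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nbr in adjacency[node]:
            if nbr not in seen:
                seen.add(nbr)
                reached.append(nbr)
                frontier.append((nbr, depth + 1))
    return reached

print(within_hops("alice", 2))  # ['bob', 'carol', 'dave', 'erin']
```

Each additional hop costs only the edges actually touched, which is why traversal cost stays tied to the neighborhood being explored rather than to total data size.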
James E. Powell is the editorial director of TDWI, including research reports, the Business Intelligence Journal, and the Upside newsletter.