TDWI Upside - Where Data Means Business

Why Software That Unlocks Microbiome Data Could Be Key in the Climate Change Battle

A pioneer in applying network science to biology discusses how data science, and data management in particular, are starting to generate highly useful life sciences insights.

For the past 100 years, humanity's agricultural processes have relied on the use of chemical-based fertilizer. That's given us abundant, cheap food and resilient crops. However, these benefits have come at a high cost, as an over-reliance on synthetic fertilizers has been gradually decimating the bacterial diversity of the soil.

We need to break this cycle, and do so urgently. Plants capture carbon from the carbon dioxide in the atmosphere and transfer it to microbes in the soil, which capture and fix it. Unless we revive these microbial communities, we may not be able to reverse the climate crisis. If we can resuscitate the microbial universe, nature can reverse climate change -- perhaps in as little as 30 years.

Balancing Climate and Our Species' Future

The challenge is that we know very little about the microbial universe of the soil -- a mere 1 percent has so far been properly categorized. However, highly sophisticated data management technology is now giving scientists vital help in understanding these processes.

There are still many practical hurdles to overcome, especially around data standardization. However, modern data management and applied network science, using graph-based data structures and AI-based analytics, are allowing everyone from climate change scientists to soil experts and agri-manufacturers to benefit from meaningful microbiome-based knowledge discovery and analytics.

Before any scientific team can draw useful conclusions from data or use it to drive innovation, it must first be able to:

  • Access all of it
  • Be confident in its integrity
  • Combine it
  • Make reliable interpretations with it
  • Use it purposefully

A New Player Has Entered the Game: Data Fabric

The richer and more diverse the data sources, the harder all of these tasks become. A core tenet of advanced knowledge discovery, then, is building something the IT industry terms the "data fabric," which describes the automated design and deployment of data curation, cataloging, and valuation from a diverse range of sources. The concept transcends individual databases, data warehouses, or data lakes and is meant to define a unified technology architecture that can manage all potential data sources, treating diverse assets as a meshed environment.

Rich metadata (information about the data) plays an especially important role in organizing and locating data, supporting powerful and continuous analytics within defined parameters. Analysts at Gartner say, "The data fabric is a single architecture that can address the levels of diversity, distribution, scale, and complexity in an organization's data assets." Gartner adds that a data fabric can also reduce time for integration design by 30 percent, deployment by 30 percent, and maintenance by 70 percent.
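To make the role of metadata concrete, here is a minimal Python sketch of a catalog that locates data sets by descriptive tags, wherever they are stored. It is an illustration only, not how any particular data fabric product works; the record fields, data set names, and the find helper are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry: metadata about a data set, not the data itself."""
    name: str
    source: str  # where the data physically lives
    tags: frozenset = field(default_factory=frozenset)

# Illustrative entries spread across different kinds of storage.
catalog = [
    DatasetRecord("soil_16s_survey", "data_lake", frozenset({"soil", "sequencing"})),
    DatasetRecord("field_trial_yields", "warehouse", frozenset({"crop", "yield"})),
    DatasetRecord("soil_chemistry_panel", "lab_database", frozenset({"soil", "chemistry"})),
]

def find(records, *required_tags):
    """Return names of records carrying every requested tag, in catalog order."""
    wanted = set(required_tags)
    return [r.name for r in records if wanted <= r.tags]

print(find(catalog, "soil"))
print(find(catalog, "soil", "chemistry"))
```

The point of the sketch is that the search runs over metadata alone, so the caller does not need to know which underlying system holds each data set.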

A fit-for-purpose data fabric ensures that a sufficiently comprehensive set of data is always available and accessible, has integrity and value, and can be exchanged, compared, and understood in reliable and meaningful ways by non-data science colleagues. Graph technology is a complementary technique that provides insight into interrelationships between data, especially in complex scientific projects.

Although graphs are a well-established way of representing data, advances in their application are transforming what companies can do with that data, especially in complex scientific tasks and projects. It is here that hypergraphs (graphs in which a single edge can connect more than two nodes at once) have entered the spotlight; they allow scientists to explore discrete connections and understand correlations between multiple, diverse data sets, thereby transforming knowledge discovery.
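As a rough illustration of the idea, a hypergraph can be sketched in Python as a set of named groups, where each group links several entities at once rather than just a pair. The studies, organisms, and helper functions below are hypothetical and are not drawn from any real data set or product.

```python
from collections import defaultdict

# Hypothetical hyperedges: each one links a whole set of entities at once.
hyperedges = {
    "study_A": {"soil_sample_1", "Rhizobium", "nitrogen_fertilizer"},
    "study_B": {"soil_sample_2", "Rhizobium", "wheat"},
    "study_C": {"soil_sample_2", "Pseudomonas", "wheat"},
}

def build_incidence(edges):
    """Map each node to the set of hyperedge names that contain it."""
    incidence = defaultdict(set)
    for name, members in edges.items():
        for node in members:
            incidence[node].add(name)
    return dict(incidence)

def co_members(edges, incidence, node):
    """Return every other node sharing at least one hyperedge with `node`."""
    related = set()
    for name in incidence.get(node, ()):
        related |= edges[name]
    related.discard(node)
    return related

incidence = build_incidence(hyperedges)
print(sorted(incidence["Rhizobium"]))
print(sorted(co_members(hyperedges, incidence, "wheat")))
```

Because one hyperedge holds a whole group, a single lookup shows everything observed alongside an entity, which an ordinary pairwise graph could only express with many separate edges.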

Going one step further involves multilayering of multiple hypergraphs. Stacking like this allows scientists to gain even greater microbiome insight through understanding more about the context surrounding the data. This approach helps with interpretability, as researchers can see at a glance what is happening at different layers or in a particular context.

If something is of potential interest, the user can zoom in on an entity or node and see another graph, allowing them to go deeper in their investigations. For instance, at the study layer you are looking across multiple studies, such as those covering other soil geographies; another layer covers the soil, the plants, and the bacteria within the plant; and a further layer is where you work out which combination of fertilizer or pesticides is optimal.
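The zoom-in behavior described here can be sketched as a lookup from a node in one layer to a grouping in a deeper layer. This is a simplified Python illustration under stated assumptions: the three layer names loosely follow the example in the text, while every entity name and the zoom helper are hypothetical.

```python
# Hypothetical three-layer model; each layer maps group names to member sets.
layers = {
    "study": {"cross_study_1": {"site_north", "site_south"}},
    "organism": {"site_north_profile": {"soil", "wheat", "Rhizobium"}},
    "treatment": {"trial_1": {"Rhizobium", "fertilizer_low", "pesticide_none"}},
}

# Drill-down links: (layer, node) -> (deeper layer, group name in that layer).
drill_down = {
    ("study", "site_north"): ("organism", "site_north_profile"),
    ("organism", "Rhizobium"): ("treatment", "trial_1"),
}

def zoom(layer, node):
    """Follow a drill-down link; return (deeper layer, sorted members) or None."""
    target = drill_down.get((layer, node))
    if target is None:
        return None
    deeper_layer, group_name = target
    return deeper_layer, sorted(layers[deeper_layer][group_name])

print(zoom("study", "site_north"))
print(zoom("organism", "Rhizobium"))
print(zoom("study", "site_south"))  # no deeper graph is linked to this node
```

Keeping the drill-down links separate from the layers means each layer stays readable on its own, while the links supply the context that connects them.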

Graph Is a Unifying Language for Data

AI-enhanced graph modelling is a method that elevates microbiome-associated discovery to a higher level. Gartner also notes that up to 50 percent of current client inquiries on the topic of AI involve discussion of the use of graph technology. AI can help decode patterns that might not be obvious to the human observer, and once the structure of data has been identified, this knowledge can be represented in a graph. In this sense, the graph becomes a unifying language for representing data, so biologists and other scientists don't have to be data engineers to manipulate the data or experts in statistics to decipher the connections being made.

Finally, advanced application of causal inference science distils and investigates associations between diverse data, allowing scientists to design new studies to delve deeper into root-cause analysis. To advance understanding, scientists must also be able to differentiate between correlation and causality in order to know where to focus new investigation.
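A small simulation shows why the distinction between correlation and causality matters. In the Python sketch below, a hypothetical confounder (soil moisture) drives both microbe abundance and crop yield, so the two look strongly related even though neither affects the other in the simulated data. Adjusting for moisture makes the apparent link disappear. The variables and numbers are invented for illustration, and this simple adjustment is only one basic check, not the causal inference method the article refers to.

```python
import random

def mean(values):
    return sum(values) / len(values)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, zs):
    """Remove the part of ys explained by a straight-line fit on zs."""
    mz, my = mean(zs), mean(ys)
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]

# Simulated data: moisture influences both variables; they do not
# influence each other. The seed is fixed so the run is repeatable.
rng = random.Random(0)
moisture = [rng.gauss(0, 1) for _ in range(5000)]
microbe = [m + rng.gauss(0, 0.5) for m in moisture]
crop_yield = [m + rng.gauss(0, 0.5) for m in moisture]

raw = pearson(microbe, crop_yield)
adjusted = pearson(residuals(microbe, moisture), residuals(crop_yield, moisture))
print(f"raw correlation: {raw:.2f}")
print(f"after adjusting for moisture: {adjusted:.2f}")
```

The raw correlation is strong while the adjusted one is close to zero, which is the kind of signal that tells a team to study the shared driver rather than assume one variable causes the other.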

As we can see, this is a complex picture in terms of data management and analysis. However, using graph and AI techniques together with causal inference to explore the microbiome promises a sharp acceleration in new discoveries about soil biology and in how to farm more sustainably.

Could humanity look forward to meeting the challenges of the 21st century with the help of this important microworld that advances in data science and AI are opening up for us? The chances are looking very good, thanks to advanced data software.

About the Author

Anthony Finbow is chief executive officer at Eagle Genomics Ltd., a U.K.-based pioneer in applying network science to biology linked to the microbiome, which is working with five of the top 10 household and personal care companies in the world to create products that work in harmony with the human and ecological microbiome. You can contact the author via LinkedIn.
