Data Observability in the Cloud: Three Things to Look for in Holistic Data Platform Governance

Thanks to AI, next-generation data observability tools will extend beyond identifying problems to explaining how to resolve them -- and how to prevent them from recurring. To get there, your data platforms will need these three key features.

Today, innovation is accelerating at an astronomical pace as companies, regardless of their market sector, strive to do more with more -- more data, more AI and LLMs, more users, more projects, and, ominously, more opportunities for pipelines to break or stall. All of this can add up to prohibitively high costs on cloud data platforms and slower time to value.

There’s often a black hole in understanding where cloud data spending goes, and cloud data vendors aren’t especially forthcoming about these costs either. Worse still, data teams are tasked with building, scaling, and monitoring more applications faster, yet these projects invariably end up in a state of “hurry up and wait” as teams struggle to figure out where and why pipelines are breaking (because they will break) while also hunting for opportunities to optimize performance. These performance problems inevitably slow time to market and time to value.

As the saying goes, you can’t manage what you can’t see. Historically, data observability tools focused on this visibility -- providing insight into where problems occur. Today, this functionality serves as the starting point for the next wave of data observability tools. With the incorporation of AI, these next-generation tools will extend beyond showing where problems exist to providing a holistic data platform governance solution that also explains what to do about a problem and how to stop it from happening again.

Look for the following three key features in your data platforms.

Automation

Data observability solutions should incorporate AI/ML to detect anomalies and issues with cost, performance, and data. These tools should include the following (a minimal sketch of the detection loop follows the list):

  • Predictive analytics, which anticipates data issues before they occur and heads them off, ensuring that pipelines run as cost-effectively as possible without sacrificing reliability
  • Guardrails, the enforcement mechanism for predictive analytics, which use AI and automation to help developers optimize code for cost and performance efficiencies
  • Automatic self-healing (or human-in-the-loop workflows), which triggers alerts and corrective actions in real time, reducing the time spent on manual intervention -- and the need for it
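
To make that detection loop concrete, here is a minimal Python sketch that flags a pipeline run whose cost deviates sharply from its recent history using a simple z-score, then fires a stand-in alert hook. The figures, threshold, and notify() function are illustrative assumptions, not features of any particular product.

    # Minimal sketch: flag anomalous pipeline-run costs with a z-score.
    # All figures, the threshold, and notify() are illustrative assumptions.
    from statistics import mean, stdev

    def detect_cost_anomaly(history, latest, z_threshold=3.0):
        """Return True if `latest` sits more than z_threshold standard
        deviations away from the recent cost history."""
        if len(history) < 5:  # too little history to judge
            return False
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            return latest != mu
        return abs(latest - mu) / sigma > z_threshold

    def notify(message):
        # Stand-in for a real alerting or self-healing hook.
        print(f"ALERT: {message}")

    recent_costs = [41.0, 39.5, 42.3, 40.1, 38.9, 41.7]  # $ per run
    todays_cost = 97.4

    if detect_cost_anomaly(recent_costs, todays_cost):
        notify(f"Run cost ${todays_cost:.2f} is far outside the recent "
               f"baseline (~${mean(recent_costs):.2f}/run).")

A production system would layer seasonality-aware models and automated remediation on top, but the core loop stays the same: baseline, detect, act.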

FinOps Beyond Data Observability

To meaningfully address the pain points of data and AI pipelines, data observability tools must expand into FinOps. It’s no longer enough to know where a pipeline stalls or breaks -- data teams also need to know how much their pipelines cost.

In the cloud, inefficient performance drives up compute costs, which in turn drives up total spend. Tools must therefore encompass FinOps, providing observability into both infrastructure and compute costs broken down by job, user, and project, along with advanced analytics that guide teams in making individual pipelines cost-efficient. This frees data teams to focus on strategic decision-making rather than reconfiguring pipelines for cost.
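
To make that breakdown concrete, here is a minimal chargeback/showback sketch that rolls up compute spend by project, user, and job. The record fields and dollar figures are invented for illustration; a real tool would pull them from the platform’s query history or billing export.

    # Minimal chargeback/showback sketch: roll up spend by dimension.
    # Record fields and dollar figures are invented for illustration.
    from collections import defaultdict

    query_history = [
        {"project": "churn-model", "user": "ana", "job": "feature_build", "cost": 12.40},
        {"project": "churn-model", "user": "ana", "job": "training_prep", "cost": 48.10},
        {"project": "sales-etl", "user": "raj", "job": "nightly_load", "cost": 6.75},
        {"project": "sales-etl", "user": "mei", "job": "backfill", "cost": 91.20},
    ]

    def cost_by(records, dimension):
        """Total cost per value of `dimension`, largest spenders first."""
        totals = defaultdict(float)
        for rec in records:
            totals[rec[dimension]] += rec["cost"]
        return dict(sorted(totals.items(), key=lambda kv: -kv[1]))

    for dim in ("project", "user", "job"):
        print(f"Spend by {dim}: {cost_by(query_history, dim)}")

Sorting by descending spend surfaces the most expensive projects, users, and jobs first, which is exactly the view FinOps conversations need.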

Purpose-Built, Platform-Specific

Each cloud data platform has its own pricing and performance nuances, so data observability tools that feature FinOps and AI-driven insights must be tailored to each specific platform.

To meet these demands, data observability vendors must offer purpose-built products that give customers platform-specific views of detailed costs, storage-cost management, chargeback/showback, and where the expensive projects, queries, and users lie. They must also provide insights that flag expensive jobs delivering little or no ROI and prevent poorly written code (costly and prone to breaking) from ever reaching production.
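
As a simple illustration of those pricing nuances, the sketch below estimates one query’s cost under two common billing models: per-second compute billing at a credit rate (Snowflake-style) and per-byte-scanned billing (BigQuery-style on-demand). The rate constants are placeholders, not real contract pricing.

    # Minimal sketch of platform-specific cost estimation.
    # All rate constants are placeholders, not actual pricing.

    CREDITS_PER_HOUR = 2.0         # assumed small-warehouse credit burn
    DOLLARS_PER_CREDIT = 3.00      # placeholder credit price
    DOLLARS_PER_TB_SCANNED = 6.25  # placeholder on-demand scan price

    def warehouse_style_cost(runtime_seconds: float) -> float:
        """Time-based billing: cost scales with warehouse runtime."""
        credits = CREDITS_PER_HOUR * runtime_seconds / 3600
        return credits * DOLLARS_PER_CREDIT

    def scan_style_cost(bytes_scanned: float) -> float:
        """Scan-based billing: cost scales with data read, not runtime."""
        return bytes_scanned / 1e12 * DOLLARS_PER_TB_SCANNED

    print(f"90 s on a small warehouse:  ${warehouse_style_cost(90):.2f}")
    print(f"Scanning 2.5 TB on demand: ${scan_style_cost(2.5e12):.2f}")

The same query can look cheap under one billing model and expensive under the other, so the optimization target differs by platform: shorten runtime on one, prune scanned data on the other. That is why generic, platform-agnostic observability falls short.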

About the Author

Eric Chu is vice president of product at Unravel Data, where he is responsible for envisioning and defining the company's product road map and for leading the product engineering team to build a high-quality product that addresses customers’ needs. You can reach the author via email or LinkedIn.

