TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

Executive Q&A: Application and Data Observability Explained

Monitoring and tracking operations and gaps is the focus of observability. We spoke to Lior Gavish, CTO and co-founder of Monte Carlo, to learn more.

By Upside Staff
February 16, 2023

What does “observability” mean in a business context? In a technical context?

Observability is a management strategy that highlights critical issues affecting the reliability or performance of a workflow, process, or system. Observability can be applied to a variety of industries, from software development and security to analytics data and supply chain management.

For Further Reading:

Unified Observability Takes the Load Off of Overworked IT Teams

Myth-Busting DataOps: What It Is (And Isn’t)

Q&A: The Fundamentals of Data Quality

In a business context, observability refers to the ability to monitor and track operations and identify performance gaps that impact revenue, the customer experience, or other bottom-line metrics. In a technical context, observability provides visibility into the end-to-end health of a system, which often spans multiple connected technologies.

What is application observability?

Application observability refers to the end-to-end understanding of application health across your software environment to prevent application downtime. Application downtime is a period of time during which your software is laggy, unusable, or otherwise performing poorly.

It’s easy to think of application downtime in terms of software performance and “outages” (for instance, when websites crash or software applications lag to a noticeable degree). Application observability uses metrics, traces, and logs to understand software performance, putting together a holistic picture of application health that makes it easy to identify where and how your system broke. Tools like Datadog, New Relic, and Splunk are popular commercial products that piece together these disparate operational data sources for a more comprehensive view of application reliability.

What is data observability?

Like application observability, data observability also tackles system reliability but of a slightly different variety: analytical databases.

Data observability is an organization’s ability to fully understand the health of the data in its systems. Data observability eliminates data downtime by applying best practices learned from DevOps to data pipelines.

Data observability tools use automated monitoring, automated root cause analysis, data lineage, and data health insights to detect, resolve, and prevent data anomalies. This leads to increased visibility into data quality issues, such as stale data, duplicate rows, or even null values, before they’re surfaced to downstream stakeholders.
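To make those three issue types concrete, here is a minimal Python sketch of what such checks might look like. It is an illustration only, not how any particular observability product works: the function name, the field names (order_id, amount), and the 24-hour staleness threshold are all assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def find_quality_issues(rows, key_field, required_fields, newest_load, now, max_age):
    """Return a list of human-readable issues found in `rows` (a list of dicts).

    All parameter names are hypothetical and exist only for this sketch.
    """
    issues = []

    # Stale data: the most recent load is older than the allowed age.
    if now - newest_load > max_age:
        issues.append("stale data")

    # Duplicate rows: the same key value appears more than once.
    seen, duplicates = set(), set()
    for row in rows:
        key = row.get(key_field)
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    if duplicates:
        issues.append(f"duplicate keys: {sorted(duplicates)}")

    # Null values: a required field is missing or None.
    null_count = sum(
        1 for row in rows for name in required_fields if row.get(name) is None
    )
    if null_count:
        issues.append(f"{null_count} null required values")

    return issues


rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 5.0},
]
issues = find_quality_issues(
    rows,
    key_field="order_id",
    required_fields=["amount"],
    newest_load=datetime(2023, 2, 14),
    now=datetime(2023, 2, 16),
    max_age=timedelta(hours=24),
)
print(issues)
```

In practice these checks would run automatically against warehouse tables rather than in-memory rows, but the idea is the same: flag the problem before a stakeholder finds it in a dashboard.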

Why are so many enterprises starting to focus on data observability?

It has been the job of companies’ software engineering teams to focus on application observability for the better part of two decades. Now, given organizations’ heavy reliance on data, it is the responsibility of data teams to track, resolve, and prevent what we refer to as data downtime: periods of time when data is missing, erroneous, or otherwise inaccurate. According to a 2022 survey conducted by Wakefield Research, data teams spend 30 to 50 percent of their time tackling data downtime and other data quality issues.

Problems with data quality can permeate deep into the enterprise, impacting customer service, marketing, operations, sales, and ultimately revenue. When data powers digital services or mission-critical decisions, the stakes can multiply. Preventing data downtime and having end-to-end observability of your company’s data has become a top-line priority for enterprises entering 2023.

What’s motivating them?

To put it simply, reliable data is equivalent to time, money, and customer trust. For data engineers and analysts, unreliable data means wasted time and resources; for business leaders, incidents of data downtime can cost companies millions of dollars, sometimes even hundreds of millions. For data consumers, including the company’s own customers, unreliable data erodes confidence in the data for decision-making and in the company overall.

If data is the new oil, it is critical to monitor and maintain the integrity of this precious resource. Just like most of us would not tape over the “check engine” light in our vehicles, we need to pay attention to data observability practices together with infrastructure and AI observability for businesses that rely heavily on those areas.

What benefits are enterprises looking for or expecting by employing observability?

Observability ensures that technical teams are the first to know when issues arise, before downtime or performance issues affect downstream consumers. By surfacing additional context about the root cause and impact of issues, observability helps improve stakeholder trust in the technology or system being observed (e.g., a website or dashboard).

As highlighted by Gartner in their Innovation Insight for Observability report, “Observability enables quick interrogation of a digital service to identify the underlying cause of a performance degradation, even when it has never occurred before.” By giving software, data, and other functional teams the ability to detect and resolve issues before they’re noticed by customers and other end users, observability provides businesses with insurance on their technical investments.

What are the biggest technical overlaps between software observability and data observability?

Observability is no longer just for software engineering. With the rise of data downtime and the increasing complexity of the data stack, observability has emerged as a critical concern for data teams, too.

Just as DevOps takes a continuous integration and continuous delivery (CI/CD) approach to the development and operations of software, DataOps emphasizes a similar approach for how data engineering and data science teams can work together to add value to the business.

Similarly, just as software engineers use unit tests to identify buggy code before it’s pushed to production, data engineers often leverage tests to detect and prevent potential data quality issues from moving further downstream.
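As a rough illustration of that parallel, the following Python sketch shows a data test acting as a gate: a batch is loaded only if it passes a few checks, much as code is merged only if its unit tests pass. The names DataTestFailure, check_batch, and load_batch, along with the user_id and discount fields, are hypothetical and chosen only for this example.

```python
class DataTestFailure(Exception):
    """Raised when a batch violates an expectation (hypothetical name)."""


def check_batch(rows):
    """Run simple data tests; raise DataTestFailure on the first violation."""
    if not rows:
        raise DataTestFailure("batch is empty")
    for index, row in enumerate(rows):
        if row.get("user_id") is None:
            raise DataTestFailure(f"row {index}: user_id is null")
        if not 0 <= row["discount"] <= 1:
            raise DataTestFailure(
                f"row {index}: discount {row['discount']} outside [0, 1]"
            )


def load_batch(rows, warehouse):
    """Append rows to `warehouse` (a plain list here) only if the tests pass."""
    try:
        check_batch(rows)
    except DataTestFailure as failure:
        return f"blocked: {failure}"
    warehouse.extend(rows)
    return "loaded"


warehouse = []
result_good = load_batch([{"user_id": 7, "discount": 0.2}], warehouse)
result_bad = load_batch([{"user_id": 7, "discount": 1.5}], warehouse)
print(result_good)
print(result_bad)
```

The failing batch never reaches the warehouse, so the bad discount value cannot move further downstream.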

What are some of the biggest differences between application observability and data observability?

The key differences between application observability and data observability tools are the responsibilities they hold across their respective systems.

Application observability solutions monitor across three key pillars: metrics, a numeric representation of data measured over intervals of time; traces, a representation of a series of distributed events that encode the end-to-end request flow through a distributed system; and logs, timestamped records of discrete events that happened over time.
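For readers who want to see the shape of these three signals, here is a small, self-contained Python sketch. It does not use any vendor’s SDK; the Telemetry class, its method names, and the requests_total metric are assumptions made up for illustration.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class Telemetry:
    """In-memory stand-in for an observability backend (hypothetical)."""

    metrics: dict = field(default_factory=dict)  # name -> list of (timestamp, value)
    spans: list = field(default_factory=list)    # one entry per traced operation
    logs: list = field(default_factory=list)     # timestamped discrete events

    def record_metric(self, name, value):
        # Metric: a numeric value measured at a point in time.
        self.metrics.setdefault(name, []).append((time.time(), value))

    def record_span(self, trace_id, name, duration):
        # Trace span: one step of a request, tied to the request by trace_id.
        self.spans.append({"trace_id": trace_id, "name": name, "duration": duration})

    def log(self, message):
        # Log: a timestamped record of a discrete event.
        self.logs.append({"ts": time.time(), "message": message})


def handle_request(telemetry):
    """Simulate one request and emit all three signal types for it."""
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    telemetry.log("request received")
    # ... the application's real work would happen here ...
    duration = time.perf_counter() - start
    telemetry.record_span(trace_id, "handle_request", duration)
    telemetry.record_metric("requests_total", 1)
    return trace_id


telemetry = Telemetry()
trace_id = handle_request(telemetry)
print(len(telemetry.logs), len(telemetry.spans), list(telemetry.metrics))
```

Commercial tools collect the same three kinds of records at much larger scale and correlate them, for example by joining logs and spans on the trace identifier.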

Data observability solutions similarly monitor across five key -- but very different -- pillars: freshness, or how up-to-date your data tables are; distribution, which tells you whether your data’s values fall within an accepted range; volume, which refers to the completeness of your data tables and offers insight into the health of your data sources; schema, the organization of your data, where unexpected changes often indicate broken data; and lineage, which helps you answer the question of “where?” when data breaks by telling you which upstream sources and downstream ingestors were impacted, as well as which teams are generating the data and who is accessing it.
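The five pillars can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any product: the table snapshot, the baseline thresholds, the lineage graph, and every name below (check_pillars, downstream_of, raw.orders, and so on) are hypothetical.

```python
from datetime import datetime, timedelta


def check_pillars(table, baseline, now):
    """Compare a table snapshot with a baseline; True means the pillar is healthy."""
    return {
        # Freshness: was the table updated recently enough?
        "freshness": now - table["last_updated"] <= baseline["max_age"],
        # Distribution: do all values fall within the accepted range?
        "distribution": all(
            baseline["min_amount"] <= value <= baseline["max_amount"]
            for value in table["amounts"]
        ),
        # Volume: did at least the expected number of rows arrive?
        "volume": table["row_count"] >= baseline["min_rows"],
        # Schema: do the columns still match what consumers expect?
        "schema": table["columns"] == baseline["columns"],
    }


def downstream_of(lineage, source):
    """Lineage: walk the graph to find every asset that depends on `source`."""
    impacted, stack = set(), [source]
    while stack:
        node = stack.pop()
        for child in lineage.get(node, []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return sorted(impacted)


baseline = {
    "max_age": timedelta(hours=6),
    "min_amount": 0,
    "max_amount": 10_000,
    "min_rows": 3,
    "columns": ["order_id", "amount"],
}
table = {
    "last_updated": datetime(2023, 2, 16, 9, 0),
    "row_count": 3,
    "columns": ["order_id", "amount_usd"],  # a column was renamed upstream
    "amounts": [12.5, 40.0, 99.0],
}
lineage = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.revenue", "dash.sales"],
}

health = check_pillars(table, baseline, now=datetime(2023, 2, 16, 12, 0))
impacted = downstream_of(lineage, "raw.orders")
print(health)
print(impacted)
```

Here the renamed column fails the schema check while the other three checks pass, and the lineage walk lists the staging table, the revenue mart, and the sales dashboard as the assets to inspect next.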

Who are the users of software observability tools? Who are the users of data observability tools?

The key personas leveraging and building application observability solutions include software engineers, infrastructure administrators, observability engineers, site reliability engineers, and DevOps engineers.

Companies with lean teams or relatively simple software environments will often employ one or a few software engineers whose responsibility it is to obtain and operate an application observability solution. As companies grow, both in size of the team and in complexity of applications, observability is often delegated to more specialized roles such as observability managers, site reliability engineers, or application product managers.

At the end of the day, data reliability is everyone’s problem, and data quality is a responsibility shared by multiple people on the data team. Smaller companies may have one or a few individuals who maintain data observability solutions. However, as companies grow both in size and in the quantity of data they ingest, more specialized personas tend to become the tactical managers of data pipeline and system reliability.

These specialized roles include data engineers, data product engineers, analytics engineers, and data reliability engineers.
