What Is Ground Truth? The Data Concept at the Heart of Machine Learning
Every supervised machine learning model learns in the same basic way. It looks at an input, makes a prediction, compares that prediction to the correct answer, and adjusts itself based on the difference. Do that enough times across enough examples, and the model gets better at making predictions.
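To make that loop concrete, here is a minimal sketch in Python. It uses a one-feature logistic regression trained with plain stochastic gradient descent. The dataset, the function names `sigmoid`, `predict`, and `train`, and the learning rate and epoch count are all illustrative assumptions. This is not a reference implementation of any particular library.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(w, b, x):
    """Model's predicted probability that input x belongs to class 1."""
    return sigmoid(w * x + b)

def train(examples, epochs=2000, lr=0.1):
    """Fit weight w and bias b from (input, ground_truth_label) pairs."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, label in examples:
            p = predict(w, b, x)   # 1. make a prediction
            error = p - label      # 2. compare it to the ground-truth label
            w -= lr * error * x    # 3. adjust in proportion to the difference
            b -= lr * error        #    (gradient of the log loss)
    return w, b

# Hypothetical data: (hours studied, passed exam?) with ground-truth labels 0 or 1.
examples = [(0.5, 0), (1.0, 0), (1.5, 0), (3.0, 1), (3.5, 1), (4.0, 1)]
w, b = train(examples)
```

The only thing that tells the model which direction to move is `label`. If you change the labels, the same code learns a different rule.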
The correct answers are called ground truth.
That definition sounds simple. The practice of obtaining, maintaining, and trusting ground truth is considerably less so.
Ground truth is the label attached to each training example that tells the model what the right answer is. In an image classification model, the ground truth is the label that says this image contains a cat, or a car, or a tumor. In a fraud detection model, it's the label that says this transaction was fraudulent or legitimate. In a sentiment analysis model, it's the label that says this customer review is positive, negative, or neutral. The model has no way to learn without these labels. They are the signal it's learning from.
Where ground truth comes from varies significantly by domain, and the source matters enormously for the quality of what the model learns.
Sometimes ground truth is unambiguous and arrives naturally. A spam filter can use user behavior as ground truth: emails that users move to spam are labeled spam, and emails they engage with are labeled not spam. A recommendation system can use clicks and purchases as ground truth for what users found relevant. In these cases, ground truth is generated continuously by the system itself as users interact with it, which makes it relatively cheap to obtain at scale.
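As a rough illustration of how behavior becomes labels, the sketch below turns a per-email log of user actions into spam/not-spam ground truth. The event names (`moved_to_spam`, `replied`, and so on), the rule that the most recent decisive action wins, and the helper names are hypothetical assumptions. A real mail system would define its own signals.

```python
from typing import Optional

# Hypothetical event names; a real system would define its own.
SPAM_ACTIONS = {"moved_to_spam"}
ENGAGE_ACTIONS = {"replied", "starred", "moved_to_inbox"}

def label_from_events(events: list[str]) -> Optional[int]:
    """Return 1 (spam), 0 (not spam), or None if behavior gives no label.

    Scans from the newest event backward, so the most recent decisive
    action wins (e.g. a user rescuing a message from the spam folder).
    """
    for action in reversed(events):
        if action in SPAM_ACTIONS:
            return 1
        if action in ENGAGE_ACTIONS:
            return 0
    return None

def build_training_set(email_events: dict[str, list[str]]) -> dict[str, int]:
    """Keep only emails whose behavior produced a label."""
    return {
        email_id: label
        for email_id, events in email_events.items()
        if (label := label_from_events(events)) is not None
    }

log = {
    "msg-1": ["delivered", "moved_to_spam"],
    "msg-2": ["delivered", "opened", "replied"],
    "msg-3": ["delivered"],                                     # no decisive action
    "msg-4": ["delivered", "moved_to_spam", "moved_to_inbox"],  # user rescued it
}
labels = build_training_set(log)
# {'msg-1': 1, 'msg-2': 0, 'msg-4': 0}
```

One design choice here is that emails with no decisive action stay unlabeled rather than being assumed legitimate. Silence is not the same as a "not spam" judgment.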
More often, ground truth requires human judgment. Medical imaging models need radiologists to label scans. Content moderation models need human reviewers to classify posts. Document classification models need subject matter experts to categorize examples. This process, called data annotation or data labeling, is time-consuming, expensive, and introduces its own quality risks. Human annotators disagree with each other. They have different interpretations of ambiguous cases. They make mistakes, especially on the millionth example of a long annotation task. The ground truth that results is not perfectly reliable, and the model learns from whatever signal is actually in the labels, including the