New Research Warns of AI Bias Dangers
Findings reveal how the demographics and backgrounds of people training AI models influence model outputs.
Note: TDWI’s editors carefully choose press releases related to the data and analytics industry. We have edited and/or condensed this release to highlight key information but make no claims as to its accuracy.
New research analyzing data from the Prolific research platform reveals how the demographics of the people labelling the data used to build and train AI models influence the models’ decisions. For example, what one person finds offensive another may find perfectly acceptable. This has major ramifications for the development of AI systems: there is a real danger that existing biases are baked into them and amplified.
Machine learning and artificial intelligence systems often rely on high-quality human labelling and annotation: people review and categorize the output of language models to train them (e.g., to learn what kind of content is harmful or to better understand human intentions). This is often referred to as “human-in-the-loop” or reinforcement learning from human feedback (RLHF).
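The human-labelling step described above can be pictured with a minimal sketch. This is an illustrative toy, not Prolific's or any RLHF pipeline's actual code: several annotators judge the same model output, and the majority label becomes the training signal. All names and labels are assumptions for the example.

```python
from collections import Counter

def majority_label(labels):
    """Return the most common label among annotator judgments.

    In a real pipeline this aggregate feeds back into model
    training; here it simply illustrates the idea.
    """
    (label, _count), = Counter(labels).most_common(1)
    return label

# Three annotators review the same model output:
annotations = ["harmful", "harmful", "acceptable"]
print(majority_label(annotations))  # -> harmful
```

Note that majority voting is exactly where annotator demographics matter: if the annotator pool skews toward one group, the "majority" label encodes that group's judgments.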
The study, conducted collaboratively by Prolific, Potato (a web-based annotation tool), and the University of Michigan, found that age, race, and education are statistically significant factors in determining how data is labelled. For example, when asked to rate the offensiveness of online comments, Black participants tended to rate the same comments as significantly more offensive than other racial groups did.
Prior research on annotator background has mostly focused on specific aspects of identity, such as gender, and on certain tasks, such as toxic language detection. This study undertook a much broader analysis, including offensiveness detection, question answering, and politeness. The data set contains 45,000 annotations from 1,484 annotators, drawn from a representative sample of the U.S. population regarding sex, age, and race.
Findings from the research include:
- Gender: The research found no statistically significant difference between men and women in rating content as offensive.
- Race: The study found significant racial differences in offensiveness rating. Black participants rated the same comments as significantly more offensive than all other racial groups. The scores of white participants strongly correlated with the original Ruddit data set, which suggests that the original annotations were likely done by white annotators.
- Age: People aged 60 and over tended to find comments more offensive than middle-aged and younger participants did.
- Education: There were no significant differences found with respect to participant education.
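The group-level differences listed above boil down to comparing average ratings across annotator groups. The sketch below shows that comparison on made-up data; the field names are assumptions and do not reflect POPQUORN's actual schema, and the numbers are synthetic, not the study's results.

```python
from collections import defaultdict
from statistics import mean

# Synthetic annotations: each record is one annotator's offensiveness
# rating for a comment, tagged with a demographic group (placeholder
# labels "A" and "B" -- illustrative only).
annotations = [
    {"group": "A", "rating": 4.0},
    {"group": "A", "rating": 3.5},
    {"group": "B", "rating": 2.0},
    {"group": "B", "rating": 2.5},
]

# Group the ratings by annotator demographic, then compare means.
by_group = defaultdict(list)
for a in annotations:
    by_group[a["group"]].append(a["rating"])

group_means = {g: mean(r) for g, r in by_group.items()}
print(group_means)  # {'A': 3.75, 'B': 2.25}
```

A real analysis like the study's would follow such a comparison with a significance test rather than eyeballing the means, but the grouping step is the same.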
Although question answering is a largely objective task (i.e., questions have single correct answers), accuracy did vary according to annotator background. The largest effects were seen for race and age, with a smaller effect for education. These performance differences mirror known disparities in education and economic opportunity for minorities compared to their white male peers in the U.S.
Politeness is one of the most prominent social factors in interpersonal communication. The study found that:
- Women judged messages as being less polite than men did.
- Older participants were more likely to give higher politeness ratings.
- Those with high education levels tended to give lower ratings.
- Black participants rated messages as being more polite than their white peers.
- Asian participants gave lower politeness ratings overall.
Commenting on the research, Phelim Bradley, CEO and co-founder of Prolific, said, “Artificial intelligence will touch all aspects of society and there is a real danger that existing biases will get baked into these systems. This research is very clear: who annotates your data matters. Anyone who is building and training AI systems must make sure that the people they use are nationally representative across age, gender, and race, or bias will simply breed more bias.”
“Systems [such as] ChatGPT are increasingly used by people for everyday tasks,” said assistant professor David Jurgens of the University of Michigan School of Information. “But whose values are we instilling in the trained model? If we keep taking a representative sample without accounting for differences, we continue marginalizing certain groups of people.”
The correct training and fine-tuning of AI systems is incredibly important to the safe development of AI, and to avoiding these systems amplifying existing biases and toxicity. This means ensuring that annotators are nationally representative across race, age, and gender.
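One way to act on that recommendation is to compare an annotator pool's demographic makeup against target population shares. The sketch below assumes a single attribute (age band) and uses placeholder target shares, not census figures; the function name and data are hypothetical.

```python
from collections import Counter

def representation_gap(annotator_attrs, target_shares):
    """Return, per category, the pool's share minus the target share.

    Positive values indicate over-representation in the annotator
    pool; negative values indicate under-representation.
    """
    counts = Counter(annotator_attrs)
    total = len(annotator_attrs)
    return {cat: counts[cat] / total - share
            for cat, share in target_shares.items()}

# Hypothetical pool of ten annotators, skewed young:
pool = ["18-29"] * 6 + ["30-59"] * 3 + ["60+"] * 1
# Placeholder target shares (illustrative, not real census data):
target = {"18-29": 0.2, "30-59": 0.5, "60+": 0.3}

gaps = representation_gap(pool, target)
# A large positive gap for "18-29" flags over-representation of
# younger annotators, the kind of skew the study warns against.
```

A production check would cover age, gender, and race jointly (the study's point about intersectional perspectives), but the per-attribute comparison is the basic building block.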
The fair treatment of annotators is another crucial element of AI training and development. Reports have emerged of low-paid workers in developing countries being used for labelling and being subjected to reams of toxic online content.
This research was conducted by Jiaxin Pei and David Jurgens from the School of Information at the University of Michigan, who analyzed POPQUORN (the Potato-Prolific data set for Question-Answering, Offensiveness, text Rewriting, and politeness rating with demographic Nuance), the data set of 45,000 annotations described above.
The participants were asked to complete four common natural language processing (NLP) tasks:
- Judge 6,000 Reddit comments from the Ruddit data set for their level of offensiveness.
- Answer questions from the SQuAD data set.
- Rewrite an email to make it more polite.
- Rate the politeness of the original and new emails generated in the previous task.
The data set is being made publicly available to offer AI companies an opportunity to explore a model that accounts for intersectional perspectives and beliefs.