By using tdwi.org website you agree to our use of cookies as described in our cookie policy. Learn More

TDWI Upside - Where Data Means Business

Pillars of AI Governance with Fern Halper

Fern Halper, Ph.D., vice president and senior research director for advanced analytics at TDWI, talks about the six pillars of data governance for artificial intelligence -- including transparency and ethics.

In this latest “Speaking of Data” podcast, Fern Halper, Ph.D., vice president and senior research director for advanced analytics at TDWI, explores the six pillars of data governance for artificial intelligence, including important approaches for measuring bias and safety. [Editor’s note: Speaker quotations have been edited for length and clarity.]

For Further Reading:

Generative AI and Its Implications for Data and Analytics

Tackling Bias and Explainability in Automated Machine Learning

Mastering AI Quality: Strategies for CDOs and Tech Leaders

Halper began by explaining that with the ongoing expansion of types and sources of data organizations are using for their analytics -- including unstructured and semistructured data -- there are increasing opportunities for applying governance.

“We’re still in the process of working through what the framework for governing this looks like,” she said. “For instance, the models used to power AI and machine learning need to be governed just as much as the data that’s fed into them.”

Halper explained that TDWI is working with six “pillars,” or key components:

  • Trustworthiness
  • Transparency
  • Performance
  • Protection and privacy
  • Safety
  • Ethics

Trustworthiness, she said, is about data integrity as well as the data’s suitability for the problems it’s needed to solve. This involves traits such as completeness, accuracy, relevance, and reliability -- the usual characteristics we expect in high-quality data.

She continued by saying that the second pillar -- transparency -- is related to trustworthiness but also includes other features such as explainability and accountability, such that someone in the organization is aware of who is using the data and for what purposes.

Performance is also important, Halper continued, because an AI tool that takes too long to produce a result is of little use to the organization. “This is an area where observability tools can often come in handy as a way to monitor the health of the data landscape,” she added.

“Protection and privacy are obviously important,” Halper said, “given how many regulations and laws there are around keeping user data safe.” However, with generative AI, she noted, there’s the added aspect of having to keep AI safe from malicious prompts such as intentional hallucinations or “jailbreak prompts” -- prompts designed to circumvent the limitations placed on an AI tool.

Another example of assuring the safety of AI outputs Halper mentioned is that several organizations are providing benchmarks for how likely a given tool is to produce hallucinations. One such index is Galileo Labs’ Hallucination Index, which provides a ranking of the most popular LLMs and how likely they are to return hallucinations to user queries. She also noted that some organizations are implementing “human in the loop” processes, where humans evaluate output to ensure that it is not toxic or harmful. This process is naturally much more expensive, she explained, and so likely to be less used. Another idea for dealing with toxic output is what’s called constitutional AI, which involves training one AI to evaluate the output of another.

Finally, Halper turned to the topic of ethics.

“Ethics is an important aspect that probably isn’t getting enough attention,” she said. “For instance, TDWI published a Best Practices Report about responsible data and analytics and although fairness and ethics were on respondents’ radars, they didn’t rank as highly as one might have hoped. We currently have a survey in the field asking a similar question and the responses aren’t trending much differently.”

“One aspect of ethical AI is addressing bias. At one level, there’s making sure your training data isn’t biased, Halper said. “One popular example is the recruiting system that was unfairly biased against women engineers because the historical data it was trained on was heavily weighted toward male engineers. At another level, there’s making sure your algorithm isn’t biased in some way.

“It’s promising that there’s so much work being done to address this, though,” she continued. “For example, a search of the literature for ‘learning fair representations’ turns up a good number of papers. Vendors are also doing a lot of work in this area. For example, IBM has its Fairness 360 Toolkit, Google has its responsible AI practices, and Microsoft has its Fairlearn toolkit.”

Halper’s ultimate hope is that organizations start to take AI governance seriously enough to make it a part of their initial efforts rather than try to patch it on after the fact.

“It’s going to be a constantly moving target,” she said, “so organizations should already be making an effort to incorporate these pillars -- especially those related to ethics -- into their AI efforts.”

TDWI Membership

Accelerate Your Projects,
and Your Career

TDWI Members have access to exclusive research reports, publications, communities and training.

Individual, Student, and Team memberships available.