Data Management in the Age of AI
Fern Halper, TDWI vice president and senior research director for advanced analytics, discusses data management in the age of AI -- including exciting use cases and potential challenges.
- By Upside Staff
- August 15, 2023
In this recent “Speaking of Data” podcast, Fern Halper discussed the effect of recent developments in artificial intelligence on data management. Halper is TDWI’s vice president and senior research director for advanced analytics. [Editor’s note: Speaker quotations have been edited for length and clarity.]
Halper began with a quick overview of how data management for AI differs from data management in general.
“The key word in generative AI is ‘generate.’ Rather than analyzing or processing data, generative AI technologies learn from existing data and then create new data that's consistent with the original data set,” she said. “This requires huge volumes of diverse data on which to train the models, enormous numbers of pipelines to ingest and integrate that data, and the computing power to manage it all.”
Halper noted that much of this work relies on modern architectures such as cloud data lakehouses, which combine aspects of the structured data warehouse with the unstructured data lake.
Halper went on to explain that some products, such as ChatGPT, can be considered “foundation models” -- existing models that can be further trained on a company’s own data -- and that implementing them requires an additional level of consideration.
“Companies are not going to want their proprietary data to be part of someone else’s model,” she said, “so they’re going to have to think about what their data management infrastructure looks like before implementing one of these foundation models.” Given that generative AI involves some of the largest and most computing-intensive machine learning models on the planet, not everyone will be able to train a foundation model, she added, but some will.
“For example,” she said, “at our Executive Summit in Orlando this fall, we’ll have a number of companies that have done exactly that come to talk about what they did and how it worked for them.” Executive Summits, she explained, are geared toward practitioners at the director level and above, who come to learn from peer organizations with first-hand experience of a topic and to hear experts in the field discuss trends and best practices. Other topics that will be covered in Orlando include what a next-generation data strategy looks like, how to design an architecture to support AI, and how to govern modern data and analytics.
The discussion then turned to the topics that are top of mind for TDWI listeners today.
“There’s been a lot of interest in new data platforms, such as unified platforms to support modern analytics and AI, because organizations are tired of data silos,” Halper said. “Typically, this is some sort of cloud stack or data lakehouse, but sometimes it’s a virtualization or semantic layer, such as a data fabric.”
“Of course, we’re seeing a lot of interest in generative AI. Attendees at our webinars have been asking a lot of questions about implementing generative AI, the different approaches vendors have to supporting it, and how to govern it -- a lot of ‘nuts and bolts’ issues.”
“We also get questions about broader issues such as what to do when models are generating inaccurate or otherwise bad responses,” Halper said. Researchers in academia, she explained, are working on ways to see inside these “black box” models and determine how they came to a particular decision.
“One would imagine that vendors offering products based on their own foundation models will have to offer some kind of support to their customers.” Even so, Halper added, organizations that want generative AI products to be production-ready, trustworthy, and compliant will have to be ready to dig into the work of managing them.
“Just as you have to consistently monitor machine learning models to ensure they don’t drift or go stale, keeping generative AI products sharp and on target may require as much work or more,” Halper said.
The conversation then turned to the current market for generative AI-enabled products.
“First, people should know that there’s a lot of vendor activity happening,” Halper explained. “There have been hundreds, if not thousands, of startups recently to provide point solutions -- text generators, music generators, image generators, and speech generators.” Halper said she has not been impressed with the results from most of the tools she’s tried so far. Still, as more vendors incorporate generative AI into their products -- such as Databricks’ recently announced Lakehouse IQ, which purports to enable users to interact with their data using natural language -- she advised users to keep an eye on what is likely to be a steady stream of new product releases.
She did warn, though, that there is already increased regulatory and legal scrutiny of these new technologies, which may slow adoption and increase overhead for enterprises adopting generative AI. This is in addition to issues that continue to affect traditional AI, such as scaling and deploying models, and identifying and eliminating model drift and bias.