TDWI Articles

Data Management and the Next Generation of AI

James Kobielus, TDWI’s senior director of research for data management, talks about how to update your data management practices to prepare for the next generation of AI.

In a recent "Speaking of Data" podcast, TDWI’s James Kobielus discussed how to update your data management practices to prepare for the next generation of AI. Kobielus is senior director of research for data management at TDWI. [Editor’s note: Speaker quotations have been edited for length and clarity.]

“One thing to note is that although generative AI is certainly the biggest trend at the moment, it isn’t the only thing happening in AI right now,” Kobielus began. “For example, there are still advances being made in the core elements of mainstream AI -- machine learning, deep learning, natural language processing, computer vision, and so on.” Improvements to fundamental technologies such as neural networks and statistical models form the underpinnings of the new generation of generative AI tools, he explained.

“For example, multimodal AI is coming along very quickly. The ability for a tool to take input of any modality -- text, voice, video, image -- and output content in similarly varied modes is growing at an exciting pace. One such instance is an announcement from Samsung that their smartphones will soon be able to take a photo and provide a full text description of the content of that image.”

Kobielus was asked which specific technologies behind these advances data management professionals need to know about.

“The main thing data professionals need to know about -- which isn’t necessarily new but may be new to many in the field -- is vector embeddings, which are high-dimensionality artifacts created by running source data through a neural network. The vectors in a vector embedding represent in an abstract way the connections, patterns, and structures implicit in that data, regardless of what form it takes.”
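The core idea can be sketched in a few lines. The `embed` function below is a made-up, hash-based projection used purely for illustration -- a real embedding comes from running the input through a trained neural network -- but it shows the essential property Kobielus describes: any input is mapped to a fixed-length numeric vector.

```python
import numpy as np

# Toy stand-in for an embedding model (illustration only; real
# embeddings come from a trained neural network).
def embed(text: str, dim: int = 8) -> np.ndarray:
    seed = sum(text.encode("utf-8"))   # deterministic toy seed
    rng = np.random.default_rng(seed)
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)       # normalize to unit length

# Whatever the input's length or modality, the output has a fixed
# dimensionality -- that is what makes vectors comparable to each other.
print(embed("a photo of a cat").shape)  # (8,)
```

In a real system, inputs with similar meaning land near each other in this vector space, which is what makes the similarity searches described below possible.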

He explained that these vector embeddings are then stored in what’s called a vector database. Kobielus noted that graph databases and document databases can also be used to store and manage these embeddings.

“Once you have a database set up, you can go on to query that database using its vector search capabilities to identify similarities or affinities among its elements and return results that match your query, along with annotations or descriptions of what they are and where they came from.”
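A vector search of the kind Kobielus describes can be sketched as follows. This is a minimal in-memory stand-in for a vector database -- the stored vectors and document IDs are made up, and production systems use approximate nearest-neighbor indexes to scale -- but the ranking step is the same idea: score stored embeddings against a query vector by cosine similarity and return the closest matches.

```python
import numpy as np

# Made-up stored embeddings keyed by document ID (illustration only).
store = {
    "doc-1": np.array([0.9, 0.1, 0.1]),
    "doc-2": np.array([0.1, 0.9, 0.2]),
    "doc-3": np.array([0.2, 0.2, 0.9]),
}

def search(query: np.ndarray, top_k: int = 2) -> list[tuple[str, float]]:
    """Rank stored vectors by cosine similarity to the query."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]

# doc-1 points closest to this query's direction, so it ranks first.
results = search(np.array([1.0, 0.0, 0.0]))
print(results[0][0])  # doc-1
```

A real deployment would attach metadata to each hit -- the annotations and provenance Kobielus mentions -- rather than returning bare IDs and scores.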

Kobielus noted that it’s only been in the past several years that technologies such as transformer models within large language models have become practical enough to make this vector revolution possible.

Kobielus pointed to a number of other developments that have made this innovation in AI possible, such as the multimodal data fabric and the data lakehouse. It is these data storage infrastructures, set up and optimized to handle diverse data types, formats, and sources -- especially unstructured data -- that have made the difference, he said.

“Running these new architectures is extremely complex and will require a great deal of bandwidth, storage, and processing power distributed across a large, cloud-based, multinode infrastructure,” he said.

Another aspect of generative AI that Kobielus said organizations will need to address is prompt engineering -- crafting the text instructions entered into an AI tool to produce the desired output. Because small differences in prompt wording can produce large differences in output, prompt engineering is often iterative, requiring repeated refinement before the output meets the need.
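The iterative loop involved can be sketched schematically. Everything here is hypothetical: `call_model` stands in for whatever generative AI API an organization uses, and the acceptance check and refinement step would be specific to the task at hand.

```python
# Schematic sketch of iterative prompt refinement (all names hypothetical).
def call_model(prompt: str) -> str:
    # Placeholder: imagine this sends the prompt to a generative model
    # and returns its text output.
    return f"response to: {prompt}"

def refine(prompt: str, feedback: str) -> str:
    # Placeholder refinement: fold what was missing back into the prompt.
    return f"{prompt}\nAdditionally: {feedback}"

prompt = "Summarize the quarterly report."
for attempt in range(3):
    output = call_model(prompt)
    if "key figures" in output:  # task-specific acceptance check
        break
    # Small wording changes can shift the output substantially,
    # so adjust the prompt and try again.
    prompt = refine(prompt, "include the key figures")
```

The loop captures the point in the text: rather than a one-shot instruction, prompt engineering in practice is a cycle of generate, evaluate, and adjust.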

“Of course, there are also the same issues of governance that apply to all data management efforts,” Kobielus added. “Things such as ethics, transparency, privacy, and preventing biased outputs are critically important, especially in light of the large amount of legislation that’s being taken up. For example, in the U.S. alone, Congress has over 100 pieces of legislation to regulate AI up for consideration.” All of this is well-intentioned, he said, but only covers pieces of the problem.

“I wish there were a single magic formula or standard to propose in this regard,” Kobielus said, “but there isn’t one. In the end, I believe AI is on the side of good, even if it may need to have its wings clipped a little to ensure it stays that way.”