How to Succeed with Unifying DataOps and MLOps Pipelines
James Kobielus, TDWI’s senior research director for data management, discusses the opportunities and challenges associated with enterprise unification of DataOps and MLOps pipelines.
- By Upside Staff
- August 8, 2023
In this “Speaking of Data” podcast, TDWI’s James Kobielus explores the opportunities and challenges associated with unifying DataOps and MLOps pipelines. Kobielus is senior research director for data management at TDWI. [Editor’s note: Speaker quotations have been edited for length and clarity.]
Kobielus began with a quick overview of what makes up DataOps and MLOps pipelines.
“Modern intelligent applications involve elements that need to be engineered and deployed in a unified fashion to create high-quality applications,” he said. “DataOps refers to the end-to-end workflow for preparing data for incorporation into intelligent applications. Then there are machine learning models and the entire set of processes from building them to training, deploying, and testing them. That's MLOps. The term ‘pipeline’ refers both to the platforms supporting all these capabilities and to the workflows and structures necessary to do it all.”
Kobielus went on to explain why organizations might want to go through the effort of unifying their data and ML pipelines.
“The most basic business case for unification is that it’s grounded in things that directly affect the bottom line,” Kobielus said. For example, unification can cut the average time needed to train ML models for deployment, shortening an organization’s time to value on its investment in AI or advanced analytics. He also explained that faster model deployment can make an analytics program much more scalable, which is key to being able to respond to market conditions.
Unification can also help automate much of the development and deployment process, he added, freeing an organization’s data scientists and engineers to take on higher-level tasks.
One of the primary use cases Kobielus pointed out was the increase in democratized data science.
“Increasingly,” Kobielus explained, “subject matter experts or analysts focused on a particular domain -- business users -- are being asked to take on more data science–related responsibilities, so there has been a rise in self-service tools. These low- or no-code front-end tools rely on automated back-end pipelines to allow the business users to generate models and deploy them.”
How to actually integrate data and ML pipelines depends on an organization’s existing overall structure. “Organizations are essentially either centralized or decentralized,” Kobielus said. For those that are already centralized to one degree or another, unifying data and ML pipelines is really just a question of converging the existing back ends -- often in the form of a data lakehouse.
In the case of a more decentralized organization, Kobielus explained, unification of the different back ends requires an abstraction layer that enables users to query data in a uniform, simplified way across all the disparate environments where it may reside. For many organizations, this layer is taking the form of a data mesh or a data fabric that consolidates access to data and analytics across a range of environments.
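As a rough illustration of the abstraction-layer idea, the sketch below puts two differently stored data sets behind one query interface. It is a minimal sketch under stated assumptions, not a description of any data mesh or data fabric product: the names `DataSource`, `InMemorySource`, `CsvSource`, and `AccessLayer`, along with the sample records, are hypothetical and do not come from the podcast.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List, Protocol

Row = Dict[str, str]


class DataSource(Protocol):
    """Hypothetical contract: any back end that can yield rows as dictionaries."""

    def rows(self) -> Iterable[Row]: ...


class InMemorySource:
    """Stands in for a warehouse table already loaded as records (assumption)."""

    def __init__(self, records: List[Row]) -> None:
        self._records = records

    def rows(self) -> Iterable[Row]:
        return iter(self._records)


class CsvSource:
    """Stands in for a file in a data lake, held here as CSV text (assumption)."""

    def __init__(self, text: str) -> None:
        self._text = text

    def rows(self) -> Iterable[Row]:
        return csv.DictReader(io.StringIO(self._text))


class AccessLayer:
    """Single entry point that hides where and how each data set is stored."""

    def __init__(self) -> None:
        self._sources: Dict[str, DataSource] = {}

    def register(self, name: str, source: DataSource) -> None:
        self._sources[name] = source

    def query(self, name: str, where: Callable[[Row], bool]) -> List[Row]:
        # The same filter-style call works regardless of the back end.
        return [dict(row) for row in self._sources[name].rows() if where(row)]


layer = AccessLayer()
layer.register(
    "sales",
    InMemorySource(
        [
            {"region": "east", "amount": "120"},
            {"region": "west", "amount": "80"},
        ]
    ),
)
layer.register("returns", CsvSource("region,amount\neast,5\nwest,9\n"))

# Callers use one call shape for both environments.
east_sales = layer.query("sales", lambda row: row["region"] == "east")
east_returns = layer.query("returns", lambda row: row["region"] == "east")
```

In this sketch the caller never needs to know whether a data set lives in memory or in a file; adding another environment would mean writing one more class with a `rows` method and registering it.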
“The bottom line for success,” Kobielus said, “is to what extent you can build more monetizable data and analytics and the degree to which you can automate all of it. That automation needs to happen on the back end.” He added that the key enablers for success are automation, augmentation (the inclusion of AI and other features into intelligent applications), and acceleration.
“Another essential factor in building monetizable analytics is the rapid development of cloud data marketplaces -- places where data sets or derived data products are sold or licensed,” he said. The success of DataOps, MLOps, and unified pipelines depends on the ability to find a market for the end product of these processes.
“Ultimately,” Kobielus said, “it’s not just a matter of integrating the pipelines but doing so in a way that’s highly adaptable to new challenges in terms of new models or applications you might build.”
[Editor’s notes: You can listen to the entire conversation on demand. Learn more about pipelines in the 2022 TDWI Best Practices Report: Unifying Data Management and Analytics Pipelines.]