InsightOps for Advanced Analytics and Data Science
DevOps addresses the challenge of engineering, deploying, and maintaining software for production use. IBM says we need something similar for advanced analytics and data science.
- By Steve Swoyer
- September 1, 2016
You have armies of data scientists, data engineers, business analysts, and programmers working together to research, identify, train, and -- eureka! -- codify analytics insights, but what are you actually doing with those insights once you have them?
Do you have a program for the automated deployment, maintenance, and -- if necessary -- obsolescence of insights? Do you have a program to manage and reuse the assets that are byproducts of your data science teams' critical R&D? Not just the finished products -- data sets, analytics models, and enabling code snippets -- but also the transformations, manipulations, and algorithms used to produce them, along with the many versions and iterations of data sets, code, and algorithms created (and scrapped) in the process? Do you have a program to ensure that the work done by your data science teams can be reproduced, documented, and governed (however loosely)?
Tim Vincent, CTO of IBM's Analytics Group, does. He calls it "InsightOps," a play on DevOps.
DevOps addresses the challenge of engineering, deploying, and maintaining software in the context of continuous software development. Vincent says we need something similar for advanced analytics and data science.
"One of the things we see happening over and over is that the data scientist will go through their cycle and produce something -- an analytics model or a derivative of that model, for example -- and then one of the big challenges [for an organization] is how do you actually deploy that as an insight, [because] it's only useful if you can use it [i.e., in production] as an insight," he explains.
"The thinking here is that what we're calling 'InsightOps' really is analogous to DevOps. Like DevOps, InsightOps spans the whole life cycle, from development and training [of an analytics model] to deployment and maintenance [of an analytics insight in production]."
DevOps isn't just a model for producing software. Because it provides a continuous feedback loop from development into production and back again, DevOps addresses the problem of maintaining -- and, if necessary, retiring -- production software. InsightOps, as envisioned by Vincent, has a similar scope. It's a program for unifying and rationalizing the life cycle of analytics development.
Two Distinct Phases of Analytics Development
It's long overdue. Generally, there are two distinct phases of analytics development, each with its own life cycle and priorities. The first phase comprises the sourcing and preparation of data sets; the development, training, and refinement of analytics models; and the codification of analytics insights. The second is that of the production data environment, with its deployment, maintenance, and upgrade schedules and, of course, its ongoing operations.
The now-dominant model of analytics development gives priority to the first phase at the expense of the second. It isn't that deploying software is an afterthought. It's that we tend to artificially segregate analytics development from deployment and operations and vice versa. Deployment isn't part of the purview of data science teams and analytics developers; it isn't, so to speak, their problem. In describing this situation, it isn't much of a stretch to invoke the old metaphor of developers or data science teams "throwing" their work "over the wall" to operations.
If anything, a feedback loop is even more critical for analytics development than for general-purpose software development, for two reasons. First, the successful use of analytics insights can and should change what you're doing: not only how you operate and organize your business but how you identify as a business. Analytics insights permit you to expand more deftly into new markets, exploit new business opportunities, or even take up entirely new business models.
Second, the accuracy of an analytics insight tends to diminish over time. As the conditions of the world on which an analytics model is based change, so, too, does the model's predictive power. IBM's Vincent cites machine learning rules and models as an example: over time, they need to be retrained -- and, in some cases, scrapped.
"When you actually deploy a [machine learning] model, you want to be constantly monitoring that model to see how accurate the predictions are," he says. "If you don't have a means to monitor accuracy, you don't have a constant feedback loop. We see this as an iterative loop where you're constantly refining models, retiring models, managing models, and versioning models. It's not only a one-way direction, which is what you have now."
Just a Dream?
It's a compelling, even dreamlike, vision. But is it close to becoming a reality? There are a lot of moving parts here, not least of which is the presumed need for a common metadata foundation of one kind or another. (A single, unified metadata standard has been an IT goal for decades. Several prominent projects, from IBM's AD/Cycle to Microsoft's Open Information Model, have failed.) Vincent concedes that making good on the InsightOps vision won't be easy, but he stresses that IBM is committed to doing just that. He says Big Blue has decided to back Apache Atlas as its metadata foundation of choice.
"You need to ... build out a really strong foundational metadata layer, [one that supports] not only the traditional technical representation of the data that everybody thinks about ... but also includes business terms, business models, the mappings between those, [along with] lineage information, as well as a governance layer that also includes [the possibility of] enforcement," he points out.
Taking the First Step
The first planned milestone on IBM's InsightOps road map is integration between Apache Atlas and its own Data Works data prep service. "By tracking [what a person using a self-service data prep tool does] and where you've been and what data you're looking at, [the Atlas-based metadata foundation] will actually smooth the translation from one point to another in an automated fashion," he says, noting that this capability will be offered as part of "the next generation of Data Works."
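The tracking Vincent describes -- recording where you've been and what you did to the data -- can be sketched as a recorder that logs each prep step so the same pipeline can be replayed later. This `PrepRecorder` class is a hypothetical illustration of the idea, not the Data Works or Atlas API.

```python
class PrepRecorder:
    """Record each self-service data prep step as it is applied so the
    same pipeline can be replayed on fresh data later. Illustrative
    sketch only -- not the Data Works or Atlas API."""

    def __init__(self):
        self.steps = []  # list of (description, function) pairs

    def apply(self, description, fn, data):
        """Run one prep step and remember it for later replay."""
        self.steps.append((description, fn))
        return fn(data)

    def replay(self, data):
        """Re-run the recorded pipeline, in order, on a new data set."""
        for _, fn in self.steps:
            data = fn(data)
        return data

# Record two prep steps against a toy data set.
rec = PrepRecorder()
rows = [{"name": " Ada ", "age": "36"}, {"name": "Grace", "age": ""}]
rows = rec.apply("trim names",
                 lambda rs: [dict(r, name=r["name"].strip()) for r in rs],
                 rows)
rows = rec.apply("drop missing ages",
                 lambda rs: [r for r in rs if r["age"]],
                 rows)
```

Because every step is captured as metadata rather than lost in an analyst's session, the trail can be audited, governed, or promoted into a production pipeline -- the "translation from one point to another" Vincent mentions.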
Other milestones (e.g., integration with API management tools) will be announced in time, he says.
There's another wrinkle here, too, says Harriet Fryman, vice president of growth with IBM's Analytics Group. InsightOps, she maintains, also addresses the challenge of accommodating self-service data discovery, data prep, and advanced analytics. "Right now, we're really living in the world ... of either/or. Either you can have a production environment to produce your data warehouse and BI that's in the hands of IT or you can have self-service, let's call it maverick, behavior," she points out.
"People have that false choice. Either/or. What we see with InsightOps is that the technology we're building to tie products such as Watson analytics together with governed data, that's in the realm of possibility. People no longer have to choose the either/or … they can get the both/and. You can get the value creation from the data science realm and be able to place the output into products, too."
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at [email protected].