Guided Analytics and the Future of Data Science in the Enterprise
These four properties help data scientists establish the right environment for the next breed of smart applications.
- By Michael Berthold
- May 17, 2019
Consumers everywhere depend on Alexa, Siri, or Google’s Assistant for all sorts of things -- answering obscure trivia questions, checking the weather, ordering groceries, getting driving directions, turning on the lights, or even inspiring a dance party in the kitchen. These are wonderfully useful (often fun) AI-based devices that have enhanced people’s lives. However, humans are not actually partaking in deep, meaningful conversations with these devices. Rather, automated assistants answer the specific requests that are made of them.
If you’re exploring AI and machine learning in your enterprise, you may have encountered the claim that, if fully automated, these technologies can replace data scientists altogether. It’s time to rethink this assertion.
The Problem with Fully Automated Analytics
How do all these automated, "driverless" AI and machine learning systems fit into the enterprise? Their goal is either to encapsulate (and hide) existing data scientists' expertise or to apply sophisticated optimization schemes to the fine-tuning of data science tasks.
Fully automated systems can be useful if no in-house data science expertise is available, but they are also rather limiting. The business analysts who rely on data to do their jobs get locked into the prepackaged expertise and a limited set of hard-coded scenarios.
In my experience, automation tends to miss the most important and interesting pieces, which can be vitally important in today’s highly competitive marketplace. If data scientists are allowed to take a slightly more active approach and guide the analytics process, however, the world opens up considerably.
Why a Guided Analytics Approach Makes Sense
For enterprises to get the most out of AI and data science -- to effectively predict future outcomes and make better business decisions -- fully automated data science sandboxes need to be left behind. Instead, enterprises must foster an interactive exchange among data scientists, business analysts, and the machines doing the work in between. This requires a process referred to as "guided analytics," in which human feedback and guidance can be applied whenever needed -- even while an analysis is in progress.
The goal of guided analytics is to enable a team of data scientists with various preferences and skills to collaboratively build, maintain, and continuously refine a set of analytics applications that provide business users with different degrees of user interaction. Simply put, all stakeholders work together to generate more useful analysis.
This is not for the faint of heart. Enterprises that want to create a system that facilitates this type of interaction while still developing a powerful analytics application face a big -- but not insurmountable -- challenge.
The Big Four: Common Characteristics
I’ve identified four common properties that help data scientists successfully establish the right environment for the next breed of smart applications -- the ones that will help them glean true business value from AI and machine learning.
Applications built on these properties give business users just the right amount of guidance and interaction while enabling teams of data scientists to merge their expertise. When the properties work together, data scientists can build interactive analytics applications that adapt as requirements change.
The ideal environment for guided analytics shares these four characteristics:
- Open: Applications shouldn’t be burdened with restrictions on the types of tools used. With an open environment, collaboration can happen between scripting gurus and those who want to reuse their expertise without diving into their code. Additionally, it’s a plus to be able to reach out to other tools for specific data types (text, images, etc.) as well as interfaces specialized for high-performance or big data algorithms (such as H2O or Spark) from within the same environment.
- Uniform: At the same time, the experts doing the data science work should be able to conduct all of it in the same environment. They need to blend data, run the analysis, mix and match tools, and build the infrastructure to deploy the resulting analytics applications, all from that same intuitive and agile environment.
- Flexible: Underneath the application, the environment should also be able to run simple regression models or orchestrate complex parameter optimization and ensemble models -- ranging from one to thousands of models. It’s worth noting that this piece (or at least some aspects of it) can be hidden completely from the business user.
- Agile: Once the application is deployed, new demands will arise quickly: more automation here, more consumer feedback there. The environment used to build these analytics applications needs to also make it easy for other members of the data science team to quickly adapt existing analytics applications to new and changing requirements so they continue to yield valuable results over the long term.
Putting It into Practice
Some AI-based applications will simply present an overview or forecast at the press of a button. Others will let the end user choose only the data sources to be used. Still others will query the user for feedback that refines the model(s) trained under the hood, factoring in the user's expertise. Those models can be simple or arbitrarily complex ensembles, or entire model families, and the end user may or may not be asked to help refine that setup. Control over how much of such interaction is required lies in the hands of the data scientists who design the underlying analytics process with their target audience -- the actual business users' interests (and capabilities) -- in mind.
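The feedback loop described above can be sketched in a few lines. The example below is purely illustrative (the data, function names, and "analyst feedback" are all made up, and real guided-analytics tools would wrap this in an interactive application): a simple logistic regression is fit automatically, then warm-started with a handful of labels corrected by an analyst at an interaction point.

```python
# Illustrative guided-analytics feedback loop: automated fit, then a
# refinement step driven by (here, hard-coded) analyst corrections.
import numpy as np

def fit_logistic(X, y, w=None, lr=0.1, epochs=200):
    """Plain gradient-descent logistic regression; `w` allows warm-starting."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # add bias column
    if w is None:
        w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # predicted probabilities
        w -= lr * Xb.T @ (p - y) / len(y)       # full-batch gradient step
    return w

def predict(X, w):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

# Synthetic training data: two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

w = fit_logistic(X, y)                          # initial automated fit

# "Interaction point": an analyst relabels two borderline cases, and the
# model is warm-started from its current weights with those corrections.
X_fb = np.array([[2.0, 2.0], [1.8, 2.2]])
y_fb = np.array([1, 1])
w = fit_logistic(X_fb, y_fb, w=w, epochs=50)

preds = predict(np.array([[0.0, 0.0], [4.0, 4.0]]), w)
print(preds)
```

The key design point is the warm start: user feedback nudges the existing model rather than retraining from scratch, which is what lets the interaction happen "even while an analysis is in progress."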
The big question you may be asking is: How do I actually do this in my organization? You may think this is not realistic for your team to build on its own; you are resource-constrained as it is. The good news is you don't have to.
Software, specifically open source software, is available that makes it practical to implement guided analytics. Using it, teams of data scientists can collaborate using visual workflows. They can give their business analyst colleagues access to those workflows through web interfaces. Additionally, there is no need to use another tool to build a web application; the workflow itself models the interaction points that comprise an analytics application. Workflows are the glue holding it all together: different tools used by different members of the data science team, data blended from various sources by the data engineering experts, and interaction points modeling the UI components visible to the end user. It is all easily within your grasp.
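To make the "workflow as glue" idea concrete, here is a toy sketch -- not any vendor's actual API -- in which a workflow is just a chain of nodes, and an interaction point is a node that stands in for input from a business user (hard-coded here; a real application would render a form or web UI at that step):

```python
# Toy workflow abstraction: nodes are chained steps; one node plays the
# role of a user-facing interaction point. All names are illustrative.
from typing import Any, Callable, List

class Workflow:
    def __init__(self):
        self.nodes: List[Callable[[Any], Any]] = []

    def add(self, node):
        self.nodes.append(node)
        return self          # allow fluent chaining

    def run(self, data):
        for node in self.nodes:
            data = node(data)
        return data

def blend_sources(sources):
    """Data-engineering step: merge rows from several sources."""
    return [row for src in sources for row in src]

def interaction_point(choices):
    """UI step: keep only categories the end user selected."""
    def node(rows):
        return [r for r in rows if r["category"] in choices]
    return node

def summarize(rows):
    """Analytics step: aggregate the filtered rows."""
    return {"count": len(rows), "total": sum(r["value"] for r in rows)}

wf = (Workflow()
      .add(blend_sources)
      .add(interaction_point({"A"}))   # the user picked category "A"
      .add(summarize))

result = wf.run([[{"category": "A", "value": 10}],
                 [{"category": "B", "value": 5},
                  {"category": "A", "value": 7}]])
print(result)   # {'count': 2, 'total': 17}
```

The workflow holds the three kinds of glue the paragraph names in one structure: the blending step from data engineering, the interaction point visible to the end user, and the analysis itself.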
The Road Ahead
Interest in guided analytics is growing; it lets users not only wrangle data but also fine-tune their analyses. It is exciting to see how much collaboration this triggers. It will also be fascinating to witness how data scientists build increasingly powerful analytics applications that assist users in creating analyses with real business impact.
Rather than taking experts out of the driver’s seat and attempting to automate their wisdom, guided analytics aims to combine the best of both. This is good for data scientists, business analysts, and the practice of data analytics overall. Ultimately, it will be essential for innovation as well. Although it might seem challenging now, the effort will be worth it to ensure a better future.
Michael Berthold is the founder and CEO at KNIME. He holds a doctorate in computer science. He has worked in academia as a professor at the University of Konstanz for 15 years and in industry at Intel and Tripos in the U.S. for 10 years. He loves helping people "make sense of data," and his research and expertise span data analytics, machine learning, artificial intelligence, and rule induction. Follow Michael on Twitter, LinkedIn, and the KNIME blog.