TDWI Articles

Why Data Science Must (And Will) Be Automated

Data science today is extremely labor-intensive. The automated or quasi-automated features widely used in self-service BI aren't commonplace in data science, but Gartner says that's about to change.

The biggest knock against data science today is that it's extremely labor-intensive. The automated or quasi-automated features (e.g., wizards) that are widely available in self-service BI, data warehousing, and other, more conventional types of analytics are far from commonplace in data science.

Market-watcher Gartner says that's about to change, predicting that by 2020, fully 40 percent of all data science tasks will be automated. As Gartner sees it, this will put data science-like technologies and methods in the hands of nontraditional data scientists -- "citizen data scientists."

The Roadblocks Facing Citizen Data Science

Gartner has been trying to make "citizen data scientist" a thing for at least the last couple of years. Putting aside the term's buzzword-like qualities, the concept of the citizen data scientist -- i.e., a person who "creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics" -- makes a lot of sense.

If the citizen data scientist is ever to become useful and widespread, however, the most daunting (and inescapably technical) aspects of data science -- e.g., the identification, selection, and preparation of statistically viable data sets; the selection and use of algorithms; and the development and training of models -- must first be radically simplified.

"Most organizations don't have enough data scientists consistently available throughout the business, but they do have plenty of skilled information analysts [who] could become citizen data scientists," said Joao Tapadinhas, research director at Gartner, in a prepared release.

"Equipped with the proper tools, they can perform intricate diagnostic analysis and create models that leverage predictive or prescriptive analytics. This enables them to go beyond the analytics reach of regular business users into analytics processes with greater depth and breadth."

Automating the Tools

How will data science become more automated? Software vendors will lead the way, according to Gartner: vendors in the data management and analytics spaces are building automation features into their platforms. They're placing particular emphasis on automating data integration and data preparation, as well as the building and training of analytics models.

"Making data science products easier for citizen data scientists to use will increase vendors' reach across the enterprise as well as help overcome the skills gap," said Alexander Linden, research vice president at Gartner, in a statement. "The key to simplicity is the automation of tasks that are repetitive [and] manual[ly] intensive and don't require deep data science expertise."

In just two years, Gartner predicts, so-called citizen data scientists will generate more advanced analysis than self-described data scientists. Citizen data scientists, not data scientists, will contribute to the establishment and refinement of the analytics-driven business, Gartner argues.

Citizen data scientists will likewise free up data scientists to focus on the hard stuff: problems that cannot be automated or accelerated, such as the research, discovery, and refinement needed to put advanced analytical insights into production. "Access to data science is currently uneven; due to lack of resources and complexity not all organizations will be able [to] leverage it," Tapadinhas said. "For some organizations, citizen data science will therefore be a simpler and quicker solution -- their best path to advanced analytics."
