TDWI Upside - Where Data Means Business

Debunking the Data Scientist Mythology

Data science is a team effort that depends on vital contributions from data engineers, data architects, business analysts, and business subject matter experts.

We don't just hold data scientists in high esteem -- we mythologize them. After all, they understand business, vertical- and domain-specific business processes, IT architecture, and coding. Data scientists know all about statistics, mathematics, numerical methods, and other, even more arcane, disciplines.

Most important, data scientists grasp analysis. They excel at decomposing problems, identifying their elements, and devising solutions. Whether it's slicing, dicing, or splicing -- data, algorithms, and intractable business problems -- the data scientist does it all.

Or do they?

Most data scientists see themselves as all too human, all too fallible, and all too dependent on the contributions of others. According to research analyst Michael Ferguson, a principal with UK-based Intelligent Business Strategies, the data scientist is just one piece of a much bigger unit: the data science team.

"What I see from a management perspective is that a lot of [data science teams] have data engineers and business analysts. We're actually preparing a large survey on this. In our sample, data engineer is actually the most dominant role [on a data science team]. There are very few data scientists. What management wants to do is to try to turn the analysts into data scientists," he says.

The Power and Danger of the Data Science Myth

Just who is mythologizing the work of the data scientist, anyway?

It's people who don't fathom vertical- and domain-specific business processes, IT architecture, coding, statistics, and mathematics. To the average, everyday person, the work the data scientist does can seem like magic.

To would-be opinion-makers, it's especially tempting to wax breathlessly about data science and its practitioners. After all, data science has captivated the business imagination, there's an acute shortage of data science talent, and readers are hungry for any information they can get about recruiting or retaining data scientists.

We're happy to oblige them, often by amplifying -- by dramatizing -- the most theatrical aspects: the complexity and esotericism of data science, the raree-show phenomenon of the data scientist position (the difficulty of finding, wooing, and retaining one), and, not least, the challenge of building a competitive data science program.

Such reporting isn't just unhelpful; it's pernicious. It distorts the role of the data scientist even as it's deaf, dumb, and blind to the concrete particulars of what it means to be a data scientist.

It likewise diminishes the contributions of the many people, in a number of different roles, who work with and enable the data scientist: from those with arcane skills -- such as data engineers or data architects -- to those with more conventional skill sets, such as business analysts or business subject matter experts. Data scientists rarely work in isolation.

Industries Take Time to Master

What's more, data scientists are rarely, if ever, complete masters of the business of which they're a part. In certain key verticals -- e.g., consumer packaged goods (CPG) and pharmaceuticals -- data scientists are uniquely dependent on contributions and guidance from subject matter experts, business analysts, and line-of-business representatives.

"In domains such as finance and pharmaceuticals and even CPG, the complexity of your products and processes are such that it's very difficult for a newcomer to grasp the business stakes. It's easier to understand a new e-commerce product or social networking technology compared to understanding the complexity of a CPG product line," says Florian Douetteau, cofounder and CEO of Dataiku, a data science start-up based in the EU.

"Because of the complexity of the pharmaceutical industry, or the complexity of financial and insurance products, you cannot expect anybody coming out of school, or [just] coming onboard [at a company], to grasp everything in these industries in six months," he continues.

The Pursuit of a Data Science Phantom?

Mythologizing and overhyping the data scientist into a kind of unicorn can also create a self-fulfilling prophecy. Data scientists are prized, pampered, and, most important, pursued out of all proportion to what they -- or any human being, perhaps -- can realistically accomplish.

The effect is a vicious cycle of job hopping, with detrimental results for employee and employer alike. "The average data scientist is heavily recruited. Headhunters are just all over the data science teams. They're being recruited and wooed with all kinds of perks," Ferguson says, citing survey data.

"As a result, the lifespan of a data scientist within a particular company can be quite short. When they leave, it creates a maintenance problem [for a company]. Who's going to pick up the [half-finished] Python programming this guy wrote? There's this parallel trend [in which] corporations are focused on reducing time to value. They're looking for tools that will rapidly improve productivity and give them better maintenance if the data scientists are here one day -- and gone the next."

Dataiku's Douetteau expands on this. "The mythical data scientist-genius mostly makes sense in start-up companies where ... someone can bring a project from one end to another in an expedited manner, from design to production. That's the start-up way," he argues.

"This approach does not really work when you reach some maturity level in large companies. Very few, if any, data scientists work alone [in this context]. It's a team effort. Because of the scarcity of data scientists, [companies are] having trouble retaining data scientists."

The Origin and Future of Data Science

"If we look at the etymology of [the term] data scientist, the word was created on the West Coast of the United States -- specifically in tech firms on the West Coast -- that basically hired Ph.D.s and developers who were really data-savvy," Douetteau explains.

"These people ultimately became 'data scientists.' Firms in the Silicon Valley, especially, possibly built most of their analytics practices by hiring data scientists and creating teams of 30, 40, or 50 people. Typically, the same thing did not happen anywhere else in the world, one reason being scarcity of resources and the ability to hire tech resources."

It's that model -- teams of data scientists working with analysts and business users, not the myth of the lone scientist-magician -- that other enterprises should be looking to emulate if they truly want to nurture data science as a valuable enterprise discipline.

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at evets@alwaysbedisrupting.com.

