The Seven Habits of Highly Data-Scientific Organizations
Want to be a data science star? Heed this advice from an industry veteran.
- By Stephen Swoyer
- March 17, 2015
If you want to be a data science star, check out what industry veteran and director of TDWI Research David Stodder has to say. Stodder is the author of the latest entry in TDWI's "Checklist Report" series -- in this case, "Seven Steps for Executing a Successful Data Strategy."
The first step, according to Stodder, is to determine whether data science is a requirement at all.
"Before embarking on a data science project, the first question to ask is a simple one: Do we need data science?" he writes. It's a surprisingly involved question, not least because the archetype of the data scientist has become something of a sensation: she's the latest in a litany of IT "unicorns."
The data scientist's is the "sexiest job of the 21st century," according to Tom Davenport and D.J. Patil in an article published in the Harvard Business Review. The position is "vanishing," according to many, "mythical" according to others.
The point, according to Stodder, is that you just might not be ready for data science or a data scientist, and if you aren't, it's best to be honest about it. "Users may appear content with spreadsheets, business intelligence ... applications, and the selection of structured data available through data warehouses or other IT-managed repositories. Existing reports and dashboards may seem sufficient," he points out. "[I]nvesting in data science and technology to expand the reach of analytics into more data types, including semi-structured and unstructured, may appear unjustified."
If you are a good candidate for data science, begin by putting together an effective team. This is the nub of the problem. Stodder, for example, invokes the "u" word -- "unicorn" -- in describing the highly specialized, multi-disciplinary attributes (preferred or required) of the good data scientist. Framing the issue in this way is more distracting than helpful, according to Stodder.
"Rather than focus on finding one or a few individuals who seem to be able to do it all, a wiser course is to develop a stable team that brings together the talents of multiple experts," he writes, stressing that intangibles -- such as curiosity and enthusiasm -- matter.
"The team's members must understand business drivers and not lose sight of the goal of delivering actionable business value. Each member of the team should also have enthusiasm, curiosity, and creative energy for working with business leadership on data and analytics projects."
In data science, as in virtually any other human discipline, communication is key.
A big problem with communication in any context is that one's audience might not want to hear what one has to say. This is an especially thorny issue in data science because the findings of a study or analysis might call into question (or completely upend) the operating assumptions, cherished beliefs, or even strategic priorities of key business decision-makers, Stodder writes.
"Often this is not easy, especially if the presentation of the findings calls into question executives' 'gut feel' assumptions about business strategy, strays from tightly controlled modes of BI reporting and analysis, or suggests that established processes are ineffective or outdated. Data science often points to the need for change -- and change can be difficult," he notes.
"Communication is also vital to improving collaboration in a data science project. Often, along with data scientists, key players -- such as statisticians, business analysts, data analysts, and developers -- are scattered in silos across the organization, or business and data analysts may work in a separate department than the business stakeholders, who should also be part of the data science effort."
Storytelling tools, concepts, and methods can help with communications issues, says Stodder. This is particularly true for people -- decision-makers, stakeholders, or rank-and-file information consumers -- who don't (and in many cases can't) understand the probabilistic non-determinism that's core to data science itself. "Data science thrives in an analytics culture. However, not all personnel in an organization are going to be part of data science teams, nor should they be," he writes.
"To bring more users into the analytics culture, organizations should explore technologies that can support the democratization of BI, analytics, and data discovery. These products are increasingly able to address users' self-service demands for data access and interaction without IT hand-holding."
For data science to work, it requires data. A signal innovation of data science (one that, like data science itself, is a function of technological, cultural, and economic trends) is its preference for all potentially relevant data. Put differently: Sampling, that standby of statistical analysis, data mining, and primitive data science, is so last millennium. This doesn't just entail opening up the data integration (DI) spigot in order to give data scientists raw (strictly structured) data from operational systems. There's much more involved, Stodder argues.
"Structured data can, of course, be voluminous and varied, especially when brought in from diverse applications. However, data science is often more closely associated with the desire to analyze semi- and unstructured data because these sources are growing rapidly and have been analyzed little, if at all," he writes. DI is arguably the most complex or time-consuming activity in traditional BI and analytics; in the realm of data science, it's even more critical.
"Preparing this breadth of data, assessing its quality, looking for gaps and errors, and performing exploratory analysis to determine relevant extracts are essential data science activities," Stodder indicates. "They can take up the lion's share of a data science team's time. Although tools can automate steps, data science teams need to get close to the data to properly move forward with analytics and algorithm development."
Elsewhere, Stodder writes, successful practitioners of data science must focus relentlessly on operationalizing their data scientific insights. One way to do this is by focusing on predictive analytics, which could be thought of as applying the "what?" and "why?" questions that pure data science likes to ask. This is just the beginning, he notes. "Firms must move to the next stage: to 'prescriptive' analytics, which is about producing not just predictive insights but also suggested actions. Prescriptive analytics can be useful to both humans responsible for business processes and for guiding emerging automated decision systems."
Finally, good governance is always a Very Good Thing. In the context of data science, governance has to do with the secure, responsible, and ethical use of data: it involves safeguarding data from compromise, be it from without (as in data breaches) or from within, when sensitive information -- inessential to a data scientific analysis -- is exposed or otherwise disseminated.
"Data science teams, along with business leadership, must be cognizant of the right balance between what they can achieve through advanced analysis of consumer data and what is tolerable -- and ethical -- from the public's perspective," he concludes.
Download Stodder's report, free of charge, here.