Building Transition Plans into Your Data Science Team
A transition plan built into your workflow can make personnel turnover smooth.
- By Ryan Bower
- July 16, 2018
I recently had a conversation with a coworker and fellow data scientist. I quipped that as data scientists, we can't be both unhappy and underpaid -- there are just too many recruiters knocking on our doors to let that happen. I don't know if he didn't enjoy my company or if he was underpaid, but within a few weeks, he was gone.
Quick turnover creates problems in any industry, but with knowledge workers, the issue is especially severe. If we assume that data scientists will continue to be in demand, their subsequent meandering career paths create a continuity problem. How do we maintain progress and knowledge when the people doing the work are constantly in flux?
Sharing and Showing Your Work
There are, of course, a few easy steps that ensure access to project work is shared across teams. Most data science languages now allow a project to be packaged up and passed from analyst to analyst. Once the code is packaged, it can be stored in a repository where everyone can access it. The data need not "live" on any single machine, either: analysts can reference it from a shared database or a cloud tool.
The sharing of code and data is only a small piece of the work that needs to be done to ensure continuity, however. At Elicit, for example, we also strive to ensure all of our data science work is reproducible. This means that a project or script that works on one data scientist's machine should produce the exact same results for everybody else on the team. Even the process of munging and prepping data can be structured in such a way that the data remains in a shared environment.
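One way to make "the exact same results" checkable is to fix every source of randomness and compare a fingerprint of the output between teammates. The following is a minimal Python sketch of that idea, not a description of Elicit's actual tooling; the function names, the seed, and the toy analysis are all hypothetical.

```python
import hashlib
import json
import random


def run_analysis(seed: int = 42) -> dict:
    """Toy stand-in (hypothetical) for an analysis step that uses randomness."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    sample = [rng.gauss(100.0, 15.0) for _ in range(1000)]
    mean = sum(sample) / len(sample)
    # Round so that tiny floating-point differences do not change the hash.
    return {"n": len(sample), "mean": round(mean, 6)}


def fingerprint(result: dict) -> str:
    """Hash a canonical JSON form of the result so two runs can be compared."""
    canonical = json.dumps(result, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = fingerprint(run_analysis())
    second = fingerprint(run_analysis())
    # Two runs with the same seed should agree; a teammate can compare
    # the printed fingerprint against their own run.
    print(first)
    print(first == second)
```

If two data scientists get different fingerprints from the same code and data, something in the environment (package versions, unseeded randomness, a local copy of the data) has drifted and is worth tracking down before the work changes hands.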
Having set up an environment in which we expect data scientists to share all work products, the next expectation is that each project should be something that can be understood by other data scientists. This means going beyond simply adding a few comments to the code or applying a coding standard. It starts at a much deeper level: documenting the data that goes into the analysis.
Documenting Data Prep
Depending on what you read, you will hear that 60 to 90 percent of a data scientist's time is spent cleaning and preparing data. Data prep is a tedious, thankless task. Most analysts want to finish data prep and move on as quickly as possible. Typically, the goal of data prep is just to finish data prep. We have found better results, however, from focusing on a different goal.
Our data prep now focuses on building a reusable code book that describes the data set(s) that have been created. This document is not just a reusable piece of documentation for other data scientists, though. It is also a useful document for having a conversation with both IT and business users to ensure that the data set is accurate and the assumptions make sense.
At the end of data prep, we have a clear deliverable and buy-in from outside stakeholders. We also use this time to check in with a technical reviewer. The purpose of having a technical reviewer is twofold. First, the technical reviewer ensures that the code and analysis are correct and as error-free as possible. Second, and just as important, the technical reviewer confirms that they understand the analysis, can run it on their machine, and could step in and complete the work if it were interrupted.
These touchpoints do not stop when the data cleaning portion of the work is done, either. Data scientists should not be their own devil's advocate. Setting up a bi-weekly handoff ensures that the work is regularly reviewed and deeply understood by at least two people. This shared, deep understanding provides continuity within the team and yields a better work product.
To summarize, the work should be stored online, it should be executable by anyone within the organization, and there should always be a pair of data scientists who deeply understand any analysis. It still stings each time a data scientist decides to pursue greener pastures. After all, it's hard not to form a bond with your teammates. Now, in the event that we have to say "goodbye," and subsequently "hello," at least we have a transition plan already built into our workflow that makes the handover as smooth as possible.
Ryan Bower is a senior data scientist at Elicit. With experience studying customer behavior in travel, retail, finance, and gaming, Ryan was analyzing data before data science had a name. He has a master’s degree from the University of Virginia.