How to Grow a Data Scientist
With the advent of algorithmic business, data scientists are a critical resource for the organization of the future. Increased demand creates a tight labor market, forcing companies to look inward to develop their own data scientists.
- By Troy Hiltbrand
- October 11, 2019
Across organizations, there are individuals who seem to have a knack for data analytics. They might be members of your database team with deep expertise in SQL or members of your finance team who have mastered the art of performing amazing feats with just a spreadsheet. These are the individuals the business turns to when the answers to its questions are locked in the data.
As algorithmic business becomes the norm and organizations start to see that their future viability depends on their ability to implement advanced analytics (such as machine learning and artificial intelligence), they scramble to find resources who can help with this transformation. As they look to the market, they find that people with mature data science skills are difficult to find, costly to recruit, and hard to retain. With all of the difficulty in bringing in outside data science resources to execute your analytics transformation, the answer might be to leverage these internal data experts and nurture them into the data scientists you need.
How do you mature these resources beyond queries and spreadsheets into the data science team that will transform your business?
Developing a Plan
It is critical to have a sound understanding of what this target data scientist looks like and what the result of your efforts needs to be. Having this end state in mind, you will be able to work with these data experts to create a personalized development path to the role of data scientist.
A data scientist is responsible for the development, deployment, and management of predictive analytics models. The data science life cycle includes generating a hypothesis, extracting data (including unstructured data) from multiple sources, engineering new features from the existing data, and creating, testing, and operationalizing predictive models. The data scientist lives at the intersection of business analyst, programmer, and statistician, and needs to be versed in many domains of knowledge.
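To make the life cycle concrete for a candidate coming from spreadsheets, the steps can be sketched in a few lines of Python. This is only an illustrative sketch: the data sources (`crm`, `billing`), the engineered feature (orders per visit), and every number are invented for the example, and a one-variable least-squares line stands in for whatever model a real project would use.

```python
from statistics import mean

# Extract: hypothetical rows from two sources, keyed by customer id.
crm = {
    1: {"visits": 4, "orders": 2},
    2: {"visits": 8, "orders": 2},
    3: {"visits": 4, "orders": 3},
    4: {"visits": 5, "orders": 5},
}
billing = {1: 70.0, 2: 45.0, 3: 95.0, 4: 120.0}  # spend to predict (invented)


def build_rows(crm_rows, billing_rows):
    """Join the two sources and engineer one feature: orders per visit."""
    rows = []
    for customer_id, record in crm_rows.items():
        if customer_id in billing_rows:
            feature = record["orders"] / record["visits"]
            rows.append((feature, billing_rows[customer_id]))
    return rows


def fit_line(rows):
    """Create the model: ordinary least squares with a single feature."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def mean_abs_error(model, rows):
    """Test the model: average absolute error on rows it has not seen."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in rows)


model = fit_line(build_rows(crm, billing))

# Held-out customer (also invented) used only for testing.
holdout = build_rows({5: {"visits": 5, "orders": 2}}, {5: 62.0})
error = mean_abs_error(model, holdout)
```

Operationalizing the model, the last step in the life cycle, is deliberately left out here; it depends on the production environment rather than on the analysis itself.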
With a goal in mind, your next step is to look across the organization to identify resources who, with time and experience, can grow into that role. Those with backgrounds in physics, chemistry, biology, actuarial science, computer science, engineering, finance, economics, or mathematics often are the people with the greatest aptitude for this type of work. Those who are already involved in data analytics often rise to the shortlist of candidates, but there might be others with hidden competencies who, when given the right circumstance, could rise to the top of your list.
Because data science lives on the frontier of many other domains, the development plan to transform these resources into your data scientists of the future will vary. Work collaboratively with them to identify their path.
One of the key abilities of a data scientist is to know what questions need answers. Machine learning and artificial intelligence can be used to answer many questions, but only some of those questions will have an impact on your organization's goals.
Your data science team needs to understand the business well enough to sort through this list of questions and determine what matters. This task requires a high level of contextual thinking and business acumen. Resources pulled from the sciences or mathematics may not be skilled in the business domain at a deep level, or may never have been exposed to it. If so, engage them with business domain experts to develop their competency.
A data scientist needs to move beyond spreadsheets and databases to build algorithms that process large quantities of data and drive business objectives. Those who come from areas outside of technology, such as finance or economics, will need to develop skills that have been outside of their area of emphasis. They can take their knowledge of the logic in their spreadsheets and the macros that they have built and start to learn new languages and platforms.
The most popular languages for data scientists today include Python, R, and SQL. These languages open up libraries that allow data scientists to extract and enhance data, build and test models, and run these models in a production environment. In addition to these languages, many data scientists need to use Java, Julia, Scala, and C/C++ to create models that scale in a non-desktop environment.
In addition to development languages, there are data integration tools (e.g., Informatica, Talend, SAP), data discovery tools (e.g., Tableau, Qlik, Microsoft Power BI), and machine learning tools (e.g., RapidMiner, KNIME, Statistica) that provide interfaces for many of the steps of the data science life cycle and can be an easier entry point for non-developers.
Each individual's development plan needs to take into account where the person is today, the level of their technical experience, and their learning aptitude. Teaching a software engineer a new language is very different from teaching a financial analyst how to program from the ground up. For many individuals, graphical tools might be a better first step than being thrown into the deep end of hardcore programming.
Statistical and Mathematical Thinking
At the core of data science is statistics. Machine learning is usually not about finding the one right answer but about finding an answer that is probable enough to achieve the business goals. It is also about training the system to determine whether the best answer today is the same as the best answer yesterday or whether the underlying factors have changed enough to alter the analytics process. This is often described as a heuristic approach to problem-solving.
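The question of whether yesterday's best answer still holds can be illustrated with a basic statistical check. The sketch below is one simple possibility, not a prescribed method: it assumes a single numeric measure, compares the mean of recent observations against a baseline, and flags a change when the difference exceeds a chosen number of standard errors. The function name `has_drifted`, the threshold of 3.0, and all sample values are assumptions made for illustration.

```python
from statistics import mean, stdev


def has_drifted(baseline, recent, threshold=3.0):
    """Return True when the recent mean is more than `threshold`
    standard errors away from the baseline mean."""
    spread = stdev(baseline)
    if spread == 0:
        # No variation in the baseline: any difference counts as a change.
        return mean(recent) != mean(baseline)
    standard_error = spread / len(recent) ** 0.5
    z_score = abs(mean(recent) - mean(baseline)) / standard_error
    return z_score > threshold


# Invented sample values for a measure the model depends on.
baseline = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.0]
stable_week = [10.0, 10.1, 9.9, 10.2]
shifted_week = [11.5, 11.8, 11.2, 11.6]

stable_result = has_drifted(baseline, stable_week)
shifted_result = has_drifted(baseline, shifted_week)
```

With these values, the stable week stays within the threshold and the shifted week exceeds it, which in practice would be a cue to revisit or retrain the model.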
As a starting point, data scientists will need to understand the basics of statistics. As they grow and become more involved in deep learning and neural networks, they will also need to develop an understanding of linear algebra, tensors, and calculus.
These concepts might require some of your resources, including those from finance, accounting, and even software engineering, to approach problems differently and unlearn methodologies they have used in the past.
One characteristic that separates the best data scientists from others is an advanced level of curiosity. They are always questioning -- what if we try something a little different? This experimentation mindset is often found in scientists with a background in biology, physics, or chemistry, but is not nearly as prevalent among your computer scientists or accountants. Curiosity is something that can be acquired through practice, but the person has to have the desire to learn to think differently.
Creating a Plan
Sit down with each candidate and work through a plan to identify which parts of the data scientist role he or she has already mastered and which ones need to be developed. From there, look for activities that can help fill those gaps, such as:
- Pairing potential data scientists from across different domains to collaborate and learn from each other
- Formal education, such as data science programs through universities or organized boot camps
- Informal and online education, such as books, videos, and massive open online courses (MOOCs)
- Organization hackathons that allow experimentation around a real problem set
- Reverse mentoring, where more senior staff in a domain learn from interns or new graduates about new trends coming out of universities
- Hosting knowledge-sharing opportunities to allow people to teach one another new and insightful skills that they have acquired
By looking in-house for high-potential resources and establishing development plans, you can build your own data scientists even if you can't identify, attract, and retain these elusive resources from the external environment. Successfully developing these resources will position you and your business for growth in a new economy of algorithmic business.