TDWI Articles

4 Skills for the Aspiring Data Scientist

The role of data scientist is extremely hot in today's talent market. Here are the four skills you need to develop to fill this role.

According to The Quant Crunch: How the Demand for Data Science Skills is Disrupting the Job Market, published by IBM, the number of jobs for all U.S. data professionals will increase by 364,000 openings (to 2,720,000) by 2020. Of those analytics positions, the fastest-growing roles are data scientist and advanced analyst. Many of these positions will go unfilled due to a lack of talented people who can match the requirements.

With the expected surge in demand and the lack of supply, the question aspiring analytics professionals ask is, "How can I become a data scientist?" Although this path is not for everyone, there are four skills that you can seek out to get started.

Skill #1: Critical Thinking

Among the most important skills you must develop is critical thinking. This includes learning how to structure a problem so that it can be solved as a mathematical model. If you were one of the kids in grade school who groaned when you had to solve story problems, data science might not be your ideal career path. The job of a data scientist is to take real-world problems (which are messy to define and even messier to break into solvable components) and transform them into mathematical models that, when automated, create repeatable business processes.

Your business peers will often float "wouldn't it be amazing if" ideas about the benefit the business could gain by making better use of its information assets. A data scientist must take such a statement, describe the desired result, determine what data is needed to reach that result, and work out how that data can feed a model that runs repeatably and systematically.

Consider autonomous vehicles. The idea is "wouldn't it be amazing if cars could drive themselves?" The solution requires fitting the car with hundreds of sensors, each producing millions of data points per second, and running that data through a model to determine what the next action must be. Does the car need to go forward? Does it need to stop? Does it need to turn left or right? Each of these decisions can be framed as a quantifiable model. Being able to break those "what if" ideas down and identify how to reach the goals is the first step in becoming a data scientist.

Skill #2: A Love of Data

Data is different from information and knowledge. Data is the raw material from which information and knowledge are created through the application of analytics.

In the case of autonomous vehicles, data includes all the sensor readings: position of the car, position of other objects around the car, speed, road conditions, etc. To make the data useful, you need to transform it into a result. The data is not always clean, and it is not always in a format that can be easily fed into the model. This is an area where good data scientists excel. They know how to transform millions of data points into something that ultimately yields an answer to a business problem.

Effective data scientists have a solid understanding of what this takes. Many data scientists spend upwards of 80 percent of their time just munging the data. Even those data scientists who work on a team and have others managing and cleaning their data must still be intimately familiar with the data transformations and how those transformations may influence the final model.
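
To make "munging" concrete, here is a minimal sketch of one cleaning pass over raw sensor records, using only the Python standard library. The field names (`speed_kmh`, `distance_m`), the sample rows, and the plausibility limit are hypothetical and chosen purely for illustration; real pipelines will differ.

```python
# Hypothetical raw sensor rows: values arrive as strings, some missing or malformed.
RAW_READINGS = [
    {"speed_kmh": "42.5", "distance_m": "12.0"},
    {"speed_kmh": "", "distance_m": "11.4"},        # missing speed
    {"speed_kmh": "43.1", "distance_m": "n/a"},     # unparseable distance
    {"speed_kmh": "9999", "distance_m": "10.9"},    # implausible speed
    {"speed_kmh": " 44.0 ", "distance_m": "10.1"},  # stray whitespace
]

MAX_PLAUSIBLE_SPEED_KMH = 300.0  # assumed sanity limit, for illustration only


def parse_float(text):
    """Return text as a float, or None if it is blank or not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def clean_readings(rows):
    """Keep rows whose fields parse and pass a simple range check."""
    cleaned = []
    for row in rows:
        speed = parse_float(row["speed_kmh"])
        distance = parse_float(row["distance_m"])
        if speed is None or distance is None:
            continue  # drop rows with missing or malformed values
        if not 0.0 <= speed <= MAX_PLAUSIBLE_SPEED_KMH:
            continue  # drop rows that fail the sanity check
        cleaned.append({"speed_kmh": speed, "distance_m": distance})
    return cleaned


cleaned = clean_readings(RAW_READINGS)
print(f"kept {len(cleaned)} of {len(RAW_READINGS)} rows")
```

Dropping a row is itself a modeling decision. Filling in a missing value instead would keep more data but could bias the result, which is why the transformations need to be understood by whoever builds the final model.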

Skill #3: An Understanding of the Modes of Learning

A data scientist needs to understand the difference between supervised and unsupervised learning. You'll need to know the different statistical models available for each type and when to employ each one effectively to solve a problem. This complements the first skill -- critical thinking and the ability to break down a problem into parts that can be solved individually.

The main difference between supervised and unsupervised learning is how much is known about the result before you begin. With unsupervised learning, little is known about the result. Data is fed into the process and the result is a set of patterns that emerge from within the data. With supervised learning, the target is known at the start, for example a fixed set of categories (or a numeric value to be predicted). For categories, the result is the probability that the input data fits within each one.

One example of unsupervised learning is creating new customer segments. Suppose you want to break your customers into three groups. You don't have names for these groups; you just know that you want to profile your customers and put similar ones in the same bucket. Once this initial segmentation is complete, you can name each group based on the attributes its members have in common, but the initial analysis simply breaks them into groups.
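
The three-group segmentation can be sketched with k-means clustering, one common unsupervised technique (chosen here for illustration; the article does not prescribe a specific algorithm). The customer attributes and values below are hypothetical, and the sketch skips feature scaling, which real data with differently sized attributes would usually need.

```python
import math

# Hypothetical customers: (annual spend in dollars, orders per year).
CUSTOMERS = [
    (200.0, 2.0), (1200.0, 12.0), (5000.0, 40.0),
    (250.0, 3.0), (1100.0, 10.0), (5200.0, 45.0),
    (220.0, 2.0), (1300.0, 11.0), (4800.0, 38.0),
]


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, max_iterations=100):
    """Group points into k clusters; returns (labels, centroids)."""
    # Deterministic start: evenly spaced picks from the sorted points.
    ordered = sorted(points)
    step = len(ordered) // k
    centroids = [ordered[i * step + step // 2] for i in range(k)]
    labels = None
    for _ in range(max_iterations):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean_point(members)
    return labels, centroids


labels, centroids = k_means(CUSTOMERS, k=3)
for label, centroid in enumerate(centroids):
    size = labels.count(label)
    print(f"group {label}: {size} customers, centre {centroid}")
```

The algorithm is never told what the groups mean. On this toy data they happen to separate into low, mid, and high spenders, and naming them is a step the analyst performs afterward.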

For a similar customer segmentation problem, supervised learning has a defined target. If you have already determined which of your customers are profitable and which are unprofitable, a supervised learning problem would be to take a new customer and predict whether that customer is likely to be profitable. The model uses the attributes that profitable customers have in common to assess whether the new customer fits the same characteristics.
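
A minimal sketch of the supervised version follows, using a k-nearest-neighbors vote as one simple technique (again an illustrative choice, not a method the article names). The labeled customers and their attributes are hypothetical. The estimated probability is simply the share of the most similar known customers that were profitable.

```python
import math

# Hypothetical labeled history: ((annual spend, orders per year), was_profitable).
LABELED_CUSTOMERS = [
    ((200.0, 2.0), False), ((250.0, 3.0), False),
    ((220.0, 2.0), False), ((400.0, 4.0), False),
    ((1200.0, 12.0), True), ((1100.0, 10.0), True),
    ((1300.0, 11.0), True), ((5000.0, 40.0), True),
]


def profitable_probability(features, labeled, k=3):
    """Share of the k most similar labeled customers that were profitable."""
    by_similarity = sorted(labeled, key=lambda example: math.dist(features, example[0]))
    neighbors = by_similarity[:k]
    profitable_count = sum(1 for _, profitable in neighbors if profitable)
    return profitable_count / len(neighbors)


for new_customer in [(1150.0, 11.0), (300.0, 3.0), (700.0, 7.0)]:
    probability = profitable_probability(new_customer, LABELED_CUSTOMERS)
    print(f"{new_customer}: estimated probability of profitable = {probability:.2f}")
```

Unlike the clustering case, the categories here are fixed before any modeling begins; the only open question is which one a new customer most resembles.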

Skill #4: Speak the Business's Language

You must learn the language of business. If you secure a position as a data scientist, you will most likely be working for an enterprise. Whether that is a nonprofit organization, a government agency, or a for-profit business, you need to learn what makes the entity successful. Your success is ultimately tied to the enterprise's success. If you are not solving problems that advance its goals and fulfill its objectives, your value as a data scientist is minimized.

Many people who self-identify as potential analytics professionals get so caught up in the science and engineering inherent in the field that they forget that what will ultimately make them successful is a balance of business acumen and scientific prowess.

Mastering These Skills

The field of analytics and data science is very hot right now, with demand outpacing supply. My guidance is to focus on critical thinking, a passion for data, an understanding of the modes of learning, and business acumen. If you can master these, you are well on your way to becoming a great data scientist.

 

About the Author

Troy Hiltbrand is the senior vice president of digital product management and analytics at Partner.co, where he is responsible for its enterprise analytics and digital product strategy.

