Data Scientist Skills for Success
With the increasing demand for data science professionals, what skills are necessary to succeed in our new digital world?
- By Devavrat Shah
- February 14, 2017
Click on the "Jobs" tab on the LinkedIn website and search for the term "data scientist." As of mid-January 2017, you'll find more than 8,600 available jobs. Clearly, there's a strong and growing opportunity for careers in the field of data science.
Many of these jobs are with companies that provide technology-focused products or services, including consulting firms and big industry players such as Amazon Web Services, Juniper Networks, and Verizon. Other positions are with companies that may not immediately leap to mind as being in the market for a data scientist.
What Data Scientists Do
Consider C&S Wholesale Grocers, Inc. in Keene, NH. The company, founded in 1918, says it supplies more than 6,000 stores with over 150,000 products. Like so many enterprises, it wants to use data science as a competitive advantage. In this case, the company seeks better inventory management for the $25 billion worth of goods that flow through the company each year. Its job posting for a lead data scientist reads, in part:
In this role, you will provide guidance, insights, and statistics to the Demand Planning and Procurement teams. You will build, test, and automate algorithms and processes, while building the right feedback loops with different stakeholders for humans to add value where needed. You will be a key player in the development of the best-in-class demand-planning process. The skills and experiences you will gain will provide career advancement opportunities, both as a data scientist and as a manager.
On the company's own website, the job posting falls under the "Procurement" section, not IT, perhaps because of data scientists' ability to affect sales through accurate models and predictions. Regardless, it's an indication of how data science is spreading into many aspects of business and becoming a multidisciplinary field.
The challenge is that organizations don't just need data scientists -- they need highly skilled data scientists. They need professionals, trained to work with modern tools, who can identify appropriate methods and models and design human-friendly interfaces. If the next generation of data scientists wants to take advantage of the ample opportunities available -- and successfully extract business value from big data -- they will need to build three things:
- A sensing platform that gathers or collects the right data
- Infrastructure that stores the data and provides the ability to perform computation at scale
- An information extraction and decision-making system that uses statistical and machine-learning approaches to extract information from data and make meaningful decisions
The third component is perhaps the most important. Over the past few decades, we have built infrastructure that can sort and process massive amounts of data. However, we still lack the critical ability to stitch together all the various pieces to deliver meaningful insight. With more data being collected than ever before, extracting value from this data is only going to become more intricate and demanding. That means that in addition to foundational programming skills, data scientists of the future will need a new level of skill in statistics and probability, which is essential for using machine-learning techniques.
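To make the three components concrete, here is a minimal Python sketch of how they might fit together. It is an illustration only, not a description of any company's system: the function names (collect, store, extract), the weekly sales figures, and the two-standard-deviation threshold are all assumptions made up for this example.

```python
from statistics import mean, stdev


def collect():
    """Sensing platform (stand-in): return raw weekly unit sales.

    The numbers are invented for illustration; week 7 is a deliberate outlier.
    """
    return [120, 125, 118, 130, 122, 127, 240, 124]


def store(records):
    """Infrastructure (stand-in): index each observation by week number."""
    return {week: units for week, units in enumerate(records, start=1)}


def extract(table, z_threshold=2.0):
    """Information extraction: flag unusual weeks, then forecast demand.

    A week is flagged when its sales differ from the overall mean by more
    than z_threshold sample standard deviations. The forecast is the plain
    average of the weeks that were not flagged.
    """
    values = list(table.values())
    mu = mean(values)
    sigma = stdev(values)
    anomalies = [
        week for week, units in table.items()
        if abs(units - mu) > z_threshold * sigma
    ]
    typical = [units for week, units in table.items() if week not in anomalies]
    forecast = mean(typical)
    return anomalies, forecast


if __name__ == "__main__":
    anomalies, forecast = extract(store(collect()))
    print("Weeks to review:", anomalies)              # [7]
    print("Baseline weekly demand:", round(forecast, 1))  # 123.7
```

Even in a toy like this, the first two functions are simple plumbing. The judgment lies in the third: what counts as unusual, and which weeks should feed the forecast.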
Choosing Your Skills
How do professionals determine where to prioritize their focus given these new technologies and advances? An effective approach is to study the state of the practice. Learn what is happening at top companies such as Amazon, Google, and Netflix. How are these modern consumer-facing companies able to process data at scale to extract the meaningful information that leads to massive success?
Look at domains in other industries or areas of expertise. Are there any trends in the strategies and technologies that others have adopted? Notably, many of today's postings require programming skills in languages such as Python, Java, Scala, and C++. This suggests that strong programming proficiency, particularly in object-oriented languages, is effectively a prerequisite for modern data scientists. The key isn't knowing any single technology, model, or practice. Professionals should be well-versed in a variety of tools, perspectives, and approaches so they can identify which methods and models are most appropriate for each use case.
In addition to managing and analyzing data, data scientists need to understand business implications, communicate results, and know how data insights can be applied effectively to drive decisions. This requires considerable programming and computing knowledge as well as social and business capabilities.
The trend hasn't gone unnoticed by educators. For example, to address the skills challenges, MIT Professional Education created a novel program that brings together data science, information and decision systems, and social sciences and connects to engineering domains in a rigorous way. Topics include recommendation engines, regressions, network and graphical modeling, anomaly detection, hypothesis testing, machine learning, and big data analytics.
We have long recognized that enterprises of all sorts would need data science professionals. As the LinkedIn listings show, there's plenty of opportunity at organizations in virtually any sector you can name. They all want to extract real business value from the troves of big data now at their disposal. However, to meet the market need and take advantage of the tremendous opportunities ahead for professionals in the field, data scientists must continue to evolve.
Over the next five years, data scientists will develop the ability to utilize all sorts of data in real time. This will fuel the need for more intricate predictions and computations at scale, which will spark the emergence of new data science paradigms to satisfy the requirements of future applications.
More data will be used to drive key business decisions, enabling innovations such as deep learning that allow for accurate predictions and decision making. Further, modern applications have brought to the fore new statistical paradigms such as recommendation systems that are key enablers for many modern businesses, be they media portals, e-commerce portals, or social interaction platforms.
Regardless of how things evolve, one thing is clear: skilled data scientists, statisticians, and business analysts will be the key to unlocking the endless possibilities of big data. Ongoing education can help them acquire the range of skills necessary for their new role as they become more central to business success.
About the Author
Devavrat Shah, codirector of the Data Science: Data to Insights course, is a professor in MIT’s Department of Electrical Engineering and Computer Science, director of MIT’s Statistics and Data Science Center (SDSC), and a core faculty member at the MIT Institute for Data, Systems, and Society (IDSS). He is also a member of MIT’s Laboratory for Information and Decision Systems (LIDS) and the Operations Research Center (ORC). The online course is open to any data science professional wishing to learn how to apply data science techniques to more effectively address an organization’s needs.