Q&A: 5 Essential Skills to Look For in a Data Scientist
The huge interest in big data has brought about a corresponding swell of interest in analytic skills -- and the title data scientist.
By Linda L. Briggs
October 1, 2013
The huge interest in big data has brought about a corresponding swell of interest in the skills necessary for analytics, predictive modeling, and the like. BI This Week talks with Teradata's chief analytics officer, Bill Franks, whose book Taming the Big Data Tidal Wave examines the role of data scientists in depth. In the first half of a two-part interview, Franks discusses why the term is drawing so much attention, and what skills -- some of them surprising -- make up a good data scientist.
BI This Week: Why all the buzz around the data scientist title?
Bill Franks: The popularity of the data scientist concept is very much tied to the huge popularity of the term big data. With both, it goes beyond what the terms themselves mean: the trend of data and analytics is starting to permeate businesses more and more. The rise of big data, and of the data scientist along with it, has somehow focused everyone on this. Data and analytics have been there for a while but are becoming more and more important. We're witnessing the arrival of that trend into full view -- it's been bubbling under the surface for the last few years.
Has the role of a data scientist changed, or are we just using a new term for a position that's been around for a while?
That's a good question, one that can stimulate some debate.
Some think that the data scientist position is something completely new. I feel strongly that if you dive under the hood and look at what data scientists are actually doing with analytics, and why, and what their companies are doing with those analytics, it's very similar to what analytics professionals in the business world have been doing for a long time.
In the big-business environment where Teradata lives, there have been people for years doing exactly what data scientists attempt to do: get all the company's data -- some of which it may already understand, much of which it doesn't -- and do some kind of exploratory, interactive, iterative analysis to uncover the "big bang" finding, and then help to get that finding implemented.
The way people are defining a data scientist is fine, and it's certainly a necessary skill set to have on board. I just don't think it's something new. It's a title that has been attached to a skill set that people like me have been recruiting for a long time.
The one big difference between a data scientist and a traditional analytics person tends to be the tools and technologies they're using to do the analysis. While someone like me might largely be using tools on relational databases, a data scientist might be much more likely to be using Hadoop, R, or Java, for example. The fact that we're using a different programming language on a different data management environment doesn't change that we're really doing the same things.
What sorts of skills make up a good data scientist or analytics professional? I know you spent an entire chapter on this in your recent book, Taming the Big Data Tidal Wave. Can you summarize that discussion?
In the book, I use the term "necessary but not sufficient" in referring to skills -- that comes from a mathematical proof class I took in college. The premise is, "Having skills in statistics, math, and programming is certainly necessary to be a great analytic professional, but they are not sufficient to make a person a great analytic professional."
To put it another way, you absolutely have to have the technical skills, but what really makes someone stand out as a statistician or a predictive modeler -- and I think the same thing applies to a data scientist -- is a set of additional skills. Other people have their versions, but there are five skills I look for.
Assuming someone has the technical skills, then, what are those five additional skills?
First of all, there's commitment. Analytics is already hard in general, and when you're on the cutting edge of using new data in new ways to solve problems the company hasn't solved before, it's going to involve late nights, frustration, and dead ends. You really need a certain level of commitment.
Next is creativity. This is the skill I talk about as being one of the biggest filters for people who have the right technical requirements. I'd say 15 percent of people at most actually pass my creativity filter during the hiring process.
Why is creativity so important?
Outsiders think data scientists are out there programming and running models -- all very by-the-book. The fact is, in school it might be by-the-book, but in a business environment, there are so many unknowns -- the data is not as clean as it should be, for one thing. It takes lots of creativity to figure out how to take this non-perfect set of data, analytics, methods, and business problems you've been given and make something really useful out of it, so creativity is a big one.
Then there's the importance of business savvy. I've never heard anyone discuss a data science profile without talking about understanding the business. Again, it's critical to have the person running the analysis fully understand -- and be interested in -- why this question is being asked, what the business person would do given the results, and why they would make that decision.
What happens without business savvy?
Here's the example I like to use: I could prove to you exhaustively that if you move your huge department store a half mile down the road, the sales would go up by 10 percent. Pragmatically speaking, no one in his right mind is going to move an entire department store that distance. It's one of these findings that mathematically or statistically is interesting, but it's simply not tied to business reality, so it really doesn't matter.
Then there are presentation and communication skills. ... At the end of the day, I can uncover a billion-dollar opportunity for an organization. If I consider my job done at that point, by simply going to someone and saying, "Here's a billion-dollar opportunity; let me know what happens," I've helped them achieve nothing. All they have is some potential.
Being able to deliver the information effectively is important as well, then?
Not just that. A huge part of being successful in this kind of role is following through: working the politics, the internal selling; getting people to listen and convincing them to take action. That's all part of the job.
The final skill on my list is intuition. I think of it as one of those intangibles; some people just have the ability to make the right decisions quickly and effectively, and to adjust when they make a wrong decision. It's really hard to put your finger on that one until you see somebody in action. I have some interview filters for the other skills, but intuition seems to come from some combination of creativity and experience.
That's an intimidating list. I assume it's fairly rare to find somebody with all of those skills?
Sure. However, we get into some interesting concepts at this point. Someone on LinkedIn had a good illustration. He took my five terms and made a pie chart with five equal-sized pie slices. Now, if you're going to be a world-class organization, you're going to have at least one person who covers the whole pie, maybe helping lead the team and be the main driver of the effort. As you build out a team, it's critical that the team as a whole covers the pie; not everyone on the team has to cover the whole pie.
For example, I can have someone who totally gets the business. He or she is really good at analytics but is not good at presenting and does not like to do it. That's OK if I pair up that person with somebody who maybe isn't as strong in analytics but can understand the results and is completely comfortable going out and selling them to the executives. You pair up those two people and you have everything covered.
The key is to have a team that covers all the bases -- and I prefer that each person covers several bases. Accepting that almost no one has all the skills in one package is one way to keep your costs down and be more realistic in your hiring.
[Editor's note: In the second half of our interview, available next week, Franks discusses how to identify potential data scientists within your own company and the business value they can bring.]