Q&A: When is Your Predictive Analytics Team Ready for a Data Scientist?

A veteran industry consultant and mentor suggests building your predictive analytics team first, then bringing in a data scientist.

Data scientists are scarce, valuable, and costly, as you probably know if you've tried to hire one. In this two-part interview with industry veteran Keith McCormick, we explore the tremendous interest around the data scientist title and skill set, including how to structure your predictive analytics team to make best use of a data scientist, what skills to look for, and how to determine when it's time to hire (or bring on someone from within).

For additional thoughts from McCormick on becoming a data scientist, TDWI members can read our interview in the Volume 20, Number 3 issue of TDWI's Business Intelligence Journal.

McCormick is a senior consultant, mentor, and trainer with The Modeling Agency, which provides training, consulting services, and mentoring on data mining, predictive modeling, and analytics. McCormick has served as a keynote speaker and moderator at international conferences focused on both analytic practitioners and leadership. Since 1990, he has been designing and developing advanced analytic solutions involving structured, text, and big data analytics using both popular commercial solutions and open source tools.

McCormick spoke at a recent TDWI Webinar, Your First Hire in Predictive Analytics (Hint: It's Not a Data Scientist).

In this first part of a two-part interview, he discusses the role of a data scientist on a predictive analytics team and how to save money by building the analytics team first, then bringing in a data scientist.

BI This Week: As you pointed out in a recent TDWI Webinar, the term "data scientist" has been around for a while. Why is there such focus now on finding and hiring data scientists?

Keith McCormick: What's happening is -- to get a little colorful with my metaphors -- everyone wants to just add water and have a data science practice. Companies seem to think that if they can manage to attract that person with the most experience, the longest resume, the most letters after their name, a Ph.D.-type person -- if they just make that hire, everything else will fall into place.

But like most things in business transformation, it's just not that easy.

In my experience, if you're going to attract and keep a data scientist and have them be at their most useful throughout their employ, you have to do some preparation before they arrive. They can't be Step One in your business transformation. The first step is starting to build the human resources around what might eventually become a predictive analytics team.

What's happening is that some organizations are a bit intimidated by the most technical aspects of [predictive analytics]. They think, "Oh my gosh, I don't even know how to interview this kind of person. I'm just going to search for people on LinkedIn, and if they have the skill, let's just bite the bullet and get the most experienced, most titled person we can, and they'll help us make all these other decisions."

They know that they need somebody who has some of these skills -- and they probably do -- but they fall into the trap of thinking that they have to compete for the top person. That competition, of course, is driving salaries up, but a Ph.D. in statistics is not always also a good IT manager. A Ph.D. in stats doesn't necessarily know Hadoop. Those are just examples, but you can see where I'm going here. Getting somebody with the longest technical resume isn't necessarily the same as getting someone who can successfully lead your business transformation.

When we talk about a data scientist, are we always talking about hiring someone to lead a predictive analytics effort?

From my point of view, yes. Data mining as a phrase is somewhat out of vogue. For a decade, I called myself a data miner, and I'm starting to do that less. [Part of the reason is that] people talk about data mining as something they can do in an afternoon in Tableau. I love Tableau, and I love visualization tools in general, but the kinds of things I'm talking about here can't be done in an afternoon.

For me, predictive analytics always involves prediction. It always involves deployment; the predictive model gets inserted into the business practice. It's not just some insights about our best customers using X, Y, and Z, but rather something much more complex and nuanced.

You said in a TDWI Webinar recently that it could be one, two, even three years before an analytics team is ready for a senior data scientist to come on board and lead it.

I think so. When I put that presentation together, I was thinking Year Two. When I bounced the presentation off a couple of trusted colleagues, they said, "Really? The second year? Will the team really be ready then?"

So let's say this: That senior person, the data scientist on whom you're going to spend a substantial amount of salary dollars, is most helpful when you have several analytic models at different stages of development. Maybe you can borrow an experienced person, either from elsewhere in the company or from a consultancy, for just a week or two to help you plan a project. Then for weeks after that planning session, it might be IT team members simply engaging in data preparation.

Generally, I've seen organizations underestimate the amount of transformation that the data has to go through before it's ready for the analytic models. Remember, with predictive analytics, the data is being designed for a completely different situation than the reporting function it was originally intended for.

Can companies save money by not hiring an expensive data scientist right away and instead focus on areas such as data transformation?

I call it the "rent-a-unicorn" proposition. Sometimes, borrowing a unicorn is less expensive than hiring one full-time initially, but it's not just about expense. It's about having the right environment for them to work in.

By the time you need a senior data scientist full-time, you probably have at least one project behind you. You probably have one or two projects that are ongoing, and you're probably maintaining and monitoring those deployed projects. You now have the machinery of a team going and you're finally ready to use that senior person full-time.

Otherwise, you're going to bring them in -- particularly if your budget only allows for one hire, which happens all the time -- and hiring managers and HR people are probably going to be surprised by how much they have to pay for that person. You end up hoping that one person can show tremendous value before you go back to management, and to HR, and start building a team around them.

That's not very realistic because for the first few months, or maybe the first year, that data scientist that you hired to run the team is the team, in fact, and they are doing everything. That's stressful and difficult. They have to be the analytics evangelist as well as cook and bottle-washer, and the data cleaner, and they're coding. That's a lot to put on someone's plate. In addition, they may be a part-time HR person, because they're also interviewing to fill additional positions on the team. I've seen it enough to know that it creates a difficult environment for that person.

You mentioned another downside to this approach in the Webinar -- that hiring from within doesn't tend to happen when the data scientist comes from outside and starts the team from scratch.

Yes. The scenario I just described means that the company is probably hiring people from outside. That means you have a team leader -- an entire team, in fact -- that's new to the business. That makes for a tough first year in analytics.

Within the company, there are often people who understand the data really well. What about that strategy -- hiring and training someone from within the company to lead the analytics team?

One gentleman e-mailed me after the Webinar asking how he can get into this area. He is a 15-year veteran of IT, with numerous certifications. If someone like that raises a hand and says, "Sure, I'll go to a three-day boot camp. I'll learn R. I'm excited by this," you really need to recognize how amazing it is. Just think about how much experience that person can bring. Maybe they haven't been in a management role, but they're probably going to be comfortable lobbying for something they're passionate about, even with the most senior members of their company.

If you spot someone like that in your company, start them out part-time in this new role. Make participation in the analytics project part of their responsibilities. Analytics and IT are often aligned, and often have the same boss, so go ahead and let them be half-time on a project for 10 or 12 weeks. Then, if it's hard for all the members of the team to get their work done because they have so many additional responsibilities, you hire the most junior member of the team -- instead of the most senior member -- from the outside. That's so much easier than bringing a senior person in on top of them at first. Instead, let them get through that first project.

How do you know that you're at that point in the process when you're ready for a data scientist?

Organizations differ in how long it takes them to complete a project. Some organizations will start that first analytics project and finish it successfully out of the gate, but many more companies tend to stumble the first time out.

You're not really ready for an experienced team leader, at least full time, until you've survived the first project. Even if you involve a data scientist only part time during that first project -- and that's what I recommend -- I don't think it's something you want to jump into.

In fact, I emphasized this in the Webinar -- you need to have someone in your Rolodex that you can trust, who can lead that analytics team the first time out. Now, that could be someone that the vendor recommends (if you're using a commercial package for analytics) or it could be a data scientist in your area with a consulting practice. There are several options.

What skills should companies focus on in hiring a data scientist?

Without a doubt, companies definitely underestimate the importance of soft skills. Every project that I've ever been on goes through a period of organizational resistance. You think, "Wow, everybody wants this technology. Everybody wants data science. Why would there be any organizational resistance?" Well, what you're doing is inserting new pieces of information into the predictive model that's going to drive business decisions, and if someone is already making that business decision now, their role is going to evolve.

Here's a classic example: a colleague of mine worked on a project for a big national retail chain. The issue was what to stock at the store level. Regional managers were making that decision. The idea that stock was going to be recommended by a predictive model based on data -- of course managers still had discretion in individual stores in their own areas -- made them very uncomfortable at first.

Like all good modelers, my colleague did a dress rehearsal. He implemented the model at five percent of stores, and it was so incredibly successful that all the stores then wanted to line up to be a part of it. They had all pushed back at first, and that is to be expected.

If you bring somebody in, let's say because of their technical background or their statistics background, there may not be a lot in their training that prepares them for managing organizational resistance. Most MBAs, on the other hand, may have experience with case studies and business transformation, but they probably don't know the inner workings of a neural net. So often, though, the soft skills are underestimated because of the tremendous focus on technical skills.

Most of the other elements fall into place. People know there's a computer science element to it. They know that although the team lead probably doesn't have to be a programmer themselves, they better "feel" the programmer talk enough that the programmers on their team respect them. You look for that mix of programming, computer science, statistics, and the soft skills.

That sounds like a tough combination to find in one individual.

That's why we use the term unicorn. How many statisticians have experience being in front of five members of the C-suite? Once you've decided that you're going to deploy this model, and it's going to drive, let's say, all of your outbound sales team -- well, the next thing you know, you're in front of the chief marketing officer or the COO making your case. It happens all the time.

You want to hire someone who's very comfortable in that environment. You don't want somebody who says, "Just let me do my job. I want to go off and write my technical models and not lobby and deal with the politics."

Being a data scientist leading an analytics team is a much more political job than people realize, because again, it's about business transformation.

Hiring with a focus on soft skills is often the opposite of what companies do, right?

That's right. The challenge for companies is, you need someone you can call on who is a bit of an advisor. If you're using a commercial software package, from IBM or SAS, for example, you can be guided through that first project [by someone provided or recommended by the vendor]. Often, with that first project, the commercial vendor really wants you to have a success story, so that can work.

You do need someone to help guide you. Let me be clear -- the junior resources on their own need someone to mentor them, so someone recommended by the vendor is an option, and consulting is another option. In the context of a 200- or 300-hour project, which is not at all unusual with analytics, internal resources can do two-thirds or three-quarters of the work.

Speaking of that guide or mentor who comes in before you have a data scientist on board, is this typically someone who sits on the business side, or on the IT side?

A lot of the work is IT, but a lot of the folks who are going to have experience with a project like this from start to finish, and at that strategic level, probably are coming from the business side or they are that unicorn that you have in your Rolodex -- a consultant that you're not hiring but whose expertise you have limited access to.

For instance, this year I'll probably play some kind of mentoring role in as many as six to eight projects. If I were a full-time project lead, I might only do two projects this year, but in a mentoring capacity I can do three or four times as many. That's because I'm only inserting myself when I'm most needed -- during the planning phases, then showing up during modeling week, that kind of thing. During the data preparation phase, I can be available to someone I'm mentoring for just a few hours a week.

In that mentoring role, are you assisting more with technology questions or soft skill questions?

That's a great question. The answer is both, but at different phases, as I've discovered over many years of using this approach at The Modeling Agency. At the beginning is [an assessment phase]. It's not a technology phase. The assessment phase is literally that -- we ask a lot of questions, interview team members, and collect information. During that phase, it's really all about soft skills. Of course, if you're a project veteran, you're asking technical questions, but it's really all about the soft skills.

For instance, I just went through that phase this week with a client. I was primarily working with a hospital analyst -- so on the business side, not IT -- but over the course of a day and a half, in a conference room, there were eight to 10 middle-management folks from the hospital, all from different departments, including both business and IT folks. It was an incredibly disparate group. Over that day and a half, the whole idea was to say, "What projects are you contemplating?" By asking questions and discussing, project definitions completely transformed. With my help, they turned problems into projects that can be solved using actual data. We work to translate them from business problems into predictive analytics problems. That translation process takes a lot of experience, and it definitely calls for soft skills.
