
Q&A: When to Add a Data Scientist to Your Predictive Analytics Team

A veteran industry consultant and mentor discusses how to tell when your predictive analytics team is ready for a data scientist.

When is the right time to bring a data scientist on board to lead your predictive analytics effort? In this second part of a two-part interview, industry veteran, consultant, and trainer Keith McCormick of The Modeling Agency says that "at takeoff and landing is when you really want the most experienced person ... because at first you want to decide [whether] this project even makes sense, and if so, what type of project are we taking on? What's the scale of effort? The most experienced person in the room is going to be able to give you the answers to that." At deployment, McCormick continues, "organizational resistance can come back. At that point, again, you really want an experienced person."

The Modeling Agency provides training, consulting services, and mentoring on data mining, predictive modeling, and analytics. McCormick has served as a keynote speaker and moderator at international conferences focused on both analytic practitioners and leadership. Since 1990, he has been designing and developing advanced analytic solutions involving structured, text, and big data analytics, using both popular commercial solutions and open source tools. McCormick spoke at a recent TDWI Webinar, Your First Hire in Predictive Analytics (Hint: It's Not a Data Scientist).

For additional thoughts from McCormick on becoming a data scientist, TDWI members can read our interview in the Volume 20, Number 3 issue of TDWI's Business Intelligence Journal.

BI This Week: Let's talk about predictive analytics teams and their need for data scientists. Who do we need on a good predictive analytics team? You've noted that teams often don't need a data scientist initially, so who should be on that team?

Keith McCormick: You don't need the data scientist full time that first year. You really need that expert resource for only a few weeks that first year while you're getting people acclimated.

Instead, you need someone in a data steward/data prep role, for example. The data steward is the one who knows -- as one client said to me -- where the bodies are buried. That's the person who knows that if there's a variable, it's in the math table. "Oh, we didn't used to measure that variable, but we started measuring it three months ago." It's the person in the hospital system who knows the ethics system, who knows the healthcare database backwards and forwards. It's the person who can tell the analyst whether the data exists and where to find it.

Then there's the data prep person who is manipulating and changing the data -- the person doing all kinds of SQL queries or doing that same manipulation in commercial packages. Most organizations have some of that kind of talent already.

Then there's the modeler. The modeler could be from the business side or the IT side -- maybe that person we discussed earlier who raised a hand and offered to learn new skills -- and you always need that architecture role.

You're going to need that experienced data scientist, but as I said, you may want to borrow them in the first year because if you bring them in too early and they are sitting on their hands, one, you're not getting full benefit out of them, and two, they may get frustrated. ... They want to be in the game, too. They want to be doing exciting projects. When you really need them is when you have several models for several projects going on for them to work on.

By "borrow" do you mean borrow from somewhere else in the company?

No, probably borrow from the outside. You'd be looking at an independent consultant resource, possibly. I think it's very helpful to have someone sit down with you to help plan your attack. Depending on who that person is, it may be the kind of person you're trying to bring in full time down the road. It's somebody who has been through a project before.

How do you structure the team so you're making the best use of your shiny new data scientist? How do you make sure you're making the best use of his or her time and talents? How do you make sure they're not doing something that someone less costly could be doing?

I remember working with a client, a Fortune 50 chemical company. When I was there in a consulting capacity over the course of a year, I must have gone through that "assess" stage that we talked about earlier at least eight, 10, maybe 12 times. In most cases, those projects were never undertaken because a well-done assessment stage will oftentimes tell you: not now, or this isn't the right project for us. That's when you really want that experienced person to lead that assess stage.

A metaphor I sometimes use is that of a pilot of a large airplane. Think of the pilot a few years ago who managed to land a crippled plane on the Hudson. That was a result of deep experience, among other things. That emphasizes the fact that at takeoff and landing is when you really need the most experienced person. At project assessment and at deployment is also when those soft skills become the most important. Initially, you need to decide if the project even makes sense, and if so, what type of project you are willing to take on. What's the scale of effort? The most experienced person in the room is going to be able to give you the answers to that.

At deployment is when organizational resistance can come back. At that point, again, you really want an experienced person.

I think you're going to know that you have a need for a full-time data scientist when they are spending 30 to 40 percent of their time doing project assessment and project genesis activities. They're kicking things off as well as getting mature projects into deployment. That frequently involves going back to the senior VPs and the C suite again.

If 80 percent of the team's time is assessment, modeling, and deployment, now you have that data scientist doing what data scientists should be doing. Of course, they're overseeing data prep, they're hiring, they're training -- they're doing all of those things -- but they are focused on assessment, modeling, and deployment.

What are some of the big mistakes you see companies making in terms of when to bring in a data scientist and how to structure the team?

For one thing, I think too many companies rush to modeling. That's a real mistake. Hiring a data scientist too early is more of a symptom. They rush to modeling because that's the sexy phase.

Maybe they get the software first -- which isn't always a bad thing -- and they try it on their own. Then they feel a little stuck. At that point, they've skipped all the earlier phases. They haven't done a good assessment, they haven't done good planning, they haven't done a good prepare. They leaped right into modeling and they're stuck. They think they're stuck around technical issues so they jump on LinkedIn and they find a data specialist for their particular software or tool, and the starting premise is, help me fix my model.

Then, in my experience, a lot of consultants don't do the right thing at this moment. They say, "No problem. I'm an expert on software project Z. I will help you with your model." Instead, they should go all the way back to the beginning. The company has a bad experience and thinks, "Wow, we don't want to go through that bad experience again. We better bring in an expert from the outside." It becomes a vicious cycle.

What companies really need to do is spend more time on that assess stage. It's well worth the time, believe me, to figure out why they are doing a project and what they are trying to accomplish before going any further.
