Six Myths About Building a Data Science Team
We bust six common misperceptions about building a strong analytics team.
By Keith McCormick
October 30, 2018
As a data scientist, you may find that building a team does not come easily. Understanding successful team structures and practicing team management (which relies on interpersonal skills) are far different from learning analytics skills. Even worse, there are half a dozen myths about data science teams that can derail your team's success.
Here are six common myths you may have heard (and may believe) about building an effective data science team -- from what it is to who works on it to what they know -- and the facts you need to be successful.
Myth: It's all about technology
Reality: It's actually more about process
We all have memories of first encountering chess. The rules are not overly complex, but they aren't trivial either. When we learn how to set up the board and how each piece has unique characteristics, we feel that we've learned something important and memorable.
For many junior data scientists, mastering the grammar of a programming language or the intricacies of an algorithm gives them that same sense of initial confidence. The problem is that it does almost nothing to prepare them for a real-world project. Learning algorithms and languages no more prepares one for real work than merely learning the rules of chess prepares one to play an actual game against a human competitor.
Analytics isn't math. It's using math-based techniques to improve the operations of human organizations, and there are two ways to get good at it: lots of trial-and-error practice or borrowing from the experience of others. Something special happens after an analytics professional has led about a dozen projects, but it takes decades to get there.
All data scientists should dedicate a substantial part of their professional development time to learning how to lead these complex projects -- step by step and from start to finish. Their managers should insist on it and HR should look for it.
Myth: The team will manage itself
Reality: Analytics leadership is key
Another myth -- one related to our fascination with technology -- is that data scientists can work as independent specialists virtually untethered to the rest of the organization. The belief, particularly for nontechnical managers, sounds like, "I'll simply pass on the request. They'll know what to do. I'll just enforce deadlines and get them the resources they ask for."
This doesn't work for at least two reasons.
- In this scenario no one is taking responsibility for properly translating the business problem into an analytics challenge. If the team simply accepts management requests uncritically, they might take on a project that has little hope of success.
- No one is in a position to ask the tough questions -- the most important being whether the project should be done at all. Does it have a good chance at a substantial ROI?
A good analytics manager provides a knowledgeable interface between the data scientists and the rest of the organization, protecting their team from poorly conceived projects.
Myth: It's all about the vertical
Reality: Every team needs a generalist
On about half of my consulting engagements I'm asked if I have highly specific vertical industry experience. For example, I'm asked how many mortgage default models I've built for midsize regional banks. It can be a frustrating question because the potential client might feel that any effort to diminish the importance of industry experience is just self-serving.
I've learned over the years that the best team is a mix of people with industry experience and people with modeling experience (including building models in a variety of industries). Experience with a variety of models is an advantage, not a disadvantage. Understanding how and when to leverage domain experts at specific points in the modeling process is key.
A failure to recognize this might prompt an analytics manager to hire nothing but finance and banking talent for a mortgage default model. You need that knowledge in the room, but if you overdo it you might develop groupthink. Go for the diverse team and promote collaboration -- you'll produce better results.
Myth: It's hard to find data science talent
Reality: Develop talent by promoting from within
Organizations think hiring data scientists is tough because their premise is wrong. They think that they need to hire analysts who will arrive knowing everything they need to know and won't need to be managed. When you are operating under this myth, you focus too much on compatibility with whatever technology you think you're going to employ.
Any outsider will have to learn the organization. No outsider is completely effective on Day One when working alone. Why not pair them with someone who knows the organization but needs some mentoring in the data science area?
The fact is that data science is a team sport. There are experienced team members already in-house (perhaps working in other departments) who are discouraged from applying internally because their data science resumes aren't extensive. Yet they have a valuable trait that no outsider can have on arrival: they know the data and they know the company.
By all means, fill in technical gaps by hiring from the outside, but promote from within, integrate business specialists with technical analysts, and provide training across the team. This is also the most effective way to use temporary external resources. Don't merely outsource a project. Use every project as a mentoring and enablement opportunity for the overall cross-disciplinary team.
Myth: It's all about the latest algorithms
Reality: Don't let shiny objects obscure transparency
Three groups within your organization can get distracted by the latest shiny object: those that purchase technology, those that hire, and the modelers themselves. The latest and greatest algorithms are enticing because there is no question that they sometimes produce the most accurate models. Yet, fully half of organizations need their models to be transparent -- that is, they need models that generate explicit rules, not so-called black box models that are difficult or impossible to explain.
Any industry that deals with regulations and government oversight will almost automatically fall into this category. Most companies will choose transparent or explainable models over black box models simply because they are more aligned with the project goals. However, almost all of the more recent and more powerful modeling approaches, such as XGBoost and deep learning, are black box models.
It would be a mistake to assemble a team that had a strong collective fascination with these approaches. Yet "old school" techniques such as decision trees or logistic regression are perceived as stodgy and are too often eliminated from consideration. You need a well-rounded team that chooses a modeling algorithm because it fits a sound project definition and not merely to impress their peers with their technical prowess.
Myth: The analyst is the primary role
Reality: Five different roles must collaborate purposefully
Treating the analyst as the primary role is an emerging trend that seems innocent enough but is potentially very destructive. It is an extension of the algorithm myth, but in this case, data science is seen as being all about the modelers. When operating from this premise, many organizations seek seemingly self-sufficient modelers who can work independently.
Seeking so-called unicorns who can do it all, even when working alone, seems attractive at first, but it is neither practical nor possible. Someone who spends the whole project working alone is overlooking a critical dynamic: analytics projects are not abstract investigations that are going to be presented at a science fair. They are intended to improve the efficiency of the organization.
Improving that efficiency requires knowledge of business metrics, platforms, operations, and organizational priorities. The only way to acquire this knowledge is to engage purposefully with the personnel who work in each of these areas. Working in isolation is inherently counterproductive. The key players on any project are the internal client, a subject matter expert, a modeling lead, a data steward, and a deployment lead.
To further dispel this myth, refer to my TDWI "Ask The Expert" webinar presentation, "The Roles and Construct of a Thriving Analytic Practice," available to TDWI members.
[Editor's note: You may also find this Q&A with the author -- "When is Your Predictive Analytics Team Ready for a Data Scientist?" -- a valuable resource. It is available to all readers. Keith McCormick is also a TDWI faculty member and teaches a broad range of courses offered through TDWI's Onsite Education Program, which brings tailored BI and analytics courses directly to an enterprise's conference room. Keith works directly with clients to understand their training needs, develop custom curriculum, and deliver content specifically aligned to corporate objectives. For more information, visit our library of onsite courses.]