Distributed Advanced Analytics Through Citizen Data Scientists
For analytics teams to be successful, they need to learn how to scale their human resources using citizen data scientists. This will require new thinking and new practices.
- By Troy Hiltbrand
- November 28, 2022
When you hear the term “distributed advanced analytics,” what do you imagine? Do you think of large, horizontally scaled Hadoop clusters with Spark running on them, processing complex models and generating advanced insights? What if the future of distributed analytics in 2023 were less about scaling machines horizontally and more about scaling human resources horizontally?
Since the beginning of the last decade, universities around the world have pushed to train a new generation of data scientists. They added courses to their curricula that teach R, Python, and Scala syntax and usage. They built courses on data engineering and on assembling ecosystems of open source and proprietary technology that can process big data in its various facets. This has greatly increased the supply of graduates who can think analytically, but not all of them are destined to join the analytics team.
The challenge is that as enterprises strive to be more data-driven, the demand for data and analytics resources outpaces the supply. The answer to this challenge is to adopt the citizen data scientist model. In this case, the term “citizen” really refers to anyone who is not part of the analytics organization. They can be internal or external resources with the skills needed to augment efforts to become a data-driven organization.
To accomplish this scale-out, there are three questions an organization must answer: where do we find citizen data scientists, how do we enable them, and how do we protect the organization from their potentially destructive behavior, whether intentional or unintentional?
Finding Citizen Data Scientists
When searching for citizen data scientists, you need to find individuals who are analytical by nature and have complementary skills to those of your analytics team. You are looking for resources with skills that can add a new perspective to the problems you are trying to solve.
There are many resources in other parts of your business, in scientific fields of study, or in the IT department that can provide these complementary skills (see Finding Talent on the Periphery). By incorporating these individuals into your analytics team, you add diversity of thought and new tools and techniques for solving problems.
Identifying these resources is the first step. The next step is making the opportunity compelling. You need to establish a value statement about what you are doing that appeals to these resources and makes them want to help. This could include setting an inspiring vision, establishing a competitive compensation structure, and incorporating them into your culture and team structure.
Enabling Citizen Data Scientists
When working with citizen data scientists, you will have a mix of skill sets. You could have individuals versed in different languages (e.g., R, Python, Scala, SAS) and at different levels of maturity in data preparation and data engineering capabilities. You want to start with the skills they have and build from there.
Your goal is to establish tools that provide these citizen data scientists with the appropriate data and functionality to assist in your advanced analytics efforts. This will include model development, model validation, and data visualization tools. Because these citizen data scientists do not perform these duties on a full-time basis, the user experience (UX) of these tools is of the utmost importance. The tools need to be easy and intuitive to use without sacrificing their power.
In addition to enabling these data scientists, ensure that their work is documented and verifiable. In advanced analytics, model lineage and explainability are critical to your success. Without them, your analytics results may not be reproducible and may not earn the trust of your key decision makers.
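As one illustration of what "documented and verifiable" could mean in practice, here is a minimal Python sketch of a lineage record. It assumes a model's lineage can be summarized by who built it, a fingerprint of the exact training data, and the parameters used. Every name here (`LineageRecord`, `make_record`, `is_reproducible_from`, the example model and parameter names) is hypothetical and chosen for illustration; a real governance tool would likely track much more, such as code versions and approvals.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical record describing how a model was produced."""
    model_name: str
    author: str
    data_sha256: str   # fingerprint of the exact training data used
    parameters: dict   # settings needed to rerun the training


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the data byte for byte."""
    return hashlib.sha256(data).hexdigest()


def make_record(model_name: str, author: str,
                training_data: bytes, parameters: dict) -> LineageRecord:
    """Capture lineage at training time (illustrative helper)."""
    return LineageRecord(model_name, author,
                         fingerprint(training_data), dict(parameters))


def is_reproducible_from(record: LineageRecord, candidate_data: bytes) -> bool:
    """True if candidate_data matches the data the record was built from."""
    return record.data_sha256 == fingerprint(candidate_data)


def to_json(record: LineageRecord) -> str:
    """Serialize the record so it can be stored next to the model."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    data = b"customer_id,churned\n1,0\n2,1\n"
    record = make_record("churn_model", "citizen_analyst", data, {"max_depth": 3})
    print(to_json(record))
    print(is_reproducible_from(record, data))
```

A reviewer on the analytics team could later call `is_reproducible_from` with the data set they were handed to confirm it is the same data the citizen data scientist trained on before trusting the result.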
Governing Citizen Data Scientists
With citizen data scientists, the concept of “trust but verify” is essential. You need solid controls around what they can and can’t do, but you also need to be able to trust their results and the processes they used to achieve them. This requires a new level of governance for citizen data scientists. It can be based on the governance practices you use with your internal teams but must be augmented to address the unique environment in which citizen data scientists work.
The key to governing citizen data scientists is to balance innovation and data governance (see Three Ingredients of Innovative Data Governance). Although this balance is not easy to achieve, when done right it can be empowering for your internal team and for your extended team of citizen data scientists.
What’s Coming in 2023
In 2023, we are likely to see advances in distributed advanced analytics. As organizations find citizen data scientists, enable them, and create a data governance model that balances innovation and data protection, we can expect analytics teams to evolve and to increase their capacity to accelerate the transformation of organizations into data-driven enterprises. These practices will result in the horizontal scaling of human resources and not just the horizontal scaling of systems.
Troy Hiltbrand is the chief information officer at Amare Global, where he is responsible for its enterprise systems, data architecture, and IT operations. You can reach the author via email.