The Rise of the Data Curator
The new role of data curator bridges the worlds of business and IT in data analytics.
- By Kelly Stirman
- January 26, 2018
A new role is emerging in the world of data analytics: the data curator. As companies become more sophisticated in their use of data to solve their most critical business challenges, they develop specializations in their teams to address each stage of the end-to-end process.
Today these roles include individuals who reside in IT -- data engineers -- as well as those who sit in the business: data analysts and data scientists. In 2018, we will likely see more data curators, a new role that focuses on bridging the worlds of business and IT in terms of data analytics. Let's review the responsibilities of each role:
- Data analysts typically use tools such as Tableau and Power BI to develop visualizations, reports, and dashboards that tell a story about business data. They work within the business and rely on IT to provide access to data from different applications and systems.
- Data scientists use tools such as Python and R to build models that provide predictions, recommendations, and visualizations based on data. They also work within the business and rely on IT to provision their data.
- Data engineers have a deep understanding of the systems and infrastructure that generate and store the business data. They work in SQL, Python, Java, and other languages to query, transform, aggregate, and move data between systems for different end user needs. They work within IT.
- Data curators use tools such as Dremio to curate data for different analytical tasks, to allocate computational resources for accelerating data analysis, to add semantic meaning to a data catalog, to blend data sets together, and to organize project areas for teams of data analysts and data scientists to work together more effectively. [Full disclosure: the author works for Dremio.]
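To make two of the curator tasks above more concrete (adding semantic meaning to a catalog and blending data sets), here is a minimal sketch in plain Python. It is an illustration only and does not use Dremio's or any other product's API. The names `CatalogEntry` and `blend`, the column names, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """A data set plus the semantic notes a curator attaches to it (hypothetical)."""
    name: str
    description: str
    rows: list
    column_notes: dict = field(default_factory=dict)


def blend(left, right, key, name, description):
    """Inner-join two catalog entries on a shared key column."""
    # Index the right-hand rows by key so each left row is matched quickly.
    index = {}
    for row in right.rows:
        index.setdefault(row[key], []).append(row)
    joined = [
        {**l_row, **r_row}
        for l_row in left.rows
        for r_row in index.get(l_row[key], [])
    ]
    # Carry the column notes forward so the blended set stays documented.
    notes = {**left.column_notes, **right.column_notes}
    return CatalogEntry(name, description, joined, notes)


# Made-up sample data standing in for two source systems.
orders = CatalogEntry(
    name="orders",
    description="One row per customer order.",
    rows=[
        {"customer_id": 1, "amount": 120.0},
        {"customer_id": 2, "amount": 75.5},
        {"customer_id": 3, "amount": 40.0},
    ],
    column_notes={"amount": "Order total in USD, before tax."},
)
customers = CatalogEntry(
    name="customers",
    description="One row per customer.",
    rows=[
        {"customer_id": 1, "region": "EMEA"},
        {"customer_id": 2, "region": "APAC"},
    ],
    column_notes={"region": "Sales region assigned to the customer."},
)

orders_by_region = blend(
    orders,
    customers,
    key="customer_id",
    name="orders_by_region",
    description="Orders enriched with the customer's sales region.",
)
```

In this sketch, analysts would work from `orders_by_region` and its notes rather than requesting the join from IT. The order for customer 3 is dropped because it has no matching customer record, which is the kind of detail a curator would record in the description.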
How Data Curators Streamline Analytics
Data analysts and data scientists understand the meaning of the data but are dependent on IT to source the data, including applying transformations, blending data from different sources, and securing access at all steps in the process.
It is common for data analysts and data scientists to begin an analytics task by opening a ticket with IT. In this ticket they describe the data sets required for the job as well as specific formatting requirements, update frequency, and what tools they will use to perform their analysis. IT takes this ticket and assigns the task to a data engineer, who in turn is responsible for gathering any additional requirements and performing the work necessary to fulfill the request.
This back and forth between the business and IT can slow down the process due to a lack of common understanding of the data and the processing required to make it available for analysis. Data engineers have a deep understanding of the infrastructure and the formatting of the data but not of the data itself. Analysts and data scientists, on the other hand, have a deep understanding of the data but not the underlying systems and tools used to process it.
Data curators fill this gap and streamline the process of sourcing, organizing, and accelerating data for analysis. They know the data and understand the analytics workloads better than data engineers because they are closer to the business units. Data curators also have a good understanding of the types of systems that store the data and the types of tools that can be used for processing the data, even if they are not practitioners of these technologies themselves. They have up-to-date knowledge about data sets, their provenance, and what data curation is needed. They also understand the different types of analysis to be performed on specific data sets as well as the expectations of latency and availability set by diverse business users.
By working with data engineers, data analysts, and data scientists, the data curator develops a deep understanding of how the business uses data and how IT applies technology to make that data available. Data curators make data analysts and data scientists more productive by allowing them to focus on what they do best.
Kelly Stirman is the VP of strategy and CMO at Dremio. He oversees the planning, development, and execution of Dremio’s strategic initiatives centered on messaging, brand awareness, customer satisfaction, and business development. Previously, Stirman was VP of strategy at MongoDB where he worked closely with customers, partners, and the open source community. For more than 15 years, he has worked at the forefront of database technologies.