TDWI Articles

Q&A: How to Build a Data Team for a Modern Data Stack

Data teams are critical for successful enterprises. Rohit Choudhary, founder and CEO of Acceldata, offers several tips for building your ideal team.

What’s the role of a data team when it comes to data operations?

Critical requirements for successful data operations include eliminating complexity and ensuring data quality while improving data pipeline reliability. Data teams are in charge of monitoring and analyzing the flow of data within an enterprise's systems. Therefore, data teams and engineers must understand exactly how their data is collected, processed, and used so they can identify and troubleshoot issues as they arise.

For Further Reading:

Modernizing Your Data Team and Its Best Practices

What the Modern Data Team Looks Like and Where It’s Headed

Out of the Loop: Why Agile Alienates Data Teams and What to Do about It

A successful data team can monitor the health and performance of data pipelines, identify bottlenecks or errors, and take corrective action. The team has continuous access to detailed information from within data pipelines and systems, which lets it quickly identify and debug issues. Data teams are also responsible for assessing the quality of the data being collected, processed, and stored so that members can identify issues or discrepancies at scale. Finally, all team members must understand how data flows through and is used within the organization, which enables them to collaborate more effectively and work toward a common goal.
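To make the monitoring responsibilities above more concrete, here is a minimal Python sketch that scans per-stage run records, flags failed stages and stages that exceed a latency budget, and names the slowest stage as the likely bottleneck. The `StageRun` record shape, the stage names, and the 300-second budget are hypothetical illustrations, not features of any particular observability product.

```python
from dataclasses import dataclass


@dataclass
class StageRun:
    """One execution of one pipeline stage (hypothetical record shape)."""
    stage: str
    status: str        # "success" or "failed"
    duration_s: float  # wall-clock runtime in seconds


def health_report(runs: list[StageRun], latency_budget_s: float = 300.0) -> dict:
    """Summarize failures and slow stages so the team knows where to act."""
    failed = sorted({r.stage for r in runs if r.status == "failed"})
    over_budget = sorted({r.stage for r in runs if r.duration_s > latency_budget_s})
    # Treat the single slowest stage as the most likely bottleneck.
    bottleneck = max(runs, key=lambda r: r.duration_s).stage if runs else None
    return {"failed": failed, "over_budget": over_budget, "bottleneck": bottleneck}


if __name__ == "__main__":
    runs = [
        StageRun("ingest", "success", 120.0),
        StageRun("transform", "success", 540.0),
        StageRun("load", "failed", 30.0),
    ]
    print(health_report(runs))
```

In practice the run records would come from a scheduler or observability tool; the point of the sketch is only that health checks reduce to simple, automatable rules over pipeline metadata.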

Who is part of a data team? What is its typical size?

A successful modern enterprise data team contains a diverse group of individuals with a range of skills and expertise. Typically, these teams include data engineers, data scientists, database/data warehouse/data lake/data lakehouse administrators, platform engineers, data analysts, and other types of data practitioners. Let me explain what some of these roles look like and how to find the right candidate for each.

Data engineers are responsible for building and maintaining the infrastructure and pipelines needed to ingest, store, and process data. They play a critical role as organizations add more layers of tools and manage data across multiple platforms. A candidate for this role should have strong programming skills and experience with languages and tools such as SQL, Python, and Apache Beam.

Data scientists typically work at the front end of data, translating what the data means for a variety of business use cases. This role applies statistical and machine learning (ML) techniques to analyze and understand data, which requires highly accurate data from the organization. Data scientists should have strong math and programming skills as well as expertise in statistical analysis and ML.

Data analysts use data to answer business questions and provide insights to every department across the enterprise. This individual needs strong SQL skills and proficiency in data visualization tools such as Tableau or Power BI.

These are just three of the key roles in a strong modern data team. Remember that your team will vary based on the type of organization, data stack, complexity, and business needs.

How can data teams ensure data is usable and effective when and where it’s needed?

Data engineering teams are continuously challenged to ensure that data is of the highest quality so that it keeps flowing efficiently. A recent study showed that 45% of data leaders experienced data pipeline failures 11 to 25 times over the past couple of years due to data quality problems or errors that were discovered too late. In addition, over half (53%) said their data teams spend between one and six days per month addressing data quality issues.
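The failures above stem from quality problems caught too late. One common mitigation is to validate each batch before it moves downstream. The minimal sketch below runs two illustrative checks (null rate on required fields and duplicate keys) on a list of row dictionaries; the function name, the field names, and the 5% null-rate threshold are hypothetical assumptions, not a standard or a specific product's rules.

```python
def validate_batch(rows, key_field, required, max_null_rate=0.05):
    """Check one batch of row dicts; return a list of problems (empty list = pass)."""
    if not rows:
        return ["batch is empty"]
    problems = []
    # Null-rate check for each required field.
    for field in required:
        nulls = sum(1 for row in rows if row.get(field) is None)
        rate = nulls / len(rows)
        if rate > max_null_rate:
            problems.append(f"{field}: null rate {rate:.0%} exceeds {max_null_rate:.0%}")
    # Duplicate-key check: the key field should identify each row uniquely.
    seen, dupes = set(), set()
    for row in rows:
        value = row.get(key_field)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    if dupes:
        problems.append(f"duplicate {key_field} values: {sorted(dupes, key=str)}")
    return problems


if __name__ == "__main__":
    batch = [
        {"order_id": 1, "amount": 25.0},
        {"order_id": 2, "amount": None},
        {"order_id": 2, "amount": 40.0},
    ]
    issues = validate_batch(batch, key_field="order_id", required=["order_id", "amount"])
    if issues:
        # In a real pipeline, this is where the load step would be halted.
        print("Batch rejected:", "; ".join(issues))
```

Rejecting a batch at the point of entry keeps a single bad load from cascading into downstream failures, which is the time-consuming cleanup the survey figures describe.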

Today's third-wave CDOs are challenged with supporting businesses' ravenous appetite for new data workflows, but as we all know, data engineers are in high demand, in short supply, and command premium compensation. Automation, efficiency, and active incident management are the only way to scale data reliability. Artificial intelligence (AI) can automate more sophisticated policies, such as data drift detection, which determines whether data is deviating from historical patterns and distributions. This significantly reduces the time it takes to deliver reliable data. Previously, the only way to detect data drift was to query the data manually and either run it against a home-built machine learning model or spot-check it visually.
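The core idea behind drift detection, comparing current data against its historical distribution, can be illustrated with a classic statistic rather than AI. The sketch below computes the two-sample Kolmogorov-Smirnov statistic D (the largest gap between two empirical distribution functions) for a numeric column. This is a generic, swapped-in technique, not a description of how Acceldata or any other product implements drift policies, and the 0.2 threshold is an arbitrary assumption. For a significance test, `scipy.stats.ks_2samp` returns the same statistic along with a p-value.

```python
from bisect import bisect_right


def ks_statistic(baseline: list[float], current: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D: the largest gap between the two
    empirical cumulative distribution functions (ECDFs)."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(baseline), sorted(current)
    d = 0.0
    # The ECDFs only change at observed values, so checking every observed
    # value is enough to find the maximum gap.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of baseline values <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of current values <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


def has_drifted(baseline: list[float], current: list[float], threshold: float = 0.2) -> bool:
    """Flag drift when D exceeds a fixed threshold (0.2 is an arbitrary assumption)."""
    return ks_statistic(baseline, current) > threshold


if __name__ == "__main__":
    history = [float(v) for v in range(100)]  # hypothetical historical values
    today = [v + 50.0 for v in history]       # same shape, shifted upward
    print(ks_statistic(history, today))       # 0.5
    print(has_drifted(history, today))        # True
    print(has_drifted(history, history))      # False
```

A fixed threshold keeps the sketch simple; a production check would typically account for sample size and run continuously on each new batch instead of relying on manual spot checks.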

What are some strategies for building the best data team for your organization?

It’s essential that CDOs and other data leaders identify how to align their organizational needs with capable individuals and technologies that can help achieve their goals. Here are a few steps to achieve this.

1. Define current and future data stack needs based on the tools you use today and the tools you plan to adopt. Is open source a big component of your data strategy? If so, you will likely need experts to help you manage your open source plans. Implementing a data observability solution provides a common framework and full visibility into your data supply chain, which helps you determine what your stack needs.

2. Determine your enterprise's plans for data platforms by considering what your platform situation is now and what it will be in 12 to 18 months. If you're on premises only right now and are considering a hybrid approach, you need data team members with a background in cloud environments or specialists in digital transformation.

3. When possible, hire individuals with a range of skills and experiences. Data environments are not one-size-fits-all; just as they adapt to new needs, you will need team members who can also adapt. Ideally, your experts are also well-rounded problem solvers who can identify how to optimize your enterprise's data stack, regardless of its components.

What trends do you see developing this year related to building or enhancing data teams?

Here are some trends for building data teams in 2023 that we have identified through our work with forward-thinking enterprises:

  • Data teams will increasingly use AI and ML techniques to analyze and understand data and to automate various data processing tasks.

  • Enterprises will continue to adopt cloud-based technologies (such as cloud-native data lakes and data warehousing solutions) to easily and efficiently process and store large amounts of data.

  • We are seeing teams use more real-time data streams, such as those generated by IoT devices, to drive real-time decision-making and improve responsiveness.

  • Organizations will integrate data more tightly into business processes and decision-making at all levels of the organization.

How do you see the role of data teams changing in the next year or two?

We expect AI and ML to play a larger role in addressing growing data complexity and variety. Advances in AI and ML technologies, such as generative AI and ChatGPT, will help meet the increasing demand for data-driven insights and drive greater automation and efficiency. Additionally, we are seeing the following:

  • Improved collaboration with cross-functional teams. Data teams are expected to work more closely with cross-functional teams, including business units, IT, finance, cybersecurity, and legal, to align data strategies with overall business objectives.

  • Increased agile and DevOps adoption in data engineering and analytics. Data teams are already adopting agile and DevOps methodologies to streamline data engineering and analytics processes, promote collaboration, and accelerate time-to-insights.

  • A greater focus on data literacy and upskilling. With the increasing importance of data in organizations, data teams will focus on improving data literacy across the organization by providing training and upskilling programs to business users and other stakeholders.

[Editor's note: Rohit Choudhary is the CEO and co-founder of Acceldata, a San Jose-based startup that has developed an end-to-end data observability cloud to help enterprises observe and optimize modern data systems and maximize return on data investment. Prior to Acceldata, Choudhary served as director of engineering at Hortonworks, where he led development of Dataplane Services, Ambari, and Zeppelin, among other products. While at Hortonworks, Choudhary was inspired to start Acceldata after repeatedly witnessing his customers' multimillion-dollar data initiatives fail despite their use of the latest data technologies and experienced teams of data experts. You can reach Choudhary on Twitter or LinkedIn.]