TDWI Upside - Where Data Means Business

3 Keys to Building an Agile, DevOps-Powered Analytics Environment (Part 2 of 3)

Working with data requires adaptation of many traditional agile techniques. How do we put agile practices and DevOps tools to use for analytics and data science projects?

In Part 1 of this series, we explored the benefits of using agile techniques, and some of the cultural changes, processes, and tools needed to make it work. In this article, we will look at three key concepts for making sure you have the big picture of how to adapt your team to agile practices and how to adapt agile practices to your team.

Set the Right Expectations

Small teams need simple processes; large teams with complex requirements need heavier, more formal processes. The following chart can help you understand where your team might fit on the spectrum between the most basic and most advanced needs.

If you are a small team, you should be ready to pick and choose the parts of agile methodology and DevOps that make sense for your situation. You don't have to do it all! A kanban approach to managing tasks often takes less time and fewer meetings than scrum. If you have ample time after business hours to make changes to your production environment, you can probably skip the test environment and Change Approval Board (CAB) formalities.


With a smaller group of developers and stakeholders, one person can play the role of product owner and developer, but that is no excuse for poor communication with business users and long delays in releasing new data and changes. Small teams have the advantage of being more responsive to changes in business priorities and quicker to turn around new features, so don't burden team members with more roadblocks than absolutely necessary.

The largest, most complex organizations require more rigorous development processes. If your team is at the large, complex end of the spectrum, agile methodology is crucial to delivering value in the time frame that business sponsors demand. Teams working in this environment must place a priority on frequent communication in real time, must have tight version control to avoid conflicts, and must have strong product management. The bigger you get, the less agile you tend to become and the more important it is to focus on starting less work and finishing more work sooner.

Overlay Agile Methodology on Top of Data Competencies

The goal with agile development is to frequently deliver working, tested data processes. The following illustration shows a core set of data competencies, surrounded by a framework for iterating quickly through new user requirements.

To make a full circuit around the development cycle, you need to be very careful to limit how much work you start. I recommend you:

  • Begin with one set of source data
  • Build only one model or dimension
  • Pick one report
  • Start with the most basic API
  • Avoid complex merges in the beginning

Whatever you do, make sure to finish it and release it to users within two to three weeks.

Overcoming Obstacles

Agile projects can run into difficulties. Here are half a dozen of the most common obstacles unique to agile data projects and recommendations for overcoming them.

Obstacle #1: You only have one set of source data from third-party source systems.

Solution: Replicate the source data into all three environments (development, test, and production). What you are iterating on is how you transform and visualize the data, so set up separate development lanes for those transformation processes rather than for the source data itself.

Obstacle #2: You are using a cloud visualization/reporting tool that has only one environment.

Solution: Use folders or report naming to differentiate between development/test/production versions of reports. Prefix development reports with "dev-" or save them in a folder named "dev." Restrict permissions on those objects.
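The naming approach can be sketched in a few lines of Python. This is only an illustration: the "dev-" prefix comes from the recommendation above, while the "test-" prefix, the choice to leave production reports unprefixed, and the function names are assumptions made for the example.

```python
# Assumed convention: "dev-" (from the text above) and "test-" (hypothetical)
# prefixes; production reports carry no prefix.
ENV_PREFIXES = {"dev": "dev-", "test": "test-", "prod": ""}


def report_name(base_name: str, env: str) -> str:
    """Return the name a report should have in the given environment."""
    if env not in ENV_PREFIXES:
        raise ValueError(f"unknown environment: {env}")
    return ENV_PREFIXES[env] + base_name


def environment_of(name: str) -> str:
    """Infer the environment from a report name's prefix."""
    for env, prefix in ENV_PREFIXES.items():
        if prefix and name.startswith(prefix):
            return env
    # Anything without a known prefix is treated as production.
    return "prod"


if __name__ == "__main__":
    print(report_name("Sales Summary", "dev"))
    print(environment_of("Sales Summary"))
```

A helper like this keeps the convention in one place, so a script that lists reports can flag anything in the production folder that still carries a development prefix.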

Obstacle #3: You are using a cloud-based or GUI-based data transformation or ETL tool that doesn't allow version control or deployment between environments.

Solution: Most tools allow you to clone or copy processes. As in the previous solution, use file naming or directories to manage and promote objects from development to test to production environments.
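If the tool can export a process definition as a file, directory-based promotion might look like the following sketch. The folder names ("dev", "test", "prod"), the promote function, and the idea of exporting to files are all assumptions for illustration; a given ETL tool may offer its own copy mechanism instead.

```python
import shutil
from pathlib import Path

# Assumed folder names, in promotion order.
STAGES = ["dev", "test", "prod"]


def promote(root: Path, filename: str, from_stage: str) -> Path:
    """Copy an exported process file from one stage folder to the next."""
    idx = STAGES.index(from_stage)  # raises ValueError for an unknown stage
    if idx == len(STAGES) - 1:
        raise ValueError("prod is the final stage; nothing to promote to")
    source = root / from_stage / filename
    target_dir = root / STAGES[idx + 1]
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps file timestamps, which helps when auditing what was promoted.
    return Path(shutil.copy2(source, target_dir / filename))
```

Because the function only ever moves one step forward, a process cannot jump from development straight to production without passing through test.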

Obstacle #4: You have a large amount of master data and need significant time to wrangle it before delivering any transactional data or reporting.

Solution: Start with just one piece of the data set. Create a framework for managing the data and show stakeholders the process. Just populating the framework with a small set of actual data and showing it to users is your first release.

Obstacle #5: You have too much data to create additional copies for test and production environments.

Solution: Cheaper data lake storage may be one approach. You may need to have one set of raw data that is accessed by development/test/production environments for the transformation steps. Alternatively, you may be able to use a random sampling technique to shrink the data set for development and test environments.
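As one possible form of the random sampling mentioned above, the sketch below keeps each row independently with a fixed probability. The function name, the default seed, and the use of plain Python iterables are assumptions for the example; in practice you would more likely use the sampling feature of your database or data frame library.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_rows(rows: Iterable[T], fraction: float, seed: int = 42) -> Iterator[T]:
    """Yield each row independently with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # A fixed seed makes the sample repeatable, so development and test
    # environments can be rebuilt with the same subset of rows.
    rng = random.Random(seed)
    for row in rows:
        if rng.random() < fraction:
            yield row


if __name__ == "__main__":
    subset = list(sample_rows(range(1_000_000), fraction=0.01))
    print(len(subset))  # roughly 1% of the input; the exact count depends on the seed
```

One caution: sampling related tables independently can break joins between them. If development data must stay joinable, consider sampling on a shared key (for example, a subset of customer IDs) and filtering every table by that key.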

Obstacle #6: You are disrupted constantly by "shoulder taps" from the stakeholders, throwing off your sprint plans and priorities.

Solution: It is not enough for just the development team to switch to sprints. Business stakeholders must understand and make the switch as well. They should be heavily involved with planning and prioritization meetings. Their requests should be funneled to a product owner who will make sure their needs are considered for the next sprint.

Troubleshoot with the 3 V's

When a team is running well, every task it performs meets these three criteria:

  • Valuable. Can the developer explain how this impacts the business? Is the user story developed enough to code against and to see how it fits in the big picture?

  • Visible. Hours are tracked, pacing is tracked, and everything is prioritized against other backlog tasks and projects.

  • Validated. The business receives the work and provides feedback, and the next iteration is adapted based on that input.

When problems occur, you can usually assign the root cause to one of those three areas. Delayed tasks may be due to the first V. Low-productivity teams may suffer from issues with the first two V's. Scope creep is a combination of all three V's. Managers must be relentless in reinforcing all three V's in every meeting and in every communication.

Next in This Series

In Part 3 of this series, we will look at the day-to-day activities of an agile team, from organizing your project into stories and sprints to creating branches and releases.

About the Author

Stan Pugsley is a senior business intelligence architect at Xerva, a BI consulting and DWaaS (Data Warehouse as a Service) company based in Orem, UT. He has been building business intelligence and data science solutions for 20 years, having worked at PricewaterhouseCoopers, HP, Sharp Analytics, and a number of startup companies. You can reach the author at StanP@xerva.com.

