Avoiding Reporting Snafus in Agile Development Environments
Even with well-designed systems, agile development practices can slowly -- and sometimes dramatically -- erode user confidence in reporting and analysis systems. Learn what to look out for and what you can do about it.
- By Cass Brewer
- March 28, 2016
Data warehousing teams and software development teams can be like relatives who, although cordial enough at family reunions, don't really share enough interests so that you'd want to hang out with them. Likewise, teams interact intensively on development projects, but then communication drops off. Most of the time, this is OK. ("How's Uncle Lou been?" "The same." Repeat annually.)
Every once in a while, though, important communication breaks down. A project worker's name is left off the CC: line of an e-mail, or someone assumes a person from another team "already knew." ("How's Uncle Lou?" "Won the lottery. Didn't you know?")
More Often than You'd Think
How does this disconnect happen? Particularly in agile environments, software development pivots fast, occasionally leaving connected systems behind. Continuous, rapid software releases are driven by requests from "outward facing" product owners who might not consume data warehousing outputs. Development managers focused on functionality, system stability, and deadlines can easily forget to close the loop with data and analytical teams. As you might guess, documentation is also not an agile requirement.
Meanwhile, any incremental software change might create a significant divergence in how data is collected for and processed by various systems. Even functionally trivial tweaks, like re-timing a cron job for better load balancing or rounding a calculation to reduce database size, can, depending on where they happen in relation to ETL processes, put different numbers in production databases and the data warehouse.
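As a minimal illustration of the rounding case, the sketch below assumes three hypothetical line-item amounts and two made-up function names. One path totals the raw amounts and rounds once (as a warehouse fed from early in the data flow might), while the other rounds each amount before totaling (as production might after a "harmless" storage tweak). It is not drawn from any particular system; it only shows how the order of operations can produce two different totals from the same inputs.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def total_then_round(amounts):
    """Sum the raw amounts, then round the total to the nearest cent."""
    return sum(amounts, Decimal("0")).quantize(CENT, rounding=ROUND_HALF_UP)

def round_then_total(amounts):
    """Round each amount to the nearest cent first, then sum."""
    rounded = (a.quantize(CENT, rounding=ROUND_HALF_UP) for a in amounts)
    return sum(rounded, Decimal("0"))

# Hypothetical line items, chosen so the two paths disagree.
line_items = [Decimal("0.335"), Decimal("0.335"), Decimal("0.335")]

raw_path = total_then_round(line_items)      # rounds once, at the end
rounded_path = round_then_total(line_items)  # rounds every item first

print(raw_path, rounded_path, rounded_path - raw_path)
```

With these inputs the two totals differ by one cent. Across millions of records, that kind of gap is enough to make a warehouse report disagree with a production report.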
In fact, these sorts of "data forks" are likely to happen if nobody is seriously assessing the impact of planned changes on downstream processes and analytical systems. Although quality assurance (QA) teams provide a backstop for software bugs and quirks, QA's scope is generally limited to testing actual functionality against developers' intentions. In the case of an unintended difference between analytical and production data, only consumers of both would be likely to spot the difference.
Surprised? That's Agile!
When analytical data that comes out of the data warehouse differs from production data, there are a few usual suspects:
- A process downstream from where data is extracted for the warehouse changed production data
- Changes to a stored procedure in the production data dictionary changed inputs to the production database, but analytical data was extracted earlier in the data flow
- Changes in production software retroactively changed production data, which was never rewritten to the data warehouse
- Someone (usually a process owner or developer) manually changed production data, which was never rewritten to the data warehouse
Any of these changes would be captured and communicated by a sound change management practice. However, agile teams often bend conventional rules, favoring failure tolerance and on-the-fly change control. When releases are frequent and incremental, the theory goes, most changes are small enough for development teams to evaluate -- and roll back with minimal damage in the case of failure.
This works, until it doesn't. As noted earlier, development teams and business users aren't always capable of assessing the full impact of a planned change. Although agile emphasizes stakeholder communication, developers are also urged to keep processes lean.
In reality, this often translates as the shortest approval chain and the fewest (technical) fingers in the pie. When data is assumed to be stable or the data warehouse is tangential to production needs, data warehouse stakeholders get left out.
A Bit Political, but Not Impossible
For better or worse, agile often puts the burden of data vigilance squarely on BI, data warehousing, and analytics managers. Unfortunately, in most agile environments, you can't simply demand to be looped in on all changes on the off chance you'll spot a "bad" one. Bottlenecks are highly impolitic under agile, but there are some less obtrusive alternatives:
- Work with development managers to identify yellow-flag change scenarios: types of changes most likely to impact how data is collected and processed. Request notification on those changes only. If possible, automate notifications in task-tracking software.
- Pull data warehousing managers and system designers in a room to visually map the entire data flow, from entry points to both production databases and the data warehouse. The map should detail how, when, and where data is captured, transformed, and stored. Although mapping meetings are time-intensive, they almost always produce surprising revelations. The result -- a common and easy-to-read view of the shared architecture -- will pay off in faster diagnostics and design solutions down the road.
- If possible, periodically run a few lightweight reports on production data sets, then compare them to equivalent reports from the data warehouse. These diagnostic reports shouldn't significantly affect system performance: they're simply to see whether data "at the source" differs from the data warehouse, a difference that is often visible in just a few records.
- If you already know your problem spots (for example, a broken billing system where users can manually overwrite data at any time), and you can't fix them for whatever reason, periodically reload that source data into the data warehouse.
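The lightweight comparison described in the list above might be sketched as follows. This is only an outline under stated assumptions: the function name, the invoice IDs, and the amounts are hypothetical, and each "report" is assumed to have already been reduced to a simple mapping of record ID to value by whatever query tools you normally use.

```python
def find_discrepancies(production, warehouse):
    """Compare two {record_id: value} mappings and report where they differ."""
    prod_keys = set(production)
    wh_keys = set(warehouse)
    return {
        # Records production has but the warehouse never received.
        "missing_in_warehouse": sorted(prod_keys - wh_keys),
        # Records the warehouse holds that production no longer has.
        "missing_in_production": sorted(wh_keys - prod_keys),
        # Records present in both, but with different values.
        "mismatched": sorted(
            key for key in prod_keys & wh_keys
            if production[key] != warehouse[key]
        ),
    }

# Hypothetical sample: a small slice of invoice totals from each system.
production_sample = {"INV-1001": "120.00", "INV-1002": "75.50", "INV-1003": "310.25"}
warehouse_sample = {"INV-1001": "120.00", "INV-1002": "75.00"}

report = find_discrepancies(production_sample, warehouse_sample)
for category, record_ids in report.items():
    print(category, record_ids)
```

Here the check would flag INV-1003 as missing from the warehouse and INV-1002 as mismatched. Keeping the sample small keeps the check cheap to run regularly, and even a handful of flagged records tells you where to look in the data flow.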
Remember, agile teams generally value interactions over processes and tools, responsiveness over planning. Your goal is to ultimately reduce the risk of data discrepancies. The politic approach is to directly address the problems without slowing developers down.
Cass Brewer is the editorial and research director for the IT Compliance Institute (ITCi), an independent firm focusing on the intersection of information technology, regulatory compliance, and business governance. A prolific author and presenter, Ms. Brewer is a member of the Association for Computing Machinery (ACM) and the Organizers' Collaborative for the Grassroots Use of Technology.