Can Data Integration be Agile?
- By Philip Russom, Ph.D.
- June 17, 2010
Methods for agile software development have been around for ten years, and for the last couple of years they’ve been penetrating projects for business intelligence (BI), data warehousing (DW), and data integration (DI). Traditional agile methods stress iterative revisions that quickly lead to a usable software product based on direct input from end users. The agile approach seems to apply well to BI, in the sense of reports, dashboards, BI portals, and analytic applications. Yet, agile methods don’t seem to apply as directly and effectively to DW and DI. In this article, I’ll list the barriers to achieving agile DI (with a few comments about agile DW), then list recommendations for vaulting the hurdles.
To review, agile methods originated to speed up the development of code-laden procedural logic for operational and transactional applications that automate a business process. BI’s reports and dashboards are similar to these applications, so agile methods apply directly to BI. However, DW and DI tasks are quite different in that they focus on data and the repurposing of data, in the context of long-term infrastructure that will be shared by many teams.
For example, BI reports and dashboards depend on data to populate them, but that same data, in turn, has far more dependencies from a project viewpoint. When provisioning data for BI purposes, DI development involves many time-consuming tasks that resist acceleration. These include getting approval to access source systems, profiling source data, improving and documenting metadata and master data, developing data transformations, deploying interfaces, modeling data for target systems, assuring data quality, and so on.
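To make one of these tasks concrete, here is a minimal sketch of profiling a single source column in Python. The function name `profile_column`, the sample rows, and the choice of statistics are assumptions made for illustration; a real project would more likely rely on the profiling features of its DI or data quality tool.

```python
from collections import Counter

def profile_column(rows, column):
    """Summarize one column of a source extract (hypothetical helper)."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing values.
    non_null = [v for v in values if v is not None and v != ""]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }

# Made-up sample rows standing in for a source system extract.
source_rows = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": "US"},
    {"customer_id": 3, "country": ""},
    {"customer_id": 4, "country": "DE"},
]

report = profile_column(source_rows, "country")
print(report)
```

Even a rough profile like this shows missing values and dominant codes early, before transformation logic is written against the column.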
In addition, DI and DW work must comply with agreed-upon standards for data models, cross-system interfaces, architecture, governance, and stewardship. These enterprise-scale standards transcend the individual development project and so cannot be omitted for the sake of speed or agility.
Furthermore, DI in support of DW is not a one-off, standalone project, as many operational and transactional applications are. Instead, DI builds shared infrastructure: it assembles data that many reports, dashboards, analyses, and a wide range of applications will tap. To assure an appropriate level of re-use for DW data, DI solutions must be crafted to collect the right data, transform and cleanse it for an intended purpose, and document it carefully so BI developers and business end users understand exactly where the data came from and what it represents.
Given that background, we can now answer the question posed in this article’s title: “Can Data Integration be Agile?” Yes, DI can indeed be agile, though perhaps not as agile as BI. In fact, a new practice for agile DI is currently emerging, despite the hurdles explained earlier. Even so, take note of the following caveats and recommendations:
Agile BI developers must allow time for DI/DW development. Team members who develop BI prototypes and drive iterative versions must plan for how the BI product will be populated with data. If the necessary data is not already in the DW, the BI developer must allow time for data modeling and DI development, plus coordinate BI work with related DI/DW work from other team members.
Agile DI specialists should build skills for generating test data. The point is to provision data for early prototypes and iterations created by BI developers, but do so quickly and with minimal effort.
Expect to refactor test datasets in support of rapid iterations. Incrementally improve the prototype dataset, even if it will be tossed out eventually. In parallel, you must apply what you’re learning to a permanent DI/DW solution.
Tolerate wasteful practices if they accelerate broader development cycles. Tossing a prototype dataset seems like a waste of time, although we could argue that prototypes are supposed to be disposable.
Agile DI practitioners still have to document data that’s headed for the DW. Otherwise, you put in peril the DW’s role as “the single version of the truth” that’s reused by many people and applications. Documentation is usually applied to metadata, master data, and DI objects (e.g., routines, jobs, data flows). Depend on DI tools that can automatically generate documentation from these. Resist application-centric agile methods that seek to expunge all documentation.
Don’t let anyone use agility as an excuse to throw out DI best practices. Given the speed and disposability of dataset prototypes, it’s hard to find time or even rationalize applying best practices in data quality, transformations, metadata development, and data modeling. These best practices may be ignored at the beginning of an agile sprint but should be incorporated incrementally in mid-to-late iterations of the sprint, as you abandon the prototype and start formal development of the deliverable.
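As an illustration of the earlier recommendation to build skills for generating test data, the following is a minimal sketch in Python. The function `make_test_orders`, the column names, and the value ranges are all hypothetical; they stand in for whatever the BI prototype actually needs. The fixed random seed is a design choice assumed here so that each iteration of the prototype sees the same disposable dataset.

```python
import random
from datetime import date, timedelta

def make_test_orders(n_rows, seed=42):
    """Return n_rows synthetic order records (hypothetical schema)."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    regions = ["North", "South", "East", "West"]
    start = date(2010, 1, 1)
    rows = []
    for order_id in range(1, n_rows + 1):
        rows.append({
            "order_id": order_id,
            "region": rng.choice(regions),
            "order_date": start + timedelta(days=rng.randrange(365)),
            "amount": round(rng.uniform(5.0, 500.0), 2),
        })
    return rows

# Provision a small dataset for an early dashboard prototype.
sample = make_test_orders(5)
for row in sample:
    print(row)
```

Adding a column or changing a value range in later iterations means editing one function, which fits the expectation that prototype datasets will be refactored and eventually discarded.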
Philip Russom is senior manager of TDWI Research. Philip can be reached at [email protected].