Agile Data Quality Best Practices
How enterprises can accelerate the creation of new data quality solutions while aligning with business goals.
- By Philip Russom, Ph.D.
- July 30, 2013
Data quality (DQ) has always been a moving target, because enterprise data represents real-world entities (such as customers, products, partners, and employees) that naturally evolve over time. As if that weren't challenging enough, data quality professionals are under renewed pressure to identify and provide quality improvements for new sources and types of data, as organizations deploy new applications, implement new customer or partner channels, explore big data, and tap into new sources (such as machine data and social media).
To keep pace with accelerating demands for data quality solutions, many data quality teams and tools have embraced practices drawn from agile development methods. The agile method for software development has been in use for over ten years, and its tenets are summarized in the Manifesto for Agile Software Development (http://agilemanifesto.org). Agile methods originally focused on the development of hand-coded procedural logic for operational and transactional applications. Agile data quality applies these methods to data quality projects and solutions.
Agile data quality typically has four goals:
- Deliver each software solution for DQ as early as possible to shorten time to business benefit and use
- Build DQ solutions that align with business goals
- Make DQ development nimble so an enterprise can seize business opportunities and correct new data errors
- Change DQ solution direction as business requirements evolve
These goals are achieved by adopting a lean team structure. Agile teams are usually led by two people who represent business and technical constituencies, respectively. The two leaders communicate directly to cut bureaucratic red tape, which in turn both speeds development and assures that technical work aligns with business needs. With agile DQ, team leadership commonly consists of two positions:
- A data steward is a line-of-business manager (or his/her representative) who knows exactly what a department, business unit, or business process needs for business success from data and its quality. He or she is also responsible for ensuring that the non-technical elements of data quality projects, such as the need for business process improvement and culture change, are put in place to reinforce the IT changes delivered.
- The DQ technical lead is a developer who translates the steward's needs into actionable technical specifications, but with minimal documentation, then develops or oversees the solution.
Note that agile DQ maintains the established practices of data stewardship, which focus on rapidly identifying data issues and addressing them, but adapts them to be even more responsive. For example, agile DQ requires more regular and direct collaboration between the steward and technical lead than conventional data stewardship does. In addition, the steward and technical lead have even more independence and authority for prioritizing business needs relating to DQ and defining the details of DQ solutions.
Here are some recommendations for successful practices in agile data quality:
- Make agile data quality about business goals, not just development speed
- Wait equals waste, so eliminate time sinks for greater DQ agility
- Provide self-service for certain users to eliminate wasteful waiting
- Demand an improved, prototype dataset early and often, to assess project direction
- Practice data-driven documentation, which is what agile DQ needs
- Rely on DQ services for plug-and-play agility
- Look for vendor DQ tools that are conducive to agile DQ
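To make the "prototype dataset early and often" recommendation concrete, here is a minimal sketch of how a technical lead might show a steward whether each prototype is moving in the right direction. It assumes one simple metric, per-field completeness, and compares a raw dataset with an improved prototype. The function names, field names, and sample records are hypothetical illustrations, not part of any particular DQ tool or of the TDWI report.

```python
from typing import Dict, List, Optional

# A record is a flat mapping of field name to an optional text value (assumed shape).
Record = Dict[str, Optional[str]]


def completeness(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the share of records with a non-blank value."""
    total = len(records)
    result: Dict[str, float] = {}
    for field in fields:
        # Treat missing keys, None, and whitespace-only strings as empty.
        filled = sum(1 for r in records if (r.get(field) or "").strip())
        result[field] = filled / total if total else 0.0
    return result


def improvement(before: List[Record], after: List[Record],
                fields: List[str]) -> Dict[str, float]:
    """Return the change in completeness per field (after minus before)."""
    b = completeness(before, fields)
    a = completeness(after, fields)
    return {f: a[f] - b[f] for f in fields}


# Hypothetical sample data: the raw source and an early prototype of the fix.
raw = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Bo Chan", "email": ""},
    {"name": "", "email": None},
    {"name": "Dee Roy", "email": "dee@example.com"},
]
prototype = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Bo Chan", "email": "bo@example.com"},
    {"name": "Cy Park", "email": None},
    {"name": "Dee Roy", "email": "dee@example.com"},
]

fields = ["name", "email"]
print(completeness(raw, fields))
print(improvement(raw, prototype, fields))
```

Rerunning a small comparison like this on every prototype gives the steward a quick, data-driven view of progress without waiting for formal documentation. A real project would likely track more than completeness, for example validity or duplicate rates.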
To learn more, read the new TDWI Checklist Report, Agile Data Quality.