Benefits of Agile Data Warehousing: A Real-World Story
How agile data warehousing has transformed CN's BI delivery environment.
By Mark Giesbrecht, Senior Manager, Canadian National Railways
[Editor's note: As Canada's largest railroad and North America's fifth largest, Canadian National Railways (CN) has used a data warehouse environment since the early 1990s to leverage data on shipments and logistics, crews, rail asset management, and numerous back-office processes. In this article, Mark Giesbrecht, senior manager of BI at CN, previews his keynote address at the TDWI World Conference in San Diego (August 18-23, 2013), in which he will describe CN's journey toward agile data warehouse adoption and the work that remains.]
The transportation industry is a process-intensive business that depends heavily on data. Every day, CN picks up, delivers, and invoices 15,000 carloads of freight while managing the inspection and maintenance of 22,000 miles of track, 2,000 locomotives, and 120,000 rail cars. CN was an early adopter of data warehousing as a way to overcome the performance and data integration limitations of its transactional systems. The warehouse was very successful in meeting CN's business needs, and it grew as demand increased for new solutions and reporting capabilities. Complexity grew as well, as efforts to take advantage of emerging technologies created a large web of disparate data stores, ETL processes, and standalone applications.
As the warehouse grew in complexity, the inefficiencies inherent in the traditional waterfall method became increasingly apparent. Despite the BI team's efforts to ship a project release every six months, business sponsors found delivery slow and expensive. The BI delivery team struggled with low business engagement and a continuous stream of exceptions and changing requirements. The BI project management office (PMO) was frustrated by chronically missed deadlines and estimates. The DW practice and support teams could not persuade the business and delivery teams to use the opportunities that arose during solution architecture to fix existing warehouse issues.
Independent of BI's delivery struggles, CN's IT department began to adopt an approach labelled "agile," in which coding started as soon as possible with very little overhead. Business sponsors began taking their BI requests to the "agile team" and were happy to see results arrive quickly and with few constraints. The agile team thrived on the high level of business engagement but often felt that a lack of product vision caused requirements churn and rework. They also had difficulty scaling a project beyond a few team members or sustaining early success once a project extended beyond a couple of months. The focus on time-to-market rarely left room for solution reuse, standardization, or BI self-service capability. Quality was highly developer-dependent, and minimal documentation meant that many agile team members remained informally responsible for support long after deployment.
CN was stuck between the chronically poor performance of the waterfall approach and what was perceived as an agile free-for-all. The breakthrough came in September 2011, when the BI PMO encountered a 2008 book (Agile Data Warehousing: Delivering World-Class Business Intelligence Systems Using Scrum and XP by Ralph Hughes) that framed data warehousing within an agile/Scrum delivery approach called agile data warehousing (ADW). Its many adaptations of agile to the warehouse setting addressed concerns about agile's reputation for poor scalability and sustainability. The framework brought the majority of stakeholders on board, giving the BI PMO the mandate to drive a program of ADW adoption.
Establish an Improvement Framework
Adoption started on existing projects. Where practical, teams were co-located. The roles of scrum master and project architect were clearly assigned and defined. The work outlined in the project plans was organized into sprints of three to four weeks following the classic plan/act/demo/retrospective cycle. A backlog of ADW techniques to be implemented was created and a commitment made to progressively incorporate them into the delivery approach. This focus on continuous improvement driven by the collaborative spirit of the Retrospective Prime Directive increased team and business sponsor engagement almost immediately.
Improve Delivery Predictability
User story conferencing was used to recast and decompose requirements into a hierarchy of epics, themes, user stories, and developer stories. Developer stories allowed the team to introduce story point estimation to better predict what could be accomplished in each sprint. Business sponsors came to accept locking in a sprint's scope, knowing they could re-prioritize in the next sprint. In parallel, a widely available sprint tracking tool was deployed to monitor daily progress in burndown charts. For the larger projects, a pipeline approach was implemented in which a design sprint built the backlog for the following development sprint. The "specialized" sprint concept was later extended to integrated testing sprints and promotion rehearsal sprints. As the teams gained experience, their sprint estimates became increasingly accurate, and extrapolating those estimates forward allowed project plans to be rebaselined.
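To illustrate how story point tracking supports the predictability described above, the following Python sketch computes a burndown series, a per-sprint commitment accuracy ratio, and a forecast of remaining sprints from recent velocity. It is not CN's tooling: the `Sprint` class, the function names, the three-sprint averaging window, and the sample numbers are hypothetical assumptions chosen for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Sprint:
    committed: int  # story points locked in at sprint planning
    completed: int  # story points accepted at the sprint demo


def commitment_accuracy(sprint: Sprint) -> float:
    """Fraction of committed points actually delivered (1.0 = perfect estimate)."""
    return sprint.completed / sprint.committed


def average_velocity(history: list[Sprint], window: int = 3) -> float:
    """Mean completed story points over the most recent `window` sprints."""
    recent = history[-window:]
    if not recent:
        raise ValueError("need at least one completed sprint")
    return sum(s.completed for s in recent) / len(recent)


def forecast_sprints(remaining_points: int, history: list[Sprint], window: int = 3) -> int:
    """Whole sprints needed to burn down the remaining backlog at recent velocity."""
    velocity = average_velocity(history, window)
    if velocity <= 0:
        raise ValueError("velocity must be positive to forecast")
    return math.ceil(remaining_points / velocity)


def burndown(committed: int, completed_per_day: list[int]) -> list[int]:
    """Points still remaining at the end of each day of a sprint."""
    remaining, series = committed, []
    for done in completed_per_day:
        remaining -= done
        series.append(remaining)
    return series
```

For example, with 28, 33, and 35 points completed over three sprints, average velocity is 32 points, so a 150-point remaining backlog forecasts to `math.ceil(150 / 32)`, or 5 sprints. Averaging over a short recent window is one common choice; it lets the forecast follow a team whose estimates are still stabilizing rather than being anchored to its earliest sprints.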
Improve Delivery Productivity
Numerous initiatives were implemented to improve productivity. The BI PMO introduced standardized weekly status reporting to all stakeholders, administered by a shared project control officer (PCO). New tools were developed to automate compliance checks against design and coding standards. Working with the DW practice, the teams updated the design and coding standards and guidelines and reformatted them into a self-service model, improving team autonomy and speeding the onboarding of new team members, including offshore developers for simple ETL work (e.g., staging tables). Additional warehouse environments were set up (per project when necessary) to streamline and isolate integration testing and promotion rehearsals.
Establish an Enterprise Perspective
The BI PMO provided the forum and encouragement for sharing best practices among delivery teams and between the delivery teams and the DW practice. The DW practice compiled a catalogue of warehouse objects to serve as a reference architecture, a "definition of done," and a basis of estimate (BoE). A process was established to continually update the BoE with the delivery teams' actuals, and keeping the BoE current allowed new projects to produce more accurate estimates right from the start. The DW practice also formalized a solution knowledge repository to which all delivery teams contributed their as-built documentation.
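A basis of estimate that is continually updated from delivery actuals can be sketched as a running average of effort per object type. The object type names, the hour figures, and the rule that the seed estimate counts as one observation are hypothetical assumptions for illustration, not CN's actual BoE.

```python
from dataclasses import dataclass, field


@dataclass
class BasisOfEstimate:
    """Hypothetical BoE: estimated effort in hours per warehouse object type."""

    hours: dict[str, float]  # current estimate per object type
    samples: dict[str, int] = field(default_factory=dict)  # observations folded in

    def record_actual(self, object_type: str, actual_hours: float) -> None:
        """Fold one delivered object's actual effort into the estimate.

        The estimate is a running mean of the seed value and all recorded
        actuals; the seed counts as one observation. An unknown object type
        raises KeyError, so new types must be seeded explicitly first.
        """
        n = self.samples.get(object_type, 1)  # seed value counts as one sample
        current = self.hours[object_type]
        self.hours[object_type] = (current * n + actual_hours) / (n + 1)
        self.samples[object_type] = n + 1

    def estimate(self, object_counts: dict[str, int]) -> float:
        """Bottom-up project estimate from counts of each object type."""
        return sum(self.hours[t] * count for t, count in object_counts.items())
```

For instance, seeding `dimension` at 24 hours and recording actuals of 30 and 33 hours moves its estimate to 27 and then 29 hours; with `staging_table` at 8 and `fact_table` at 40, a new project with five staging tables, two dimensions, and one fact table is then estimated at 138 hours. A weighted or windowed average could be substituted if older actuals should count for less.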
Fully engaged and seeing results, business sponsors became far more receptive to adding technical debt (such as inefficient models or ETL jobs) to the backlog. They also began to buy into an enterprise philosophy of "re-use before re-building, but re-build to re-use," applying it where practical to address long-standing warehouse deficiencies. In early 2013, the ADW sprint concept was even extended to enhancement and support work, with a consolidated backlog feeding several teams aligned by business function.
Agile data warehousing has transformed CN's BI delivery environment. A consistent track record of project success has created a virtuous cycle of engagement, predictability, and productivity. Demand for BI services has increased significantly, and the delivery team has grown to meet it. Most important, a healthy tension has replaced the conflict usually inherent between the objectives of fast time-to-market and a sustainable enterprise data warehouse.