Q&A: Agile Data Warehouse Design
Why agile BI needs modelstorming: agile techniques for data warehouse analysis, modeling, and design.
- By James E. Powell, Lawrence Corr
- July 24, 2012
Agile BI means adopting the best practices of agile software development, but it also means adapting DW/BI's own best practices for agility. Lawrence Corr, an experienced data warehouse architect and co-author of Agile Data Warehouse Design (2011, DecisionOne Press), argues that dimensional modeling is one such best practice, ripe for agile BI. Corr is teaching Agile Dimensional Modeling at the TDWI BI Symposium in London September 10-12, 2012 and the TDWI World Conference, focusing on agile BI, in Boston September 16-21, 2012.
BI This Week: What does agile BI mean to you?
Lawrence Corr: Agile software development emphasizes the early, frequent, and sustainable delivery of working software that adds business value. Agile BI should, therefore, be about the early, frequent, and sustainable delivery of data in a format that provides valuable business insight. That emphasis on iterative, incremental development makes perfect sense because it matches the natural rhythm of how BI requirements come into existence and change, but it's greatly at odds with the serial way in which the data management community gathers data requirements, models them, and designs databases.
Unfortunately, the established agile techniques and ideas do little to address this cultural clash; they are notoriously light on prescriptive advice about data in general and data modeling specifically. This has led many organizations to apply agility later in the BI lifecycle, to ETL and BI development, while largely ignoring it in the earlier phases of data warehouse analysis and design -- which is a shame. One could argue that's where we need agility most.
What agile practices do you see BI professionals focusing on?
BI teams obviously fixate on early and frequent delivery because it is the most tangible outcome and measurable benefit of becoming agile. They also worry a lot about responding to change -- a core value of the agile manifesto -- given the ever-increasing data volumes that data warehousing and BI have to deal with. They adopt existing agile project management and development practices such as scrum and extreme programming (XP) and, in typical IT fashion, seek out new technologies that promise to accelerate development and mitigate database change. The use of, or interest in, automation tools and reusable design patterns is particularly prevalent.
What mistakes do organizations make when they apply agile to BI and data warehousing?
The number one mistake has to be not engaging BI customers early on in the process of becoming agile. BI teams can't become agile without agile BI stakeholders and users. The original agile manifesto and its 12 underlying principles stress the importance of customer collaboration, face-to-face communication, and users and developers working together, just as much as they champion delivering working software, striving for technical excellence, and better team dynamics. Both scrum and XP are predicated upon customer involvement, mainly through the agile requirements-gathering technique of user stories.
User stories can be an excellent way of gathering small increments of interactive functionality for application development or individual report/dashboard requirements for BI (while avoiding bloated requirements documentation), but it is difficult to see how they can be used successfully to define architectural and data requirements. Hence, there's a tendency to postpone using agile techniques until a data warehouse is in place, which can lead to "agile tomorrow, never today." It's extremely hard for a project/team that doesn't start with some agile practices to become agile.
What best practices can you recommend to start agile and engage BI stakeholders?
It's no use pretending that agile BI isn't still very much concerned with query database design, whether the database is traditional and physical or, soon, in memory, in the cloud, or virtualized. To address this, we need agile data modeling: data modeling that can be done early, frequently, and collaboratively with BI stakeholders to tease out their data requirements without having to wait for less-direct requirements analysis techniques (e.g., decoding data requirements from interview notes, lengthy requirements documents, or user stories).
"Data modeling directly with business people" might sound like crazy talk to some modelers, but if we make the process more systematic, visual, interactive, and, dare I say it, fun, and make the results less abstract and more tangibly valuable -- as tangible as working software -- data modeling can be something that business people and whole BI teams, not just data modelers, really want to take part in. After all, database schemas (or virtualization semantic layers) are the earliest pieces of working software needed to deliver BI.
To actively involve BI stakeholders in data modeling, we need to create our own agile techniques to add to the agile common body of knowledge. We don't just need user stories; we need data stories or story-driven data modeling: techniques that give business analysts just enough data modeling skills and data modelers just enough analysis skills to discuss BI data requirements with the business.
BEAM (Business Event Analysis and Modeling), described in the book Agile Data Warehouse Design (2011, DecisionOne Press), which I co-wrote with Jim Stagnitto, is one approach. It is a set of agile techniques for dimensional modelstorming (which is what it sounds like: modeling and brainstorming) with BI stakeholders. It steals ideas from linguistics (storytelling), business modeling (the business model canvas), and visual thinking (the 7Ws) while eschewing entity relationship modeling notation and dedicated data modeling tools in favor of example data tables and simple inclusive tools such as flipcharts, whiteboards, sticky notes, and spreadsheets. It's based on dimensional modeling because, whether we build physical star schemas or not, dimensional modeling is still our best way of thinking about our business processes and how we can use BI tools to measure them. That's the reason we model anything: to raise our understanding of how it works and how we might use it.
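A minimal sketch of what an example data table might look like once it moves from a whiteboard into a script, assuming the 7Ws are who, what, when, where, how many, why, and how. The event ("customer orders product"), the column names, and the rows are all hypothetical, and the helpers `split_columns` and `as_records` are illustrations rather than part of BEAM itself.

```python
# Hypothetical example data table for a "customer orders product" event.
# Each column is tagged with the W question it answers. All names and
# values here are invented for illustration.
COLUMNS = [
    ("customer", "who"),
    ("product", "what"),
    ("order_date", "when"),
    ("store", "where"),
    ("quantity", "how many"),
    ("promotion", "why"),
    ("sales_channel", "how"),
]

EXAMPLE_ROWS = [
    ("A. Jones", "Laptop", "2012-07-02", "London", 1, "Summer Sale", "in store"),
    ("B. Smith", "Tablet", "2012-07-03", "Boston", 2, "No Promotion", "online"),
    ("A. Jones", "Mouse", "2012-07-03", "London", 3, "Bundle Offer", "phone"),
]


def split_columns(columns):
    """Split tagged columns into candidate dimensions and candidate measures.

    Assumption: "how many" columns become measures (facts) and every
    other W becomes a candidate dimension.
    """
    dimensions = [name for name, w in columns if w != "how many"]
    measures = [name for name, w in columns if w == "how many"]
    return dimensions, measures


def as_records(columns, rows):
    """Turn the example rows into dictionaries keyed by column name."""
    names = [name for name, _ in columns]
    return [dict(zip(names, row)) for row in rows]


if __name__ == "__main__":
    dims, facts = split_columns(COLUMNS)
    print("Candidate dimensions:", dims)
    print("Candidate measures:", facts)
    for record in as_records(COLUMNS, EXAMPLE_ROWS):
        print(record)
```

Keeping the example rows next to the column tags mirrors the idea of modeling with example data rather than abstract notation: stakeholders can correct a wrong value or a missing column directly, and the split into candidate dimensions and measures falls out of the tags.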
Where do you see agile BI in, say, two years?
My crystal ball is a little hazy at the moment. We'll have to wait for some of the dust from the current big data and cloud computing hype and reality to settle. I am sure, as their enabling technologies mature, we will have more powerful hardware and software platforms that further support agile BI. What I am less sure of is that we will all be doing agile big data BI or agile real-time analytics. These will, I think, remain niche activities for some time yet.
What I do see is a huge growth in proactive BI development. The lead time between operational development (the creation of new data sources) and reactive BI development has been eroding for years. In the past, we built BI upon relatively stable, relatively well-understood source systems and business processes. Now and in the future, the implementation of disruptive new business models and the increased use of agile for operational development mean we must design BI solutions with far fewer source system certainties and develop them in parallel with those uncertain sources.
If we thought gathering BI requirements using established techniques was challenging when stakeholders knew their established business processes and had already touched their operational data, we're going to need radically new agile approaches to be more proactive and define tomorrow's BI requirements in advance of source data. "In advance" might sound a little crazy, too, but the benefits of getting BI-biased design in the driving seat, ahead of OLTP-biased design, to help define those future business processes and data sources could be amazing.