Q&A: How Data Warehouse Design Can Reduce Time to Value without Compromise
WhereScape's CEO examines current data warehouse design, agile development, and the best route to revamping a data warehouse.
- By Linda L. Briggs
- February 12, 2013
Current data warehouse projects simply take too long to produce value, says WhereScape CEO Michael Whitehead. All too often, projects are built on the assumption that they won't need to change. Instead, Whitehead advocates delivering value quickly, thus winning the time and budget to keep improving the design.
In this interview, Whitehead shares his thoughts on current data warehouse design, agile development, and the best route to revamping a data warehouse. WhereScape, a data warehousing company, offers WhereScape 3D, a data warehouse planning tool, and WhereScape RED, an integrated development environment (IDE) for building, deploying, managing, and renovating data warehouses and data marts.
BI This Week: What are some of the problems you see with current data warehouse projects?
Michael Whitehead: They take too long to get to value. You earn the right to do more work based on the value you provide -- and that value is defined by the user community, not the data warehouse team. If the users don't care about data quality (or more accurately, are already aware of the data limitations and factor that into their decision-making), then don't try to fix it before you start. If data quality is the No. 1 issue, then put less effort into design. If you deliver value quickly, you'll be given the time and budget to make it better.
What about design techniques?
Design is just one piece of the puzzle. It's not a deliverable on its own, and a design is only useful when it's populated with data and being used to support business decisions. The availability, grain, and quality of data are always constraints that need to be dealt with in a data warehouse project. To reduce rework, data should be combined with the design much earlier than it traditionally is. Problems can then be identified far earlier, and in some cases an entire project can be cancelled early, rather than after significant time and budget have been spent discovering that the design cannot be populated.
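As a rough illustration of combining data with a design early, the sketch below checks a small sample of source rows against a proposed target table for the three constraints mentioned above: availability, grain, and quality. The design, the column names, the sample rows, and the function `profile_against_design` are all hypothetical and are not taken from any WhereScape product.

```python
from collections import Counter

# Hypothetical target design: the grain (columns that should uniquely
# identify a row) and the columns that must always be populated.
DESIGN = {
    "grain": ["order_id", "line_no"],
    "required": ["order_id", "line_no", "customer_id", "amount"],
}


def profile_against_design(rows, design):
    """Return a list of problems found when source rows are checked against a design."""
    problems = []

    # Availability: columns the design needs but the source never supplies.
    available = set().union(*(row.keys() for row in rows)) if rows else set()
    for col in design["required"]:
        if col not in available:
            problems.append(f"missing column: {col}")

    # Quality: required columns that are present but contain empty values.
    for col in design["required"]:
        if col in available:
            nulls = sum(1 for row in rows if row.get(col) is None)
            if nulls:
                problems.append(f"{col}: {nulls} null value(s)")

    # Grain: the grain columns should identify each source row uniquely.
    keys = Counter(tuple(row.get(c) for c in design["grain"]) for row in rows)
    dupes = sum(1 for count in keys.values() if count > 1)
    if dupes:
        problems.append(f"grain violated: {dupes} duplicate key(s)")

    return problems


# Hypothetical sample pulled from a source system.
sample = [
    {"order_id": 1, "line_no": 1, "amount": 10.0},
    {"order_id": 1, "line_no": 1, "amount": 12.5},
    {"order_id": 2, "line_no": 1, "amount": None},
]

for problem in profile_against_design(sample, DESIGN):
    print(problem)
```

Run against even a small extract, a check like this can show in the first days of a project that a required column does not exist in the source or that the assumed grain does not hold.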
Why is it so hard for the traditional data warehouse to respond to changing business conditions, and what do you see as the remedy?
Too often, data warehouses are built on the assumption that they won't need to change. Too often, there is a belief that there is a perfect design (created on the assumption that data is subservient to design) that users have agreed will meet their requirements now and into the future. Yet everything changes -- users, requirements, data, source systems, technology stacks.
Assuming everything will change means picking your battles -- keep everything as simple as possible, automate everything you can, and put time and effort only into those processes that you really need to.
What's your definition of the term "agile," and how do you see agile development addressing some of these issues?
Agile is a manifesto and set of principles over and above a set of common practices. Agile development keeps you close to the user community, encourages collaboration and communication, keeps you from going (too far) off track, and delivers value incrementally. It's important to note that agile does not necessarily mean a different end product, and it certainly is not an excuse for no requirements, no governance, and no documentation -- that's not called agile, it's called sloppy.
What kinds of mistakes do companies make in implementing or revamping a data warehouse?
Too often, there are multiple teams of functional experts with no common definition of success. It's one of the reasons I like agile approaches -- they encourage communication between team members. Agile often results in re-allocating work to different teams for better results. For example, I cringe when I see front-end teams doing enormous amounts of work because of limitations at the back end -- limitations that could easily and quickly be rectified by the back-end team.
What about mistakes companies make in moving to a new data warehouse?
Anyone moving to a new data warehouse today should be considering the impact of these two things: appliances and big data techniques. I say big data techniques deliberately, as I am referring to changes in the cost equation that come into play when you consider the impact that cheap storage and late binding can have on your architecture, your design effort, and on time-to-value. Combine this with the power of fit-for-purpose appliances and your new data warehouse may well look quite different from the one you're replacing.
Our customers no longer have, say, a Teradata data warehouse. They now have a data warehouse landscape that includes Teradata, Hadoop, Aster Data, SQL Server, and so forth.
Are organizations sometimes better served in starting completely fresh in the move to a new data warehouse rather than trying to move the design they have?
Organizations should always look to redevelop rather than port to a new platform. In a small number of circumstances, they should investigate porting as an initial option -- and then redevelop. Over time, all systems incur technical debt. Decisions are also made that take advantage of the particular strengths of the current technology stack.
Redevelopment, done right, is orders of magnitude less expensive than the initial development. You already know what you want (or don't want), and it's a great time to take advantage of data warehouse build automation. It's always better to have easily maintainable, optimized code for the new platform rather than porting what are, in a lot of cases, historical hacks, and hoping they will run better on a new platform. Moving from Oracle to Teradata, for instance, means you can take advantage of "merge," rather than porting inserts and updates that were the right answer once but are no longer current.
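To make the contrast between separate inserts and updates and a single merge-style statement concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite has no MERGE statement, so its `INSERT ... ON CONFLICT DO UPDATE` upsert (available in SQLite 3.24 and later) is used as a stand-in; this is not Teradata or Oracle syntax. The table `dim_customer`, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one existing customer row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute("INSERT INTO dim_customer VALUES (1, 'Acme')")
    return conn


# Hypothetical incoming batch: one existing key and one new key.
incoming = [(1, "Acme Ltd"), (2, "Globex")]


def load_update_then_insert(conn, rows):
    """Older pattern: try an UPDATE, then INSERT only when no row matched."""
    for customer_id, name in rows:
        cur = conn.execute(
            "UPDATE dim_customer SET name = ? WHERE customer_id = ?",
            (name, customer_id),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO dim_customer (customer_id, name) VALUES (?, ?)",
                (customer_id, name),
            )


def load_single_statement(conn, rows):
    """Merge-style pattern: one statement handles both new and existing keys."""
    conn.executemany(
        "INSERT INTO dim_customer (customer_id, name) VALUES (?, ?) "
        "ON CONFLICT(customer_id) DO UPDATE SET name = excluded.name",
        rows,
    )


conn = make_db()
load_single_statement(conn, incoming)
result = conn.execute(
    "SELECT customer_id, name FROM dim_customer ORDER BY customer_id"
).fetchall()
print(result)
```

Both functions leave the table in the same state. The difference is that the second pushes the match-or-insert decision into the database engine, which is the kind of platform-specific capability a straight port of the older code would not use.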
How does WhereScape fit into what we've talked about? What does it bring to the discussion?
At WhereScape, we're fascinated by the concept of reducing the time to value -- without compromise. Why does there have to be a tradeoff between doing it right and doing it now? We think that data warehouses are a very good platform from which to answer a range of questions, but that traditionally they take too long to build and are too hard to change.
We want to provide software to data warehouse professionals that enables them to rapidly build the best data warehouse possible, automating what we can for them while not restricting them. That's why we wrote WhereScape RED. We also want to make sure that projects are started off correctly and that problems are discovered in the first week or month, not the last. That's why we wrote our latest product, WhereScape Data Driven Design (3D).
Where is the ROI in implementing WhereScape? How quickly can a return be realized?
We want our customers to get value in the first project they use our software for. The price of hardware has come tumbling down, databases are cheaper and more powerful than they've ever been, yet the cost of good people is always on the rise. You can look to offshore development to cut people costs, but we think a small team of smart people using smart data warehouse automation software is going to provide a better solution every time.