Data Warehousing: Ingenuity, Labor, and Obsolete Practices
Proponents of data warehouse automation say the way we design, build, and manage data warehouse systems is obsolete. They say ingenuity, not labor, is the most important contribution human beings can make to the design, development, and optimization of the warehouse.
- By Steve Swoyer
- February 19, 2016
Proponents of data warehouse automation (DWA) say how we design, build, and manage data warehouse systems is obsolete. They say ingenuity, not labor, is the most important contribution human beings can make to the design, development, and optimization of the warehouse as well as to the business intelligence (BI) applications and analytics that run on top of it. Machines should be building and managing data warehouse systems, they argue; humans should be perfecting them.
"Data warehouse automation comes into play because people want to be more efficient. Most are working with ETL tools, which have all kinds of complexities, or they're working with scripts. Everything takes too long to build. There's no reuse. There's no [managed] automation. When they want to change something, even something minor, it takes too long again," argues Gertjan Vlug, founder of DWA vendor BIReady, which data integration specialist Attunity Inc. acquired in 2014.
Ingenuity is conspicuously and definitively human. Labor, by contrast, is best left to the machines. "Labor" is tedious, time-consuming, and repetitive. It's stuff we do without thinking. Labor entails activities that don't require interactive, critical, and imaginative engagement. DWA recognizes that traditional data warehouse development entails a simply shocking amount of tedious, repetitive activity: planning, scoping, designing, and developing data models; generating documentation, metadata, and other critical artifacts; loading the warehouse; and codifying and persisting stored procedures and user-defined functions (UDFs).
"The data warehouse is known for ... taking a long time [to build] and for being very expensive [to manage]. Data warehouse automation actually adds value to the data warehouse into the 21st century. It gives you the [ability to] automate the time-consuming [work] of building and managing [a warehouse]," says Heine Krog Iversen, CEO of DWA specialist TimeXtender Inc.
Proponents say DWA focuses on eliminating tedious, laborious, time-consuming, repetitive tasks -- in other words, the kinds of activities that don't require human intelligence or creativity but which are still being done by living, breathing, salaried human beings. DWA treats this as a spurious example of what in economics is called "Baumol's Cost Disease." Briefly, classical economics tells us that wage growth tends to be positively linked with productivity. In the classical model, an increase in labor productivity in a specific sector should translate into an increase in wages in that same sector.
Baumol's Cost Disease describes what happens when this relationship or linkage is effectively broken -- i.e., when an increase in wages isn't linked to (or can't be substantiated by) an increase in productivity. According to economists William Baumol and William Bowen, such a phenomenon does occur, and for good reason. In some sectors, such as education or the performing arts, it can be difficult or impossible to boost labor productivity. It's inadvisable, for example, to eliminate intensive human involvement in the care and mentoring of six-, seven-, and eight-year-old children.
The classic case concerns the performing arts. Why, Baumol and Bowen wondered, do the wages of the human actors who perform the arts continue to increase? It isn't as if improvements in labor productivity have contributed to wage growth; nevertheless, a contemporary performer in an off-Broadway production of, say, Six Characters in Search of an Author will earn significantly more than did her counterpart in a 1920s Broadway production of that same play. Why is that?
According to Baumol and Bowen, it's because the contemporary off-Broadway production still has as many performers as the very first production, in 1921. In this case, the human component of the performing arts is conspicuous and definitive. As a contemporary reviewer put it, the performing arts "cannot share fully in the growth of productivity due to technology" because "[t]he live human activity remains a 'handicraft' product whose cost rises both relatively and absolutely."
The performing arts is a special, exceptional case.
Is data warehouse and BI development a special, exceptional case? Is it a field or discipline -- such as the performing arts -- that is plagued by the Baumol Effect? Data warehouse development and management, too, emphasizes intensive human involvement, "handicraft" products -- hand-coded scripts, custom-built infrastructure plumbing, and other artifacts -- and esoteric expertise. To some extent, this really is true of general-purpose software development, which places a premium on human ingenuity and creativity. Erik Brynjolfsson first discussed this problem in a landmark 1993 article, The Productivity Paradox of Information Technology. Brynjolfsson's article appeared after early enthusiasm for computer-aided software engineering (CASE) had given way to disillusionment.
The data warehouse, and data management itself, is a different case. Productivity can and should be boosted in the data warehouse model because of (1) the maturity of available DWA tooling and (2) the unique abstraction of SQL itself. (SQL is a very efficient and productive high-level language.) In other words, proponents argue, it's possible to use human actors much more efficiently in data warehouse development than in general-purpose software development. In data warehouse development, costs tend to increase exponentially because human talent and ingenuity are squandered on productivity-wasting stuff -- such as generating documentation.
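To make the argument concrete, here is a minimal sketch of the kind of work proponents want machines to do: a single table description drives both the SQL definition and its documentation, so neither is written by hand. It is an illustration only, not how any vendor named in this article works; the table, the column names, and the functions `render_ddl` and `render_doc` are all hypothetical.

```python
# Hypothetical metadata for one warehouse table. Every name here is
# invented for illustration and does not come from any DWA product.
CUSTOMER_DIM = {
    "table": "dim_customer",
    "description": "One row per customer.",
    "columns": [
        {"name": "customer_key", "type": "INTEGER", "nullable": False,
         "doc": "Surrogate key."},
        {"name": "customer_name", "type": "VARCHAR(100)", "nullable": True,
         "doc": "Display name."},
    ],
}


def render_ddl(spec):
    """Build a CREATE TABLE statement from the metadata."""
    lines = []
    for col in spec["columns"]:
        null_sql = "" if col["nullable"] else " NOT NULL"
        lines.append(f"    {col['name']} {col['type']}{null_sql}")
    body = ",\n".join(lines)
    return f"CREATE TABLE {spec['table']} (\n{body}\n);"


def render_doc(spec):
    """Build plain-text documentation from the same metadata."""
    rows = [f"{spec['table']}: {spec['description']}"]
    for col in spec["columns"]:
        rows.append(f"  - {col['name']} ({col['type']}): {col['doc']}")
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_ddl(CUSTOMER_DIM))
    print(render_doc(CUSTOMER_DIM))
```

Because both outputs come from the same description, changing a column means editing one entry and regenerating, rather than updating a script and a document separately.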
DWA advocates say data warehouse development has not effectively exploited technological and methodological innovations that would improve productivity. In other words, they argue, even though productivity in data management hasn't increased significantly, the cost of the human labor required to build, support, and scale data warehouse systems -- from core DI infrastructure to reports, analytics, and other artifacts -- has increased at a consistent rate.
Put another way, the data warehouse toolkit of today still looks too much like the data warehouse toolkit circa 1994. There's no reason for this. "Data warehouses take too long to build and are too hard to change. This isn't magically going to get better unless we change what we're doing," says Michael Whitehead, CEO of data warehouse automation specialist WhereScape Inc.
"Data warehouse automation is a program for reducing cost, time, and risk. It's based on the recognition that change is inevitable, and it helps you to build in the ability to change from day one."
About the Author
Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about.