TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

Think
- Research & Resources
  - TDWI Playbook | Next Generation Data Science: The AI-Driven Data Science Life Cycle
  - TDWI Data Points | The Data Foundation for AI
  - TDWI Best Practices Report | Data Strategies and Foundations for Modern Data Management
  - TDWI Insight Accelerator | Adopting a Platform Approach for Gaining Insights from Unstructured Data
- Webinars
  - Redefining Clinical Operations with Agentic AI: Accelerating Innovation Across Data Management and Site Monitoring July 30, 2025
  - Smarter Marketing in Retail: How AI and Modern Data Foundation Drive Growth July 31, 2025
  - Expert Panel: What's Next in Data Integration: Powering the AI-Driven Enterprise August 25, 2025
  - Expert Panel: Improving Data Quality, Accuracy, and Consistency August 27, 2025
- Virtual Summits
  - Virtual Events Keys to Making Your Data AI Ready September 10, 2025
  - Virtual Events Data Quality for BI, Analytics and AI October 22, 2025
  - Virtual Events Modern Data Strategy November 12, 2025
  - Virtual Events What’s Ahead in 2026 for Data & Analytics December 10, 2025
- By Topic
  - By Topic
    
    Explore the Latest AI, Analytics, and Data Research and Training by Topic
  - BI, Analytics, and Data Literacy
  - AI, Data Science, and Machine Learning
  - Data Management and Governance
  - Platforms and Architecture
  - Strategy and Methods
- Speaking of Data Podcast
  
  Current Research Surveys
Train
- In-Person Events
  - Conference TDWI Transform 2025 San Diego August 18, 2025
  - Executive Summit TDWI Modern Data Leader's Summit San Diego: AI in the Enterprise August 18, 2025
  - Conference TDWI Transform 2025 Orlando November 16, 2025
  - Executive Summit TDWI Data & AI Leaders Summit Orlando: Governing Data, Analytics, and AI November 17, 2025
- Virtual Live Seminars
  - Platforms & Architecture Week July 29, 2025
  - Data Governance Week July 29, 2025
  - AI Bootcamp Week July 29, 2025
- Online Learning
- By Topic
  - By Topic
    
    Explore the Latest AI, Analytics, and Data Research and Training by Topic
  - BI, Analytics, and Data Literacy
  - AI, Data Science, and Machine Learning
  - Data Management and Governance
  - Platforms and Architecture
  - Strategy and Methods
- Train Your TeamCustom solutions for training your team
  
  Get CertifiedEarn a professional credential in BI and Analytics, Data Governance, or AI
  
  TDWI MembershipExclusive access to the research, tools, training, and connections
Engage
- Connect
  - Connect and Contribute to Our Vibrant Community of Data Leaders
    
    Subscribe to TDWI Stay up to date on the latest news and events. Sign Up
    
    Become a TDWI Member Gain exclusive access to the research, tools, training, and connections to move your careers, teams, and projects forward. Learn More
    
    Become a Part of the TDWI Research Panel Make a difference in the data and analytics industry and earn incentives by sharing your insights with TDWI. Explore Now
    
    Speak at TDWI Events Share your expertise and build your personal brand as a speaker at a TDWI In-Person or Virtual Event. Submit a Proposal
    
    Become a TDWI Research Fellow Apply to be a member of TDWI’s industry leading research team. Apply Today
    
    Become a Member of the Data & AI Leaders Forum Engage in collaborative discussions, stay ahead of the curve, and stay in the know. Apply Now
    
    Showcase Your Data & AI Solutions Reach and engage with TDWI community through multi-channel marketing programs. Learn More

RESEARCH & RESOURCES

Q&A: Agile Data Engineering Using Advanced Data Modeling Concepts

How data architects can solve age-old dilemmas with new techniques.

By James E. Powell
July 16, 2013

Traditional data modeling techniques have served us well, but data warehouses requirements are quickly evolving, and data architects must find a new way to keep up -- to be agile. We spoke to Ralph Hughes, chief systems architect at Ceregenics, Inc., about a new concept -- data engineering -- that can help. Ralph will be presenting his ideas in greater detail at the TDWI World Conference in San Diego (August 18-23, 2013). We spoke to him about what data engineering is and how it fits with agile business intelligence.

BI This Week: Ralph, your course at TDWI's World Conference in San Diego is called "Agile Data Engineering." The phrase "data engineering" will be a new term for many attendees. What is your basic definition of this concept?

Ralph Hughes: When following traditional DW/BI development approaches, data architects believe the warehouse's data model should be essentially complete and correct before ETL coding begins, yet it's nearly impossible to get an enterprise data warehouse (EDW) model complete at the start of a project. There's always so much left to discover about an organization’s true business requirements. Moreover, we can't get the data model perfect up-front because those business requirements will change over the many months it takes us to build the ETL for a given project.

Given those realities, it would be far easier if the data architects could instead:

Model for what they know now
Adapt the data model across the years and many DW releases as business conditions change
"Automagically" morph the EDW's existing data to fit into the new data model

Over the past decade, I’ve said that the DW/BI fringe has developed some new data modeling techniques that allow us to economically work in this fashion. Building for what's needed now and evolving the warehouse as requirements change -- that's what I call "data engineering."

To put a finer point to this new concept, how is "data engineering" different from the data modeling or data architectural work most of us already practice?

Traditional data architecture and modeling practices strive to deliver a design that can be built once and remain largely static for many years. With that goal, data modelers must add many features to the data schemas intended to handle the rare cases that will break the warehouse design if they should ever occur. Such "future-proofing" is a level of technical perfection that adds much complexity and delay to DW/BI projects.

By contrast, look at the work of other types of engineers outside the software industry, such as mechanical and civil engineers. You quickly realize these physical engineers are not striving for technical perfection per se. Instead, they narrow the performance specifications their creations will support, and then balance technical considerations against crucial constraints such as time, material costs, and construction labor. They work to an economical solution.

Reflecting upon my 30 years of building data warehouses with large systems integrators, I would say considerations such as minimizing time and cost have not been given the weight they deserve. Instead of meeting current business needs, the data architects I've worked with have chosen to future-proof their designs, accepting delay and expense as necessary evils.

If we in the DW/BI profession would incorporate some of the mind set from physical engineering, we'd design for the problem at hand, and adapt that design if and when the problem space changes. The additional challenge for DW software engineers is that we have to roll the existing data forward into each new design, rather than throwing it away. "Data engineering," then, is data architectural techniques adapted to take advantage of the flexible data schemas that some new data modeling paradigms have made possible.

Given that data can now evolve, data modelers can bring cost and schedule constraints back into the mix of priorities. They can focus on providing economical designs that solve current BI requirements without having to invest even a tenth as much in future proofing.

How does this notion of "data engineering" abet the concept of agile business intelligence?

Agile is all about constantly delivering value to the business customer. Agile data warehousing practitioners repeatedly deploy small improvements to a BI application every two or three months. These many increments build upon one another until soon an enterprise data warehouse emerges from the overall collection of online modules. We have achieved 3X accelerations in delivery speeds following this approach, which was born out of carefully reviewing traditional DW/BI development methods and eliminating all activities that do not directly contribute to a new increment of value to the customer.

Yet, even this super-fast delivery method only addresses the ETL development portion of a DW/BI project. We've still had to wait 12 months or longer for the data architects to provide a data model that was sufficiently future-proofed that it would be reasonably immune to expensive re-engineering costs every time business requirements changed.

Agile data engineering eliminates this wait time by either of two new, "hyper data modeling" techniques that permit you to build out a warehouse's database a few objects at a time. The re-engineering cost of conversion scripts is eliminated by these techniques because they allow already-loaded data to be easily re-cast into a new design when the warehouse's underlying database tables must be redesigned.

Typical schema updates can be accomplished by re-structuring at most only a small table or two. Such capabilities eliminate the risks that used to force data architects to insist on near-perfect data designs before they would let a development team start coding ETL. We can now get a project started quickly, adapting the data as we learn about the business problem and as business conditions change. Fast starts and incremental deliveries provide steady value to the customer and delight our EDW business stakeholders. That's agile!

Your description of agile data engineering sounds like a very big topic for a single day's class. How are you planning on focusing the presentation, especially for attendees who are not sure they want to adopt any of the practices you'll be explaining?

The focus will be on comparing the four data modeling paradigm DW/BI practitioners must choose from before they design their next warehouse. The four data modeling approaches include the two traditional styles of standard normal forms and conformed dimensional forms, plus the two new styles of hyper-normalized and hyper-generalized forms. We will quickly review the traditional modeling styles, provide a medium-level introduction of the two new styles, and then detail the impact that each style has on the speed and expense of both EDW development and EDW re-engineering.

In particular, the class will revolve around estimating the level-of-effort required to solve four common data model changes that typically require expensive re-programming and/or conversion scripting:

Promoting attributes to new entities
Upgrading to a party model for customers, employees, and vendors
Adding a new slow-change trigger
Adjusting the granularity of a subject area

By contrasting the re-engineering costs of these types of changes across the four data modeling approaches, we'll equip attendees to return to their work sites ready to demonstrate to their colleagues where the new data modeling styles will save development time and money. They'll be able to provide a well-reasoned estimate of how large those savings will be.

TDWI Membership

Get immediate access to training discounts, video library, research, and more.

Find the right level of Membership for you.

Learn More

↑

Research & Resources

Webinars

Virtual Summits

By Topic

In-Person Events

Virtual Live Seminars

Online Learning

By Topic

Connect and Contribute to Our Vibrant Community of Data Leaders

RESEARCH & RESOURCES

Q&A: Agile Data Engineering Using Advanced Data Modeling Concepts

TDWI Membership

Get immediate access to training discounts, video library, research, and more.

TDWI

Engage

Research