TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

Think
- Research & Resources
  - TDWI Playbook | Next Generation Data Science: The AI-Driven Data Science Life Cycle
  - TDWI Data Points | The Data Foundation for AI
  - TDWI Best Practices Report | Data Strategies and Foundations for Modern Data Management
  - TDWI Insight Accelerator | Adopting a Platform Approach for Gaining Insights from Unstructured Data
- Webinars
  - Expert Panel: What's Next in Data Integration: Powering the AI-Driven Enterprise August 25, 2025
  - Expert Panel: Improving Data Quality, Accuracy, and Consistency August 27, 2025
  - The State of Self-Service Analytics: Results from TDWI’s Latest Research September 8, 2025
  - Expert Panel: Building an AI-Driven Data Strategy September 15, 2025
- Virtual Summits
  - Virtual Events Keys to Making Your Data AI Ready September 10, 2025
  - Virtual Events Data Quality for BI, Analytics and AI October 22, 2025
  - Virtual Events Modern Data Strategy November 12, 2025
  - Virtual Events What’s Ahead in 2026 for Data & Analytics December 10, 2025
- By Topic
  - By Topic
    
    Explore the Latest AI, Analytics, and Data Research and Training by Topic
  - BI, Analytics, and Data Literacy
  - AI, Data Science, and Machine Learning
  - Data Management and Governance
  - Platforms and Architecture
  - Strategy and Methods
- Speaking of Data Podcast
  
  Current Research Surveys
Train
- In-Person Events
  - Conference TDWI Transform 2025 San Diego August 18, 2025
  - Executive Summit TDWI Modern Data Leader's Summit San Diego: AI in the Enterprise August 18, 2025
  - Conference TDWI Transform 2025 Orlando November 16, 2025
  - Executive Summit TDWI Data & AI Leaders Summit Orlando: Governing Data, Analytics, and AI November 17, 2025
- Virtual Live Seminars
  - Data Governance Week July 30, 2025
  - Platforms & Architecture Week July 30, 2025
  - AI Bootcamp Week July 30, 2025
- Online Learning
- By Topic
  - By Topic
    
    Explore the Latest AI, Analytics, and Data Research and Training by Topic
  - BI, Analytics, and Data Literacy
  - AI, Data Science, and Machine Learning
  - Data Management and Governance
  - Platforms and Architecture
  - Strategy and Methods
- Train Your TeamCustom solutions for training your team
  
  Get CertifiedEarn a professional credential in BI and Analytics, Data Governance, or AI
  
  TDWI MembershipExclusive access to the research, tools, training, and connections
Engage
- Connect
  - Connect and Contribute to Our Vibrant Community of Data Leaders
    
    Subscribe to TDWI Stay up to date on the latest news and events. Sign Up
    
    Become a TDWI Member Gain exclusive access to the research, tools, training, and connections to move your careers, teams, and projects forward. Learn More
    
    Become a Part of the TDWI Research Panel Make a difference in the data and analytics industry and earn incentives by sharing your insights with TDWI. Explore Now
    
    Speak at TDWI Events Share your expertise and build your personal brand as a speaker at a TDWI In-Person or Virtual Event. Submit a Proposal
    
    Become a TDWI Research Fellow Apply to be a member of TDWI’s industry leading research team. Apply Today
    
    Become a Member of the Data & AI Leaders Forum Engage in collaborative discussions, stay ahead of the curve, and stay in the know. Apply Now
    
    Showcase Your Data & AI Solutions Reach and engage with TDWI community through multi-channel marketing programs. Learn More

TDWI Blog

TDWI Blog: Data 360

FAQ: Next Generation Data Integration

A few days ago, I presented a TDWI Webinar based on my newly published TDWI Best Practices report about “Next Generation Data Integration” (NGDI). Almost three hundred people attended the broadcast, and (with such a large turnout) I got a ton of great questions from the audience about data integration (DI).

I’d like to share some of those questions with you (and my responses to Webinar attendees who asked them), as a way of expanding and clarifying the research findings of the report. If you care about DI, this should be interesting for you.

Concerning bulk upload, should we use a batch upload mechanism or Web services?

It depends on the dataset being bulk loaded. You should stick to your old reliable bulk loader for datasets that are very large, too large for a service bus, don’t have an immediate delivery requirement, or demand multiple complex passes (as many multidimensional structures do, when being loaded into a data warehouse). Most services, messages, or events used in a DI context handle time-sensitive data, which is delivered faster over a message or service bus. Also, real-time DI often enables Operational Business Intelligence (OpBI), where data is drawn frequently from ERP, CRM, and other operational applications, then loaded into a warehouse, mart, or other BI data store. OpBI may also use DI to publish improved data back to those applications. Many operational applications (especially SAP) are best extracted from via the application layer, and services and messages usually support such an interface. From these examples, you can see that the old (bulk loaders) and the new (services) intermingle in the newest DI generation.

Do staging tables play an important role in DI?

Yes. The newest generation of DI still relies of older, tried-and-true designs and DI architectures. And these typically have a variety of data landing and data staging areas, including databases (like operational data stores) and tables (whether physically in the data warehouse or external to it). One new spin on this is that 64-bit computing and very large memory spaces in server hardware now enable more effective DI pipes. This is where data is staged and processed in server memory, not landed to disk. This both speeds up DI transformational processing and boosts scalability for large data volumes. For many organizations, NGDI is about adjusting (not abandoning) useful best practices like this to take advantage of newly available platform capabilities.

Is DI architecture and information architecture the same thing?

No, they’re different. Information architecture is usually about the data models and schema within individual enterprise databases, plus data dependencies across multiple ones. DI architecture concerns the design of data flows, plus development standards (like preferred interfaces for specific applications). For DI, hub-and-spoke is the most common architecture, where a vendor’s DI tool or a control server (in home-grown DI solutions) is equivalent to a hub. But point-to-point interfaces still abound in DI jobs, and DI over a bus is subject to whatever the bus requires. My report explains that designing and using just the right DI architecture has become a critical success factor for satisfying next-generation requirements, like scalability, real time, governance, and DI team collaboration.

Where do you see ERP choices within the context of NGDI?

In my world, Operational Business Intelligence (OpBI) has become quite common. OpBI requires much from a DI tool. The DI tool has to support feature-rich interfaces to ERP and other application types. The DI tool must have optimization to draw data fast, frequently, and non-invasively from ERP modules and applications. And the DI tool must understand ERP data structures and function calls to make sense of ERP data, before integrating it elsewhere. OpBI and other real-time business practices wouldn’t be possible without real-time DI. In fact, my report shows that various real-time DI functions are the ones users will increase the use of most over the next three years.

Other common DI practices involving ERP include synchronizing customer data (and other data domains, especially product data) across multiple ERP modules and instances. Synchronizing reference data is a similar practice, one that’s growing quickly. Since some ERPs are almost impermeable, DI is regularly called in to assist with data access for data quality. This kind of coordination between DI and DQ is one of the hallmarks of NGDI.

Do you think certain aspects of traditional EAI are going to be part of NGDI?

Well, first of all, I regularly find some DI functions executed over EAI and similar buses in user organizations that have already made a substantial investment in a robust EAI infrastructure. Firms in financial and insurance industries are typical examples. Second, I think what’s happening in such firms is that DI is simply leveraging more deeply an existing infrastructure, just as other users, applications, and tools are. Third, DI is being driven to EAI, in situations where EAI has better interfaces (especially to packaged applications) or certain time-sensitive data has a real-time requirement (for which EAI messages are easily configured). Even so, there’s still a need for standard data interfaces over the enterprise LAN.

Any metrics around how much operational cost is associated with near real-time data integration vs the traditional batch model?

Ten years ago, real-time DI via EAI was possible, but it usually required the purchase of extra tools. Plus, real-time functions in tools and applications weren’t very robust, so an administrator had to watch and tweak them constantly. These two characteristics drove up the cost. Luckily, a lot of RT functionality is built into today’s applications, databases, and DI tools. Many firms have a robust EAI or service bus infrastructure that DI can tap for real time. For firms that have kept their enterprise software and infrastructure up-to-date, real time DI is quite accessible, reliable, and inexpensive, as compare to the recent past. But that’s with EAI in mind. From a different direction, batch processing has improved, too. It may be preferred in the form of so-called micro-batches for frequent intra-day extract that needn’t be truly RT.

Can you expand on RT event processing, including contexts for applicability?

You probably don’t want to handle just any kind of event via a DI tool. Instead, some kind of “complex event” benefits from DI processing. A complex event is actually multiple events, typically occurring at different times (even different months or years) that need to be correlated. ETL-ish DI can access the many diverse data sources and data models where complex data events may be managed. Today, I almost exclusively find federal intelligence or security agencies doing this, to recognize and quantify security threats. The TSA and Coast Guard come to mind. But it’s just a matter of time before such DI-enabled practices are common with customer events in for-profit corporations.

CONCLUSION

If you have a question or answer about Next Generation Data Integration (or a reaction to one presented above), please share them by responding to this blog.

Register for and replay the TDWI Webinar these questions came from at
http://tdwi.org/webcasts/2011/04/next-generation-data-integration.aspx?tc=page0

Download a free copy of the TDWI Best Practices Report titled Next Generation Data Integration, at http://tdwi.org/research/list/tdwi-best-practices-reports.aspx

Find tweets about NGDI by searching Twitter.com for the hash tag #NGDI.

Posted by Philip Russom, Ph.D. on April 19, 2011