TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing

TDWI Articles

Enabling the Real-Time Enterprise with Data Streaming

Data-driven businesses are replacing batch ETL uploads with pipelines to transform analytics and business intelligence.

By Yair Weinberger
August 7, 2017

In virtually every industry, organizations want to unlock real-time intelligence from disconnected data sources in order to improve business agility and competitiveness. However, legacy batch upload architectures are too rigid to accommodate applications that require continuous data streams for adaptive decision making.

Although a growing number of organizations have placed their data warehouses in the cloud, on-demand storage and compute cycles do not address their data pipeline challenges.

Because batch processes upload data only once a day, twice a day, or at best once an hour, they struggle to support the day-to-day needs of data warehouses and today's real-time BI applications.

For Further Reading:

The Benefits of Streaming Data Are Contagious

Modernizing a Data Warehouse with Real-Time Functions

Define Your Business Case for Streaming Analytics

Data Streaming: A Disruptive New Approach

To take the place of batch uploads, a new architecture called data streaming has emerged. It was conceived to create real-time data pipelines for streaming applications powered by open source technologies such as Apache Kafka.

Data streaming creates secure pipelines that stream data in real time from various sources -- notably databases, applications, and APIs -- to cloud data warehouse platforms.

It enables organizations to connect any data source within minutes to Amazon Redshift, Google BigQuery, Snowflake Computing, and other cloud data warehouses.

These real-time pipelines can support dozens of the most popular data sources, including transactional databases (such as Oracle, Microsoft SQL Server, or MySQL), SaaS vendors (such as Salesforce and HubSpot), and direct event streams from a Kafka topic or directly from iOS, Android, or JavaScript.
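
As a rough illustration of the difference from a nightly batch job, the sketch below forwards events to a warehouse loader in small batches as they arrive. It is a minimal sketch only: `stream_to_warehouse`, `load_into_memory`, and the in-memory list standing in for a warehouse table are hypothetical names, and a production pipeline would more likely read from a Kafka topic, flush on a timer as well as on size, and use the warehouse's own bulk-load mechanism.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, object]


def stream_to_warehouse(
    events: Iterable[Event],
    load_batch: Callable[[List[Event]], None],
    max_batch_size: int = 3,
) -> int:
    """Forward events to the loader in small batches as they arrive.

    Returns the number of events delivered.
    """
    buffer: List[Event] = []
    delivered = 0
    for event in events:
        buffer.append(event)
        if len(buffer) >= max_batch_size:
            load_batch(buffer)
            delivered += len(buffer)
            buffer = []
    if buffer:  # flush whatever is left once the source is exhausted
        load_batch(buffer)
        delivered += len(buffer)
    return delivered


# Hypothetical stand-in for a cloud data warehouse table.
warehouse_rows: List[Event] = []
batches_loaded: List[int] = []


def load_into_memory(batch: List[Event]) -> None:
    warehouse_rows.extend(batch)
    batches_loaded.append(len(batch))


# Assumed source: seven order events from a transactional database.
source = ({"source": "mysql", "order_id": i} for i in range(7))
delivered = stream_to_warehouse(source, load_into_memory)
```

The batch size here is deliberately tiny; the point is that data moves continuously in small increments rather than waiting for a scheduled upload.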

Organizations in virtually every industry are moving to a real-time data model to be more agile and achieve a competitive edge through faster, better decision making. Data streaming ensures that customers can integrate all their disparate data silos with the cloud provider of their choice.

Using Amazon Redshift, Google BigQuery, Snowflake Computing, and others, enterprises can create a central warehouse for data available from back-end relational databases, online events and metrics, support services, and other internal and external sources. Without centralization, analytics are both piecemeal and siloed. This makes it difficult, if not impossible, to produce real-time intelligence.

Creating this centralized data warehouse is not without its challenges because the data is spread across multiple sources and different systems in different formats. Some of it is flat, some is relational, and some is JSON. Instead of writing custom scripts to integrate it all, which is beyond the resources of most companies, organizations can rely on data streaming technology to perform these tasks.
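
To make the format problem concrete, here is a minimal sketch of bringing a flat-file line, a relational row, and a JSON document into one common record shape. The function names and the dotted-key flattening convention are assumptions chosen for illustration, not the behavior of any particular product.

```python
import csv
import io
import json
from typing import Dict, Sequence


def from_flat(line: str, columns: Sequence[str]) -> Dict[str, str]:
    """Parse one delimited line of a flat file into a record."""
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(columns, values))


def from_relational(row: Sequence[object], columns: Sequence[str]) -> Dict[str, object]:
    """Pair a database row with its column names."""
    return dict(zip(columns, row))


def from_json(document: str) -> Dict[str, object]:
    """Parse a JSON object and flatten nested objects into dotted keys."""
    flat: Dict[str, object] = {}

    def walk(prefix: str, value: object) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(f"{prefix}.{key}" if prefix else key, child)
        else:
            flat[prefix] = value

    walk("", json.loads(document))
    return flat


columns = ["id", "account", "stage"]
records = [
    from_flat('42,"Acme, Inc.",open', columns),
    from_relational((43, "Globex", "closed"), columns),
    from_json('{"id": 44, "account": {"name": "Initech"}, "stage": "open"}'),
]
```

Note that the flat-file values stay as strings while the other two sources keep native types, which is one reason a schema mapping step is usually needed before loading.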

Data streaming relieves IT staff of the drudgery of data movement so they can focus on data analytics. Because data streaming technologies can support a comprehensive set of integrations, enterprises can easily stream and access all their data in the cloud data warehouse of their choice. Every bit of data -- no matter how big or small and regardless of the source or format -- can be moved to the cloud without errors and without requiring a team of engineers to write scripts.

Data Streaming Checklist

Here are the key elements to consider when planning a data streaming project.

Flexible data integration: The ability to transport data to any data warehouse in the required format, regardless of whether the data is structured or semistructured, direct or customized, static or changing. This includes sources such as:

Transactional databases (e.g. Oracle, PostgreSQL)
Salesforce.com (account information, stage, ownership, etc.)
Website tracking (all Web event data)
Web servers (customer activity such as adding inputs and deploying new code in the code engine)
Back-end logs (internal platform events such as data being loaded to the output, new table created)
Monitoring systems (to capture system issues such as input connections and latency)

Schema import and schema inference: Expect step-by-step data preconfiguration tools that make it easy to map every field of structured or semistructured data to a table and control how data is loaded into the data warehouse.
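
One way schema inference can work is sketched below: scan a sample of semistructured events and widen each column's type as new values appear. The type names (`BIGINT`, `DOUBLE`, `BOOLEAN`, `VARCHAR`) and the widening rules are illustrative assumptions; actual warehouses use their own type systems and real tools apply richer rules.

```python
from typing import Dict, Iterable, Optional


def sql_type_of(value: object) -> Optional[str]:
    """Map a Python value to an illustrative column type (None for nulls)."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "VARCHAR"


def widen(current: Optional[str], new: str) -> str:
    """Pick a type that can hold both the current and the newly seen type."""
    if current is None or current == new:
        return new
    if {current, new} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    return "VARCHAR"  # fall back to text when types conflict


def infer_schema(events: Iterable[Dict[str, object]]) -> Dict[str, str]:
    schema: Dict[str, Optional[str]] = {}
    for event in events:
        for field, value in event.items():
            observed = sql_type_of(value)
            if observed is None:
                schema.setdefault(field, None)  # remember the field, type unknown
            else:
                schema[field] = widen(schema.get(field), observed)
    # Fields that only ever held nulls default to text.
    return {field: kind or "VARCHAR" for field, kind in schema.items()}


sample = [
    {"user_id": 1, "amount": 10, "vip": True},
    {"user_id": 2, "amount": 12.5, "coupon": None},
    {"user_id": "guest", "amount": 3.0, "coupon": "SAVE5"},
]
schema = infer_schema(sample)
```

In this sample, `amount` widens from integer to floating point and `user_id` falls back to text once a non-numeric value shows up, which is the kind of decision a preconfiguration tool would surface for review.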

Code engine: Data scientists and engineers should be able to write custom code to enrich and cleanse data, create alerts, implement sessionization, and detect anomalies. To eliminate lengthy, high-latency data preparation jobs, all changes should be executed in stream in real time so data reaches its intended destination.
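
Sessionization is a good example of logic that can run in stream. The sketch below tags each event with a session ID as it passes through, starting a new session whenever a user has been inactive longer than a threshold. The 30-minute gap, the field names `user` and `ts`, and the ID format are assumptions for illustration, and the code assumes events arrive in time order for each user.

```python
from typing import Dict, Iterable, Iterator

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold


def sessionize(
    events: Iterable[Dict[str, object]],
    gap: float = SESSION_GAP_SECONDS,
) -> Iterator[Dict[str, object]]:
    """Yield each event enriched with a session_id, one event at a time."""
    last_seen: Dict[str, float] = {}
    session_no: Dict[str, int] = {}
    for event in events:
        user, ts = event["user"], event["ts"]
        previous = last_seen.get(user)
        # Start a new session on first sight or after a long quiet period.
        if previous is None or ts - previous > gap:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        yield {**event, "session_id": f"{user}-{session_no[user]}"}


clicks = [
    {"user": "a", "ts": 0},
    {"user": "b", "ts": 10},
    {"user": "a", "ts": 600},
    {"user": "a", "ts": 2401},  # 1,801 seconds after the previous "a" event
    {"user": "b", "ts": 2500},
]
session_ids = [event["session_id"] for event in sessionize(clicks)]
```

Because `sessionize` is a generator, each event is enriched and passed along immediately instead of waiting for a separate preparation job.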

Live monitoring: Real-time visibility into data streams for monitoring behavior, identifying potential discrepancies, and debugging data records saves considerable time and helps you avoid problems. Live monitoring also lets you track incoming throughput, latency, loading rates, and error rates and can generate Web and email alerts.
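
The metrics named above can be tracked with very little code. The following is a minimal sketch, assuming each loaded record reports when it was created, when it was loaded, and whether loading succeeded; the `StreamMonitor` class, its threshold, and the alert text are hypothetical, and a real system would send alerts by email or the Web rather than append them to a list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StreamMonitor:
    error_rate_threshold: float = 0.05  # assumed alerting threshold
    events: int = 0
    errors: int = 0
    total_latency: float = 0.0
    alerts: List[str] = field(default_factory=list)

    def record(self, event_time: float, loaded_time: float, ok: bool = True) -> None:
        """Update counters for one record and raise an alert if needed."""
        self.events += 1
        self.total_latency += loaded_time - event_time
        if not ok:
            self.errors += 1
        if self.error_rate() > self.error_rate_threshold:
            self.alerts.append(
                f"error rate {self.error_rate():.0%} after {self.events} events"
            )

    def error_rate(self) -> float:
        return self.errors / self.events if self.events else 0.0

    def mean_latency(self) -> float:
        return self.total_latency / self.events if self.events else 0.0

    def throughput(self, window_seconds: float) -> float:
        """Events per second over an observation window."""
        return self.events / window_seconds


monitor = StreamMonitor(error_rate_threshold=0.25)
monitor.record(event_time=0.0, loaded_time=2.0)
monitor.record(event_time=1.0, loaded_time=2.0)
monitor.record(event_time=2.0, loaded_time=5.0, ok=False)
monitor.record(event_time=3.0, loaded_time=4.0)
```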

Pipeline transparency: A dashboard that provides continuous views of data in motion and notifications that allow users to view incoming events, monitor throughput and latency, and identify errors in real time.

Schema management: When the structure of incoming data changes, a real-time response is needed to make sure no event is lost. The ability to manage schema changes automatically, or to generate notifications so changes can be made on demand, is critical.
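
Both options, automatic handling and notification, can be sketched in a few lines. The example below compares an incoming event with the known table schema, adds new columns when `auto_apply` is set, and otherwise (or on a type conflict) produces a notification. The table name `events`, the type names, and the generated `ALTER TABLE` text are illustrative assumptions; real code would validate and quote identifiers and use the target warehouse's own DDL syntax.

```python
from typing import Dict, List, Optional, Tuple


def column_type(value: object) -> Optional[str]:
    """Map a Python value to an illustrative column type (None for nulls)."""
    if value is None:
        return None
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "VARCHAR"


def reconcile(
    schema: Dict[str, str],
    event: Dict[str, object],
    auto_apply: bool = True,
) -> Tuple[List[str], List[str]]:
    """Return (DDL statements to run, notifications for a human)."""
    statements: List[str] = []
    notifications: List[str] = []
    for name, value in event.items():
        observed = column_type(value)
        if observed is None:
            continue  # a null tells us nothing about the type
        known = schema.get(name)
        if known is None:
            if auto_apply:
                schema[name] = observed
                statements.append(f"ALTER TABLE events ADD COLUMN {name} {observed}")
            else:
                notifications.append(f"new field '{name}' ({observed}) needs review")
        elif known != observed:
            notifications.append(f"field '{name}' changed from {known} to {observed}")
    return statements, notifications


table_schema = {"user_id": "BIGINT", "amount": "DOUBLE"}
incoming = {"user_id": 7, "amount": "n/a", "coupon": "SAVE5"}
statements, notifications = reconcile(table_schema, incoming)
```

Here the new `coupon` field is added automatically, while the type change on `amount` is flagged for review so the event can be held or corrected rather than dropped.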

About the Author

Yair Weinberger is cofounder and CTO of Alooma, a company that specializes in data integration. He is an expert in data integration, real-time data platforms, big data, and data warehousing. Previously, he led development for ConvertMedia (later acquired by Taboola). Yair began his career with the Israel Defense Forces (IDF) where he managed cybersecurity and real-time support systems for military operations.
