The Big Data Pendulum Swings from Centralized to Distributed Workloads
As businesses try to improve how they leverage data and increase their competitive agility, a converged data platform will be key to their success.
- By Jack Norris
- March 1, 2016
Tech cycles have swung back and forth over the years, from centralized to distributed workloads. When it comes to big data, organizations typically focus their efforts on centralized data lakes, whose benefits stand in stark contrast to the alternative: operating separate data silos.
Of course, the benefits of centralization are numerous, including reduced data duplication, simplified management, and robust applications that benefit from disparate data feeds. Despite those benefits, large organizations will increasingly move to distributed processing for big data in 2016 in order to properly manage data across devices, data centers, and global use cases.
Let's take a look at some of the challenges and issues that organizations encounter with centralized workloads. Disparate workloads, for example, can prompt the need for separate processing clusters. Database applications are typically run on a separate cluster from Hadoop to avoid conflicts and to simplify management. Organizations looking to take advantage of streaming analytics with open source technologies such as Spark or Storm will need to deploy additional clusters to handle streaming data, coordinating separate feeds to Spark (for streaming analytics) and Hadoop (for batch analysis).
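The dual-path pattern described above -- one feed for low-latency streaming analytics, another for complete batch analysis -- can be sketched without any particular framework. The class names and click-count events below are illustrative, not any product's API; a minimal Python sketch of coordinating one data feed to both paths:

```python
from collections import deque

class StreamingPath:
    """Low-latency, approximate view: totals over only recent events."""
    def __init__(self, window=3):
        self.window = deque(maxlen=window)

    def ingest(self, event):
        self.window.append(event)

    def recent_total(self):
        return sum(e["clicks"] for e in self.window)

class BatchPath:
    """High-latency, complete view: recomputed over the full history."""
    def __init__(self):
        self.history = []

    def ingest(self, event):
        self.history.append(event)

    def recompute_total(self):
        return sum(e["clicks"] for e in self.history)

def feed(event, *paths):
    """Coordinate a single data feed to every processing path."""
    for p in paths:
        p.ingest(event)

stream, batch = StreamingPath(window=3), BatchPath()
for clicks in [5, 2, 8, 1]:
    feed({"clicks": clicks}, stream, batch)

print(stream.recent_total())    # last 3 events: 2 + 8 + 1 = 11
print(batch.recompute_total())  # full history: 5 + 2 + 8 + 1 = 16
```

In a two-cluster deployment, each path runs on its own infrastructure and the coordination logic in `feed` becomes its own moving part to operate and monitor -- which is the overhead the article is pointing at.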
In addition to workload obstacles, disparate users and groups can dictate the separation of clusters. As data access permissions and concerns regarding data privacy and protection mount, organizations are often forced to deploy separate physical clusters unless their platform has multi-tenancy capabilities that provide the necessary privacy and logical data separation.
Data gravity is another key driver. Many companies distribute processing workloads across multiple data centers in separate geographic locations. In addition to speed, the need for local processing is often driven by government regulations such as safe harbor privacy provisions. These provisions drive companies to separate the storage and processing of user data and to clearly define the acceptable borders for the processing of that data.
Emerging technology trends will further push the need for distributed processing. According to a recent study by Cisco, the Internet of Things (IoT) will comprise over 50 billion connected devices by 2020. The data emitted by these devices needs to be collected, processed, and analyzed. Best practices for IoT will consist of distributed local processing, selective filtering, and aggregation of data before it is transmitted to various locations.
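The filter-then-aggregate pattern is straightforward to illustrate. In this sketch, the sensor format, the temperature thresholds, and the summary fields are all invented for illustration; the point is that a device processes locally and transmits only a small aggregate:

```python
def edge_summarize(readings, min_temp=0.0, max_temp=120.0):
    """Selectively filter out-of-range readings on the device,
    then aggregate, so only a small summary crosses the network."""
    valid = [r for r in readings if min_temp <= r <= max_temp]
    if not valid:
        return None
    return {
        "count": len(valid),
        "mean": sum(valid) / len(valid),
        "max": max(valid),
    }

# Thousands of raw readings collapse to one three-field summary.
raw = [20.0, 21.5, -40.0, 22.0, 999.9]  # two sensor glitches
summary = edge_summarize(raw)
print(summary)  # count 3, mean ~21.17, max 22.0
```

Only `summary` would be transmitted upstream; the raw readings (including the glitches) never leave the device, which is what makes 50 billion devices tractable.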
The Emergence of a Converged Architecture Approach
Distributed processing of big data will increasingly be required, but organizations will not need to make an "either/or" decision when it comes to centralization. As the pendulum swings to distributed processing, a centralized and converged architecture approach will rise in prominence. Although seemingly an oxymoron, a distributed converged architecture is a natural outgrowth of big data evolution. As big data usage evolves and scales, local processing requirements increase. As more applications are deployed, different workloads can conflict and impact performance and job completion times.
A converged data approach addresses the challenges of evolving big data and application deployments. First, a converged platform manages disparate workloads without impacting performance. Second, a converged platform provides full multi-tenancy, with logical separation of data, job execution, and end-user access. Finally, to accommodate execution across remote sites, a converged data platform supports distributed processing with a logically centralized architecture.
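One way to picture the multi-tenancy point -- logical separation of data and end-user access on shared infrastructure -- is a tenant-scoped access check. This is a toy sketch with invented names; real platforms enforce the same boundary at the filesystem, volume, and job-scheduler level:

```python
class ConvergedStore:
    """One physical store, logically partitioned by tenant."""
    def __init__(self):
        self._data = {}     # (tenant, key) -> value
        self._members = {}  # user -> tenant

    def add_user(self, user, tenant):
        self._members[user] = tenant

    def put(self, user, key, value):
        self._data[(self._members[user], key)] = value

    def get(self, user, key):
        tenant = self._members[user]
        if (tenant, key) not in self._data:
            raise PermissionError(f"{key!r} not visible to tenant {tenant!r}")
        return self._data[(tenant, key)]

store = ConvergedStore()
store.add_user("alice", "ad-ops")
store.add_user("bob", "finance")
store.put("alice", "campaigns", ["q1-launch"])
print(store.get("alice", "campaigns"))  # ['q1-launch']
# store.get("bob", "campaigns") raises PermissionError:
# same cluster, but bob's tenant cannot see ad-ops data.
```

With this kind of logical separation, both tenants share one cluster instead of forcing the organization into separate physical deployments.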
What exactly does this mean? Consider a global automated advertising platform provider that runs an auction exchange for real-time bidding, where advertisers buy and sell online ad impressions. The company operates six data centers distributed across the world to deliver the performance that regional ad auctions require. Information, however, needs to be shared across all locations for management purposes and to better serve customers: global customers need to understand how ads perform in individual regions as well as how campaigns perform globally. In other words, information needs to be logically centralized to provide transparency and visibility, so customers can compare costs across regions and quickly adjust spending and shift priorities based on results.
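The ad exchange's requirement -- regional detail plus a logically centralized global view -- amounts to rolling per-site summaries into one report without moving the raw auction logs. The data center names and figures below are invented for illustration:

```python
# Each data center reports only a small per-region summary;
# the raw bid and auction logs stay local to that site.
regional = {
    "us-east":  {"impressions": 120_000, "spend": 3_400.0},
    "eu-west":  {"impressions":  80_000, "spend": 2_100.0},
    "ap-south": {"impressions":  50_000, "spend":   900.0},
}

def global_view(regions):
    """Logically centralize: roll regional summaries into one report."""
    total_impr = sum(r["impressions"] for r in regions.values())
    total_spend = sum(r["spend"] for r in regions.values())
    return {
        "impressions": total_impr,
        "spend": total_spend,
        # cost per thousand impressions, comparable across regions
        "cpm": 1000 * total_spend / total_impr,
    }

report = global_view(regional)
print(report["impressions"])   # 250000
print(round(report["cpm"], 2))  # 25.6
```

A customer can still drill into `regional` for per-site performance while `global_view` gives the campaign-wide picture; the heavy data never has to cross regional or regulatory borders.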
Logical centralization also extends to the administration and control of a distributed cluster. How data is accessed, protected, and managed needs to be centralized. Features such as wide area replication are required to ensure that data in each remote location has full disaster recovery protection. Wide area replication can also provide synchronization across sites to support real-time reporting and dashboards to manage business results and processing.
A converged data platform includes core features that address the big data challenges organizations are facing today; it integrates file storage, database, stream processing, and analytics, and delivers these benefits across a diverse set of applications and workloads.
As tech cycles continue to swing, successful organizations will reap the benefits of centralized big data while addressing the challenges of diverse and distributed workloads, users, locations, and regulations. As businesses continue to look for ways to improve their ability to leverage data and increase their competitive agility, a converged data platform -- which gives them the ability to process data locally and leverage it globally -- will be the foundation for success.
Jack Norris, chief marketing officer at MapR Technologies, has over 20 years of enterprise software marketing experience. He has demonstrated success from defining new markets for small companies to increasing sales of new products for large public companies. Jack's broad experience includes launching and establishing analytic, virtualization, and storage companies and leading marketing and business development for an early-stage cloud storage software provider. Jack has also held senior executive roles with EMC, Rainfinity (now EMC), Brio Technology, SQRIBE, and Bain & Company. Jack earned an MBA from UCLA Anderson and a BA in Economics with honors and distinction from Stanford University.