TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing


RESEARCH & RESOURCES

Up Close: Speed, Parallel Processing, and Massively Parallel Processing

Neither parallelism à la Oracle nor parallelism à la Microsoft is classically parallel, or massively parallel in the way that Teradata, Netezza, and others are.

By Stephen Swoyer
October 15, 2008

Recently, both Oracle Corp. and Microsoft Corp. entered the high-end data warehousing (DW) market, touting a pair of offerings -- respectively, the Oracle Database Machine and Microsoft's Project Madison -- that emphasize performance, scalability, and sky's-the-limit expandability.

Central to both offerings is a massively parallel processing (MPP) implementation -- the special sauce made famous by high-end DW firm Teradata Corp., and (more recently) espoused by data warehouse appliance vendors Netezza Corp., DATAllegro Corp., Dataupia Inc., Kognitio, ParAccel Inc., and others. There are questions, however, about the nature of the MPP of both the Oracle Database Machine and the Postgres-cum-Ingres-cum-SQL Server DATAllegro technology. Neither is "classically" MPP -- at least not in the Teradata sense of the term. There's a sense, in fact, in which both approaches reflect the database-centric viewpoints of their progenitors.

Consider Oracle's MPP-like implementation, which has an Oracle database server parceling out SQL queries to (and receiving results from) a shared-nothing cluster of Oracle Exadata Servers. (The latter are connected to the database server by means of an InfiniBand pipe.) Industry watcher Curt Monash, a principal with Monash Research, describes this as "node heterogeneity," which he contrasts with the approaches used by MPP stalwarts such as Teradata, Netezza, and Dataupia.

"Oracle is the first major vendor for whom it is important to remember that different parts of a query plan get parallelized across completely distinct sets of processors and processing units," Monash wrote on his DBMS2 blog. The jury's still out on how Oracle's approach will stack up relative to the MPP main, however. "[H]ow good is all this parallel technology? On the one hand, we know Oracle has been shipping it for a long time, and has it widely deployed. On the other, we also know that Oracle performance has been very problematic for large parallel queries. Surely most of those problems were due to the shared-disk bottleneck, but were they all [or mostly all]? I don't yet know."

Dataupia CTO John O'Brien, himself a data warehouse architect, concedes that this approach is both functional and (potentially) highly scalable, but nonetheless claims that it isn't entirely innovative. "Oracle is using a parallel approach that says 'Let's do a lot of filtering projection down at the disk level and let's put an Oracle database down there.' What Oracle is doing is leveraging a larger Oracle RAC instance to be their aggregator node, and that's how they bring it back into the shared architecture from their shared-nothing architecture."
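The division of labor O'Brien describes, with filtering and projection done at the storage level and the reduced rows handed up to a database tier, can be pictured with a minimal Python sketch. This is an illustration of the general idea only, not Oracle's implementation: the function names (`storage_scan`, `database_server_query`), the row layout, and the sample data are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

def storage_scan(rows: List[Row], predicate: Callable[[Row], bool],
                 columns: List[str]) -> List[Row]:
    """Hypothetical storage-node step: filter rows, then keep only the requested columns."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

def database_server_query(storage_nodes: List[List[Row]],
                          predicate: Callable[[Row], bool],
                          columns: List[str]) -> List[Row]:
    """Hypothetical database-tier step: collect the already-reduced rows from every node."""
    reduced: List[Row] = []
    for node_rows in storage_nodes:
        reduced.extend(storage_scan(node_rows, predicate, columns))
    return reduced

# Made-up data spread across two shared-nothing storage nodes.
storage_nodes = [
    [{"region": "east", "amount": 10, "note": "a"},
     {"region": "west", "amount": 5, "note": "b"}],
    [{"region": "east", "amount": 7, "note": "c"}],
]

# Only matching rows, and only the "amount" column, travel up to the database tier.
result = database_server_query(storage_nodes, lambda r: r["region"] == "east", ["amount"])
total = sum(r["amount"] for r in result)
```

In this sketch the storage nodes never ship the `note` column or the non-matching row, so the upper tier sees less data than it would by reading the raw tables itself.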

It's an approach that O'Brien says he used himself back when he was building high-end data warehouses for Oracle shops. "I could've built that on my own three or four years ago. Three years ago, I was building 50 TB Oracle systems, so it isn't really a breakthrough in that sense."

Microsoft's MPP story is slightly trickier -- in part because the specifics of the underlying DATAllegro technology aren't all that well understood.

What seems clear, however, is that DATAllegro's MPP implementation differs from the approaches used by Teradata, Netezza, Dataupia, and others. That stems, in part, from the same design decision that made DATAllegro -- more than its competitors -- so attractive to Microsoft: the DATAllegro technology doesn't require significant customization to the underlying database. DATAllegro itself shifted from a Postgres to an Ingres foundation over the course of about 18 months.

"What they do is they take the SQL and send it out to all of [their] nodes for processing. [From there] they take all of the results [and] put [them] back into an aggregator, so if you did a group with a sort, you'd have to get the results from all of the nodes," says a data warehousing architect familiar with DATAllegro's technology.

The problem with this approach, this person says, is that the aggregator becomes the bottleneck. It's for this reason that most MPP players (e.g., Teradata, Netezza, and Dataupia) took a different route. "When you're dealing with large data sets, you can blow out your aggregator pretty quick. Your memory, your I/O -- you end up with a very low concurrency feature."
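The scatter-and-gather pattern the architect describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not DATAllegro's actual code: `node_partial`, `aggregate`, and the sample rows are hypothetical, and each "node" is just an in-memory list.

```python
from collections import Counter

def node_partial(rows):
    """Hypothetical per-node step: run the group-by locally and return partial sums."""
    partial = Counter()
    for key, value in rows:
        partial[key] += value
    return partial

def aggregate(partials):
    """Hypothetical aggregator step: merge every node's partial result, then sort."""
    merged = Counter()
    for p in partials:
        merged.update(p)  # Counter.update adds values for matching keys
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

# Made-up (group key, amount) rows on three nodes.
nodes = [
    [("east", 10), ("west", 5)],
    [("east", 7), ("north", 3)],
    [("west", 4)],
]

partials = [node_partial(rows) for rows in nodes]
ranked = aggregate(partials)

# Every node's partial result has to land on the single aggregator.
rows_at_aggregator = sum(len(p) for p in partials)
```

The final sort cannot start until all partial results have arrived, and `rows_at_aggregator` grows with both the number of nodes and the number of distinct groups. That is one way to picture why a single aggregator's memory and I/O can become the limiting factor on large data sets.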

Adds this DW professional: "One of the things [Microsoft] really liked about the DATAllegro architecture was the fact that they could basically unplug the Ingres database and plug in SQL Server and get all of those aggregations, so you could see that Microsoft was interested in the fact that they weren't buying another massively parallel database that would've been pretty hard to integrate into SQL Server. They were buying aggregated modules that would be pretty easy to integrate into SQL Server [so they could] get some pretty easy parallelization."

Unlike Oracle, of course, Microsoft hasn't yet delivered its MPP entry. That raises the question of just how long it's going to take Microsoft to finally productize (or SQL Server-enable) DATAllegro's MPP implementation.

Back in July, for example, consultant, author, and data warehousing architect Mark Madsen, a principal with DW consultancy Third Nature, predicted that it would take Microsoft "three years, when the next rev of SQL Server comes out." DATAllegro CEO Stuart Frost, for his part, downplayed such pessimism. "It's not going to take years, as some people in the blogosphere are predicting," Frost said immediately after the acquisition. "Just from [the integration work] we've already done, we've actually found that it's going to be pretty straightforward. All of the hooks are there already [such as] the APIs. We don't have to change a line of code in SQL Server."

Earlier this month, Microsoft disclosed plans to ship Project Madison in the first half of 2010 -- as much as two years after it first acquired DATAllegro. Industry watchers such as Madsen point to Microsoft's delays with SQL Server 2000, SQL Server 2005, and SQL Server 2008 as reasons to doubt even that projection.

The challenge, he argues, is that even though DATAllegro touts a database-independent architecture, Microsoft will almost certainly have some difficulty "porting the shared-nothing bits from Ingres, Linux, and C/C++ to SQL Server, Windows, and C#. That's a lot of technology change to deal with, even if you don't have to change the database kernel."

Moving bits is just what Microsoft says it's doing -- although company officials reject the notion that the integration of DATAllegro's assets is shaping up to be an inordinately involved process.

"The main thing that we're doing here is we're moving DATAllegro bits on to Windows. Currently they have it running on Linux. The second piece of it is that they have Ingres as the database that's part of the solution, and that's being replaced by SQL Server," says Herain Oberoi, group product manager for SQL Server with Microsoft. "They [DATAllegro] specifically built an architecture where they didn't have any proprietary code inside of Ingres itself, [which] makes it relatively easy to swap Ingres out and put SQL Server in."

For this reason, Oberoi doesn't see any reason why the Project Madison timetable should slip. "Right now, both Kilimanjaro [a BI-centric version of SQL Server 2008] and Madison are scheduled to ship in the first half of 2010. There's no reason to think we won't hit that," he insists.

Microsoft also has a next-generation SQL Server release planned for a 2011 delivery. That's a lot on the SQL Server team's plate. "The plan is to ship it [Project Madison] as a separate thing [from Kilimanjaro]. We don't know what the packaging of that separate thing is going to look like, but as of now it's not going to be a part of Kilimanjaro," Oberoi concludes.

"The next major release [of SQL Server] will be in 24 to 36 months. That will be 2010 to 2011. Before we do that, though, because we have these new capabilities that we have to get out the door, we'll be able to ship these in the contexts of Kilimanjaro and Project Madison."

About the Author

Stephen Swoyer is a technology writer with 20 years of experience. His writing has focused on business intelligence, data warehousing, and analytics for almost 15 years. Swoyer has an abiding interest in tech, but he’s particularly intrigued by the thorny people and process problems technology vendors never, ever want to talk about. You can contact him at [email protected].
