TDWI | Training & Research | Business Intelligence, Analytics, Big Data, Data Warehousing


Vectorwise: Parallelization by Any Other Name

August 14, 2012

The Vectorwise database engine from Actian Corp. was conceived in 2003 to do what few other DBMSes were able to do. Back then, Vectorwise -- dubbed X100 -- was incubating as a project with the Centrum Wiskunde & Informatica (CWI) in Amsterdam.

Nine years later, the rest of the industry is starting to catch up.

Slowly but surely, competitors are chipping away at what Actian's Fred Gallagher describes as Vectorwise's "wow" factor: its support for vector processing and microprocessor-level parallelization.

Actian isn't sitting still. For example, in June, the company unveiled a new version of Vectorwise, release 2.5, and, at TDWI's recent World Conference in San Diego, officials trumpeted a bi-directional connector for Hadoop and Vectorwise that is slated to ship by year's end.

According to Gallagher, "We've received some really good traction because of this [technological] approach."

Parallelization by Any Other Name

Back when it was conceived, the X100 was designed to exploit microprocessor-level features to help accelerate query performance. These included capabilities such as vector processing and support for single instruction, multiple data (or SIMD) parallelization, which -- by the late 1990s -- were being baked into commodity x86 processors from Intel Corp. and Advanced Micro Devices Inc.
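To make the idea concrete, here is a minimal Python sketch of the difference between row-at-a-time and vector-at-a-time execution. It is an illustration only, not Vectorwise's implementation: the function names, the `VECTOR_SIZE` of 1024, and the sample filter-and-sum query are all assumptions, and pure Python does not issue SIMD instructions. What it shows is the control flow: each operator is invoked once per batch of values rather than once per row, which is the shape that lets a compiled engine apply SIMD instructions to the tight inner loops.

```python
from array import array

VECTOR_SIZE = 1024  # hypothetical batch size, chosen for illustration


def sum_row_at_a_time(prices, threshold):
    """Evaluate the filter and the aggregate once per row."""
    total = 0.0
    for price in prices:
        if price > threshold:
            total += price
    return total


def select_gt(vector, threshold):
    """Filter operator: one call handles a whole vector of values."""
    return array("d", (v for v in vector if v > threshold))


def sum_vector_at_a_time(prices, threshold):
    """Run the same query, but pass fixed-size vectors between operators."""
    total = 0.0
    for start in range(0, len(prices), VECTOR_SIZE):
        vector = prices[start:start + VECTOR_SIZE]
        total += sum(select_gt(vector, threshold))
    return total


# Both strategies must return the same answer for the same query.
prices = array("d", (float(i % 100) for i in range(10_000)))
assert sum_row_at_a_time(prices, 50.0) == sum_vector_at_a_time(prices, 50.0)
```

In a compiled engine, the body of an operator like `select_gt` would be a simple loop over a contiguous array, which is the kind of code that can be mapped onto SIMD instructions.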

This approach was comparatively novel in 2003, and it was still relatively novel in 2008, when the former Ingres Corp. acquired distribution rights to the Vectorwise (formerly X100) technology.

Much has happened since then: Ingres rechristened itself "Actian" and several competitors announced catch-up moves. In most cases, these involved retrofitting existing DBMS engines to exploit hardware-level parallelization. In the case of SAP AG and HANA, however, this involved developing an entirely new analytic appliance.

Vectorwise is fast: objectively so. In 2011, for example, Vectorwise recorded a world-record result in the costly -- and (perhaps for that reason) infrequently used -- TPC-H benchmark. It also bucks an established trend: over the last decade, the analytic DBMS market has come to be dominated by massively parallel processing (MPP) data stores; Vectorwise, however, isn't an MPP engine. It's a column-based engine, like most of its competitors, but it eschews scale-out parallel processing for scale-up symmetric multiprocessing (SMP) -- with a hardware-accelerated twist.

"Vectorwise is a great, very fast, pipelined [database] built to move data without [a lot of the] stalls and cache misses" that are associated with more traditional DBMSes, says industry veteran Mark Madsen, a principal with consultancy Third Nature Inc.

Although the Vectorwise engine can be deployed across a highly distributed topology, Madsen sees heavy, scan-based SQL workloads -- typically running in the context of a single (large) system -- as its bread-and-butter use case. Such workloads include "your basic BI -- as batch reporting, ad hoc reporting, dashboards -- and maybe SQL-based analytics if the optimizer and execution are up to it. That's essentially 80 percent of the market for these things," he explains. "I imagine their sweet spot would be the columnar speed-up-BI area. Infobright pushed that, and have been ... successful so far."
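The appeal of a column-based layout for scan-heavy BI queries can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of Vectorwise's storage format: the `sales_rows` table, its column names, and both helper functions are hypothetical. The sketch shows that an aggregate over one column only needs to read that column when the data is stored column by column.

```python
# Hypothetical three-column table, stored row by row.
sales_rows = [
    ("2012-06-01", "widgets", 120.0),
    ("2012-06-01", "gadgets", 75.5),
    ("2012-06-02", "widgets", 98.0),
]

# The same table stored column by column: one list per column.
sales_columns = {
    "sale_date": [row[0] for row in sales_rows],
    "product": [row[1] for row in sales_rows],
    "amount": [row[2] for row in sales_rows],
}


def total_amount_row_store(rows):
    """Every full row is visited even though only one field is needed."""
    return sum(row[2] for row in rows)


def total_amount_column_store(columns):
    """Only the 'amount' column is scanned; other columns are never touched."""
    return sum(columns["amount"])


# The layouts differ in how much data is read, not in the answer.
assert total_amount_row_store(sales_rows) == total_amount_column_store(sales_columns)
```

On a wide reporting table, the row-store version drags every column through memory, while the column-store version reads a single contiguous run of values.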

Vectorwise isn't the only non-MPP player on the market. Until it released its Shard query option last year, for example, the open source Infobright DBMS was an SMP-only store. Meanwhile, competitors continue to contest the MPP bona fides of Oracle Corp.'s Exadata platform and SAP's HANA analytic appliance.

The upshot, then, is that MPP isn't a guarantor of top query performance. In fact, according to Third Nature's Madsen, the largest and fastest data warehouses (DW) in the world used to run exclusively on huge SMP or non-uniform memory access (NUMA) configurations from (the former) Sun Microsystems Inc., Hewlett-Packard Co., IBM Corp., and other computing behemoths.

One reason the DW market abandoned large SMP- or NUMA-powered systems was that MPP offered better price/performance, using then-available technologies.

Actian's Gallagher says it shouldn't come down to an MPP-versus-SMP issue.

Notwithstanding Vectorwise's hardware-level optimizations (viz., its support for SIMD parallelization and CPU-native vector processing), MPP databases are disadvantaged in other respects, too, argues Gallagher. These days, even the smallest MPP nodes are chock-a-block with processor cores: the smallest single-chip systems -- which are typically configured as individual nodes in an MPP cluster -- can pack six, eight, 10, 12, or even 16 cores, depending on which chip is used.

One upshot of this is that a single MPP node can be stuffed with the equivalent of 12, 16, 24, or 32 processors. From its origins in the X100 project, the Vectorwise engine was designed to scale as efficiently as possible across large SMP configurations. Few MPP players, Gallagher claims, have paid as much attention to node-level SMP scalability.

"We optimize to scale [linearly] across all [of] the available cores. In fact, one of the benefits that our architecture has is that we've actually seen the same version of our software run about 30 percent faster with each iteration of Intel chips. We're able to achieve that [linear scalability] across the new chips" as they've become available, he argues. "Right now, we run across 40 [cores], but by the end of this year, we're going to get up north of 60 cores" on a single SMP node, says Gallagher.

There's another wrinkle here, too. Thanks to the disruptive effects of high-end analytic database entrants from Oracle and SAP, the precise meaning of MPP -- or, more precisely, the meaning of the middle "P" in MPP -- has become fuzzier. It could be argued, for example, that Vectorwise's support for SIMD amounts to a kind of system-level parallelization. As implemented by Vectorwise, after all, SIMD permits the same operations to be parallelized across all of the processor cores in a single SMP system.

That very argument has been made by one of Vectorwise's competitors: SAP.

A Very, Very Big Pie -- And Drastically Diluted Differentiation?

As Gallagher sees it, the market for big data analytics isn't just enormous, it's mostly untapped. Potential customers are only beginning to come to terms with big data. Once they do, he says, they'll recognize that the platforms of the past just can't cut it.

Even now, Gallagher claims, Actian and its analytic database competitors are mostly vying to displace RDBMS platforms from IBM Corp., Microsoft Corp., Oracle Corp., or SAP Sybase: i.e., traditional databases, deployed as data warehouses, that have simply run out of gas. That makes for a very big pie, he argues. Factor in a big data-occasioned paradigm shift and you're talking about a perfectly huge pie.

"Most of [our customers] are coming from traditional relational [database] deployments or they have brand new greenfield projects," says Gallagher, "but definitely more than half are coming off of traditional relational [databases] that for one reason or another haven't scaled."

However, Vectorwise's "wow" factor might finally be losing its punch.

The Vectorwise name derives from what Actian calls "Vectorized Query Execution," i.e., its combination of vector processing and SIMD parallelization.

The language of vector processing and SIMD is no longer unique to Vectorwise. Competitors are increasingly talking -- and architecting -- in such terms, too. They're also paying more attention to the problem of scaling in SMP configurations.

Take SAP, for example, which made a big splash late last year with HANA.

"When we issue a query [in the HANA architecture], we've actually worked it [out] with Intel to do vector processing across the Westmere processor, so we can take a query and utilize full compute and parallelization across" all of the available Westmere cores, Prakash Darji, vice president and general manager for data warehousing solutions and the HANA platform, told BI This Week in a May interview.

SAP's implementation, like Actian's, relies on parallelization-by-another-name. It uses NUMA, which SAP says lets it create massive single system images with 1 TB of RAM. "Because we've done the vector processing at the chipset level, we can go out and parallelize all of that compute [across all available cores]. Whether parallelizing compute is [i.e., constitutes] MPP or not, that's debatable. What people usually think about MPP is multi-node scale-out ... which we have with HANA," Darji continued, citing the example of a HANA system with 1 TB of physical RAM. "[If I] have two separate 512 GB nodes, those are two separate appliances with 40 [Westmere] cores each, that's still 80 cores and 1 TB of memory. HANA can treat that as a [single] coherent memory structure."

Such efforts aren't limited to SAP. At TDWI's Chicago World Conference in May, for example, Roger Gaskell, CTO with MPP veteran Kognitio, discussed his company's emphasis on SMP scalability, an area in which Vectorwise likewise claims an advantage. (SAP addresses this issue in a non-SMP way: by means of NUMA.) With chipmakers packing more cores into processors, Kognitio can differentiate itself based on its ability to scale across large SMP configurations, Gaskell said.

"[W]e parallelize every aspect of every query, [and] we use every single CPU on every single server for every single query that runs. All of the cores are equally involved. If you add more cores, then each core gets a smaller chunk of data ... [and] if you increase the data, you increase the core's proportion [of the data]," Gaskell said. "If you just keep adding the cores ... and you keep the data to core ratio the same, you'll get the same performance."
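Gaskell's scaling argument reduces to simple arithmetic, sketched below in Python. The function names and row counts are assumptions made for illustration, and the model assumes an ideal even split with no coordination overhead. That assumption is the premise of the claim, not something the sketch demonstrates about Kognitio's product.

```python
def rows_per_core(total_rows, cores):
    """Rows each core scans when data is split evenly (ceiling division)."""
    return -(-total_rows // cores)


def partition(rows, cores):
    """Split a list of rows into one contiguous, near-equal chunk per core."""
    chunk = rows_per_core(len(rows), cores)
    return [rows[i * chunk:(i + 1) * chunk] for i in range(cores)]


# More cores on the same data: each core gets a smaller chunk.
assert rows_per_core(1_000_000, 20) > rows_per_core(1_000_000, 40)

# Data and cores doubled together: the per-core chunk is unchanged.
assert rows_per_core(1_000_000, 20) == rows_per_core(2_000_000, 40)
```

If per-core work is the dominant cost, holding the data-to-core ratio constant holds the per-core chunk constant, which is the condition under which Gaskell says performance stays the same.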
