RESEARCH & RESOURCES

A Platform for All Data, Big and Small

In an era of ever-expanding data volumes and heterogeneous data-processing requirements, Vertica really shines, HP officials maintain.

By Stephen Swoyer
November 3, 2015

The recent Strata + Hadoop World Conference was packed with a panoply of players, many of them start-ups or upstarts. In this context, established powers such as Hewlett-Packard Co. (HP), Informatica Corp., Microsoft Corp., Oracle Corp., SAS Institute Inc., and Teradata Corp. might have seemed like the odd vendors out.

The Strata + Hadoop World tent isn't just a big one; it's an ever-expanding one, too. On the massive expo hall floor, these and other established vendors -- systems management specialist BMC Software Inc. was there, too, as were Cisco Systems Inc. and Dell Inc. -- seemed no stiffer or squarer than, say, Cloudera Inc., Hortonworks Inc., and MapR Technologies Inc., all of which have been in business for a few years. In other words, all of these vendors seemed a little stiff, at least in the midst of that scrappy horde of upstarts, start-ups, and would-be disrupters.

Stiff, maybe, but certainly not fazed. Take HP, which was on hand to trumpet its array of big data-oriented products and services. One of HP's core big data offerings is its Vertica massively parallel processing (MPP), columnar database, which turned 10 this year. (Database design maestro and start-up maven Michael Stonebraker founded Vertica in 2005; HP purchased it almost five years ago, early in 2011.) Coincidentally, the Hadoop platform itself turned 10 in 2015. Vertica, like Hadoop, can scale to support petabytes of data. Vertica, like Hadoop, can ingest, store, and process semi-structured data, such as text.

What's more, Vertica can also query directly against data stored in several of Hadoop's columnar storage formats, viz., Parquet and Optimized Row Columnar (ORC) files. To be sure, Hadoop and the Apache Spark cluster computing framework can do things Vertica can't do, such as ingest, store, and process non-relational multi-structured data (such as file objects of any kind), but Vertica itself can do something Hadoop can't do, argued Steve Sarsfield, product marketing manager with HP Vertica, earlier this year: it's a query processing platform par excellence.

This has a lot to do with its MPP database underpinnings as well as its design and optimization for decision support and data warehousing workloads.

"Vertica is a relational database, and, like any relational database, it has columns and tables, it speaks SQL, and it connects to other data sources via things like ODBC. It connects and talks to business intelligence tools, in addition to self-service tools. Tableau, Looker, Qlik, all of that stuff works very well with Vertica," Sarsfield said.

"For some people, this is unsexy stuff, but Vertica can query against data in Hadoop, so if you want to add Hadoop nodes and make use of them, that's part of the equation. You can easily do that, and Vertica has built-in capabilities [i.e., functions and algorithms] for analyzing text and unstructured data. Most important, Vertica is an MPP database, so it was designed for processing data at massive scale."

According to Sarsfield, Vertica also has an advantage when it comes to storing data at big-data scale -- or, at least, Very Large Volumes of data. "We developed our own data compression algorithms, so we have excellent data compression and we're able to achieve excellent efficiency. Vertica can analyze the data [it's ingesting] and decide which algorithm offers the best compression [for storing it]," he explained.

In most cases, compression entails a performance trade-off of some kind, but a columnar architecture can help mitigate this trade-off to some extent, Sarsfield argued. "Vertica also supports what is called 'late materialization,' so it can actually perform [in-memory] operations on compressed data without uncompressing it. This can significantly increase performance," he argued.
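To make the idea of operating on compressed columnar data concrete, here is a minimal Python sketch. It is an illustration only, not Vertica's implementation: run-length encoding is assumed here as a simple stand-in for a columnar compression scheme, and the function names (`rle_encode`, `choose_encoding`, `sum_rle`, `count_equal_rle`) are hypothetical. The sketch shows two things Sarsfield describes in general terms: picking an encoding by inspecting the data, and answering aggregate queries from the compressed form without expanding it back into rows.

```python
from itertools import groupby
from typing import List, Tuple

Run = Tuple[int, int]  # (value, run_length)


def rle_encode(column: List[int]) -> List[Run]:
    """Collapse consecutive repeated values into (value, count) runs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]


def choose_encoding(column: List[int]) -> str:
    """Inspect the data and pick the smaller representation.

    Assumption for this sketch: RLE costs two integers per run,
    while a plain layout costs one integer per row.
    """
    runs = rle_encode(column)
    return "rle" if 2 * len(runs) < len(column) else "plain"


def sum_rle(runs: List[Run]) -> int:
    """Sum the column directly from its runs, never expanding them to rows."""
    return sum(value * count for value, count in runs)


def count_equal_rle(runs: List[Run], target: int) -> int:
    """Count rows equal to target by adding up the matching run lengths."""
    return sum(count for value, count in runs if value == target)


# A sorted column compresses well; a column of distinct values does not.
sorted_column = [10, 10, 10, 10, 20, 20, 30, 30, 30]
distinct_column = [1, 2, 3, 4]
runs = rle_encode(sorted_column)
```

In this toy model, the aggregate functions touch one entry per run rather than one per row, which is why sorted or low-cardinality columns can be both smaller on disk and faster to scan.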

Super-charging the Data Warehouse -- And NoSQL Analytics

Sarsfield described Vertica as the equivalent of a "super-charger" for aging data warehouse systems. In this respect, he noted, reports of the data warehouse's death have been greatly exaggerated. Indeed, the Strata + Hadoop World expo hall all but teemed with would-be data warehouse replacements, most of which are designed to run on Hadoop or Apache Spark. (Examples include relative newcomer AtScale Inc., along with established players such as Datameer Inc. and Platfora Inc. AtScale explicitly markets itself as an OLAP technology for Hadoop/Hive. Research analyst Mark Madsen, a principal with Third Nature Inc., once dubbed Platfora "PlatfOrLAP" for similar reasons.)

What's more, vendors such as Cloudera, Databricks Inc. (the commercial entity behind Spark), Hortonworks, and MapR, along with IBM Corp. and Teradata, have invested significantly in shoring up Hadoop's ANSI SQL bona fides: Cloudera via its investments in Impala, an interactive SQL-like query engine for Hadoop; Hortonworks via its work with Hive (a SQL interpreter for Hadoop) and Tez (a replacement for Hadoop's MapReduce engine that supports interactive processing); MapR via its work with Drill, the open source implementation of Google Inc.'s Dremel distributed query technology; Databricks and IBM via their investments in Spark (which has its own SQL variant, Spark SQL); and Teradata via its investment in Presto, a SQL query engine for Hadoop. If the data warehouse as an institution is dying, data warehouse architecture -- as a conceptual framework -- is alive and well.

Optimized platforms such as Vertica could be considered "better-than-data-warehouse data warehouse systems," Sarsfield argued. In point of fact, there are several extant optimized MPP database platforms. These include Actian's Matrix, EMC's Greenplum, IBM's Netezza, Microsoft's SQL Server Parallel Data Warehouse, and Teradata's Aster Appliance. All of these systems address traditional ad hoc query and analysis requirements; interoperate with both traditional and newer self-service BI tools; can scale to big-data volumes; and can store, process, and analyze non-traditional data formats, including non-relational multi-structured data.

Most can also access and query against data stored in Hadoop.

HP and other vendors aren't just marketing these platforms for SQL-based analytics but for NoSQL analytics, too. Teradata is especially vociferous in this regard. Even though it resells a Hadoop appliance, Teradata insists that either its Aster Discovery platform or its Teradata database -- or both -- can cost-effectively perform most if not all of the same non-relational analytical workloads as Hadoop and/or Spark.

Sarsfield acknowledged that some customers think it's neither cost-effective nor practicable to run NoSQL analytics in a nominally SQL platform.

HP is working to convince them otherwise, he said. "In Hadoop, you don't have all of the analytical functions and algorithms you have in Vertica. Hadoop is going to be slower [than Vertica] for most of these [non-relational workloads], too. On top of this, you won't have nearly the same query[-processing] performance in Hadoop that you get in Vertica. You won't have the same governance, the same security, [or] the same service levels. Hadoop can't support high concurrency. Hadoop has poor [support for] metadata [management] and [data] lineage," Sarsfield maintained.

"Hadoop is a cost-effective alternative [to the data warehouse] for [storing] certain kinds of data and [processing] certain kinds of workloads," he continued.

Sarsfield argued, however, that Vertica's ability to access, ingest, and/or query against data in Hadoop permits an organization to deploy a cost-effective hybrid platform. "It's using the platform that's best for your needs. For ad hoc query, for decision support, BI tool access, self-service access, [an MPP database platform such as] Vertica is almost always going to be the best solution. For time-series and other kinds of advanced analytics, it's going to be better," he noted.
