EMC Delivers Unified Big-Data Analytics Appliance

Greenplum introduces scalable, modular system that combines shared-nothing MPP relational database with enterprise-class Apache Hadoop for structured and unstructured data co-processing.

Note: ESJ’s editors carefully choose vendor-issued press releases about new or upgraded products and services. We have edited and/or condensed this release to highlight key features but make no claims as to the accuracy of the vendor's statements.

EMC Corporation has released the EMC Greenplum Modular Data Computing Appliance (DCA), a complete big-data analytics platform. Available now, the Greenplum DCA features a modular architecture that allows enterprises to combine a shared-nothing MPP relational database with enterprise-class Apache Hadoop -- along with Greenplum partner BI and ELT applications -- to achieve true structured and unstructured data co-processing in a single, unified platform that can be expanded as needed.

DCA modules enable enterprises to start small and expand the appliance in flexible, cost-effective quarter-rack increments based on processing performance or storage capacity needs. In addition to mixing and matching Greenplum Database and Greenplum HD (Hadoop) modules, enterprises can bring their BI applications and ELT tools directly into the cluster, in the same DCA, through the new Greenplum Data Integration Accelerator modules. The result is a unified big-data platform that combines structured and unstructured data and applications in a single infrastructure that is monitored, managed, and supported by EMC.

Today, enterprises are seeking to make better use of their data warehouses for advanced analytics, and this trend will accelerate as organizations strive to move from running business intelligence point solutions to deploying comprehensive analytics enterprise-wide. At the same time, enterprises are placing greater importance on integrating and analyzing their accumulated unstructured and semi-structured data. As data warehouses get bigger, however, enterprises are facing scalability limits, performance degradation, and management complexity, and they are seeking ways to give more concurrent users access to the data for business applications.

Four Greenplum Data Computing Appliance modules are now available:

  • The Greenplum Database Module is a purpose-built, highly scalable data-warehousing appliance module that architecturally integrates database, computing, storage, and network into an enterprise-class, easy-to-implement system.

  • The Greenplum Database High Capacity Module is designed to host multiple petabytes of data without surging power consumption, increasing costs, or mushrooming space requirements. For businesses that require detailed analysis of extremely large amounts of data or that are looking for a longer-term archive, this high-capacity version offers a low cost-per-unit data warehouse.

  • The Greenplum HD Module is a high-performance data co-processing Hadoop appliance module. It marries Hadoop with the Greenplum Database, allowing co-processing of both structured and unstructured data within a single solution.

  • The Greenplum Data Integration Accelerator (DIA) Module hosts partner analytics applications and places them directly on the same high-performance, low-latency interconnect as the other appliance modules. This enables market-leading data loading performance in a parallel and scalable model, to shorten batch loads or implement micro-batch loading.

Enterprises can start with a single primary rack, which includes one standard or high-capacity Greenplum Database quarter-rack module, room for three additional modules, and two master servers. The master servers are responsible for authentication, query optimization, workload balancing among the segment servers, management of the data fault-tolerance mechanism, and other cluster tasks. As their demand for processing capacity grows, enterprises may expand the appliance in quarter-rack increments using Greenplum Database, Greenplum HD, or Greenplum DIA modules in any order and amount, up to six racks total. All modules are linked via a redundant, high-performance, low-latency interconnect.
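The expansion rules described above can be illustrated with a short Python sketch. It is a planning aid, not vendor tooling: the `Appliance` class, the module kind names, and the assumption that every rack (not just the primary one) holds exactly four quarter-rack modules are all hypothetical and do not come from the release.

```python
from dataclasses import dataclass, field

MODULES_PER_RACK = 4  # assumption: every rack holds four quarter-rack modules
MAX_RACKS = 6         # upper limit stated in the release
# Hypothetical short names for the four module types listed earlier.
MODULE_KINDS = {"database", "database_hc", "hd", "dia"}


@dataclass
class Appliance:
    """Hypothetical model of a DCA as an ordered list of quarter-rack modules."""
    modules: list = field(default_factory=list)

    def add(self, kind: str) -> None:
        if kind not in MODULE_KINDS:
            raise ValueError(f"unknown module kind: {kind}")
        # The primary rack starts with a standard or high-capacity Database module.
        if not self.modules and kind not in ("database", "database_hc"):
            raise ValueError("first module must be a Greenplum Database module")
        if len(self.modules) >= MODULES_PER_RACK * MAX_RACKS:
            raise ValueError("appliance is already at six full racks")
        self.modules.append(kind)

    def racks_used(self) -> int:
        # Ceiling division: a partly filled rack still counts as a rack.
        return -(-len(self.modules) // MODULES_PER_RACK)


# Start small, then grow in any order of module types.
dca = Appliance()
for kind in ["database", "hd", "dia", "database", "hd"]:
    dca.add(kind)
```

Under these assumptions, five modules occupy two racks, and the model refuses a twenty-fifth module because six racks of four modules each are already full.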

This release of the Greenplum DCA also includes increased high availability and simple integration with EMC’s solutions for data protection and disaster recovery. High availability is addressed via automated master-node failover and the Greenplum Database High Availability Group, enabling each full-rack DCA to sustain up to four server failures, one in each HA Group, nearly doubling the high-availability rate. The DCA also integrates with EMC Data Domain deduplication and backup technology for high-speed backup and restore and wide-area disaster recovery. The Greenplum DCA SAN Mirror Solution uses EMC Symmetrix VMAX, TimeFinder/Snap, and Symmetrix Remote Data Facility (SRDF) for advanced storage and data replication between two sites in synchronous mode.
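The "one failure in each HA Group" rule can be sketched as a simple check in Python. The release does not say how servers are assigned to HA Groups or how many servers a rack holds, so the 16-server layout, the server names, and the `within_stated_tolerance` function below are purely illustrative assumptions.

```python
from collections import Counter


def within_stated_tolerance(failed_servers, group_of):
    """Return True if no HA Group has more than one failed server.

    This mirrors only the guarantee quoted in the release (up to four
    failures per full rack, one in each HA Group); it says nothing about
    what happens when a group loses two servers.
    """
    failures_per_group = Counter(group_of[s] for s in failed_servers)
    return all(count <= 1 for count in failures_per_group.values())


# Hypothetical full rack: 16 servers split evenly across four HA Groups.
group_of = {f"srv{i:02d}": f"ha{i // 4}" for i in range(16)}

spread_out = ["srv00", "srv04", "srv08", "srv12"]  # one failure per group
same_group = ["srv00", "srv01"]                    # two failures in group ha0

ok_spread = within_stated_tolerance(spread_out, group_of)
ok_same = within_stated_tolerance(same_group, group_of)
```

In this model, four failures spread across the four groups stay inside the stated tolerance, while two failures inside a single group do not.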

More information is available at www.emc.com.