
TDWI Articles

Alluxio’s New Service Enables Structured Data Applications to Interact Up to 5x Faster with Data

Alluxio provides just-in-time transformation of data into a compute-optimized form for applications, independent of the storage solution or format

Note: TDWI’s editors carefully choose vendor-issued press releases about new or upgraded products and services. We have edited and/or condensed this release to highlight key features but make no claims as to the accuracy of the vendor's statements.

Alluxio, a developer of open source cloud data orchestration software, has released Alluxio Structured Data Service (SDS) featuring a data catalog service and transformation service, two new major architectural components of its Data Orchestration Platform. Data engineers, architects, and developers can now spend fewer resources storing data and more time delivering data to analytical compute engines. 

As users and enterprises adopt widely available analytics engines such as Presto, Apache Spark SQL, or Apache Hive, they often run into inefficient data formats and performance problems. Those engines typically consume structured data organized as databases with “tables” consisting of “rows” and “columns,” whereas files and objects are addressed by “offset” and “length.” This gap creates multiple challenges and inefficiencies, such as maintaining mappings or creating converted copies of the data. With the new service, users benefit from a simpler data platform that connects to different catalogs for access to structured data, with fewer copies and pipelines and more compute-optimized data.

“Alluxio now provides just-in-time data transform of data to be compute-optimized, independent of the storage format for OLAP engines, such as Presto and Apache Spark,” said Haoyuan Li, founder and CTO of Alluxio. “These schema-aware optimizations are made possible with the new Alluxio Catalog Service which abstracts the widely used Apache Hive Metastore, so regardless of how the data was initially stored -- CSV and text formatted files, for example -- the data is now transformed into the generally recognized compute-optimized Parquet format. A second type of transformation will [unite] many smaller files, enabling the data to be combined into fewer files, which is more efficient to process for SQL engines. A third type of transformation enables table columns to be sorted, adding to the efficiency of queries, newly available in our Enterprise Edition.”

Alluxio Structured Data Service 

With Alluxio Structured Data Service, Alluxio can expose the data to be effectively accessed by the SQL engines independent of how and where the data is stored. New capabilities and services include:

Presto Connector for Alluxio: A new Presto connector for Alluxio is now available. This allows easy integration and configuration of Alluxio with Presto.

Catalog Service: The new Alluxio Catalog Service manages the metadata of structured data in the system. It is responsible for all the database, table, and schema information, as well as the location of all the stored data. There is no longer a need to change any table locations in the Hive metastore or to restart or reconfigure any Hive services. The Alluxio Catalog Service enables schema-aware optimizations for any type of structured data. For example, once the Hive metastore is attached to the Alluxio Catalog Service, the catalog service will automatically mount the appropriate table locations and automatically serve the table metadata with the Alluxio locations.
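The location rewriting described above can be pictured with a minimal Python sketch. It is not Alluxio's API: the `TableMeta` class, the `serve_with_alluxio_location` function, the `alluxio:///catalog` mount root, and the example HDFS path are all hypothetical names chosen for illustration. The sketch only shows the idea of serving the same table metadata with a different location while leaving the original metastore entry untouched.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TableMeta:
    """Hypothetical stand-in for a table entry held by a metastore."""
    database: str
    name: str
    location: str  # where the underlying storage keeps the table's files


def serve_with_alluxio_location(table, mount_root="alluxio:///catalog"):
    """Return a copy of the metadata whose location points at an assumed mount path."""
    mounted = f"{mount_root}/{table.database}/{table.name}"
    return replace(table, location=mounted)


# The original entry is not modified; only the served copy carries the new location.
hive_table = TableMeta("sales", "orders", "hdfs://namenode:9000/warehouse/sales.db/orders")
served = serve_with_alluxio_location(hive_table)
print(served.location)
```

Because the entry is copied rather than edited, the metastore keeps its original table location, which mirrors the claim that no Hive table locations need to change.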

Transformation Service: The new Alluxio Transformation Service transforms data into a compute-optimized representation that is independent of the storage-optimized format. This enables physical data independence. Three types of transformations are available for tables: coalesce, format conversion, and sorting. Although results depend on the specific formats and workloads, internal tests have shown an increase in query performance.
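As a rough illustration of two of these transformations, the following Python sketch coalesces several small CSV inputs into fewer chunks and sorts the rows by one column. It uses only the standard library and is not how Alluxio implements the service; the function names, the sample data, and the `rows_per_file` parameter are assumptions made for the example. Format conversion to Parquet is left out because it would require a third-party library.

```python
import csv
import io


def read_rows(csv_text):
    """Parse one small CSV 'file' (given as text) into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def coalesce_and_sort(small_files, sort_key, rows_per_file):
    """Merge rows from many small inputs, sort them, and split into fewer chunks."""
    rows = [row for text in small_files for row in read_rows(text)]
    rows.sort(key=lambda r: r[sort_key])  # sorting step
    # Coalescing step: each chunk stands in for one larger output file.
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


# Four one-row inputs become two two-row chunks, ordered by city.
small_files = [
    "id,city\n3,Oslo\n",
    "id,city\n1,Lima\n",
    "id,city\n2,Cairo\n",
    "id,city\n4,Pune\n",
]
chunks = coalesce_and_sort(small_files, "city", rows_per_file=2)
print(len(chunks), [r["city"] for r in chunks[0]])
```

Fewer, larger inputs reduce per-file overhead for a query engine, and sorted rows let it skip ranges that cannot match a filter, which is the intuition behind the efficiency claims above.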

Availability 

Alluxio 2.2 Community and Enterprise Edition with Structured Data Service are generally available for download now at https://www.alluxio.io/download/

 
