Alluxio Releases Performance and Ease-of-Use Enhancements for GPU-Centric AI/ML Workloads

Updates to the data preprocessing and loading phases are designed to improve GPU utilization, raising AI/ML training efficiency and reducing overall cost.

Note: TDWI’s editors carefully choose vendor-issued press releases about new or upgraded products and services. We have edited and/or condensed this release to highlight key features but make no claims as to the accuracy of the vendor’s statements.

Alluxio, the developer of open source data orchestration software for large-scale workloads, released version 2.6 of its Data Orchestration Platform. This release features an enhanced system architecture enabling AI/ML platform teams using GPUs to accelerate their data pipelines for business intelligence, applied machine learning, and model training.

In the latest release, Alluxio has revised its system architecture to better support AI/ML applications that use the POSIX interface. Removing interprocess latency overhead maximizes system performance, which is critical for fully utilizing compute resources. Beyond I/O performance, Alluxio’s data management capabilities support the end-to-end workflow of data preprocessing, loading, training, and result writing.
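Because the data is exposed through the POSIX interface, a training job can read it with ordinary file calls rather than a storage-specific client. The following is a minimal sketch of that access pattern, not Alluxio code: `read_all_files` is a hypothetical helper, and the `DATA_ROOT` environment variable stands in for wherever a FUSE mount might be attached. When the variable is unset, the script falls back to a temporary directory so it still runs.

```python
import os
import tempfile
import time


def read_all_files(root):
    """Read every regular file under root with plain POSIX-style calls.

    Returns (file_count, total_bytes, elapsed_seconds).
    """
    file_count = 0
    total_bytes = 0
    start = time.perf_counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Ordinary open/read; nothing here is specific to any storage system.
            with open(os.path.join(dirpath, name), "rb") as f:
                total_bytes += len(f.read())
            file_count += 1
    elapsed = time.perf_counter() - start
    return file_count, total_bytes, elapsed


if __name__ == "__main__":
    # DATA_ROOT is an assumed, illustrative variable name, not an Alluxio setting.
    root = os.environ.get("DATA_ROOT")
    if root is None:
        # Stand-in data set: a few small files in a temporary directory.
        root = tempfile.mkdtemp()
        for i in range(5):
            with open(os.path.join(root, f"sample_{i}.bin"), "wb") as f:
                f.write(b"\0" * 1024)
    count, size, seconds = read_all_files(root)
    print(f"read {count} files, {size} bytes in {seconds:.4f} s")
```

Timing the same loop against different mounts is one simple way to compare data-loading latency for a given set of files.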

The Alluxio 2.6 Community and Enterprise Editions include new capabilities:

  • Faster data access for large numbers of small files. Alluxio 2.6 unifies the Alluxio worker and the FUSE process into a single process. Coupling the two reduces interprocess communication, which yields significant performance improvements. The gain is most evident in AI/ML workloads where files are small and RPC overhead makes up a significant portion of I/O time. Running both components in a single process also makes the software much easier to deploy in containerized environments such as Kubernetes. These enhancements substantially reduce data access latency, enabling users to process more data more efficiently and deliver more AI/ML benefits to the business.
  • Simplified data management and operability. Alluxio 2.6 enhances the mechanism for loading data into Alluxio-managed storage and adds traceability and metrics for easier operation. This distributed load operation is a key part of the AI/ML workflow, and its internal mechanisms have been adjusted to optimize for the common case of loading prepared data for model training.
  • Improved system visibility and control. Alluxio 2.6 adds a large set of metrics and traceability features that let users drill into the system’s operating state. These range from the aggregated throughput of the system to summarized metadata latency when serving client requests. This visibility can be used to measure the current serving capacity of the system and identify potential resource bottlenecks. Request-level tracing and timing information can also be obtained for deep performance analysis. Together, these features give users greater control for improving the SLAs of their large-scale data pipelines across a wide variety of use cases.
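The small-file claim in the first item above can be illustrated with simple arithmetic. The sketch below assumes a deliberately simplified model in which each file read costs a fixed per-request overhead plus transfer time at a constant bandwidth. The function name and all numbers are illustrative assumptions, not Alluxio measurements.

```python
def overhead_fraction(file_size_bytes, bandwidth_bytes_per_s, per_request_overhead_s):
    """Fraction of one file read spent on fixed per-request overhead.

    Simplified model: total time = fixed overhead + size / bandwidth.
    """
    transfer_s = file_size_bytes / bandwidth_bytes_per_s
    return per_request_overhead_s / (per_request_overhead_s + transfer_s)


if __name__ == "__main__":
    # Illustrative values only: 1 GB/s bandwidth, 1 ms fixed overhead per request.
    bandwidth = 1e9
    overhead = 1e-3
    for label, size in [("100 KB", 100_000), ("10 MB", 10_000_000), ("1 GB", 1_000_000_000)]:
        frac = overhead_fraction(size, bandwidth, overhead)
        print(f"{label:>7}: {frac:.1%} of read time is fixed overhead")
```

Under these assumed numbers, fixed overhead dominates the read time for the smallest file and becomes negligible for the largest, which is why trimming per-request cost matters most when a data set consists of many small files.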

For details, visit https://www.alluxio.io/.
