
Updated Alluxio Enterprise AI Accelerates GPUs

Features new native integration with Python ecosystem and expanded cache management

Note: TDWI's editors carefully choose vendor-issued press releases about new or upgraded products and services. We have edited and/or condensed this release to highlight key features but make no claims as to the accuracy of the vendor's statements.

Alluxio, the company behind the open-source Alluxio data platform, has enhanced Alluxio Enterprise AI. Version 3.2 lets the platform use GPU resources wherever they are located, improves I/O performance, and delivers end-to-end performance competitive with HPC storage. It also introduces a new Python interface and more sophisticated cache management features. These advancements help organizations make full use of their AI infrastructure, with gains in performance, cost-effectiveness, flexibility, and manageability.

AI workloads face several challenges. First, data access often cannot keep pace with GPU computation: slow data loading in frameworks such as Ray, PyTorch, and TensorFlow leaves GPUs underutilized. Alluxio Enterprise AI 3.2 addresses this by improving I/O performance, achieving over 97% GPU utilization in benchmarks. Second, Version 3.2 delivers strong storage performance using existing data lakes, eliminating the need for extra storage. Finally, managing complex integrations between compute and storage is difficult. The new release simplifies this with a Pythonic file system interface and supports POSIX, S3, and Python access, making it easy for different teams to adopt.
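GPU underutilization of this kind arises when loading and computing happen in strict sequence, so the GPU waits for every batch. A common, framework-agnostic remedy is to read ahead on a background thread so the next batch is ready when the GPU finishes the current one. The Python sketch below illustrates only that general idea; it is not how Alluxio works internally, and `prefetch` and its `depth` parameter are hypothetical names.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the stream


def prefetch(loader: Iterable[T], depth: int = 4) -> Iterator[T]:
    """Yield items from `loader` while a background thread reads up to `depth` items ahead.

    Hypothetical helper for illustration. If `loader` raises, the stream simply ends;
    a production version would forward the exception to the consumer.
    """
    buffer: queue.Queue = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            for item in loader:
                buffer.put(item)  # blocks while the read-ahead buffer is full
        finally:
            buffer.put(_DONE)  # always signal the consumer that no more items follow

    # Daemon thread, so an abandoned iterator cannot keep the process alive.
    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


if __name__ == "__main__":
    import time

    def slow_batches(n: int) -> Iterator[int]:
        for i in range(n):
            time.sleep(0.05)  # stand-in for storage latency
            yield i

    start = time.perf_counter()
    for batch in prefetch(slow_batches(8), depth=2):
        time.sleep(0.05)  # stand-in for a GPU training step
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

With simulated 50 ms loads and 50 ms compute steps, a strictly sequential loop over eight batches would take about 0.8 s; with read-ahead, most loading overlaps compute, so the elapsed time should be noticeably lower, though the exact figure depends on the machine.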

Alluxio Enterprise AI includes the following key features.

Leverage GPUs anywhere for speed and agility. Alluxio Enterprise AI 3.2 lets organizations run AI workloads wherever GPUs are available, which suits hybrid and multicloud environments. Its intelligent caching and data management bring data closer to the GPUs, keeping them efficiently utilized even when the data is remote. A unified namespace simplifies access across storage systems, so AI workloads can run across diverse, distributed environments and AI platforms can scale without data locality constraints.
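The release does not describe how the unified namespace is configured. A common way to build one is a mount table that maps namespace path prefixes to backing storage URIs and resolves each path by its longest matching prefix. The Python sketch below shows that general technique; the mount points, bucket names, and the `resolve` function are hypothetical and are not Alluxio configuration or API.

```python
from typing import Dict

# Hypothetical mount table: namespace prefix -> backing storage URI.
MOUNTS: Dict[str, str] = {
    "/data/training": "s3://example-bucket/training",
    "/data/eval": "gs://example-bucket/eval",
    "/data": "hdfs://namenode:8020/data",
}


def resolve(path: str, mounts: Dict[str, str] = MOUNTS) -> str:
    """Map a namespace path to a storage URI using the longest matching mount prefix."""
    best = None
    for prefix in mounts:
        # Match whole path components only: "/data/eval" must not match "/data/evaluation".
        covers = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if covers and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        raise KeyError(f"no mount covers {path!r}")
    # Keep the part of the path below the mount point and append it to the storage URI.
    suffix = path[len(best.rstrip("/")):]
    return mounts[best].rstrip("/") + suffix


if __name__ == "__main__":
    print(resolve("/data/training/shard-0001.parquet"))
    print(resolve("/data/evaluation/metrics.csv"))
```

Because callers only ever see namespace paths, data can move between stores by changing a mount entry rather than every job that reads it.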

Storage performance. MLPerf benchmarks show Alluxio Enterprise AI 3.2’s strong storage performance. In tests such as BERT and 3D U-Net, Alluxio delivers strong model training performance on a variety of A100 GPU configurations, demonstrating its scalability and efficiency in real production environments without additional storage infrastructure.

High I/O performance and GPU utilization. Alluxio Enterprise AI 3.2 improves I/O performance, achieving up to 10 GB/s throughput and 200K IOPS with a single client and scaling to hundreds of clients. This performance fully saturates eight A100 GPUs on a single node, with over 97% GPU utilization in large language model training benchmarks. New checkpoint read/write support optimizes the training of recommendation engines and large language models, preventing GPU idle time.
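The throughput and IOPS figures support some quick back-of-the-envelope arithmetic. The Python sketch below assumes decimal units (1 GB = 10^9 bytes), assumes one client's bandwidth is split evenly across the eight GPUs, and assumes the 10 GB/s and 200K IOPS peaks hold at the same time; the release does not say whether they were measured on the same workload, so the derived request size is only indicative. The function names are illustrative.

```python
def per_gpu_bandwidth_gb_s(total_gb_s: float, gpus: int) -> float:
    """Bandwidth available to each GPU if total throughput is shared evenly."""
    return total_gb_s / gpus


def implied_request_size_kb(throughput_gb_s: float, iops: float) -> float:
    """Average bytes per I/O implied by throughput / IOPS, in decimal KB."""
    bytes_per_second = throughput_gb_s * 1e9
    return bytes_per_second / iops / 1e3


if __name__ == "__main__":
    print(per_gpu_bandwidth_gb_s(10, 8))         # 1.25 (GB/s per A100)
    print(implied_request_size_kb(10, 200_000))  # 50.0 (KB per request)
```

Under those assumptions, 10 GB/s at 200K IOPS corresponds to roughly 50 KB per request, and each of eight GPUs sharing one client would receive about 1.25 GB/s.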

New file system API for Python applications. Version 3.2 introduces the Alluxio Python FileSystem API, an implementation of the fsspec (Filesystem Spec) interface, enabling seamless integration with Python applications. This expands Alluxio's interoperability within the Python ecosystem, allowing frameworks such as Ray to easily access local and remote storage systems.

Advanced cache management for efficiency and control. The 3.2 release offers advanced cache management features that give administrators precise control over data. A new RESTful API facilitates cache management, and an intelligent cache filter optimizes disk usage by selectively caching hot data. The cache free command offers granular control over which data is released from the cache, improving cache efficiency, reducing costs, and making data management more flexible.
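The release does not explain how the cache filter identifies hot data. One common approach, shown in the Python sketch below, combines frequency-based admission (cache a file only after it has been read a minimum number of times) with least-recently-used eviction and a prefix-based free operation. `FilteredCache`, `min_hits`, and `free` are hypothetical names; this is not Alluxio's implementation, REST API, or command-line syntax.

```python
from collections import OrderedDict, defaultdict
from typing import Callable, Dict


class FilteredCache:
    """Hypothetical read-through cache that admits a file only after `min_hits` reads."""

    def __init__(self, fetch: Callable[[str], bytes], capacity_bytes: int, min_hits: int = 2):
        self.fetch = fetch  # reads from the underlying (slow) storage
        self.capacity = capacity_bytes
        self.min_hits = min_hits
        self.hits: Dict[str, int] = defaultdict(int)  # read counts for uncached files
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()  # LRU order, oldest first
        self.used = 0

    def read(self, path: str) -> bytes:
        if path in self.entries:
            self.entries.move_to_end(path)  # mark as most recently used
            return self.entries[path]
        data = self.fetch(path)
        self.hits[path] += 1
        # Admission filter: cold files are returned without consuming cache space.
        if self.hits[path] >= self.min_hits and len(data) <= self.capacity:
            while self.used + len(data) > self.capacity:
                _, evicted = self.entries.popitem(last=False)  # evict least recently used
                self.used -= len(evicted)
            self.entries[path] = data
            self.used += len(data)
        return data

    def free(self, prefix: str) -> int:
        """Drop cached files at or below `prefix`; return the number of bytes released."""
        base = prefix.rstrip("/") + "/"
        victims = [p for p in self.entries if p == prefix or p.startswith(base)]
        released = sum(len(self.entries[p]) for p in victims)
        for p in victims:
            del self.entries[p]
        self.used -= released
        return released
```

Requiring a second read before admission keeps one-off scans from flushing frequently used training data out of the cache, at the cost of one extra remote read for each file that does turn out to be hot.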

Availability

Alluxio Enterprise AI version 3.2 is immediately available for download at https://www.alluxio.io/download/.
