Executive Q&A: Controlling Cloud Egress Costs
As cloud adoption grows, enterprises are finding that their cloud costs grow with it, including little-understood egress costs. Adit Madan, director of products at Alluxio, sheds light on several best practices that can help reduce these costs.
- By Upside Staff
- July 7, 2023
Upside: Cloud storage has gained popularity in recent years. Managing egress costs can be a significant challenge. What constitutes egress? Do all the major cloud providers charge for the same thing?
Adit Madan: Egress refers to the cost an enterprise pays whenever data traverses regions or goes outside a specific cloud provider’s network. This is a significant challenge not only for enterprises implementing a hybrid or multicloud strategy but also for enterprises that have silos of data spread across multiple regions. In both these situations, analytical processing that needs access to all this data bears egress charges based on the volume of data crossing the boundary of a single region or cloud.
All major cloud providers charge based on the same metrics, although there is an emerging class of storage clouds attempting to challenge the status quo. These storage clouds have only seen success in scenarios where data is accessed infrequently, such as archiving.
Realizing each enterprise is different, what percentage of an enterprise’s cloud service expense might egress fees reasonably represent and why are they often such a surprise to enterprises?
For smaller enterprises, egress charges are fairly minimal because most data resides in a single cloud region and is accessed within that region. For larger enterprises, the number of scenarios that incur egress fees is higher. One such scenario is implementing a hybrid cloud for cost management, or a multicloud to take advantage of the latest optimized computing hardware that might not be available in the primary cloud.
For these scenarios, egress fees might be as high as a third of the total cloud service expense with a naive implementation. Better-optimized implementations can bring egress costs down but still fall short, because they introduce management complexity and require additional operations staff to compensate. Such fees come as a surprise because it is hard to predict how much data will be accessed across regions, and that volume usually only increases over time.
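As a rough back-of-the-envelope illustration of how cross-region traffic can grow into a sizable share of the bill, the sketch below applies a hypothetical flat per-GB egress rate. The $0.09/GB figure and the workload numbers are assumptions for illustration only, not any provider's actual pricing.

```python
# Back-of-the-envelope egress cost estimate (all rates are hypothetical).
def egress_share(monthly_bill_usd: float,
                 cross_region_gb: float,
                 egress_rate_per_gb: float = 0.09) -> float:
    """Return the fraction of total cloud spend attributable to egress."""
    egress_cost = cross_region_gb * egress_rate_per_gb
    return egress_cost / (monthly_bill_usd + egress_cost)

# Example: a $20k/month compute-and-storage bill plus 100 TB (~100,000 GB)
# of cross-region reads each month.
share = egress_share(20_000, 100_000)
print(f"egress is {share:.0%} of total spend")  # egress is 31% of total spend
```

Under these assumed numbers, egress lands right around the "third of the cloud service expense" that naive implementations can reach.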
A study by S&P Global found that 34% of enterprises are affected by data egress costs. What specific challenges do organizations face in managing these costs?
In most cases, manually copying data across regions is the technique employed to reduce egress fees, so that repeated access to the data does not cross network boundaries. This approach is brittle and often requires an entire team to manage. Further complexity comes from the redundant services and extra capacity that must be provisioned. Copying data also introduces compliance and governance risks, because frequent synchronization is needed to ensure that updates, such as access control policies, are propagated throughout the enterprise network.
What are your recommended best practices for saving egress costs as businesses scale their cloud usage and continue to evolve their data platform architecture? Are there other practices that might unknowingly cause inefficiency?
A data lake approach offers maximum flexibility without redundancy and is the right choice for managing most kinds of data used for analytics and machine learning. Initial data processing, such as data curation and tagging for security, should be performed close to the source of ingestion.
Moving raw data across network boundaries is infeasible. Building a federation layer to query across all curated data is key. This layer should be able to minimize cross-network traffic while placing data and computing resources wherever suitable based on cost and availability. Just as in software programming, abstractions are critical to managing the complexity of data platforms, so that underlying changes to the platform are shielded from data consumers. Often, a change to one part of the data stack has a domino effect, with inefficiencies manifesting throughout the organization.
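The abstraction idea can be made concrete with a minimal sketch. Everything here (the `DataSource` interface, `FederationLayer`, and the prefix-based routing) is hypothetical and illustrative, not Alluxio's actual design: consumers read logical paths, and the mapping to a physical cloud or region lives behind the interface, so relocating a data set does not break its consumers.

```python
from abc import ABC, abstractmethod

class DataSource(ABC):
    """Consumers program against logical paths; the mapping to a
    physical cloud/region lives behind this interface."""
    @abstractmethod
    def read(self, logical_path: str) -> bytes:
        ...

class InMemorySource(DataSource):
    """Stand-in backend for illustration; a real implementation would
    wrap an object store client for a specific cloud or region."""
    def __init__(self, objects: dict):
        self._objects = objects

    def read(self, logical_path: str) -> bytes:
        return self._objects[logical_path]

class FederationLayer:
    """Routes each logical prefix to whichever backend currently holds
    the data; moving a data set only means updating this routing table."""
    def __init__(self, routes: dict):
        self._routes = routes  # prefix -> DataSource

    def read(self, logical_path: str) -> bytes:
        prefix = logical_path.split("/", 1)[0]
        return self._routes[prefix].read(logical_path)

# Data consumers keep using "sales/2023.parquet" even if the data set
# later moves from one cloud region to another.
fed = FederationLayer({"sales": InMemorySource({"sales/2023.parquet": b"rows"})})
print(fed.read("sales/2023.parquet"))
```

The design point is the routing table: cost- or availability-driven placement decisions change only the table, never the consumer code.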
What savings can businesses expect if they follow each best practice?
Savings come in several forms. The first is infrastructure or cloud service spending, where the efficiencies gained are significant on their own. Another major saving is in the number of people needed to manage the platform; employing best practices can reduce this cost by as much as 75%. Finally, employees' productivity and ability to drive solutions to business problems depend on the agility of a disciplined organization employing sound practices, and no factor is more important than that.
Data caching is an effective technique to reduce egress costs. Can you explain what "cached in the cloud" means and its impact on application performance? For instance, do cloud managers have to explicitly say what is and isn't cached? What kind of savings does caching offer?
Caching is a technique that keeps a copy of data close to the consumer when the consumer is separated from the data source itself. By making data appear close to computing resources, it overcomes the effect of high network latency and low bandwidth on application performance.
For instance, caching can be employed when data residing on the East Coast of the U.S. is accessed from the West Coast. In this scenario, with caching, application performance is the same as if the data resided in the same cloud region. With specialized caching systems, cloud managers do not have to explicitly specify what to cache; policies maintain what is and isn't cached. For ad hoc analytics and model training, caching is very efficient because the same data is accessed multiple times, and it can hide as much as 80% of egress charges.
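Why repeated access makes caching pay off can be shown with a minimal read-through LRU cache. This is an illustrative sketch, not Alluxio's implementation: the cache counts how many bytes actually crossed the region boundary, so hot data incurs egress only on its first read.

```python
from collections import OrderedDict

class EgressCache:
    """Minimal LRU read-through cache (illustrative only): repeated
    reads of hot data are served locally instead of re-crossing
    the region boundary."""
    def __init__(self, fetch_remote, capacity: int):
        self.fetch_remote = fetch_remote   # function: key -> bytes
        self.capacity = capacity
        self.cache = OrderedDict()
        self.remote_bytes = 0              # egress actually incurred

    def read(self, key: str) -> bytes:
        if key in self.cache:
            self.cache.move_to_end(key)    # mark as recently used
            return self.cache[key]
        data = self.fetch_remote(key)      # cache miss: pay egress once
        self.remote_bytes += len(data)
        self.cache[key] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False) # evict least recently used
        return data

# Hot data read 10 times: only the first read crosses the region boundary.
remote_store = {"block-1": b"x" * 1024}
cache = EgressCache(remote_store.__getitem__, capacity=8)
for _ in range(10):
    cache.read("block-1")
print(cache.remote_bytes)  # 1024
```

Without the cache, the same workload would have transferred 10,240 bytes; with it, 9 of 10 reads are free of egress, which is the mechanism behind savings figures like the 80% cited above.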
As organizations increasingly adopt multicloud strategies that span on-premises and public clouds, how should data egress costs impact their decision about what data should reside on which cloud platform?
Each data set should reside in only one cloud, without redundancy. The choice should be based on the service responsible for producing the data, and the cloud of choice is the one most suitable for running that particular computing service.
Let’s use this example of a model training pipeline. If cloud A is suitable for data preprocessing, then the curated training data should reside on cloud A. However, if cloud B offers the most efficient hardware for training, such as GPUs, then the trained model should reside on cloud B while being deployed for inference to the cloud most suitable for that stage. Egress charges must be considered carefully for a platform that benefits from maximum agility.
[Editor’s note: Adit Madan is the director of product management at Alluxio. Adit has extensive experience in distributed systems, storage systems, and large-scale data analytics. He holds an MS from Carnegie Mellon University and a BS from the Indian Institute of Technology Delhi. He is also a core maintainer and Project Management Committee (PMC) member of the Alluxio Open Source project.]