The True Cost of Moving to the Cloud
Managing your cloud environment requires attention to practices and procedures that can keep costs from ballooning.
- By Asim Razzaq
- August 2, 2021
For many years, businesses have been shifting their infrastructure to the public cloud to accelerate their time to value. Cloud-provided infrastructure as a service (IaaS) is forecast to grow by almost 27 percent in 2021 compared to 2020. The public cloud services market as a whole is expected to grow by more than 18 percent, with business process as a service (BPaaS) experiencing single-digit annual growth rates, according to statista.com.
Research by Gartner states, "[C]loud services can initially be more expensive than running on-premises data centers. [However,] cloud services can become cost-effective over time if organizations learn to use and operate them more efficiently." The statement is backed by an example of migrating 2,500 virtual machines from an on-premises data center to Amazon Web Services EC2. This reduction in cost over time depends mainly on how well an organization manages its cloud costs.
Moving to the cloud is one thing; managing it is a whole other feat. If it is not done properly, costs can climb and waste can run rampant. In fact, according to research published in the Flexera State of the Cloud 2021 report, organizations waste an estimated 30 percent of their cloud spending.
Cost of Cloud Misuse
Reserving capacity, which simply put lets a company commit to a block of compute capacity for a set period, is one way to reduce cloud costs, because providers offer discounts for reserving a block of resources for a specific time in a particular location. However, you pay for the reserved amount whether you use it or not. If you exceed your reserved amount, you pay the higher on-demand rate for the excess, so unless you have a good handle on your resource needs, reserving is prone to costly mistakes.
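The trade-off can be made concrete with a little arithmetic. The sketch below is illustrative only: the hourly rates are made-up numbers, not real AWS or Azure prices, and it assumes reserved capacity is billed every hour while any usage above the reservation is billed at the on-demand rate. It tries every reservation size and keeps the cheapest.

```python
# Hypothetical rates for illustration only; these are NOT real provider prices.
ON_DEMAND_RATE = 0.10  # $ per instance-hour
RESERVED_RATE = 0.06   # $ per reserved instance-hour


def monthly_cost(reserved: int, hourly_usage: list[int],
                 reserved_rate: float = RESERVED_RATE,
                 on_demand_rate: float = ON_DEMAND_RATE) -> float:
    """Cost of a period given a reserved instance count and actual per-hour usage."""
    hours = len(hourly_usage)
    # Reserved capacity is billed for every hour, whether it is used or not.
    reserved_cost = reserved * reserved_rate * hours
    # Usage above the reservation spills over to the on-demand rate.
    overflow = sum(max(0, used - reserved) for used in hourly_usage)
    return reserved_cost + overflow * on_demand_rate


def best_reservation(hourly_usage: list[int]) -> int:
    """Try every reservation size from 0 up to peak usage and return the cheapest."""
    peak = max(hourly_usage)
    return min(range(peak + 1), key=lambda r: monthly_cost(r, hourly_usage))


# About one month (730 hours): a steady baseline of 10 instances with 130 hours at 20.
usage = [10] * 600 + [20] * 130
print(best_reservation(usage), monthly_cost(10, usage), monthly_cost(20, usage))
```

With these made-up numbers, reserving for the steady baseline of 10 instances is cheapest, while reserving for the peak of 20 ends up costing more than running everything on demand, which is exactly the kind of costly mistake described above.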
Another way to receive a surprise bill at the end of the month is by not releasing a cluster's resources. If you don't release the resources after completing a job, you continue to pay for them. This situation can be especially costly if the cluster is left running overnight or over the weekend.
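A simple guard against forgotten clusters is to compare each running cluster's last completed job against the current time. The sketch below assumes a hypothetical inventory record (`Cluster`) with an hourly cost and a last-job timestamp; in practice that data would come from your provider's APIs or billing exports, and the one-hour idle threshold is an arbitrary choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Cluster:
    # Hypothetical inventory record, not a provider API type.
    name: str
    hourly_cost: float          # $ per hour for all nodes
    last_job_finished: datetime
    running: bool


def idle_clusters(clusters, now, max_idle=timedelta(hours=1)):
    """Return (name, idle_hours, wasted_cost) for running clusters idle longer than max_idle."""
    findings = []
    for c in clusters:
        idle = now - c.last_job_finished
        if c.running and idle > max_idle:
            hours = idle.total_seconds() / 3600
            findings.append((c.name, hours, hours * c.hourly_cost))
    return findings


# A $5/hour cluster whose last job ended Friday at 18:00, checked Monday at 08:00.
now = datetime(2021, 8, 2, 8, 0)
print(idle_clusters([Cluster("etl", 5.0, datetime(2021, 7, 30, 18, 0), True)], now))
```

Here the weekend alone leaves the cluster idle for 62 hours, or $310 at the assumed rate, before anyone notices.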
In addition, applications may be given the ability to provision resources automatically, which is yet another way costs can rise. Take, for instance, automation in a Kubernetes environment: a defect may cause an application to grow its resource footprint without any real need. If the application's privileges are set too high, the cost escalates quickly.
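One way to limit the damage is to cap what any automated scaling decision can request, both by a hard count and by a cost budget. The function below is a hypothetical guardrail, not part of Kubernetes; Kubernetes itself offers comparable limits, such as the `maxReplicas` field of a HorizontalPodAutoscaler and namespace-level ResourceQuota objects. The per-replica cost and hourly budget are assumed inputs.

```python
import math


def clamp_replicas(desired: int, max_replicas: int,
                   cost_per_replica_hour: float, hourly_budget: float) -> int:
    """Cap an autoscaler's desired replica count by a hard limit and a cost budget."""
    # How many replicas the hourly budget can pay for.
    affordable = math.floor(hourly_budget / cost_per_replica_hour)
    # Never go below zero, and never exceed either limit.
    return max(0, min(desired, max_replicas, affordable))


# A defect asks for 500 replicas; the budget only covers 20 at $0.50 each per hour.
print(clamp_replicas(500, max_replicas=50, cost_per_replica_hour=0.5, hourly_budget=10.0))
```

A real guardrail would also raise an alert whenever the requested and allowed counts differ, since a clamped request is itself a sign that something may be wrong.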
Also note that these errors are not mutually exclusive and can happen multiple times across the system, each adding cost. The real question in controlling these situations is: when do you know waste is occurring?
Controlling and Cutting Costs
The key to controlling cloud waste, which will cut costs, is detecting it and notifying someone who can take appropriate action.
The first step in this process is applying tags to resources or instances. A tag is a label assigned to a cloud resource or instance, and it consists of two parts: a key and a value. There are various reasons to use tagging, but the most significant is greater visibility and organization across a company.
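Because each tag is a key-value pair, tags turn a flat bill into something that can be grouped. The sketch below assumes hypothetical billing line items (the resource IDs, costs, and the `team` and `env` keys are invented for illustration) and sums cost per value of one tag key, keeping untagged spend visible as its own bucket.

```python
from collections import defaultdict

# Hypothetical billing line items; each resource carries key-value tags.
line_items = [
    {"resource": "i-0a1", "cost": 120.0, "tags": {"team": "payments", "env": "Production"}},
    {"resource": "i-0b2", "cost": 30.0, "tags": {"team": "payments", "env": "Dev"}},
    {"resource": "i-0c3", "cost": 80.0, "tags": {"team": "search", "env": "Production"}},
    {"resource": "vol-9", "cost": 50.0, "tags": {}},
]


def cost_by_tag(items, key, untagged="(untagged)"):
    """Sum cost per value of one tag key; resources missing the key are grouped separately."""
    totals = defaultdict(float)
    for item in items:
        totals[item["tags"].get(key, untagged)] += item["cost"]
    return dict(totals)


print(cost_by_tag(line_items, "team"))
print(cost_by_tag(line_items, "env"))
```

The same data answers different questions depending on the key chosen, which is why a consistent tag structure matters more than any single tag.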
There are several types of tags. The first set is generated automatically by Amazon Web Services (AWS) or Microsoft Azure (for example, instance ID or subnet ID) and cannot be altered. The second set consists of user-defined tags, which are created by an organization and applied manually. To use user-defined tags effectively, organizations must avoid these common mistakes:
- Not standardizing tag names to prevent duplication. Tags help organize, filter, and identify resources, so if names are not consistent across tags, resources cannot be allocated or handled properly.
- Failing to create a tag structure that mirrors the organization's team and application structure. For example, using tags that identify Dev, Sandbox, and Production environments allows engineering and finance to understand development versus cost of goods sold (COGS) expenses.
- Not automating tag assignments. This is critical in container environments, where new containers are spun up automatically and should inherit their parent process's tags at creation so their costs can be tracked.
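The three mistakes above can each be addressed in a few lines of code. This is a minimal sketch under assumed conventions: the required keys (`team`, `app`, `env`) and the allowed environment values are a hypothetical company standard, and the helper names are invented. It normalizes key spelling to avoid duplicates, validates against the standard, and lets a new container inherit its parent's tags.

```python
ALLOWED_ENVS = {"Dev", "Sandbox", "Production"}   # hypothetical standard
REQUIRED_KEYS = {"team", "app", "env"}            # hypothetical standard


def normalize(tags):
    """Canonicalize keys so 'Team', ' team ' and 'TEAM' don't become duplicate tags."""
    return {k.strip().lower(): v.strip() for k, v in tags.items()}


def validate(tags):
    """Return a list of problems with a normalized tag set (empty list means OK)."""
    problems = [f"missing tag: {k}" for k in sorted(REQUIRED_KEYS - set(tags))]
    env = tags.get("env")
    if env is not None and env not in ALLOWED_ENVS:
        problems.append(f"unknown env: {env}")
    return problems


def tags_for_child(parent_tags, child_tags=None):
    """A new container inherits its parent's tags; its own tags win on conflict."""
    merged = dict(normalize(parent_tags))
    merged.update(normalize(child_tags or {}))
    return merged


parent = {"Team": "payments", " App ": "checkout", "ENV": "Production"}
child = tags_for_child(parent, {"role": "worker"})
print(child, validate(child))
```

Letting the child's own tags override the parent's is a design choice; an organization could instead forbid overrides of cost-allocation keys such as `team`.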
A well-structured tagging scheme makes it easier to take the actions an organization needs to control cloud costs. Because the cloud environment is dynamic and complex, maintaining tags without rules-based automation requires full-time trained personnel. Yet again, this adds to cloud costs and often introduces manual errors.
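Rules-based automation can be as simple as deriving tags from a naming convention that every team already follows. The sketch below assumes a hypothetical convention of `<team>-<app>-<env>-<n>` (for example, `payments-checkout-prod-3`); names that don't match return `None` so a person can review them rather than receiving a wrong tag.

```python
import re

# Hypothetical naming convention: "<team>-<app>-<env>-<n>", e.g. "payments-checkout-prod-3".
ENV_MAP = {"dev": "Dev", "sbx": "Sandbox", "prod": "Production"}
NAME_RULE = re.compile(r"^(?P<team>[a-z]+)-(?P<app>[a-z]+)-(?P<env>dev|sbx|prod)-\d+$")


def auto_tags(resource_name):
    """Derive tags from a resource name, or return None if the name breaks the convention."""
    m = NAME_RULE.match(resource_name)
    if m is None:
        return None  # route to a human instead of guessing
    return {"team": m["team"], "app": m["app"], "env": ENV_MAP[m["env"]]}


for name in ["payments-checkout-prod-3", "search-index-sbx-1", "tmp-box"]:
    print(name, auto_tags(name))
```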
Once you have an automated tag environment in place, you can use it for monitoring, forecasting, and usage allocation. Providing well-structured analytics to the people who can act on them is critical to any cost-management function.
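Forecasting does not have to start sophisticated. A minimal sketch, assuming daily cost figures for the current month are available (for example, summed per tag), projects month-end spend by extending the average daily run rate and compares it to a budget. The function names and the budget are hypothetical; real spend is rarely this linear, so treat the result as an early warning rather than a prediction.

```python
import calendar


def forecast_month_end(daily_costs: list[float], year: int, month: int) -> float:
    """Project month-end spend by extending the average daily run rate so far."""
    days_in_month = calendar.monthrange(year, month)[1]
    elapsed = len(daily_costs)
    if elapsed == 0:
        raise ValueError("need at least one day of cost data")
    spent = sum(daily_costs)
    run_rate = spent / elapsed
    return spent + run_rate * (days_in_month - elapsed)


def budget_alert(daily_costs, year, month, monthly_budget):
    """Return a warning string if the forecast exceeds the budget, else None."""
    forecast = forecast_month_end(daily_costs, year, month)
    if forecast > monthly_budget:
        return f"Forecast ${forecast:,.2f} exceeds budget ${monthly_budget:,.2f}"
    return None


# Ten days into August 2021 at $100/day, against a $3,000 monthly budget.
print(budget_alert([100.0] * 10, 2021, 8, 3000.0))
```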
Data organization is also essential. Good analytics make it easier to drill down and diagnose the causes of excess cost. Even more beneficial is applying artificial intelligence (AI) and machine learning (ML) to the data to make recommendations based on cost rules and usage data. Because cost rules are complex, an automated recommendation system helps managers reduce costs.
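Before any ML is involved, even fixed rules over usage data can produce useful recommendations. The sketch below is plain rule-based logic, not AI or ML, and every threshold in it is an assumption: resources under 5 percent average CPU are candidates to stop, those under 20 percent are candidates to downsize, and one size smaller is assumed to cost about half as much. Results are sorted so the largest estimated savings appear first.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # approximate


@dataclass
class Usage:
    # Hypothetical summary record built from monitoring and billing data.
    resource: str
    avg_cpu_pct: float  # average CPU utilization over the lookback window
    hourly_cost: float


def recommend(usages, idle_cpu=5.0, low_cpu=20.0):
    """Return (resource, action, estimated monthly savings), biggest savings first."""
    recs = []
    for u in usages:
        monthly = u.hourly_cost * HOURS_PER_MONTH
        if u.avg_cpu_pct < idle_cpu:
            recs.append((u.resource, "stop or delete", monthly))
        elif u.avg_cpu_pct < low_cpu:
            # Assumption: one instance size smaller costs roughly half as much.
            recs.append((u.resource, "downsize one size", monthly * 0.5))
    return sorted(recs, key=lambda r: r[2], reverse=True)


fleet = [Usage("a", 2.0, 0.10), Usage("b", 12.0, 0.40), Usage("c", 70.0, 1.00)]
for rec in recommend(fleet):
    print(rec)
```

An ML-based system would replace the fixed thresholds with ones learned from each workload's history, but the output, a ranked list of actions with estimated savings, has the same shape.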
Detecting anomalies is another key to controlling costs. Costs add up in real time, so real-time monitoring and anomaly detection are necessary. Here again, AI and ML can be essential: monitoring cloud activity in real time makes it possible to detect anomalies and route them to a person for action, and ML can also adjust the model over time to reduce false positives.
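A common statistical baseline for spend anomalies is a rolling z-score: compare each new data point with the mean and standard deviation of a trailing window. This is a simple statistical technique, not machine learning, and the 24-point window and threshold of 3 standard deviations are assumed defaults. Only upward deviations are flagged, since a sudden drop in spend is rarely a cost problem.

```python
from statistics import mean, stdev


def spend_anomalies(spend, window=24, threshold=3.0):
    """Flag (index, value) points whose spend exceeds the trailing-window mean
    by more than `threshold` sample standard deviations (a rolling z-score)."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    flagged = []
    for i in range(window, len(spend)):
        history = spend[i - window:i]
        mu, sigma = mean(history), stdev(history)
        x = spend[i]
        # Only upward deviations matter for cost; a perfectly flat history flags any increase.
        if (sigma == 0 and x > mu) or (sigma > 0 and (x - mu) / sigma > threshold):
            flagged.append((i, x))
    return flagged


# A day of hourly spend alternating between $10 and $12, then a $40 spike.
hourly = [10.0, 12.0] * 12 + [40.0] + [10.0, 12.0] * 3
print(spend_anomalies(hourly))
```

Tuning the threshold is the simplest lever for trading missed anomalies against false positives.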
Managing the cloud requires a new operations model to monitor and control costs. The model must be nimbler and more responsive to change. Allocating costs and routing that information in a timely way to the people responsible for controlling cloud costs is critical. Managing this complex, dynamic environment can be challenging, but with a set of best practices in mind, the cloud can boost production.
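Routing ties the earlier pieces together: the same tags used for allocation can decide who receives an alert. In this sketch the owner directory and the fallback address are hypothetical placeholders (in practice they might come from a team directory or paging tool), and each alert is assumed to carry the tags of the resource it concerns.

```python
from collections import defaultdict

# Hypothetical owner directory keyed by the "team" tag.
OWNERS = {
    "payments": "payments-oncall@example.com",
    "search": "search-oncall@example.com",
}
FALLBACK = "finops@example.com"  # hypothetical central cost team for untagged resources


def route(alerts, owners=OWNERS, fallback=FALLBACK):
    """Group cost alerts by the owner found through each resource's 'team' tag."""
    inbox = defaultdict(list)
    for alert in alerts:
        team = alert["tags"].get("team")
        inbox[owners.get(team, fallback)].append(alert["message"])
    return dict(inbox)


alerts = [
    {"tags": {"team": "payments"}, "message": "spend spike on i-0a1"},
    {"tags": {}, "message": "untagged volume growing"},
]
print(route(alerts))
```

Sending untagged spend to a central team keeps it from silently falling through the cracks, and doubles as pressure to fix missing tags.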
Asim Razzaq is the CEO and founder of Yotascale. In his career, Razzaq was senior director of platform engineering (head of infrastructure) at PayPal, where he was responsible for all core infrastructure processing payments and logins. He led the build-out of the PayPal private cloud and the PayPal developer platform, which generates billions of dollars in payment volume. Razzaq has held engineering leadership roles at early-to-midstage startups and large companies including eBay and PayPal. His teams have focused on building cloud-scale platforms and applications. You can reach the author via email, Twitter (@asimrazzaq), or LinkedIn.