#25. Your Workload Optimizations Checklist — Part 2: Compute

Here is your comprehensive checklist for optimizing compute usage in the cloud

In a previous article, we introduced the simple 80:20 rule, which helps you optimize workloads effectively and achieve cost savings by examining their business value.

In this series of articles, we look at how to optimize the services used by most organizations (i.e., compute, storage, databases, networking, and SaaS) by following simple checklists that question their business value. This article covers compute services.

The Compute Optimization Checklist

As in the previous article, I split the checklist for compute optimizations into five reviews: business, performance, technical, license, and rates.

Business Value Review

Consider Automated Shutdowns

When reviewing running compute workloads, I start with the environment in which they run. If something runs in a non-production environment, there is usually no reason for it to be up around the clock, so the first thing I consider is automated shutdowns. A simple scheduler, which can be set up in about five minutes in code or in the console, can shut down instances outside working hours and eliminate a lot of waste.
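
As a rough illustration of the scheduling rule, here is a minimal Python sketch. The working hours, the environment names, and the function names are assumptions made for this example; a real scheduler would call your cloud provider's start/stop API based on the returned decision.

```python
from datetime import datetime, time

# Assumed policy for illustration only: adjust to your own working hours.
WORK_START = time(8, 0)
WORK_END = time(18, 0)
ALWAYS_ON_ENVIRONMENTS = {"production"}


def should_be_running(environment: str, now: datetime) -> bool:
    """Return True if an instance in this environment should be up at `now`."""
    if environment in ALWAYS_ON_ENVIRONMENTS:
        return True
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    in_working_hours = WORK_START <= now.time() < WORK_END
    return is_weekday and in_working_hours


def weekly_running_hours() -> float:
    """Hours per week a scheduled non-production instance stays up."""
    daily_hours = (WORK_END.hour - WORK_START.hour) + (WORK_END.minute - WORK_START.minute) / 60
    return 5 * daily_hours
```

With these assumed hours, a non-production instance runs 50 of the 168 hours in a week, which is roughly 70% fewer running hours than leaving it on around the clock.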

Align Resource Configuration to Business Value

Next, consider how the resources were configured. When spinning up compute resources, we often do not notice that the default values set by cloud providers exceed our business demands. For example, how often should a snapshot be created? How much storage volume do I need? What about the networking configurations and IP addresses used? Aligning these configurations with business demands can save a lot.

Performance Review

Rightsize

The most common recommendation you will see in any cloud dashboard is usually related to rightsizing. The reason is that there is no one-size-fits-all default for cloud compute, so you have to observe actual demand and rightsize instances accordingly.
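
To make "observe the demand and rightsize" concrete, the sketch below classifies an instance from its CPU utilization samples. The 40% and 80% thresholds, the use of the 95th percentile, and the function names are assumptions for illustration, not a provider's recommendation logic; in practice you would also look at memory, disk, and network.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def rightsizing_hint(cpu_samples: list[float], low: float = 40.0, high: float = 80.0) -> str:
    """Suggest 'downsize', 'upsize', or 'keep' from CPU utilization (in percent).

    The thresholds are hypothetical defaults chosen for this example.
    """
    peak = percentile(cpu_samples, 95)
    if peak < low:
        return "downsize"
    if peak > high:
        return "upsize"
    return "keep"
```

For example, an instance whose CPU never exceeds 20% over the observation window would be flagged as a candidate for a smaller size.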

Burstable Instances

Where demand spikes occur only occasionally, consider burstable instances (offered by all three major cloud providers). These instances provide peak performance when needed but otherwise charge only for a baseline level of performance. For example, AWS T4g burstable instances offer up to 40% better price/performance than T3 at about 20% lower cost.

Technical Review

Clean Up Orphan Resources

Orphan resources are cloud assets that are no longer in active use or attached to any running service, but still exist within a cloud environment. For example, you run a VM and attach a volume to it. Later, you delete the instance, but to your surprise, you still pay for the volume, which is now unattached to any instance.

Scan all environments for such resources and remove them. This includes formerly used disks (volumes), assigned public IP addresses, network interfaces, and saved snapshots.
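
The scan itself boils down to a simple filter over an inventory export. The sketch below assumes a hypothetical `Resource` record with an `attached_to` field; the field names and sample costs are invented for illustration, and snapshots would need a separate rule (such as age), since they are never attached to an instance.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    resource_id: str
    kind: str                   # e.g. "volume", "public_ip", "network_interface"
    attached_to: Optional[str]  # ID of the instance using it, or None
    monthly_cost: float


def find_orphans(resources: list[Resource], existing_instance_ids: set[str]) -> list[Resource]:
    """Return resources that are unattached or point at an instance that no longer exists."""
    return [
        r for r in resources
        if r.attached_to is None or r.attached_to not in existing_instance_ids
    ]


# Hypothetical inventory: only instance "i-1" still exists.
inventory = [
    Resource("vol-1", "volume", "i-1", 8.0),
    Resource("vol-2", "volume", None, 8.0),
    Resource("ip-1", "public_ip", "i-9", 3.6),
    Resource("eni-1", "network_interface", "i-1", 0.0),
]
orphans = find_orphans(inventory, {"i-1"})
wasted_per_month = sum(r.monthly_cost for r in orphans)
```

Here `vol-2` and `ip-1` are reported as orphans, and `wasted_per_month` is the amount you would stop paying after removing them.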

Upgrade VM Family or Type

What about the VM generation? Are you running the latest hardware generation or family? For example, all else being equal, the minor change of switching from the m3 to the m5 compute family in AWS can save you a whopping 13% on costs.

INSTANCE	CONFIGURATION	MONTHLY RATE
m3.2xlarge	8 vCPU, 32 GB	$311.94
m5.2xlarge	8 vCPU, 32 GB	$271.61
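
The 13% figure follows directly from the two monthly rates in the table. A small Python helper (the function name is just for this example) makes the arithmetic explicit:

```python
def savings_percent(old_rate: float, new_rate: float) -> float:
    """Relative saving, in percent, when moving from old_rate to new_rate."""
    return (old_rate - new_rate) / old_rate * 100


# Monthly rates taken from the table above.
m3_to_m5 = savings_percent(311.94, 271.61)
annual_saving = (311.94 - 271.61) * 12

print(f"{m3_to_m5:.1f}% cheaper, ${annual_saving:.2f} saved per year")
# prints "12.9% cheaper, $483.96 saved per year"
```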

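The same applies to the type of processor. Moving from an x86 processor to ARM can save over 32%, although the sample rates listed here show a smaller gap of about 20% between the Intel and Graviton instances.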

Instance Rates

INSTANCE	CONFIGURATION	PROCESSOR	MONTHLY RATE
m5.xlarge	4 vCPU, 16 GB	Intel (x86)	$156.22
m5a.xlarge	4 vCPU, 16 GB	AMD (x86)	$140.16
m5g.xlarge	4 vCPU, 16 GB	Graviton (ARM)	$125.56

Autoscaling

If your compute demands regularly change due to spikes or scheduled workloads, consider shifting to autoscalers. These can scale resources either vertically (increasing performance through memory or CPU) or horizontally (adding parallel instances to handle the same workload). The choice of autoscaling mechanism depends on your workload's nature and architecture (e.g., stateless or stateful workloads).

Autoscaling offers a clear advantage: you only pay for what you use. You start with a minimal configuration and scale up as demand increases. When the spike ends, instances automatically shut down, returning to the base configuration.
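
For horizontal scaling, the core decision can be sketched as a proportional target-tracking rule, similar in spirit to the formula used by the Kubernetes Horizontal Pod Autoscaler. The function and parameter names below are assumptions for this example, and real autoscalers add cooldowns and tolerances that are omitted here.

```python
import math


def desired_instances(current: int, metric: float, target: float,
                      minimum: int, maximum: int) -> int:
    """Scale the fleet so the average metric moves toward the target.

    `metric` and `target` use the same unit, e.g. average CPU percent.
    The result is clamped to the configured fleet bounds.
    """
    raw = math.ceil(current * metric / target)
    return max(minimum, min(maximum, raw))


# 4 instances at 90% CPU with a 60% target: scale out to 6.
scale_out = desired_instances(current=4, metric=90, target=60, minimum=2, maximum=10)
# 4 instances at 15% CPU: scale in, but never below the minimum of 2.
scale_in = desired_instances(current=4, metric=15, target=60, minimum=2, maximum=10)
```

The `minimum` bound is the base configuration you always pay for; everything above it is billed only while the spike lasts.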

Here's an extra expert tip: to maximize savings, consider using an autoscaling fleet with spot instances. It's common practice to run a fleet with a mix of on-demand and spot instances, combining the reliability of on-demand with the cost savings of spot instances. You can even attach multiple spot instances to a fleet with a load balancer and autoscaler — it works essentially like on-demand compute but at a much lower cost.

Modernize your Applications

Last but not least, modernization should be the focus for any workload in the cloud. Start by weighing the pros and cons for price, management costs, and other technical constraints. If possible, go all the way serverless — this can reduce costs by up to 99%. Otherwise, containerization is a good step toward preventing vendor lock-in and improving scalability. Finally, if everything else proves too complicated, cut infrastructure management costs by shifting to PaaS solutions like app engines.

License Review

If you are using a licensed operating system in your VM, consider migrating to an open-source OS (e.g., Linux), especially if it's for development. Otherwise, check if you are entitled to use license benefits such as Azure Hybrid Benefit (AHB) if you are using Azure.

Rate Review

Rate Plans (RI/SP/CUD)

It may not be obvious to many cloud users, but you can often save more by optimizing rates than by reducing usage. Make sure that you have the right rates in place. Start with Standard and Convertible RIs for stable workloads, and increase coverage by adding more flexibility through savings plans.
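
Whether a commitment pays off depends on how much of it you actually use. Because a commitment is billed for every hour in the period, it beats on-demand only when utilization exceeds one minus the discount. The sketch below works this out; the hourly rate and the 30% discount are made-up numbers, and the function names are hypothetical.

```python
def commitment_cost(on_demand_hourly: float, discount: float, hours_in_period: float) -> float:
    """Cost of a commitment: the discounted rate is paid for every hour, used or not."""
    return on_demand_hourly * (1 - discount) * hours_in_period


def on_demand_cost(on_demand_hourly: float, hours_used: float) -> float:
    """Cost without a commitment: the full rate is paid only for hours used."""
    return on_demand_hourly * hours_used


def break_even_utilization(discount: float) -> float:
    """Fraction of the period the workload must run for the commitment to break even."""
    return 1 - discount


def commitment_pays_off(discount: float, utilization: float) -> bool:
    return utilization > break_even_utilization(discount)


# Hypothetical numbers: $0.20/hour on demand, 30% discount, 730-hour month.
committed = commitment_cost(0.20, 0.30, 730)   # paid regardless of usage
half_used = on_demand_cost(0.20, 365)          # workload runs half the month
```

With these numbers the break-even point is 70% utilization, so a workload that runs only half the month is cheaper on demand, while a stable workload running around the clock benefits from the commitment.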

Negotiate enterprise agreements that cover on-demand usage; the resulting discounts can lower the cost of on-demand compute.

Spot Instance Utilization

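Do not forget to switch to Spot instances whenever possible. Spot rates are far lower than those of any other rate plan, but the provider can reclaim the capacity at short notice, so they are best suited to workloads that tolerate interruptions.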

Summary

As in the previous article about optimizing databases, we examined five key reviews: business, performance, technical, license, and rates. When these reviews are conducted thoroughly, you can be confident that your cloud compute costs are well optimized.

💡 References: Many of the good tips in this article were taken from the excellent book Efficient Cloud FinOps by Alfonso San Miguel Sánchez and Danny Obando García.

Author

Oliver Assad