To Infinity and Beyond: Seamless Autoscaling with In-Place Resource Resize for Kubernetes

In the ever-evolving landscape of cloud computing and container orchestration, Kubernetes has emerged as the de facto standard for managing and scaling containerized applications. With its robust set of features, Kubernetes provides unparalleled flexibility and scalability, allowing organizations to efficiently deploy and manage their workloads at scale. However, as applications continue to grow in complexity and demand, the need for dynamic and seamless autoscaling solutions becomes increasingly paramount.

In this talk Kohei Ota and Aya Ozawa took us on a journey through resource management in Kubernetes. They started with a few basic concepts around resources, for example the problem of pods being OOMKilled (when they exceed their memory limits) or throttled (when they hit their CPU limits), both of which result in degraded performance.
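As a quick refresher on where those limits come from, here is a minimal, purely illustrative container spec (the name, image, and values below are just examples, not from the talk):

apiVersion: v1
kind: Pod
metadata:
  name: demo-app                # hypothetical name, for illustration only
spec:
  containers:
    - name: app
      image: nginx:1.27         # example image
      resources:
        requests:               # what the scheduler reserves for this container
          cpu: 250m
          memory: 256Mi
        limits:                 # hard caps: exceeding memory leads to an OOMKill, exceeding cpu to throttling
          cpu: 500m
          memory: 512Mi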


With resource requests and limits we can optimize resource usage in a cluster. But what if we want to allocate as few resources as possible to a pod that only sometimes has memory or CPU spikes? What would be the best approach to tackle this? We don’t want to reserve the full spike amount of memory or CPU for the pod if it is not needed during its entire lifetime.

This calls for a Vertical Pod Autoscaler (VPA). A VPA can update a pod’s resources when they no longer match the resource usage that the pod actually needs at a given point in time. Vertical Pod Autoscalers support a couple of update modes (a sample manifest follows the list):


  • Off (only calculates the recommended values, applies nothing)
  • Initial (applies the recommendation only when a pod is created)
  • Recreate (evicts and recreates the pod if its resources differ from the recommendation)
  • Auto (currently almost the same behaviour as Recreate)
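As a rough sketch of how one of these modes is selected, assuming the VPA custom resources from the kubernetes/autoscaler project are installed and a hypothetical Deployment called demo-app:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: demo-app-vpa
spec:
  targetRef:                    # the workload whose pods this VPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app              # hypothetical Deployment name
  updatePolicy:
    updateMode: "Recreate"      # one of Off, Initial, Recreate, Auto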


These VPAs are very useful in a cluster where resource management is important, both for optimized usage and for cost allocation. But what can we do if we want to alter the resources of a pod that is running a process we don’t want to stop? For this use case we can start using In-Place Pod Resizing.

In-Place Pod Resizing has been in alpha since Kubernetes version 1.27 and lets us reconfigure resources and patch pods without the need for pod restarts. To use this option we have to enable the InPlacePodVerticalScaling feature gate. This can be done on the kube-apiserver or at the initial cluster setup, for example as sketched below.
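One possible way to switch the gate on for a local test cluster, assuming kind is used (other distributions pass the same feature gate to their control plane components), is a config along these lines:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  InPlacePodVerticalScaling: true   # alpha feature gate needed for in-place resize
nodes:
  - role: control-plane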

Enabling the gate unlocks some new fields in the Kubernetes resource definitions. For example, if we want to alter the CPU allocation after creation of the pod without a restart, we can add the following to the container definition.

resizePolicy:
  - resourceName: cpu
    restartPolicy: NotRequired   # cpu changes are applied in place, no container restart

With this block the pod will not be recreated when its CPU resource values are modified. This can be very valuable for, for example, machine learning workloads with memory or CPU spikes.
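For context, here is a minimal sketch of where the policy sits in a full pod definition (name, image, and values are illustrative, not from the talk). In the alpha versions the resize itself is then triggered by patching the running pod’s spec.containers[].resources, for example with kubectl patch:

apiVersion: v1
kind: Pod
metadata:
  name: inplace-demo            # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.27         # example image
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired      # cpu changes are applied in place
        - resourceName: memory
          restartPolicy: RestartContainer # memory changes still restart this container
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          cpu: 500m
          memory: 512Mi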

In the future we should be able to use the VPA and in-place resource resize combined to build even more autonomous resource management strategies in your cluster. But there are still a lot of TODOs to make this happen.


Instead of creating a new demo I will refer to the demo video used by Kohei at KubeCon Europe 2024:
https://www.youtube.com/watch?v=MDybm2PVGag


I will also refer to the talk at KubeCon Europe 2024 if you want more in-depth examples or would like to rewatch Kohei and Aya’s talk:
https://www.youtube.com/watch?v=9lKa8bWU9II&t=207s


For me personally this was a really interesting talk, since not having to restart pods when altering their resource allocation is a great way to allocate fewer resources in our machine learning clusters and to free up resources that do not need to be reserved for a pod’s entire lifespan.
