Feature: Autoscaled compute node pool cannot stay at zero #2118

### Motivation
The goal is to let Claudie clusters run on controller nodes alone while idle (when no CPU/GPU workloads are running), and to provision GPU compute nodes only when GPU workloads are actually requested.

### Description
When Claudie deploys infrastructure with:

- a controller node pool using a static node count, and
- a compute node pool managed by an autoscaler with min=0,

it initially provisions only the controller nodes. However, shortly after deployment, the autoscaler provisions compute nodes automatically.

This happens because some pods in the `longhorn-system` namespace and several pods in `kube-system` (notably the `hubble-*` pods) do not tolerate the control-plane taint:

`node-role.kubernetes.io/control-plane:NoSchedule`

As a result, these pods cannot be scheduled onto the controller nodes. The cluster autoscaler then detects unschedulable workloads and immediately provisions a new node from the autoscaled compute pool.
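To make the scheduling failure concrete, here is a minimal Python sketch of how Kubernetes matches a pod's tolerations against a node's taints at scheduling time. The `Taint`/`Toleration` classes and the `tolerates`/`can_schedule` helpers are hypothetical, written only for illustration; they are not part of Claudie or any Kubernetes client library, and they ignore everything else the real scheduler considers (resources, affinity, and so on).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Taint:
    key: str
    value: Optional[str]
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: Optional[str] = None     # empty key + operator "Exists" matches every taint
    operator: str = "Equal"       # "Equal" or "Exists"
    value: Optional[str] = None
    effect: Optional[str] = None  # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        # An empty key is only valid (and matches everything) with "Exists".
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return (tol.value or "") == (taint.value or "")


def can_schedule(tolerations: Iterable[Toleration], node_taints: Iterable[Taint]) -> bool:
    """A pod fits a node only if every hard (NoSchedule/NoExecute) taint is tolerated."""
    tols = list(tolerations)
    hard = [t for t in node_taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tols) for t in hard)


# The taint carried by the controller nodes in this issue.
CONTROL_PLANE = Taint("node-role.kubernetes.io/control-plane", None, "NoSchedule")

if __name__ == "__main__":
    # Hypothetical pod with no tolerations: stays Pending on controller-only clusters.
    print(can_schedule([], [CONTROL_PLANE]))  # False
    # Hypothetical pod that explicitly tolerates the control-plane taint.
    tolerant = [Toleration(key="node-role.kubernetes.io/control-plane",
                           operator="Exists", effect="NoSchedule")]
    print(can_schedule(tolerant, [CONTROL_PLANE]))  # True
```

A pod for which `can_schedule` is false against every controller node remains unschedulable, which is the condition the cluster autoscaler reacts to by adding a node from the autoscaled compute pool.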

Because of this behavior, the autoscaled compute node pool never remains at zero nodes after deployment.
