Cloudability metric-agent not working in GKE Autopilot #335

@rayasm1

Hi,

We have a mix of GKE Standard and Autopilot clusters. The metrics agent works fine on the GKE Standard clusters but fails to start on the GKE Autopilot clusters.

Agent version: 2.11.39; the same issue occurs with the latest version (2.14.12).

Below are the logs and observations from a GKE Autopilot cluster.

kubectl get pods -n platform-finops
NAME                                          READY   STATUS             RESTARTS         AGE
cloudability-metrics-agent-65b6b489f8-x2s6x   0/1     CrashLoopBackOff   18 (3m33s ago)   71m

There are no network policies in the namespace:

kubectl get netpol -n platform-finops
No resources found in platform-finops namespace.

Logs from the pod

kubectl logs cloudability-metrics-agent-65b6b489f8-x2s6x -n platform-finops
time="2026-03-15T23:50:42Z" level=info msg="Starting Cloudability Kubernetes Metric Agent version: 2.11.39"
time="2026-03-15T23:50:42Z" level=info msg="Metric collection retry limit set to 3 (default is 1)"
time="2026-03-15T23:50:42Z" level=info msg="ForceKubeProxy is set, direct node connection disabled"
time="2026-03-15T23:50:42Z" level=info msg="Node [https://100.118.16.1:443/api/v1/nodes/gk3-<nodename>/proxy/stats/summary] available via proxy connection? false"
time="2026-03-15T23:50:42Z" level=info msg="Node [https://100.118.16.1:443/api/v1/nodes/gk3-gk3-<nodename>/proxy/stats/summary] available via proxy connection? false"
time="2026-03-15T23:50:42Z" level=info msg="Node [https://100.118.16.1:443/api/v1/nodes/gk3-gk3-<nodename>/proxy/stats/summary] available via proxy connection? false"
time="2026-03-15T23:50:42Z" level=info msg="Node [https://100.118.16.1:443/api/v1/nodes/gk3-gk3-<nodename>/proxy/stats/summary] available via proxy connection? false"
time="2026-03-15T23:50:42Z" level=info msg="Node [https://100.118.16.1:443/api/v1/nodes/gk3-<nodename>/proxy/stats/summary] available via proxy connection? false"
time="2026-03-15T23:50:42Z" level=info msg="Node [https://100.118.16.1:443/api/v1/nodes/gk3-<nodename>proxy/stats/summary] available via proxy connection? false"
time="2026-03-15T23:50:42Z" level=info msg="Node [https://100.118.16.1:443/api/v1/nodes/gk3-<nodename>/proxy/stats/summary] available via proxy connection? false"
time="2026-03-15T23:50:42Z" level=info msg="Of 7 nodes, 0 connected directly, 0 connected via proxy, and 0 could not be reached"
time="2026-03-15T23:50:42Z" level=warning msg="Only 0 percent of ready nodes could could be connected to, agent will operate in a limited mode."
time="2026-03-15T23:50:42Z" level=warning msg="Warning non-fatal error: Agent error occurred verifying node source metrics: unable to retrieve required metrics from any node via direct or proxy connection\nFor more information see: https://help.apptio.com/en-us/cloudability/product/k8s-metrics-agent.htm"
time="2026-03-15T23:50:42Z" level=fatal msg="unable to retrieve node summaries: unable to retrieve required metrics from any node via direct or proxy connection"
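
The proxy checks in the log above all report "false". As a quick way to confirm that from a saved copy of the log, here is a minimal sketch; the helper name `count_failed_proxy_checks` and the file name `agent.log` are assumptions for illustration, not part of the agent.

```shell
# Hypothetical helper: count the agent's failed proxy checks in a saved log.
# -F matches the phrase literally, -c prints the number of matching lines.
count_failed_proxy_checks() {
  grep -cF 'available via proxy connection? false' "$1"
}

# Example usage (agent.log is an assumed file name):
# kubectl logs cloudability-metrics-agent-65b6b489f8-x2s6x -n platform-finops > agent.log
# count_failed_proxy_checks agent.log
```

If the count equals the number of nodes, no node was reachable through the API server proxy, which matches the fatal error at the end of the log.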

Logs from Cloud Logging

"nodes "gk3-node-name is forbidden: User "system:serviceaccount:<namespace>:<sa>" cannot get resource "nodes/proxy" in API group "" at the cluster scope: GKE Warden authz [denied by managed-namespaces-limitation]: cluster scoped resource "nodes/proxy" is managed and access is denied"

Is this caused by GKE Autopilot's restrictions on node access, or is there a workaround to get the agent working?
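
For anyone reproducing this, the denial can probably be checked without the agent. The following is a minimal sketch, assuming your own user is allowed to impersonate the agent's service account; the function names are hypothetical, and the node and service account names must be filled in by hand. It is also an assumption that `kubectl auth can-i` reflects the GKE Warden decision, so the raw request is included as a second check.

```shell
# Hypothetical helper: ask the API server whether the service account may
# "get" the proxy subresource of nodes (the permission named in the denial).
can_get_nodes_proxy() {
  ns="$1"
  sa="$2"
  kubectl auth can-i get nodes --subresource=proxy \
    --as="system:serviceaccount:${ns}:${sa}"
}

# Hypothetical helper: issue the same kind of request the agent logs,
# the kubelet stats summary via the API server proxy, as the service account.
fetch_node_summary() {
  node="$1"
  ns="$2"
  sa="$3"
  kubectl get --raw "/api/v1/nodes/${node}/proxy/stats/summary" \
    --as="system:serviceaccount:${ns}:${sa}"
}

# Example usage (replace the placeholders with real values):
# can_get_nodes_proxy platform-finops <sa>
# fetch_node_summary gk3-<nodename> platform-finops <sa>
```

Running the same two calls against a GKE Standard cluster should show whether the difference lies in the Autopilot restriction rather than in the agent's RBAC manifests.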
