Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -530,7 +530,7 @@ spec:
name: accelerators-thanos-querier-datasource
# Multiply by 100 so we can read it as a percentage without setting a unit (avoids CUE unit conflicts)
query: >
100 * avg(vllm:gpu_cache_usage_perc)
100 * avg(vllm:kv_cache_usage_perc)

"18":
kind: Panel
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ spec:
kind: PrometheusTimeSeriesQuery
spec:
datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
query: avg(vllm:gpu_cache_usage_perc{namespace="$NS",service="$SVC"}) or vector(0)
query: avg(vllm:kv_cache_usage_perc{namespace="$NS",service="$SVC"}) or vector(0)
minStep: "15s"

core_running_ts:
Expand Down Expand Up @@ -168,7 +168,7 @@ spec:
spec:
datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
# multiply by 100 to present percentage; omit format.unit to avoid schema conflicts
query: (avg(vllm:gpu_cache_usage_perc{namespace="$NS",service="$SVC"}) * 100) or vector(0)
query: (avg(vllm:kv_cache_usage_perc{namespace="$NS",service="$SVC"}) * 100) or vector(0)
minStep: "15s"

core_kv_usage_pct_ts:
Expand All @@ -187,7 +187,7 @@ spec:
kind: PrometheusTimeSeriesQuery
spec:
datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
query: (avg by (service) (vllm:gpu_cache_usage_perc{namespace="$NS",service="$SVC"}) * 100) or vector(0)
query: (avg by (service) (vllm:kv_cache_usage_perc{namespace="$NS",service="$SVC"}) * 100) or vector(0)
minStep: "15s"

# --- Per-Pod breakdowns (works on Simulator & Real) ---
Expand Down Expand Up @@ -246,7 +246,7 @@ spec:
spec:
datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
# if your exporter labels kv metric with pod (the sim does), this works; otherwise it will just return empty
query: (avg by (pod) (vllm:gpu_cache_usage_perc{namespace="$NS",service="$SVC"}) * 100) or vector(0)
query: (avg by (pod) (vllm:kv_cache_usage_perc{namespace="$NS",service="$SVC"}) * 100) or vector(0)
minStep: "15s"

# --- Real vLLM only (zeros on simulator) ---
Expand Down
2 changes: 1 addition & 1 deletion examples/online_serving/prometheus_grafana/grafana.json
Original file line number Diff line number Diff line change
Expand Up @@ -852,7 +852,7 @@
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "vllm:gpu_cache_usage_perc{model_name=\"$model_name\"}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The metric name vllm:gpu_cache_usage_perc is outdated and should be updated to vllm:kv_cache_usage_perc to align with the latest metrics. This discrepancy could lead to incorrect monitoring and alerting.

It's critical to ensure that monitoring dashboards use the correct metric names to provide accurate insights into system performance and resource utilization.

          "expr": "vllm:kv_cache_usage_perc{model_name=\"$model_name\"}"

"expr": "vllm:kv_cache_usage_perc{model_name=\"$model_name\"}",
"instant": false,
"legendFormat": "GPU Cache Usage",
"range": true,
Expand Down