
Conversation

@ctrueblood-epri commented Oct 22, 2025

Purpose

Update grafana.json to rename the deprecated metric "vllm:gpu_cache_usage_perc" to "vllm:kv_cache_usage_perc". This change is required for the Cache Utilization panel to display values on the Grafana dashboard as of vLLM 0.11.0.

For reference, the GPU cache usage metric was renamed and deprecated here: vllm/v1/metrics/loggers.py#L415-L438
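
For illustration, a minimal sketch of what the affected panel target in grafana.json could look like after the rename; the field layout and the label selector follow a typical Grafana dashboard JSON model and are not copied verbatim from the actual file:

```json
{
  "title": "Cache Utilization",
  "targets": [
    {
      "expr": "vllm:kv_cache_usage_perc{model_name=\"$model_name\"}",
      "legendFormat": "KV Cache Usage"
    }
  ]
}
```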

Test Plan

  1. Install and run vLLM version 0.11.0
  2. Configure Prometheus and Grafana as instructed here: https://docs.vllm.ai/en/latest/examples/online_serving/prometheus_grafana.html
  3. In Grafana, edit the dashboard's JSON model, renaming "vllm:gpu_cache_usage_perc" to "vllm:kv_cache_usage_perc" and "GPU Cache Usage" to "KV Cache Usage" (a sketch of the entry to look for follows this list)
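
For step 3, a sketch of the kind of entry to look for in the JSON model before the edit; whether the old name appears in a legendFormat or a panel title depends on the dashboard layout, and the label selector is illustrative:

```json
{
  "expr": "vllm:gpu_cache_usage_perc{model_name=\"$model_name\"}",
  "legendFormat": "GPU Cache Usage"
}
```

Replacing those two strings with "vllm:kv_cache_usage_perc" and "KV Cache Usage", then saving the dashboard, is the whole edit.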

Test Result

Success. Cache Utilization now displays values on the Grafana dashboard.

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀


mergify bot commented Oct 22, 2025

Documentation preview: https://vllm--27341.org.readthedocs.build/en/27341/

mergify bot added the documentation label Oct 22, 2025
Contributor

gemini-code-assist bot left a comment

Code Review

The pull request renames the deprecated metric "vllm:gpu_cache_usage_perc" to "vllm:kv_cache_usage_perc" in grafana.json. This change ensures that Cache Utilization displays values correctly on a Grafana dashboard for vLLM version 0.11.0 and later. I have provided a review comment to address a critical issue.

Member

markmc commented Oct 22, 2025

Thanks, #27133 has the same fix and is ready to merge

Member

markmc commented Oct 23, 2025

Fixed by #27133, thanks again

markmc closed this Oct 23, 2025