-UCM (Unified Cache Management) provides comprehensive observability features to monitor cache performance and behavior. This document describes two complementary monitoring approaches:
-
-1. **Prometheus Metrics**: Real-time metrics exposed via Prometheus endpoints for live monitoring and visualization
-2. **Operation Logging**: File-based operation logs for offline analysis, debugging, and auditing
-
-Both features can be used independently or together, depending on your monitoring needs.
+UCM (Unified Cache Management) provides detailed metrics monitoring through Prometheus endpoints, allowing in-depth monitoring of cache performance and behavior. This document describes how to enable and configure observability from the embedded vLLM `/metrics` API endpoint.

---

-## Part 1: Prometheus Metrics
-
-Prometheus metrics provide real-time monitoring of UCM operations through the embedded vLLM `/metrics` API endpoint. This approach is ideal for live dashboards, alerting, and performance monitoring.
+## Quick Start Guide

-### Quick Start Guide
-
-#### 1) On UCM Side
+### 1) On UCM Side

First, set the `PROMETHEUS_MULTIPROC_DIR` environment variable.
You will also find some `.db` files in the `$PROMETHEUS_MULTIPROC_DIR` directory, which are temporary files used by Prometheus.
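
The setup step above can be sketched in Python. This is a minimal illustration, not part of UCM: the helper name `prepare_multiproc_dir` and the demo directory are hypothetical, and in practice the variable is usually exported in the shell that launches the service. The sketch creates the directory, removes `.db` files left over from a previous run (the `prometheus_client` multiprocess documentation recommends wiping the directory between runs), and sets the variable for the current process and its children. It should only be run before the service starts, since deleting the files of a running service would discard its metrics.

```python
import os
import tempfile
from pathlib import Path


def prepare_multiproc_dir(path: str) -> list[str]:
    """Create the directory, remove stale .db files, and export the variable.

    Returns the names of the stale files that were removed.
    """
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)

    removed = []
    for db_file in sorted(directory.glob("*.db")):
        db_file.unlink()  # temporary Prometheus files from an earlier run
        removed.append(db_file.name)

    # Only affects this process and the child processes it starts afterwards.
    os.environ["PROMETHEUS_MULTIPROC_DIR"] = str(directory)
    return removed


if __name__ == "__main__":
    # Hypothetical location, chosen only for this demonstration.
    demo_dir = Path(tempfile.mkdtemp()) / "ucm_prometheus"
    print(prepare_multiproc_dir(str(demo_dir)))
```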

-#### 2) Start Prometheus and Grafana with Docker Compose
+### 2) Start Prometheus and Grafana with Docker Compose

-##### Create Docker Compose Configuration Files
+#### Create Docker Compose Configuration Files

First, create the `docker-compose.yaml` file:
@@ -123,7 +114,7 @@ scrape_configs:
**Note**: Make sure the port number in `prometheus.yaml` matches the port number used when starting the vLLM service.

-##### Start Services
+#### Start Services

Run the following command in the directory containing `docker-compose.yaml` and `prometheus.yaml`:
@@ -133,21 +124,21 @@ docker compose up
This will start Prometheus and Grafana services.

-#### 3) Configure Grafana Dashboard
+### 3) Configure Grafana Dashboard

-##### Access Grafana
+#### Access Grafana

Navigate to `http://<your-host>:3000`. Log in with the default username (`admin`) and password (`admin`). You will be prompted to change the password on first login.

-##### Add Prometheus Data Source
+#### Add Prometheus Data Source

1. Navigate to `http://<your-host>:3000/connections/datasources/new` and select **Prometheus**.

2. On the Prometheus configuration page, add the Prometheus server URL in the **Connection** section. For this Docker Compose setup, Grafana and Prometheus run in separate containers, but Docker creates DNS names for each container. You can directly use `http://prometheus:9090`.

3. Click **Save & Test**. You should see a green checkmark showing "Successfully queried the Prometheus API."

-##### Import Dashboard
+#### Import Dashboard

1. Navigate to `http://<your-host>:3000/dashboard/import`.
@@ -159,7 +150,7 @@ Navigate to `http://<your-host>:3000`. Log in with the default username (`admin`
You should now be able to see the UCM monitoring dashboard with real-time visualization of all 9 metrics.

-### Available Metrics
+## Available Metrics

UCM exposes various metrics to monitor its performance. The following table lists all available metrics organized by category:
@@ -178,7 +169,7 @@ UCM exposes various metrics to monitor its performance. The following table list
| **Lookup Hit Rate Metrics** | | |
| `ucm:interval_lookup_hit_rates` | Histogram | Hit rate of UCM lookup requests |
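
To check which of these metrics a running service actually exposes, the `/metrics` output can be filtered by the `ucm:` prefix. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the sample text and its values are made up for the demonstration, `localhost:8000` is only a placeholder for the vLLM host and port, and the parser assumes sample lines carry no trailing timestamp. A histogram such as `ucm:interval_lookup_hit_rates` appears as `_bucket`, `_count`, and `_sum` series in the Prometheus text format.

```python
from urllib.request import urlopen

# Made-up sample in the Prometheus text exposition format.
SAMPLE = """\
# HELP ucm:interval_lookup_hit_rates Hit rate of UCM lookup requests
# TYPE ucm:interval_lookup_hit_rates histogram
ucm:interval_lookup_hit_rates_bucket{le="0.5"} 2.0
ucm:interval_lookup_hit_rates_bucket{le="+Inf"} 5.0
ucm:interval_lookup_hit_rates_count 5.0
ucm:interval_lookup_hit_rates_sum 3.1
vllm:num_requests_running 1.0
"""


def fetch_metrics(url: str = "http://localhost:8000/metrics") -> str:
    """Download the raw metrics text (placeholder host and port)."""
    with urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def ucm_samples(metrics_text: str, prefix: str = "ucm:") -> dict[str, float]:
    """Return {series name with labels: value} for series starting with prefix."""
    samples = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last space-separated field (no timestamp assumed).
        series, _, value = line.rpartition(" ")
        if series.startswith(prefix):
            samples[series] = float(value)
    return samples


if __name__ == "__main__":
    for series, value in ucm_samples(SAMPLE).items():
        print(series, value)
```

Against a live service, `ucm_samples(fetch_metrics())` would apply the same filter to the real endpoint output.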

-### Prometheus Configuration
+## Prometheus Configuration

Metrics configuration is defined in the `ucm/metrics/metrics_configs.yaml` file:
@@ -201,105 +192,4 @@ prometheus:
# ... other metric configurations
```

----
-
-## Part 2: Operation Logging
-
-In addition to Prometheus metrics, UCM provides a file-based operation logging feature that records detailed operation data (load and dump operations) to log files. This feature is useful for offline analysis, debugging, and auditing.
-
-
-### Quick Start Guide
-
-#### 1) Enable Operation Logging
-
-1. Create or modify the metrics configuration file (`ucm/metrics/metrics_configs.yaml`).
-
-2. Start the UCM service. If the configuration has `enabled: True`, operation logging will be automatically enabled.
-
-#### 2) View Log Files
-
-Log files are written to the directory specified by `log_dir` in the configuration file:
The operation logging feature is configured in the `operation_db` section of the metrics configuration file. You can use `ucm/metrics/metrics_configs.yaml` or create a separate configuration file.
Both features can be enabled simultaneously. Prometheus metrics are ideal for real-time monitoring, while operation logs provide detailed historical records for in-depth analysis.