Commit c93b1dd

Author: t00939662
Commit message: delete offline metrics part
1 parent cd511b3

1 file changed: 14 additions, 124 deletions
````diff
@@ -1,21 +1,12 @@
-# UCM Observability
+# Observability with Prometheus
 
-UCM (Unified Cache Management) provides comprehensive observability features to monitor cache performance and behavior. This document describes two complementary monitoring approaches:
-
-1. **Prometheus Metrics**: Real-time metrics exposed via Prometheus endpoints for live monitoring and visualization
-2. **Operation Logging**: File-based operation logs for offline analysis, debugging, and auditing
-
-Both features can be used independently or together, depending on your monitoring needs.
+UCM (Unified Cache Management) provides detailed metrics monitoring through Prometheus endpoints, allowing in-depth monitoring of cache performance and behavior. This document describes how to enable and configure observability from the embedded vLLM `/metrics` API endpoint.
 
 ---
 
-## Part 1: Prometheus Metrics
-
-Prometheus metrics provide real-time monitoring of UCM operations through the embedded vLLM `/metrics` API endpoint. This approach is ideal for live dashboards, alerting, and performance monitoring.
+## Quick Start Guide
 
-### Quick Start Guide
-
-#### 1) On UCM Side
+### 1) On UCM Side
 
 First, set the `PROMETHEUS_MULTIPROC_DIR` environment variable.
 
````
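The environment-variable step kept by this hunk can be sketched as shell commands; the directory path and the commented launch invocation are assumptions for illustration, not taken from this commit:

```shell
# Scratch directory for prometheus_client's multiprocess mode (assumed path;
# any directory writable by the vLLM workers will do).
export PROMETHEUS_MULTIPROC_DIR=/tmp/prometheus_multiproc

# The directory must exist and should be empty before the server starts,
# so stale per-process .db files don't pollute the aggregated metrics.
rm -rf "$PROMETHEUS_MULTIPROC_DIR"
mkdir -p "$PROMETHEUS_MULTIPROC_DIR"

# Then start the vLLM service as usual, e.g. (hypothetical invocation):
# vllm serve <model> --port 8000
```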
````diff
@@ -78,9 +69,9 @@ curl http://$<vllm-worker-ip>:8000/metrics | grep ucm:
 
 You will also find some `.db` files in the `$PROMETHEUS_MULTIPROC_DIR` directory, which are temporary files used by Prometheus.
 
-#### 2) Start Prometheus and Grafana with Docker Compose
+### 2) Start Prometheus and Grafana with Docker Compose
 
-##### Create Docker Compose Configuration Files
+#### Create Docker Compose Configuration Files
 
 First, create the `docker-compose.yaml` file:
 
````
````diff
@@ -123,7 +114,7 @@ scrape_configs:
 
 **Note**: Make sure the port number in `prometheus.yaml` matches the port number used when starting the vLLM service.
 
-##### Start Services
+#### Start Services
 
 Run the following command in the directory containing `docker-compose.yaml` and `prometheus.yaml`:
 
````
````diff
@@ -133,21 +124,21 @@ docker compose up
 
 This will start Prometheus and Grafana services.
 
-#### 3) Configure Grafana Dashboard
+### 3) Configure Grafana Dashboard
 
-##### Access Grafana
+#### Access Grafana
 
 Navigate to `http://<your-host>:3000`. Log in with the default username (`admin`) and password (`admin`). You will be prompted to change the password on first login.
 
-##### Add Prometheus Data Source
+#### Add Prometheus Data Source
 
 1. Navigate to `http://<your-host>:3000/connections/datasources/new` and select **Prometheus**.
 
 2. On the Prometheus configuration page, add the Prometheus server URL in the **Connection** section. For this Docker Compose setup, Grafana and Prometheus run in separate containers, but Docker creates DNS names for each container. You can directly use `http://prometheus:9090`.
 
 3. Click **Save & Test**. You should see a green checkmark showing "Successfully queried the Prometheus API."
 
-##### Import Dashboard
+#### Import Dashboard
 
 1. Navigate to `http://<your-host>:3000/dashboard/import`.
 
````
````diff
@@ -159,7 +150,7 @@ Navigate to `http://<your-host>:3000`. Log in with the default username (`admin`
 
 You should now be able to see the UCM monitoring dashboard with real-time visualization of all 9 metrics.
 
-### Available Metrics
+## Available Metrics
 
 UCM exposes various metrics to monitor its performance. The following table lists all available metrics organized by category:
 
````
````diff
@@ -178,7 +169,7 @@ UCM exposes various metrics to monitor its performance. The following table list
 | **Lookup Hit Rate Metrics** | | |
 | `ucm:interval_lookup_hit_rates` | Histogram | Hit rate of UCM lookup requests |
 
-### Prometheus Configuration
+## Prometheus Configuration
 
 Metrics configuration is defined in the `ucm/metrics/metrics_configs.yaml` file:
 
````
````diff
@@ -201,105 +192,4 @@ prometheus:
 # ... other metric configurations
 ```
 
----
-
-## Part 2: Operation Logging
-
-In addition to Prometheus metrics, UCM provides a file-based operation logging feature that records detailed operation data (load and dump operations) to log files. This feature is useful for offline analysis, debugging, and auditing.
-
-
-### Quick Start Guide
-
-#### 1) Enable Operation Logging
-
-1. Create or modify the metrics configuration file (`ucm/metrics/metrics_configs.yaml`).
-
-2. Start the UCM service. If the configuration has `enabled: True`, operation logging will be automatically enabled.
-
-#### 2) View Log Files
-
-Log files are written to the directory specified by `log_dir` in the configuration file:
-
-```bash
-# List log files
-ls -lh /vllm-workspace/ucm_logs/
-
-# View active log file
-tail -f /vllm-workspace/ucm_logs/ucm_operation.log
-
-# View compressed log file
-zcat /vllm-workspace/ucm_logs/ucm_operation.log.gz | head -20
-```
-
-#### 3) Analyze Log Data
-
-Since log files are in JSON Lines format (one JSON object per line), you can easily analyze them:
-
-```bash
-# Count load operations
-grep '"op_type":"load"' /vllm-workspace/ucm_logs/ucm_operation.log | wc -l
-
-# Count dump operations
-grep '"op_type":"dump"' /vllm-workspace/ucm_logs/ucm_operation.log | wc -l
-
-# Extract all block IDs from load operations
-grep '"op_type":"load"' /vllm-workspace/ucm_logs/ucm_operation.log | jq -r '.blocks[]'
-
-# Count unique blocks
-grep '"op_type":"load"' /vllm-workspace/ucm_logs/ucm_operation.log | jq -r '.blocks[]' | sort -u | wc -l
-```
````
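The removed `grep`/`jq` commands above amount to counting and extracting fields from JSON Lines records. The same analysis can be sketched in standalone Python; the log path and the `op_type`/`blocks` fields come from the removed docs, while the helper itself is hypothetical:

```python
import json
from collections import Counter

def summarize_ops(lines):
    """Count operations per op_type and collect the unique block IDs
    seen in load operations, given UCM operation-log lines
    (one JSON object per line)."""
    counts = Counter()
    load_blocks = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        counts[record["op_type"]] += 1
        if record["op_type"] == "load":
            load_blocks.update(record.get("blocks", []))
    return counts, load_blocks

# Example with the log file from the removed docs:
# with open("/vllm-workspace/ucm_logs/ucm_operation.log") as f:
#     counts, blocks = summarize_ops(f)
```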
````diff
-
-### Configuration Parameters
-
-The operation logging feature is configured in the `operation_db` section of the metrics configuration file. You can use `ucm/metrics/metrics_configs.yaml` or create a separate configuration file.
-
-| Parameter | Default Value | Description |
-|-----------|---------------|-------------|
-| `enabled` | False | Enable/disable operation logging |
-| `log_dir` | `/vllm-workspace/ucm_logs` | Directory where log files are stored |
-| `log_name` | `ucm_operation` | Base name for log files |
-| `max_file_size` | 104857600 | Maximum size of a single log file in bytes (100MB). When exceeded, file rotation occurs |
-| `batch_size` | 100 | Number of log entries to batch before writing to disk |
-| `flush_interval` | 5.0 | Time interval (seconds) to force flush buffered logs |
-| `encoding` | `utf-8` | File encoding |
-| `compress_rotated` | True | Whether to compress rotated log files using gzip |
-| `compress_level` | 6 | Gzip compression level (1-9, where 1 is fastest and 9 is smallest) |
-| `max_log_files` | 30 | Maximum number of log files to retain (including compressed files) |
-| `max_log_days` | 7 | Maximum number of days to retain logs |
-| `max_log_total_size` | 1073741824 | Maximum total size of all log files in bytes (1GB) |
-
-### Example Configuration
-
-Add the following section to your `ucm/metrics/metrics_configs.yaml` file:
-
-```yaml
-operation_db:
-  enabled: True
-  log_dir: "/vllm-workspace/ucm_logs"
-  log_name: "ucm_operation"
-  max_file_size: 104857600
-  batch_size: 100
-  flush_interval: 5.0
-  encoding: "utf-8"
-  compress_rotated: True
-  compress_level: 6
-  max_log_files: 30
-  max_log_days: 7
-  max_log_total_size: 1073741824
-```
-
----
-
-## Comparison: Prometheus Metrics vs. Operation Logging
-
-| Feature | Prometheus Metrics | Operation Logging |
-|---------|-------------------|-------------------|
-| **Purpose** | Real-time monitoring and alerting | Offline analysis and debugging |
-| **Data Format** | Time-series metrics | JSON Lines (detailed operation records) |
-| **Storage** | Prometheus time-series database | File system (compressed logs) |
-| **Retention** | Configurable in Prometheus | Configurable (file count, days, total size) |
-| **Query Interface** | PromQL queries, Grafana dashboards | File-based analysis (grep, jq, etc.) |
-| **Performance Impact** | Minimal (async metrics collection) | Minimal (async file writes) |
-| **Use Cases** | Live dashboards, alerting, performance monitoring | Debugging, audit trails, detailed analysis |
-
-Both features can be enabled simultaneously. Prometheus metrics are ideal for real-time monitoring, while operation logs provide detailed historical records for in-depth analysis.
+---
````
