Complete monitoring reference for ARM Mali G610 GPU and Ollama metrics.
**GPU Utilization**
- Range: 0-100%
- Optimal: 70-95% during inference
- Alert: >90%
**GPU Temperature**
- Idle: 45-55°C
- Operating: 55-75°C
- Alert: >80°C
- Critical: >95°C
**GPU Memory Usage**
- Light: 20-40%
- Moderate: 40-70%
- Heavy: 70-85%
- Alert: >85%
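The temperature bands above can be folded into a small shell helper for scripts or cron checks. This is a local sketch using the thresholds from this reference, not part of the monitor itself (readings between 75°C and 80°C are treated as operating, since the bands leave that range unclassified):

```shell
#!/bin/sh
# classify_temp: map a Mali G610 temperature reading (whole °C) to the
# status bands listed above. Helper sketch only -- not part of the monitor.
classify_temp() {
  t=$1
  if   [ "$t" -gt 95 ]; then echo "critical"
  elif [ "$t" -gt 80 ]; then echo "alert"
  elif [ "$t" -ge 55 ]; then echo "operating"
  else                       echo "idle"
  fi
}

# Example: classify the live reading from the monitor's API
# classify_temp "$(curl -s http://localhost:9100/api/metrics | jq -r '.gpu.temperature')"
```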
**API Quick Reference**

```bash
# Health check
curl http://localhost:9100/api/health

# Current metrics
curl http://localhost:9100/api/status | jq .

# GPU metrics only
curl http://localhost:9100/api/metrics | jq '.gpu'

# Ollama models
curl http://localhost:9100/api/ollama/models

# Alerts
curl http://localhost:9100/api/alerts

# History (last 60 minutes)
curl "http://localhost:9100/api/history?minutes=60"

# Prometheus metrics
curl http://localhost:9100/metrics
```

**Live Monitoring**

```bash
# GPU status
watch -n 1 'curl -s http://localhost:9100/api/status | jq .gpu'

# Temperature monitoring
watch -n 2 'curl -s http://localhost:9100/api/metrics | jq .gpu.temperature'

# Full status
./scripts/ollama/status.sh --verbose
```

**High GPU Utilization**
- GPU is the bottleneck
- Optimize model size
- Reduce batch size
**Low GPU Utilization**
- CPU cannot feed the GPU
- Consider model conversion
**High Memory Usage**
- Reduce model size
- Unload unused models
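Unused models can be unloaded without restarting Ollama: the Ollama API accepts `"keep_alive": 0` on a generate request, which asks the server to evict the model immediately. A sketch, assuming the default Ollama endpoint `http://localhost:11434` (not the monitor's port):

```shell
#!/bin/sh
# Build the JSON body that asks Ollama to evict a model ("keep_alive": 0).
unload_payload() {
  printf '{"model":"%s","keep_alive":0}' "$1"
}

# POST it to the Ollama API. The endpoint is Ollama's default; adjust if
# your deployment binds elsewhere.
unload_model() {
  curl -s http://localhost:11434/api/generate -d "$(unload_payload "$1")"
}

# Usage: unload_model llama3
```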
**Adjusting Alert Thresholds**

Edit `services/rtpi-gpu-monitor/configs/monitor.yaml`:

```yaml
alerts:
  gpu_utilization_threshold: 90
  temperature_threshold: 80
  memory_threshold: 85
```

**Troubleshooting**

```bash
# Service not responding
docker compose restart rtpi-gpu-monitor

# Check logs
docker compose logs rtpi-gpu-monitor

# Restart stack
./scripts/ollama/start.sh --force

# Stop stack
./scripts/ollama/stop.sh

# Verify GPU devices
ls -la /dev/mali0 /dev/dri/*

# Check GPU frequency
cat /sys/class/devfreq/fb000000.gpu/cur_freq
```

**Prometheus Scrape Configuration**

```yaml
scrape_configs:
  - job_name: 'ollama-gpu'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 5s
```

For complete details, see the monitoring documentation.