Production-ready monitoring and telemetry stack built from open-source observability tools. It supports both a production deployment behind Traefik and a local development environment.
The platform covers the three pillars of observability, plus visualization:
- Metrics: Prometheus + Node Exporter + cAdvisor
- Logs: Loki + Promtail
- Tracing: Jaeger + OpenTelemetry Collector
- Visualization: Grafana
graph TB
%% Styles
classDef proxy fill:#00bfff,stroke:#0080ff,stroke-width:3px,color:#fff
classDef visualization fill:#ff6b35,stroke:#ff4500,stroke-width:2px,color:#fff
classDef metrics fill:#4caf50,stroke:#2e7d32,stroke-width:2px,color:#fff
classDef logs fill:#ffc107,stroke:#ff9800,stroke-width:2px,color:#000
classDef tracing fill:#9c27b0,stroke:#7b1fa2,stroke-width:2px,color:#fff
classDef collector fill:#00acc1,stroke:#00838f,stroke-width:2px,color:#fff
classDef exporter fill:#78909c,stroke:#546e7a,stroke-width:2px,color:#fff
%% Entry Layer
Traefik[Traefik<br/>Reverse Proxy + SSL<br/>:80, :443]:::proxy
%% Visualization
Grafana[Grafana<br/>Dashboards & Alerts<br/>:3000]:::visualization
%% Storage Layer
Prometheus[Prometheus<br/>Metrics & Alerts<br/>:9090]:::metrics
Loki[Loki<br/>Log Aggregation<br/>:3100]:::logs
Jaeger[Jaeger<br/>Distributed Tracing<br/>:16686]:::tracing
%% Collector
OTEL[OTEL Collector<br/>Unified Telemetry<br/>:4317, :4318]:::collector
%% Exporters
NodeExp[Node Exporter<br/>System Metrics<br/>:9100]:::exporter
cAdvisor[cAdvisor<br/>Container Metrics<br/>:8080]:::exporter
Promtail[Promtail<br/>Log Collection<br/>Agent]:::exporter
%% Data Sources
Apps[Applications<br/>Microservices]
System[System<br/>Host + Docker]
%% Entry Flows
Traefik -->|HTTPS| Grafana
Traefik -->|HTTPS| Prometheus
Traefik -->|HTTPS| Jaeger
Traefik -->|HTTPS| Loki
%% Grafana Connections
Grafana -->|Query| Prometheus
Grafana -->|Query| Loki
Grafana -->|Query| Jaeger
%% Metrics Flow
NodeExp -->|Scrape| Prometheus
cAdvisor -->|Scrape| Prometheus
OTEL -->|Export| Prometheus
%% Logs Flow
Promtail -->|Push| Loki
OTEL -->|Export| Loki
System -->|Read| Promtail
%% Traces Flow
OTEL -->|OTLP| Jaeger
%% Apps to OTEL
Apps -->|OTLP| OTEL
%% System Data
System -->|Metrics| NodeExp
System -->|Metrics| cAdvisor
Grafana (port 3000): unified visualization dashboard with pre-configured datasources (Prometheus, Loki, Jaeger), alerting support, and automatic provisioning.
Prometheus (port 9090): metrics collection and alerting. It scrapes Node Exporter, cAdvisor, and the OTEL Collector. Retention is set to 2 days by default.
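Because retention drives disk usage, a rough size estimate helps when changing the 2-day setting. The sketch below applies the sizing rule of thumb from the Prometheus storage documentation (retention seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample). The function name and the example load are hypothetical.

```python
def estimate_prometheus_disk_bytes(
    retention_days: float,
    samples_per_second: float,
    bytes_per_sample: float = 2.0,
) -> float:
    """Rough TSDB disk estimate using the Prometheus sizing rule of thumb:
    retention_seconds * ingested_samples_per_second * bytes_per_sample."""
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical load: 10,000 active series scraped every 15 s (~667 samples/s)
    samples_per_second = 10_000 / 15
    estimate = estimate_prometheus_disk_bytes(2, samples_per_second)
    print(f"~{estimate / 1024**2:.0f} MiB for 2 days of retention")
```

With this hypothetical load the script prints about 220 MiB. Treat the result as an order-of-magnitude figure, since the rule uses an average bytes-per-sample value.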
Loki (port 3100): log aggregation with filesystem storage, a BoltDB index on schema v11, and 7-day retention.
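To confirm that Loki accepts logs independently of Promtail, a line can be pushed directly to Loki's HTTP push endpoint (POST /loki/api/v1/push). This is a minimal standard-library sketch. The helper names and the job label are hypothetical, and it assumes Loki's port 3100 is reachable from where the script runs.

```python
from __future__ import annotations

import json
import time
import urllib.request


def build_loki_payload(labels: dict[str, str], lines: list[str], ts_ns: int | None = None) -> dict:
    """Build a JSON body for Loki's push API (POST /loki/api/v1/push)."""
    ts = ts_ns if ts_ns is not None else time.time_ns()
    # Each entry is [<unix epoch in nanoseconds, as a string>, <log line>];
    # adding i keeps timestamps distinct and in order within the stream.
    values = [[str(ts + i), line] for i, line in enumerate(lines)]
    return {"streams": [{"stream": labels, "values": values}]}


def push_to_loki(base_url: str, payload: dict) -> int:
    """POST the payload to Loki and return the HTTP status (204 on success)."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/loki/api/v1/push",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

For example, push_to_loki("http://localhost:3100", build_loki_payload({"job": "smoke-test"}, ["hello from python"])) should return 204, after which the query {job="smoke-test"} in Grafana Explore shows the line.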
Jaeger (port 16686): distributed tracing system with native OTLP support and gRPC/HTTP collector endpoints.
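Traces can also be fetched programmatically. The sketch below builds a request for /api/traces, the JSON endpoint that the Jaeger UI itself calls. Jaeger treats this HTTP API as internal and subject to change, so use it as a debugging aid. The start and end parameters are Unix timestamps in microseconds. The helper name and the service name are hypothetical.

```python
from __future__ import annotations

import time
from urllib.parse import urlencode


def jaeger_traces_url(base_url: str, service: str, lookback_s: int = 3600,
                      limit: int = 20, now_s: float | None = None) -> str:
    """Build a URL for Jaeger's UI query endpoint (/api/traces).

    start/end are Unix timestamps in microseconds covering the last lookback_s seconds.
    """
    now = now_s if now_s is not None else time.time()
    end_us = int(now * 1_000_000)
    start_us = end_us - lookback_s * 1_000_000
    query = urlencode({"service": service, "start": start_us, "end": end_us, "limit": limit})
    return f"{base_url.rstrip('/')}/api/traces?{query}"
```

Opening jaeger_traces_url("http://localhost:16686", "checkout") with urllib.request.urlopen and parsing the response with json.load returns the matching traces under the response's data key.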
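OpenTelemetry Collector (ports 4317 for OTLP gRPC, 55681 for OTLP HTTP, 8889 for metrics): unified telemetry collector with pipelines for metrics, traces, and logs. Port 55681 is the legacy OTLP/HTTP port; current Collector releases default to 4318, the port shown in the diagram.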
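Applications without an OpenTelemetry SDK can still send spans through the Collector's OTLP/HTTP receiver by POSTing JSON to /v1/traces. The following is a minimal sketch. It assumes the HTTP receiver accepts JSON on whichever port otel-collector/otel-config.yml configures (55681 or 4318). The service, span, and scope names are hypothetical.

```python
from __future__ import annotations

import json
import os
import time
import urllib.request


def build_otlp_span(service_name: str, span_name: str, duration_ms: int = 50) -> dict:
    """Build a single-span OTLP/JSON trace export request body."""
    end_ns = time.time_ns()
    start_ns = end_ns - duration_ms * 1_000_000
    return {
        "resourceSpans": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeSpans": [{
                "scope": {"name": "manual-example"},  # hypothetical scope name
                "spans": [{
                    # OTLP/JSON encodes trace and span IDs as hex strings, not base64
                    "traceId": os.urandom(16).hex(),
                    "spanId": os.urandom(8).hex(),
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    # 64-bit integers are sent as decimal strings
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }],
    }


def send_otlp_http(base_url: str, body: dict) -> int:
    """POST the body to the Collector's OTLP/HTTP traces path and return the status."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/traces",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

For example, send_otlp_http("http://localhost:4318", build_otlp_span("demo-service", "demo-span")) sends the span to the Collector (adjust the port to match the receiver config). If the traces pipeline exports to Jaeger, the span appears there under demo-service.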
Promtail: log collection agent that reads system logs (/var/log) and Docker container logs and pushes them to Loki.
Node Exporter (port 9100): system metrics exporter for CPU, memory, disk, network, and processes.
cAdvisor (port 8080): container metrics exporter for Docker containers.
Traefik (ports 80 for HTTP, 443 for HTTPS): reverse proxy with automatic TLS certificates from Let's Encrypt and HTTP-to-HTTPS redirection.
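Production deployment (Docker Swarm behind Traefik):
# Create the overlay network and the certificate volume (the node must be in Swarm mode; run docker swarm init first if it is not)
docker network create -d overlay network_public
docker volume create certificates
# Deploy the stack
docker stack deploy -c docker-compose.yml metrics

Access via: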
- Grafana: https://metrics.your-domain.com
- Prometheus: https://metrics.your-domain.com/prometheus
- Jaeger: https://metrics.your-domain.com/jaeger
Local development:
docker-compose -f docker-compose.local.yml up -d

Access via:
- Grafana: http://localhost:3000 (default login admin/admin)
- Prometheus: http://localhost:9090
- Jaeger: http://localhost:16686
- cAdvisor: http://localhost:8080
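After the local stack starts, a quick smoke test is to probe each published endpoint. This sketch uses only the standard library and each component's usual health or readiness path: Grafana /api/health, Prometheus /-/ready, and cAdvisor /healthz. For Jaeger it loads the UI root, because no Jaeger health port is listed for this stack. The helper names are hypothetical.

```python
from __future__ import annotations

import urllib.request

# Ports from the local access list above; paths are each component's usual health/readiness endpoint.
LOCAL_ENDPOINTS = {
    "grafana": "http://localhost:3000/api/health",
    "prometheus": "http://localhost:9090/-/ready",
    "jaeger": "http://localhost:16686/",  # UI root, since no Jaeger health port is listed
    "cadvisor": "http://localhost:8080/healthz",
}


def probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers with a 2xx status, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (non-2xx), refused connections and timeouts all subclass OSError
        return False


def check_all(endpoints: dict[str, str]) -> dict[str, bool]:
    """Probe every endpoint and map component name to reachability."""
    return {name: probe(url) for name, url in endpoints.items()}
```

Running for name, ok in check_all(LOCAL_ENDPOINTS).items(): print(name, "UP" if ok else "DOWN") gives a one-line status per component.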
Grafana configuration:
- Datasources: grafana/provisioning/datasources/datasource.yml
- Dashboards: add JSON files to grafana/provisioning/dashboards/
- Environment variables: see the environment section in the docker-compose files
Edit loki/loki-config.yml to:
- Adjust log retention
- Configure ingestion limits
- Change storage backend
Edit otel-collector/otel-config.yml to:
- Add new receivers
- Configure custom processors
- Adjust exporters
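The Collector often fails at startup when a pipeline names a receiver, processor, or exporter that is not declared in its top-level section. The sketch below checks for that problem in an already-parsed config dict. It is a hypothetical helper, not part of the Collector. Loading the YAML file requires a parser such as PyYAML (yaml.safe_load), which is an assumed extra dependency.

```python
from __future__ import annotations


def undefined_pipeline_components(config: dict) -> list[str]:
    """List pipeline references that have no matching top-level definition.

    Every receiver, processor and exporter named under service.pipelines
    must be declared in the corresponding top-level section.
    """
    problems = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for kind in ("receivers", "processors", "exporters"):
            # An empty YAML section parses as None, so fall back to {}
            declared = config.get(kind) or {}
            for component in pipeline.get(kind, []):
                if component not in declared:
                    problems.append(f"{pipeline_name}: {kind[:-1]} '{component}' is not defined")
    return problems
```

For example, undefined_pipeline_components(yaml.safe_load(open("otel-collector/otel-config.yml"))) returns an empty list when every pipeline reference resolves.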
Project structure:
├── docker-compose.yml # Production stack
├── docker-compose.local.yml # Local development
├── grafana/provisioning/
│ ├── dashboards/
│ └── datasources/
├── prometheus/
│ ├── prometheus.yml
│ ├── rules.yml
│ └── web.yml
├── loki/loki-config.yml
├── otel-collector/otel-config.yml
└── promtail/promtail-config.yml
Prometheus: edit prometheus/prometheus.yml to add scrape targets, and prometheus/rules.yml to add alert rules.
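Every new scrape target must expose metrics over HTTP in the Prometheus text format. The following dependency-free sketch shows a service that renders and serves a counter. The metric and handler names are hypothetical, and in practice a client library such as prometheus_client is the usual choice.

```python
from __future__ import annotations

from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter state, keyed by HTTP method
REQUESTS_TOTAL = {"GET": 0, "POST": 0}


def render_metrics(requests_total: dict[str, int]) -> str:
    """Render counters in the Prometheus text exposition format (version 0.0.4)."""
    lines = [
        "# HELP demo_requests_total Requests handled, by HTTP method.",
        "# TYPE demo_requests_total counter",
    ]
    for method, count in sorted(requests_total.items()):
        lines.append(f'demo_requests_total{{method="{method}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Serve only /metrics, the path Prometheus scrapes by default
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever serving /metrics on the given port."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

A matching job in prometheus/prometheus.yml would then list the service's host and port (for example, a hypothetical my-service:8000) under static_configs targets.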
Checking service status:
# Via Docker Swarm
docker service ls
docker service ps metrics_prometheus
# Via Docker Compose
docker-compose -f docker-compose.local.yml ps
docker-compose -f docker-compose.local.yml logs -f

Then open Prometheus → Status → Targets and verify that every exporter is UP.
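The same check can be scripted against the Prometheus HTTP API. Its GET /api/v1/targets endpoint reports each active target's health and last scrape error. This minimal sketch assumes the local instance at localhost:9090; if prometheus/web.yml enables basic auth, the request also needs credentials. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request


def unhealthy_targets(targets_response: dict) -> list[str]:
    """Return 'job instance: error' strings for active targets whose health is not 'up'."""
    down = []
    for target in targets_response["data"]["activeTargets"]:
        if target.get("health") != "up":
            labels = target.get("labels", {})
            down.append(f"{labels.get('job')} {labels.get('instance')}: {target.get('lastError', '')}")
    return down


def fetch_targets(prometheus_url: str = "http://localhost:9090") -> dict:
    """GET /api/v1/targets (active targets only) from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets?state=active", timeout=5) as resp:
        return json.load(resp)
```

For example, print("\n".join(unhealthy_targets(fetch_targets())) or "all targets up") lists the failing targets, or confirms that none failed.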
Import these dashboards in Grafana (Dashboard → Import):
| ID | Name | Description |
|---|---|---|
| 1860 | Node Exporter Full | Detailed system metrics |
| 893 | Docker and System Monitoring | Container overview |
| 13639 | Loki Dashboard | Log visualization |
| 13407 | Jaeger Dashboard | Trace analysis |
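Example alert rule (prometheus/rules.yml):
groups:
  - name: service-alerts
    interval: 30s
    rules:
      - alert: ServiceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.job }} target {{ $labels.instance }} has been down for 5 minutes"

Pushing a custom metric from Python with prometheus_client. Note that push_to_gateway sends metrics to a Prometheus Pushgateway, not to Prometheus itself. This stack does not include a Pushgateway, so one must be added before this example will work:
from prometheus_client import CollectorRegistry, Counter, push_to_gateway

# Use a dedicated registry so that only these metrics are pushed
registry = CollectorRegistry()
counter = Counter('my_metric', 'Description', registry=registry)
counter.inc()
# 'pushgateway:9091' is a hypothetical Pushgateway address (9091 is its default port)
push_to_gateway('pushgateway:9091', job='my_job', registry=registry)

Sending traces from Node.js to the OpenTelemetry Collector over OTLP/gRPC:
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');

// Point the exporter at the Collector's OTLP gRPC port
const exporter = new OTLPTraceExporter({
  url: 'http://otel-collector:4317',
});

Common commands:
# Production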
docker stack deploy -c docker-compose.yml metrics
docker stack rm metrics
docker service logs -f metrics_grafana
# Development
docker-compose -f docker-compose.local.yml up -d
docker-compose -f docker-compose.local.yml down
docker-compose -f docker-compose.local.yml logs -f
# Maintenance
docker exec -it prometheus promtool tsdb analyze /prometheus