Features Observability
asekka edited this page Jan 2, 2026
TensorWall provides comprehensive observability for LLM operations.
Every request is traced with:
- Unique request ID
- Timestamps (start, end)
- Latency measurements
- Token counts (input, output)
- Cost estimation
- Decision (ALLOW/WARN/BLOCK)
- Security findings
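As an illustrative sketch only (the field names below are assumptions, not TensorWall's actual schema), a per-request trace carrying these fields might look like:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch of a per-request trace record; field names are
# illustrative, not TensorWall's internal schema.
@dataclass
class RequestTrace:
    request_id: str
    start: datetime
    end: datetime
    input_tokens: int
    output_tokens: int
    cost_usd: float
    decision: str  # "ALLOW" | "WARN" | "BLOCK"
    findings: list = field(default_factory=list)

    @property
    def latency_ms(self) -> float:
        # Latency is derived from the start/end timestamps.
        return (self.end - self.start).total_seconds() * 1000

t0 = datetime(2026, 1, 2, 12, 0, 0, tzinfo=timezone.utc)
t1 = datetime(2026, 1, 2, 12, 0, 0, 450000, tzinfo=timezone.utc)
trace = RequestTrace("req_abc123", t0, t1, 150, 50, 0.003, "ALLOW")
print(trace.latency_ms)  # 450.0
```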
The admin dashboard shows:
- Requests over time
- Cost by application/model
- Latency percentiles
- Error rates
- Security events
Langfuse is an open-source LLM observability platform. To integrate it with TensorWall:
- Create a Langfuse account at https://cloud.langfuse.com
- Create a project and get API keys
- Configure TensorWall:

```bash
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."
docker-compose up -d
```

TensorWall sends the following to Langfuse:
- Traces: Full request lifecycle
- Generations: LLM call details (model, tokens, cost)
- Spans: Security checks, policy evaluation
- Metadata: TensorWall decision, security findings
Example trace:

```text
Trace: tensorwall-gpt-4o
├── Generation: gpt-4o
│   ├── Input: [messages]
│   ├── Output: "response..."
│   ├── Usage: {input: 150, output: 50}
│   └── Cost: $0.003
└── Span: security-check
    └── Findings: []
```
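The hierarchy above can be thought of as nested records. The sketch below mirrors the example tree to show the shape of the data; it is not the exact payload format Langfuse expects:

```python
# Nested-record sketch of the example trace above; this mirrors the tree,
# not the exact Langfuse payload format.
trace = {
    "name": "tensorwall-gpt-4o",
    "generations": [{
        "model": "gpt-4o",
        "input": ["messages"],
        "output": "response...",
        "usage": {"input": 150, "output": 50},
        "cost_usd": 0.003,
    }],
    "spans": [{"name": "security-check", "findings": []}],
}

# Generations carry the token accounting for the whole trace.
total_tokens = sum(
    g["usage"]["input"] + g["usage"]["output"] for g in trace["generations"]
)
print(total_tokens)  # 200
```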
TensorWall exposes Prometheus metrics at /metrics:
```text
# Request latency histogram
tensorwall_request_latency_seconds_bucket{model="gpt-4o",le="0.5"} 100
tensorwall_request_latency_seconds_bucket{model="gpt-4o",le="1.0"} 150

# Request counter
tensorwall_requests_total{model="gpt-4o",status="success"} 1000
tensorwall_requests_total{model="gpt-4o",status="blocked"} 5

# Token usage
tensorwall_tokens_total{model="gpt-4o",type="input"} 150000
tensorwall_tokens_total{model="gpt-4o",type="output"} 50000

# Cost
tensorwall_cost_usd_total{model="gpt-4o",app="my-app"} 45.30

# Security events
tensorwall_security_findings_total{category="prompt_injection"} 12
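These metrics use the standard Prometheus text exposition format, so they can be consumed by any Prometheus-compatible tooling. As a quick sketch (the sample scrape is hypothetical; real values come from `/metrics`), the cumulative histogram buckets can be read directly:

```python
# Minimal reader for Prometheus text-format histogram buckets, as shown
# above. SCRAPE is a hypothetical sample; real data comes from /metrics.
SCRAPE = """\
tensorwall_request_latency_seconds_bucket{model="gpt-4o",le="0.5"} 100
tensorwall_request_latency_seconds_bucket{model="gpt-4o",le="1.0"} 150
tensorwall_requests_total{model="gpt-4o",status="success"} 1000
"""

def bucket_counts(text: str, metric: str) -> dict:
    """Extract {le_label: count} pairs for one histogram metric."""
    counts = {}
    for line in text.splitlines():
        if not line.startswith(metric + "{"):
            continue
        labels, value = line.rsplit(" ", 1)
        le = labels.split('le="')[1].split('"')[0]
        counts[le] = float(value)
    return counts

buckets = bucket_counts(SCRAPE, "tensorwall_request_latency_seconds_bucket")
# Buckets are cumulative: 100 requests finished in <= 0.5 s, 150 in <= 1.0 s.
print(buckets["0.5"] / buckets["1.0"])
```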
Import our pre-built Grafana dashboard:
```bash
# Coming soon
curl -o tensorwall-dashboard.json https://...
```

Enable detailed logging per-request:
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "X-Debug: true" \
  ...
```

The response includes `_tensorwall` metadata:
```json
{
  "choices": [...],
  "_tensorwall": {
    "request_id": "req_abc123",
    "decision": "ALLOW",
    "security": {
      "safe": true,
      "risk_score": 0.0,
      "findings": []
    },
    "policies": {
      "evaluated": 3,
      "matched": 0
    },
    "budget": {
      "current_spend": 45.30,
      "limit": 100.00
    },
    "cost_usd": 0.003,
    "latency_ms": 450
  }
}
```

All requests are logged to the database for audit:
```sql
SELECT
    request_id,
    app_id,
    model,
    decision,
    cost_usd,
    created_at
FROM llm_request_traces
WHERE created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC;
```

Configure retention:
```bash
AUDIT_RETENTION_DAYS=90  # Keep logs for 90 days
```

Configure logging verbosity:
```bash
# In development
LOG_LEVEL=DEBUG

# In production
LOG_LEVEL=INFO
```

Logs include:
- Request received
- Authentication result
- Policy evaluation
- Security check
- Provider selection
- Response sent
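A minimal sketch of how a `LOG_LEVEL` environment variable can be wired into Python's standard `logging` module (an assumption about the wiring, not TensorWall's actual implementation):

```python
import logging
import os

# Sketch: map the LOG_LEVEL environment variable onto Python's logging
# module. TensorWall's actual wiring may differ.
os.environ.setdefault("LOG_LEVEL", "INFO")

level = getattr(logging, os.environ["LOG_LEVEL"].upper(), logging.INFO)
logger = logging.getLogger("tensorwall")
logger.setLevel(level)

logger.debug("policy evaluation details")  # suppressed at INFO
logger.info("request received")            # emitted at INFO and below
print(logging.getLevelName(logger.level))  # INFO
```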