Features Observability
asekka edited this page Jan 2, 2026
TensorWall provides comprehensive observability for LLM operations.
Every request is traced with:
- Unique request ID
- Timestamps (start, end)
- Latency measurements
- Token counts (input, output)
- Cost estimation
- Decision (ALLOW/WARN/BLOCK)
- Security findings
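As an illustrative sketch only (the field names below are assumptions, not TensorWall's actual schema), a per-request trace carrying these fields might look like:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch of a per-request trace record; field names are
# illustrative, not TensorWall's internal schema.
@dataclass
class RequestTrace:
    request_id: str
    start: datetime
    end: datetime
    input_tokens: int
    output_tokens: int
    cost_usd: float
    decision: str  # "ALLOW" | "WARN" | "BLOCK"
    findings: list = field(default_factory=list)

    @property
    def latency_ms(self) -> float:
        # Latency is derived from the start/end timestamps.
        return (self.end - self.start).total_seconds() * 1000

t0 = datetime(2026, 1, 2, 12, 0, 0, tzinfo=timezone.utc)
t1 = datetime(2026, 1, 2, 12, 0, 0, 450000, tzinfo=timezone.utc)
trace = RequestTrace("req_abc123", t0, t1, 150, 50, 0.003, "ALLOW")
print(trace.latency_ms)  # 450.0
```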
The admin dashboard shows:
- Requests over time
- Cost by application/model
- Latency percentiles
- Error rates
- Security events
Langfuse is an open-source LLM observability platform. To integrate it with TensorWall:
- Create a Langfuse account at https://cloud.langfuse.com
- Create a project and get API keys
- Configure TensorWall:

```bash
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."
docker-compose up -d
```

TensorWall sends the following to Langfuse:
- Traces: Full request lifecycle
- Generations: LLM call details (model, tokens, cost)
- Spans: Security checks, policy evaluation
- Metadata: TensorWall decision, security findings
Example trace:

```text
Trace: tensorwall-gpt-4o
├── Generation: gpt-4o
│   ├── Input: [messages]
│   ├── Output: "response..."
│   ├── Usage: {input: 150, output: 50}
│   └── Cost: $0.003
└── Span: security-check
    └── Findings: []
```
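The hierarchy above can be thought of as nested records. The sketch below mirrors the example tree to show the shape of the data; it is not the exact payload format Langfuse expects:

```python
# Nested-record sketch of the example trace above; this mirrors the tree,
# not the exact Langfuse payload format.
trace = {
    "name": "tensorwall-gpt-4o",
    "generations": [{
        "model": "gpt-4o",
        "input": ["messages"],
        "output": "response...",
        "usage": {"input": 150, "output": 50},
        "cost_usd": 0.003,
    }],
    "spans": [{"name": "security-check", "findings": []}],
}

# Generations carry the token accounting for the whole trace.
total_tokens = sum(
    g["usage"]["input"] + g["usage"]["output"] for g in trace["generations"]
)
print(total_tokens)  # 200
```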
TensorWall exposes Prometheus metrics at /metrics:
```text
# Request latency histogram
tensorwall_request_latency_seconds_bucket{model="gpt-4o",le="0.5"} 100
tensorwall_request_latency_seconds_bucket{model="gpt-4o",le="1.0"} 150

# Request counter
tensorwall_requests_total{model="gpt-4o",status="success"} 1000
tensorwall_requests_total{model="gpt-4o",status="blocked"} 5

# Token usage
tensorwall_tokens_total{model="gpt-4o",type="input"} 150000
tensorwall_tokens_total{model="gpt-4o",type="output"} 50000

# Cost
tensorwall_cost_usd_total{model="gpt-4o",app="my-app"} 45.30

# Security events
tensorwall_security_findings_total{category="prompt_injection"} 12
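These metrics use the standard Prometheus text exposition format, so they can be consumed by any Prometheus-compatible tooling. As a quick sketch (the sample scrape is hypothetical; real values come from `/metrics`), the cumulative histogram buckets can be read directly:

```python
# Minimal reader for Prometheus text-format histogram buckets, as shown
# above. SCRAPE is a hypothetical sample; real data comes from /metrics.
SCRAPE = """\
tensorwall_request_latency_seconds_bucket{model="gpt-4o",le="0.5"} 100
tensorwall_request_latency_seconds_bucket{model="gpt-4o",le="1.0"} 150
tensorwall_requests_total{model="gpt-4o",status="success"} 1000
"""

def bucket_counts(text: str, metric: str) -> dict:
    """Extract {le_label: count} pairs for one histogram metric."""
    counts = {}
    for line in text.splitlines():
        if not line.startswith(metric + "{"):
            continue
        labels, value = line.rsplit(" ", 1)
        le = labels.split('le="')[1].split('"')[0]
        counts[le] = float(value)
    return counts

buckets = bucket_counts(SCRAPE, "tensorwall_request_latency_seconds_bucket")
# Buckets are cumulative: 100 requests finished in <= 0.5 s, 150 in <= 1.0 s.
print(buckets["0.5"] / buckets["1.0"])
```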
Import our pre-built Grafana dashboard:
```bash
# Coming soon
curl -o tensorwall-dashboard.json https://...
```

Enable detailed logging per-request:
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "X-Debug: true" \
  ...
```

The response includes `_tensorwall` metadata:
```json
{
  "choices": [...],
  "_tensorwall": {
    "request_id": "req_abc123",
    "decision": "ALLOW",
    "security": {
      "safe": true,
      "risk_score": 0.0,
      "findings": []
    },
    "policies": {
      "evaluated": 3,
      "matched": 0
    },
    "budget": {
      "current_spend": 45.30,
      "limit": 100.00
    },
    "cost_usd": 0.003,
    "latency_ms": 450
  }
}
```

All requests are logged to the database for audit:
```sql
SELECT
    request_id,
    app_id,
    model,
    decision,
    cost_usd,
    created_at
FROM llm_request_traces
WHERE created_at > NOW() - INTERVAL '7 days'
ORDER BY created_at DESC;
```

Configure retention:
```bash
AUDIT_RETENTION_DAYS=90  # Keep logs for 90 days
```

Configure logging verbosity:
```bash
# In development
LOG_LEVEL=DEBUG

# In production
LOG_LEVEL=INFO
```

Logs include:
- Request received
- Authentication result
- Policy evaluation
- Security check
- Provider selection
- Response sent
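A minimal sketch of how a `LOG_LEVEL` environment variable can be wired into Python's standard `logging` module (an assumption about the wiring, not TensorWall's actual implementation):

```python
import logging
import os

# Sketch: map the LOG_LEVEL environment variable onto Python's logging
# module. TensorWall's actual wiring may differ.
os.environ.setdefault("LOG_LEVEL", "INFO")

level = getattr(logging, os.environ["LOG_LEVEL"].upper(), logging.INFO)
logger = logging.getLogger("tensorwall")
logger.setLevel(level)

logger.debug("policy evaluation details")  # suppressed at INFO
logger.info("request received")            # emitted at INFO and below
print(logging.getLevelName(logger.level))  # INFO
```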