RAG of Fire is an AI-powered Incident Response System that automatically analyzes production incidents and recommends mitigation actions using historical data and machine learning.
When a production system experiences an incident (like a database timeout, memory leak, or API rate limiting), your on-call engineer needs to:
- Understand what's happening (diagnosis)
- Find similar past incidents (search historical knowledge)
- Know what worked before (recommendations)
- Act quickly to minimize downtime
RAG of Fire automates all of this.
- REST API that handles incident analysis
- Connected to a vector database (ChromaDB) for semantic search
- Has 10 pre-loaded historical incidents with real metrics
- Can match new incidents to past ones and recommend solutions
- Dashboard to visualize incidents
- Forms to search historical data
- Displays recommendations and confidence scores
- PostgreSQL: Stores structured incident data
- ChromaDB: Vector database for semantic search (finds similar incidents)
RAG = Retrieval-Augmented Generation
- Retrieval: Find similar past incidents from vector database
- Augmented: Combine with current incident data
- Generation: Use LLM to create personalized recommendations
Example:
Current Incident: Database timeout, 450% spike
↓
Search vector DB for similar incidents
↓
Found: INC-2025-001 (database timeout, 450% spike, fixed by throttling 30%)
↓
LLM generates: "Recommend: Throttle database connections to 30%"
↓
Confidence: 95% (because the history contains an exact match)
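The retrieve-then-recommend flow above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's implementation: the word-count "embedding" stands in for ChromaDB's semantic search, a string template stands in for the LLM, and `HISTORY`, `embed`, `cosine`, and `recommend` are hypothetical names. Only INC-2025-001 and its fix come from the example; the second record is invented for contrast. The score here is a raw similarity, not the 95% confidence shown above.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the historical incident store.
HISTORY = [
    {"id": "INC-2025-001",
     "symptom": "database timeout 450% spike in response time",
     "fix": "Throttle database connections to 30%"},
    {"id": "INC-DEMO-002",  # invented record, for contrast only
     "symptom": "memory leak 280% spike in memory usage",
     "fix": "Throttle traffic to 20%"},
]


def embed(text):
    """Toy embedding: lowercase token counts (a real system would use a model)."""
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recommend(current_symptom):
    """Retrieve the closest past incident and template a recommendation."""
    query = embed(current_symptom)
    scored = [(cosine(query, embed(inc["symptom"])), inc) for inc in HISTORY]
    score, best = max(scored, key=lambda pair: pair[0])
    return {
        "matched_incident": best["id"],
        "recommended_action": f"Recommend: {best['fix']}",
        "similarity": round(score, 2),
    }
```

Calling `recommend("Database timeout, 450% spike")` matches INC-2025-001, because that record shares the most tokens with the query.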
The test_system.py file runs 7 tests simulating real production incidents:
error_type: "database_timeout"
spike_percentage: 450% # Database response time increased 4.5x
Expected: "Throttle connections to 30%"
What it tests: Can the system detect a database timeout and recommend connection throttling?
error_type: "kafka_consumer_lag"
spike_percentage: 320% # Message queue backed up
Expected: "Throttle consumers to 45%"
What it tests: Can the system detect message queue buildup and recommend consumer throttling?
error_type: "memory_leak"
spike_percentage: 280% # Memory usage increased 2.8x
Expected: "Throttle traffic to 20%"
What it tests: Can the system detect memory pressure and recommend traffic reduction?
Reports streaming anomalies in real time
What it tests: Can the system accept live anomaly data?
Search historical RCA documents
Get statistics about stored incidents
What it tests: Can the system retrieve and summarize historical data?
┌─────────────────┐
│    Frontend     │
│    (Next.js)    │
└────────┬────────┘
         │ HTTP requests
         ↓
┌─────────────────────────────────────┐
│          Backend (FastAPI)          │
│  - Decision Engine                  │
│  - Document Search                  │
│  - Streaming Anomalies              │
└────────┬────────────────────────────┘
         │
    ┌────┴─────┬────────────────┐
    ↓          ↓                ↓
┌──────────┬──────────┬─────────────┐
│PostgreSQL│ ChromaDB │ LLM Service │
│(DB Data) │(Vector   │ (OpenAI or  │
│          │Search)   │  Mock)      │
└──────────┴──────────┴─────────────┘
- LLM Service
  - Currently using a mock LLM (deterministic responses)
  - Can integrate a real OpenAI/Anthropic API
  - Location: backend/services/llm_engine.py
- Vector Database
  - Currently using ChromaDB (in-memory, great for demo)
  - Can integrate Pinecone, Weaviate, etc.
  - Location: backend/services/vector_db.py
- Real Database
  - PostgreSQL models already defined
  - Still needs a connection to a real PostgreSQL instance
  - Location: backend/database/db.py
- Streaming Pipeline
  - Currently using asyncio.Queue (mock Kafka/Flink)
  - Can integrate real Apache Kafka
  - Location: backend/services/stream_processor.py
- Frontend Features
  - Basic Next.js UI created
  - Can add real-time dashboards, advanced filtering
  - Location: app/ and components/
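To make the "asyncio.Queue as mock Kafka" idea concrete, here is a minimal producer/consumer sketch. It is an assumption about the general shape of such a pipeline, not the contents of backend/services/stream_processor.py; `producer`, `consumer`, and `run_pipeline` are hypothetical names, and the `None` sentinel is just one simple way to signal the end of the stream.

```python
import asyncio


async def producer(queue, anomalies):
    """Publish anomalies to the queue, then a None sentinel to signal the end."""
    for anomaly in anomalies:
        await queue.put(anomaly)
    await queue.put(None)


async def consumer(queue, handled):
    """Consume anomalies until the sentinel arrives."""
    while True:
        anomaly = await queue.get()
        if anomaly is None:
            break
        # A real consumer would call the decision engine here.
        handled.append(anomaly["error_type"])


async def run_pipeline(anomalies):
    """Run one producer and one consumer over a bounded in-memory queue."""
    queue = asyncio.Queue(maxsize=100)  # bounded, so a slow consumer applies backpressure
    handled = []
    await asyncio.gather(producer(queue, anomalies), consumer(queue, handled))
    return handled
```

Running `asyncio.run(run_pipeline([...]))` with a list of anomaly dicts returns the error types in arrival order. Swapping in real Kafka would replace the queue with a broker client while keeping the same producer/consumer split.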
- ✅ Core logic working
- ✅ All 7 tests passing
- ✅ API endpoints functioning
- ✅ Backend starting successfully
A WebSocket is a persistent connection between browser and server that allows live, two-way communication.
Browser                                 Backend
  |                                        |
  |-------- Connect via WebSocket ---------|
  |                                        |
  |<---- Real-time Decision Updates -------|
  |<---- Anomaly Alerts -------------------|
  |<---- System Events --------------------|
// Frontend (JavaScript)
const ws = new WebSocket('ws://localhost:8000/ws/incidents');

ws.onmessage = (event) => {
  const message = JSON.parse(event.data);
  if (message.type === 'decision') {
    // Update dashboard in real time
    console.log('New Decision:', message.data);
    // Display: "Database Timeout - Recommend: Throttle to 30%"
  }
  if (message.type === 'anomaly') {
    // Show alert in UI (showAlert is your own UI helper)
    showAlert('Anomaly Detected: ' + message.data.error_type);
  }
};

// Send a ping every 30 seconds to keep the connection alive
setInterval(() => {
  // Only send while the socket is open; send() throws before the connection opens
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(JSON.stringify({ type: 'ping' }));
  }
}, 30000);

# Backend generates a decision
decision = {
    "id": "dec_12345",
    "matched_incident": "INC-2025-001",
    "symptom": "Database response time increased 450%",
    "recommended_action": "Throttle connections to 30%",
    "confidence_score": 0.95,
    "latency_ms": 245
}

# Broadcast to ALL connected browsers
await notification_service.ws_manager.broadcast({
    "type": "decision",
    "data": decision
})

Result: every connected browser, whether 1 or 100+, gets the same notification in near real time. ⚡
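The internals of `ws_manager` are not shown above, so the following is only a sketch of how a broadcast manager of this kind is commonly written. `ConnectionManager` and its methods are hypothetical names; the one thing assumed of each connection is an async `send_json(message)` method, which FastAPI/Starlette `WebSocket` objects provide.

```python
import asyncio


class ConnectionManager:
    """Tracks open connections and fans a message out to all of them."""

    def __init__(self):
        self.active = set()

    def connect(self, ws):
        self.active.add(ws)

    def disconnect(self, ws):
        self.active.discard(ws)

    async def broadcast(self, message):
        """Send to every connection concurrently; drop the ones that fail.

        Returns the number of successful deliveries.
        """
        targets = list(self.active)
        results = await asyncio.gather(
            *(ws.send_json(message) for ws in targets),
            return_exceptions=True,  # one dead socket must not block the rest
        )
        delivered = 0
        for ws, result in zip(targets, results):
            if isinstance(result, Exception):
                self.disconnect(ws)
            else:
                delivered += 1
        return delivered
```

Sending concurrently with `asyncio.gather` keeps one slow or closed browser tab from delaying the others, and pruning failed sockets keeps the active set from growing with stale connections.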
Slack is a team messaging platform. Slack integration means the system sends alerts to your Slack channel automatically.
Backend detects incident
↓
Generates decision/recommendation
↓
Sends formatted message to Slack webhook
↓
Slack channel receives alert
↓
Team members see: "🚨 Database Timeout - Throttle to 30%"
Step 1: Go to https://api.slack.com/apps and create an app
- Name: "Incident Response Bot"
Step 2: Enable "Incoming Webhooks"
- Get the webhook URL (looks like):
https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX
Step 3: Add to .env file
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXX
Step 4: Restart backend
uvicorn backend.main:app --reload
When an incident is detected, Slack receives:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🚨 Incident Decision: INC-2025-001
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Symptom:
Database response time increased 450%
Confidence:
95%
Recommended Action:
Throttle connections to 30%
⏱️ 245ms | ID: dec_12345
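The alert layout above carries four pieces of information: symptom, confidence, recommended action, and a latency/ID footer. One way to map all four onto Slack Block Kit blocks is sketched below. `build_slack_payload` is a hypothetical helper, and it assumes the decision is a plain dict with the same keys as the example decision shown earlier (`confidence_score` as a 0 to 1 fraction).

```python
def build_slack_payload(decision):
    """Build an incoming-webhook payload mirroring the alert layout above."""
    confidence_pct = round(decision["confidence_score"] * 100)
    title = f"Incident Decision: {decision['matched_incident']}"
    return {
        # Fallback text for notifications and clients that cannot render blocks
        "text": title,
        "blocks": [
            {"type": "header",
             "text": {"type": "plain_text", "text": f"🚨 {title}"}},
            {"type": "section",
             "fields": [
                 {"type": "mrkdwn",
                  "text": f"*Symptom:*\n{decision['symptom']}"},
                 {"type": "mrkdwn",
                  "text": f"*Confidence:*\n{confidence_pct}%"},
             ]},
            {"type": "section",
             "text": {"type": "mrkdwn",
                      "text": f"*Recommended Action:*\n{decision['recommended_action']}"}},
            {"type": "context",
             "elements": [
                 {"type": "mrkdwn",
                  "text": f"⏱️ {decision['latency_ms']}ms | ID: {decision['id']}"},
             ]},
        ],
    }
```

Keeping payload construction in a pure function like this makes the message format testable without sending anything to Slack.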
# backend/services/notification.py
import aiohttp


class SlackNotifier:
    def __init__(self, webhook_url):
        # Assumed constructor: the snippet only shows self.webhook_url being used
        self.webhook_url = webhook_url

    async def send_decision_alert(self, decision):
        # Build a formatted Block Kit message
        payload = {
            "text": "🚨 Incident Mitigation Decision Generated",
            "blocks": [
                {
                    "type": "header",
                    "text": {
                        "type": "plain_text",
                        "text": f"Incident Decision: {decision.matched_incident}"
                    }
                },
                {
                    "type": "section",
                    "text": {
                        "type": "mrkdwn",
                        "text": f"*Recommended Action:*\n```{decision.recommended_action}```"
                    }
                }
            ]
        }
        # Send an HTTP POST to the Slack webhook and raise on a non-2xx response
        async with aiohttp.ClientSession() as session:
            async with session.post(self.webhook_url, json=payload) as response:
                response.raise_for_status()

Scenario: Production database connection pool exhausted
1. MONITORING SYSTEM
   ├─ Detects: Database response time 450% spike
   └─ Sends to: POST /api/v1/decisions/stream-anomaly
2. BACKEND (Decision Engine)
   ├─ Searches vector DB for similar incidents
   ├─ Finds: INC-2025-001 (exact match!)
   ├─ Generates: "Throttle connections to 30%"
   ├─ Calculates: Confidence 95% (based on exact match)
   └─ Creates Decision object
3. NOTIFICATION SERVICE
   ├─ WebSocket Broadcast:
   │  ├─ Sends to 42 browsers viewing dashboard
   │  └─ Each browser shows alert immediately
   │
   └─ Slack Notification:
      ├─ Sends formatted alert to #incidents channel
      └─ On-call engineer sees alert in Slack
4. TEAM RESPONSE
   ├─ Engineer reads: "Throttle to 30%"
   ├─ Executes: kubectl patch deployment...
   ├─ Database recovers: Response time back to normal
   └─ Incident resolved ✅
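Step 1 of this flow, the monitoring system reporting an anomaly, could look roughly like the sketch below, using only the standard library. The endpoint path and the two payload fields come from this document; the base URL assumes the local uvicorn server, the helper names are hypothetical, and the assumption that the endpoint answers with a JSON body is not confirmed by the source.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: backend started locally with uvicorn


def build_anomaly_request(error_type, spike_percentage, base_url=BASE_URL):
    """Build (but do not send) the POST request that reports an anomaly."""
    body = json.dumps({
        "error_type": error_type,
        "spike_percentage": spike_percentage,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/decisions/stream-anomaly",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_anomaly(request, timeout=5.0):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send_anomaly(build_anomaly_request("database_timeout", 450))` would submit the same anomaly used in the scenario.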
# Step 1: Incoming anomaly
POST /api/v1/decisions/stream-anomaly {
    "error_type": "database_timeout",
    "spike_percentage": 450
}
# Step 2: Decision generated (backend/routes/decisions.py)
decision = await engine.generate_decision(...)
# Step 3: Notifications sent (backend/services/notification.py)
await notification_service.notify_decision(
    decision=decision,
    channels=["websocket", "slack"]  # Send to BOTH
)
# Step 4: Results
- 42 browsers: See live alert (WebSocket)
- Slack #incidents channel: Formatted message with recommendation

Currently you have:
| Component | Status | What It Does |
|---|---|---|
| WebSocket | ✅ Working | Streams live incidents to dashboards |
| Slack | ✅ Ready to connect | When configured, sends alerts to Slack |
| LLM | ✅ Mock (can be real) | Generates recommendations |
| Vector DB | ✅ Working | Stores & searches 10 historical incidents |
| PostgreSQL | ⏳ Optional | Can store permanent incident records |
- Verify backend is running: uvicorn backend.main:app --reload
- Run tests: python test_system.py
- Check API: Visit http://localhost:8000/docs
- Optional - Connect Slack:
  - Get webhook URL from Slack API
  - Add to .env
  - Restart backend
  - Try incident analysis - alert will appear in Slack!
- Develop the frontend or integrate with real services