An intelligent debugging assistant that analyzes error logs and provides AI-generated solutions using RAG (Retrieval-Augmented Generation) with a Stack Overflow knowledge base.
- Frontend: https://debugai.vercel.app/
- Backend API: https://debugai-production.up.railway.app/
- API Docs: https://debugai-production.up.railway.app/docs
```mermaid
graph TB
subgraph Frontend["Frontend (Next.js)"]
UI[Web UI]
API_CLIENT[API Service]
end
subgraph Backend["Backend (FastAPI)"]
ROUTER[API Router]
PARSER[Error Parser]
VECTOR[Vector Store]
LLM[LLM Analyzer]
CACHE[Redis Cache]
COST[Cost Tracker]
end
subgraph Database["Database (Supabase)"]
PG[(PostgreSQL)]
PGVECTOR[(pgvector)]
end
subgraph External["External Services"]
OPENAI[GitHub Models API]
SO[Stack Overflow Data]
end
UI --> API_CLIENT
API_CLIENT --> ROUTER
ROUTER --> CACHE
CACHE --> PARSER
PARSER --> VECTOR
VECTOR --> PGVECTOR
VECTOR --> LLM
LLM --> OPENAI
LLM --> COST
COST --> PG
PARSER --> PG
SO --> PG
```
```mermaid
sequenceDiagram
participant User
participant Frontend
participant Cache
participant Parser
participant VectorStore
participant LLM
participant Database
User->>Frontend: Paste Error Log
Frontend->>Cache: Check Cache
alt Cache Hit
Cache-->>Frontend: Return Cached Analysis
else Cache Miss
Cache->>Parser: Parse Error
Parser->>Database: Store Parsed Error
Parser->>VectorStore: Search Similar Errors
VectorStore->>Database: Query pgvector
Database-->>VectorStore: Return Top K Results
VectorStore->>LLM: Analyze with Context
LLM->>Database: Track API Cost
LLM-->>Parser: Return Analysis
Parser->>Database: Store Analysis
Parser->>Cache: Cache Result (24h TTL)
Cache-->>Frontend: Return Analysis
end
Frontend-->>User: Display Solutions
```
- Backend: Python FastAPI
- Frontend: Next.js 14 (TypeScript, React)
- Database: Supabase (PostgreSQL with pgvector)
- Vector Store: Supabase pgvector for embeddings
- Caching: Redis (24h TTL for analyses)
- AI/LLM: GitHub Models (Azure OpenAI)
  - GPT-4o-mini for error analysis
  - text-embedding-3-small for vector embeddings
- Deployment: Railway (Backend), Vercel (Frontend)
- Intelligent Error Parsing: Automatically extracts the error type, message, stack trace, and context from logs (see the parser sketch after this list)
- RAG-Powered Analysis: Searches the Stack Overflow knowledge base using semantic similarity
- AI-Generated Solutions: GPT-4o-mini provides ranked solutions with code examples and confidence scores
- Multi-Language Support: Currently supports Python, JavaScript, TypeScript, React, Node.js, Django, FastAPI
- Batch Scraping: Automated Stack Overflow scraping across multiple tags
- Vector Search: Fast semantic search using Supabase pgvector
- Persistent Storage: All errors and analyses stored in Supabase PostgreSQL
- Redis Caching: Two-level cache for analyses and search results (24h TTL)
- Cost Tracking: Real-time API cost monitoring with daily/operation breakdown
- Analytics Dashboard: Comprehensive metrics including success rates, language breakdown, cache performance
- Feedback System: User feedback collection to improve solution quality
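To make the parsing feature concrete, here is a minimal sketch of the regex-based approach, assuming a Python-style traceback; the project's actual parser (`app/services/parser.py`) covers more languages and formats:

```python
import re

# Minimal illustration of regex-based error parsing for a Python traceback.
# The real parser handles more languages, frameworks, and log formats.
FRAME_RE = re.compile(r'File "(?P<file>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')
ERROR_RE = re.compile(r"^(?P<type>\w+(?:Error|Exception)):\s*(?P<message>.+)$", re.MULTILINE)

def parse_python_traceback(raw_log: str) -> dict:
    frames = [m.groupdict() for m in FRAME_RE.finditer(raw_log)]
    error = ERROR_RE.search(raw_log)
    last = frames[-1] if frames else {}
    return {
        "language": "python",
        "error_type": error.group("type") if error else None,
        "error_message": error.group("message") if error else None,
        "file_name": last.get("file"),
        "line_number": int(last["line"]) if last.get("line") else None,
        "function_name": last.get("func"),
        "stack_trace": frames,
    }
```

Because this stage is pure regex work, it stays in the 10-50ms range shown in the performance table below.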
```
debugAi/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI app & endpoints
│ │ ├── api/
│ │ │ ├── analyze.py # Analysis endpoints
│ │ │ ├── analytics.py # Analytics & metrics endpoints
│ │ │ ├── feedback.py # Feedback endpoints
│ │ │ ├── cost_routes.py # Cost tracking endpoints
│ │ │ ├── embeddings_routes.py # Embeddings endpoints
│ │ │ └── scrape_routes.py # Scraping endpoints
│ │ ├── core/ # Core configuration
│ │ ├── db/
│ │ │ ├── models/ # SQLAlchemy models
│ │ │ │ ├── error.py # ParsedError, Analysis models
│ │ │ │ ├── feedback.py # Feedback model
│ │ │ │ └── cost.py # CostTracking model
│ │ │ ├── crud/ # Database operations
│ │ │ │ ├── error_crud.py # Error CRUD operations
│ │ │ │ ├── feedback_crud.py # Feedback CRUD operations
│ │ │ │ ├── cost_crud.py # Cost tracking CRUD
│ │ │ │ └── analytics_crud.py # Analytics CRUD operations
│ │ │ └── session.py # DB session management
│ │ ├── services/
│ │ │ ├── parser.py # Error log parsing
│ │ │ ├── supabase_vector_store.py # Vector store operations
│ │ │ ├── llm_analyzer.py # LLM error analysis
│ │ │ ├── cache.py # Redis caching service
│ │ │ └── cost_tracker.py # API cost tracking
│ │ ├── schemas/ # Pydantic models
│ │ ├── scripts/
│ │ │ ├── scrape_stackoverflow.py # SO scraper
│ │ │ ├── batch_scrape.py # Batch scraping
│ │ │ └── create_embeddings.py # Generate embeddings
│ │ └── utils/ # Utility functions
│ ├── Dockerfile
│ ├── requirements.txt
│ └── .env.example
├── frontend/
│ ├── app/
│ │ ├── page.tsx # Main UI
│ │ ├── analytics/page.tsx # Analytics dashboard
│ │ └── layout.tsx
│ ├── components/
│ │ ├── InputSection.tsx # Error input
│ │ └── ResultsSection.tsx # Results display
│ ├── services/
│ │ └── api.ts # API client
│ ├── Dockerfile
│ └── .env.local.example
├── docker-compose.yml
├── APPLICATION_FLOW.md # Detailed flow documentation
└── README.md
```
- Docker installed on your machine
- Docker Compose installed
- GitHub Personal Access Token (for GitHub Models API)
- Supabase project with pgvector enabled
- Stack Exchange API key (optional, for scraping)
1. Clone and navigate to the project

   ```bash
   git clone <repository-url>
   cd debugAi
   ```

2. Create environment files

   ```bash
   cp backend/.env.example backend/.env
   cp frontend/.env.local.example frontend/.env.local
   ```

3. Configure Backend Environment (`backend/.env`)

   ```env
   # Database (Supabase PostgreSQL)
   DATABASE_URL=postgresql+asyncpg://user:pass@host:port/dbname

   # GitHub Models API (for LLM and embeddings)
   GITHUB_TOKEN=github_pat_xxxxxxxxxxxxx

   # Redis (optional - for caching)
   REDIS_URL=redis://localhost:6379

   # CORS (comma-separated origins)
   ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com

   # Stack Exchange API (optional)
   STACKEXCHANGE_API_KEY=your_key_here
   ```
4. Configure Frontend Environment (`frontend/.env.local`)

   ```env
   NEXT_PUBLIC_API_URL=http://localhost:8000
   ```
5. Start all services with Docker

   ```bash
   docker-compose up --build
   ```

   This will start:
   - FastAPI backend on port `8000`
   - Next.js frontend on port `3000`

6. Access the application
   - Frontend: http://localhost:3000
   - Backend API: http://localhost:8000
   - API Documentation: http://localhost:8000/docs
7. Initialize Database Tables

   The database tables are created automatically on first startup via SQLAlchemy.

8. Scrape Stack Overflow Posts (optional but recommended)

   ```bash
   # Scrape posts for multiple tags at once
   curl -X POST http://localhost:8000/api/scrape/batch

   # Or scrape a specific tag
   curl -X POST http://localhost:8000/api/scrape \
     -H "Content-Type: application/json" \
     -d '{"tag": "python", "limit": 500}'
   ```

9. Create Embeddings (see the sketch below for what this step does under the hood)

   ```bash
   curl -X POST http://localhost:8000/api/embeddings/create
   ```
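Under the hood, step 9 reduces to embedding each scraped post and storing the vector. A minimal sketch using the OpenAI Python SDK pointed at GitHub Models; the base URL is an assumption to verify against your GitHub Models setup, and the real logic lives in `app/scripts/create_embeddings.py`:

```python
import os
from openai import OpenAI

# GitHub Models exposes OpenAI-compatible endpoints, authenticated with a PAT.
# (Endpoint URL is an assumption -- check the GitHub Models documentation.)
client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"].strip(),
)

def embed(texts: list[str]) -> list[list[float]]:
    """Embed a batch of Stack Overflow posts with text-embedding-3-small."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in response.data]

# Usage: vectors = embed([post.title + "\n" + post.question_body for post in posts])
# Each vector is then written to the embeddings table (pgvector column).
```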
Live Application:
- Frontend: https://debugai.vercel.app/
- Backend API: https://debugai-production.up.railway.app/
- API Documentation: https://debugai-production.up.railway.app/docs
Deployment Platforms:
- Backend: Railway
- Frontend: Vercel
Deployment Checklist:
- Set all environment variables in Railway/Vercel dashboards
- Ensure `GITHUB_TOKEN` has no trailing whitespace/newlines
- Configure CORS origins to include your production domain: `ALLOWED_ORIGINS=https://debugai.vercel.app`
- Use the Supabase connection pooler for better performance
- Verify the pgvector extension is enabled in Supabase
Common Docker commands:

```bash
docker-compose up -d                 # Start all services in the background
docker-compose down                  # Stop all services
docker-compose down -v               # Stop services and remove volumes
docker-compose logs -f               # Follow logs for all services
docker-compose logs -f backend       # Follow backend logs
docker-compose logs -f frontend      # Follow frontend logs
docker-compose logs -f db            # Follow database logs
docker-compose up --build            # Rebuild images and start
docker exec -it debugai-backend sh   # Shell into the backend container
docker exec -it debugai-frontend sh  # Shell into the frontend container
docker exec -it debugai-db psql -U debugai -d debugai_db  # psql into the database
```

API endpoint overview:

```mermaid
graph LR
subgraph Analysis
A1[POST /api/analyze]
A2[POST /api/feedback]
end
subgraph Analytics
B1[GET /api/analytics/overview]
B2[GET /api/analytics/language-breakdown]
B3[GET /api/analytics/feedback-stats]
B4[GET /api/analytics/cache-stats]
B5[GET /api/analytics/costs]
end
subgraph Data
C1[POST /api/scrape]
C2[POST /api/scrape/batch]
C3[POST /api/embeddings/create]
end
```
- `POST /api/analyze` - Analyze an error log and get AI-generated solutions (see the example call after this section)

  ```json
  { "query": "Traceback (most recent call last):\n File \"app.py\"...", "limit": 5 }
  ```

- `POST /api/feedback` - Submit feedback on a solution

  ```json
  { "analysis_id": 1, "solution_index": 0, "worked": true, "notes": "Fixed my issue!" }
  ```

- `POST /api/scrape` - Scrape Stack Overflow posts for a specific tag

  ```json
  { "tag": "python", "limit": 500 }
  ```

- `POST /api/scrape/batch` - Batch scrape multiple tags:
  - Python: 500 posts
  - JavaScript: 500 posts
  - React: 300 posts
  - TypeScript: 300 posts
  - Node.js: 200 posts
  - Django: 150 posts
  - FastAPI: 100 posts

- `POST /api/embeddings/create` - Generate embeddings for scraped posts

- `GET /health` - Health check endpoint
- `GET /api/analytics/overview` - System-wide analytics (total analyses, errors, average analysis time, success rate)
- `GET /api/analytics/language-breakdown` - Error distribution by programming language
- `GET /api/analytics/feedback-stats` - Feedback statistics with per-solution breakdown
- `GET /api/analytics/cache-stats` - Redis cache performance metrics
- `GET /api/analytics/costs?days=30` - API cost tracking with daily breakdown
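As a quick smoke test of the analyze endpoint, a sketch with `requests`; the response field names here are assumptions that mirror the analyses schema described later in this README:

```python
import requests

API_URL = "http://localhost:8000"  # or the production URL listed above

error_log = """Traceback (most recent call last):
  File "app.py", line 3, in <module>
    print(data["missing"])
KeyError: 'missing'"""

response = requests.post(
    f"{API_URL}/api/analyze",
    json={"query": error_log, "limit": 5},
    timeout=60,  # LLM analysis takes 2-5 seconds on a cache miss
)
response.raise_for_status()
analysis = response.json()

# Field names mirror the analyses schema in this README (root_cause, solutions, ...).
print(analysis.get("root_cause"))
for solution in analysis.get("solutions", []):
    print(solution.get("confidence"), solution.get("description"))
```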
```mermaid
flowchart TD
A[User Pastes Error Log] --> B{Check Redis Cache}
B -->|Cache Hit| C[Return Cached Analysis]
B -->|Cache Miss| D[Parse Error Log]
D --> E[Extract Error Type, Message, Stack Trace]
E --> F[Store Parsed Error in DB]
F --> G[Vector Search in pgvector]
G --> H[Retrieve Top K Similar Posts]
H --> I[Build RAG Context]
I --> J[Send to GPT-4o-mini]
J --> K[Track API Cost]
K --> L[Generate Analysis]
L --> M[Store Analysis in DB]
M --> N[Cache Result - 24h TTL]
N --> O[Return to Frontend]
C --> O
O --> P[Display Solutions with Syntax Highlighting]
P --> Q{User Feedback}
Q -->|Worked/Didn't Work| R[Store Feedback]
```
- User Input: Paste error log into the frontend
- Cache Check: Check Redis for cached analysis (24h TTL)
- Error Parsing: Backend extracts error type, message, stack trace, file path, and line number
- Vector Search: Searches the Stack Overflow knowledge base using semantic similarity
- RAG Context: The top 3-5 most relevant posts are retrieved (distance threshold: 0.6); see the search sketch after this list
- LLM Analysis: GPT-4o-mini analyzes the error with context and generates:
  - Root cause explanation
  - Step-by-step reasoning
  - 2-3 ranked solutions with code examples
  - Confidence scores (0-1)
  - Source URLs from Stack Overflow
- Cost Tracking: API usage and costs are logged for monitoring
- Database Storage: Error and analysis are stored in Supabase
- Caching: Result is cached in Redis for 24 hours
- Response: Frontend displays solutions with syntax-highlighted code
- Feedback: Users can provide feedback on solution effectiveness
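The retrieval step boils down to a pgvector distance query. A minimal sketch, assuming cosine distance (pgvector's `<=>` operator) and the `embeddings` table from the schema below; the project's actual query lives in `app/services/supabase_vector_store.py`:

```python
import asyncpg  # assumption: raw asyncpg here; the app itself uses async SQLAlchemy

# Cosine-distance search over the embeddings table (pgvector's <=> operator).
TOP_K_SQL = """
SELECT content, metadata, embedding <=> $1::vector AS distance
FROM embeddings
WHERE embedding <=> $1::vector < $2   -- 0.6 distance threshold, as described above
ORDER BY embedding <=> $1::vector
LIMIT $3;
"""

async def search_similar(conn: asyncpg.Connection, query_vec: list[float], k: int = 5):
    # pgvector accepts a text literal like '[0.1,0.2,...]' cast to ::vector.
    vec = "[" + ",".join(str(x) for x in query_vec) + "]"
    return await conn.fetch(TOP_K_SQL, vec, 0.6, k)
```

The matched `content` and `metadata` rows become the RAG context passed to GPT-4o-mini.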
See APPLICATION_FLOW.md for detailed flow documentation.
Both frontend and backend support hot reload:
- Backend: Changes to Python files will automatically reload the FastAPI server
- Frontend: Changes to TypeScript/React files will trigger Next.js hot reload
Backend:
- Add the package to `backend/requirements.txt`
- Rebuild the backend container: `docker-compose up --build backend`

Frontend:
- Add the package to `frontend/package.json` or run `docker-compose exec frontend npm install <package-name>`
- Restart the frontend service: `docker-compose restart frontend`
Backend:

```bash
cd backend
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt
uvicorn app.main:app --reload
```

Frontend:

```bash
cd frontend
npm install
npm run dev
```

```mermaid
erDiagram
PARSED_ERRORS ||--o{ ANALYSES : has
ANALYSES ||--o{ FEEDBACK : receives
STACKOVERFLOW_POSTS ||--o{ EMBEDDINGS : generates
COST_TRACKING
PARSED_ERRORS {
int id PK
text raw_error_log
string error_type
string error_message
string language
string framework
string file_name
int line_number
string function_name
jsonb stack_trace
float confidence_score
timestamp created_at
}
ANALYSES {
int id PK
int parsed_error_id FK
text root_cause
text reasoning
jsonb solutions
int sources_used
int analysis_time
timestamp created_at
}
FEEDBACK {
int id PK
int analysis_id FK
int solution_index
boolean worked
text notes
timestamp created_at
}
STACKOVERFLOW_POSTS {
int id PK
int question_id
string title
text question_body
text answer_body
array tags
int votes
string url
timestamp created_at
timestamp scraped_at
}
EMBEDDINGS {
int id PK
text content
vector embedding
jsonb metadata
timestamp created_at
}
COST_TRACKING {
int id PK
string operation
string model
float cost
int prompt_tokens
int completion_tokens
int total_tokens
timestamp created_at
}
```
`parsed_errors`
- Stores parsed error information
- Fields: id, raw_error_log, error_type, error_message, language, framework, file_name, line_number, function_name, stack_trace (JSONB), confidence_score, created_at

`analyses`
- Stores LLM-generated analysis results
- Fields: id, parsed_error_id (FK), root_cause, reasoning, solutions (JSONB), sources_used, analysis_time, created_at

`feedback`
- Stores user feedback on solutions
- Fields: id, analysis_id (FK), solution_index, worked, notes, created_at

`stackoverflow_posts`
- Stores scraped Stack Overflow posts
- Fields: id, question_id, title, question_body, answer_body, tags (array), votes, url, created_at, scraped_at

`embeddings` (Supabase pgvector)
- Stores vector embeddings for semantic search
- Fields: id, content, embedding (vector), metadata (JSONB), created_at

`cost_tracking`
- Stores API cost records for monitoring
- Fields: id, operation, model, cost, prompt_tokens, completion_tokens, total_tokens, created_at
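The `cost` column is derived from the token counts returned by the API. A minimal sketch of that calculation; the per-token rates here are illustrative assumptions, not the project's configured values:

```python
# Illustrative USD prices per 1M tokens -- assumptions, adjust to current rates.
PRICES = {
    "gpt-4o-mini": {"prompt": 0.15, "completion": 0.60},
    "text-embedding-3-small": {"prompt": 0.02, "completion": 0.0},
}

def compute_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the cost in USD for one API call, as stored in cost_tracking.cost."""
    rates = PRICES[model]
    return (prompt_tokens * rates["prompt"] + completion_tokens * rates["completion"]) / 1_000_000

# Example: 1,200 prompt + 400 completion tokens on gpt-4o-mini
# costs (1200 * 0.15 + 400 * 0.60) / 1e6 = $0.00042.
```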
Connect via the Supabase dashboard or use a direct connection:

```bash
psql "postgresql://user:pass@host:port/dbname"
```

Symptom: `httpcore.LocalProtocolError: Illegal header value`
Cause: API keys in environment variables have trailing newlines or whitespace.

Solution: Ensure `GITHUB_TOKEN` and other API keys are stripped of whitespace. This is handled automatically in the code with `.strip()`.
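The defensive pattern is simply to strip every secret on load. A minimal sketch of the idea:

```python
import os

def clean_env(name: str, default: str = "") -> str:
    """Read an environment variable and strip stray whitespace/newlines,
    which would otherwise produce 'Illegal header value' errors in httpcore."""
    return os.environ.get(name, default).strip()

GITHUB_TOKEN = clean_env("GITHUB_TOKEN")
DATABASE_URL = clean_env("DATABASE_URL")
```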
If you get port conflict errors, either:
- Stop the conflicting service on your machine
- Change the ports in `docker-compose.yml`
Check the logs:

```bash
docker-compose logs <service-name>
```

Remove all containers and volumes, then rebuild:

```bash
docker-compose down -v
docker-compose up --build
```

Make sure `NEXT_PUBLIC_API_URL` in the frontend environment points to `http://localhost:8000` for local development, or to your production backend URL in production.
Ensure your frontend origin is included in the `ALLOWED_ORIGINS` environment variable on the backend:

```
ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com
```
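On the backend side, that origins list feeds FastAPI's standard CORS middleware. A sketch of the usual wiring, assuming `ALLOWED_ORIGINS` is split on commas as shown above:

```python
import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Split the comma-separated ALLOWED_ORIGINS env var into a list of origins.
origins = [o.strip() for o in os.environ.get("ALLOWED_ORIGINS", "").split(",") if o.strip()]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```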
If vector search returns no results:
- Ensure embeddings are created: `POST /api/embeddings/create`
- Check that Stack Overflow posts are scraped: `POST /api/scrape/batch`
- Verify the Supabase pgvector extension is enabled

For database connection issues:
- Check the `DATABASE_URL` format: `postgresql+asyncpg://user:pass@host:port/dbname`
- Use the Supabase connection pooler URL for better performance
- Ensure the pgvector extension is installed in your Supabase project
| Stage | Typical Duration | Notes |
|---|---|---|
| Cache Check | 1-5ms | Redis lookup |
| Parse Error | 10-50ms | Regex-based, very fast |
| Vector Search | 50-200ms | Depends on collection size |
| DB Insert (Error) | 20-100ms | Async operation |
| LLM Analysis | 2-5 seconds | Main bottleneck |
| DB Insert (Analysis) | 20-100ms | Async operation |
| Cache Write | 1-5ms | Redis set with TTL |
| Total (Cache Miss) | 2.5-6 seconds | Full pipeline |
| Total (Cache Hit) | 5-20ms | Cached response |
```mermaid
graph TB
subgraph Caching Layer
REDIS[Redis Cache]
L1[Analysis Cache<br/>24h TTL]
L2[Search Cache<br/>24h TTL]
end
subgraph Cost Management
CT[Cost Tracker]
CM[Cost Model]
end
REDIS --> L1
REDIS --> L2
CT --> CM
style REDIS fill:#dc382d,color:#fff
style CT fill:#22c55e,color:#fff
```
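The two cache levels are just two key namespaces sharing the same 24-hour TTL. A minimal sketch with `redis-py`, with key names assumed for illustration; the project's implementation is `app/services/cache.py`:

```python
import hashlib
import json

import redis

r = redis.Redis.from_url("redis://localhost:6379")
TTL_SECONDS = 24 * 60 * 60  # 24h TTL, as used for both cache levels

def _key(namespace: str, text: str) -> str:
    # Hash the raw error log / query so arbitrary text becomes a fixed-size key.
    return f"{namespace}:{hashlib.sha256(text.encode()).hexdigest()}"

def get_cached(namespace: str, text: str):
    raw = r.get(_key(namespace, text))
    return json.loads(raw) if raw else None

def set_cached(namespace: str, text: str, value: dict) -> None:
    r.set(_key(namespace, text), json.dumps(value), ex=TTL_SECONDS)

# Level 1: full analyses       -> get_cached("analysis", error_log)
# Level 2: vector search hits  -> get_cached("search", query)
```

Hashing the input means two users pasting the same error log hit the same cache entry, which is what makes the 5-20ms cache-hit path in the performance table possible.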
- RAG (Retrieval-Augmented Generation): Combines semantic search with LLM generation for context-aware solutions
- Vector Search: Uses pgvector for fast semantic similarity search
- Two-Level Caching: Redis cache for analyses and search results with 24-hour TTL
- Cost Tracking: Real-time monitoring of API costs with breakdown by operation
- Async Operations: FastAPI async endpoints with SQLAlchemy async for better performance
- Structured Output: Uses OpenAI function calling for reliable JSON responses (see the sketch after this list)
- Multi-stage Pipeline: Cache Check → Parse → Search → Analyze → Store → Cache
- Feedback Loop: User feedback collection for continuous improvement
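A hedged sketch of the structured-output idea using the OpenAI SDK's tool calling; the schema fields mirror the analyses model in this README, but the exact prompt and schema in `app/services/llm_analyzer.py` may differ:

```python
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://models.inference.ai.azure.com",  # GitHub Models (assumption)
    api_key=os.environ["GITHUB_TOKEN"].strip(),
)

# JSON schema mirroring the analyses table: root_cause, reasoning, solutions.
tool = {
    "type": "function",
    "function": {
        "name": "report_analysis",
        "parameters": {
            "type": "object",
            "properties": {
                "root_cause": {"type": "string"},
                "reasoning": {"type": "string"},
                "solutions": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "code": {"type": "string"},
                            "confidence": {"type": "number"},
                        },
                    },
                },
            },
            "required": ["root_cause", "solutions"],
        },
    },
}

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Analyze this error with the given context..."}],
    tools=[tool],
    tool_choice={"type": "function", "function": {"name": "report_analysis"}},
)
analysis = json.loads(completion.choices[0].message.tool_calls[0].function.arguments)
```

Forcing the tool call means the model must return arguments matching the schema, which is far more reliable than asking for JSON in free text.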
- FastAPI: Modern Python web framework with automatic OpenAPI docs
- Next.js 14: React framework with server-side rendering
- SQLAlchemy: Python ORM with async support
- Supabase: PostgreSQL with pgvector extension
- Redis: In-memory caching for analyses and search results
- GitHub Models: Azure OpenAI endpoints (GPT-4o-mini, text-embedding-3-small)
- Docker: Containerization for consistent environments
- Railway: Backend deployment platform
- Vercel: Frontend deployment platform
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Commit changes: `git commit -m "Add feature"`
- Push to the branch: `git push origin feature-name`
- Open a pull request
[Add your license here]
- Stack Overflow for community knowledge
- GitHub Models for AI services
- Supabase for database and vector storage
Sai Krishna
- Email: kokkulasaikrishna1288@gmail.com
- Project: DebugAI on GitHub
Last Updated: 2026-01-23 Version: 1.1