Enter a YouTube video ID, ask any question, and get AI-powered answers grounded in the video's transcript, all from a sleek Chrome extension.
- Overview
- Key Features
- Architecture
- Tech Stack
- Getting Started
- How It Works
- API Reference
- Project Structure
- Troubleshooting
- License
YouTube RAG Assistant brings the power of Retrieval-Augmented Generation (RAG) to YouTube. Instead of watching an entire video, you can:
- Provide a YouTube Video ID
- Ask a question in natural language
- Get an accurate, transcript-grounded answer powered by Google Gemini 2.5 Flash
The system automatically downloads the video transcript, chunks it, embeds it into a vector store, and runs a full RAG pipeline to answer your question, all in real time.
| Feature | Description |
|---|---|
| 🎥 Transcript Q&A | Ask anything about a YouTube video and get answers from its transcript |
| 🤖 Gemini 2.5 Flash | Fast, accurate language model for natural-language understanding |
| 🧠 RAG Pipeline | Retrieval-Augmented Generation ensures factual, grounded responses |
| 🔍 Semantic Search | ChromaDB + HuggingFace embeddings for intelligent context retrieval |
| 💬 Chat Interface | Beautiful dark-themed popup with animated chat bubbles |
| ⚡ Real-Time | Processes transcripts and answers on the fly |

    ┌──────────────────────────────────────┐
    │        Chrome Extension (MV3)        │
    │  ┌────────────┐  ┌───────────────┐   │
    │  │ popup.html │  │   popup.js    │   │
    │  │ (Chat UI)  │  │  (API calls)  │   │
    │  └─────┬──────┘  └───────┬───────┘   │
    └────────┼─────────────────┼───────────┘
             │    POST /ask    │
             ▼                 ▼
    ┌──────────────────────────────────────┐
    │       FastAPI Backend (api.py)       │
    │                                      │
    │  1. YouTube Transcript Loader        │
    │            │                         │
    │            ▼                         │
    │  2. RecursiveCharacterTextSplitter   │
    │            │                         │
    │            ▼                         │
    │  3. ChromaDB Vector Store            │
    │     (HuggingFace Embeddings)         │
    │            │                         │
    │            ▼                         │
    │  4. LangChain RAG Chain              │
    │     (Retriever → Prompt → LLM)       │
    │            │                         │
    │            ▼                         │
    │  5. Google Gemini 2.5 Flash          │
    │            │                         │
    │            ▼                         │
    │  6. Parsed Answer → JSON Response    │
    └──────────────────────────────────────┘

| Layer | Technology | Purpose |
|---|---|---|
| LLM | Google Gemini 2.5 Flash | Answer generation from transcript context |
| RAG Framework | LangChain | Pipeline orchestration: loader, splitter, retriever, chain |
| Transcript | YoutubeLoader (LangChain) | Automatic YouTube transcript extraction |
| Vector DB | ChromaDB | Semantic search over transcript chunks |
| Embeddings | HuggingFace all-MiniLM-L6-v2 | Sentence-level embeddings |
| Backend | FastAPI + Uvicorn | Async REST API |
| Frontend | Chrome Extension (Manifest V3) | Chat interface popup |
| Styling | Vanilla CSS | Dark theme, indigo gradients, smooth animations |
| Requirement | Version |
|---|---|
| Python | 3.8+ |
| Google Chrome | Latest |
| Google API Key | Get one here |

    cd backend

    # Create virtual environment
    python -m venv venv

    # Activate
    venv\Scripts\activate          # Windows
    # source venv/bin/activate     # macOS / Linux

    # Install dependencies
    pip install fastapi uvicorn python-dotenv langchain langchain-google-genai langchain-community chromadb sentence-transformers

    # Create .env file
    echo GOOGLE_API_KEY=your_api_key_here > .env

    # Start the server
    uvicorn api:app --reload --port 8000

The API is then available at http://localhost:8000.
Load the extension in Chrome:

- Open `chrome://extensions/`
- Enable Developer mode
- Click "Load unpacked"
- Select the `frontend/` folder
- Pin the extension

Then use it:

- Click the extension icon
- Enter a YouTube Video ID (e.g., `aircAruvnKk`)
- Type a question: "What is this video about?"
- Click "Ask" and get your answer!
Tip: The Video ID is the part after `v=` in a YouTube URL. For `https://www.youtube.com/watch?v=aircAruvnKk`, the ID is `aircAruvnKk`.
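If you would rather paste full URLs, the rule in the tip can be sketched in a few lines of Python. The helper name `extract_video_id` is hypothetical and not part of this project, and the sketch only handles standard watch URLs that carry a `v=` query parameter (short links and embeds are not covered).

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id: str) -> str:
    """Return the `v` query parameter of a watch URL.

    Hypothetical helper: input without a `v=` parameter is assumed
    to be a bare video ID already and is returned unchanged.
    """
    query = parse_qs(urlparse(url_or_id).query)
    # parse_qs maps each parameter name to a list of values; take the first `v`.
    return query["v"][0] if "v" in query else url_or_id


print(extract_video_id("https://www.youtube.com/watch?v=aircAruvnKk"))  # aircAruvnKk
```

Extra query parameters such as a timestamp are ignored, so only the ID reaches the backend.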
`POST /ask`: send a video ID and a question to get a transcript-grounded answer.
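For calling the endpoint outside the extension, a minimal Python client might look like the sketch below. It assumes the backend is running locally on port 8000 as in the setup steps; the names `API_URL`, `build_ask_request`, and `ask` are illustrative and do not exist in the project. The field names match the request and response bodies documented in this section.

```python
import json
import urllib.error
import urllib.request

API_URL = "http://localhost:8000/ask"  # assumption: default local port from the setup steps


def build_ask_request(video_id: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /ask endpoint."""
    body = json.dumps({"video_id": video_id, "question": question}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(video_id: str, question: str) -> str:
    """Send the request and return the answer text.

    Raises RuntimeError carrying the server's `detail` message on an HTTP error.
    """
    try:
        with urllib.request.urlopen(build_ask_request(video_id, question)) as resp:
            return json.load(resp)["answer"]
    except urllib.error.HTTPError as err:
        # Error responses are assumed to carry a JSON body like {"detail": "..."}.
        detail = json.load(err).get("detail", err.reason)
        raise RuntimeError(detail) from err
```

With the server running, `ask("aircAruvnKk", "What is this video about?")` would return the answer string. Keeping request construction separate from sending makes the payload easy to inspect without a live backend.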
Request Body:

    {
      "video_id": "aircAruvnKk",
      "question": "What are the main topics discussed?"
    }

Response:

    {
      "answer": "The video discusses neural networks, specifically..."
    }

Error Response (no transcript available):

    {
      "detail": "Transcript not available for this video"
    }

The backend handles each request in these steps:

- Transcript Loading: `YoutubeLoader` fetches the auto-generated or manual transcript
- Text Splitting: `RecursiveCharacterTextSplitter` chunks the transcript into 1000-character pieces with a 200-character overlap
- Embedding & Storage: chunks are embedded with `all-MiniLM-L6-v2` and stored in ChromaDB
- Retrieval: the top 4 most relevant chunks are retrieved for each question
- RAG Chain: the retrieved context and the question are fed through a LangChain prompt template
- LLM Answer: Gemini 2.5 Flash generates a grounded answer
- Response: the answer is returned to the extension and displayed in the chat
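The splitting and retrieval steps above can be illustrated in plain Python. This is a simplified sketch, not the backend's code: the project relies on `RecursiveCharacterTextSplitter` and ChromaDB, whereas the hypothetical helpers `chunk_text` and `top_k` below use a fixed-width sliding window and cosine similarity over vectors that are assumed to be computed already. The store's actual distance metric may differ.

```python
import math
from typing import List, Sequence


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
    """Fixed-width sliding window (an assumption, simpler than the real splitter).

    Each chunk starts `size - overlap` characters after the previous one,
    so neighbouring chunks share `overlap` characters.
    """
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]], k: int = 4) -> List[int]:
    """Return the indices of the k chunks most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# A 2600-character transcript yields three full chunks with 200-character overlaps.
print([len(c) for c in chunk_text("x" * 2600)])  # [1000, 1000, 1000]
# Toy 2-D vectors stand in for real embeddings.
print(top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], k=2))  # [1, 2]
```

The overlap means a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks, which helps retrieval find it.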
The repository is laid out as follows:

    yt_video_chatbot/
    ├── backend/
    │   ├── api.py              # FastAPI backend: full RAG pipeline
    │   └── yt_chroma_db/       # ChromaDB persistent storage (auto-generated)
    │
    └── frontend/
        ├── manifest.json       # Chrome Extension config (Manifest V3)
        ├── popup.html          # Chat UI: dark indigo theme, animations
        └── popup.js            # Extension logic: API calls, chat rendering

| Problem | Solution |
|---|---|
| "Transcript not available" | The video may not have captions; try a different video |
| Server not running | Run `uvicorn api:app --reload --port 8000` from `backend/` |
| API key error | Add `GOOGLE_API_KEY` to `backend/.env` |
| Extension not loading | Enable Developer mode at `chrome://extensions/` |
| Slow first request | The HuggingFace embedding model downloads on first run (~90MB) |
| Wrong video ID | Use only the ID part (e.g., `aircAruvnKk`), not the full URL |
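Two of the rows above, "Server not running" and "API key error", can be checked with a short preflight script. The sketch below is an assumption-laden illustration: `api_key_problem` and `server_is_up` are hypothetical helpers, the port is the default from the setup steps, and the script reads the process environment, so a key that lives only in `backend/.env` must be exported or loaded first.

```python
import os
import socket
from typing import Mapping, Optional

PLACEHOLDER = "your_api_key_here"  # the sample value used in the setup steps


def api_key_problem(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Describe what looks wrong with GOOGLE_API_KEY, or return None if it looks set."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        return "GOOGLE_API_KEY is not set"
    if key == PLACEHOLDER:
        return "GOOGLE_API_KEY still holds the placeholder value"
    return None


def server_is_up(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False
```

A passing check only shows that the key is present and that a process is listening on the port; it does not validate the key with Google or confirm that the listener is this backend.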
MIT License. Feel free to modify and use for your own projects.
Built with ❤️ for Knowledge Seekers
Ask a video anything, powered by RAG + Gemini AI.