Paste a URL. Describe your research. Let AI do the rest.
CrawlMind combines Cloudflare's crawl infrastructure with AI-powered URL discovery and multi-hop research synthesis — turning any query into structured, crawled knowledge.
- Smart Input — Auto-detects URLs vs. natural language; just paste or type
- Cloudflare-Powered — Fast, reliable crawling via Cloudflare's Browser Rendering API
- Multi-Format Output — Markdown, HTML, plaintext, or cleaned readable HTML
- JS Rendering — Crawl JavaScript-heavy SPAs with headless rendering
- Advanced Controls — Depth, page limits, subdomain inclusion, URL patterns, date filters
- AI URL Discovery — Describe what you need; Groq finds the best sources to crawl
- Depth Tiers — Quick (~30s), Deep Dive (~2min), or Multi-hop Research (~5min)
- Multi-Hop Research — Crawl → analyze gaps → discover follow-up sources → repeat (up to 3 rounds)
- AI Synthesis — NVIDIA NIM generates a comprehensive research report from all crawled data
- Parent-Child Jobs — Each research job manages its own sub-crawls independently, without interfering with normal crawls
- AI Chat — Ask questions about crawl results with full context awareness
- Soft-Delete Library — Archive, restore, and manage past crawls
- Analytics Dashboard — Track crawl usage, search patterns, and AI queries
- Plan-Based Limits — Tiered pricing with Stripe integration
- Auth — GitHub, Google, and email sign-in via Better Auth
┌─────────────────────────────────────────────────────────────────┐
│                           USER INPUT                            │
│          URL / Natural Language / AI Discovery Toggle           │
└─────────────┬──────────────────────────┬────────────────────────┘
              │                          │
        URL detected              AI Discovery ON
              │                          │
              ▼                          ▼
     ┌─────────────────┐    ┌──────────────────────────┐
     │ POST /api/crawl │    │ POST /api/research       │
     │ Normal Pipeline │    │ AI Research Pipeline     │
     └────────┬────────┘    └────────────┬─────────────┘
              │                          │
              ▼                          ▼
     ┌─────────────────┐    ┌──────────────────────────┐
     │ Cloudflare Crawl│    │ Groq: Discover URLs      │
     │ Single Job      │    │ (llama-3.3-70b-versatile)│
     └────────┬────────┘    └────────────┬─────────────┘
              │                          │
              │                          ▼
              │             ┌──────────────────────────┐
              │             │ Spawn Parallel Sub-Crawls│
              │             │ via Cloudflare Crawl API │
              │             └────────────┬─────────────┘
              │                          │
              │             ┌────────────▼─────────────┐
              │             │ RESEARCH tier only:      │
              │             │ NIM Gap Analysis →       │
              │             │ Follow-up Crawls (×3)    │
              │             └────────────┬─────────────┘
              │                          │
              │                          ▼
              │             ┌──────────────────────────┐
              │             │ NIM: Synthesis Report    │
              │             │ (nemotron-super-49b)     │
              ▼             └────────────┬─────────────┘
     ┌─────────────────┐                 │
     │ Neon PostgreSQL │◄────────────────┘
     │ (Prisma ORM)    │
     └─────────────────┘
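The fork at the top of the diagram can be sketched roughly as follows. This is a minimal illustration, not CrawlMind's actual code: `looksLikeUrl` and `routeInput` are hypothetical names, the URL heuristic is an assumption, and so is the rule that plain-text input goes to the research pipeline even when the toggle is off.

```typescript
// Hypothetical sketch of the input routing in the diagram above.
// Names and heuristics are assumptions, not CrawlMind's real implementation.

type Route =
  | { endpoint: "/api/crawl"; url: string }
  | { endpoint: "/api/research"; query: string };

const HAS_SCHEME = /^https?:\/\//i;

// Heuristic: a single token that parses as an http(s) URL with a dotted hostname.
function looksLikeUrl(input: string): boolean {
  const text = input.trim();
  if (text === "" || /\s/.test(text)) return false;
  // Accept bare domains such as "example.com" by assuming https.
  const candidate = HAS_SCHEME.test(text) ? text : `https://${text}`;
  try {
    return new URL(candidate).hostname.includes(".");
  } catch {
    return false;
  }
}

function routeInput(input: string, aiDiscovery: boolean): Route {
  const text = input.trim();
  if (!aiDiscovery && looksLikeUrl(text)) {
    const url = HAS_SCHEME.test(text) ? text : `https://${text}`;
    return { endpoint: "/api/crawl", url };
  }
  // Assumption: natural-language input, or any input with the toggle on,
  // is sent to the research pipeline.
  return { endpoint: "/api/research", query: text };
}
```

A heuristic like this will misclassify dotted terms such as "node.js" as URLs, so a real implementation would likely need a stricter check or a confirmation step.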
| Layer | Technology | Purpose |
|---|---|---|
| Framework | Next.js 15 (App Router) | Full-stack React with server components |
| Database | Neon PostgreSQL + Prisma | Serverless Postgres with type-safe ORM |
| Auth | Better Auth | GitHub, Google, email authentication |
| Crawling | Cloudflare Crawl API | Browser rendering + web crawling at scale |
| AI — Fast | Groq (llama-3.3-70b) | URL discovery (~200ms responses) |
| AI — Deep | NVIDIA NIM (nemotron-super-49b) | Gap analysis + synthesis reports |
| AI Chat | Vercel AI SDK | Streaming chat over crawl results |
| Payments | Stripe | Subscription billing + webhooks |
| Styling | Tailwind CSS + shadcn/ui | Utility-first CSS + accessible components |
| Deployment | Vercel | Edge-optimized serverless hosting |
- Bun v1.0+
- Neon PostgreSQL database
- Cloudflare account with Crawl API access
- Groq API key (for AI URL discovery)
- NVIDIA NIM API key (for synthesis)
# Clone
git clone https://github.com/pantha704/CrawlMind.git
cd CrawlMind
# Install
bun install
# Configure
cp .env.example .env.local
# Edit .env.local with your keys (see below)
# Database
bunx prisma db push
bunx prisma generate
# Run
bun run dev

# Database (Neon)
DATABASE_URL=postgresql://...
# Auth
BETTER_AUTH_SECRET=your-secret
BETTER_AUTH_URL=http://localhost:3001
GITHUB_CLIENT_ID=...
GITHUB_CLIENT_SECRET=...
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
# Cloudflare
CLOUDFLARE_API_TOKEN=...
CLOUDFLARE_ACCOUNT_ID=...
# AI
GROQ_API_KEY=... # For URL discovery (Groq)
NVIDIA_NIM_API_KEY=... # For synthesis (NVIDIA NIM)
# Stripe
STRIPE_SECRET_KEY=...
STRIPE_WEBHOOK_SECRET=...
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=...
# App
NEXT_PUBLIC_APP_URL=http://localhost:3001

src/
├── app/
│   ├── api/
│   │   ├── crawl/          # Crawl CRUD, results proxy, cancel
│   │   ├── research/       # AI Discovery — create, poll, active
│   │   ├── chat/           # AI chat endpoint
│   │   ├── stripe/         # Payment webhooks
│   │   └── user/           # Usage tracking & settings
│   ├── dashboard/
│   │   ├── page.tsx        # Main dashboard
│   │   ├── jobs/           # Crawl job list + detail
│   │   ├── research/       # AI research detail page
│   │   ├── chat/           # AI chat interface
│   │   ├── library/        # Archived results
│   │   └── analytics/      # Usage analytics
│   ├── pricing/            # Pricing page
│   └── (auth)/             # Sign in / sign up
├── components/
│   ├── dashboard/          # Dashboard UI (crawl-input, active-jobs, etc.)
│   ├── landing/            # Landing page components
│   └── ui/                 # shadcn/ui primitives
└── lib/
    ├── auth.ts             # Better Auth config
    ├── cloudflare.ts       # Cloudflare Crawl API client
    ├── research.ts         # AI Discovery — Groq + NIM integration
    ├── ai.ts               # AI model configuration
    ├── prisma.ts           # Prisma client
    └── stripe.ts           # Stripe client
| Tier | What Happens | Sources | Time |
|---|---|---|---|
| ⚡ Quick | AI finds 3-5 relevant sources, crawls them | 3-5 | ~30s |
| 🔍 Deep Dive | AI discovers 10-15 categorized sources | 10-15 | ~2min |
| 🧠 Research | Multi-hop: crawl → gap analysis → follow-up crawls (×3 rounds) → synthesis | 15-30+ | ~5min |
Models used:
- Groq (`llama-3.3-70b-versatile`) — Fast URL discovery (~200ms)
- NVIDIA NIM (`nemotron-super-49b-v1.5`) — Deep analysis & comprehensive synthesis
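The tier behaviour above (crawl, analyze gaps, discover follow-up sources, repeat up to 3 rounds, then synthesize) could be sketched as a loop like the one below. Everything here is illustrative: `runResearch`, `ResearchSteps`, and the step signatures are hypothetical, and running synthesis for every tier is an assumption read off the architecture diagram. The real steps would call Groq, the Cloudflare crawl client, and NVIDIA NIM.

```typescript
// Hypothetical sketch of the tiered research loop; not CrawlMind's actual code.
type Tier = "quick" | "deep" | "research";

// Only the Research tier runs gap analysis, for up to 3 follow-up rounds.
const FOLLOW_UP_ROUNDS: Record<Tier, number> = { quick: 0, deep: 0, research: 3 };

// Injected steps with assumed signatures, so the loop can be tested in isolation.
interface ResearchSteps {
  discover(query: string): Promise<string[]>; // initial source URLs
  crawl(urls: string[]): Promise<string[]>; // page contents
  findGaps(query: string, pages: string[]): Promise<string[]>; // follow-up URLs
  synthesize(query: string, pages: string[]): Promise<string>; // final report
}

interface ResearchResult {
  report: string;
  crawledUrls: string[];
  followUpRounds: number;
}

async function runResearch(
  query: string,
  tier: Tier,
  steps: ResearchSteps,
): Promise<ResearchResult> {
  const crawledUrls: string[] = [];
  // Keep only URLs not crawled before, and record them as crawled.
  const takeNew = (urls: string[]): string[] => {
    const fresh: string[] = [];
    for (const url of urls) {
      if (crawledUrls.indexOf(url) === -1) {
        crawledUrls.push(url);
        fresh.push(url);
      }
    }
    return fresh;
  };

  const pages: string[] = [];
  let frontier = takeNew(await steps.discover(query));
  let followUpRounds = 0;

  while (frontier.length > 0) {
    pages.push(...(await steps.crawl(frontier)));
    if (followUpRounds >= FOLLOW_UP_ROUNDS[tier]) break;
    frontier = takeNew(await steps.findGaps(query, pages));
    if (frontier.length > 0) followUpRounds++;
  }

  const report = await steps.synthesize(query, pages);
  return { report, crawledUrls, followUpRounds };
}
```

Passing the steps in as an object keeps the loop independent of any particular provider, and the deduplication means a gap analysis that re-suggests an already crawled URL ends the loop early instead of spending another round.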
| Plan | Price | Crawls/day | Pages/crawl | AI Chat | JS Render |
|---|---|---|---|---|---|
| Spark | Free | 2 | 30 | 3 queries | ❌ |
| Pro | $12/mo | 25 | 500 | Unlimited | ✅ |
| Pro+ | $24/mo | 75 | 1,000 | Unlimited | ✅ |
| Scale | $39/mo | 150 | 5,000 | Unlimited | ✅ |
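One way to carry these limits in code is a typed lookup table plus a small check, as sketched below. The numbers are transcribed from the pricing table; `PLAN_LIMITS`, `CrawlRequest`, and `checkCrawlRequest` are hypothetical names, and the table does not say over what period the Spark plan's 3 chat queries are counted.

```typescript
// Plan limits transcribed from the pricing table above.
// The enforcement helper is a hypothetical sketch, not CrawlMind's actual code.
type Plan = "Spark" | "Pro" | "Pro+" | "Scale";

interface PlanLimits {
  crawlsPerDay: number;
  pagesPerCrawl: number;
  chatQueries: number; // Infinity means unlimited; the period is not stated in the table
  jsRender: boolean;
}

const PLAN_LIMITS: Record<Plan, PlanLimits> = {
  Spark: { crawlsPerDay: 2, pagesPerCrawl: 30, chatQueries: 3, jsRender: false },
  Pro: { crawlsPerDay: 25, pagesPerCrawl: 500, chatQueries: Infinity, jsRender: true },
  "Pro+": { crawlsPerDay: 75, pagesPerCrawl: 1000, chatQueries: Infinity, jsRender: true },
  Scale: { crawlsPerDay: 150, pagesPerCrawl: 5000, chatQueries: Infinity, jsRender: true },
};

// Assumed request shape: requested page limit and whether JS rendering is wanted.
interface CrawlRequest {
  pageLimit: number;
  render: boolean;
}

// Returns the reasons a request is not allowed; an empty array means it is allowed.
function checkCrawlRequest(plan: Plan, crawlsToday: number, request: CrawlRequest): string[] {
  const limits = PLAN_LIMITS[plan];
  const problems: string[] = [];
  if (crawlsToday >= limits.crawlsPerDay) {
    problems.push(`Daily limit of ${limits.crawlsPerDay} crawls reached`);
  }
  if (request.pageLimit > limits.pagesPerCrawl) {
    problems.push(`Page limit ${request.pageLimit} exceeds plan maximum of ${limits.pagesPerCrawl}`);
  }
  if (request.render && !limits.jsRender) {
    problems.push("JS rendering is not available on this plan");
  }
  return problems;
}
```

Returning all problems at once, rather than failing on the first, lets the UI show a single upgrade prompt that covers every limit the request hit.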
- Push to GitHub
- Import in Vercel
- Add all environment variables
- Set `NEXT_PUBLIC_APP_URL` to your Vercel domain
- Deploy
Note: Ensure `NEXT_PUBLIC_APP_URL` points to your deployed domain (not `localhost`) for webhooks and auth callbacks.
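A small pre-deploy check can catch a leftover `localhost` value before it breaks callbacks. The sketch below is an assumption, not part of CrawlMind: `checkDeployedUrls` is a hypothetical helper, and including `BETTER_AUTH_URL` alongside `NEXT_PUBLIC_APP_URL` is a guess based on both defaulting to `localhost` in the example environment file.

```typescript
// Hypothetical pre-deploy check; the helper and its defaults are assumptions.
type Env = Record<string, string | undefined>;

const LOCAL_HOSTS = ["localhost", "127.0.0.1"];

// Returns a list of problems; an empty array means the URLs look deployable.
function checkDeployedUrls(
  env: Env,
  keys: string[] = ["NEXT_PUBLIC_APP_URL", "BETTER_AUTH_URL"],
): string[] {
  const problems: string[] = [];
  for (const key of keys) {
    const value = env[key];
    if (!value) {
      problems.push(`${key} is not set`);
      continue;
    }
    let host: string;
    try {
      host = new URL(value).hostname;
    } catch {
      problems.push(`${key} is not a valid URL`);
      continue;
    }
    if (LOCAL_HOSTS.includes(host)) {
      problems.push(`${key} still points to ${host}`);
    }
  }
  return problems;
}
```

In a Next.js project this could be called with `process.env` from a build script, failing the build when the returned list is non-empty.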
MIT — see LICENSE for details.
Built with ☕ and curiosity