Skip to content

Curate and index clean docs for AI context and keep it fresh (with Claude Code)

Notifications You must be signed in to change notification settings

michellepace/docs-for-ai

Repository files navigation

Curate Docs For AI (with Claude Code)

Curate and index documentation from any website into collections like tailwind/, horses/, etc. Reference collection indexes in your AI chats (e.g. @tailwind/INDEX.xml what's a utility?) so that only relevant docs are analysed. Much cleaner than a web-fetch and more focussed than a web-search. Keep your AI context sharp.

📦 Repo Collections

Collection Collection Index Description Scraped Source
📦 anthropic/ 📄 anthropic/INDEX.xml Claude Agent SDK 2025-11-06 Official
📦 biome/ 📄 biome/INDEX.xml Fast linter/formatter 2025-11-04 Official
📦 marimo/ 📄 marimo/INDEX.xml Reactive Python notebooks 2025-11-11 Official
📦 nextjs/ 📄 nextjs/INDEX.xml React framework 2025-11-05 Official
📦 playwright/ 📄 playwright/INDEX.xml Browser testing 2025-11-07 Official
📦 shiny/ 📄 shiny/INDEX.xml Python web apps 2025-11-02 Official
📦 tailwind/ 📄 tailwind/INDEX.xml CSS framework 2025-10-15 Official
📦 uv/ 📄 uv/INDEX.xml Python projects 2025-10-15 Official
📦 vercel/ 📄 vercel/INDEX.xml Deployment platform 2025-10-20 Official
📦 vitest/ 📄 vitest/INDEX.xml Testing framework 2025-11-05 Official

Curate your own collections. For most Anthropic docs use this tool.


🚀 Setup

# 1. Install UV
# 👉 https://docs.astral.sh/uv/getting-started/installation/

# 2. Clone repository
git clone https://github.com/michellepace/docs-for-ai.git
cd docs-for-ai

# 3. Get free FireCrawl API key
# Visit: https://www.firecrawl.dev/app/api-keys

# 4. Add to shell profile (.zshrc, .bashrc, .profile)
echo 'export API_KEY_MCP_FIRECRAWL=your-api-key-here' >> ~/.zshrc
source ~/.zshrc

📖 Curate With Slash Commands

Command Purpose .md Files INDEX <source>
/curate-doc <directory> <url> Add / re-scrape doc ✅ Write ✅ Add/replace
/rescrape-docs <directory> Re-scrape all docs ✅ Write all ✅ Selective update
Terminal showing three-step workflow: (1) Running /curate-doc biome command, (2) Curation success output showing scraped documentation and generated INDEX.xml entry, (3) Reference command @biome/INDEX.xml to ask questions. Handwritten annotations highlight each step.

Complete workflow: curate → scrape → reference

💡 Usage Examples

To curate and keep docs fresh in this repo:

# Curate a new doc from a URL
/curate-doc tailwind https://tailwindcss.com/docs/customizing-colors
# → Scrapes page, writes .md file, adds source to INDEX.xml

# Re-scrape existing doc (refresh content from same URL)
/curate-doc tailwind https://tailwindcss.com/docs/customizing-colors
# → Re-scrapes, writes .md file, replaces source in INDEX.xml

# Start a new collection
/curate-doc reflex https://reflex.dev/docs/getting-started/installation
# → Creates reflex/ directory, README.md, INDEX.xml, and first curated doc

# Re-scrape all docs in collection (monthly maintenance)
/rescrape-docs tailwind
# → Re-scrapes all URLs in INDEX.xml, writes all .md files, replaces all sources

To use the docs (from other projects):

# From a different project

# 1. Give Claude Code access to the repo
/add-dir /home/mp/projects/docs-for-ai

# 2. Then reference as normal to ask your question
@/home/mp/projects/docs-for-ai/tailwind/INDEX.xml what's a utility?

🏗️ How This Repo Works

Workflow: Python script scrapes URL → writes .md file → creates INDEX.xml entry with PLACEHOLDER description → Claude Code generates semantic description.

  • /curate-doc - Always regenerates description for the doc
  • /rescrape-docs - Only regenerates descriptions for files with content changes (ignores whitespace)

Usage: Reference @INDEX.xml [question] to let Claude Code use descriptions to find relevant docs.

Directory Structure:

uv/
├── INDEX.xml               # Index of all docs
├── README.md
├── api-reference.md        # Scraped doc
├── getting-started.md      # Scraped doc
└── ...

INDEX.xml Schema:

<docs_index>
  <source>
    <title>Hello Document Title</title>
    <description>20-30 word dense summary optimised for semantic search...</description>
    <source_url>https://docs.example.com/hello</source_url>
    <local_file>hello-document-title.md</local_file>
    <scraped_at>2025-10-15</scraped_at>
  </source>
  <!-- Multiple <source> entries, one per .md file -->
</docs_index>

Scripts use FireCrawl Python SDK for scraping. MCP server configured (.mcp.json, .claude/settings.json) for Claude Code self-healing beyond scripts.

About

Curate and index clean docs for AI context and keep it fresh (with Claude Code)

Resources

Stars

Watchers

Forks

Contributors 2

  •  
  •  

Languages