The high-density arXiv research pipe for AI Agents and Humans.
arxiv-mcp turns the world's primary research source into a clean, actionable data stream. It searches papers, extracts clean Markdown from experimental HTML, maps citation lineages, and stashes everything in a searchable local depot.
- Clean Text Extraction: Stop fighting multi-column PDFs. We prefer arXiv's experimental HTML to give you (and your agents) clean, structured Markdown.
- Local Depot (RAG-Ready): Any paper you ingest is indexed in a local SQLite FTS5 database. Search thousands of papers by keyword in milliseconds.
- Citation Graphs: Follow the intellectual lineage of any paper using Semantic Scholar integration.
- DOI Resolution: Resolve any DOI to metadata and OA PDF via Unpaywall + Crossref. Fetches open-access full text from 50,000+ publishers — no API keys required.
- AI Lab Blog Support: Beyond arXiv, we fetch from Anthropic, DeepMind, and Google Research blogs.
- Agent Native: Built on FastMCP 3.2.0, supporting sophisticated features like sampling (`ctx.sample`) and bundled skills.
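To make the "Local Depot" idea concrete, here is a minimal sketch of keyword search over an SQLite FTS5 table using only Python's standard library. It is not arxiv-mcp's actual schema: the table name `papers`, its columns, the helper names `build_depot` and `search`, and the placeholder records are all assumptions for illustration, and it presumes your Python's SQLite build includes FTS5.

```python
import sqlite3


def build_depot(papers):
    """Create an in-memory FTS5 index.

    `papers` is a list of (arxiv_id, title, body) tuples.
    Table and column names are hypothetical, not arxiv-mcp's schema.
    """
    conn = sqlite3.connect(":memory:")
    # arxiv_id is stored but not tokenized; title and body are full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE papers USING fts5(arxiv_id UNINDEXED, title, body)"
    )
    conn.executemany("INSERT INTO papers VALUES (?, ?, ?)", papers)
    conn.commit()
    return conn


def search(conn, query, limit=5):
    """Return (arxiv_id, title) pairs matching `query`, best match first."""
    rows = conn.execute(
        "SELECT arxiv_id, title FROM papers WHERE papers MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


# Placeholder records, not real papers.
conn = build_depot([
    ("0000.00001", "Placeholder paper on robot grasping",
     "We study grasp planning for manipulators."),
    ("0000.00002", "Placeholder paper on language models",
     "We study scaling of transformer models."),
])
hits = search(conn, "grasp")
```

The default FTS5 tokenizer does no stemming, so the query `grasp` matches the word "grasp" in the first body but not "grasping" in its title. A real depot would likely persist to a file rather than `:memory:`.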
| Guide | Content |
|---|---|
| 🚀 Installation | Getting up and running step-by-step. |
| 🏗️ Architecture | How the backend, frontend, and storage layers work. |
| 🔭 arXiv Context | Philosophy on recency and why HTML > PDF. |
| 🛠️ MCP Server | Complete manifest of tools, prompts, and skills. |
| 📊 Web Dashboard | Features and usage patterns for the UI. |
| 🔗 DOI Resolution | How Unpaywall + Crossref work, OA statuses explained, publishing ecosystem. |
| ⚡ FastMCP 3+ Features | How we use dual transport, skills, prefab, prompts, sampling, safety wrapping, and more. |
git clone https://github.com/sandraschi/arxiv-mcp.git
cd arxiv-mcp
uv sync

That's it. Now configure your MCP client (see below).
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "arxiv-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "C:\\path\\to\\arxiv-mcp", "python", "-m", "arxiv_mcp", "--stdio"]
    }
  }
}

Replace C:\\path\\to\\arxiv-mcp with the actual path to your clone.
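Hand-escaping backslashes in JSON is easy to get wrong, so one option is to generate the entry instead. The sketch below is a convenience, not part of arxiv-mcp: `mcp_entry` is a hypothetical helper, and the path is the same placeholder used above. It builds the same structure and lets `json.dumps` handle the escaping.

```python
import json
from pathlib import PureWindowsPath


def mcp_entry(clone_dir):
    """Build the mcpServers entry for a given clone directory (hypothetical helper)."""
    return {
        "mcpServers": {
            "arxiv-mcp": {
                "command": "uv",
                "args": [
                    "run", "--directory", str(clone_dir),
                    "python", "-m", "arxiv_mcp", "--stdio",
                ],
            }
        }
    }


# Placeholder path; substitute the real location of your clone.
config = mcp_entry(PureWindowsPath(r"C:\path\to\arxiv-mcp"))

# json.dumps doubles each backslash, which is what the JSON file needs.
text = json.dumps(config, indent=2)
print(text)
```

Merge the printed entry into your existing claude_desktop_config.json rather than overwriting the file, since it may already list other servers.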
In Cursor settings → Features → MCP Servers → Add new MCP server:
Name: arxiv-mcp
Type: command
Command: uv run --directory C:\path\to\arxiv-mcp python -m arxiv_mcp --stdio
If you have the MCPB CLI installed:
just mcpb-pack

This creates dist/arxiv-mcp.mcpb. Drag this file into Claude Desktop to install.
Alternatively, download the pre-built .mcpb from the Releases page.
Requires Node.js in addition to Python/uv.
cd arxiv-mcp\web_sota
.\start.bat

This starts both the backend and Vite dashboard, then opens http://127.0.0.1:10771 in your browser.
After setup, `just` is available for common tasks:
just lint # Ruff lint Python
just lint-web # Biome lint frontend
just fix # Ruff auto-fix Python
just test # Run Python tests
just serve # Start backend only (HTTP)
just stdio # Start backend only (stdio)
just dev # Full stack (backend + Vite)
just sync # uv sync with dev extras

Run just --list to see all recipes.
- Discovery: "What are the most cited papers in cs.RO from the last week?"
- Deep Read: "Pull the full text of 2401.00001 and audit its methods for reproducibility."
- Synthesis: "Compare the abstracts of these 5 papers for contradictions in their consciousness claims."
- Expansion: "Save this whole thread of citations into my local corpus."
See CHANGELOG.md for release notes.
This project adheres to SOTA 14.1 industrial standards for high-fidelity agentic orchestration:
- Python (Core): Ruff for linting and formatting. Zero tolerance for `print` statements in core handlers (T201).
- Webapp (UI): Biome for sub-millisecond linting. Strict `noConsoleLog` enforcement.
- Protocol Compliance: Hardened `stdout`/`stderr` isolation to ensure crash-resistant JSON-RPC communication.
- Automation: Justfile recipes for all fleet operations (`just lint`, `just fix`, `just dev`).
- Security: Automated audits via `bandit` and `safety`.
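The reason for the `print` ban and the stdout/stderr isolation is that a stdio server speaks JSON-RPC over stdout, so any stray text there can corrupt a message. The following is a rough sketch of that principle, not arxiv-mcp's actual code: the logger name `arxiv_mcp_sketch`, the `handle` function, and the `ping` request are made up for illustration. Diagnostics go to stderr, and only the protocol payload reaches stdout.

```python
import contextlib
import io
import json
import logging
import sys


def configure_logging():
    """Send all diagnostics to stderr so stdout carries only JSON-RPC."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger = logging.getLogger("arxiv_mcp_sketch")  # hypothetical logger name
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


def handle(request, logger):
    """Toy handler: log to stderr, return a JSON-RPC response as a string."""
    logger.info("handling %s", request["method"])
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": "ok"})


# Capture both streams to show that they stay separate.
out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    log = configure_logging()
    # The only write to stdout is the protocol message itself.
    sys.stdout.write(handle({"jsonrpc": "2.0", "id": 1, "method": "ping"}, log) + "\n")

out_text, err_text = out.getvalue(), err.getvalue()
```

After this runs, `out_text` holds a single parseable JSON line and `err_text` holds the log line, which is the property a client reading stdout depends on.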
MIT — see LICENSE.