sandraschi/arxiv-mcp

arxiv-mcp

Just · Ruff · Python · FastMCP

The high-density arXiv research pipe for AI Agents and Humans.

arxiv-mcp turns the world's primary research source into a clean, actionable data stream. It searches papers, extracts clean Markdown from arXiv's experimental HTML, maps citation lineages, and stashes everything in a searchable local depot.


Why use arxiv-mcp?

  1. Clean Text Extraction: Stop fighting multi-column PDFs. We prefer arXiv's experimental HTML to give you (and your agents) clean, structured Markdown.
  2. Local Depot (RAG-Ready): Any paper you ingest is indexed in a local SQLite FTS5 database. Search thousands of papers by keyword in milliseconds.
  3. Citation Graphs: Follow the intellectual lineage of any paper using Semantic Scholar integration.
  4. DOI Resolution: Resolve any DOI to metadata and OA PDF via Unpaywall + Crossref. Fetches open-access full text from 50,000+ publishers — no API keys required.
  5. AI Lab Blog Support: Beyond arXiv, we fetch from Anthropic, DeepMind, and Google Research blogs.
  6. Agent Native: Built on FastMCP 3.2.0, supporting sophisticated features like sampling (ctx.sample) and bundled skills.
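
To make the Local Depot idea concrete, here is a minimal sketch of keyword search over ingested papers with SQLite FTS5. It is not the project's actual schema: the table name `papers`, its columns, and the helper functions are hypothetical, and the sketch assumes your Python build ships SQLite with the FTS5 extension enabled.

```python
import sqlite3


def open_depot(path=":memory:"):
    """Open (or create) a tiny full-text index. The schema is illustrative only."""
    conn = sqlite3.connect(path)
    # arxiv_id is stored but not tokenized; title and body are searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS papers "
        "USING fts5(arxiv_id UNINDEXED, title, body)"
    )
    return conn


def ingest(conn, arxiv_id, title, body):
    """Add one paper's text to the index."""
    conn.execute(
        "INSERT INTO papers (arxiv_id, title, body) VALUES (?, ?, ?)",
        (arxiv_id, title, body),
    )
    conn.commit()


def search(conn, query, limit=10):
    """Return (arxiv_id, title) pairs matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT arxiv_id, title FROM papers "
        "WHERE papers MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    depot = open_depot()
    # Made-up identifiers and text, used only to exercise the index.
    ingest(depot, "0000.00001", "Grasp planning for robot arms", "We study grasp planning.")
    ingest(depot, "0000.00002", "Sparse attention in transformers", "Attention can be sparse.")
    print(search(depot, "grasp"))
```

Ordering by FTS5's built-in `rank` column puts the most relevant match first, which is usually what an agent wants when it asks the depot for context.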

Documentation Index

  • 🚀 Installation: Getting up and running step by step.
  • 🏗️ Architecture: How the backend, frontend, and storage layers work.
  • 🔭 arXiv Context: Philosophy on recency and why HTML > PDF.
  • 🛠️ MCP Server: Complete manifest of tools, prompts, and skills.
  • 📊 Web Dashboard: Features and usage patterns for the UI.
  • 🔗 DOI Resolution: How Unpaywall + Crossref work, OA statuses explained, publishing ecosystem.
  • FastMCP 3+ Features: How we use dual transport, skills, prefab, prompts, sampling, safety wrapping, and more.

Quick Start (30 Seconds)

git clone https://github.com/sandraschi/arxiv-mcp.git
cd arxiv-mcp
uv sync

That's it. Now configure your MCP client (see below).


Configuring MCP Clients

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "arxiv-mcp": {
      "command": "uv",
      "args": ["run", "--directory", "C:\\path\\to\\arxiv-mcp", "python", "-m", "arxiv_mcp", "--stdio"]
    }
  }
}

Replace C:\\path\\to\\arxiv-mcp with the actual path to your clone.
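
If you would rather generate this entry than hand-edit it, a small script can take care of the backslash escaping. The sketch below builds the same structure shown above and lets `json.dumps` double the backslashes. The helper name `claude_desktop_entry` is hypothetical, and the path is still the placeholder from this README.

```python
import json
from pathlib import PureWindowsPath


def claude_desktop_entry(clone_dir):
    """Build the mcpServers entry shown above for a given clone directory."""
    return {
        "mcpServers": {
            "arxiv-mcp": {
                "command": "uv",
                "args": [
                    "run", "--directory", str(clone_dir),
                    "python", "-m", "arxiv_mcp", "--stdio",
                ],
            }
        }
    }


if __name__ == "__main__":
    # Placeholder path from the README; substitute the location of your clone.
    clone = PureWindowsPath(r"C:\path\to\arxiv-mcp")
    # json.dumps escapes each backslash, so the output is valid JSON.
    print(json.dumps(claude_desktop_entry(clone), indent=2))
```

Merge the printed object into your existing claude_desktop_config.json rather than overwriting the file, since other servers may already be registered there.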

Cursor

In Cursor settings → Features → MCP Servers → Add new MCP server:

Name: arxiv-mcp
Type: command
Command: uv run --directory C:\path\to\arxiv-mcp python -m arxiv_mcp --stdio

MCPB Package (Claude Desktop drag-and-drop)

If you have the MCPB CLI installed:

just mcpb-pack

This creates dist/arxiv-mcp.mcpb — drag this file into Claude Desktop to install.

Alternatively, download the pre-built .mcpb from the Releases page.


Full Stack (Backend + Web Dashboard)

Requires Node.js in addition to Python/uv.

cd arxiv-mcp\web_sota
.\start.bat

This starts both the backend and Vite dashboard, then opens http://127.0.0.1:10771 in your browser.


Using just

After setup, just is available for common tasks:

just lint         # Ruff lint Python
just lint-web     # Biome lint frontend
just fix          # Ruff auto-fix Python
just test         # Run Python tests
just serve        # Start backend only (HTTP)
just stdio        # Start backend only (stdio)
just dev          # Full stack (backend + Vite)
just sync         # uv sync with dev extras

Run just --list to see all recipes.


What can you do?

  • Discovery: "What are the most cited papers in cs.RO from the last week?"
  • Deep Read: "Pull the full text of 2401.00001 and audit its methods for reproducibility."
  • Synthesis: "Compare the abstracts of these 5 papers for contradictions in their consciousness claims."
  • Expansion: "Save this whole thread of citations into my local corpus."
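
Prompts like the Deep Read example refer to a paper by its arXiv identifier. As a rough sketch of the "HTML first, PDF as fallback" preference described earlier, the helper below validates a new-style identifier and lists candidate URLs in that order. The function names are hypothetical and this is not necessarily how the server resolves papers. It assumes arXiv's public /abs/, /html/, and /pdf/ URL layout, and the HTML rendering is not available for every paper.

```python
import re

# New-style arXiv identifiers: YYMM.NNNNN with an optional version suffix.
ARXIV_ID = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def arxiv_urls(identifier):
    """Map an arXiv identifier to its abstract, HTML, and PDF URLs."""
    cleaned = identifier.strip()
    if ARXIV_ID.match(cleaned) is None:
        raise ValueError(f"not a new-style arXiv identifier: {identifier!r}")
    return {
        "abs": f"https://arxiv.org/abs/{cleaned}",
        "html": f"https://arxiv.org/html/{cleaned}",
        "pdf": f"https://arxiv.org/pdf/{cleaned}",
    }


def candidate_sources(identifier):
    """Full-text sources in order of preference: HTML first, then PDF."""
    urls = arxiv_urls(identifier)
    return [urls["html"], urls["pdf"]]


if __name__ == "__main__":
    for url in candidate_sources("2401.00001"):
        print(url)
```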

Changelog

See CHANGELOG.md for release notes.

🛡️ Industrial Quality Stack

This project adheres to SOTA 14.1 industrial standards for high-fidelity agentic orchestration:

  • Python (Core): Ruff for linting and formatting. Zero tolerance for print statements in core handlers (Ruff rule T201).
  • Webapp (UI): Biome for sub-millisecond linting. Strict noConsoleLog enforcement.
  • Protocol Compliance: Hardened stdout/stderr isolation to ensure crash-resistant JSON-RPC communication.
  • Automation: Justfile recipes for all fleet operations (just lint, just fix, just dev).
  • Security: Automated audits via bandit and safety.
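
The stdout/stderr isolation point matters because a stdio MCP server sends its JSON-RPC messages over stdout, so any stray print corrupts the stream. The sketch below illustrates the general pattern only and is not this project's code: diagnostics go through a logger bound to stderr, and a single writer function owns stdout. Both helper names are hypothetical.

```python
import json
import logging
import sys


def make_stderr_logger(name, stream=None):
    """Return a logger that writes only to stderr (or to the given stream)."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    # Do not bubble up to the root logger, which might be configured differently.
    logger.propagate = False
    return logger


def write_message(payload, out=None):
    """Write one JSON-RPC message as a single line; the only code touching stdout."""
    target = out if out is not None else sys.stdout
    target.write(json.dumps(payload) + "\n")
    target.flush()


if __name__ == "__main__":
    log = make_stderr_logger("demo")
    log.info("handling request")  # diagnostics go to stderr
    write_message({"jsonrpc": "2.0", "id": 1, "result": {}})  # protocol goes to stdout
```

A lint rule such as T201 complements this pattern by rejecting print calls before they can reach stdout at all.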

License

MIT — see LICENSE.
