PaperPilot is a CLI research agent for scholarly literature review across AI, biomedicine, and AI for Science.
It turns one user request into a traceable, evidence-based research workflow and generates bilingual reports (zh/en) in Markdown, HTML, and PDF.
The Cloudflare Workers online demo provides a lightweight browser experience: it uses an OpenAI-compatible LLM to generate search plans, queries public paper metadata sources, and lets users download a lightweight Markdown or HTML report. The full PaperPilot CLI remains the complete workflow for screened corpora, PDF/full-text handling, evidence ledgers, bilingual PDF output, and Obsidian Wiki export.
PaperPilot is not a chatbot. It is an interactive scientific workflow:
- Parse natural-language research requests
- Build an explicit search protocol with inclusion/exclusion rules
- Query multi-source literature APIs
- Normalize, deduplicate, and screen papers
- Verify URLs/PDF/code availability
- Synthesize evidence and generate review reports
- Output structured artifacts for reproducibility
Each run creates a dedicated folder under runs/ with full state, logs, and intermediate files.
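The normalize-and-deduplicate step listed above can be illustrated with a minimal sketch. It assumes papers are plain dictionaries with hypothetical `doi` and `title` keys; PaperPilot's actual `Paper` schema and matching rules are not shown here and may differ.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation/whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def dedup_key(paper: dict) -> str:
    """Prefer a DOI as the identity key; fall back to the normalized title."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        return f"doi:{doi}"
    return f"title:{normalize_title(paper.get('title', ''))}"


def deduplicate(papers: list[dict]) -> list[dict]:
    """Keep the first record seen for each key, preserving input order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for paper in papers:
        key = dedup_key(paper)
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique
```

This sketch does not merge a record that has a DOI with a copy of the same paper that lacks one; a more robust approach would need to compare several identifiers per record.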
- Natural-language intake with LLM-assisted interpretation
- Cloudflare Workers online demo for lightweight search plans, public-source candidates, and downloadable Markdown/HTML reports
- Interactive shell with:
  - `/model` to manage LLM profiles
  - `/sources` to inspect search source/API status
  - `/doctor` for quick self-checks
- Multi-source retrieval with source registry and diagnostics
- Resume/inspect modes for reproducible research sessions
- Protocol-aware search using plan + diversified keywords
- Canonicalized `Paper` schema and robust deduplication
- Core/adjacent/excluded paper classification
- PDF + code-link verification (no paywall bypass)
- Optional full-text extraction from downloadable PDFs
- Canonical bilingual report model
- Consistent `[1][2][3]` citation mapping
- Method taxonomy and evidence matrix
- Markdown + HTML + PDF outputs with aligned content
- Browser demo can download a lightweight Markdown/HTML briefing based on public metadata and abstracts
- Final report view keeps up to 100 papers by default, without a hard minimum
- Obsidian Wiki export with paper, method, topic, and claim notes
- Quality gates and reflection workflow
- Evidence ledger linking claims to corpus evidence
- Review checks for citation compliance and source reliability
- Event stream logs for auditability
Default free sources:
- arXiv
- Semantic Scholar
- OpenAlex
- Crossref
- OpenReview
- PubMed / NCBI E-utilities
- Europe PMC
- bioRxiv / medRxiv
- DBLP
- ACL Anthology
- Papers.cool
Optional API-key sources:
- DeepXiv / Agentic Data
- CORE
- Lens.org Scholarly API
- IEEE Xplore
- Springer Nature
- Elsevier / Scopus
- Dimensions
python -m pip install paperpilot -i https://pypi.org/simple

Local development:
git clone https://github.com/CHB-learner/PaperPilot.git
cd PaperPilot
python -m pip install -e .

PaperPilot requires OpenAI-compatible LLM settings for query understanding, planning, synthesis, and report generation.
On first run, it creates an editable configuration template at:
~/.paperpilot/config.json
Minimal default template:
{
"active": "default",
"profiles": {
"default": {
"api_key": "",
"base_url": "",
"model": "gpt-5.2"
}
},
"sources": {
"core": {"enabled": null, "api_key": "", "base_url": ""},
"lens": {"enabled": null, "api_key": "", "base_url": ""},
"ieee": {"enabled": null, "api_key": "", "base_url": ""},
"springer": {"enabled": null, "api_key": "", "base_url": ""},
"elsevier": {"enabled": null, "api_key": "", "base_url": ""},
"dimensions": {"enabled": null, "api_key": "", "base_url": ""},
"deepxiv": {"enabled": null, "api_key": "", "base_url": ""}
}
}

Notes:
- Leave optional source API keys empty if unavailable.
- `enabled: null` means auto-enable once a valid key is provided.
- `~/.paperpilot/config.json` is not committed; edit it directly or use CLI commands.
PaperPilot config set --base-url https://api.deepseek.com --model deepseek-chat
PaperPilot config import ./api.json
PaperPilot config list
PaperPilot config use deepseek
PaperPilot config show
PaperPilot --doctor

PaperPilot sources list
PaperPilot sources config core
PaperPilot sources config deepxiv
PaperPilot sources enable core
PaperPilot sources test core

Inside interactive mode, use `/sources` and `/doctor`.
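The `enabled: null` rule from the configuration notes can be sketched in a few lines. This is an illustration under stated assumptions, not PaperPilot's actual loader: the function names are hypothetical, "a valid key" is approximated as "a non-empty key", and an explicit `true`/`false` is assumed to override the automatic behavior.

```python
import json
from pathlib import Path


def source_is_enabled(source_cfg: dict) -> bool:
    """Resolve the tri-state `enabled` flag for one optional source.

    Assumption: null (None) means "enabled only if an API key is present";
    an explicit true/false wins. The real CLI may also validate the key.
    """
    enabled = source_cfg.get("enabled")
    if enabled is None:
        return bool((source_cfg.get("api_key") or "").strip())
    return bool(enabled)


def load_enabled_sources(config_path: Path) -> list[str]:
    """Return the names of optional sources that resolve to enabled."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    sources = config.get("sources", {})
    return [name for name, cfg in sources.items() if source_is_enabled(cfg)]
```

With the minimal default template, every optional source has an empty key, so `load_enabled_sources` would return an empty list until a key is filled in.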
The hosted demo runs on Cloudflare Workers at https://paperpilot.aleck-757.workers.dev/ and serves /api/literature-search from the Worker. wrangler.jsonc includes safe defaults for the online experience:
LLM_BASE_URL=https://api.deepseek.com
LLM_MODEL=deepseek-v4-flash
LLM_API_KEY=123456
Replace the placeholder LLM_API_KEY in Cloudflare Variables and Secrets with a real server-side key. The frontend calls the Worker API and never embeds the key in browser code. The online demo uses OpenAlex and Crossref as public metadata sources. To avoid public API rate limits, Semantic Scholar is skipped unless SEMANTIC_SCHOLAR_API_KEY is configured.
| Source | Access page |
|---|---|
| CORE | https://core.ac.uk/services/api |
| Lens.org | https://docs.api.lens.org/ |
| IEEE Xplore | https://developer.ieee.org/getting_started |
| Springer Nature | https://dev.springernature.com/ |
| Elsevier / Scopus | https://dev.elsevier.com/ |
| Dimensions | https://docs.dimensions.ai/dsl/api.html |
| DeepXiv / Agentic Data | https://data.rag.ac.cn/api/docs |
| Papers.cool | https://papers.cool |
Interactive usage:
PaperPilot

Command mode example:
PaperPilot "RNA inverse folding sequence design" \
--auto-confirm \
--max-papers 50 \
--since-year 2021 \
--github-filter required \
--sources auto \
--mode apa \
--quality balanced

Import local corpus and skip download:
PaperPilot "RNA inverse folding sequence design" \
--auto-confirm \
--user-corpus ./papers \
--user-corpus references.bib \
--no-download

Inspect/resume workflow:
PaperPilot inspect runs/<task-id>
PaperPilot resume runs/<task-id>

PaperPilot follows this state-machine pipeline:
Intake -> Protocol -> Search -> Corpus -> Screening -> Verification -> Synthesis -> Review -> Report
flowchart LR
U["User request"] --> C["Run context"]
C --> QA["Query understanding"]
QA --> PL["Planning + Protocol"]
PL --> ST["Source Registry search"]
ST --> NB["Corpus normalization"]
NB --> SC["Core / adjacent screening"]
SC --> VF["Verification + PDF + code checks"]
VF --> SY["Literature matrix"]
SY --> QG["Quality gate + reflection"]
QG --> EL["Evidence ledger"]
EL --> RP["Report render: ZH / EN"]
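The linear stage order above can be modeled as a small state machine. The sketch below is only an illustration: the `Stage` enum and both helper functions are hypothetical names, and how PaperPilot actually records progress in `state.json` is not assumed.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    INTAKE = "Intake"
    PROTOCOL = "Protocol"
    SEARCH = "Search"
    CORPUS = "Corpus"
    SCREENING = "Screening"
    VERIFICATION = "Verification"
    SYNTHESIS = "Synthesis"
    REVIEW = "Review"
    REPORT = "Report"


# Enum members iterate in definition order, which matches the pipeline order.
STAGE_ORDER = list(Stage)


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage that follows `current`, or None after the final stage."""
    index = STAGE_ORDER.index(current)
    return STAGE_ORDER[index + 1] if index + 1 < len(STAGE_ORDER) else None


def remaining_stages(last_completed: Optional[Stage]) -> list[Stage]:
    """List the stages still to run, e.g. when resuming an interrupted run."""
    if last_completed is None:
        return STAGE_ORDER.copy()
    return STAGE_ORDER[STAGE_ORDER.index(last_completed) + 1:]
```

For example, a run that stopped after `Stage.SYNTHESIS` would have `Review` and `Report` left to execute.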
runs/<task-id>/ will contain:
- `task.json` / `state.json` / `events.jsonl` / `manifest.json`
- `planning/`: query understanding, search plan, protocol, prompt and registry manifests
- `search/`: raw normalized metadata and source diagnostics
- `corpus/`: screened corpus, core/adjacent/excluded sets, ranked report papers
- `verification/`: verification records, quality gate, reflection, download log, evidence ledger, review findings
- `synthesis/`: literature matrix and field-level synthesis
- `reports/`: `report.canonical.json`, bilingual Markdown, HTML, and PDF reports
- `assets/pdfs/` and `assets/fulltext/`: downloaded open PDFs and extracted full text
- `wiki/obsidian/`: Obsidian knowledge graph with notes, wikilinks, and lint metadata
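A run folder can be examined with ordinary file tooling. The sketch below assumes that `events.jsonl` follows the usual JSON Lines convention (one JSON object per line), which is inferred from the file extension only; the fields inside each event are not assumed, and both function names are hypothetical.

```python
import json
from pathlib import Path


def read_events(run_dir: Path) -> list[dict]:
    """Parse events.jsonl (one JSON object per line), skipping blank lines."""
    events = []
    with (run_dir / "events.jsonl").open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events


def missing_artifacts(run_dir: Path) -> list[str]:
    """Report which of the top-level files and folders listed above are absent."""
    expected = [
        "task.json", "state.json", "events.jsonl", "manifest.json",
        "planning", "search", "corpus", "verification", "synthesis", "reports",
    ]
    return [name for name in expected if not (run_dir / name).exists()]
```

Calling `missing_artifacts(Path("runs") / task_id)` on an interrupted run gives a quick view of how far the pipeline got before resuming it.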
Each successful run generates runs/<task-id>/wiki/obsidian/ by default. Open that folder as an Obsidian vault to browse:
- `index.md`: research entry point and reported-paper overview
- `papers/`: one note per reported paper with citation label, PDF/code links, method family, and evidence basis
- `methods/`: method-family notes linked to representative papers
- `topics/`: query/subtopic notes
- `claims/`: evidence-map claim notes
- `_meta/manifest.json` and `_meta/wiki_lint.json`: provenance, hashes, broken-link checks
Use --no-obsidian-wiki to skip Wiki generation.
For a public-safe ScholarFlow-style vault layout and config template, see:
Example summary.md auto-index table:
| Date | Paper | Notes | Code | Source | Remarks |
|---|---|---|---|---|---|
| 2026.05.20 | CitationGraph-RAG | To read | GitHub | arXiv | Public demo row |
| 2026.05.18 | BenchAgent-Eval | Draft note |  | OpenReview | Sanitized example |
This table is written as normal Markdown, not inside a fenced code block, so GitHub can render it.
- `any`: keep all papers and annotate code availability
- `required`: keep only papers with detected code repositories in final view
- `none`: keep only papers without detected public code links
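The three filter values above map onto a simple selection rule. The following sketch assumes each paper is a dictionary with a hypothetical `code_url` key that is empty or missing when no public repository was detected; the `has_code` annotation is likewise an illustrative name, not PaperPilot's real field.

```python
def apply_github_filter(papers: list[dict], mode: str) -> list[dict]:
    """Filter papers by code availability following the three documented modes."""
    if mode == "any":
        # Keep everything; annotate availability on a copy of each record.
        return [{**p, "has_code": bool(p.get("code_url"))} for p in papers]
    if mode == "required":
        # Keep only papers with a detected code repository.
        return [p for p in papers if p.get("code_url")]
    if mode == "none":
        # Keep only papers without a detected public code link.
        return [p for p in papers if not p.get("code_url")]
    raise ValueError(f"unknown github filter mode: {mode!r}")
```

Raising on an unknown mode is a design choice of this sketch, so that a typo in the filter value fails loudly instead of silently returning every paper.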
--max-papers INT maximum papers in final report view; default: 100
--min-report-papers INT optional minimum report size; default: 0
--since-year INT preferred lower year bound
--github-filter any|required|none
--github-search-limit INT
--no-download skip PDF downloads
--pdf-limit INT maximum PDFs to download
--user-corpus PATH repeatable local corpus path
--mode quick|apa|systematic
--interaction auto|gated
--quality fast|balanced|strict
--include-adjacent include adjacent papers in appendices
--sources auto|all|core|biomed|cs|configured
--enable-source SOURCE enable one source (repeatable)
--disable-source SOURCE disable one source (repeatable)
--no-obsidian-wiki skip Obsidian Wiki export
See paperpilot --help for full options and Chinese/English output.
- Keep run outputs and generated artifacts out of source control.
- Keep API keys out of git history.
- Prefer `.gitignore` over manual cleanup.
- Use semantic tags for releases and keep `README` + docs aligned.
- Keep `.github/workflows/*`, `RELEASING.md`, `CHANGELOG.md` in sync when publishing.
- Ensure `~/.paperpilot/config.json`, `api.json`, and `.env` with credentials are never committed.
- Add/keep `LICENSE` and `.gitignore`.
- Add source code and tags before publishing release assets.
- Publish GitHub Pages from `docs/`.
- Keep versions in `pyproject.toml`, `literature_agent/__init__.py`, and generated manifests aligned.
# dry-run checks only
./scripts/release_everywhere.sh --dry-run
# normal release (pushed commit + tag + GH release + PyPI)
export PYPI_TOKEN='pypi-...'
./scripts/release_everywhere.sh
# release without publishing to PyPI
./scripts/release_everywhere.sh --no-pypi

Suggested publish flow (full):
python -m unittest discover -s tests
python -m compileall literature_agent
./publish_pypi.sh --dry-run --version <VERSION>
git add -A
git commit -m "chore: release v<VERSION>"
git tag -a v<VERSION> -m "v<VERSION>"
git push origin main --tags
./publish_pypi.sh --version <VERSION>

For GitHub Pages: enable Pages to deploy from main + /docs, or rely on .github/workflows/gh-pages.yml.
PaperPilot is shaped by ideas from open academic-research and agent projects. Thanks to these projects and their authors for making their work public:
- LLMForEverybody for Agent design-pattern learning material.
- academic-research-skills for research integrity, source verification, and structured synthesis inspiration.
- DeepTutor for Tool/Capability-style agent architecture ideas.
- obsidian-wiki for the Obsidian Wiki export direction.
- Research-Paper-Writing-Skills, research-writing-skill, and SLR-FC for literature review, research writing, and systematic-review workflow references.
If you use PaperPilot in your work, include the repository URL and version used so results are reproducible.
