PaperPilot

PaperPilot is a CLI research agent for scholarly literature review across AI, biomedicine, and AI for Science.
It turns one user request into a traceable, evidence-based research workflow and generates bilingual reports (zh/en) in Markdown, HTML, and PDF.

The Cloudflare Workers online demo provides a lightweight browser experience: it uses an OpenAI-compatible LLM to generate search plans, queries public paper metadata sources, and lets users download a lightweight Markdown or HTML report. The full PaperPilot CLI remains the complete workflow for screened corpora, PDF/full-text handling, evidence ledgers, bilingual PDF output, and Obsidian Wiki export.

✨ What PaperPilot does

PaperPilot is not a chatbot. It is an interactive scientific workflow:

Parse natural-language research requests
Build an explicit search protocol with inclusion/exclusion rules
Query multi-source literature APIs
Normalize, deduplicate, and screen papers
Verify URLs/PDF/code availability
Synthesize evidence and generate review reports
Output structured artifacts for reproducibility

Each run creates a dedicated folder under runs/ with full state, logs, and intermediate files.

🚀 Highlights

Core experience

Natural-language intake with LLM-assisted interpretation
Cloudflare Workers online demo for lightweight search plans, public-source candidates, and downloadable Markdown/HTML reports
Interactive shell with:
- /model to manage LLM profiles
- /sources to inspect search source/API status
- /doctor for quick self-checks
Multi-source retrieval with source registry and diagnostics
Resume/inspect modes for reproducible research sessions

Retrieval and screening

Protocol-aware search using plan + diversified keywords
Canonicalized Paper schema and robust deduplication
Core/adjacent/excluded paper classification
PDF + code-link verification (no paywall bypass)
Optional full-text extraction from downloadable PDFs
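The deduplication step can be pictured with a small sketch. It assumes a hypothetical `Paper` record with `title`, `doi`, and `year` fields and merges records that share either a DOI or a normalized title. PaperPilot's canonical schema and matching rules may differ.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    # Hypothetical minimal record, not PaperPilot's canonical Paper schema.
    title: str
    doi: Optional[str] = None
    year: Optional[int] = None
    source: str = ""


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace.

    ASCII-only: non-Latin titles normalize to "" and fall back to DOI matching.
    """
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def match_keys(paper: Paper) -> list[str]:
    """Keys under which two records count as the same work (DOI or normalized title)."""
    keys = []
    if paper.doi:
        keys.append("doi:" + paper.doi.strip().lower())
    title = normalize_title(paper.title)
    if title:
        keys.append("title:" + title)
    return keys


def deduplicate(papers: list[Paper]) -> list[Paper]:
    """Keep the first record per work and fill its missing DOI/year from later duplicates."""
    kept: list[Paper] = []
    index: dict[str, int] = {}  # match key -> position in `kept`
    for paper in papers:
        keys = match_keys(paper)
        hit = next((index[k] for k in keys if k in index), None)
        if hit is None:
            hit = len(kept)
            kept.append(paper)
        else:
            existing = kept[hit]
            existing.doi = existing.doi or paper.doi
            existing.year = existing.year or paper.year
        for k in keys:
            index.setdefault(k, hit)  # later variants can match on any known key
    return kept
```

Matching on either key lets an arXiv record without a DOI merge with the same paper returned by OpenAlex or Crossref with one.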

Reporting

Canonical bilingual report model
Consistent [1][2][3] citation mapping
Method taxonomy and evidence matrix
Markdown + HTML + PDF outputs with aligned content
Browser demo can download a lightweight Markdown/HTML briefing based on public metadata and abstracts
Final report view keeps up to 100 papers by default, without a hard minimum
Obsidian Wiki export with paper, method, topic, and claim notes
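Consistent `[1][2][3]` numbering can be sketched as numbering papers in order of first citation in one language-neutral ordering, then applying the same map to both the zh and en text. The `{cite:ID}` placeholder syntax below is hypothetical and not PaperPilot's report model.

```python
import re

CITE = re.compile(r"\{cite:([^}]+)\}")  # hypothetical placeholder syntax


def build_citation_map(sections: list[str]) -> dict[str, int]:
    """Number paper IDs 1, 2, 3, ... in order of first appearance across sections."""
    mapping: dict[str, int] = {}
    for text in sections:
        for paper_id in CITE.findall(text):
            mapping.setdefault(paper_id, len(mapping) + 1)
    return mapping


def render(text: str, mapping: dict[str, int]) -> str:
    """Replace each placeholder with its stable bracketed number."""
    return CITE.sub(lambda m: f"[{mapping[m.group(1)]}]", text)
```

Building the map once and reusing it is what keeps `[1]` pointing at the same paper in every output format and language.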

Quality controls

Quality gates and reflection workflow
Evidence ledger linking claims to corpus evidence
Review checks for citation compliance and source reliability
Event stream logs for auditability
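An evidence ledger can be modeled as claims that each point to corpus paper IDs; a review check then flags claims with no evidence, or with evidence outside the screened corpus. The record shape below is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    # Hypothetical ledger entry: a report claim and the paper IDs supporting it.
    text: str
    evidence: list[str] = field(default_factory=list)


def review_ledger(claims: list[Claim], corpus_ids: set[str]) -> list[str]:
    """Return human-readable findings for unsupported or out-of-corpus claims."""
    findings: list[str] = []
    for i, claim in enumerate(claims, start=1):
        if not claim.evidence:
            findings.append(f"claim {i}: no supporting evidence")
        missing = [pid for pid in claim.evidence if pid not in corpus_ids]
        if missing:
            findings.append(f"claim {i}: evidence not in corpus: {', '.join(missing)}")
    return findings
```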

🗂 Source stack

Default free sources:

arXiv
Semantic Scholar
OpenAlex
Crossref
OpenReview
PubMed / NCBI E-utilities
Europe PMC
bioRxiv / medRxiv
DBLP
ACL Anthology
Papers.cool

Optional API-key sources:

DeepXiv / Agentic Data
CORE
Lens.org Scholarly API
IEEE Xplore
Springer Nature
Elsevier / Scopus
Dimensions

🛠 Installation

```bash
python -m pip install paperpilot -i https://pypi.org/simple
```

Local development:

```bash
git clone https://github.com/CHB-learner/PaperPilot.git
cd PaperPilot
python -m pip install -e .
```

⚙️ LLM + Source Configuration

PaperPilot requires OpenAI-compatible LLM settings for query understanding, planning, synthesis, and report generation.

On first run, it creates an editable configuration template at:

~/.paperpilot/config.json

Minimal default template:

```json
{
  "active": "default",
  "profiles": {
    "default": {
      "api_key": "",
      "base_url": "",
      "model": "gpt-5.2"
    }
  },
  "sources": {
    "core": {"enabled": null, "api_key": "", "base_url": ""},
    "lens": {"enabled": null, "api_key": "", "base_url": ""},
    "ieee": {"enabled": null, "api_key": "", "base_url": ""},
    "springer": {"enabled": null, "api_key": "", "base_url": ""},
    "elsevier": {"enabled": null, "api_key": "", "base_url": ""},
    "dimensions": {"enabled": null, "api_key": "", "base_url": ""},
    "deepxiv": {"enabled": null, "api_key": "", "base_url": ""}
  }
}
```

Notes:

Leave optional source API keys empty if unavailable.
enabled: null means auto-enable once a valid key is provided.
~/.paperpilot/config.json lives in your home directory and must never be committed; edit it directly or use the CLI config commands.
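The `enabled: null` rule behaves like a three-state switch. A minimal sketch, assuming only the JSON layout of the template; the helper names are hypothetical and not part of PaperPilot's CLI, and "valid key" is approximated as "non-empty key" because no key is actually tested here.

```python
import json
from pathlib import Path
from typing import Any

CONFIG_PATH = Path.home() / ".paperpilot" / "config.json"


def source_enabled(entry: dict[str, Any]) -> bool:
    """true/false are explicit; null (None) means enabled only if an API key is set."""
    enabled = entry.get("enabled")
    if enabled is None:
        # Approximation: a non-empty key counts as provided; it is not validated here.
        return bool((entry.get("api_key") or "").strip())
    return bool(enabled)


def load_config(path: Path = CONFIG_PATH) -> tuple[dict[str, Any], list[str]]:
    """Return the active LLM profile and the names of enabled optional sources."""
    config = json.loads(path.read_text(encoding="utf-8"))
    profile = config["profiles"][config["active"]]
    sources = [name for name, entry in config.get("sources", {}).items()
               if source_enabled(entry)]
    return profile, sources
```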

CLI config commands

```bash
PaperPilot config set --base-url https://api.deepseek.com --model deepseek-chat
PaperPilot config import ./api.json
PaperPilot config list
PaperPilot config use deepseek
PaperPilot config show
PaperPilot --doctor

PaperPilot sources list
PaperPilot sources config core
PaperPilot sources config deepxiv
PaperPilot sources enable core
PaperPilot sources test core
```

Inside interactive mode, use /sources and /doctor.

Cloudflare Workers online demo configuration

The hosted demo runs on Cloudflare Workers at https://paperpilot.aleck-757.workers.dev/ and serves /api/literature-search from the Worker. wrangler.jsonc ships with these defaults for the online experience (the API key is a placeholder):

```
LLM_BASE_URL=https://api.deepseek.com
LLM_MODEL=deepseek-v4-flash
LLM_API_KEY=123456
```

Replace the placeholder LLM_API_KEY in Cloudflare Variables and Secrets with a real server-side key. The frontend calls the Worker API and never embeds the key in browser code. The online demo uses OpenAlex and Crossref as public metadata sources; to avoid public API rate limits, Semantic Scholar is skipped unless SEMANTIC_SCHOLAR_API_KEY is configured.
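For orientation, a public-metadata query of the kind the demo sends to OpenAlex might look like the sketch below. It only builds the request URL with the standard library and makes no network call. `search`, `filter`, and `per-page` are documented OpenAlex query parameters and `mailto` opts into OpenAlex's "polite pool", but this is an illustration, not the Worker's implementation.

```python
from typing import Optional
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"


def openalex_search_url(query: str, since_year: Optional[int] = None,
                        per_page: int = 25, mailto: Optional[str] = None) -> str:
    """Build an OpenAlex /works search URL; no request is sent here."""
    params = {"search": query, "per-page": str(per_page)}
    if since_year is not None:
        # OpenAlex filters are "name:value" pairs (comma-separated when combined).
        params["filter"] = f"from_publication_date:{since_year}-01-01"
    if mailto:
        params["mailto"] = mailto  # identifies the caller to OpenAlex
    return f"{OPENALEX_WORKS}?{urlencode(params)}"
```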

🔑 Source API key references

| Source | Access page |
| --- | --- |
| CORE | https://core.ac.uk/services/api |
| Lens.org | https://docs.api.lens.org/ |
| IEEE Xplore | https://developer.ieee.org/getting_started |
| Springer Nature | https://dev.springernature.com/ |
| Elsevier / Scopus | https://dev.elsevier.com/ |
| Dimensions | https://docs.dimensions.ai/dsl/api.html |
| DeepXiv / Agentic Data | https://data.rag.ac.cn/api/docs |
| Papers.cool | https://papers.cool |

🧪 Quick Start

Interactive usage:

```bash
PaperPilot
```

Command mode example:

```bash
PaperPilot "RNA inverse folding sequence design" \
  --auto-confirm \
  --max-papers 50 \
  --since-year 2021 \
  --github-filter required \
  --sources auto \
  --mode apa \
  --quality balanced
```

Import local corpus and skip download:

```bash
PaperPilot "RNA inverse folding sequence design" \
  --auto-confirm \
  --user-corpus ./papers \
  --user-corpus references.bib \
  --no-download
```

Inspect/resume workflow:

```bash
PaperPilot inspect runs/<task-id>
PaperPilot resume runs/<task-id>
```

🧭 Workflow

PaperPilot follows this state-machine pipeline:

Intake -> Protocol -> Search -> Corpus -> Screening -> Verification -> Synthesis -> Review -> Report

```mermaid
flowchart LR
  U["User request"] --> C["Run context"]
  C --> QA["Query understanding"]
  QA --> PL["Planning + Protocol"]
  PL --> ST["Source Registry search"]
  ST --> NB["Corpus normalization"]
  NB --> SC["Core / adjacent screening"]
  SC --> VF["Verification + PDF + code checks"]
  VF --> SY["Literature matrix"]
  SY --> QG["Quality gate + reflection"]
  QG --> EL["Evidence ledger"]
  EL --> RP["Report render: ZH / EN"]
```
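Because the pipeline is a fixed sequence of states, resuming a run amounts to finding the first stage that has not completed. A minimal sketch of that idea, assuming (hypothetically) that the set of completed stage names can be read from the run, for example from state.json; the real resume logic may be more involved.

```python
from typing import Optional

STAGES = ["Intake", "Protocol", "Search", "Corpus", "Screening",
          "Verification", "Synthesis", "Review", "Report"]


def next_stage(completed: set[str]) -> Optional[str]:
    """Return the first stage not yet completed, or None when the run is finished."""
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None


def remaining_stages(completed: set[str]) -> list[str]:
    """Stages a resume would execute, in pipeline order."""
    start = next_stage(completed)
    return [] if start is None else STAGES[STAGES.index(start):]
```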

📁 Run artifacts

runs/<task-id>/ will contain:

task.json / state.json / events.jsonl / manifest.json
planning/: query understanding, search plan, protocol, prompt and registry manifests
search/: raw and normalized metadata plus source diagnostics
corpus/: screened corpus, core/adjacent/excluded sets, ranked report papers
verification/: verification records, quality gate, reflection, download log, evidence ledger, review findings
synthesis/: literature matrix and field-level synthesis
reports/: report.canonical.json, bilingual Markdown, HTML, and PDF reports
assets/pdfs/ and assets/fulltext/: downloaded open PDFs and extracted full text
wiki/obsidian/: Obsidian knowledge graph with notes, wikilinks, and lint metadata
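Since events.jsonl holds one JSON object per line, a run can be audited with a few lines of standard-library Python. The `type` field name is an assumption; check the actual event records in your run folder and pass a different `key` if needed.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_events(run_dir: Path, key: str = "type") -> Counter:
    """Count events in runs/<task-id>/events.jsonl by the given field (assumed name)."""
    counts: Counter = Counter()
    with (run_dir / "events.jsonl").open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            event = json.loads(line)
            counts[event.get(key, "<missing>")] += 1
    return counts
```

Usage: `summarize_events(Path("runs/<task-id>")).most_common()` lists event kinds by frequency.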

🧠 Obsidian Wiki

Each successful run generates runs/<task-id>/wiki/obsidian/ by default. Open that folder as an Obsidian vault to browse:

index.md: research entry point and reported-paper overview
papers/: one note per reported paper with citation label, PDF/code links, method family, and evidence basis
methods/: method-family notes linked to representative papers
topics/: query/subtopic notes
claims/: evidence-map claim notes
_meta/manifest.json and _meta/wiki_lint.json: provenance, hashes, broken-link checks

Use --no-obsidian-wiki to skip Wiki generation.
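The broken-link check recorded in _meta/wiki_lint.json can be approximated by collecting every `[[target]]`, `[[target|alias]]`, or `[[target#heading]]` wikilink and reporting targets that match no note. This sketch resolves links by note name or vault-relative path only and is not PaperPilot's linter.

```python
import re
from pathlib import Path

# Captures the link target, ignoring an optional #heading and |alias.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def broken_wikilinks(vault: Path) -> dict[str, list[str]]:
    """Map each note (vault-relative path) to wikilink targets that resolve to no note."""
    notes = list(vault.rglob("*.md"))
    # Accept both bare note names and vault-relative paths without ".md".
    names = {p.stem for p in notes} | {
        p.relative_to(vault).with_suffix("").as_posix() for p in notes}
    broken: dict[str, list[str]] = {}
    for note in notes:
        text = note.read_text(encoding="utf-8")
        missing = [t.strip() for t in WIKILINK.findall(text) if t.strip() not in names]
        if missing:
            broken[note.relative_to(vault).as_posix()] = missing
    return broken
```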

For a public-safe ScholarFlow-style vault layout and config template, see:

Example summary.md auto-index table:

| Date | Paper | Notes | Code | Source | Remarks |
| --- | --- | --- | --- | --- | --- |
| 2026.05.20 | CitationGraph-RAG | To read | GitHub | arXiv | Public demo row |
| 2026.05.18 | BenchAgent-Eval | Draft note |  | OpenReview | Sanitized example |

This table is written as normal Markdown, not inside a fenced code block, so GitHub can render it.
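A table like this can be generated from per-paper records. The sketch assumes hypothetical dict keys that mirror the column names and escapes pipe characters so a title cannot break a row; it is not the code that writes summary.md.

```python
COLUMNS = ["Date", "Paper", "Notes", "Code", "Source", "Remarks"]


def _cell(value: object) -> str:
    # Escape pipes and flatten newlines so each record stays on one table row.
    return str(value or "").replace("|", r"\|").replace("\n", " ")


def summary_table(rows: list[dict]) -> str:
    """Render rows as a GitHub-flavored Markdown table, newest date first."""
    ordered = sorted(rows, key=lambda r: r.get("Date", ""), reverse=True)
    lines = ["| " + " | ".join(COLUMNS) + " |",
             "| " + " | ".join("---" for _ in COLUMNS) + " |"]
    for row in ordered:
        lines.append("| " + " | ".join(_cell(row.get(c)) for c in COLUMNS) + " |")
    return "\n".join(lines)
```

Sorting on the zero-padded `YYYY.MM.DD` strings works because they compare lexicographically in date order.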

🧩 Code filter modes

any: keep all papers and annotate code availability
required: keep only papers with a detected code repository in the final report view
none: keep only papers without detected public code links
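The three modes reduce to a simple predicate over each paper's detected code links. A sketch, assuming a hypothetical `code_urls` list on each paper record; `any` only annotates, so it keeps everything.

```python
from typing import Literal

FilterMode = Literal["any", "required", "none"]


def apply_code_filter(papers: list[dict], mode: FilterMode) -> list[dict]:
    """Filter papers by detected code links following the --github-filter modes."""
    if mode == "any":
        # Keep every paper, but record whether code was detected.
        return [{**p, "has_code": bool(p.get("code_urls"))} for p in papers]
    if mode == "required":
        return [p for p in papers if p.get("code_urls")]
    if mode == "none":
        return [p for p in papers if not p.get("code_urls")]
    raise ValueError(f"unknown filter mode: {mode!r}")
```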

🧪 CLI options (important ones)

```text
--max-papers INT                 maximum papers in final report view; default: 100
--min-report-papers INT          optional minimum report size; default: 0
--since-year INT                 preferred lower year bound
--github-filter any|required|none
--github-search-limit INT
--no-download                    skip PDF downloads
--pdf-limit INT                  maximum PDFs to download
--user-corpus PATH               repeatable local corpus path
--mode quick|apa|systematic
--interaction auto|gated
--quality fast|balanced|strict
--include-adjacent               include adjacent papers in appendices
--sources auto|all|core|biomed|cs|configured
--enable-source SOURCE           enable one source (repeatable)
--disable-source SOURCE          disable one source (repeatable)
--no-obsidian-wiki               skip Obsidian Wiki export
```
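To make the documented choices and defaults concrete, here is a sketch that mirrors a subset of these options with `argparse`. It is written from this list and the Quick Start examples only, not taken from PaperPilot's parser; defaults the list does not state (for example for `--mode` or `--sources`) are deliberately left unset.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="PaperPilot")
    parser.add_argument("query", nargs="?", help="natural-language research request")
    parser.add_argument("--auto-confirm", action="store_true")
    parser.add_argument("--max-papers", type=int, default=100,
                        help="maximum papers in final report view")
    parser.add_argument("--min-report-papers", type=int, default=0,
                        help="optional minimum report size")
    parser.add_argument("--since-year", type=int, help="preferred lower year bound")
    parser.add_argument("--github-filter", choices=["any", "required", "none"])
    parser.add_argument("--no-download", action="store_true", help="skip PDF downloads")
    # action="append" makes a flag repeatable and collects its values into a list.
    parser.add_argument("--user-corpus", action="append", default=[], metavar="PATH")
    parser.add_argument("--mode", choices=["quick", "apa", "systematic"])
    parser.add_argument("--quality", choices=["fast", "balanced", "strict"])
    parser.add_argument("--sources",
                        choices=["auto", "all", "core", "biomed", "cs", "configured"])
    parser.add_argument("--enable-source", action="append", default=[], metavar="SOURCE")
    parser.add_argument("--disable-source", action="append", default=[], metavar="SOURCE")
    return parser
```

With `choices`, an invalid value such as `--quality max` is rejected at parse time instead of surfacing later in the run.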

See paperpilot --help for full options and Chinese/English output.

🧱 Development notes

Keep run outputs and generated artifacts out of source control.
Keep API keys out of git history.
Prefer .gitignore over manual cleanup.
Use semantic version tags (v<VERSION>) for releases and keep the README and docs aligned.
Keep .github/workflows/*, RELEASING.md, CHANGELOG.md in sync when publishing.

🧭 Open source checklist

Ensure ~/.paperpilot/config.json, api.json, and .env with credentials are never committed.
Add/keep LICENSE and .gitignore.
Add source code and tags before publishing release assets.
Publish GitHub Pages from docs/.
Keep versions in pyproject.toml, literature_agent/__init__.py, and generated manifests aligned.

One-command release

```bash
# dry-run checks only
./scripts/release_everywhere.sh --dry-run

# normal release: push commit + tag, create GitHub release, publish to PyPI
export PYPI_TOKEN='pypi-...'
./scripts/release_everywhere.sh

# release without publishing to PyPI
./scripts/release_everywhere.sh --no-pypi
```

Suggested publish flow (full):

```bash
python -m unittest discover -s tests
python -m compileall literature_agent
./publish_pypi.sh --dry-run --version <VERSION>
git add -A
git commit -m "chore: release v<VERSION>"
git tag -a v<VERSION> -m "v<VERSION>"
git push origin main --tags
./publish_pypi.sh --version <VERSION>
```

For GitHub Pages: enable Pages to deploy from main + /docs, or rely on .github/workflows/gh-pages.yml.

🙏 Acknowledgements

PaperPilot is shaped by ideas from open academic-research and agent projects. Thanks to these projects and their authors for making their work public:

LLMForEverybody for Agent design-pattern learning material.
academic-research-skills for research integrity, source verification, and structured synthesis inspiration.
DeepTutor for Tool/Capability-style agent architecture ideas.
obsidian-wiki for the Obsidian Wiki export direction.
Research-Paper-Writing-Skills, research-writing-skill, and SLR-FC for literature review, research writing, and systematic-review workflow references.

📚 Citation note

If you use PaperPilot in your work, include the repository URL and version used so results are reproducible.
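One way to record both values automatically is to read the installed distribution version at runtime. This sketch uses `importlib.metadata` with the PyPI distribution name `paperpilot` from the install command; the citation wording is only a suggestion.

```python
from importlib.metadata import PackageNotFoundError, version

REPO_URL = "https://github.com/CHB-learner/PaperPilot"


def paperpilot_citation(dist_name: str = "paperpilot") -> str:
    """Return a short reproducibility note with the repository URL and installed version."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        installed = "unknown (not installed)"
    return f"PaperPilot, {REPO_URL}, version {installed}"
```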
