
Add llms-full.txt for AI consumption #798

Merged
RetricSu merged 2 commits into nervosnetwork:develop from humble-little-bear:add-llms-full-txt on May 20, 2026

@humble-little-bear
Contributor

Summary

This PR adds a llms-full.txt file containing the complete text corpus of the Nervos CKB documentation, following the pattern established by other major blockchain documentation sites (Aptos, BNB Chain, Polkadot, NEAR).

Changes

  • New file: website/static/llms-full.txt — Full documentation corpus (~1.3MB, 173 pages) generated from all .md and .mdx source files under website/docs/. The file starts with the curated llms.txt index, followed by the complete text of every documentation page.
  • New file: scripts/generate-llms-full.js — A Node.js script that regenerates llms-full.txt from source. It:
    • Walks website/docs/ and includes all pages (excluding partials prefixed with _)
    • Strips Docusaurus-specific syntax (frontmatter, import statements, JSX components like <TutorialHeader>, <Tooltip>, <Tabs>, admonition wrappers, etc.)
    • Produces clean Markdown suitable for LLM consumption
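A minimal sketch of the walk-and-concatenate step described above, using only Node.js built-ins. This is not the code in scripts/generate-llms-full.js: the names `collectDocFiles` and `buildCorpus`, the `cleanPage` hook, and the `---` separator between pages are hypothetical, and the sketch skips only `_`-prefixed files, not `_`-prefixed directories.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively collect .md/.mdx files under `dir`, skipping partials whose
// file name starts with "_". Entries are sorted with a plain code-unit
// comparison so repeated runs visit files in the same order.
function collectDocFiles(dir) {
  const entries = fs
    .readdirSync(dir, { withFileTypes: true })
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  const results = [];
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      results.push(...collectDocFiles(full));
    } else if (/\.mdx?$/.test(entry.name) && !entry.name.startsWith("_")) {
      results.push(full);
    }
  }
  return results;
}

// Assemble the corpus: the curated llms.txt index first, then each page.
// `cleanPage` is a hypothetical hook for the MDX-stripping step; the
// "---" separator between pages is an assumption of this sketch.
function buildCorpus(indexText, files, cleanPage = (text) => text) {
  const parts = [indexText.trim()];
  for (const file of files) {
    parts.push(cleanPage(fs.readFileSync(file, "utf8")).trim());
  }
  return parts.join("\n\n---\n\n") + "\n";
}
```

The explicit sort matters for the re-run check in the test plan: the order returned by `fs.readdirSync` depends on the underlying filesystem, so an unsorted walk could emit pages in a different order on another machine.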

Why

llms.txt provides a curated, selective index. llms-full.txt provides the complete text content so that AI assistants and tools can ingest the full documentation without scraping individual pages.

Test plan

  • Verify llms-full.txt is served at https://docs.nervos.org/llms-full.txt after deployment
  • Re-run node scripts/generate-llms-full.js and confirm it reproduces the committed llms-full.txt unchanged

🤖 Generated with Claude Code

- Add scripts/generate-llms-full.js to generate llms-full.txt from docs source
- Add website/static/llms-full.txt containing the full documentation corpus
- The generated file includes all docs pages (excluding partials) with MDX
  components stripped for clean LLM consumption

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@vercel

vercel Bot commented May 13, 2026

@humble-little-bear is attempting to deploy a commit to the CKBA-2026 Team on Vercel.

A member of the Team first needs to authorize it.


vercel Bot commented May 13, 2026

The latest updates on your projects.

Project          Deployment  Actions           Updated (UTC)
nervos-ckb-docs  Ready       Preview, Comment  May 20, 2026 6:53am

Copilot AI (Contributor) left a comment


Pull request overview

This PR adds a full-documentation text corpus (llms-full.txt) intended for AI/LLM consumption, along with a Node.js generator script to rebuild it from the Docusaurus docs sources.

Changes:

  • Adds website/static/llms-full.txt containing the full concatenated docs corpus (prefixed by the curated llms.txt).
  • Adds scripts/generate-llms-full.js to walk website/docs/, clean MDX/Docusaurus syntax, and produce llms-full.txt.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 4 comments.

File                            Description
scripts/generate-llms-full.js   Generator that walks docs, strips/rewrites MDX constructs, and writes the combined corpus.
website/static/llms-full.txt    Generated full corpus output intended to be served directly for AI ingestion.


4 comment threads on scripts/generate-llms-full.js (Outdated)
- Only remove import statements outside fenced code blocks, preserving
  code examples like TypeScript imports inside ``` blocks.
- Parse frontmatter id/slug to generate correct doc URLs instead of
  deriving them purely from file paths.
- Unwrap Docusaurus mdx-code-block + Tabs/TabItem into plain Markdown
  with "**Command:**" / "**Response:**" labels.
- Extract key props from instructional JSX components:
  - CodeTabs → labeled code blocks
  - TutorialHeader → estimated time / tools metadata
  - ImgContainer → Markdown images
  - CopyLink → Markdown links
- Fix CardLayout removal bug where the self-closing regex incorrectly
  consumed nested Card components.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
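The fence-aware import removal described in the commit message above can be sketched as a line scanner that tracks whether it is inside a fenced code block. This is a simplified, hypothetical sketch, not the PR's code: it only recognizes single-line `import ... from "..."` statements, it treats every line opening with three backticks or tildes as a fence toggle (so nested or mismatched fences are not handled), and the frontmatter step drops a leading `---` block without parsing YAML.

```javascript
// Remove a leading YAML frontmatter block delimited by "---" lines.
function stripFrontmatter(source) {
  return source.replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n/, "");
}

// Drop single-line MDX imports such as `import Tabs from "@theme/Tabs";`,
// but only outside fenced code blocks, so code samples keep their imports.
function stripImportsOutsideFences(source) {
  let inFence = false;
  const kept = [];
  for (const line of source.split("\n")) {
    if (/^\s*(`{3,}|~{3,})/.test(line)) {
      inFence = !inFence; // simplistic: every fence line toggles the state
    } else if (!inFence && /^import\s.+\sfrom\s+['"][^'"]+['"];?\s*$/.test(line)) {
      continue; // MDX import outside a code block: drop the line
    }
    kept.push(line);
  }
  return kept.join("\n");
}

// Hypothetical composition of the two cleanup steps.
const cleanMdx = (source) => stripImportsOutsideFences(stripFrontmatter(source));
```

A line-by-line state machine is used here instead of one large regex because a regex cannot easily tell whether a given `import` line sits between an opening and a closing fence.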
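Deriving doc URLs from frontmatter `id`/`slug`, as the commit message above describes, might look roughly like the following. It approximates Docusaurus's rules as we understand them: an `id` replaces the file name in the route, a `slug` starting with `/` is resolved from the docs root, and a relative `slug` is resolved from the file's folder. `readFrontmatter` and `docUrl` are hypothetical helpers, not the PR's implementation. They read only flat `key: value` frontmatter lines and ignore number-prefix stripping and `index`/`README` category pages. The caller is assumed to fold the docs route prefix into `baseUrl`.

```javascript
const path = require("path");

// Minimal frontmatter reader: understands only flat `key: value` lines,
// stripping surrounding quotes. A real script would use a YAML parser.
function readFrontmatter(source) {
  const match = source.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  const data = {};
  if (!match) return data;
  for (const line of match[1].split(/\r?\n/)) {
    const kv = line.match(/^(\w+):\s*(.*?)\s*$/);
    if (kv) data[kv[1]] = kv[2].replace(/^['"]|['"]$/g, "");
  }
  return data;
}

// Approximate a doc URL from its path relative to the docs root.
// `baseUrl` is expected to already include the docs route prefix.
function docUrl(relPath, source, baseUrl) {
  const rel = relPath.split(path.sep).join("/");
  const fm = readFrontmatter(source);
  const dir = path.posix.dirname(rel);
  const dirPrefix = dir === "." ? "" : dir + "/";
  let route;
  if (fm.slug) {
    // Absolute slugs start at the docs root; relative ones at the file's folder.
    route = fm.slug.startsWith("/") ? fm.slug.slice(1) : dirPrefix + fm.slug;
  } else {
    // A frontmatter `id` replaces the file name; otherwise use the file name.
    route = dirPrefix + (fm.id || path.posix.basename(rel, path.posix.extname(rel)));
  }
  return baseUrl.replace(/\/+$/, "") + "/" + route;
}
```

For example, `docUrl("getting-started/intro.md", source, base)` with `id: how-ckb-works` in the frontmatter yields `base + "/getting-started/how-ckb-works"`; the value of `base` (for instance a docs.nervos.org route prefix) is an assumption left to the caller.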
RetricSu merged commit 61ef605 into nervosnetwork:develop on May 20, 2026
3 of 4 checks passed

Labels: None yet
Projects: None yet


3 participants