feat: save documentation info to SQLite database #347

Merged
hargoniX merged 112 commits into leanprover:main from david-christiansen:db
Feb 23, 2026

@david-christiansen david-christiansen commented Jan 19, 2026

This PR adds a SQLite database between doc-gen4's per-module analysis and HTML generation. Lake
runs the single command once per module (and genCore for the core Lean modules Init, Std, Lake, and
Lean); each invocation writes to a shared SQLite database. If the database already exists, the
single command updates it incrementally by deleting the module's old rows and reinserting them. The
fromDb command then reads everything back and generates HTML in parallel. The old pipeline, which
generated HTML directly during module analysis, is removed.
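
The delete-and-reinsert update can be sketched in a few lines of Python. This is only an
illustration of the idea, not the Lean code in this PR: the name_info table name comes from the
schema, but the columns used here (other than module_name and position) and the write_module
helper are assumptions made for the example.

```python
import sqlite3

def write_module(conn, module_name, names):
    """Replace all rows for one module: delete the old rows, then reinsert."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM name_info WHERE module_name = ?", (module_name,))
        conn.executemany(
            "INSERT INTO name_info (module_name, position, name) VALUES (?, ?, ?)",
            [(module_name, pos, name) for pos, name in enumerate(names)],
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE name_info ("
    " module_name TEXT NOT NULL, position INTEGER NOT NULL, name TEXT NOT NULL,"
    " PRIMARY KEY (module_name, position))"
)

write_module(conn, "Demo.A", ["Demo.foo", "Demo.bar"])
write_module(conn, "Demo.B", ["Demo.baz"])
# Re-analysis of Demo.A after Demo.bar was removed from the source file.
write_module(conn, "Demo.A", ["Demo.foo"])

rows = conn.execute(
    "SELECT module_name, position, name FROM name_info ORDER BY module_name, position"
).fetchall()
print(rows)
```

Doing the delete and the inserts in one transaction means a reader never sees a module with only
part of its rows, and rows belonging to other modules are left untouched.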

The database makes documentation data available to other tools. Verso can query it for docstrings
instead of consulting the Lean environment, and future tools can use it in ways we haven't
anticipated yet. It also means that HTML generation has access to the full set of declarations
across all modules, which improves the heuristic insertion of links in code.

The Database

Within a module, each item (declaration, module doc, constructor, structure field) is assigned a
sequential position starting from 0. The composite key (module_name, position) is the primary
key for most tables. Constructors and structure fields are interleaved between their parent
declarations, so positions are not contiguous across top-level members. HTML generation reconstructs
module members by querying name_info and module_docs_Markdown, ordered by position.
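
As a rough sketch of how this ordering works, the following Python builds two toy tables and
merges them by position. The table names and the (module_name, position) key are from this PR; the
kind, name, and markdown columns are hypothetical and exist only to make the example
self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE name_info (
  module_name TEXT NOT NULL,
  position    INTEGER NOT NULL,
  name        TEXT NOT NULL,
  kind        TEXT NOT NULL,   -- hypothetical column: 'decl', 'ctor', or 'field'
  PRIMARY KEY (module_name, position)
);
CREATE TABLE module_docs_Markdown (
  module_name TEXT NOT NULL,
  position    INTEGER NOT NULL,
  markdown    TEXT NOT NULL,   -- hypothetical column
  PRIMARY KEY (module_name, position)
);
""")

conn.execute("INSERT INTO module_docs_Markdown VALUES ('Demo', 0, '# Geometry')")
conn.executemany(
    "INSERT INTO name_info VALUES ('Demo', ?, ?, ?)",
    [
        (1, "Point", "decl"),
        (2, "Point.mk", "ctor"),   # constructor, interleaved after its parent
        (3, "Point.x", "field"),
        (4, "Point.y", "field"),
        (5, "origin", "decl"),     # next top-level member: position jumps from 1 to 5
    ],
)

# Merge top-level declarations and module docs into one list ordered by position.
members = conn.execute(
    """
    SELECT position, 'decl' AS item, name AS content
      FROM name_info WHERE module_name = :m AND kind = 'decl'
    UNION ALL
    SELECT position, 'doc', markdown
      FROM module_docs_Markdown WHERE module_name = :m
    ORDER BY position
    """,
    {"m": "Demo"},
).fetchall()
print(members)
```

The gap between positions 1 and 5 is the non-contiguity mentioned above: the constructor and
fields of Point occupy positions 2 through 4.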

The database schema is versioned by two hashes: a DDL hash that detects changes to table
definitions, and a type hash that detects changes to Lean types serialized as blobs (like
RenderedCode and RenderedCode.Tag). If either hash doesn't match, the database is rejected with
an error message asking the user to rebuild. The type hash is computed at compile time from a
string representation of the relevant inductive types, so adding a constructor to RenderedCode.Tag
will invalidate old databases automatically.
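
A minimal sketch of the two-hash check, in Python rather than Lean. The schema_meta table, the
use of SHA-256, and the strings being hashed are all assumptions for illustration; the PR computes
the type hash at compile time from the actual inductive types.

```python
import hashlib
import sqlite3

# Stand-ins for the real table definitions and the string form of the serialized types.
DDL = ("CREATE TABLE name_info (module_name TEXT, position INTEGER,"
       " PRIMARY KEY (module_name, position))")
TYPE_DESCRIPTION = "inductive Tag | keyword | string | const (name) | sort"

def digest(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def create_db(conn, ddl, type_description):
    conn.execute(ddl)
    conn.execute("CREATE TABLE schema_meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO schema_meta VALUES (?, ?)",
        [("ddl_hash", digest(ddl)), ("type_hash", digest(type_description))],
    )
    conn.commit()

def check_db(conn, ddl, type_description):
    """Raise if the database was written with different tables or blob types."""
    stored = dict(conn.execute("SELECT key, value FROM schema_meta"))
    expected = {"ddl_hash": digest(ddl), "type_hash": digest(type_description)}
    for key, value in expected.items():
        if stored.get(key) != value:
            raise RuntimeError(f"{key} mismatch: please rebuild the documentation database")

conn = sqlite3.connect(":memory:")
create_db(conn, DDL, TYPE_DESCRIPTION)
check_db(conn, DDL, TYPE_DESCRIPTION)  # same definitions: accepted

try:
    # Simulate adding a constructor to the serialized type.
    check_db(conn, DDL, TYPE_DESCRIPTION + " | otherTag")
    rejected = False
except RuntimeError:
    rejected = True
print(rejected)
```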

RenderedCode

Lean's pretty printer produces CodeWithInfos (a TaggedText SubexprInfo), which carries
expression types, universe levels, elaboration state, and other metadata that is too large to
serialize. RenderedCode is a TaggedText RenderedCode.Tag that keeps only what is needed for HTML
rendering: which tokens are declaration references (for linking), which are sorts (for linking to
the foundational types page), and which are keywords or strings (for syntax highlighting). The
conversion from CodeWithInfos to RenderedCode is lossy and not reversible. RenderedCode is
serialized to the database as a binary blob.
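
To make the lossy conversion concrete, here is a small Python model of it. The three node classes
loosely mirror a tagged-text tree, and the dictionaries stand in for the rich per-token metadata;
none of these names are the Lean definitions, and the exact set of retained tags is an assumption
based on the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Text:
    value: str

@dataclass(frozen=True)
class Append:
    parts: tuple

@dataclass(frozen=True)
class Tagged:
    tag: object
    body: object

def small_tag(info):
    """Keep only what HTML rendering needs; return None to drop the tag."""
    kind = info.get("kind")
    if kind == "const":
        return ("const", info["name"])      # declaration reference, used for linking
    if kind in ("sort", "keyword", "string"):
        return (kind,)
    return None                              # types, universe levels, etc. are discarded

def simplify(node):
    if isinstance(node, Text):
        return node
    if isinstance(node, Append):
        return Append(tuple(simplify(p) for p in node.parts))
    body = simplify(node.body)
    tag = small_tag(node.tag)
    return body if tag is None else Tagged(tag, body)

def plain_text(node):
    if isinstance(node, Text):
        return node.value
    if isinstance(node, Append):
        return "".join(plain_text(p) for p in node.parts)
    return plain_text(node.body)

rich = Append((
    Tagged({"kind": "keyword"}, Text("def")),
    Text(" f "),
    Tagged({"kind": "subexpr", "expr_type": "Sort 1"}, Append((
        Text("(n : "),
        Tagged({"kind": "const", "name": "Nat", "levels": []}, Text("Nat")),
        Text(")"),
    ))),
    Text(" : "),
    Tagged({"kind": "sort"}, Text("Type")),
))

rendered = simplify(rich)
print(plain_text(rendered))
```

The text is preserved exactly, but the discarded metadata cannot be recovered from the result,
which is why the conversion is not reversible.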

Verso Docstrings

Verso docstrings contain a tree of Doc.Block/Doc.Inline nodes with extension points
(ElabInline/ElabBlock) that hold opaque Dynamic values identified by Name. Different Lean
packages can register their own extension types, so there is no way to know all possible types at
compile time. The serialization uses a registry of handlers (DocstringValues) keyed by name. If a
handler exists for a given extension type, the payload is serialized with it. If not, only the name
is stored, and on deserialization the unknown extension is replaced with a sentinel value. In Verso
docstrings, the content underneath one of these extension types represents an alternative, simpler
rendering (e.g. plain text instead of highlighted code). This means the database remains readable
even if extension types are added or removed between versions. builtinDocstringValues includes
handlers for the extension types that ship with Lean. A future PR will add a plugin system for
registering additional handlers so authors of docstring extensions can control their serialization
and their rendering to HTML.
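
The handler registry and the sentinel fallback can be sketched as follows. This is a Python
illustration under several assumptions: JSON is used as the wire format purely for readability,
and BUILTIN_HANDLERS, serialize, deserialize, and the extension names are hypothetical rather than
the DocstringValues API.

```python
import json

UNKNOWN = object()  # sentinel for extension values that have no registered handler

# Hypothetical registry: extension name -> (encode, decode) pair.
BUILTIN_HANDLERS = {
    "Demo.highlightedCode": (lambda value: {"code": value}, lambda data: data["code"]),
}

def serialize(name, value, fallback, handlers):
    """Store the payload only if a handler is registered; always store name and fallback."""
    record = {"name": name, "fallback": fallback}
    if name in handlers:
        encode, _ = handlers[name]
        record["payload"] = encode(value)
    return json.dumps(record).encode("utf-8")

def deserialize(blob, handlers):
    record = json.loads(blob.decode("utf-8"))
    name = record["name"]
    if name in handlers and "payload" in record:
        _, decode = handlers[name]
        value = decode(record["payload"])
    else:
        value = UNKNOWN
    return name, value, record["fallback"]

# Known extension: the payload round-trips.
known = deserialize(
    serialize("Demo.highlightedCode", "x + 1", "x + 1", BUILTIN_HANDLERS), BUILTIN_HANDLERS
)

# Unknown extension: only the name and the simpler fallback rendering survive.
unknown = deserialize(
    serialize("Other.diagram", {"nodes": 3}, "[diagram]", BUILTIN_HANDLERS), BUILTIN_HANDLERS
)

# A reader without the handler can still load data written by one that had it.
older_reader = deserialize(
    serialize("Demo.highlightedCode", "x + 1", "x + 1", BUILTIN_HANDLERS), {}
)
print(known[1], unknown[1] is UNKNOWN, older_reader[1] is UNKNOWN)
```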

HTML Generation and Link Resolution

HTML is generated in parallel, with around 20 tasks that each have a database connection. This
number was chosen through experimentation.

When converting RenderedCode to HTML, the code needs to turn declaration names into links. For
names that appear directly in the global name index, this is straightforward. For names that don't
appear directly (private names, auto-generated auxiliary names like match discriminants and proof
terms), the code tries several heuristics: resolving private names to user-facing names, stripping
trailing auxiliary components to find a linkable parent, and falling back to a link to the module
page. The details and examples are documented in renderedCodeToHtmlAux in DocGen4.Output.Base.
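
A simplified Python model of the fallback chain is shown here. It is not a port of
renderedCodeToHtmlAux: the private-name parsing assumes the _private.<module>.0.<name> shape seen
in the validation output further down, the sketch strips any trailing component rather than only
auxiliary ones, and the URLs omit the relative prefix.

```python
def split_private(name):
    """Return (module, user_facing_name) for `_private.<module>.0.<name>`, else None."""
    prefix = "_private."
    if not name.startswith(prefix):
        return None
    module, sep, rest = name[len(prefix):].partition(".0.")
    return (module, rest) if sep else None

def module_url(module):
    return module.replace(".", "/") + ".html"

def resolve_link(name, index, current_module):
    """`index` maps each linkable declaration name to the module that documents it."""
    if name in index:                      # direct hit in the global name index
        return f"{module_url(index[name])}#{name}"
    module = current_module
    private = split_private(name)
    if private is not None:                # resolve private name to its user-facing name
        module, name = private
    parts = name.split(".")
    while parts:                           # strip trailing components to find a parent
        candidate = ".".join(parts)
        if candidate in index:
            return f"{module_url(index[candidate])}#{candidate}"
        parts.pop()
    return module_url(module)              # last resort: link to the module page

index = {
    "Std.RecursiveMutex": "Std.Sync.RecursiveMutex",
    "ManyOneDegree.instLE": "Mathlib.Computability.Reduce",
}

private_field = resolve_link(
    "_private.Std.Sync.RecursiveMutex.0.Std.RecursiveMutex.ref", index, "Std.Sync.RecursiveMutex"
)
aux_proof = resolve_link("ManyOneDegree.instLE._proof_1", index, "Mathlib.Computability.Reduce")
private_proof = resolve_link(
    "_private.Mathlib.AlgebraicTopology.SimplicialSet.NerveAdjunction.0._proof_12",
    index,
    "Mathlib.AlgebraicTopology.SimplicialSet.NerveAdjunction",
)
print(private_field, aux_proof, private_proof, sep="\n")
```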

Future Possibilities

Here are some useful directions that are made possible by this PR, but not implemented in it.

HTML Simplification

Today, things like the instances list are generated at run time via JavaScript because each HTML file is produced without a global view of the project. With the database, this HTML can be generated statically. To ease comparison with the existing code, that feature is not implemented in this PR.

Plugin API

Having documentation in a database solves an immediate need in Verso brought about by the module system: it's no longer possible to extract docstrings from environments now that they're in the server olean, but a SQLite file provides for easy retrieval of them.

Additionally, there are more and more use cases for extending doc-gen4:

  • Verso docstrings are extensible and allow custom elements that should be renderable (e.g. correctly highlighted Lean code with reliable links, diagrams, etc).
  • Custom indices like the tactics overview. Tactics ship with Lean, and make sense as a built-in feature, but it would be nice if Mathlib could have an overview for library notes, or Verso could have an overview for documentation language extensions.
  • Custom attributes could provide a way to show themselves on declaration documentation, rather than just showing a hard-coded list of them.

These plugins need to run at multiple points in doc-gen4:

  1. During the analysis phase, they need to check the environment for information like new tactics or library notes.
  2. When rendering a particular module's members, they need to check for the presence of attributes.
  3. When creating the navigation bar, they need to provide new pages.

A plugin is a structure that has a field for each of these interpolation points. Instead of having a fixed Main.lean, a custom Lake target discovers the registered plugins in all packages in the workspace and generates a Main that includes calls to them.

The database model makes this much easier to implement. Plugins can add their own tables to the DB and write to them in the analysis phase, and they can be invoked with a DB handle at various points during HTML generation.

Incrementality and Caching

It would be possible to build each package's documentation DB separately. For instance, the core docs DB and Mathlib DBs could be part of the Mathlib cache. Then, at HTML generation time, the HTML generation process could ATTACH each package's documentation database and generate HTML for all of them at once, rather than re-analyzing all of Mathlib.
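
SQLite's ATTACH DATABASE statement is what makes this workable. The Python sketch below attaches
two per-package databases to one connection and queries them together; the file names, the schema
aliases, and the single-column name_info table are assumptions for the example.

```python
import os
import sqlite3
import tempfile

def make_package_db(path, names):
    """Create a toy per-package documentation database."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE name_info (name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO name_info VALUES (?)", [(n,) for n in names])
    conn.commit()
    conn.close()

with tempfile.TemporaryDirectory() as tmp:
    core_path = os.path.join(tmp, "core-docs.db")
    lib_path = os.path.join(tmp, "lib-docs.db")
    make_package_db(core_path, ["Core.id", "Core.compose"])
    make_package_db(lib_path, ["Lib.helper"])

    # One connection sees both packages, each under its own schema name.
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ? AS core", (core_path,))
    conn.execute("ATTACH DATABASE ? AS lib", (lib_path,))
    all_names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM core.name_info"
            " UNION ALL SELECT name FROM lib.name_info"
            " ORDER BY name"
        )
    ]
    conn.close()

print(all_names)
```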

Validation

To check that the database-generated HTML matches the old pipeline's output, there is a Python
comparison script (scripts/check_diff_soup.py) that was used extensively during development and
will not be present in the squashed commit history, so it is described here in some detail.

A simple bit-for-bit comparison (even after normalizing with tidy) is not sufficient because the
database-generated HTML is intentionally different in some ways. It has more links than the old
output because it can resolve names across all modules rather than just within the transitive
imports of the module being processed. It also adds id attributes to inherited structure fields,
wraps extends clause parents in <span> elements for linking, deduplicates import lists, and drops
empty equations sections. These are all improvements, but they mean that a naïve diff would report
thousands of false positives.

The script parses both HTML trees and walks them in parallel, matching elements by position within
their parent. It can distinguish between a changed attribute and a removed element, which makes it
precise enough to enforce specific rules about which differences are acceptable.

The rules are declarative. Each rule is a function that receives a DiffContext (containing the
old and new elements, their ancestor chains, and pre-collected sets of valid link targets from both
directories) and returns a reason string if the difference is acceptable, or None if it is not.
The rules cover the following cases:

  • Broken links in the old version may be removed, replaced with <span class="fn">, or have
    their href changed.
  • <span class="fn"> may be replaced by <a> if the new link target is valid, since the
    database version can resolve names that the old pipeline could not.
  • New <a> elements may be added if their targets are valid.
  • href attributes pointing to private names (_private. in the URL) may change if the new
    target is valid, since the new version resolves to the public declaration.
  • href attributes may change as long as the anchor fragment is the same and the new target is
    valid, covering cases where a name resolves to a different module.
  • file:/// hrefs to .lean source files may differ in their temp directory prefix.
  • Duplicate <li> elements in import lists may be deduplicated.
  • Empty equations sections (with no equation items) may be removed.
  • Declarations at the same source position may appear in a different order, since the old
    pipeline's ordering was nondeterministic for declarations that share a position. This rule uses
    the SQLite database itself (via the --db flag) to look up source positions.
  • Inherited structure fields may gain id attributes, since the new version adds them as link
    targets.
  • Extends clause parents may be wrapped in <span id="..."> elements to create link targets for
    parent projection names.
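
To give a feel for the rule style, here is a reduced Python sketch of two rules and a dispatcher.
The rule names match those printed in the results below, but the DiffContext fields, the temp
directory pattern, and the exact acceptance conditions are assumptions; the real script's context
carries more information (elements and ancestor chains).

```python
import re
from dataclasses import dataclass, field

@dataclass
class DiffContext:
    """Reduced, hypothetical context: one changed attribute plus valid link targets."""
    attribute: str
    old_value: str
    new_value: str
    old_targets: frozenset = field(default_factory=frozenset)
    new_targets: frozenset = field(default_factory=frozenset)

# Assumed shape of the temp directory prefix, based on the sample output.
TEMP_PREFIX = re.compile(r"^file:///tmp/tmp\.[A-Za-z0-9]+/")

def allow_lean_file_href_change(ctx):
    if ctx.attribute != "href" or not ctx.old_value.endswith(".lean"):
        return None
    old_m, new_m = TEMP_PREFIX.match(ctx.old_value), TEMP_PREFIX.match(ctx.new_value)
    if old_m and new_m and ctx.old_value[old_m.end():] == ctx.new_value[new_m.end():]:
        return "source link differs only in temp directory prefix"
    return None

def allow_href_change_if_old_broken(ctx):
    if ctx.attribute == "href" and ctx.old_value not in ctx.old_targets:
        return "old href pointed at a target that did not exist"
    return None

RULES = [allow_lean_file_href_change, allow_href_change_if_old_broken]

def classify(ctx):
    """Return (rule name, reason) for the first accepting rule, or None to reject."""
    for rule in RULES:
        reason = rule(ctx)
        if reason is not None:
            return rule.__name__, reason
    return None

temp_dir_change = classify(DiffContext(
    "href",
    "file:///tmp/tmp.Y0Ko4VPaev/mathproject/.lake/packages/mathlib/Mathlib/Computability/Ackermann.lean",
    "file:///tmp/tmp.3TLeCnu8q7/mathproject/.lake/packages/mathlib/Mathlib/Computability/Ackermann.lean",
))
unexplained = classify(DiffContext(
    "href", "A.html#x", "B.html#y", old_targets=frozenset({"A.html#x"})
))
print(temp_dir_change, unexplained)
```

Because every rule returns either a reason or None, a difference that no rule explains is
rejected by default.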

For <code> elements (declaration signatures and types), the tool uses a separate comparison
algorithm that walks through children of both elements simultaneously while matching text content,
tracking wrapper elements (<a> and <span>) as it goes. This is more precise than the general
tree-walking comparison because it can verify that the underlying text content is identical even
when the wrapping structure differs.
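
The core invariant, identical text under different wrappers, can be checked with a much simpler
Python sketch than the real algorithm. This version flattens each fragment separately with the
standard html.parser module instead of walking both trees simultaneously, so it illustrates the
invariant only; summarize and same_code_text are hypothetical helpers.

```python
from html.parser import HTMLParser

class TextAndWrappers(HTMLParser):
    """Collect the text content and the <a>/<span> wrappers of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.wrappers = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "span"):
            self.wrappers.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text.append(data)

def summarize(fragment):
    parser = TextAndWrappers()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.text), parser.wrappers

def same_code_text(old_fragment, new_fragment):
    return summarize(old_fragment)[0] == summarize(new_fragment)[0]

old = '<code><a href="Reduce.html#ManyOneDegree.instLE._proof_1">ManyOneDegree.instLE._proof_1</a></code>'
new = '<code><a href="Reduce.html#ManyOneDegree.instLE">ManyOneDegree.instLE._proof_1</a></code>'
unlinked = "<code>ManyOneDegree.instLE._proof_1</code>"

text_matches = same_code_text(old, new)
wrappers_differ = summarize(old)[1] != summarize(new)[1]
print(text_matches, wrappers_differ, same_code_text(old, unlinked))
```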

In addition to the per-file HTML comparison, the script also checks several other things. It
compares static assets (CSS, JavaScript, fonts, images) byte for byte. It compares the search index
(declaration-data.bmp, which is JSON) with domain-specific rules: module entries are compared by
URL and import set, instance lists are compared as sets, and declaration entries are compared field
by field, with docLink differences accepted when both the old and new targets are valid anchors.
Other JSON files are compared by structural equality.

The script also runs a declaration census: it queries the database for every declaration marked as
rendered (render = 1 in name_info) and checks that a corresponding anchor exists in the
generated HTML. This catches cases where a declaration is in the database but missing from the
output, which would not be detected by the per-file comparison since there is no old file to
compare against. Finally, it does a bidirectional target coverage check, comparing the set of
anchored link targets between the old and new HTML to flag any targets that were dropped.
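
Both checks reduce to set operations once the anchors have been collected. The following Python
sketch assumes a name column in name_info and a dictionary from HTML path to the set of anchor ids
found in that file; render and module_name are taken from the description above, everything else
is hypothetical.

```python
import sqlite3

def census(conn, anchors_by_file):
    """Check that every rendered declaration has an anchor in its module's page."""
    checked, missing = 0, []
    query = "SELECT module_name, name FROM name_info WHERE render = 1"
    for module_name, name in conn.execute(query):
        checked += 1
        page = module_name.replace(".", "/") + ".html"
        if name not in anchors_by_file.get(page, set()):
            missing.append(name)
    return checked, missing

def target_coverage(old_targets, new_targets):
    """Return (dropped, added) link targets between the old and new output."""
    return old_targets - new_targets, new_targets - old_targets

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_info (module_name TEXT, name TEXT, render INTEGER)")
conn.executemany(
    "INSERT INTO name_info VALUES (?, ?, ?)",
    [
        ("Demo.A", "Demo.foo", 1),
        ("Demo.A", "Demo.foo._proof_1", 0),  # not rendered, so not checked
        ("Demo.B", "Demo.baz", 1),
    ],
)

anchors = {"Demo/A.html": {"Demo.foo"}, "Demo/B.html": set()}
checked, missing = census(conn, anchors)

dropped, added = target_coverage(
    {"Demo/A.html#Demo.foo"},
    {"Demo/A.html#Demo.foo", "Demo/A.html#Demo.Foo.toBar"},
)
print(checked, missing, dropped, added)
```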

The output is organized per file: for each file with differences, the script prints the rejected
differences (with unified diffs of the parent element) and, in verbose mode, the accepted ones
with their rule names. At the end it prints a summary with counts for HTML files, data files,
static assets, the declaration census, and target coverage, followed by a breakdown of accepted
differences by rule with a sample for each to facilitate inspection.

Results

The script reports one accepted difference in the data files: declaration-data.bmp swaps the link destinations for two instances, CategoryTheory.Abelian.instIsStableUnderBaseChangeEpimorphisms and CategoryTheory.Abelian.instIsStableUnderCobaseChangeMonomorphisms. Each of these instances exists in two Mathlib modules (link 1 link 2), so the destination generated in the HTML depends only on the order in which doc-gen4 happens to visit the modules. I'm not sure why there isn't some kind of conflict when importing them, but I don't think this difference is a bug.

It also shows many differences in the HTML files. Most of them are source link targets that just point at different temp directories. The output includes examples of each category of accepted difference to make them easier to understand.

Loaded 383233 declaration positions from /Users/davidc/tmp/compare-mathlib-docs/new/api-docs.db
Comparing /Users/davidc/tmp/compare-mathlib-docs/old/doc vs /Users/davidc/tmp/compare-mathlib-docs/new/doc
Scanning directories... (1259.7ms)
  HTML files in both: 10183
  HTML files only in dir1: 0
  HTML files only in dir2: 0
  Data files in both: 1
  Data files only in dir1: 0
  Data files only in dir2: 0
  Static assets in both: 13
  Static assets only in dir1: 0
  Static assets only in dir2: 0
Extracting link targets... (97181.2ms)
  Targets in dir1: 541566
  Targets in dir2: 543401

Comparing 1 data files... (1189.2ms)
  Identical: 0
  Different: 0

  declarations/declaration-data.bmp: declarations: 2 multi-module (both valid)
          CategoryTheory.Abelian.instIsStableUnderBaseChangeEpimorphisms: multi-module (both valid)
            old: ./Mathlib/CategoryTheory/Abelian/Monomorphisms.html#CategoryTheory.Abelian.instIsStableUnderBaseChangeEpimorphisms
            new: ./Mathlib/CategoryTheory/Abelian/CommSq.html#CategoryTheory.Abelian.instIsStableUnderBaseChangeEpimorphisms
          CategoryTheory.Abelian.instIsStableUnderCobaseChangeMonomorphisms: multi-module (both valid)
            old: ./Mathlib/CategoryTheory/Abelian/Monomorphisms.html#CategoryTheory.Abelian.instIsStableUnderCobaseChangeMonomorphisms
            new: ./Mathlib/CategoryTheory/Abelian/CommSq.html#CategoryTheory.Abelian.instIsStableUnderCobaseChangeMonomorphisms

Declaration census... (393.6ms)
  Declarations checked: 362592
  Missing from HTML: 0

Comparing 13 static assets... (3.8ms)
  Identical: 13
  Different: 0

Target coverage... (183.7ms)
  Anchored targets in dir1: 531383
  Anchored targets in dir2: 533218
  Dropped (in old, not new): 0
  Added (in new, not old): 1835

Comparing 10183 HTML files...

============================================================
SUMMARY (1310494.3ms)
============================================================
  HTML files compared: 10183
  Files with differences: 8750
  Total rejected differences: 0
  Total accepted differences: 309699
  Files only in dir1: 0
  Files only in dir2: 0
  Data files compared: 0
  Data files identical: 0
  Data files different: 0
  Data files only in dir1: 0
  Data files only in dir2: 0
  Static assets compared: 13
  Static assets identical: 13
  Static assets different: 0
  Static assets only in dir1: 0
  Static assets only in dir2: 0
  Declaration census: 362592 checked, 0 missing
  Target coverage: 531383 old anchors, 533218 new anchors, 0 dropped, 1835 added

  Accepted differences by rule:
    allow_lean_file_href_change: 291,127
      e.g. Mathlib/Algebra/BigOperators/Expect.html: attribute 'href'
             <div class="gh_link">
           -<a href="file:///tmp/tmp.Y0Ko4VPaev/mathproject/.lake/packages/mathlib/Mathlib/Algebra/BigOperators/Expect.lean">
           +<a href="file:///tmp/tmp.3TLeCnu8q7/mathproject/.lake/packages/mathlib/Mathlib/Algebra/BigOperators/Expect.lean">
             source
             </a>
      e.g. Mathlib/Computability/Ackermann.html: attribute 'href'
             <div class="gh_link">
           -<a href="file:///tmp/tmp.Y0Ko4VPaev/mathproject/.lake/packages/mathlib/Mathlib/Computability/Ackermann.lean">
           +<a href="file:///tmp/tmp.3TLeCnu8q7/mathproject/.lake/packages/mathlib/Mathlib/Computability/Ackermann.lean">
             source
             </a>
      e.g. Mathlib/RingTheory/NonUnitalSubsemiring/Basic.html: attribute 'href'
             <div class="gh_link">
           -<a href="file:///tmp/tmp.Y0Ko4VPaev/mathproject/.lake/packages/mathlib/Mathlib/RingTheory/NonUnitalSubsemiring/Basic.lean">
           +<a href="file:///tmp/tmp.3TLeCnu8q7/mathproject/.lake/packages/mathlib/Mathlib/RingTheory/NonUnitalSubsemiring/Basic.lean">
             source
             </a>
      e.g. Mathlib/NumberTheory/Cyclotomic/Basic.html: attribute 'href'
             <div class="gh_link">
           -<a href="file:///tmp/tmp.Y0Ko4VPaev/mathproject/.lake/packages/mathlib/Mathlib/NumberTheory/Cyclotomic/Basic.lean">
           +<a href="file:///tmp/tmp.3TLeCnu8q7/mathproject/.lake/packages/mathlib/Mathlib/NumberTheory/Cyclotomic/Basic.lean">
             source
             </a>
      e.g. Mathlib/Algebra/Polynomial/EraseLead.html: attribute 'href'
             <div class="gh_link">
           -<a href="file:///tmp/tmp.Y0Ko4VPaev/mathproject/.lake/packages/mathlib/Mathlib/Algebra/Polynomial/EraseLead.lean">
           +<a href="file:///tmp/tmp.3TLeCnu8q7/mathproject/.lake/packages/mathlib/Mathlib/Algebra/Polynomial/EraseLead.lean">
             source
             </a>
    allow_extends_id_wrapper: 7,124
      e.g. Mathlib/Algebra/Quandle.html: element_removed
           +<span id="UnitalShelf.toOne">
             <span class="fn">
             <a href="../.././Init/Prelude.html#One">
           ...
             </span>
             </span>
           +</span>
      e.g. Mathlib/Geometry/Manifold/Algebra/Structures.html: element_removed
           +<span id="ContMDiffRing.toContMDiffAdd">
             <span class="fn">
             <a href="../../../.././Mathlib/Geometry/Manifold/Algebra/Monoid.html#ContMDiffAdd">
           ...
             </span>
             </span>
           +</span>
      e.g. Mathlib/Algebra/Group/Defs.html: element_replaced
           +<span id="CancelCommMonoid.toCommMonoid">
             <span class="fn">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#CommMonoid">
           ...
             </span>
             </span>
           +</span>
      e.g. Lake/Config/ConfigDecl.html: element_removed
           +<span id="Lake.NConfigDecl.toPConfigDecl">
             <span class="fn">
             <a href="../.././Lake/Config/ConfigDecl.html#Lake.PConfigDecl">
           ...
             </span>
             </span>
           +</span>
      e.g. Mathlib/Topology/Metrizable/CompletelyMetrizable.html: attribute 'id'
             </span>
              
           +<span id="TopologicalSpace.UpgradedIsCompletelyMetrizableSpace.toMetricSpace">
             <span class="fn">
             <a href="../../.././Mathlib/Topology/MetricSpace/Defs.html#MetricSpace">
           ...
             </span>
             </span>
           +</span>
             , 
           +<span id="TopologicalSpace.UpgradedIsCompletelyMetrizableSpace.toCompleteSpace">
             <span class="fn">
             <a href="../../.././Mathlib/Topology/UniformSpace/Cauchy.html#CompleteSpace">
           ...
             <span class="fn">
             X
           +</span>
             </span>
             </span>
    compare_code_elements: 6,550
      e.g. Mathlib/Data/Ordmap/Invariants.html: structural
             The 
             <code>
           -<a href="../../.././Mathlib/Data/Ordmap/Invariants.html#Ordnode.Balanced">
           +<a href="../../.././Mathlib/Analysis/LocallyConvex/Basic.html#Balanced">
             Balanced
             </a>
      e.g. Std/Sync/RecursiveMutex.html: structural
             <span class="fn">
             (
           -<a href="../.././Std/Sync/RecursiveMutex.html#_private.Std.Sync.RecursiveMutex.0.Std.RecursiveMutex.ref">
           +<a href="../.././Std/Sync/RecursiveMutex.html#Std.RecursiveMutex">
             Std.RecursiveMutex.ref✝
             </a>
      e.g. Lake/Config/FacetConfig.html: structural
             </a>
              
           -<a href="../.././Lake/Config/FacetConfig.html#Lake.instTypeNameModuleFacetDecl.unsafe_impl_3">
           +<a href="../.././Lake/Config/FacetConfig.html#Lake.instTypeNameModuleFacetDecl">
             Lake.instTypeNameModuleFacetDecl.unsafe_impl_3
             </a>
      e.g. Mathlib/Analysis/BoxIntegral/Box/SubboxInduction.html: structural
              is true. See also 
             <code>
           +<a href="../../../.././Mathlib/Analysis/BoxIntegral/Partition/SubboxInduction.html#BoxIntegral.Box.subbox_induction_on">
             BoxIntegral.Box.subbox_induction_on
           +</a>
             </code>
              for a version using

             <code>
           +<a href="../../../.././Mathlib/Analysis/BoxIntegral/Partition/SubboxInduction.html#BoxIntegral.Prepartition.splitCenter">
             BoxIntegral.Prepartition.splitCenter
           +</a>
             </code>
              instead of 
      e.g. Mathlib/Computability/Reduce.html: structural
             </span>
              
           -<a href="../.././Mathlib/Computability/Reduce.html#ManyOneDegree.instLE._proof_1">
           +<a href="../.././Mathlib/Computability/Reduce.html#ManyOneDegree.instLE">
             ManyOneDegree.instLE._proof_1
             </a>
    allow_href_change_if_old_broken: 2,140
      e.g. Mathlib/AlgebraicTopology/SimplicialSet/NerveAdjunction.html: attribute 'href'
             </span>
              
           -<a href="../../.././Mathlib/AlgebraicTopology/SimplicialSet/NerveAdjunction.html#_private.Mathlib.AlgebraicTopology.SimplicialSet.NerveAdjunction.0._proof_12">
           +<a href="../../.././Mathlib/AlgebraicTopology/SimplicialSet/NerveAdjunction.html">
             _proof_12✝⁶
             </a>
              
           -<a href="../../.././Mathlib/AlgebraicTopology/SimplicialSet/NerveAdjunction.html#_private.Mathlib.AlgebraicTopology.SimplicialSet.NerveAdjunction.0._proof_15">
           +<a href="../../.././Mathlib/AlgebraicTopology/SimplicialSet/NerveAdjunction.html">
             _proof_15✝²
             </a>
      e.g. Mathlib/CategoryTheory/ComposableArrows/Basic.html: attribute 'href'
             </span>
              
           -<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html#CategoryTheory.ComposableArrows.homMk₁._proof_4">
           +<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html#CategoryTheory.ComposableArrows.homMk₁">
             homMk₁._proof_4
             </a>
      e.g. Mathlib/CategoryTheory/ComposableArrows/Basic.html: attribute 'href'
             </span>
              
           -<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html#_private.Mathlib.CategoryTheory.ComposableArrows.Basic.0._proof_445">
           +<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html">
             _proof_445✝¹
             </a>
      e.g. Mathlib/CategoryTheory/ComposableArrows/Basic.html: attribute 'href'
             </span>
              
           -<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html#_private.Mathlib.CategoryTheory.ComposableArrows.Basic.0._proof_356">
           +<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html">
             _proof_356✝¹
             </a>
              
           -<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html#_private.Mathlib.CategoryTheory.ComposableArrows.Basic.0._proof_445">
           +<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html">
             _proof_445✝³
             </a>
      e.g. Mathlib/CategoryTheory/ComposableArrows/Basic.html: attribute 'href'
             </span>
              
           -<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html#CategoryTheory.ComposableArrows.homMk₁._proof_4">
           +<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html#CategoryTheory.ComposableArrows.homMk₁">
             homMk₁._proof_4
             </a>
              
           -<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html#_private.Mathlib.CategoryTheory.ComposableArrows.Basic.0._proof_354">
           +<a href="../../.././Mathlib/CategoryTheory/ComposableArrows/Basic.html">
             _proof_354✝³
             </a>
    allow_reorder_same_source_position: 1,110
      e.g. Lean/Meta/Basic.html: Lean.Meta.instInhabitedExprParamInfo ↔ Lean.Meta.instInhabitedExprParamInfo.default (line 2145)
      e.g. Lean/Environment.html: Lean.instInhabitedEnvExtension ↔ Lean.instInhabitedEnvExtension.default (line 1300)
      e.g. Lean/Widget/TaggedText.html: Lean.Widget.instFromJsonTaggedText ↔ Lean.Widget.instFromJsonTaggedText.fromJson (line 29)
      e.g. Lean/Environment.html: Lean.instInhabitedEnvExtension ↔ Lean.instInhabitedEnvExtension.default (line 1300)
      e.g. Lean/Elab/StructInst.html: Lean.Elab.Term.StructInst.instInhabitedFieldLHS.default ↔ Lean.Elab.Term.StructInst.instInhabitedFieldLHS (line 272)
    allow_duplicate_li_removal_in_imports: 1,054
      e.g. Init/GrindInstances/Ring/Fin.html: text
           -<a href="../../.././Init/GrindInstances/ToInt.html">
           +<a href="../../.././Init/Data/Fin/Lemmas.html">
           -Init.GrindInstances.ToInt
           +Init.Data.Fin.Lemmas
             </a>
      e.g. Init/Data/Iterators/Lemmas/Consumers/Collect.html: attribute 'href'
             <li>
           -<a href="../../../../.././Init/Data/Iterators/Consumers/Collect.html">
           +<a href="../../../../.././Init/Data/Iterators/Lemmas/Basic.html">
           -Init.Data.Iterators.Consumers.Collect
           +Init.Data.Iterators.Lemmas.Basic
             </a>
             </li>
      e.g. Init/Data/Iterators/Lemmas/Combinators/Monadic/FlatMap.html: attribute 'href'
             <li>
           -<a href="../../../../../.././Init/Data/Iterators/Consumers/Monadic/Collect.html">
           +<a href="../../../../../.././Init/Data/Iterators/Lemmas/Consumers/Monadic.html">
           -Init.Data.Iterators.Consumers.Monadic.Collect
           +Init.Data.Iterators.Lemmas.Consumers.Monadic
             </a>
             </li>
      e.g. Init/Data/BitVec/Lemmas.html: text
           -<a href="../../.././Init/Data/BitVec/Basic.html">
           +<a href="../../.././Init/Data/BitVec/BasicAux.html">
           -Init.Data.BitVec.Basic
           +Init.Data.BitVec.BasicAux
             </a>
      e.g. Std/Data/DHashMap/Raw.html: element_removed
             </a>
             </li>
           -<li>
           -<a href="../../.././Std/Data/DHashMap/Internal/Defs.html">
           -Std.Data.DHashMap.Internal.Defs
           -</a>
           -</li>
             </ul>
    allow_added_link_with_valid_target: 276
      e.g. Mathlib/Analysis/Normed/Unbundled/RingSeminorm.html: element_added
           +<span id="MulRingSeminorm.toMonoidWithZeroHom">
             <span class="fn">
             R
             </span>
           + 
           +<a href="../../../.././Mathlib/Algebra/GroupWithZero/Hom.html#MonoidWithZeroHom">
           +→*₀
           +</a>
           + 
           +<a href="../../../.././Mathlib/Data/Real/Basic.html#Real">
           +ℝ
           +</a>
           +</span>
      e.g. Mathlib/RingTheory/Polynomial/Eisenstein/Distinguished.html: element_added
           +<span class="fn">
             <span class="fn">
             f
             </span>
           +.
           +<a href="../../../.././Mathlib/RingTheory/Polynomial/Eisenstein/Basic.html#Polynomial.IsWeaklyEisensteinAt">
           +IsWeaklyEisensteinAt
           +</a>
           +</span>
      e.g. Mathlib/Algebra/Lie/Basic.html: element_added
           +<span id="LieModuleHom.toLinearMap">
             <span class="fn">
             M
             </span>
           + 
           +<a href="../../.././Mathlib/Algebra/Module/LinearMap/Defs.html#LinearMap">
           +→ₗ[
           +</a>
           +<span class="fn">
           +R
           +</span>
           +<a href="../../.././Mathlib/Algebra/Module/LinearMap/Defs.html#LinearMap">
           +]
           +</a>
           + 
           +<span class="fn">
           +N
           +</span>
           +</span>
      e.g. Mathlib/Topology/Algebra/Algebra.html: element_added
           +<span id="ContinuousAlgHom.toAlgHom">
             <span class="fn">
             A
             </span>
           + 
           +<a href="../../.././Mathlib/Algebra/Algebra/Hom.html#AlgHom">
           +→ₐ[
           +</a>
           +<span class="fn">
           +R
           +</span>
           +<a href="../../.././Mathlib/Algebra/Algebra/Hom.html#AlgHom">
           +]
           +</a>
           + 
           +<span class="fn">
           +B
           +</span>
           +</span>
      e.g. Mathlib/Algebra/Group/Equiv/Defs.html: element_added
           +<span id="AddEquiv.toAddHom">
           +<span class="fn">
           +A
           +</span>
           + 
           +<a href="../../../.././Mathlib/Algebra/Group/Hom/Defs.html#AddHom">
           +→ₙ+
           +</a>
           + 
             <span class="fn">
             B
             </span>
           +</span>
    allow_inherited_field_id: 267
      e.g. Mathlib/Algebra/Ring/Defs.html: attribute 'id'
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="NonUnitalNonAssocRing.left_distrib">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Ring/Defs.html#Distrib.left_distrib">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="NonUnitalNonAssocRing.right_distrib">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Ring/Defs.html#Distrib.right_distrib">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="NonUnitalNonAssocRing.zero_mul">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/GroupWithZero/Defs.html#MulZeroClass.zero_mul">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="NonUnitalNonAssocRing.mul_zero">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/GroupWithZero/Defs.html#MulZeroClass.mul_zero">
      e.g. Mathlib/Algebra/Ring/Defs.html: attribute 'id'
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="NonUnitalCommRing.mul_comm">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#CommMagma.mul_comm">
      e.g. Mathlib/Algebra/Ring/Defs.html: attribute 'id'
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="NonUnitalCommSemiring.mul_comm">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#CommMagma.mul_comm">
      e.g. Mathlib/Algebra/Field/Defs.html: attribute 'id'
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="DivisionSemiring.div_eq_mul_inv">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#DivInvMonoid.div_eq_mul_inv">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="DivisionSemiring.zpow">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#DivInvMonoid.zpow">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="DivisionSemiring.zpow_zero'">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#DivInvMonoid.zpow_zero'">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="DivisionSemiring.zpow_succ'">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#DivInvMonoid.zpow_succ'">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="DivisionSemiring.zpow_neg'">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#DivInvMonoid.zpow_neg'">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="DivisionSemiring.inv_zero">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/GroupWithZero/Defs.html#GroupWithZero.inv_zero">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="DivisionSemiring.mul_inv_cancel">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/GroupWithZero/Defs.html#GroupWithZero.mul_inv_cancel">
           ...
             Unless there is a risk of a 
             <code>
           -Module ℚ≥0 _
           +<a href="../../.././Mathlib/Algebra/Module/Defs.html#Module">
           +Module
           +</a>
           + ℚ≥0 _
             </code>
              instance diamond, write 
      e.g. Mathlib/Algebra/GroupWithZero/Defs.html: attribute 'id'
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="CommGroupWithZero.div_eq_mul_inv">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#DivInvMonoid.div_eq_mul_inv">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="CommGroupWithZero.zpow">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#DivInvMonoid.zpow">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="CommGroupWithZero.zpow_zero'">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#DivInvMonoid.zpow_zero'">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="CommGroupWithZero.zpow_succ'">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#DivInvMonoid.zpow_succ'">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="CommGroupWithZero.zpow_neg'">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/Group/Defs.html#DivInvMonoid.zpow_neg'">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="CommGroupWithZero.inv_zero">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/GroupWithZero/Defs.html#GroupWithZero.inv_zero">
           ...
             </div>
             </li>
           -<li class="structure_field inherited_field">
           +<li class="structure_field inherited_field" id="CommGroupWithZero.mul_inv_cancel">
             <div class="structure_field_info">
             <a href="../../.././Mathlib/Algebra/GroupWithZero/Defs.html#GroupWithZero.mul_inv_cancel">
    allow_empty_equations_removal: 40
      e.g. Mathlib/Data/List/Pi.html: attribute 'class'
             <div class="def">
             <div class="gh_link">
           -<a href="file:///tmp/tmp.Y0Ko4VPaev/mathproject/.lake/packages/mathlib/Mathlib/Data/List/Pi.lean">
           +<a href="file:///tmp/tmp.3TLeCnu8q7/mathproject/.lake/packages/mathlib/Mathlib/Data/List/Pi.lean">
             source
             </a>
           ...
              is the trivial dependent function out of the empty list.
             </p>
           -<details>
           -<summary>
           -Equations
           -</summary>
           -<ul class="equations">
           -</ul>
           -</details>
             <details class="instances-for-list" id="instances-for-list-List.Pi.nil">
             <summary>
      e.g. Mathlib/Data/Vector3.html: text
             <summary>
           -Equations
           +Instances For
             </summary>
      e.g. Mathlib/Data/Vector3.html: attribute 'id'
             <div class="def">
             <div class="gh_link">
           -<a href="file:///tmp/tmp.Y0Ko4VPaev/mathproject/.lake/packages/mathlib/Mathlib/Data/Vector3.lean">
           +<a href="file:///tmp/tmp.3TLeCnu8q7/mathproject/.lake/packages/mathlib/Mathlib/Data/Vector3.lean">
             source
             </a>
           ...
             The empty vector
             </p>
           -<details>
           -<summary>
           -Equations
           -</summary>
           -<ul class="equations">
           -</ul>
           -</details>
             <details class="instances-for-list" id="instances-for-list-Vector3.nil">
             <summary>
      e.g. Mathlib/Data/Multiset/Pi.html: element_removed
             <div class="def">
             <div class="gh_link">
           -<a href="file:///tmp/tmp.Y0Ko4VPaev/mathproject/.lake/packages/mathlib/Mathlib/Data/Multiset/Pi.lean">
           +<a href="file:///tmp/tmp.3TLeCnu8q7/mathproject/.lake/packages/mathlib/Mathlib/Data/Multiset/Pi.lean">
             source
             </a>
           ...
              is the trivial dependent function out of the empty
multiset.
             </p>
           -<details>
           -<summary>
           -Equations
           -</summary>
           -<ul class="equations">
           -</ul>
           -</details>
             <details class="instances-for-list" id="instances-for-list-Multiset.Pi.empty">
             <summary>
      e.g. Mathlib/Data/List/Pi.html: element_removed
             <div class="def">
             <div class="gh_link">
           -<a href="file:///tmp/tmp.Y0Ko4VPaev/mathproject/.lake/packages/mathlib/Mathlib/Data/List/Pi.lean">
           +<a href="file:///tmp/tmp.3TLeCnu8q7/mathproject/.lake/packages/mathlib/Mathlib/Data/List/Pi.lean">
             source
             </a>
           ...
              is the trivial dependent function out of the empty list.
             </p>
           -<details>
           -<summary>
           -Equations
           -</summary>
           -<ul class="equations">
           -</ul>
           -</details>
             <details class="instances-for-list" id="instances-for-list-List.Pi.nil">
             <summary>
    allow_href_change_from_private: 4
      e.g. Mathlib/Algebra/Category/ModuleCat/Basic.html: attribute 'href'
             </a>
              
           -<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#_private.Mathlib.Algebra.Category.ModuleCat.Basic.0.ModuleCat.Hom.mk">
           +<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#ModuleCat.Hom">
             {
             </a>
           ...
             </span>
              
           -<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#_private.Mathlib.Algebra.Category.ModuleCat.Basic.0.ModuleCat.Hom.mk">
           +<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#ModuleCat.Hom">
             }
             </a>
      e.g. Mathlib/Algebra/Category/ModuleCat/Basic.html: attribute 'href'
             </a>
              
           -<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#_private.Mathlib.Algebra.Category.ModuleCat.Basic.0.ModuleCat.Hom.mk">
           +<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#ModuleCat.Hom">
             {
             </a>
           ...
             </span>
              
           -<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#_private.Mathlib.Algebra.Category.ModuleCat.Basic.0.ModuleCat.Hom.mk">
           +<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#ModuleCat.Hom">
             }
             </a>
      e.g. Mathlib/Algebra/Category/ModuleCat/Basic.html: attribute 'href'
             </a>
              
           -<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#_private.Mathlib.Algebra.Category.ModuleCat.Basic.0.ModuleCat.Hom.mk">
           +<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#ModuleCat.Hom">
             {
             </a>
           ...
             </span>
              
           -<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#_private.Mathlib.Algebra.Category.ModuleCat.Basic.0.ModuleCat.Hom.mk">
           +<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#ModuleCat.Hom">
             }
             </a>
      e.g. Mathlib/Algebra/Category/ModuleCat/Basic.html: attribute 'href'
             </a>
              
           -<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#_private.Mathlib.Algebra.Category.ModuleCat.Basic.0.ModuleCat.Hom.mk">
           +<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#ModuleCat.Hom">
             {
             </a>
           ...
             </span>
              
           -<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#_private.Mathlib.Algebra.Category.ModuleCat.Basic.0.ModuleCat.Hom.mk">
           +<a href="../../../.././Mathlib/Algebra/Category/ModuleCat/Basic.html#ModuleCat.Hom">
             }
             </a>
    allow_span_fn_to_link: 4
      e.g. Mathlib/AlgebraicTopology/ModelCategory/BrownLemma.html: element_replaced
           +<span class="fn">
             <span class="fn">
             (
           ...
             )
             </span>
           +.
           +<a href="../../.././Mathlib/CategoryTheory/MorphismProperty/Factorization.html#CategoryTheory.MorphismProperty.MapFactorizationData">
           +MapFactorizationData
           +</a>
           +</span>
      e.g. Mathlib/AlgebraicTopology/ModelCategory/BrownLemma.html: element_replaced
           +<span class="fn">
             <span class="fn">
             (
           ...
             )
             </span>
           +.
           +<a href="../../.././Mathlib/CategoryTheory/MorphismProperty/Factorization.html#CategoryTheory.MorphismProperty.MapFactorizationData">
           +MapFactorizationData
           +</a>
           +</span>
      e.g. Mathlib/CategoryTheory/Limits/Sifted.html: element_replaced
           +<span class="fn">
             <span class="fn">
             (
           ...
             )
             </span>
           +.
           +<a href="../../.././Mathlib/CategoryTheory/Limits/Final.html#CategoryTheory.Functor.Final">
           +Final
           +</a>
           +</span>
      e.g. Mathlib/GroupTheory/GroupAction/Hom.html: element_replaced
           +<span id="DistribMulActionHom.toAddMonoidHom">
             <span class="fn">
           -⇑
           +A
           +</span>
           + 
           +<a href="../../.././Mathlib/Algebra/Group/Hom/Defs.html#AddMonoidHom">
           +→+
           +</a>
           + 
             <span class="fn">
           -φ
           +B
             </span>
             </span>
    allow_href_change_same_anchor_valid_target: 3
      e.g. Mathlib/CategoryTheory/Abelian/Monomorphisms.html: attribute 'href'
             <span class="decl_name">
           -<a class="break_within" href="../../.././Mathlib/CategoryTheory/Abelian/Monomorphisms.html#CategoryTheory.Abelian.instIsStableUnderCobaseChangeMonomorphisms">
           +<a class="break_within" href="../../.././Mathlib/CategoryTheory/Abelian/CommSq.html#CategoryTheory.Abelian.instIsStableUnderCobaseChangeMonomorphisms">
             <span class="name">
             CategoryTheory
      e.g. Mathlib/CategoryTheory/Abelian/Monomorphisms.html: attribute 'href'
             <span class="decl_name">
           -<a class="break_within" href="../../.././Mathlib/CategoryTheory/Abelian/Monomorphisms.html#CategoryTheory.Abelian.instIsStableUnderBaseChangeEpimorphisms">
           +<a class="break_within" href="../../.././Mathlib/CategoryTheory/Abelian/CommSq.html#CategoryTheory.Abelian.instIsStableUnderBaseChangeEpimorphisms">
             <span class="name">
             CategoryTheory
      e.g. Mathlib/Data/List/Basic.html: attribute 'href'
             <span class="decl_name">
           -<a class="break_within" href="../../.././Mathlib/Data/List/Basic.html#List.take_one_drop_eq_of_lt_length">
           +<a class="break_within" href="../../.././Mathlib/Data/List/TakeDrop.html#List.take_one_drop_eq_of_lt_length">
             <span class="name">
             List

Review Process

Once this PR is acceptable, we should archive the comparison script somewhere and delete it prior to the squash merge. It won't be useful to compare two docsets from the database version, but it could be adapted to be useful for that purpose at some point.

This PR adds a SQLite database that contains all of the documentation
info.
@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Jan 19, 2026

Benchmark results for 7f88ca7 against 837f89a are in! @david-christiansen

No significant changes detected.

@david-christiansen
Contributor Author

david-christiansen commented Jan 19, 2026

Significance detection isn't working yet because there isn't enough data. The result for this version isn't great, so more work is needed:

| Benchmark // metric | Value | Change | Change (%) | Unit | Runner |
|---|---|---|---|---|---|
| mathlib-docs // instructions | 258.8T | +175.4T | +210.4% | | runner-mathlib1 |
| mathlib-docs // maxrss | 6 GiB | -4 MiB | -0.1% | B | runner-mathlib1 |
| mathlib-docs // task-clock | 14h 12m 51s | +10h 25m 39s | +275.4% | s | runner-mathlib1 |
| mathlib-docs // wall-clock | 12m 17s | +7m 25s | +152.8% | s | runner-mathlib1 |
| own-docs // instructions | 3.7T | +50.7G | +1.4% | | runner-mathlib1 |
| own-docs // maxrss | 3 GiB | -21 MiB | -0.6% | B | runner-mathlib1 |
| own-docs // task-clock | 5m 51s | +6s | +1.9% | s | runner-mathlib1 |
| own-docs // wall-clock | 2m 39s | +2s | +1.3% | s | runner-mathlib1 |
| radar/run/main // time | 17m 9s | +7m 40s | +80.9% | s | runner-mathlib1 |
| radar/run/main/script // time | 17m 8s | +7m 40s | +81.1% | s | runner-mathlib1 |

task-clock and wall-clock for Mathlib builds are the most important measurements here.
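
The percentages in these reports can be sanity-checked by hand: the baseline is the reported value minus the reported change, and the relative change is the change divided by that baseline. Below is a minimal Lean sketch of that arithmetic for the mathlib-docs task-clock row. The helper names `secs` and `pctChange` are made up for this illustration and are not part of doc-gen4 or the benchmark tooling.

```lean
/-- Convert hours, minutes, and seconds to seconds (hypothetical helper). -/
def secs (h m s : Nat) : Float :=
  (h * 3600 + m * 60 + s).toFloat

/-- Relative change in percent, given the new value and the reported change.
The baseline is recovered as `value - change`. -/
def pctChange (value change : Float) : Float :=
  100.0 * change / (value - change)

-- mathlib-docs // task-clock: 14h 12m 51s, +10h 25m 39s
#eval secs 14 12 51 - secs 10 25 39              -- baseline: 13632 seconds (3h 47m 12s)
#eval pctChange (secs 14 12 51) (secs 10 25 39)  -- roughly 275.4, matching the report
```

In other words, this run spent about 3.75 times the baseline's CPU time on the Mathlib docs.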

Experiment to see if slowdown due to database file contention
@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Jan 20, 2026

Benchmark results for 9272ece against 837f89a are in! @david-christiansen

No significant changes detected.

@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Jan 20, 2026

Benchmark results for 418fa51 against 837f89a are in! @david-christiansen

No significant changes detected.

@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Jan 20, 2026

Benchmark results for 76667ea against 837f89a are in! @david-christiansen

No significant changes detected.

@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Jan 20, 2026

Benchmark results for e429f8e against 837f89a are in! @david-christiansen

No significant changes detected.

@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Jan 20, 2026

Benchmark results for e1d2a8b against 837f89a are in! @david-christiansen

No significant changes detected.

@david-christiansen
Contributor Author

As of e429f8e, it's:

| Benchmark // metric | Value | Change | Change (%) | Unit |
|---|---|---|---|---|
| mathlib-docs // instructions | 84.6T | +1.2T | +1.5% | |
| mathlib-docs // maxrss | 6 GiB | -4 MiB | -0.1% | B |
| mathlib-docs // task-clock | 3h 49m 49s | +2m 37s | +1.2% | s |
| mathlib-docs // wall-clock | 4m 58s | +6s | +2.3% | s |
| own-docs // instructions | 3.7T | +40.6G | +1.1% | |
| own-docs // maxrss | 3 GiB | +39 MiB | +1.2% | B |
| own-docs // task-clock | 5m 55s | +10s | +3.1% | s |
| own-docs // wall-clock | 2m 41s | +4s | +3.1% | s |
| radar/run/main // time | 19m 9s | +9m 40s | +102.0% | s |
| radar/run/main/script // time | 19m 8s | +9m 40s | +102.3% | s |

Seems that the Mathlib cache was only getting partial values.

@david-christiansen
Contributor Author

With the original PR code, it's comparable:

| Benchmark // metric | Value | Change | Change (%) | Unit |
|---|---|---|---|---|
| mathlib-docs // instructions | 84.6T | +1.2T | +1.4% | |
| mathlib-docs // maxrss | 6 GiB | -3 MiB | -0.0% | B |
| mathlib-docs // task-clock | 3h 50m 4s | +2m 52s | +1.3% | s |
| mathlib-docs // wall-clock | 4m 58s | +6s | +2.3% | s |
| own-docs // instructions | 3.7T | +52.2G | +1.4% | |
| own-docs // maxrss | 3 GiB | +75 MiB | +2.3% | B |
| own-docs // task-clock | 5m 51s | +7s | +2.1% | s |
| own-docs // wall-clock | 2m 40s | +3s | +1.9% | s |
| radar/run/main // time | 19m 10s | +9m 41s | +102.2% | s |
| radar/run/main/script // time | 19m 9s | +9m 41s | +102.4% | s |

Extensions are not presently handled, but the fallback data are saved.
This is the first step towards rendering HTML from the DB instead of
directly. The serializable version of CodeWithInfos used here can be
saved in the DB. The generated HTML is the same, modulo commit hashes
and external URLs.
@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Jan 27, 2026

Benchmark results for ec4cf3e against 837f89a are in! @david-christiansen

  • mathlib-docs//instructions: +1.3T (+1.6%)
  • mathlib-docs//maxrss: -2MiB (-0.0%)
  • mathlib-docs//task-clock: +3m 23s (+1.5%)
  • mathlib-docs//wall-clock: +7s (+2.7%)
  • own-docs//instructions: +91.8G (+2.5%)
  • own-docs//maxrss: +80MiB (+2.4%)
  • own-docs//task-clock: +11s (+3.4%)
  • own-docs//wall-clock: +4s (+2.7%)

No significant changes detected.

This is preliminary to generating HTML from the database. The output
is still unchanged, modulo commit hashes and source URLs.
@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Jan 27, 2026

Benchmark results for 9489cfd against 837f89a are in! @david-christiansen

  • mathlib-docs//instructions: +1.2T (+1.4%)
  • mathlib-docs//maxrss: -3MiB (-0.1%)
  • mathlib-docs//task-clock: +2m 37s (+1.2%)
  • mathlib-docs//wall-clock: +9s (+3.4%)
  • own-docs//instructions: +51.9G (+1.4%)
  • own-docs//maxrss: -821MiB (-24.6%)
  • own-docs//task-clock: +706ms (+0.2%)
  • own-docs//wall-clock: -378ms (-0.2%)

No significant changes detected.

The scripts indicate that the output is the same, modulo minor
differences in automatic linking
@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Feb 19, 2026

Benchmark results for c08bf78 against 13ecfbb are in! @david-christiansen

  • mathlib-docs//instructions: -1.4T (-1.78%)
  • mathlib-docs//maxrss: -4MiB (-0.07%)
  • mathlib-docs//task-clock: +17m 42s (+7.88%)
  • mathlib-docs//wall-clock: +3m 41s (+74.43%)
  • own-docs//instructions: -52.9G (-1.38%)
  • own-docs//maxrss: -850MiB (-25.27%)
  • own-docs//task-clock: +50s (+14.12%)
  • own-docs//wall-clock: +15s (+8.62%)

No significant changes detected.

@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Feb 19, 2026

Benchmark results for beb37c1 against 13ecfbb are in! @david-christiansen

  • mathlib-docs//instructions: -1.4T (-1.84%)
  • mathlib-docs//maxrss: -5MiB (-0.08%)
  • mathlib-docs//task-clock: +17m 24s (+7.75%)
  • mathlib-docs//wall-clock: +3m 50s (+77.52%)
  • own-docs//instructions: -54.3G (-1.42%)
  • own-docs//maxrss: -852MiB (-25.33%)
  • own-docs//task-clock: +54s (+14.98%)
  • own-docs//wall-clock: +15s (+8.96%)

No significant changes detected.

@david-christiansen
Contributor Author

!bench

@david-christiansen marked this pull request as ready for review February 20, 2026 04:33
@leanprover-radar

leanprover-radar commented Feb 20, 2026

Benchmark results for 06dfc9d against 13ecfbb are in! @david-christiansen

  • mathlib-docs//instructions: -1.4T (-1.82%)
  • mathlib-docs//maxrss: -7MiB (-0.12%)
  • mathlib-docs//task-clock: +17m 34s (+7.82%)
  • mathlib-docs//wall-clock: +3m 46s (+76.23%)
  • own-docs//instructions: -53.6G (-1.40%)
  • own-docs//maxrss: -834MiB (-24.77%)
  • own-docs//task-clock: +52s (+14.59%)
  • own-docs//wall-clock: +15s (+8.68%)

No significant changes detected.

Comment on lines +34 to +56

def escape (s : String) : String := Id.run do
  let mut out := ""
  let mut i := s.startPos
  let mut j := s.startPos
  while h : j ≠ s.endPos do
    let c := j.get h
    if let some esc := subst c then
      out := out ++ s.extract i j ++ esc
      j := j.next h
      i := j
    else
      j := j.next h
  if i = s.startPos then s -- no escaping needed, return original
  else out ++ s.extract i j
where
  subst : Char → Option String
    | '&' => some "&amp;"
    | '<' => some "&lt;"
    | '>' => some "&gt;"
    | '"' => some "&quot;"
    | _ => none

Contributor Author

The old implementation actually showed up in profiles - this isn't just an exercise in code-fancying.
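
For comparison, the same escaping can be written as a naive fold that appends to the output one character at a time. This is only a reference sketch for checking the expected output. It is not the old implementation, which is not shown in this thread, and the name `escapeNaive` is made up. The version in the diff differs in that it copies each unescaped run with a single `extract` and returns the original string untouched when nothing needs escaping.

```lean
/-- Naive reference: escape `&`, `<`, `>`, and `"` one character at a time.
Hypothetical helper for illustration only. -/
def escapeNaive (s : String) : String :=
  s.foldl (fun out c =>
    match c with
    | '&' => out ++ "&amp;"
    | '<' => out ++ "&lt;"
    | '>' => out ++ "&gt;"
    | '"' => out ++ "&quot;"
    | _   => out.push c) ""

#eval escapeNaive "a < b && c"   -- "a &lt; b &amp;&amp; c"
#eval escapeNaive "plain text"   -- "plain text"
```

Both functions should agree on every input, so a sketch like this could serve as an oracle when testing the faster version.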

Direct and transitive dependencies.

Loosely inspired by bazel's [depset](https://bazel.build/rules/lib/builtins/depset). -/
abbrev DepSet (α) [Hashable α] [BEq α] := Array α × OrdHashSet α
Collaborator

Did you check that after deleting this #286 still works?

Contributor Author

Assuming that the build.yml added in that PR tests it effectively, then it still works. Is there another way to double-check? The output when running that command locally is very similar to what was on the PR, but there's not quite enough context there to know what it did before the PR and how to check it more.

Member

The docs facet is still returning the list of generated files (without duplicates), so I believe the relevant behavior is preserved.

@@ -23,8 +23,11 @@ require «UnicodeBasic» from git
require Cli from git
Collaborator

@tydeu I would like to get your opinion on the lakefile. I don't trust myself enough with lake trickery to say it's correct for sure.

TAR_ARGS=(doc)
if [ -f .lake/build/api-docs.db ]; then
  # Compact the DB into a single portable file (removes WAL/journal dependency)
  sqlite3 .lake/build/api-docs.db "VACUUM"
Collaborator

How big is the database for mathlib? Is this a metric we should track, if only so that we don't blow up the amount of cache people need if they decide to cache the DB?

Contributor Author

It's 717MB. With zstd compression, it's 96MB.

I'll rig up the benchmark to track both, just in case.

Contributor Author

OK, the latest benchmark run includes the sizes of the DB for both doc-gen itself and Mathlib. It's not posted here by the bot, but clicking through shows it.

Comment on lines +29 to +31
private def isAutoGeneratedSuffix (s : String) : Bool :=
  s == "rec" || s == "recOn" || s == "casesOn" || s == "noConfusion" ||
  s == "noConfusionType" || s == "below" || s == "brecOn"
Collaborator

Would be great if we could consolidate this at some point :/

Contributor Author

I agree.
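
One way such a consolidation might look is to keep the suffixes in a single list that every caller shares. This is a sketch only: `autoGeneratedSuffixes` and `hasAutoGeneratedSuffix` are hypothetical names, and the PR itself keeps the chained `==` comparisons quoted above.

```lean
/-- Hypothetical shared list of auto-generated suffixes (same strings as in the diff). -/
def autoGeneratedSuffixes : List String :=
  ["rec", "recOn", "casesOn", "noConfusion", "noConfusionType", "below", "brecOn"]

/-- Same predicate as in the diff, expressed as a list lookup. -/
def isAutoGeneratedSuffix (s : String) : Bool :=
  autoGeneratedSuffixes.contains s

/-- Hypothetical convenience wrapper: checks only the final string component of a name. -/
def hasAutoGeneratedSuffix : Lean.Name → Bool
  | .str _ s => isAutoGeneratedSuffix s
  | _ => false

#eval hasAutoGeneratedSuffix `Nat.rec   -- true
#eval hasAutoGeneratedSuffix `Nat.add   -- false
```

With a single list, adding or removing a suffix would be a one-line change in one place.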

@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Feb 20, 2026

Benchmark results for 91b4138 against 13ecfbb are in! @david-christiansen

  • mathlib-docs//instructions: -1.4T (-1.83%)
  • mathlib-docs//maxrss: -6MiB (-0.10%)
  • mathlib-docs//task-clock: +16m 44s (+7.45%)
  • mathlib-docs//wall-clock: +3m 46s (+76.31%)
  • own-docs//instructions: -53.8G (-1.40%)
  • own-docs//maxrss: -867MiB (-25.77%)
  • own-docs//task-clock: +50s (+13.88%)
  • own-docs//wall-clock: +15s (+8.84%)

No significant changes detected.

Member

@tydeu left a comment

I don't claim to fully understand the intricacies of what the doc-gen lakefile is doing. However, as far as I can tell from a thorough readthrough, it looks good to me.

david-christiansen and others added 2 commits February 23, 2026 06:38
Co-authored-by: Mac Malone <tydeu@hatpress.net>
@david-christiansen
Contributor Author

!bench

@leanprover-radar

leanprover-radar commented Feb 23, 2026

Benchmark results for bb0b496 against 13ecfbb are in! @david-christiansen

  • mathlib-docs//instructions: -1.4T (-1.81%)
  • mathlib-docs//maxrss: -4MiB (-0.07%)
  • mathlib-docs//task-clock: +18m 12s (+8.11%)
  • mathlib-docs//wall-clock: +3m 44s (+75.63%)
  • own-docs//instructions: -53.9G (-1.41%)
  • own-docs//maxrss: -873MiB (-25.95%)
  • own-docs//task-clock: +54s (+15.06%)
  • own-docs//wall-clock: +16s (+9.65%)

No significant changes detected.

@hargoniX hargoniX merged commit 4b7b1cc into leanprover:main Feb 23, 2026
1 check passed
david-christiansen added a commit to david-christiansen/doc-gen4 that referenced this pull request Feb 24, 2026
Fixes a bug in leanprover#347 that caused multi-library builds to skip
generating HTML for some libraries.

The issue was that the Lake setup used declaration-data.bmp as its
build target. Multi-library builds would populate the database, but
then only the library that won the race would generate its
declaration-data.bmp. The others would incorrectly see that the file
existed and generate no HTML.

Now, the same "marker file" approach that is used for the database
content also indicates that HTML for a given module, library, or
package is up to date. This gives them the right traces, and allows
declaration-data.bmp to be updated as needed.

A regression test is also included.
hargoniX pushed a commit that referenced this pull request Feb 24, 2026
* fix: don't skip docs in multi-library situations

Fixes a bug in #347 that caused multi-library builds to skip
generating HTML for some libraries.

The issue was that the Lake setup used declaration-data.bmp as its
build target. Multi-library builds would populate the database, but
then only the library that won the race would generate its
declaration-data.bmp. The others would incorrectly see that the file
existed and generate no HTML.

Now, the same "marker file" approach that is used for the database
content also indicates that HTML for a given module, library, or
package is up to date. This gives them the right traces, and allows
declaration-data.bmp to be updated as needed.

A regression test is also included.

* fix: DB contention without timeout

* chore: bump leansqlite and update calls

Rather than sending pragmas in strings, it's nicer to use a
higher-level API.

* Revert "chore: bump leansqlite and update calls"

This reverts commit b95beb5. It will
be sent in a separate PR.