diff --git a/docs/gtars/core.md b/docs/gtars/core.md
index 5d2a2612..71aef461 100644
--- a/docs/gtars/core.md
+++ b/docs/gtars/core.md
@@ -1,60 +1,257 @@
 # gtars-core
 
-Core library providing fundamental data structures and utilities for genomic interval operations. This is the foundation that all other gtars modules build upon.
+Core library providing the fundamental data structures and utilities that every other gtars crate builds on. If you're working directly with genomic regions in Rust, everything starts here.
 
-## Features
+!!! info "What's new"
+    Recent additions to `gtars-core` that pre-existing doc snippets don't cover:
 
-- Common genomic data structures (Region, RegionSet)
-- BED file parsing utilities
-- Shared constants and helper functions
-- Foundation for all gtars modules
+    - **`RegionSetList`** — a `GRangesList`-style collection of named `RegionSet`s with `concat`, `iter`, and set-level identifier computation. Consumed by `gtars-genomicdist` and `gtars-lola`.
+    - **`CoordinateMode`** enum — selects BED (0-based half-open) vs. GRanges (1-based closed) conventions for midpoint calculations.
+    - **`Fragment`** — fragment-file record type for single-cell ATAC-seq workflows.
+    - **`Interval<I, T>`** — generic interval with payload, used internally by overlap indexes.
+    - **Typed error enum** `RegionSetError` — replaces the previous panicking parse paths.
+    - **`to_polars`** + `dataframe` feature flag — zero-copy conversion to a Polars DataFrame.
+    - **`bigbed`** and **`http`** feature flags — optional bigBed writing and URL-backed `RegionSet::try_from`.
 
-## Core Data Types
+## Core data types
+
+The `gtars_core::models` module re-exports all six core types at the top level, so you typically import them directly:
+
+```rust
+use gtars_core::models::{
+    Region, RegionSet, RegionSetList, Interval, Fragment, CoordinateMode,
+};
+```
+
+### `Region`
+
+A single genomic interval. `start` and `end` are 0-based half-open by BED convention; `rest` carries any trailing tab-separated fields from the original line verbatim.
 
-### Region
-Represents a genomic interval with chromosome, start, and end coordinates:
 ```rust
 use gtars_core::models::Region;
 
-// Create a region
-let region = Region::new("chr1", 1000, 2000);
+let region = Region {
+    chr: "chr1".to_string(),
+    start: 1000,
+    end: 2000,
+    rest: None,
+};
 
-// Access properties
-println!("Chr: {}", region.chr);
-println!("Start: {}", region.start);
-println!("End: {}", region.end);
+assert_eq!(region.width(), 1000);
+println!("{}", region.as_string()); // chr1\t1000\t2000
 ```
 
-### RegionSet
-Collection of genomic regions:
+Methods:
+
+| method | returns | notes |
+|---|---|---|
+| `width()` | `u32` | `end - start` |
+| `as_string()` | `String` | tab-separated BED line |
+| `digest()` | `String` | MD5 digest of `"chr,start,end"` |
+| `mid_point()` | `u32` | `start + width() / 2` (BED/floor) |
+| `mid_point_with_mode(mode)` | `u32` | BED or GRanges convention — see `CoordinateMode` below |
+
+`Region` implements `Display` (as tab-separated text), `Clone`, `Debug`, `Eq`, `Hash`, and — under the `serde` feature — `Serialize`/`Deserialize`.
+
+### `RegionSet`
+
+An ordered collection of `Region`s. `RegionSet::try_from` accepts `&Path`, `&str`, `String`, `PathBuf`, or `Vec<u8>`, auto-detects gzip by extension, and with the `http` feature will fetch from URLs. Construction always sorts by `(chr, start)`.
+
 ```rust
 use gtars_core::models::RegionSet;
 use std::path::Path;
 
-// Load from BED file
+// From a local BED (or BED.gz) file
 let rs = RegionSet::try_from(Path::new("peaks.bed"))?;
 
-// Access regions
-println!("Number of regions: {}", rs.regions.len());
+assert!(!rs.is_empty());
+println!("{} regions, {} bp total", rs.len(), rs.nucleotides_length());
 
-// Iterate over regions
-for region in &rs.regions {
+for region in &rs {
     println!("{}: {}-{}", region.chr, region.start, region.end);
 }
+# Ok::<(), gtars_core::errors::RegionSetError>(())
+```
+
+Key methods:
+
+**Construction**
+
+- `RegionSet::try_from(path)` — accepts `&Path`, `&str`, `String`, `PathBuf`. Sorts on load.
+- `RegionSet::from(regions: Vec<Region>)` — in-memory construction.
+- `RegionSet::from(bytes: &[u8])` — parse from an in-memory byte slice (no gzip handling).
+
+**Iteration**
+
+- `for region in &rs { ... }` — `IntoIterator` is implemented for `&RegionSet`.
+- `iter_chroms()` → unique chromosomes in insertion order (post-sort).
+- `iter_chr_regions(chr)` → all regions on a specific chromosome.
+
+**Summaries**
+
+- `len()`, `is_empty()`, `nucleotides_length()` — count and total bp.
+- `region_widths()` → `Vec<u32>`.
+- `mean_region_width()` → `f64`, rounded to 2 decimals.
+- `get_max_end_per_chr()` → `HashMap<String, u32>`.
+- `calc_mid_points()` → `HashMap<String, Vec<u32>>` (BED convention).
+- `calc_mid_points_with_mode(CoordinateMode)` → same, with mode control.
+
+**Identifiers**
+
+- `identifier()` — MD5 digest over the first-three-column layout; the canonical BEDbase identifier.
+- `file_digest()` — MD5 digest over the full serialized file content.
+
+**I/O**
+
+- `to_bed(path)` / `to_bed_gz(path)` — write plain or gzipped BED.
+- `to_bigbed(path, chrom_sizes)` — under the `bigbed` feature, write a bigBed file.
+- `to_polars()` — under the `dataframe` feature, return a `PolarsResult<DataFrame>`.
+
+**Mutation**
+
+- `sort()` — in-place sort by `(chr, start)`.
+
+### `RegionSetList`
+
+A collection of `RegionSet`s — the gtars equivalent of Bioconductor's `GRangesList`. This is the type that downstream crates (genomicdist, lola) use to pass multiple region sets across FFI boundaries without paying N × clone costs.
+
+```rust
+use gtars_core::models::{RegionSet, RegionSetList};
+
+let peaks1 = RegionSet::try_from("peaks_rep1.bed")?;
+let peaks2 = RegionSet::try_from("peaks_rep2.bed")?;
+let peaks3 = RegionSet::try_from("peaks_rep3.bed")?;
+
+// Unnamed — names are optional
+let rsl = RegionSetList::new(vec![peaks1, peaks2, peaks3]);
+
+// Or with explicit names
+let rsl = RegionSetList::with_names(
+    rsl.region_sets,
+    vec!["rep1".into(), "rep2".into(), "rep3".into()],
+);
+
+// Iterate
+for rs in &rsl {
+    println!("{} regions", rs.len());
+}
+
+// Flatten all regions into a single RegionSet (no merge/dedup)
+let combined: RegionSet = rsl.concat();
+
+// Stable identifier over the full set (order-independent)
+let id = rsl.identifier();
+# Ok::<(), gtars_core::errors::RegionSetError>(())
+```
+
+`RegionSetList::try_from` also accepts:
+
+- A path to a **bedset manifest file** — a text file listing one BED path per line (`read_bedset_file` under the hood).
+- A `Vec<&Path>`, `Vec<&str>`, `Vec<String>`, or `Vec<PathBuf>` — each is loaded as its own `RegionSet`.
+
+`concat()` flattens without merging; if you need a reduced union, call `.reduce()` on the result (the `reduce` method lives in `gtars-genomicdist` via the `IntervalRanges` trait).
+
+Key methods: `new`, `with_names`, `add`, `get(i)`, `iter`, `len`, `is_empty`, `concat`, `identifier`.
+
+### `CoordinateMode`
+
+Switches between BED (0-based half-open, floor division) and GRanges (1-based closed, banker's rounding) conventions for midpoint calculation. BED is the default and matches Python/numpy conventions; GRanges exists for exact bit-compatibility with R GenomicDistributions output.
+
+```rust
+use gtars_core::models::{Region, CoordinateMode};
+
+let r = Region { chr: "chr1".into(), start: 100, end: 206, rest: None };
+
+assert_eq!(r.mid_point_with_mode(CoordinateMode::Bed), 153);      // 100 + 53
+assert_eq!(r.mid_point_with_mode(CoordinateMode::GRanges), 152);  // banker's rounding
+```
+
+For widths `w` where `w % 4 == 2`, BED and GRanges midpoints differ by 1 bp — this affects approximately 2.6% of typical peak distance calculations. Use `GRanges` mode only when you need byte-for-byte parity with R output.
+
+### `Fragment`
+
+A fragment-file record (chromatin accessibility / scATAC-seq). Implements `FromStr` for parsing lines from 10x-style `fragments.tsv` files and `From<Fragment> for Region` for dropping the barcode/support metadata when you only care about coordinates.
+
+```rust
+use gtars_core::models::{Fragment, Region};
+use std::str::FromStr;
+
+let f = Fragment::from_str("chr1\t1000\t1200\tAAACCTGAGAAACCAT-1\t3")?;
+assert_eq!(f.barcode, "AAACCTGAGAAACCAT-1");
+assert_eq!(f.read_support, 3);
+
+let as_region: Region = f.into();
+# Ok::<(), anyhow::Error>(())
+```
+
+### `Interval<I, T>`
+
+A generic `[start, end)` range with a payload `T`, parameterized over any unsigned integer type. This is primarily a building block for overlap indexes (consumed by `gtars-overlaprs` and `gtars-igd`); most user code should prefer `Region`/`RegionSet`.
+
+```rust
+use gtars_core::models::Interval;
+
+let a: Interval<u32, usize> = Interval { start: 10, end: 50, val: 0 };
+let b: Interval<u32, usize> = Interval { start: 40, end: 80, val: 1 };
+
+assert!(a.overlap(b.start, b.end));
+assert_eq!(a.intersect(&b), 10); // overlap width in bp
+```
+
+## Error handling
+
+Parse and I/O errors from `RegionSet::try_from` come back as the typed `RegionSetError` enum (no panics on malformed input):
+
+```rust
+use gtars_core::errors::RegionSetError;
+use gtars_core::models::RegionSet;
+
+match RegionSet::try_from("not_a_real_file.bed") {
+    Ok(rs) => println!("loaded {} regions", rs.len()),
+    Err(RegionSetError::FileReadError(msg)) => eprintln!("read failed: {msg}"),
+    Err(RegionSetError::RegionParseError(msg)) => eprintln!("parse failed: {msg}"),
+    Err(RegionSetError::EmptyRegionSet(path)) => eprintln!("no regions in {path}"),
+    Err(e) => eprintln!("other: {e}"),
+}
 ```
 
-## Available Modules
+Variants: `FileReadError`, `InvalidPathOrUrl`, `InvalidBedbaseIdentifier`, `BedbaseFetchError`, `RegionParseError`, `EmptyRegionSet`, `HttpFeatureDisabled`, `BigBedError`, and a transparent `Io` wrapper around `std::io::Error`.
 
-- `models` - Core data structures (Region, RegionSet)
-- `utils` - Utility functions for file handling and parsing
-- `consts` - Shared constants
+## Feature flags
+
+| flag | effect |
+|---|---|
+| *(default)* | Pure in-memory BED reading/writing, no optional dependencies. |
+| `serde` | Derives `Serialize`/`Deserialize` on `Region`, `RegionSet`. |
+| `http` | Enables `RegionSet::try_from(&Path)` to fetch from HTTP(S) URLs via `ureq`. Without this feature, non-file paths return `HttpFeatureDisabled`. |
+| `dataframe` | Enables `RegionSet::to_polars()` (pulls in `polars`). Required transitively by `gtars-genomicdist`'s `bedclassifier` feature. |
+| `bigbed` | Enables `RegionSet::to_bigbed()` via `bigtools`. |
+
+Enable them in `Cargo.toml` like so:
+
+```toml
+[dependencies]
+gtars-core = { version = "0.5", features = ["serde", "dataframe"] }
+```
 
-## Dependencies
+## Available modules
 
-Minimal external dependencies:
+- **`models`** — all core data types (`Region`, `RegionSet`, `RegionSetList`, `Interval`, `Fragment`, `CoordinateMode`). Re-exported at the crate root.
+- **`errors`** — `RegionSetError` enum.
+- **`utils`** — readers, file-type detection, chromosome-sizes parsing, and `Region` ↔ id hash-map helpers:
+    - `get_dynamic_reader(&Path)` / `get_dynamic_reader_w_stdin(&str)` — transparent gzip/stdin handling.
+    - `get_dynamic_reader_from_url(&Path)` — under the `http` feature.
+    - `get_file_info(&Path) -> FileInfo` — detect type (BED, BAM, NARROWPEAK, UNKNOWN) and gzip.
+    - `parse_bedlike_file(line)` → `(chr, start, end)` tuple from a single line.
+    - `get_chrom_sizes(path)` → `HashMap<String, u32>`.
+    - `read_bedset_file(path)` → `Vec<String>` of BED paths from a bedset manifest.
+    - `generate_region_to_id_map` / `generate_id_to_region_map` and string variants — stable id assignment for tokenizer vocabularies.
+    - `remove_all_extensions(&Path)` → stem with *all* extensions stripped (handles `.bed.gz`).
+- **`consts`** — column-name constants (`CHR_COL_NAME`, `START_COL_NAME`, `END_COL_NAME`, `DELIMITER`) and file-extension constants (`BED_FILE_EXTENSION`, `BAM_FILE_EXTENSION`, `GZ_FILE_EXTENSION`, `IGD_FILE_EXTENSION`, `GTOK_EXT`).
 
-- `anyhow` - Error handling
-- `flate2` - Gzip compression support
-- Other standard bioinformatics libraries
+## Where to go next
 
-This module serves as the foundation for all other gtars modules and maintains backward compatibility within major versions.
\ No newline at end of file
+- **[Core models tour](regionSet.md)** — a cross-language (Python + Rust) walkthrough of `Region`, `RegionSet`, and friends.
+- **[gtars-overlaprs](overlaprs.md)** — high-performance overlap queries that operate on `RegionSet`.
+- **[gtars-genomicdist](genomicdist.md)** — the `IntervalRanges` and `GenomicIntervalSetStatistics` traits extend `RegionSet` with R GenomicDistributions–style set algebra and summary stats.
+- **[gtars-lola](lola.md)** — LOLA enrichment built on top of the IGD index and `RegionSetList`.
diff --git a/docs/gtars/genomicdist.md b/docs/gtars/genomicdist.md
new file mode 100644
index 00000000..e57c66f6
--- /dev/null
+++ b/docs/gtars/genomicdist.md
@@ -0,0 +1,509 @@
+# gtars-genomicdist
+
+Rust port of the R [GenomicDistributions](https://code.databio.org/GenomicDistributions/) package — plus a handful of extra calculations and binary serialization formats used by BEDbase. The crate provides:
+
+- **Summary statistics** and per-chromosome distributions (`GenomicIntervalSetStatistics` trait).
+- **Interval set algebra** in the GRanges/IRanges idiom (`IntervalRanges` trait, 26+ methods).
+- **Genomic partitioning** using a gene model (promoter / UTR / exon / intron / intergenic).
+- **Signal matrix overlap and summary** (`calc_summary_signal`, TSV + packed-binary I/O).
+- **Consensus region calling** across multiple input sets.
+- **Nearest-TSS / nearest-feature distance** calculations (`TssIndex`).
+- **GC content** and **dinucleotide frequency** against a reference genome.
+- **BED file classification** (optional `bedclassifier` feature, polars-backed).
+- The **GDA** binary gene-model format and the **.fab** binary FASTA format for fast mmap-backed sequence lookup.
+
+All operations take a `&RegionSet` (from `gtars-core`) as input and either extend it via trait implementations or run as free functions. If you're new to the crate, start with the `IntervalRanges` and `GenomicIntervalSetStatistics` traits — those cover the GRanges/GenomicDistributions surface area most users need.
+
+!!! tip "Coming from R GenomicDistributions?"
+    Most function names follow the pattern `calc_*` where R used `calc*`, and behavior is matched where practical. A few deliberate divergences are documented inline — the most notable is that midpoint calculations default to BED (floor) conventions rather than GRanges banker's rounding, and spacing/nearest-neighbor calculations *exclude* single-region chromosomes rather than emitting sentinel values.
+
+## Installation
+
+```toml
+[dependencies]
+gtars-genomicdist = "0.7"
+```
+
+### Feature flags
+
+| flag | effect |
+|---|---|
+| *(default)* | All traits, statistics, partitions, signal, consensus, and TSS calculations. |
+| `bedclassifier` | Enables `classify_bed()` and pulls in `polars`. Transitively enables `gtars-core/dataframe`. |
+
+Enable the classifier feature:
+
+```toml
+[dependencies]
+gtars-genomicdist = { version = "0.7", features = ["bedclassifier"] }
+```
+
+## Core types
+
+All of these are re-exported at the crate root, so you typically import them directly:
+
+```rust
+use gtars_genomicdist::{
+    // Statistics
+    GenomicIntervalSetStatistics,
+    // Interval algebra
+    IntervalRanges, RegionSetListOps, pairwise_jaccard,
+    // Strand-aware wrappers
+    SortedRegionSet, Strand, StrandedRegionSet,
+    // Partitions
+    GeneModel, PartitionList, PartitionResult,
+    ExpectedPartitionResult, ExpectedPartitionRow,
+    calc_expected_partitions, calc_partitions, genome_partition_list,
+    // Signal
+    SignalMatrix, SignalSummaryResult, ConditionStats,
+    calc_summary_signal,
+    // Consensus
+    ConsensusRegion, consensus,
+    // Sequence stats
+    calc_gc_content, calc_dinucl_freq, DINUCL_ORDER,
+    // Utilities
+    chrom_karyotype_key, median_abs_distance,
+    // Gene model serialization
+    GenomicDistAnnotation,
+    // BED classifier (feature-gated)
+    #[cfg(feature = "bedclassifier")] classify_bed,
+    // Re-export from gtars-core
+    CoordinateMode,
+};
+```
+
+The submodules (`statistics`, `interval_ranges`, `partitions`, `signal`, `consensus`, `models`, `asset`, `bed_classifier`, `utils`) are all `pub` as well, so both `gtars_genomicdist::calc_partitions` and `gtars_genomicdist::partitions::calc_partitions` resolve to the same symbol.
+
+## Statistics
+
+The `GenomicIntervalSetStatistics` trait extends `RegionSet` with per-region and per-pair quantities — direct ports of the R GenomicDistributions functions (`calcWidth`, `calcNeighborDist`, `calcNearestNeighbors`, `calcChromBins`). These return vectors or per-chromosome tables, one number per peak or per gap. Use them when you want to *see* each value — plot a histogram, feed them to a downstream test, render a per-chromosome heatmap.
+
+Bring the trait into scope and the methods appear on any `RegionSet`.
+
+### Per-region and per-pair quantities
+
+Direct ports of R GenomicDistributions. Each returns a vector or per-chromosome table.
+
+```rust
+use gtars_core::models::RegionSet;
+use gtars_genomicdist::GenomicIntervalSetStatistics;
+
+let peaks = RegionSet::try_from("peaks.bed")?;
+
+// Per-chromosome summary: count, min/max/mean/median width, chr bounds
+for (chr, stats) in peaks.chromosome_statistics() {
+    println!(
+        "{chr}: {} regions, median width {}",
+        stats.number_of_regions,
+        stats.median_region_length
+    );
+}
+
+// Region-width distribution (one value per region)
+let widths: Vec<u32> = peaks.calc_widths();
+
+// Signed inter-region gaps on each chromosome (only positive gaps are returned)
+let gaps: Vec<i64> = peaks.calc_neighbor_distances()?;
+
+// Nearest-neighbor distance per region (multi-region chromosomes only)
+let nn: Vec<u32> = peaks.calc_nearest_neighbors()?;
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+| method | question it answers | returns |
+|---|---|---|
+| `chromosome_statistics()` | "What do the per-chromosome counts and width distributions look like?" | `HashMap<String, ChromosomeStatistics>` |
+| `region_distribution_with_bins(n_bins)` | "How many peaks fall in each of `n_bins` bins sized by the longest chromosome?" | `HashMap<String, RegionBin>` |
+| `region_distribution_with_chrom_sizes(n_bins, chrom_sizes)` | Same, but bins are sized per-chromosome by actual chromosome length — matches R `getGenomeBins` | `HashMap<String, RegionBin>` |
+| `calc_widths()` | "How wide is each peak?" — port of R `calcWidth()` | `Vec<u32>` |
+| `calc_neighbor_distances()` | "What's the bp gap between every pair of consecutive peaks on each chromosome?" | `Result<Vec<i64>>` |
+| `calc_nearest_neighbors()` | "How far is each peak from its closest neighbor?" — port of R `calcNearestNeighbors()` | `Result<Vec<u32>>` |
+
+### Output structs
+
+```rust
+pub struct ChromosomeStatistics {
+    pub chromosome: String,
+    pub number_of_regions: u32,
+    pub start_nucleotide_position: u32,  // leftmost start
+    pub end_nucleotide_position: u32,    // rightmost end
+    pub minimum_region_length: u32,
+    pub maximum_region_length: u32,
+    pub mean_region_length: f64,
+    pub median_region_length: f64,
+}
+
+pub struct RegionBin { pub chr: String, pub start: u32, pub end: u32, pub n: u32, pub rid: u32 }
+```
+
+!!! warning "Output length for spacing/nearest calculations"
+    `calc_neighbor_distances` and `calc_nearest_neighbors` **skip single-region chromosomes** (matching R GenomicDistributions behavior). The output length is therefore **not 1:1 with the input region count** — it's the number of multi-region chromosomes' regions. No sentinel values are emitted. If you need 1:1 alignment, filter your input to multi-region chromosomes first.
+
+!!! warning "`n_bins` is a target, not a total"
+    In `region_distribution_with_chrom_sizes`, `n_bins` is the target bin count for the **longest** chromosome in `chrom_sizes`. Bin width is derived from that, and every chromosome is tiled at the same bp width — so shorter chromosomes get fewer bins and the total bin count is `sum(ceil(chrom_size / bin_width))`, which can exceed `n_bins` substantially. To target a specific bin width in bp, set `n_bins = max_chrom_len / desired_bp`.
+
+## Interval set algebra
+
+`IntervalRanges` is a second trait on `RegionSet` that provides GRanges/IRanges-style set algebra. All operations return new `RegionSet`s (immutable pattern) and are strand-unaware by default — use `StrandedRegionSet` if you need strand-aware promoters, reduce, or setdiff.
+
+```rust
+use gtars_core::models::RegionSet;
+use gtars_genomicdist::IntervalRanges;
+
+let a = RegionSet::try_from("peaks_a.bed")?;
+let b = RegionSet::try_from("peaks_b.bed")?;
+
+let merged   = a.reduce();                         // merge overlapping intervals
+let common   = a.intersect(&b);                    // range-level intersection
+let a_only   = a.setdiff(&b);                      // remove b from a
+let combined = a.union(&b);                        // union + reduce
+let jac      = a.jaccard(&b);                      // bp-Jaccard similarity
+let closest  = a.closest(&b);                      // Vec<(a_idx, b_idx, dist)>
+let clustered = a.cluster(1000);                   // per-region cluster id
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+### Method reference
+
+| method | description |
+|---|---|
+| `trim(chrom_sizes)` | clamp regions to `[0, chrom_size)`; drop regions on unknown chromosomes |
+| `promoters(upstream, downstream)` | `[start - upstream, start + downstream)` per region |
+| `reduce()` | merge overlapping/adjacent intervals per chromosome |
+| `setdiff(other)` / `subtract(other)` | remove `other` from self |
+| `pintersect(other)` | *pairwise* (by index) intersection |
+| `intersect(other)` | range-level intersection |
+| `intersect_all(other)` | all-vs-all pairwise intersection fragments (AIList-backed) |
+| `concat(other)` | concatenate without merging |
+| `union(other)` | `concat(other).reduce()` |
+| `jaccard(other)` | bp-level Jaccard `|A ∩ B| / |A ∪ B|` |
+| `coverage(other)` | fraction of `self` bp covered by `other` |
+| `overlap_coefficient(other)` | `|A ∩ B| / min(|A|, |B|)` |
+| `shift(offset)` | translate by signed bp offset (saturating at 0) |
+| `flank(width, use_start, both)` | upstream/downstream/both-side flanks |
+| `resize(width, fix)` | fixed width anchored at `"start"`, `"end"`, or `"center"` |
+| `narrow(start, end, width)` | GRanges-style narrow with 1-based relative coords |
+| `disjoin()` | tile into non-overlapping pieces at every boundary |
+| `gaps(chrom_sizes)` | peak-free regions, including leading/trailing/whole-chrom gaps |
+| `closest(other)` | `Vec<(self_idx, other_idx, signed_dist)>` |
+| `cluster(max_gap)` | `Vec<u32>` cluster ids in original order |
+
+!!! note "`rest` fields are dropped"
+    Operations that merge or synthesize new intervals (reduce, setdiff, promoters, etc.) produce regions with `rest: None`. There is no unambiguous way to carry the original metadata through a merge, so the contract is: use `IntervalRanges` methods for coordinate-only work.
+
+### `pairwise_jaccard`
+
+A standalone helper for computing the full N×N Jaccard matrix over a slice of region sets, optimized to pre-reduce each set once and walk pairs linearly:
+
+```rust
+use gtars_genomicdist::pairwise_jaccard;
+
+let sets = vec![rs1, rs2, rs3];
+let jac = pairwise_jaccard(&sets); // Vec<f64> of length 9 (row-major)
+
+for i in 0..3 {
+    for j in 0..3 {
+        print!("{:.3} ", jac[i * 3 + j]);
+    }
+    println!();
+}
+```
+
+### RegionSetListOps
+
+`RegionSetListOps` implements the same set-algebra operations on a `RegionSetList` by index — useful for bindings (wasm/Python/R) that want to operate on pairs without copying whole `RegionSet`s across an FFI boundary.
+
+```rust
+use gtars_core::models::RegionSetList;
+use gtars_genomicdist::RegionSetListOps;
+
+let rsl = RegionSetList::try_from(vec!["a.bed", "b.bed", "c.bed"])?;
+
+let jac_01 = rsl.jaccard_at(0, 1);                   // Option<f64>
+let union_02 = rsl.union_at(0, 2);                   // Option<RegionSet>
+let drop_one = rsl.union_except(1);                  // union of all but index 1
+let (full_union, loo) = rsl.bulk_union_except()      // all leave-one-out unions in O(n)
+    .ok_or("empty list")?;
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+Methods: `pintersect_at`, `pintersect_count`, `jaccard_at`, `union_at`, `setdiff_at`, `region_count`, `union_except`, `bulk_union_except`, `union_all`, `intersect_all`.
+
+## Partitions
+
+Ports of `genomePartitionList()`, `calcPartitions()`, and `calcExpectedPartitions()` from R GenomicDistributions. You load a gene model from BED files or GTF, build an ordered `PartitionList` of promoter / UTR / exon / intron categories, and classify your query regions into mutually exclusive buckets (`intergenic` is added as the implicit remainder).
+
+```rust
+use gtars_core::models::RegionSet;
+use gtars_genomicdist::{
+    GeneModel, calc_partitions, calc_expected_partitions, genome_partition_list,
+};
+use std::collections::HashMap;
+
+// 1. Load the gene model (choose one)
+let model = GeneModel::from_bed_files(
+    "genes.bed",
+    "exons.bed",
+    Some("three_utr.bed"),  // optional
+    Some("five_utr.bed"),   // optional
+)?;
+
+// Or, from a GTF:
+// let model = GeneModel::from_gtf("gencode.v47.gtf.gz", true, true)?;
+
+// 2. Build partition list (core/prox promoter sizes in bp; chrom_sizes optional)
+let partitions = genome_partition_list(&model, 100, 2000, None);
+
+// 3. Classify query regions
+let query = RegionSet::try_from("peaks.bed")?;
+
+let result = calc_partitions(&query, &partitions, /* bp_proportion = */ false);
+for (name, count) in &result.counts {
+    let pct = *count as f64 / result.total as f64 * 100.0;
+    println!("{:<15} {:>6}  ({:.1}%)", name, count, pct);
+}
+
+// 4. Observed vs expected, with chi-square p-value
+let chrom_sizes: HashMap<String, u32> = [("chr1".into(), 248_956_422)].into_iter().collect();
+let expected = calc_expected_partitions(&query, &partitions, &chrom_sizes, false);
+for row in &expected.rows {
+    println!(
+        "{:<15} obs={:>6}  exp={:>10.1}  log10(O/E)={:+.2}  p={:.2e}",
+        row.partition, row.observed, row.expected, row.log10_oe, row.chi_sq_pval
+    );
+}
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+### Priority order
+
+`genome_partition_list` produces partitions in this order, which is also the priority for classification:
+
+1. **`promoterCore`** — `core_prom_size` bp upstream of each gene start (strand-aware).
+2. **`promoterProx`** — `prox_prom_size` bp upstream, minus core.
+3. **`threeUTR`** — if UTR files are provided.
+4. **`fiveUTR`** — minus 3'UTR (R gives 3'UTR priority).
+5. **`exon`** — minus UTRs.
+6. **`intron`** — gene bodies minus UTRs and exons.
+7. **`intergenic`** — implicit remainder, emitted by `calc_partitions`.
+
+`calc_partitions` has two modes:
+
+- **Priority mode (`bp_proportion = false`)** — each query region is assigned to the first partition it overlaps; remainders are counted as `intergenic`. Mutually exclusive.
+- **BP proportion mode (`bp_proportion = true`)** — for each partition, computes total overlapping base pairs of query. Not mutually exclusive: a query region overlapping multiple partitions contributes bp to each (matching R behavior).
+
+!!! note "Chi-square p-values differ from R"
+    `calc_expected_partitions` uses a goodness-of-fit `(O-E)²/E` formula. R's `chisq.test()` computes a 2×2 test of independence with optional Yates correction, so p-values will not match R GenomicDistributions output byte-for-byte. The `log10_oe` column is directly comparable.
+
+### `GeneModel::from_gtf`
+
+GTF loading handles GENCODE-style files that use an undifferentiated `UTR` feature type by parsing `CDS` records to infer which UTRs are 5' vs 3'. Two flags:
+
+- `filter_protein_coding` — keep only features with `gene_biotype "protein_coding"` or `gene_type "protein_coding"` in the attributes column.
+- `convert_ensembl_ucsc` — prepend `chr` to chromosome names that don't already have it (so Ensembl `1` becomes UCSC `chr1`).
+
+## Signal matrix overlap
+
+`SignalMatrix` holds a matrix of signal values across genomic regions × conditions (e.g. a peak × cell-type matrix of ChIP-seq intensities). `calc_summary_signal` overlaps a query region set against the matrix, aggregates by MAX per query region, and returns Tukey boxplot statistics per condition.
+
+```rust
+use gtars_core::models::{RegionSet, CoordinateMode};
+use gtars_genomicdist::{SignalMatrix, calc_summary_signal};
+
+// Load from a TSV where col 0 is "chr_start_end", remaining cols are condition values
+let sm = SignalMatrix::from_tsv("signal_matrix.tsv")?;
+
+// Or load from the packed binary format (produced by sm.save_bin())
+let sm = SignalMatrix::load_bin("signal_matrix.sigm")?;
+// ...or from an in-memory byte slice (for wasm):
+// let sm = SignalMatrix::load_bin_from_bytes(&bytes)?;
+
+let query = RegionSet::try_from("peaks.bed")?;
+let summary = calc_summary_signal(&query, &sm, CoordinateMode::Bed)?;
+
+for stats in &summary.matrix_stats {
+    println!(
+        "{}: median={:.2}  [{:.2}–{:.2}]",
+        stats.condition, stats.median, stats.lower_hinge, stats.upper_hinge
+    );
+}
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+`SignalSummaryResult` contains per-query-region max signal vectors (`signal_matrix`), per-condition Tukey stats (`matrix_stats`), and `condition_names`. The packed binary format (`.sigm`, magic `SIGM`) is produced by `sm.save_bin()` — substantially faster to load than TSV for large matrices.
+
+## Consensus regions
+
+Given N region sets, `consensus` produces the reduced union annotated with how many input sets overlap each union region. Useful for filtering by replicate support threshold:
+
+```rust
+use gtars_core::models::RegionSet;
+use gtars_genomicdist::{consensus, ConsensusRegion};
+
+let reps = [
+    RegionSet::try_from("rep1.bed")?,
+    RegionSet::try_from("rep2.bed")?,
+    RegionSet::try_from("rep3.bed")?,
+];
+
+let cons: Vec<ConsensusRegion> = consensus(&reps);
+
+// Keep regions present in ≥ 2/3 replicates
+let robust: Vec<_> = cons.into_iter().filter(|c| c.count >= 2).collect();
+println!("{} robust regions", robust.len());
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+`ConsensusRegion` has `chr`, `start`, `end`, and `count` — the number of input sets overlapping that union region.
+
+## TSS / nearest-feature distances
+
+`TssIndex` builds a per-chromosome sorted vector of TSS midpoints from a `RegionSet`, enabling O(R · log M) distance queries instead of the naive O(R · M):
+
+```rust
+use gtars_core::models::{RegionSet, CoordinateMode};
+use gtars_genomicdist::models::TssIndex;
+
+let tss = TssIndex::try_from("gencode_tss.bed")?;
+
+let peaks = RegionSet::try_from("peaks.bed")?;
+
+// Unsigned nearest-feature distance per query region
+let dists: Vec<u32> = tss.calc_tss_distances(&peaks, CoordinateMode::Bed)?;
+
+// Signed distances (positive = feature downstream, negative = upstream)
+let signed: Vec<i64> = tss.calc_feature_distances(&peaks, CoordinateMode::Bed)?;
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+Regions on chromosomes missing from the TSS index are padded with `u32::MAX` / `i64::MAX` sentinels (one per query region). Use `median_abs_distance(&signed)` from the `utils` module to summarize while filtering those sentinels.
+
+## GC content and dinucleotide frequencies
+
+Both functions take anything implementing `SequenceAccess` — currently `GenomeAssembly` (in-memory HashMap, built from a FASTA) or `BinaryGenomeAssembly` (mmap-backed `.fab` binary).
+
+```rust
+use gtars_core::models::RegionSet;
+use gtars_genomicdist::{calc_gc_content, calc_dinucl_freq, DINUCL_ORDER};
+use gtars_genomicdist::models::{GenomeAssembly, BinaryGenomeAssembly};
+
+// In-memory loader — ~2s to build for hg38 but then gives zero-copy slices
+let genome = GenomeAssembly::try_from("hg38.fa")?;
+
+// Or the mmap .fab loader — instant construction, zero-copy per region
+let genome = BinaryGenomeAssembly::from_file("hg38.fab".as_ref())?;
+
+let peaks = RegionSet::try_from("peaks.bed")?;
+
+// GC content per region, 0.0–1.0
+let gc: Vec<f64> = calc_gc_content(&peaks, &genome, /* ignore_unk_chroms = */ true)?;
+
+// Dinucleotide frequencies: 16 columns per region in DINUCL_ORDER
+// raw_counts=false → percentages (matches R default)
+let (labels, matrix) = calc_dinucl_freq(&peaks, &genome, false, true)?;
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+Create a `.fab` file from a regular FASTA once up-front:
+
+```rust
+use gtars_genomicdist::models::BinaryGenomeAssembly;
+
+BinaryGenomeAssembly::write_from_fasta(
+    "hg38.fa".as_ref(),
+    "hg38.fab".as_ref(),
+)?;
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+Or via the CLI: `gtars prep --fasta hg38.fa` (see the [gtars-cli](cli.md) page).
+
+## BED classification
+
+Under the `bedclassifier` feature, `classify_bed` inspects the column layout of a BED file and assigns one of several UCSC / ENCODE format classifications (BED3, BED6+, narrowPeak, broadPeak, gappedPeak, RNA elements, etc.).
+
+```rust
+#[cfg(feature = "bedclassifier")]
+{
+    use gtars_core::models::RegionSet;
+    use gtars_genomicdist::bed_classifier::classify_bed;
+
+    let rs = RegionSet::try_from("unknown_format.bed").unwrap();
+    let c = classify_bed(&rs).unwrap();
+    println!(
+        "format={}  bed_compliance={}  compliant_cols={}",
+        c.data_format, c.bed_compliance, c.compliant_columns
+    );
+}
+```
+
+## GDA binary format
+
+`GenomicDistAnnotation` is a serializable wrapper around `GeneModel` in the GDA (Genomic Dist Annotation) binary format. Chrom sizes are **not** stored in the GDA file — they come from a separate source (refgenie, `.chrom.sizes` file, or an API).
+
+```rust
+use gtars_genomicdist::GenomicDistAnnotation;
+
+// Build from GTF once, save as GDA
+let gda = GenomicDistAnnotation::from_gtf("gencode.v47.gtf.gz")?;
+gda.save_bin("gencode.v47.gda")?;
+
+// Load from disk
+let gda = GenomicDistAnnotation::load_bin("gencode.v47.gda")?;
+
+// Or from memory (wasm / API)
+// let gda = GenomicDistAnnotation::load_bin_from_bytes(&bytes)?;
+
+let partitions = gtars_genomicdist::genome_partition_list(&gda.gene_model, 100, 2000, None);
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+## Utilities
+
+From `gtars_genomicdist::utils`:
+
+- **`partition_genome_into_bins(chrom_sizes, n_bins)`** — tile all chromosomes with fixed-width bins (bin width = longest chrom / n_bins, floored, min 1 bp). Returns a `RegionSet`.
+- **`median_abs_distance(&[i64])`** — median absolute value, filtering `i64::MAX` sentinels produced by `TssIndex::calc_feature_distances`.
+- **`chrom_karyotype_key(chr)`** — sort key producing the standard karyotype order: `chr1, chr2, …, chr22, chrX, chrY, chrM, chrUn_*`. Works with or without the `chr` prefix.
+
+## Strand-aware wrappers
+
+Two wrappers extend `RegionSet` with stronger invariants:
+
+- **`SortedRegionSet`** — a newtype guaranteeing `(chr, start)` sort order. Constructed via `SortedRegionSet::new(rs)`, which sorts in place (move, no clone). Downstream code that requires sorted input can accept `&SortedRegionSet` to avoid re-sorting on every call.
+- **`StrandedRegionSet`** — pairs a `RegionSet` with a parallel `Vec<Strand>`. Strand-aware `promoters_stranded`, `reduce`, `setdiff`, and `trim` are methods on this type and are used internally by `genome_partition_list` to produce correct partitions for minus-strand genes.
+
+```rust
+use gtars_core::models::RegionSet;
+use gtars_genomicdist::{SortedRegionSet, StrandedRegionSet, Strand};
+
+let rs = RegionSet::try_from("peaks.bed")?;
+let sorted = SortedRegionSet::new(rs); // in-place sort
+
+let stranded = StrandedRegionSet::new(
+    sorted.0.clone(),
+    vec![Strand::Plus; sorted.0.regions.len()],
+);
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+`Strand::from_char('+' | '-' | _)` converts a single character to `Plus` / `Minus` / `Unstranded`.
+
+## Errors
+
+All operations that can fail return `Result<T, GtarsGenomicDistError>`. Variants:
+
+- `CustomError(String)` — general I/O and parsing wrapper.
+- `GCContentError(chr, start, end, msg)` — sequence lookup failed during GC calculation.
+- `TSSContentError(msg)` — no TSS features found for the region set.
+- `SignalMatrixError(msg)` — signal-matrix parse failure.
+- `Io(std::io::Error)` — transparent I/O error.
+
+The `bedclassifier` feature has its own `BedClassifierError` enum for format-classification failures.
+
+## Where to go next
+
+- **[gtars-core](core.md)** — `RegionSet`, `RegionSetList`, and `CoordinateMode`, which this crate consumes.
+- **[gtars-lola](lola.md)** — LOLA enrichment is built on `IntervalRanges` (for universe construction) and the IGD index.
+- **[gtars-overlaprs](overlaprs.md)** — the overlap-detection engine used internally by `calc_partitions` and `consensus`.
+- **[gtars CLI](cli.md)** — `gtars genomicdist` subcommands for running these analyses from the command line.
diff --git a/docs/gtars/lola.md b/docs/gtars/lola.md
new file mode 100644
index 00000000..88e7809c
--- /dev/null
+++ b/docs/gtars/lola.md
@@ -0,0 +1,358 @@
+# gtars-lola
+
+Rust port of the R [LOLA](http://lola.databio.org/) (Locus Overlap Analysis) package: region-set enrichment testing against a database of reference region sets, using Fisher's exact test on 2×2 contingency tables.
+
+Given a **user set** of regions (your peaks), a **universe** (the background you consider sampleable — typically a union of publicly available peak sets), and a **region database** (LOLA core or custom), `gtars-lola` computes for every database set the probability that your user set's overlap with that database set is larger (or smaller) than expected under the null.
+
+The crate is built on top of:
+
+- **[gtars-igd](igd.md)** — the interval overlap index. A LOLA database is exposed as a single `Igd` index over all its files, so one call can score a user set against thousands of reference sets simultaneously.
+- **[gtars-genomicdist](genomicdist.md)** — the `IntervalRanges` trait (for `concat`, `disjoin`, `union`, etc.) is used to build restricted universes and redefine user sets in terms of universe regions.
+- **[gtars-core](core.md)** — all region types come from here; `RegionSetList` is used as the FFI-friendly return type for extracting region sets from a database.
+
+## Installation
+
+```toml
+[dependencies]
+gtars-lola = "0.2"
+```
+
+No feature flags — the crate has a single default configuration.
+
+## Module layout
+
+`gtars-lola` does not re-export symbols at the crate root; import from the submodules directly:
+
+- **`gtars_lola::database`** — `RegionDB`, `CollectionAnno`, `RegionSetAnno`
+- **`gtars_lola::enrichment`** — `run_lola`, `ContingencyTable` impls
+- **`gtars_lola::universe`** — `check_universe_appropriateness`, `redefine_user_sets`, `build_restricted_universe`, `UniverseReport`, `UserSetReport`
+- **`gtars_lola::output`** — `annotate_results`, `apply_fdr_correction`, `results_to_columns`, `write_results_tsv`, `LolaColumnar`
+- **`gtars_lola::models`** — `Direction`, `LolaConfig`, `ContingencyTable`, `LolaResult`
+- **`gtars_lola::errors`** — `LolaError`
+
+## End-to-end workflow
+
+```rust
+use std::fs::File;
+use std::io::BufWriter;
+use std::path::Path;
+
+use gtars_core::models::RegionSet;
+use gtars_lola::database::RegionDB;
+use gtars_lola::enrichment::run_lola;
+use gtars_lola::models::LolaConfig;
+use gtars_lola::output::{annotate_results, apply_fdr_correction, write_results_tsv};
+
+// 1. Load the region database from a LOLA-format folder
+let db = RegionDB::from_lola_folder(
+    Path::new("LOLACore/hg38"),
+    None,   // collections filter (None = all)
+    None,   // per-collection file limit (None = all)
+)?;
+
+// 2. Load the user set(s) and universe
+let user_sets = vec![
+    RegionSet::try_from("peaks_condition_a.bed")?,
+    RegionSet::try_from("peaks_condition_b.bed")?,
+];
+let universe = RegionSet::try_from("universe.bed")?;
+
+// 3. Run Fisher's exact test for every (user_set, db_set) pair
+let mut results = run_lola(
+    &db.igd,
+    &user_sets,
+    &universe,
+    &LolaConfig::default(),
+)?;
+
+// 4. Attach database metadata (collection, cell type, tissue, …)
+annotate_results(&mut results, &db);
+
+// 5. Benjamini-Hochberg FDR correction, per user set
+apply_fdr_correction(&mut results);
+
+// 6. Write results to TSV (format matches R LOLA's writeCombinedEnrichment)
+let mut out = BufWriter::new(File::create("lola_results.tsv")?);
+write_results_tsv(&mut out, &results)?;
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+The steps correspond 1:1 to the R LOLA workflow: `loadRegionDB` → `runLOLA` → `getLolaResults` → `writeCombinedEnrichment`.
+
+## Loading a region database
+
+`RegionDB::from_lola_folder` expects a directory laid out in the standard R LOLA format:
+
+```text
+db_path/
+├── collection1/
+│   ├── collection.txt      # collector, date, source, description (TSV)
+│   ├── index.txt           # per-file annotations (TSV or CSV)
+│   └── regions/
+│       ├── file1.bed
+│       └── file2.bed
+├── collection2/
+│   ├── collection.txt
+│   ├── index.txt
+│   └── regions/
+│       └── …
+```
+
+`collection.txt` and `index.txt` are both auto-detected TSV or CSV (R's `fread` convention). Recognized columns in `index.txt`: `filename`, `cellType`, `description`, `tissue`, `dataSource`, `antibody`, `treatment`. Unknown columns are ignored; missing optional columns are preserved as `None`.
+
+### Filters and limits
+
+```rust
+// Load only specific collections
+let db = RegionDB::from_lola_folder(
+    path,
+    Some(&["encode_segmentation", "roadmap_epigenomics"]),
+    None,
+)?;
+
+// Limit files per collection (useful for smoke tests)
+let db = RegionDB::from_lola_folder(path, None, Some(10))?;
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+### In-memory construction (for API integrations)
+
+If you're serving region sets from an API rather than a folder, build the `RegionDB` directly from a pre-constructed `Igd`:
+
+```rust
+use gtars_lola::database::{RegionDB, RegionSetAnno};
+
+let db = RegionDB::from_igd_with_regions(igd, region_sets, region_annos);
+```
+
+This is the path BEDbase uses to expose LOLA over the web — the IGD is built server-side from the bedset collection and shipped to the client.
+
+### Convenience accessors
+
+- **`db.num_region_sets()`** — total region sets in the database.
+- **`db.list_region_sets(collections)`** → `Vec<String>` — filenames, optionally filtered by collection.
+- **`db.get_region_set(filenames, collections)`** → `Vec<&RegionSet>` — look up by filename.
+- **`db.get_region_set_list(&[usize])`** → `RegionSetList` — extract indexed region sets as a named `RegionSetList` (for FFI, plotting, or downstream analysis). Out-of-bounds indices are silently skipped.
+- **`RegionDB::merge(a, b)`** — combine two databases into one; rebuilds the IGD.
+
+## Checking and shaping the universe
+
+LOLA results are only meaningful if your universe **contains** your user set: every user region should overlap at least one universe region, with no many-to-many mappings. The `universe` module provides three diagnostics/fixers:
+
+### `check_universe_appropriateness`
+
+```rust
+use gtars_lola::universe::check_universe_appropriateness;
+use gtars_igd::igd::Igd;
+
+// Pre-build the universe IGD once — you can reuse it across calls
+let universe_igd = Igd::from_single_region_set(&universe);
+
+let report = check_universe_appropriateness(&user_sets, &universe_igd);
+
+for r in &report.user_set_reports {
+    println!(
+        "user set {}: {:.1}% coverage ({} of {} regions), {} many-to-many",
+        r.user_set_index,
+        r.coverage * 100.0,
+        r.regions_in_universe,
+        r.total_regions,
+        r.many_to_many_count,
+    );
+    for w in &r.warnings {
+        eprintln!("  ⚠ {w}");
+    }
+}
+```
+
+Warnings fire when coverage is under 50% (severe) or under 90% (moderate), or when any user region overlaps more than one universe region (many-to-many).
+
+### `redefine_user_sets`
+
+Rewrite each user set in terms of universe regions: for every user region, find the universe regions it overlaps and emit those as the new user set. This eliminates many-to-many mapping artifacts and is the Rust equivalent of R LOLA's `redefineUserSets()`.
+
+```rust
+use gtars_lola::universe::redefine_user_sets;
+
+let universe_igd = Igd::from_single_region_set(&universe);
+let redefined: Vec<RegionSet> =
+    redefine_user_sets(&user_sets, &universe, &universe_igd);
+
+// Now run enrichment on `redefined` instead of the raw user_sets
+let results = run_lola(&db.igd, &redefined, &universe, &LolaConfig::default())?;
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+### `build_restricted_universe`
+
+For differential enrichment analysis — build a universe that is exactly the union of all user sets, disjoined into non-overlapping pieces. This is R LOLA's `disjoin(unlist(userSets))`.
+
+```rust
+use gtars_lola::universe::build_restricted_universe;
+
+let restricted = build_restricted_universe(&user_sets);
+// Use `restricted` as the universe in a differential run
+```
+
+## Running enrichment
+
+`run_lola(igd, user_sets, universe, config)` is the core engine. For every `(user_set, db_set)` pair it builds a 2×2 contingency table and runs a one-sided Fisher exact test:
+
+```text
+                   In DB set    Not in DB set
+In user set           a              c
+Not in user set       b              d
+```
+
+- **a** = overlap count between user set and DB set (the *support* — shows up as `support` in results).
+- **b** = universe-DB overlap − user-DB overlap.
+- **c** = user set size − user-DB overlap.
+- **d** = universe size − a − b − c.
+
+### `LolaConfig`
+
+```rust
+pub struct LolaConfig {
+    pub min_overlap: i32,         // default 1
+    pub direction: Direction,     // default Enrichment
+}
+
+pub enum Direction {
+    Enrichment, // P(X ≥ a), alternative = "greater"
+    Depletion,  // P(X ≤ a), alternative = "less"
+}
+```
+
+```rust
+use gtars_lola::models::{LolaConfig, Direction};
+
+let config = LolaConfig {
+    min_overlap: 1,
+    direction: Direction::Enrichment,
+};
+```
+
+### p-values and odds ratios
+
+The p-value is computed from the hypergeometric distribution using the survival function (`sf`) rather than `1 - cdf` to avoid catastrophic cancellation when the tail probability is very small. It is reported as `-log10(p)` (capped at ~322 via a `1e-322` floor, matching R LOLA).
+
+The odds ratio is the **conditional maximum likelihood estimate** — the noncentrality parameter of Fisher's noncentral hypergeometric distribution — solved by Brent's method. This matches `fisher.test()$estimate` in R exactly, not the simple `(a·d)/(b·c)` point estimate.
+
+### Negative contingency values
+
+If your user set contains regions *outside* the universe, the contingency table can produce negative `b`, `c`, or `d`. `run_lola` matches R LOLA's behavior: it prints a warning to stderr, stores the signed values on the `LolaResult` row, and returns `p_value_log = 0.0`, `odds_ratio = NaN` for that row. To avoid this entirely, pre-process your inputs with `redefine_user_sets`.
+
+## Results
+
+Each `(user_set, db_set)` pair produces a `LolaResult`:
+
+```rust
+pub struct LolaResult {
+    pub user_set: usize,              // 0-based user set index
+    pub db_set: usize,                // 0-based db set index
+    pub p_value_log: f64,             // -log10(p), ≥ 0
+    pub odds_ratio: f64,              // CMLE, matches R fisher.test()$estimate
+    pub support: u64,                 // contingency table cell 'a'
+    pub rnk_pv: usize,                // rank by p-value (1-based, ascending)
+    pub rnk_or: usize,                // rank by odds ratio (1-based, descending)
+    pub rnk_sup: usize,                // rank by support (1-based, descending)
+    pub max_rnk: usize,                // max(rnk_pv, rnk_or, rnk_sup)
+    pub mean_rnk: f64,                 // mean of the three ranks
+    pub b: i64, pub c: i64, pub d: i64, // signed contingency values
+    pub q_value: Option<f64>,          // BH-adjusted p (None until apply_fdr_correction)
+    pub filename: String,
+    pub collection: Option<String>,
+    pub description: Option<String>,
+    pub cell_type: Option<String>,
+    pub tissue: Option<String>,
+    pub antibody: Option<String>,
+    pub treatment: Option<String>,
+    pub data_source: Option<String>,
+    pub db_set_size: u64,
+}
+```
+
+`run_lola` returns results sorted by descending `p_value_log`, then ascending `mean_rnk` — identical to R LOLA's output order. Ranks are assigned per user set so the rank columns within a user set are independent.
+
+## Annotation and FDR
+
+Two post-processing steps, both in `gtars_lola::output`:
+
+- **`annotate_results(&mut results, &db)`** — fills in `collection`, `description`, `cell_type`, `tissue`, `antibody`, `treatment`, `data_source`, and `db_set_size` from the database. `description` is truncated to 80 chars to match R LOLA.
+- **`apply_fdr_correction(&mut results)`** — Benjamini-Hochberg q-value computation, applied **independently per user set**. Writes to the `q_value: Option<f64>` field.
+
+## Output
+
+### TSV (R LOLA–compatible)
+
+```rust
+use gtars_lola::output::write_results_tsv;
+use std::fs::File;
+use std::io::BufWriter;
+
+let mut out = BufWriter::new(File::create("lola_results.tsv")?);
+write_results_tsv(&mut out, &results)?;
+# Ok::<(), Box<dyn std::error::Error>>(())
+```
+
+The header exactly matches R LOLA's `writeCombinedEnrichment`:
+
+```text
+userSet  dbSet  collection  pValueLog  oddsRatio  support
+rnkPV  rnkOR  rnkSup  maxRnk  meanRnk  b  c  d
+description  cellType  tissue  antibody  treatment  dataSource
+filename  qValue  size
+```
+
+`userSet` and `dbSet` are 1-based in the TSV (R convention), even though they are 0-based in memory.
+
+### Columnar (for FFI bindings)
+
+`results_to_columns(&results) -> LolaColumnar` pivots the row-oriented `Vec<LolaResult>` into parallel column vectors. This is what the Python, WASM, and R bindings consume to build native data frames without reimplementing the row→column transpose:
+
+```rust
+use gtars_lola::output::{results_to_columns, LolaColumnar};
+
+let cols: LolaColumnar = results_to_columns(&results);
+println!("{} rows", cols.p_value_log.len());
+// cols.p_value_log[i], cols.odds_ratio[i], cols.cell_type[i], … all align on row i
+```
+
+Every field of `LolaResult` has a corresponding `Vec<T>` on `LolaColumnar`; the ordering is preserved from the input slice.
+
+## Direct contingency-table use
+
+If you want to compute a single Fisher test outside of the full `run_lola` pipeline:
+
+```rust
+use gtars_lola::models::{ContingencyTable, Direction};
+
+let ct = ContingencyTable { a: 45, b: 155, c: 155, d: 9645 };
+
+let p     = ct.fisher_pvalue(Direction::Enrichment);
+let log_p = ct.p_value_log(Direction::Enrichment);
+let or    = ct.odds_ratio();
+
+println!("p = {p:.2e}  (-log10 = {log_p:.2})  odds = {or:.2}");
+```
+
+`fisher_pvalue`, `p_value_log`, and `odds_ratio` are `impl ContingencyTable` methods — the same functions `run_lola` calls internally.
+
+## Errors
+
+```rust
+pub enum LolaError {
+    EmptyUniverse,                // universe has 0 regions
+    EmptyDatabase,                // IGD has 0 files
+    Other(anyhow::Error),         // wrapped I/O etc.
+}
+```
+
+`run_lola` returns early with `EmptyUniverse` / `EmptyDatabase` so you can distinguish configuration problems from per-row negative-contingency warnings (which print to stderr and continue).
+
+## Where to go next
+
+- **[gtars-igd](igd.md)** — the overlap index that backs `RegionDB.igd`. Worth reading if you're tuning universe construction or wrapping your own database.
+- **[gtars-genomicdist](genomicdist.md)** — `IntervalRanges` operations used during `build_restricted_universe` and `redefine_user_sets`.
+- **[gtars-core](core.md)** — `RegionSet` and `RegionSetList`, the input/output types throughout this page.
+- **R LOLA** ([docs](http://lola.databio.org/)) — the reference implementation this port targets.
diff --git a/docs/gtars/modules.md b/docs/gtars/modules.md
index 47ce2f9a..c30ea86c 100644
--- a/docs/gtars/modules.md
+++ b/docs/gtars/modules.md
@@ -6,7 +6,7 @@ gtars is organized as a workspace of independent Rust crates, each providing spe
 
 ## Core Infrastructure
 
-- **[gtars-core](core.md)** - Fundamental data structures and utilities
+- **[gtars-core](core.md)** - Fundamental data structures (`Region`, `RegionSet`, `RegionSetList`, `Interval`, `Fragment`, `CoordinateMode`) and utilities
 - **[gtars-io](io.md)** - I/O operations and file format parsers
 
 ## Genomic Analysis
@@ -17,6 +17,11 @@ gtars is organized as a workspace of independent Rust crates, each providing spe
 - **[gtars-scoring](scoring.md)** - Fragment overlap scoring
 - **[gtars-fragsplit](fragsplit.md)** - Fragment file splitting for pseudobulk
 
+## Distribution Analysis & Enrichment
+
+- **[gtars-genomicdist](genomicdist.md)** - Rust port of R GenomicDistributions: summary stats, interval set algebra, partitions, signal matrices, consensus
+- **[gtars-lola](lola.md)** - LOLA (Locus Overlap Analysis) enrichment testing built on IGD + genomicdist
+
 ## Machine Learning
 
 - **[gtars-tokenizers](tokenizers.md)** - Genomic region tokenizers for ML
diff --git a/docs/gtars/python-overview.md b/docs/gtars/python-overview.md
index cad45506..2ebb2e31 100644
--- a/docs/gtars/python-overview.md
+++ b/docs/gtars/python-overview.md
@@ -1,6 +1,6 @@
 # gtars-python
 
-Python bindings for gtars functionality, providing native Python API for genomic interval analysis.
+Python bindings for gtars — native Python API for genomic interval analysis, built on the Rust core.
 
 ## Installation
 
@@ -8,52 +8,93 @@ Python bindings for gtars functionality, providing native Python API for genomic
 pip install gtars
 ```
 
-## Available Modules
+For development installs from source, `uv pip install -e ./gtars-python` from the repo root.
 
-### tokenizers
-```python
-from gtars.tokenizers import Tokenizer
+## Submodules
 
-# Create tokenizer from BED file
-tokenizer = Tokenizer.from_bed("regions.bed")
+The `gtars` Python package exposes several submodules, each wrapping a gtars Rust crate:
 
-# Or from configuration
-tokenizer = Tokenizer.from_config("config.toml")
+| submodule | Rust crate | purpose | dedicated doc |
+|---|---|---|---|
+| `gtars.models` | `gtars-core` + `gtars-genomicdist::models` | `Region`, `RegionSet`, `RegionSetList`, reference genomes, gene models, partition lists, signal matrices — plus all the per-`RegionSet` statistics and interval algebra methods | [models](python/models.md) |
+| `gtars.genomic_distributions` | `gtars-genomicdist` | GC content, dinucleotide freq, partition classification, signal summaries, consensus regions (free functions that need a reference genome or a partition list) | [genomic_distributions](python/genomic_distributions.md) |
+| `gtars.lola` | `gtars-lola` | LOLA enrichment testing: `RegionDB`, `run_lola`, universe helpers | [lola](python/lola.md) |
+| `gtars.tokenizers` | `gtars-tokenizers` | Genomic region tokenizers for ML | *(see in-package docs)* |
+| `gtars.refget` | `gtars-refget` | GA4GH refget protocol client + local store | [digests](python/digests.md), [refget API](python/refget-api.md), [RefgetStore](python/refget-store.md) |
+| `gtars.utils` | `gtars-core::utils` | File-reading and parsing helpers | *(see in-package docs)* |
 
-# Tokenize regions
-tokens = tokenizer.tokenize(["chr1:1000-2000", "chr2:3000-4000"])
-```
+## Quick start
 
-### models (RegionSet)
 ```python
 from gtars.models import RegionSet
 
-# Create from BED file
-rs = RegionSet.from_bed("peaks.bed")
+# Load a BED file (local path, or URL if the http feature is enabled)
+rs = RegionSet("peaks.bed")
 
-# Access properties
-print(f"Number of regions: {len(rs)}")
-print(f"Total coverage: {rs.coverage()}")
+print(len(rs))                       # region count
+print(rs.identifier)                 # canonical bedbase MD5 id
+print(rs.mean_region_width())        # average width
 
-# Operations
-rs.sort()
-merged = rs.merge()
+# Iterate
+for region in rs:
+    print(region.chr, region.start, region.end)
+
+# Set algebra
+merged = rs.reduce()                 # merge overlapping
+promoters = rs.promoters(2000, 0)    # 2kb upstream
 ```
 
-### refget
+See [`gtars.models`](python/models.md) for the full `RegionSet` surface — including `reduce`, `setdiff`, `union`, `intersect_all`, `jaccard`, `coverage`, `neighbor_distances`, `nearest_neighbors`, and more.
+
+## Extended example — genomic distributions
+
 ```python
-from gtars.refget import RefgetStore, compute_digest
+from gtars.models import RegionSet, BinaryGenomeAssembly, PartitionList
+from gtars.genomic_distributions import calc_gc_content, calc_expected_partitions
+
+peaks = RegionSet("peaks.bed")
+genome = BinaryGenomeAssembly("hg38.fab")
+pl = PartitionList.from_gtf("gencode.v47.gtf.gz", core_prom=100, prox_prom=2000)
+chrom_sizes = {"chr1": 248_956_422, "chr2": 242_193_529}
+
+# GC content per region
+gc = calc_gc_content(peaks, genome, ignore_unk_chroms=True)
+print(f"Mean GC: {sum(gc) / len(gc):.3f}")
+
+# Observed vs expected partition enrichment
+result = calc_expected_partitions(peaks, pl, chrom_sizes)
+for p, o, e, lo, pv in zip(
+    result["partition"], result["observed"],
+    result["expected"], result["log10OE"], result["pvalue"],
+):
+    print(f"  {p:<15} obs={o:>6.0f}  exp={e:>10.1f}  log10(O/E)={lo:+.2f}  p={pv:.2e}")
+```
 
-# Compute sequence digest
-digest = compute_digest(b"ACGTACGT")
+## Extended example — LOLA enrichment
 
-# Use RefgetStore for sequence management
-store = RefgetStore()
+```python
+import pandas as pd
+from gtars.models import RegionSet
+from gtars.lola import RegionDB, run_lola
+
+def to_tuples(rs):
+    return [(r.chr, r.start, r.end) for r in rs]
+
+db = RegionDB.from_folder("LOLACore/hg38")
+
+user_sets = [to_tuples(RegionSet("peaks.bed"))]
+universe = to_tuples(RegionSet("universe.bed"))
+
+results = run_lola(user_sets, universe, db)
+df = pd.DataFrame(results).sort_values("pValueLog", ascending=False)
+print(df.head(20))
 ```
 
-## Performance
+See [`gtars.lola`](python/lola.md) for universe preparation helpers, `RegionDB` accessors, and the full result schema.
+
+## Performance notes
 
-- Native Rust performance
-- Zero-copy data transfer where possible
-- NumPy array integration
-- Parallel processing support
+- `RegionSet` construction sorts on load; repeated operations don't re-sort.
+- Overlap queries (`count_overlaps`, `find_overlaps`, `subset_by_overlaps`) go through an AIList index for O(log n) per-query lookups.
+- `BinaryGenomeAssembly` uses `mmap` for zero-copy sequence access — use it over `GenomeAssembly` for large genomes unless you specifically need the in-memory version.
+- Most numeric-result methods return Python lists; for very large outputs, consider iterating or using numpy wrappers at the call site.
diff --git a/docs/gtars/python/genomic_distributions.md b/docs/gtars/python/genomic_distributions.md
new file mode 100644
index 00000000..ab892057
--- /dev/null
+++ b/docs/gtars/python/genomic_distributions.md
@@ -0,0 +1,225 @@
+# `gtars.genomic_distributions`
+
+Python bindings for [`gtars-genomicdist`](../genomicdist.md) — the Rust port of R GenomicDistributions plus extras. This submodule contains the **free functions** that operate on regions-plus-something-else (a reference genome, a partition list, a signal matrix). Methods that operate only on a `RegionSet` (summary stats, interval algebra, peak clustering) live directly on `RegionSet` in [`gtars.models`](models.md).
+
+!!! info "Where to look for what"
+    - Per-region summaries, nearest-neighbor distances, density vectors, set algebra → `RegionSet` methods in [`gtars.models`](models.md).
+    - GC content / dinucleotide frequencies → **this module** (need a reference genome).
+    - Partition classification → **this module** (need a `PartitionList`).
+    - Signal matrix overlap → **this module** (need a `SignalMatrix`).
+    - Consensus regions across replicates → **this module**.
+    - LOLA enrichment → [`gtars.lola`](lola.md).
+
+The Rust reference page, [gtars-genomicdist](../genomicdist.md), has the algorithmic detail, caveats about bin-width semantics, divergences from R GenomicDistributions, and so on. This page is a Python-focused reference.
+
+## Sequence statistics
+
+These both take a `RegionSet` and a reference genome (`GenomeAssembly` for in-memory or `BinaryGenomeAssembly` for mmap-backed `.fab` files — see [`gtars.models`](models.md#genomeassembly)).
+
+### `calc_gc_content`
+
+```python
+from gtars.models import RegionSet, BinaryGenomeAssembly
+from gtars.genomic_distributions import calc_gc_content
+
+peaks = RegionSet("peaks.bed")
+genome = BinaryGenomeAssembly("hg38.fab")
+
+gc = calc_gc_content(peaks, genome, ignore_unk_chroms=True)
+# gc: list[float] — one value per region, 0.0–1.0
+```
+
+Setting `ignore_unk_chroms=False` raises if any region lives on a chromosome missing from the assembly. Set to `True` (default) to skip them silently.
+
+### `calc_dinucl_freq`
+
+```python
+from gtars.genomic_distributions import calc_dinucl_freq
+
+result = calc_dinucl_freq(
+    peaks, genome,
+    raw_counts=False,       # False (default) → percentages 0–100; True → integer counts
+    ignore_unk_chroms=True,
+)
+
+# result is a dict:
+result["region_labels"]   # list[str] — "chr_start_end" per region
+result["dinucleotides"]   # list[str] — 16 dinucleotide names in canonical order
+result["frequencies"]     # list[list[float]] — one row per region, 16 values per row
+```
+
+The 16 columns in `frequencies` correspond one-for-one with the 16 names in `dinucleotides`. For pooled global counts across all regions, run with `raw_counts=True` and sum each column of the `frequencies` matrix.
+
+This matches R GenomicDistributions `calcDinuclFreq` column-for-column, including the canonical order.
+
+## Partition classification
+
+Builds on the `PartitionList` type from [`gtars.models`](models.md#partitionlist).
+
+### `calc_partitions`
+
+```python
+from gtars.models import RegionSet, PartitionList
+from gtars.genomic_distributions import calc_partitions
+
+peaks = RegionSet("peaks.bed")
+pl = PartitionList.from_gtf("gencode.v47.gtf.gz", core_prom=100, prox_prom=2000)
+
+result = calc_partitions(peaks, pl, bp_proportion=False)
+
+# result is a dict with:
+result["partition"]   # list[str] — partition names + "intergenic"
+result["count"]       # list[int] — region (or bp) count per partition
+result["total"]       # int — sum of all counts
+```
+
+- `bp_proportion=False` (priority mode): each query region is assigned to the first partition it overlaps. Mutually exclusive. The `intergenic` bucket collects everything not matched.
+- `bp_proportion=True`: for each partition, sum the overlapping base pairs from the query set. Not mutually exclusive — a region that straddles two partitions contributes bp to each (matching R behavior).
+
+### `calc_expected_partitions`
+
+Observed vs expected enrichment with chi-square test, given chromosome sizes:
+
+```python
+from gtars.genomic_distributions import calc_expected_partitions
+
+chrom_sizes = {"chr1": 248_956_422, "chr2": 242_193_529, ...}
+
+result = calc_expected_partitions(
+    peaks, pl, chrom_sizes, bp_proportion=False,
+)
+
+# result is a dict with parallel lists:
+result["partition"]   # list[str]
+result["observed"]    # list[float]
+result["expected"]    # list[float]
+result["log10OE"]     # list[float] — log10(observed / expected)
+result["pvalue"]      # list[float] — chi-square goodness-of-fit p-value
+```
+
+!!! warning "p-values will differ slightly from R"
+    `calc_expected_partitions` uses a goodness-of-fit `(O-E)²/E` formula. R's `chisq.test()` computes a 2×2 test of independence with optional Yates correction, so p-values will not match R GenomicDistributions output byte-for-byte. The `log10OE` column is directly comparable.
+
+## Signal matrix overlap
+
+### `calc_summary_signal`
+
+Overlap a query region set against a pre-loaded `SignalMatrix` and compute Tukey boxplot statistics per condition:
+
+```python
+from gtars.models import RegionSet, SignalMatrix
+from gtars.genomic_distributions import calc_summary_signal
+
+peaks = RegionSet("peaks.bed")
+sm = SignalMatrix.from_tsv("signal_matrix.tsv")
+# or: sm = SignalMatrix.load_bin("signal_matrix.sigm")
+
+result = calc_summary_signal(peaks, sm)
+
+result["condition_names"]   # list[str] — columns of the signal matrix
+result["region_labels"]     # list[str] — "chr_start_end" per matched query
+result["signal_matrix"]     # list[list[float]] — row per matched query, col per condition (MAX)
+result["matrix_stats"]      # list of dicts — Tukey boxplot stats per condition
+```
+
+Each entry in `matrix_stats` has: `condition`, `lower_whisker`, `lower_hinge`, `median`, `upper_hinge`, `upper_whisker`.
+
+For each query region, the function finds all overlapping rows in the signal matrix and takes the **max** per condition — answering "what's the peak signal at this region in each cell type?".
+
+## Consensus regions
+
+### `consensus`
+
+Given N region sets (e.g. replicates), produce the reduced union with per-region replicate support count:
+
+```python
+from gtars.models import RegionSet
+from gtars.genomic_distributions import consensus
+
+reps = [
+    RegionSet("rep1.bed"),
+    RegionSet("rep2.bed"),
+    RegionSet("rep3.bed"),
+]
+
+cons = consensus(reps)
+# cons: list of dicts with keys: chr, start, end, count
+
+# Keep regions present in ≥ 2/3 replicates
+robust = [c for c in cons if c["count"] >= 2]
+print(f"{len(robust)} robust regions")
+```
+
+The `count` field is how many input sets overlap each union region — not the number of replicate *regions*, but the number of input *sets*.
+
+## Utilities
+
+### `median_abs_distance`
+
+Median of absolute values, filtering sentinels. Pairs well with `TssIndex.feature_distances`:
+
+```python
+from gtars.models import TssIndex, RegionSet
+from gtars.genomic_distributions import median_abs_distance
+
+tss = TssIndex("gencode.v47.tss.bed")
+peaks = RegionSet("peaks.bed")
+
+signed = tss.feature_distances(peaks)        # list[float | None]
+# Drop None values (regions on chromosomes not in the TSS index)
+clean = [d for d in signed if d is not None]
+
+median = median_abs_distance(clean)          # Optional[float]
+```
+
+Returns `None` if the input is empty or contains only missing/sentinel values.
+
+## End-to-end example
+
+```python
+from gtars.models import (
+    RegionSet, BinaryGenomeAssembly, PartitionList, SignalMatrix, TssIndex,
+)
+from gtars.genomic_distributions import (
+    calc_gc_content, calc_partitions, calc_expected_partitions,
+    calc_summary_signal, consensus,
+)
+
+# 1. Load inputs once
+peaks = RegionSet("peaks.bed")
+genome = BinaryGenomeAssembly("hg38.fab")
+pl = PartitionList.from_gtf("gencode.v47.gtf.gz", core_prom=100, prox_prom=2000)
+chrom_sizes = {"chr1": 248_956_422, "chr2": 242_193_529}
+
+# 2. Peak-level statistics — live on RegionSet itself
+print(f"{len(peaks)} regions, mean width {peaks.mean_region_width():.1f}")
+print(f"Per-chromosome: {peaks.chromosome_statistics()}")
+
+# 3. GC content per region
+gc = calc_gc_content(peaks, genome, ignore_unk_chroms=True)
+print(f"Mean GC: {sum(gc) / len(gc):.3f}")
+
+# 4. Partition enrichment (observed vs expected)
+exp = calc_expected_partitions(peaks, pl, chrom_sizes)
+for p, o, e, lo, pv in zip(
+    exp["partition"], exp["observed"], exp["expected"], exp["log10OE"], exp["pvalue"]
+):
+    print(f"  {p:<15} obs={o:>6.0f}  exp={e:>10.1f}  log10(O/E)={lo:+.2f}  p={pv:.2e}")
+
+# 5. Nearest-TSS distances
+tss = TssIndex("gencode.v47.tss.bed")
+tss_dists = tss.calc_tss_distances(peaks)
+print(f"Median distance to nearest TSS: {sorted(tss_dists)[len(tss_dists)//2]:,}")
+
+# 6. Consensus across replicates
+reps = [RegionSet(f"rep{i}.bed") for i in (1, 2, 3)]
+cons = consensus(reps)
+robust = [c for c in cons if c["count"] >= 2]
+print(f"{len(robust)} of {len(cons)} union regions present in ≥2 replicates")
+```
+
+## See also
+
+- **[`gtars.models`](models.md)** — `RegionSet`, `GenomeAssembly`, `PartitionList`, `SignalMatrix`, and the peak-stats methods attached to `RegionSet`.
+- **[`gtars.lola`](lola.md)** — LOLA enrichment testing.
+- **[gtars-genomicdist](../genomicdist.md)** — full Rust API reference, algorithmic notes, and caveats that apply identically in Python.
diff --git a/docs/gtars/python/lola.md b/docs/gtars/python/lola.md
new file mode 100644
index 00000000..4f269732
--- /dev/null
+++ b/docs/gtars/python/lola.md
@@ -0,0 +1,269 @@
+# `gtars.lola`
+
+Python bindings for [`gtars-lola`](../lola.md) — LOLA (Locus Overlap Analysis) enrichment testing. Given a user set of regions, a universe (background), and a region database, computes Fisher's exact test enrichment of the user set against every database region set.
+
+See the [Rust gtars-lola page](../lola.md) for the full statistical reference (Fisher's exact test, CMLE odds ratio, contingency table semantics, R LOLA compatibility notes). This page is a Python-focused reference.
+
+!!! warning "LOLA uses tuples, not `RegionSet`"
+    The Python `gtars.lola` API takes **`list[tuple[str, int, int]]`** — lists of `(chr, start, end)` tuples — rather than `RegionSet` objects. This is a short-term quirk of the binding. If you have a `RegionSet` in hand, convert with:
+
+    ```python
+    def to_tuples(rs):
+        return [(r.chr, r.start, r.end) for r in rs]
+    ```
+
+## End-to-end workflow
+
+```python
+from gtars.models import RegionSet
+from gtars.lola import RegionDB, run_lola
+
+# 1. Load the region database from a LOLA-format folder
+db = RegionDB.from_folder(
+    "LOLACore/hg38",
+    collections=None,   # None = all; or pass a list of collection names
+    limit=None,         # None = all files; or pass a per-collection cap
+)
+
+# 2. Prepare user sets and universe as lists of (chr, start, end) tuples
+peaks = RegionSet("peaks.bed")
+universe = RegionSet("universe.bed")
+
+def to_tuples(rs):
+    return [(r.chr, r.start, r.end) for r in rs]
+
+user_sets = [to_tuples(peaks)]          # list of user sets; each is a list of tuples
+universe_tuples = to_tuples(universe)
+
+# 3. Run the enrichment
+results = run_lola(
+    user_sets,
+    universe_tuples,
+    db,
+    min_overlap=1,
+    direction="enrichment",   # or "depletion"
+)
+
+# results is a column-oriented dict — DataFrame-friendly
+import pandas as pd
+df = pd.DataFrame(results)
+df = df.sort_values("pValueLog", ascending=False)
+print(df.head(20))
+```
+
+The 4-step workflow mirrors R LOLA: load database → convert inputs → run enrichment → pandas-ify.
+
+## `RegionDB`
+
+Wraps a `gtars-lola` `RegionDB` — an IGD overlap index plus original region sets plus per-file annotations, loaded from the standard LOLA directory layout:
+
+```text
+db_path/
+├── collection1/
+│   ├── collection.txt      # collector, date, source, description (TSV/CSV)
+│   ├── index.txt           # per-file annotations (TSV/CSV)
+│   └── regions/
+│       ├── file1.bed
+│       └── file2.bed
+├── collection2/
+│   └── …
+```
+
+### `RegionDB.from_folder`
+
+```python
+RegionDB.from_folder(db_path, collections=None, limit=None) -> RegionDB
+```
+
+- `db_path` — path to the LOLA database root.
+- `collections` — optional list of collection names to include (others are skipped).
+- `limit` — optional per-collection file cap (useful for smoke tests).
+
+`collection.txt` and `index.txt` are auto-detected TSV or CSV. Recognized `index.txt` columns: `filename`, `cellType`, `description`, `tissue`, `dataSource`, `antibody`, `treatment`. Unknown columns are ignored; missing optional columns become `None`.
+
+### `RegionDB.from_bed_files`
+
+Build a database from a list of BED file paths without the full LOLA layout:
+
+```python
+db = RegionDB.from_bed_files(
+    ["file1.bed", "file2.bed", "file3.bed"],
+    filenames=None,   # optional; defaults to basenames
+)
+```
+
+Useful for ad-hoc analyses where you have a list of peak files but don't want to set up a collection directory.
+
+### Accessors
+
+```python
+db.num_region_sets         # int — total files in the database
+
+db.list_region_sets()                       # list[str] — all filenames
+db.list_region_sets(collections=["enc3"])   # filtered
+
+# Extract region sets as a gtars.models.RegionSetList (names populated from filenames)
+rsl = db.get_region_sets()                  # all sets
+rsl = db.get_region_sets(indices=[0, 5, 12])
+
+# Annotations as lists of dicts
+db.region_anno       # per-file dicts with filename/cellType/description/tissue/...
+db.collection_anno   # per-collection dicts with collector/date/source/description
+```
+
+`get_region_sets()` returns a [`gtars.models.RegionSetList`](models.md#regionsetlist), which has `names` populated from the database filenames. This is the one path in Python where `RegionSetList.names` is non-None — normal `RegionSetList(...)` construction can't set names.
+
+## Running enrichment
+
+### `run_lola`
+
+```python
+run_lola(
+    user_sets: list[list[tuple[str, int, int]]],
+    universe: list[tuple[str, int, int]],
+    region_db: RegionDB,
+    min_overlap: int = 1,
+    direction: str = "enrichment",
+) -> dict
+```
+
+**Parameters:**
+
+- `user_sets` — a **list of user sets**, where each user set is a list of `(chr, start, end)` tuples. Pass one-element list `[peaks_tuples]` for a single-user-set analysis.
+- `universe` — a single list of `(chr, start, end)` tuples representing the background.
+- `region_db` — a `RegionDB`.
+- `min_overlap` — minimum bp overlap to count as overlapping (default 1).
+- `direction` — `"enrichment"` (default, P(X ≥ a), alternative "greater") or `"depletion"` (P(X ≤ a), alternative "less"). The strings `"greater"` / `"less"` are accepted as aliases.
+
+**Returns** a column-oriented dict with these parallel lists (one entry per `(user_set, db_set)` pair):
+
+| key | type | meaning |
+|---|---|---|
+| `userSet` | `list[int]` | 0-based user-set index |
+| `dbSet` | `list[int]` | 0-based db-set index |
+| `collection` | `list[str | None]` | from `index.txt` |
+| `pValueLog` | `list[float]` | `-log10(p)` from Fisher's exact test, capped at ~322 |
+| `oddsRatio` | `list[float]` | CMLE odds ratio (matches R `fisher.test()$estimate`) |
+| `support` | `list[int]` | overlap count between user set and db set — the contingency `a` |
+| `rnkPV`, `rnkOR`, `rnkSup` | `list[int]` | 1-based per-metric ranks within the user set |
+| `maxRnk` | `list[int]` | max of the three ranks |
+| `meanRnk` | `list[float]` | mean of the three ranks |
+| `b`, `c`, `d` | `list[int]` | signed contingency values (can be negative if user set extends outside universe) |
+| `qValue` | `list[float | None]` | BH-adjusted p-value (applied automatically inside `run_lola`) |
+| `description`, `cellType`, `tissue`, `antibody`, `treatment`, `dataSource` | `list[str | None]` | from `index.txt` |
+| `filename` | `list[str]` | db set file name |
+| `size` | `list[int]` | number of regions in the db set |
+
+The Python `run_lola` already calls `annotate_results` + `apply_fdr_correction` internally, so `qValue` and the annotation columns come back populated — you don't need to post-process.
+
+Rows are sorted by descending `pValueLog`, then ascending `meanRnk` (matching R LOLA output order).
+
+!!! note "Negative contingency values"
+    If your user set contains regions *outside* the universe, the contingency table produces negative `b`/`c`/`d`. The binding prints a warning to stderr, stores the signed values, and returns `pValueLog = 0.0`, `oddsRatio = NaN` for that row. To avoid this, preprocess with `redefine_user_sets` (below).
+
+## Universe preparation
+
+LOLA results are only meaningful when your universe contains your user sets. Three helpers for checking and shaping the universe:
+
+### `check_universe`
+
+```python
+from gtars.lola import check_universe
+
+report = check_universe(
+    user_sets,           # list[list[tuple[str, int, int]]]
+    universe_tuples,     # list[tuple[str, int, int]]
+)
+
+# report is a dict with parallel lists:
+report["userSet"]           # list[int]
+report["totalRegions"]      # list[int]
+report["regionsInUniverse"] # list[int]
+report["coverage"]          # list[float]  — 0.0 to 1.0
+report["manyToMany"]        # list[int]   — number of user regions overlapping >1 universe region
+report["warnings"]          # list[str]   — free-form warning messages
+```
+
+Warnings fire when coverage is under 50% (severe) or under 90% (moderate), and when any user region overlaps more than one universe region.
+
+### `redefine_user_sets`
+
+Rewrite each user set in terms of universe regions — for every user region, find the universe regions it overlaps and emit *those* as the new user set. Eliminates many-to-many mapping artifacts. This is the Python equivalent of R LOLA's `redefineUserSets()`.
+
+```python
+from gtars.lola import redefine_user_sets
+
+redefined = redefine_user_sets(user_sets, universe_tuples)
+# redefined is list[list[tuple[str, int, int]]] — same shape as input, but rewritten
+
+# Now run enrichment on the redefined version
+results = run_lola(redefined, universe_tuples, db)
+```
+
+### `build_restricted_universe`
+
+For differential enrichment analysis — build a universe that is exactly the union of all user sets, disjoined into non-overlapping pieces. This is R LOLA's `disjoin(unlist(userSets))`.
+
+```python
+from gtars.lola import build_restricted_universe
+
+restricted = build_restricted_universe(user_sets)
+# restricted: list[tuple[str, int, int]]
+
+# Use it as the universe in a differential run
+results = run_lola(user_sets, restricted, db)
+```
+
+## Complete example with pandas
+
+```python
+import pandas as pd
+from gtars.models import RegionSet
+from gtars.lola import RegionDB, run_lola, check_universe, redefine_user_sets
+
+# Convert a RegionSet to the tuple form lola expects
+def to_tuples(rs):
+    return [(r.chr, r.start, r.end) for r in rs]
+
+# Load inputs
+db = RegionDB.from_folder("LOLACore/hg38")
+peaks_a = to_tuples(RegionSet("condition_a.bed"))
+peaks_b = to_tuples(RegionSet("condition_b.bed"))
+universe = to_tuples(RegionSet("universe.bed"))
+
+user_sets = [peaks_a, peaks_b]
+
+# Universe sanity check
+report = check_universe(user_sets, universe)
+for us, cov, mm in zip(
+    report["userSet"], report["coverage"], report["manyToMany"]
+):
+    print(f"user set {us}: {cov:.1%} coverage, {mm} many-to-many")
+for w in report["warnings"]:
+    print(f"  ⚠ {w}")
+
+# Optional: eliminate many-to-many artifacts
+user_sets = redefine_user_sets(user_sets, universe)
+
+# Run enrichment
+results = run_lola(user_sets, universe, db, direction="enrichment")
+
+df = pd.DataFrame(results)
+print(df.shape)   # (n_user_sets * n_db_sets, 23)
+
+# Top 10 per user set
+for us in df["userSet"].unique():
+    top = (
+        df[df["userSet"] == us]
+        .sort_values("pValueLog", ascending=False)
+        .head(10)
+    )
+    print(f"\n=== User set {us} — top 10 ===")
+    print(top[["filename", "cellType", "pValueLog", "oddsRatio", "support", "qValue"]])
+```
+
+## See also
+
+- **[`gtars.models`](models.md)** — `RegionSet`, `RegionSetList`, and the types returned by `RegionDB.get_region_sets()`.
+- **[`gtars.genomic_distributions`](genomic_distributions.md)** — set algebra on `RegionSet` (which internally powers universe redefinition).
+- **[gtars-lola](../lola.md)** — full Rust API reference, including contingency table math, p-value computation via hypergeometric survival function, CMLE odds ratio, and the R LOLA TSV output format.
diff --git a/docs/gtars/python/models.md b/docs/gtars/python/models.md
new file mode 100644
index 00000000..c7a5c791
--- /dev/null
+++ b/docs/gtars/python/models.md
@@ -0,0 +1,351 @@
+# `gtars.models`
+
+The `gtars.models` submodule exposes the core genomic data types and — unlike the Rust layout, where statistics and set algebra live in separate crates — attaches almost the entire `gtars-genomicdist` method surface directly to `RegionSet`. In Python, you generally interact with `RegionSet` and the rest is built up on top of it.
+
+!!! info "Python ↔ Rust correspondence"
+    `gtars.models` combines types from three Rust crates:
+
+    - `gtars-core::models` → `Region`, `Interval`, `RegionSet`, `RegionSetList`
+    - `gtars-genomicdist::models` → `ChromosomeStatistics`, `GenomeAssembly`, `BinaryGenomeAssembly`, `TssIndex`, `GeneModel`, `PartitionList`, `SignalMatrix`, `GenomicDistAnnotation`
+    - `gtars-genomicdist::{IntervalRanges, GenomicIntervalSetStatistics}` → methods on `RegionSet`
+
+    Free functions that need a reference genome or a partition list live in [`gtars.genomic_distributions`](genomic_distributions.md).
+
+## Classes
+
+### `Region`
+
+A single genomic interval.
+
+```python
+from gtars.models import Region
+
+r = Region(chr="chr1", start=100, end=200, rest="peak1")
+print(r)          # chr1\t100\t200\tpeak1
+print(len(r))     # 100 (width)
+print(r.chr, r.start, r.end, r.rest)
+```
+
+Constructor: `Region(chr, start, end, rest=None)`.
+Attributes: `chr: str`, `start: int`, `end: int`, `rest: str | None`.
+Supports `==`, `!=`, `len()`, `str()`, `repr()`.
+
+### `RegionSet`
+
+The main workhorse — a collection of regions loaded from a BED file (or a URL, via the `http` feature on the underlying crate), and the attachment point for ~50 methods covering load/save, iteration, summaries, interval set algebra, overlap queries, and peak statistics.
+
+#### Construction
+
+```python
+from gtars.models import Region, RegionSet
+
+# From a local file (or URL if the http feature is compiled in)
+rs = RegionSet("peaks.bed")
+rs = RegionSet("peaks.bed.gz")
+rs = RegionSet("https://example.org/peaks.bed.gz")
+
+# From a list of Region objects
+regions = [
+    Region("chr1", 100, 200),
+    Region("chr1", 300, 400),
+]
+rs = RegionSet.from_regions(regions)
+
+# From parallel column vectors (useful when coming from pandas / numpy)
+rs = RegionSet.from_vectors(
+    chrs=["chr1", "chr1", "chr2"],
+    starts=[100, 300, 500],
+    ends=[200, 400, 600],
+)
+
+# Optional strand tracking
+rs = RegionSet.from_regions(regions, strands=["+", "-"])
+```
+
+Regions are auto-sorted by `(chr, start)` on construction.
+
+#### Properties and iteration
+
+```python
+len(rs)                 # number of regions
+list(rs)                # iterate; yields Region objects
+rs[0]                   # indexed access (supports negatives)
+rs.is_empty()
+
+rs.identifier           # MD5 of first three columns — canonical bedbase id
+rs.file_digest          # MD5 of full serialized content
+rs.header               # original BED header (may be None)
+rs.path                 # source path (str, raises if constructed in-memory)
+rs.strands              # parallel list of strand strings
+```
+
+#### Summaries
+
+```python
+rs.widths()                    # list[int], one width per region
+rs.region_widths()             # alias for widths()
+rs.mean_region_width()         # float, rounded to 2 decimals
+rs.get_nucleotide_length()     # int, total bp across all regions
+rs.get_max_end_per_chr()       # dict[str, int]
+rs.chromosome_statistics()     # dict[str, ChromosomeStatistics]
+rs.distribution(n_bins=250, chrom_sizes=None)
+                               # list of dicts: chr/start/end/n/rid
+```
+
+!!! warning "`chrom_sizes` matters"
+    Without `chrom_sizes`, `distribution()` derives bin size from the BED file's own observed max end — so outputs are **not comparable across files**. Pass `chrom_sizes` to get reference-genome-aligned bins that are comparable.
+
+#### I/O
+
+```python
+rs.to_bed("out.bed")
+rs.to_bed_gz("out.bed.gz")
+rs.to_bigbed("out.bb", "chrom.sizes")   # requires the bigbed feature
+rs.sort()                                # in-place
+```
+
+#### Interval set algebra
+
+Methods that mirror R GRanges/IRanges — all return a new `RegionSet`:
+
+```python
+rs.trim(chrom_sizes)                 # clamp to [0, chrom_size)
+rs.promoters(upstream=2000, downstream=0)
+rs.reduce()                          # merge overlapping/adjacent
+rs.disjoin()                         # tile at every boundary into disjoint pieces
+rs.setdiff(other)                    # remove other from self
+rs.subtract(other)                   # alias for setdiff
+rs.pintersect(other)                 # pairwise (index-aligned) intersect
+rs.intersect_all(other)              # all-vs-all intersection fragments
+rs.concat(other)                     # combine without merging
+rs.union(other)                      # concat + reduce
+rs.gaps(chrom_sizes)                 # peak-free regions bounded by chrom sizes
+```
+
+And similarity / coverage metrics:
+
+```python
+rs.jaccard(other)                    # nucleotide Jaccard |A∩B| / |A∪B|
+rs.coverage(other)                   # fraction of self bp covered by other
+rs.overlap_coefficient(other)        # |A∩B| / min(|A|, |B|)
+```
+
+!!! note "`rest` metadata is dropped"
+    Operations that merge or synthesize new intervals (`reduce`, `setdiff`, `promoters`, etc.) produce regions with `rest = None`. There's no unambiguous way to carry metadata through a merge.
+
+#### Overlap queries
+
+These route through the `gtars-overlaprs` AIList index under the hood:
+
+```python
+rs.subset_by_overlaps(other)         # RegionSet containing only self-regions that overlap other
+rs.count_overlaps(other)             # list[int], per-region overlap count
+rs.any_overlaps(other)               # list[bool], per-region "overlaps anything?"
+rs.find_overlaps(other)              # list[list[int]], indices into other per self-region
+rs.closest(other)                    # list[(self_idx, other_idx, distance)]
+rs.cluster(max_gap=0)                # list[int], cluster id per region
+```
+
+#### Peak statistics
+
+Ports of the R GenomicDistributions functions, exposed via the Rust `GenomicIntervalSetStatistics` trait as `RegionSet` methods in Python:
+
+```python
+rs.neighbor_distances()              # list[int], signed gaps per chromosome (multi-region chroms only)
+rs.nearest_neighbors()               # list[int], per-region min neighbor distance
+```
+
+!!! warning "Output length for `neighbor_distances` / `nearest_neighbors`"
+    Both methods **skip chromosomes with only one region** (matching R GenomicDistributions). The returned list is therefore **not aligned 1:1 with input regions** — it's shorter than `len(rs)` whenever any chromosome has exactly one peak. No sentinel values are emitted.
+
+!!! warning "`n_bins` is a target, not a total"
+    In `distribution(chrom_sizes=...)`, `n_bins` is the target bin count for the **longest** chromosome. Bin width is derived from that, and every chromosome is tiled at the same bp width — so the total window count is `sum(ceil(size / bin_width))` across chromosomes, often much larger than `n_bins`.
+
+### `RegionSetList`
+
+A collection of `RegionSet`s — the Python wrapper for `gtars-core::models::RegionSetList`. Substantially thinner than the Rust type: only construction from a list of `RegionSet`s, iteration/indexing, `concat`, and pairwise Jaccard.
+
+```python
+from gtars.models import RegionSet, RegionSetList
+
+rsl = RegionSetList([
+    RegionSet("rep1.bed"),
+    RegionSet("rep2.bed"),
+    RegionSet("rep3.bed"),
+])
+
+len(rsl)                     # 3
+rsl[0]                       # RegionSet
+list(rsl)                    # iterate
+
+combined = rsl.concat()      # flatten into a single RegionSet (no merge/dedup)
+rsl.names                    # None unless produced by RegionDB.get_region_sets()
+
+# Full N×N Jaccard similarity matrix
+jac = rsl.pairwise_jaccard()  # list[list[float]], symmetric, 1.0 on diagonal
+```
+
+!!! note "Names come from `RegionDB`"
+    The Python `RegionSetList.__new__` does not accept a `names=` parameter — names are only populated when a `RegionSetList` is produced by `RegionDB.get_region_sets()` (see [`gtars.lola`](lola.md)). If you need named sets from scratch, that's currently only available in the Rust API.
+
+### `Interval`
+
+A generic integer interval with a payload. Primarily a helper for overlap indexes — most user code should use `Region` / `RegionSet` instead.
+
+### Statistics result types
+
+These are returned by `RegionSet` methods. All are read-only dataclass-like types with named fields.
+
+#### `ChromosomeStatistics`
+
+Returned by `rs.chromosome_statistics()` → `dict[str, ChromosomeStatistics]`.
+
+| field | type | meaning |
+|---|---|---|
+| `chromosome` | `str` | chromosome name |
+| `number_of_regions` | `int` | region count on this chromosome |
+| `start_nucleotide_position` | `int` | leftmost start |
+| `end_nucleotide_position` | `int` | rightmost end |
+| `minimum_region_length` | `int` | |
+| `maximum_region_length` | `int` | |
+| `mean_region_length` | `float` | |
+| `median_region_length` | `float` | |
+
+## Reference genome and annotation types
+
+These types are constructed from files on disk and passed into the functions in [`gtars.genomic_distributions`](genomic_distributions.md).
+
+### `GenomeAssembly`
+
+In-memory FASTA loader. Slow to construct (~2s for hg38) but fast for per-region lookups.
+
+```python
+from gtars.models import GenomeAssembly
+
+genome = GenomeAssembly("hg38.fa")
+```
+
+### `BinaryGenomeAssembly`
+
+mmap-backed `.fab` binary FASTA — instant construction, zero-copy per-region access.
+
+```python
+from gtars.models import BinaryGenomeAssembly
+
+genome = BinaryGenomeAssembly("hg38.fab")
+```
+
+Create a `.fab` file once up front with the `gtars prep --fasta hg38.fa` CLI command (or the Rust `BinaryGenomeAssembly.write_from_fasta`).
+
+### `TssIndex`
+
+Sorted-per-chromosome TSS midpoint index for fast nearest-TSS queries.
+
+```python
+from gtars.models import TssIndex, RegionSet
+
+tss = TssIndex("gencode.v47.tss.bed")
+# or:
+# tss = TssIndex.from_regionset(RegionSet("gencode.v47.tss.bed"))
+
+peaks = RegionSet("peaks.bed")
+
+dists = tss.calc_tss_distances(peaks)       # list[int]
+signed = tss.feature_distances(peaks)       # list[float | None]
+```
+
+`feature_distances` returns `None` for regions on chromosomes that have no features in the index.
+
+### `GeneModel`
+
+Loaded from a GTF file. Used to build a `PartitionList`.
+
+```python
+from gtars.models import GeneModel
+
+model = GeneModel.from_gtf(
+    "gencode.v47.annotation.gtf.gz",
+    filter_protein_coding=True,
+    convert_ensembl_ucsc=True,
+)
+print(model.n_genes, model.n_exons)
+```
+
+- `filter_protein_coding` — keep only features with `gene_biotype "protein_coding"` (or `gene_type`).
+- `convert_ensembl_ucsc` — prepend `chr` to chromosome names that don't already have it.
+
+### `PartitionList`
+
+Ordered, priority-based genomic partition list: `promoterCore` → `promoterProx` → `threeUTR` → `fiveUTR` → `exon` → `intron` → `intergenic`.
+
+```python
+from gtars.models import GeneModel, PartitionList
+
+model = GeneModel.from_gtf("gencode.v47.gtf.gz")
+
+# From an existing GeneModel
+pl = PartitionList.from_gene_model(
+    model,
+    core_prom=100,
+    prox_prom=2000,
+    chrom_sizes=None,  # optional; used for trimming to chrom boundaries
+)
+
+# Or directly from GTF in one call
+pl = PartitionList.from_gtf(
+    "gencode.v47.gtf.gz",
+    core_prom=100,
+    prox_prom=2000,
+    filter_protein_coding=True,
+    convert_ensembl_ucsc=True,
+)
+
+print(pl.partition_names())   # ['promoterCore', 'promoterProx', 'exon', 'intron', ...]
+print(len(pl))                # number of partitions
+```
+
+Pass the resulting `PartitionList` to `calc_partitions` / `calc_expected_partitions` in [`gtars.genomic_distributions`](genomic_distributions.md).
+
+### `SignalMatrix`
+
+A region × condition matrix of signal values (e.g. a peak × cell-type ChIP intensity matrix). Load from TSV or the packed binary `.sigm` format:
+
+```python
+from gtars.models import SignalMatrix
+
+sm = SignalMatrix.from_tsv("signal_matrix.tsv")
+sm = SignalMatrix.load_bin("signal_matrix.sigm")
+
+print(sm.condition_names)   # list[str]
+print(sm.n_conditions)      # int
+print(sm.n_regions)         # int
+print(len(sm))              # n_regions
+```
+
+Pass it to `calc_summary_signal` in [`gtars.genomic_distributions`](genomic_distributions.md).
+
+### `GenomicDistAnnotation`
+
+A serializable wrapper around `GeneModel` in the GDA binary format. Chrom sizes are not stored in a GDA file — they come from a separate source.
+
+```python
+from gtars.models import GenomicDistAnnotation
+
+# Build once from GTF, save as GDA (save via Rust / CLI)
+gda = GenomicDistAnnotation.from_gtf("gencode.v47.gtf.gz")
+
+# Load from disk (fast)
+gda = GenomicDistAnnotation.load_bin("gencode.v47.gda")
+
+# Derived views
+model = gda.gene_model()
+pl = gda.partition_list(core_prom=100, prox_prom=2000, chrom_sizes=None)
+tss = gda.tss_index()
+```
+
+## See also
+
+- **[`gtars.genomic_distributions`](genomic_distributions.md)** — free functions for GC content, dinucleotide frequencies, partition classification, signal summaries, consensus regions.
+- **[`gtars.lola`](lola.md)** — LOLA enrichment testing.
+- **[gtars-core](../core.md)** — the Rust crate `gtars.models` wraps. Useful if you want to know which methods are "free" vs. which come from `gtars-genomicdist`.
+- **[Core models tour](../regionSet.md)** — cross-language walkthrough of `Region` / `RegionSet` / `RegionSetList`.
diff --git a/docs/gtars/r.md b/docs/gtars/r.md
index 36bdefe9..9b1c8edf 100644
--- a/docs/gtars/r.md
+++ b/docs/gtars/r.md
@@ -1,44 +1,135 @@
 # gtars-r
 
-R bindings for gtars functionality, providing an R API for genomic interval analysis.
+R bindings for gtars — high-performance genomic interval analysis in R, backed by Rust via `extendr`. The package is `gtars`.
 
 ## Installation
 
-To install the development version, use the `remotes` package:
-
-``` R
+```r
 install.packages("remotes")
-remotes::install_github("databio/gtars", ref = "dev", subdir = "gtars-r")
+remotes::install_github("databio/gtars", subdir = "gtars-r")
 ```
 
-You can also install the R package locally from the repo directory:
+Or install from a local clone of the repo:
 
-``` console
+```console
 R CMD INSTALL gtars-r
 ```
 
-## Available Modules
+Building the package requires a Rust toolchain (`cargo`, `rustc`). See the [gtars-r README](https://github.com/databio/gtars/blob/dev/gtars-r/README.md) for detailed setup notes.
+
+## Loading
+
+```r
+library(gtars)
+```
+
+## Submodule pages
+
+| page | covers |
+|---|---|
+| [**RegionSet & RegionSetList**](r/regionset.md) | S4 classes for genomic regions, interval set algebra, overlap queries, conversion to/from `GRanges`, pairwise Jaccard. |
+| [**GenomicDistributions**](r/genomicdist.md) | Drop-in replacements for R GenomicDistributions — `calcWidth`, `calcGCContent`, `calcDinuclFreq`, `genomePartitionList`, `calcPartitions`, `calcExpectedPartitions`, `calcSummarySignal`, `calcFeatureDist`, etc. |
+| [**LOLA enrichment**](r/lola.md) | Drop-in replacement for R LOLA — `loadRegionDB`, `runLOLA`, `checkUniverseAppropriateness`, `redefineUserSets`, `buildRestrictedUniverse`. |
+| [**IGD indexing**](r/igd.md) | Low-level `igd_create` and `igd_search` for building and querying IGD databases directly. |
+| [**RefgetStore**](r/refgetstore.md) | GA4GH refget protocol client. |
+
+## Polymorphic inputs
+
+A defining feature of the R gtars API: nearly every function that takes a query accepts **any** of these input types interchangeably:
+
+- a `RegionSet` S4 object,
+- a `GRanges` object (from `GenomicRanges`),
+- a file path to a BED / BED.gz / narrowPeak file,
+- a `data.frame` with `chr` / `start` / `end` columns (and optional `strand`).
+
+This means you can drop gtars into an existing Bioconductor workflow without explicit conversion:
+
+```r
+library(GenomicRanges)
+library(gtars)
+
+gr <- GRanges(...)
+
+widths <- calcWidth(gr)                        # GRanges works directly
+merged <- reduce(gr)                            # overridden S4 method
+jac <- jaccard(gr, "other_peaks.bed")           # mix GRanges with file path
+```
+
+## Quick start
+
+```r
+library(gtars)
+
+# 1. Load regions
+peaks <- RegionSet("peaks.bed")
+length(peaks)
+show(peaks)
+
+# 2. Summary statistics
+calcWidth(peaks) |> summary()
+chromosomeStatistics(peaks)
+
+# 3. Interval set algebra
+merged <- reduce(peaks)
+proms  <- promoters(peaks, upstream = 2000L, downstream = 200L)
+
+# 4. Convert back to GRanges for downstream Bioconductor work
+#    (GenomicRanges only needed here — it's a bridge, not a core dep)
+library(GenomicRanges)
+gr <- as_granges(peaks)
+```
+
+## LOLA enrichment quick start
 
-### refget
-``` R
+```r
 library(gtars)
 
-# Compute sequence digest
-readable <- 'ACGTACGT'
+regionDB <- loadRegionDB("LOLACore/hg38")
+
+results <- runLOLA(
+  userSets      = list(RegionSet("peaks_a.bed"), RegionSet("peaks_b.bed")),
+  userUniverse  = "universe.bed",
+  regionDB      = regionDB,
+  redefineUserSets = TRUE,
+)
+
+head(results[order(-pValueLog)], 20)
+```
+
+## refget quick start
+
+```r
+library(gtars)
+
+readable <- "ACGTACGT"
 gtars::sha512t24u_digest(readable)
 gtars::md5_digest(readable)
 
-# Use RefgetStore for sequence management
-store <- global_refget_store('raw')
+store <- global_refget_store("raw")
 ```
 
-### IGD
-``` R
+## IGD quick start
+
+```r
 library(gtars)
 
-### Building an Index
-igd_create(output_path = '/home/igd_output/', filelist = '/home/my_bedfiles/')
+# Build an IGD index from a directory of BED files
+igd_create(output_path = "./igd_out", filelist = "./my_beds/", db_name = "my_peaks")
 
-### Querying with a single bed file
-igd_search(database_path = 'my_igd_database.igd', query_path = 'my_query.bed')
+# Query the index
+hits <- igd_search(
+  database_path = "./igd_out/my_peaks.igd",
+  query_path    = "query.bed",
+)
 ```
+
+## Coordinate-system notes
+
+The R package uses **0-based half-open** coordinates internally (BED convention) to match the Rust core. When you pass a `GRanges` object to a gtars function, coordinates are converted from 1-based closed to 0-based half-open automatically. Converting back via `as_granges(rs)` does the reverse.
+
+If you're building a `RegionSet` from a `data.frame` directly, the `start` and `end` columns are expected to already be in 0-based half-open form.
+
+## See also
+
+- **[gtars-core](core.md)** and **[gtars-genomicdist](genomicdist.md)** — Rust reference pages with the algorithmic detail and caveats that apply identically in R.
+- **[R GenomicDistributions](https://code.databio.org/GenomicDistributions/)** and **[R LOLA](https://www.bioconductor.org/packages/release/bioc/html/LOLA.html)** — the original packages gtars R is a port of.
diff --git a/docs/gtars/r/genomicdist.md b/docs/gtars/r/genomicdist.md
new file mode 100644
index 00000000..bd678746
--- /dev/null
+++ b/docs/gtars/r/genomicdist.md
@@ -0,0 +1,224 @@
+# GenomicDistributions in R (gtars)
+
+The `gtars` R package ships **drop-in replacements** for the core functions in the [R GenomicDistributions](https://code.databio.org/GenomicDistributions/) package, backed by the Rust [`gtars-genomicdist`](../genomicdist.md) crate. The function names, signatures, and return shapes match R GenomicDistributions closely enough that most analyses should port over with one-line changes.
+
+Every function accepts polymorphic query input — `RegionSet`, `GRanges`, file path, or `data.frame` — so you can mix gtars with existing Bioconductor workflows without explicit conversion.
+
+For algorithmic detail and caveats that apply identically in R, see [gtars-genomicdist](../genomicdist.md).
+
+## Drop-in replacements for GenomicDistributions
+
+| gtars R function | R GenomicDistributions equivalent | notes |
+|---|---|---|
+| `calcWidth(query)` | `calcWidth()` | delegates to `widths()` S4 method |
+| `calcNeighborDist(query)` | `calcNeighborDist()` | signed inter-region gaps per chromosome |
+| `calcNearestNeighbors(query)` | `calcNearestNeighbors()` | per-region min neighbor distance |
+| `regionDistribution(query, nBins, chromSizes)` | `calcChromBins()` | bin counts; pass `chromSizes` for reference-aligned bins |
+| `calcGCContent(query, ref, ignoreUnkChroms)` | `calcGCContent()` | requires a `GenomeAssembly` pointer, not BSgenome |
+| `calcDinuclFreq(query, ref, rawCounts)` | `calcDinuclFreq()` | returns a data.frame with region + 16 dinucleotide columns |
+| `genomePartitionList(...)` | `genomePartitionList()` | build partition list from gene model components |
+| `partitionListFromGTF(path, ...)` | *(not in original)* | convenience — loads gene model from GTF in one step |
+| `calcPartitions(query, partitionList, bpProportion)` | `calcPartitions()` | priority-based (or bp-proportion) partition classification |
+| `calcExpectedPartitions(query, partitionList, genomeSize)` | `calcExpectedPartitions()` | observed vs. expected with chi-square p-values |
+| `calcSummarySignal(query, signalMatrix)` | `calcSummarySignal()` | overlap query regions with a signal matrix |
+| `calcFeatureDist(query, features)` | `calcFeatureDist()` | signed distance to nearest feature |
+| `calcTSSDist(query, features)` | *(not in original)* | absolute distance variant |
+| `loadGenomeAssembly(fasta_path)` | — | replaces BSgenome loading (FASTA directly) |
+
+## Porting an analysis
+
+Most GenomicDistributions scripts port by adding `library(gtars)` after `library(GenomicDistributions)` — gtars masks the functions with faster implementations:
+
+```r
+library(GenomicDistributions)  # optional, for plotting helpers
+library(gtars)                  # masks calcWidth, calcGCContent, etc.
+library(GenomicRanges)
+
+query <- GRanges(...)           # or a file path, or a data.frame
+widths <- calcWidth(query)      # now backed by Rust
+
+# Use existing plotGenomicDist* helpers on the results
+plotChromBins(regionDistribution(query, chromSizes = hg38))
+```
+
+## Basic statistics
+
+```r
+library(gtars)
+
+# Works with any query input: RegionSet, GRanges, file path, or data.frame
+query <- "peaks.bed"
+
+w  <- calcWidth(query)                  # numeric vector of widths
+nd <- calcNeighborDist(query)           # signed gaps per chromosome
+nn <- calcNearestNeighbors(query)       # min neighbor distance per region
+```
+
+!!! warning "`calcNeighborDist` / `calcNearestNeighbors` output length"
+    Both functions **skip chromosomes with only one region** (matching R GenomicDistributions). The returned vector is shorter than `length(query)` whenever any chromosome has a single peak, and is **not aligned 1:1** with input regions.
+
+## Region distribution
+
+```r
+# Without chromSizes: bin size derived from the query itself
+# (not comparable across files)
+rd <- regionDistribution(query, nBins = 250L)
+
+# With chromSizes: reference-aligned bins (comparable across files,
+# aligned with reference genome positions)
+chromSizes <- c(chr1 = 248956422L, chr2 = 242193529L)
+rd <- regionDistribution(query, nBins = 250L, chromSizes = chromSizes)
+```
+
+Returns a data.table compatible with R GenomicDistributions' `plotChromBins`.
+
+!!! warning "`nBins` is a target, not a total"
+    When `chromSizes` is provided, `nBins` is the target bin count for the **longest** chromosome in `chromSizes`. Bin width is derived as `max(chromSizes) %/% nBins` (floored, minimum 1 bp), and every chromosome is tiled at the same bp width — so shorter chromosomes get proportionally fewer bins. The total bin count returned is `sum(ceiling(chrom_size / bin_width))`, which can substantially exceed `nBins` when `chromSizes` has many entries. To target a specific bin width in bp instead, pass `nBins = max_chrom_len %/% desired_bp`.
+
+## GC content and dinucleotides
+
+Unlike the original R GenomicDistributions (which uses BSgenome), gtars loads a reference genome directly from a FASTA file:
+
+```r
+genome <- loadGenomeAssembly("hg38.fa")
+
+gc <- calcGCContent(query, genome, ignoreUnkChroms = TRUE)
+# numeric vector of GC content per region, 0.0 to 1.0
+
+dinucl <- calcDinuclFreq(query, genome, rawCounts = FALSE)
+# data.frame with columns: region + 16 dinucleotide columns
+# rawCounts = FALSE → percentages (each row sums to 100)
+# rawCounts = TRUE  → integer counts
+
+# Pooled global counts across all regions
+pooled <- colSums(calcDinuclFreq(query, genome, rawCounts = TRUE)[, -1])
+```
+
+## Partitions
+
+Build a `PartitionList` of promoter / UTR / exon / intron / intergenic categories from gene model components, then classify your query. You can either supply each component explicitly or load them in one call from a GTF.
+
+### From gene model components
+
+```r
+# From GRanges, file paths, or data.frames — any combination works
+pl <- genomePartitionList(
+  genesGR      = "genes.bed",
+  exonsGR      = "exons.bed",
+  threeUTRGR   = "three_utr.bed",
+  fiveUTRGR    = "five_utr.bed",
+  corePromSize = 100L,
+  proxPromSize = 2000L,
+  chromSizes   = c(chr1 = 248956422L, chr2 = 242193529L),  # optional
+)
+```
+
+Strand information from the inputs (GRanges strand column, or the `strand` column of a data.frame) is used for strand-aware promoter and reduce operations — critical for getting correct promoters on minus-strand genes.
+
+### From a GTF
+
+```r
+# Single-call convenience — loads gene model and builds partitions
+pl <- partitionListFromGTF(
+  "gencode.v47.annotation.gtf.gz",
+  filterProteinCoding = TRUE,
+  convertEnsemblUCSC  = FALSE,
+  corePromSize        = 100L,
+  proxPromSize        = 2000L,
+  chromSizes          = chromSizes,
+)
+```
+
+### Classifying query regions
+
+```r
+# Priority mode — each query region assigned to first matching partition
+counts <- calcPartitions(query, pl, bpProportion = FALSE)
+# data.frame with partition, Freq columns + attr(counts, "total")
+
+total <- attr(counts, "total")
+counts$pct <- counts$Freq / total * 100
+print(counts)
+
+# Observed vs expected with chi-square p-values
+chromSizes <- c(chr1 = 248956422L, chr2 = 242193529L)
+expected <- calcExpectedPartitions(query, pl, chromSizes, bpProportion = FALSE)
+# data.frame with partition, observed, expected, log10OE, pvalue columns
+```
+
+!!! warning "p-values differ slightly from R GenomicDistributions"
+    `calcExpectedPartitions` uses a goodness-of-fit `(O-E)²/E` formula. R `chisq.test()` computes a 2×2 test of independence with optional Yates correction, so p-values will not match R GenomicDistributions output byte-for-byte. The `log10OE` column is directly comparable.
+
+## Signal matrix overlap
+
+```r
+# signalMatrix is a data.frame / data.table where column 1 is the region ID
+# in "chr_start_end" format and the remaining columns are condition names
+# with numeric signal values.
+signalMatrix <- data.table::fread("signal_matrix.tsv")
+
+result <- calcSummarySignal(query, signalMatrix)
+# result is a list compatible with GenomicDistributions' plotSummarySignal:
+#   $signalSummaryMatrix — data.table with queryPeak + one column per condition
+#   $matrixStats — data.frame with 5 rows (Tukey stats) and one column per condition
+
+# Row names of matrixStats are the Tukey boxplot quantities:
+#   lowerWhisker, lowerHinge, median, upperHinge, upperWhisker
+print(result$matrixStats)
+```
+
+Plug directly into R GenomicDistributions' `plotSummarySignal()` without further conversion.
+
+## Distances to TSS / nearest features
+
+```r
+tss <- "gencode.v47.tss.bed"
+
+# Signed distance (positive = feature downstream, negative = upstream)
+d_signed <- calcFeatureDist(query, tss)
+
+# Absolute distance
+d_abs <- calcTSSDist(query, tss)
+```
+
+Both accept RegionSet, GRanges, file path, or data.frame for either argument.
+
+## End-to-end example
+
+```r
+library(gtars)
+
+# 1. Load inputs once
+peaks <- RegionSet("peaks.bed")
+genome <- loadGenomeAssembly("hg38.fa")
+chromSizes <- c(chr1 = 248956422L, chr2 = 242193529L)
+pl <- partitionListFromGTF("gencode.v47.gtf.gz", corePromSize = 100L,
+                            proxPromSize = 2000L, chromSizes = chromSizes)
+
+# 2. Widths and GC content
+cat(sprintf("%d peaks, median width %.0f\n",
+            length(peaks), median(calcWidth(peaks))))
+
+gc <- calcGCContent(peaks, genome, ignoreUnkChroms = TRUE)
+cat(sprintf("Mean GC content: %.3f\n", mean(gc)))
+
+# 3. Partition enrichment
+expected <- calcExpectedPartitions(peaks, pl, chromSizes, bpProportion = FALSE)
+print(expected)
+
+# 4. Distance to nearest TSS
+tssDist <- calcTSSDist(peaks, "gencode.v47.tss.bed")
+cat(sprintf("Median distance to nearest TSS: %d bp\n", median(tssDist)))
+
+# 5. Convert to GRanges for downstream Bioconductor work
+#    (only step that needs GenomicRanges — it's a bridge, not a core dep)
+library(GenomicRanges)
+gr <- as_granges(peaks)
+```
+
+## See also
+
+- **[RegionSet & RegionSetList (R)](regionset.md)** — S4 class reference and method list.
+- **[R LOLA interface](lola.md)** — enrichment testing built on top of these types.
+- **[gtars-genomicdist](../genomicdist.md)** — Rust reference with all the algorithmic detail.
+- **[R GenomicDistributions](https://code.databio.org/GenomicDistributions/)** — the original package gtars is a port of. Plotting helpers from GenomicDistributions still work on gtars return values.
diff --git a/docs/gtars/r/igd.md b/docs/gtars/r/igd.md
new file mode 100644
index 00000000..948653c8
--- /dev/null
+++ b/docs/gtars/r/igd.md
@@ -0,0 +1,101 @@
+# IGD in R (gtars)
+
+Low-level R bindings for IGD ([Indexed Genomic Data](../igd.md)) — a fast binary index over collections of BED files, used internally by gtars LOLA and by the BEDbase retrieval pipeline. The R API exposes two functions: `igd_create` (build an index from BED files on disk) and `igd_search` (query an existing index with a BED file).
+
+If you're running LOLA enrichment, you usually don't need to touch IGD directly — `loadRegionDB()` in [`r/lola.md`](lola.md) builds an IGD index internally from the region database. Use this page if you want to index a custom collection of BED files and query it directly.
+
+## Creating an IGD database
+
+```r
+library(gtars)
+
+igd_create(
+  output_path = "path/to/output/",
+  filelist    = "path/to/bed/files/",
+  db_name     = "igd_database",  # default
+)
+```
+
+**Arguments:**
+
+- `output_path` — directory where the IGD database files will be written. Must exist.
+- `filelist` — one of:
+    - a path to a text file listing BED file paths (one per line),
+    - a path to a directory containing BED files (all `.bed` / `.bed.gz` files will be indexed),
+    - `"-"` or `"stdin"` to read paths from standard input.
+- `db_name` — prefix for the output filenames (default `"igd_database"`). The function produces two files: `<db_name>.igd` (the index) and `<db_name>_index.tsv` (file metadata).
+
+Returns `NULL` invisibly on success. Errors are raised via `stop()` on invalid input.
+
+### Example
+
+```r
+# From a directory of BED files
+igd_create(
+  output_path = "./igd_out",
+  filelist    = "./my_bed_files/",
+  db_name     = "my_peaks",
+)
+
+# Produces:
+# ./igd_out/my_peaks.igd
+# ./igd_out/my_peaks_index.tsv
+
+# From an explicit file list
+writeLines(
+  c("./rep1.bed", "./rep2.bed", "./rep3.bed"),
+  "bed_list.txt",
+)
+igd_create(
+  output_path = "./igd_out",
+  filelist    = "bed_list.txt",
+  db_name     = "replicates",
+)
+```
+
+## Searching an IGD database
+
+```r
+hits <- igd_search(
+  database_path = "path/to/my_peaks.igd",
+  query_path    = "path/to/query.bed",
+)
+```
+
+**Arguments:**
+
+- `database_path` — path to an existing `.igd` file (produced by `igd_create` or any other IGD-compatible tool).
+- `query_path` — path to a BED file containing the query regions.
+
+**Returns** a `data.frame` of overlap hits. The exact columns depend on the query result schema — typically `file_id`, `filename`, `overlap_count`, etc. Use `colnames(hits)` to discover the structure for your version.
+
+### Example
+
+```r
+hits <- igd_search(
+  database_path = "./igd_out/my_peaks.igd",
+  query_path    = "query.bed",
+)
+
+# Inspect the schema
+colnames(hits)
+head(hits)
+
+# Typical analysis: count overlaps per database file
+aggregate(overlap_count ~ filename, data = hits, sum)
+```
+
+## Relationship to other gtars R functions
+
+- **`loadRegionDB()` in `lola.R`** — builds an IGD index internally for a LOLA region database. The IGD is wrapped inside a `RegionDB` pointer and used by `runLOLA()`. Users running enrichment typically don't need to call `igd_create` / `igd_search` separately.
+- **`countOverlaps()` / `findOverlaps()` on `RegionSet`** — ad-hoc overlap queries between two `RegionSet` objects use the AIList index from [`gtars-overlaprs`](../overlaprs.md), not IGD. Use those methods when you have both the query and subject in memory.
+- **IGD is preferred when** you have a large static collection of BED files that you'll query many times against different user sets, and you want the index persisted on disk.
+
+!!! note "Naming convention"
+    Unlike the rest of the R gtars API, which uses camelCase (`runLOLA`, `calcPartitions`, etc.), the IGD functions use snake_case (`igd_create`, `igd_search`). This is a legacy from the direct CLI mapping and may be aligned in a future release.
+
+## See also
+
+- **[gtars-igd](../igd.md)** — the underlying Rust crate, with the binary format reference and performance characteristics.
+- **[R LOLA interface](lola.md)** — `loadRegionDB()` and `runLOLA()`, which use IGD internally.
+- **[gtars CLI](../cli.md)** — `gtars igd create` and `gtars igd search` commands, equivalent to these R functions.
diff --git a/docs/gtars/r/lola.md b/docs/gtars/r/lola.md
new file mode 100644
index 00000000..7fdab4a0
--- /dev/null
+++ b/docs/gtars/r/lola.md
@@ -0,0 +1,228 @@
+# LOLA enrichment in R (gtars)
+
+The `gtars` R package ships a **drop-in replacement** for the [LOLA](https://www.bioconductor.org/packages/release/bioc/html/LOLA.html) R/Bioconductor package, powered by the Rust [`gtars-lola`](../lola.md) crate. Function names and signatures match LOLA closely enough that existing scripts port over with one-line changes.
+
+For the underlying statistical reference see [gtars-lola](../lola.md) — Fisher's exact test on a hypergeometric survival function, CMLE odds ratio via Brent's method, per-user-set Benjamini-Hochberg FDR correction, and 1-based TSV output compatible with R LOLA's `writeCombinedEnrichment`.
+
+## Loading the database
+
+`loadRegionDB()` reads a standard LOLA-format folder (one directory per collection, each with `collection.txt`, `index.txt`, and a `regions/` subdirectory of BED files):
+
+```r
+library(gtars)
+
+# Load the full database
+regionDB <- loadRegionDB("LOLACore/hg38")
+
+# Load only specific collections
+regionDB <- loadRegionDB("LOLACore/hg38", collections = c("encode_tfbs", "roadmap_epigenomics"))
+
+# Per-collection file cap (useful for smoke tests)
+regionDB <- loadRegionDB("LOLACore/hg38", limit = 10L)
+```
+
+### From a list of BED files
+
+If you don't have the full LOLA folder layout, `loadRegionDBFromBeds()` builds a minimal region database from an arbitrary list of BED paths:
+
+```r
+regionDB <- loadRegionDBFromBeds(
+  bedFiles  = c("file1.bed", "file2.bed", "file3.bed"),
+  filenames = c("Condition A", "Condition B", "Condition C"),  # optional display names
+)
+```
+
+### Accessing annotations
+
+```r
+# Per-file annotation table (matches R LOLA's regionDB$regionAnno)
+regionAnno <- regionDBAnno(regionDB)
+# data.table with columns: filename, cellType, description, tissue,
+# dataSource, antibody, treatment, collection
+
+# Collection-level annotation
+collectionAnno <- regionDBCollectionAnno(regionDB)
+# data.table with columns: collectionname, collector, date, source, description
+
+# List region set filenames
+listRegionSets(regionDB)
+listRegionSets(regionDB, collections = "encode_tfbs")  # filtered
+```
+
+### Extracting region sets
+
+```r
+# Get all region sets as a RegionSetList
+all_sets <- getRegionSets(regionDB)
+
+# Or a subset by 1-based indices
+subset <- getRegionSets(regionDB, indices = c(1, 5, 12))
+
+# RegionSetList is the same S4 class described in the RegionSet page
+length(subset)
+names(subset)
+rs1 <- subset[[1]]  # individual RegionSet
+```
+
+## Running enrichment
+
+```r
+runLOLA(
+  userSets,           # RegionSet, or list of RegionSets (or GRanges / file paths — polymorphic)
+  userUniverse,       # RegionSet or GRanges representing the background
+  regionDB,           # from loadRegionDB
+  minOverlap = 1,     # minimum bp overlap to count as overlapping
+  cores = 1,          # reserved for future parallelism — currently ignored
+  redefineUserSets = FALSE,  # automatically rewrite user sets against universe
+  direction = "enrichment",  # "enrichment" or "depletion"
+)
+```
+
+**Returns a `data.table`** with one row per `(user_set, db_set)` pair, matching the R LOLA schema: `userSet`, `dbSet`, `collection`, `pValueLog`, `oddsRatio`, `support`, `rnkPV`, `rnkOR`, `rnkSup`, `maxRnk`, `meanRnk`, `b`, `c`, `d`, `description`, `cellType`, `tissue`, `antibody`, `treatment`, `dataSource`, `filename`, `qValue`, `size`.
+
+- `userSet` and `dbSet` are **1-based** in the output data.table (R convention), even though they are 0-based internally.
+- `pValueLog` is `-log10(p)` from Fisher's exact test, capped at ~322.
+- `oddsRatio` is the CMLE odds ratio — matches `fisher.test()$estimate`, not the simple `(a·d)/(b·c)` point estimate.
+- `qValue` is Benjamini-Hochberg adjusted, computed **per user set independently**.
+- Rows are sorted by descending `pValueLog`, then ascending `meanRnk` — identical to R LOLA output order.
+
+### Basic example
+
+```r
+library(gtars)
+
+regionDB <- loadRegionDB("LOLACore/hg38")
+
+# Polymorphic inputs — RegionSet, GRanges, file path all accepted
+userSets <- list(
+  RegionSet("condition_a.bed"),
+  RegionSet("condition_b.bed"),
+)
+universe <- "universe.bed"  # file path works directly
+
+results <- runLOLA(userSets, universe, regionDB,
+                    minOverlap = 1,
+                    direction = "enrichment")
+
+# Top 20 hits for user set 1
+head(results[userSet == 1], 20)
+```
+
+### Two-condition comparison
+
+```r
+# Differential enrichment — use the union of both conditions as the universe
+restricted <- buildRestrictedUniverse(list(peaks_a, peaks_b))
+
+results <- runLOLA(list(peaks_a, peaks_b), restricted, regionDB)
+
+# Top hits unique to condition A
+head(results[userSet == 1 & qValue < 0.05][order(-pValueLog)], 20)
+```
+
+## Universe diagnostics
+
+LOLA results are only meaningful if your universe **contains** your user sets: every user region should overlap at least one universe region.
+
+### `checkUniverseAppropriateness`
+
+```r
+report <- checkUniverseAppropriateness(userSets, userUniverse)
+# data.frame with columns:
+#   user_set, total_regions, regions_in_universe, coverage, many_to_many
+# Warnings are emitted via warning() for low coverage or many-to-many mappings.
+```
+
+Warnings fire when coverage is below 50% (severe) or 90% (moderate), or when any user region overlaps more than one universe region.
+
+### `redefineUserSets`
+
+Rewrite each user set in terms of universe regions — eliminates many-to-many mapping artifacts.
+
+```r
+redefined <- redefineUserSets(userSets, userUniverse)
+# Returns a list of RegionSet objects.
+
+# Use directly
+results <- runLOLA(redefined, userUniverse, regionDB)
+
+# Or pass redefineUserSets = TRUE to runLOLA to do this automatically
+results <- runLOLA(userSets, userUniverse, regionDB, redefineUserSets = TRUE)
+```
+
+### `buildRestrictedUniverse`
+
+For differential enrichment: build a universe that is exactly the union of all user sets, disjoined into non-overlapping pieces (R LOLA's `disjoin(unlist(userSets))`):
+
+```r
+restricted <- buildRestrictedUniverse(list(peaks_a, peaks_b, peaks_c))
+# Returns a RegionSet
+```
+
+## Porting from R LOLA
+
+Most LOLA scripts port by changing one line — replacing `library(LOLA)` with `library(gtars)`. The core API is compatible:
+
+| R LOLA | gtars R | notes |
+|---|---|---|
+| `loadRegionDB()` | `loadRegionDB()` | same signature |
+| `runLOLA()` | `runLOLA()` | same signature; returns `data.table` |
+| `checkUniverseAppropriateness()` | `checkUniverseAppropriateness()` | same signature |
+| `redefineUserSets()` | `redefineUserSets()` | same signature; returns list of `RegionSet` |
+| `writeCombinedEnrichment()` | *(use `data.table::fwrite`)* | output table is already in the right format |
+| `extractEnrichmentOverlaps()` | *(not implemented)* | file an issue if needed |
+| `getRegionFile()` | `getRegionSets(regionDB, index)` | returns `RegionSet`, not `GRanges` — call `as_granges()` to convert |
+
+The notable differences:
+
+- **Return type is `data.table`, not `data.frame`.** R LOLA returns a plain data.frame; gtars returns a `data.table` by default. Convert with `as.data.frame()` if you prefer.
+- **Faster p-value and odds ratio.** The p-value uses the survival function (no cancellation at small tails), and the odds ratio is the CMLE (not the point estimate). Expect tail significance to be reported more precisely.
+
+## End-to-end example
+
+```r
+library(gtars)
+library(data.table)
+
+# 1. Load the database once
+regionDB <- loadRegionDB("LOLACore/hg38")
+cat(sprintf("Loaded %d region sets\n", length(listRegionSets(regionDB))))
+
+# 2. Load user sets and universe — polymorphic inputs
+peaks_a <- "peaks_condition_a.bed"
+peaks_b <- "peaks_condition_b.bed"
+universe <- "universe.bed"
+
+# 3. Universe sanity check
+diag <- checkUniverseAppropriateness(list(peaks_a, peaks_b), universe)
+print(diag)
+
+# 4. Run enrichment with auto-redefinition
+results <- runLOLA(
+  list(peaks_a, peaks_b), universe, regionDB,
+  minOverlap = 1,
+  redefineUserSets = TRUE,
+  direction = "enrichment",
+)
+
+# 5. Top hits per user set
+for (us in unique(results$userSet)) {
+  cat(sprintf("\n=== User set %d — top 10 ===\n", us))
+  top <- head(
+    results[userSet == us][order(-pValueLog)],
+    10,
+  )
+  print(top[, .(filename, cellType, pValueLog, oddsRatio, support, qValue)])
+}
+
+# 6. Write to TSV matching R LOLA's writeCombinedEnrichment format
+fwrite(results, "lola_results.tsv", sep = "\t")
+```
+
+## See also
+
+- **[RegionSet & RegionSetList (R)](regionset.md)** — the S4 classes used for user sets and extracted database sets.
+- **[R GenomicDistributions wrappers](genomicdist.md)** — statistics and partition analysis.
+- **[R IGD interface](igd.md)** — IGD is the overlap index that backs a `RegionDB`.
+- **[gtars-lola](../lola.md)** — Rust reference with the full statistical detail.
+- **[R LOLA package](https://www.bioconductor.org/packages/release/bioc/html/LOLA.html)** — the original Bioconductor package this is a port of.
diff --git a/docs/gtars/r/regionset.md b/docs/gtars/r/regionset.md
new file mode 100644
index 00000000..0cc62d40
--- /dev/null
+++ b/docs/gtars/r/regionset.md
@@ -0,0 +1,204 @@
+# RegionSet & RegionSetList (R)
+
+The `gtars` R package exposes S4 classes `RegionSet` and `RegionSetList` — the R-idiomatic equivalents of Bioconductor's `GRanges` and `GRangesList`, backed by a high-performance Rust implementation via `externalptr`.
+
+Unlike most gtars bindings, the R API is **polymorphic on input type**: nearly every function that takes a query will accept a `RegionSet`, a `GRanges` object, a file path, or a `data.frame` interchangeably. This lets you mix existing Bioconductor workflows with gtars without converting explicitly.
+
+For the underlying semantics see [gtars-core](../core.md) (core types) and [gtars-genomicdist](../genomicdist.md) (statistics, set algebra).
+
+## Loading the package
+
+```r
+library(gtars)
+```
+
+## `RegionSet`
+
+An S4 class representing a set of genomic regions. Internally wraps a Rust `RegionSet` pointer plus an R-side strand vector.
+
+### Construction
+
+`RegionSet()` accepts any of these inputs:
+
+```r
+# 1. File path (BED, bed.gz, narrowPeak, etc.)
+rs <- RegionSet("peaks.bed")
+
+# 2. GRanges object (requires GenomicRanges)
+library(GenomicRanges)
+gr <- GRanges(seqnames = c("chr1", "chr1"),
+              ranges = IRanges(start = c(100, 500), end = c(200, 600)),
+              strand = c("+", "-"))
+rs <- RegionSet(gr)   # coords converted from 1-based closed to 0-based half-open
+
+# 3. data.frame with chr/start/end (and optional strand)
+df <- data.frame(chr = c("chr1", "chr1"),
+                 start = c(100, 500),
+                 end = c(200, 600),
+                 strand = c("+", "-"))
+rs <- RegionSet(df)
+
+# 4. An existing RegionSet (returned as-is)
+rs2 <- RegionSet(rs)
+```
+
+`as_regionset()` is an alias for `RegionSet()`.
+
+!!! warning "0-based half-open, BED convention"
+    R gtars uses **0-based half-open** coordinates internally, matching BED convention (not GRanges' 1-based closed convention). Constructor/accessor functions convert automatically when bridging to `GRanges`; if you're building a RegionSet from a data.frame directly, the `start` and `end` columns are expected in 0-based half-open form.
+
+### Basic S4 methods
+
+```r
+length(rs)              # number of regions
+show(rs)                # pretty-print summary (first 5 regions)
+rs                      # same — calls show()
+
+as.data.frame(rs)       # convert to a data.frame with chr/start/end/strand
+rs[1:10]                # subset by index (numeric, logical, or character)
+rs[rs$strand == "+"]    # Note: use as.data.frame(rs)$strand for filtering
+```
+
+### Converting to `GRanges`
+
+```r
+gr <- as_granges(rs)
+# Coordinates converted from 0-based half-open to 1-based closed.
+# Strand information is preserved.
+```
+
+Requires the `GenomicRanges` package to be installed.
+
+### Statistics methods
+
+All of these accept `RegionSet`, `GRanges`, file path, or data.frame as input:
+
+```r
+widths(rs)                     # numeric vector of region widths (end - start)
+neighborDistances(rs)          # signed gaps between consecutive regions per chromosome
+nearestNeighbors(rs)           # per-region min neighbor distance
+chromosomeStatistics(rs)       # named list of per-chromosome stats
+
+distribution(rs, nBins = 250)                     # bin counts across the genome
+distribution(rs, nBins = 250, chromSizes = hg38)  # with reference chrom sizes
+
+clusterRegions(rs, maxGap = 5000L)  # cluster id per region
+```
+
+!!! warning "Output length for `neighborDistances` / `nearestNeighbors`"
+    Both skip chromosomes with only one region (matching R GenomicDistributions). The returned vector is **not aligned 1:1** with the input — it's shorter than `length(rs)` whenever any chromosome has a single peak.
+
+!!! warning "`nBins` is a target, not a total"
+    In `distribution(chromSizes = ...)`, `nBins` is the target bin count for the **longest** chromosome in `chromSizes`. Bin width is derived as `max(chromSizes) %/% nBins` (floored, minimum 1 bp), and every chromosome is tiled at the same bp width — so shorter chromosomes get proportionally fewer bins. The total bin count returned is `sum(ceiling(chrom_size / bin_width))`, which can substantially exceed `nBins` when `chromSizes` has many entries. To target a specific bin width in bp instead, pass `nBins = max_chrom_len %/% desired_bp`.
+
+### Interval set algebra
+
+R gtars overrides the standard Bioconductor generics (`union`, `intersect`, `setdiff`, `reduce`, `promoters`, `shift`, `flank`, `resize`, `narrow`, `disjoin`, `gaps`, `findOverlaps`, `countOverlaps`) plus adds gtars-specific ones (`trim`, `pintersect`, `concat`, `jaccard`).
+
+```r
+merged   <- reduce(rs)
+trimmed  <- trim(rs, chromSizes = hg38)
+proms    <- promoters(rs, upstream = 2000L, downstream = 200L)
+shifted  <- shift(rs, shift = 100L)
+resized  <- resize(rs, width = 500L, fix = "center")
+disjoint <- disjoin(rs)
+gapped   <- gaps(rs, chrom_sizes = hg38)
+
+# Binary operations
+u  <- union(rs1, rs2)
+i  <- intersect(rs1, rs2)
+d  <- setdiff(rs1, rs2)
+pi <- pintersect(rs1, rs2)   # pairwise (by index)
+c  <- concat(rs1, rs2)
+
+j  <- jaccard(rs1, rs2)      # scalar Jaccard similarity
+
+# Overlap queries
+hits  <- findOverlaps(rs1, rs2)
+counts <- countOverlaps(rs1, rs2)
+```
+
+Methods dispatch on `("RegionSet", "RegionSet")`, `("RegionSet", "ANY")`, and `("ANY", "RegionSet")` so you can mix in `GRanges`, file paths, or data.frames on either side:
+
+```r
+# Works — second argument is a file path
+j <- jaccard(rs, "other_peaks.bed")
+
+# Works — first argument is a GRanges
+merged <- union(gr, rs)
+```
+
+### Consensus regions
+
+```r
+cons <- consensus(list(rep1_rs, rep2_rs, rep3_rs))
+# Returns a data.frame with chr, start, end, count columns.
+# 'count' is the number of input sets overlapping each union region.
+
+# Keep regions present in ≥ 2/3 replicates
+robust <- cons[cons$count >= 2, ]
+```
+
+## `RegionSetList`
+
+An S4 class for collections of `RegionSet`s — the gtars equivalent of `GRangesList`. Provides the single most efficient path for operating on many region sets at once, since pointers are passed between R and Rust without copying.
+
+### Construction
+
+```r
+# Variadic: any mix of RegionSet / file path / GRanges / data.frame
+rsl <- RegionSetList(
+  RegionSet("rep1.bed"),
+  RegionSet("rep2.bed"),
+  RegionSet("rep3.bed"),
+)
+
+# Equivalent: single list argument
+rsl <- RegionSetList(list(rs1, rs2, rs3))
+
+# Or directly from file paths — each is auto-wrapped via RegionSet()
+rsl <- RegionSetList("rep1.bed", "rep2.bed", "rep3.bed")
+
+# Empty list
+empty <- RegionSetList()
+```
+
+### S4 methods
+
+```r
+length(rsl)        # number of region sets
+show(rsl)          # pretty-print with sizes
+rsl[[1]]           # extract a single RegionSet by index (1-based)
+names(rsl)         # character vector of names, or NULL
+```
+
+### Operations
+
+```r
+# Flatten into a single RegionSet (no merge/dedup)
+flat <- concat(rsl)
+# Apply reduce() on the result if you want the union
+merged_all <- reduce(concat(rsl))
+
+# Full N x N Jaccard similarity matrix
+jac <- pairwise_jaccard(rsl)
+# Returns a symmetric numeric matrix with 1.0 on the diagonal.
+```
+
+## Coordinate-system notes
+
+| system | representation | gtars handling |
+|---|---|---|
+| BED / gtars | 0-based half-open | **internal** — always used by `RegionSet` |
+| GRanges / IRanges | 1-based closed | converted in both directions by `RegionSet(gr)` / `as_granges(rs)` |
+| data.frame input | 0-based half-open | passed through as-is — make sure your `start`/`end` columns are BED-style |
+
+If you're working entirely in GRanges-land, use `as_granges()` to convert back before handing off to Bioconductor workflows. If you're reading BED files and writing results back to BED, stay in `RegionSet` to avoid double conversion.
+
+## See also
+
+- **[R GenomicDistributions wrappers](genomicdist.md)** — `calcWidth`, `calcGCContent`, `calcPartitions`, etc. — the drop-in API for users migrating from R GenomicDistributions.
+- **[R LOLA interface](lola.md)** — `loadRegionDB`, `runLOLA`, `checkUniverseAppropriateness`, etc.
+- **[R IGD interface](igd.md)** — `igd_create` / `igd_search` for low-level IGD access.
+- **[gtars-core](../core.md)** — the underlying Rust types.
+- **[Core models tour](../regionSet.md)** — Python + Rust walkthrough of the same concepts.
diff --git a/docs/gtars/regionSet.md b/docs/gtars/regionSet.md
index 3c392227..62af0ab6 100644
--- a/docs/gtars/regionSet.md
+++ b/docs/gtars/regionSet.md
@@ -135,3 +135,62 @@ rs.to_bigbed("path/to/save/bedfile.bb", chrom_sizes="path/to/chrom.sizes")
 
 !!! info 
     - Detailed documentation for RegionSet is available in the [API reference](https://docs.rs/gtars-core/latest/gtars_core/models/region_set/).
+
+### 🟢 RegionSetList
+
+`RegionSetList` is the gtars equivalent of Bioconductor's `GRangesList` — an ordered collection of `RegionSet`s with optional names. It's the type downstream crates (genomicdist, lola) use to pass multiple region sets across FFI boundaries without paying N×clone costs.
+
+#### Example
+
+=== "Python"
+    ```python
+    from gtars.models import RegionSet, RegionSetList
+
+    rs1 = RegionSet("rep1.bed")
+    rs2 = RegionSet("rep2.bed")
+    rs3 = RegionSet("rep3.bed")
+
+    # Unnamed
+    rsl = RegionSetList([rs1, rs2, rs3])
+
+    # Or with names
+    rsl = RegionSetList([rs1, rs2, rs3], names=["rep1", "rep2", "rep3"])
+
+    print(len(rsl))                # number of sets
+    combined = rsl.concat()        # flatten into a single RegionSet (no merge)
+    set_id = rsl.identifier()      # stable order-independent identifier
+    ```
+
+=== "Rust"
+    ```rust
+    use gtars_core::models::{RegionSet, RegionSetList};
+
+    let rs1 = RegionSet::try_from("rep1.bed")?;
+    let rs2 = RegionSet::try_from("rep2.bed")?;
+    let rs3 = RegionSet::try_from("rep3.bed")?;
+
+    let rsl = RegionSetList::with_names(
+        vec![rs1, rs2, rs3],
+        vec!["rep1".into(), "rep2".into(), "rep3".into()],
+    );
+
+    // Iterate
+    for rs in &rsl {
+        println!("{} regions", rs.len());
+    }
+
+    // Flatten all regions into a single RegionSet (no merge/dedup)
+    let combined = rsl.concat();
+    let id = rsl.identifier();
+    # Ok::<(), gtars_core::errors::RegionSetError>(())
+    ```
+
+`RegionSetList::try_from` in Rust also accepts a **bedset manifest file** (text file listing one BED path per line) or a `Vec<&Path>` / `Vec<&str>` / `Vec<String>` / `Vec<PathBuf>`.
+
+`concat()` flattens without merging; if you need a reduced union, call `.reduce()` on the result — that method comes from the `IntervalRanges` trait in [gtars-genomicdist](genomicdist.md).
+
+## See also
+
+- **[gtars-core](core.md)** — the canonical Rust API reference for `Region`, `RegionSet`, `RegionSetList`, `Interval`, `Fragment`, `CoordinateMode`, and `RegionSetError`.
+- **[gtars-genomicdist](genomicdist.md)** — the `IntervalRanges` and `GenomicIntervalSetStatistics` traits that extend `RegionSet` with set-algebra and summary stats.
+- **[gtars-lola](lola.md)** — LOLA enrichment, which consumes `RegionSetList` for user-set and database-set inputs.
diff --git a/docs/gtars/wasm.md b/docs/gtars/wasm.md
index 26caaca9..b01e4b1c 100644
--- a/docs/gtars/wasm.md
+++ b/docs/gtars/wasm.md
@@ -1,61 +1,89 @@
 # gtars Wasm/JS
 
-We provide WebAssembly (Wasm) bindings for gtars functionality, enabling genomic interval analysis in JavaScript/TypeScript environments. This allows for client-side processing of genomic data without the need for server-side computation.
+WebAssembly bindings for gtars, enabling genomic interval analysis directly in browsers and Node — no server round-trip required. The package is `@databio/gtars-js`; the underlying Rust crate is `gtars-wasm`.
 
 ## Features
 
-- Zero-installation genomic tools
-- Client-side processing (no server required)
-- JavaScript/TypeScript interoperability
+- Zero-installation genomic tools running client-side.
+- Full `RegionSet` interval algebra and summary statistics.
+- Genomic distribution analysis: partitions, signal matrices, consensus regions.
+- LOLA enrichment testing with in-memory region databases.
+- TypeScript-friendly — the published package ships `.d.ts` declarations generated from the Rust source.
 
 ## Installation
-You can install the gtars Wasm package via npm:
 
 ```bash
 npm install @databio/gtars-js
 ```
 
-## Usage
-We don't currently provide full feature parity bindings, but some functionality is available.
+## Quick start
 
-```typescript
-import init, { Overlapper } from 'gtars-js';
+Every Wasm entry point needs the module to be initialized once before use. In ESM environments with top-level await, that's a single line:
+
+```ts
+import init, { RegionSet, Overlapper } from '@databio/gtars-js';
 
 await init();
 
-import { Overlapper } from '@databio/gtars-js';
+const peaks = new RegionSet([
+  ['chr1', 100, 200, 'peak1'],
+  ['chr2', 150, 250, 'peak2'],
+  ['chr3', 300, 400, 'peak3'],
+]);
+
+console.log(`${peaks.numberOfRegions} regions, mean width ${peaks.meanRegionWidth}`);
+```
 
-const universe = [
-    ['chr1', 100, 200],
-    ['chr1', 150, 250],
-    ['chr2', 300, 400],
-]
+## Submodule pages
 
-const overlapper = new Overlapper(universe, 'ailist');
-console.log(`Using backend: ${overlapper.get_backend()}`);
+| page | covers |
+|---|---|
+| [**Overlappers**](wasm/overlappers.md) | `Overlapper` — low-level interval overlap engine with AIList / BITS backends. |
+| [**RegionSet & models**](wasm/regionset.md) | `RegionSet`, `RegionSetList`, `ConsensusBuilder`, `ChromosomeStatistics`, `RegionDistribution`, `BedClassificationOutput`, and all the per-`RegionSet` statistics and interval algebra methods. |
+| [**Partitions**](wasm/partitions.md) | `GeneModel`, `PartitionList`, `calcPartitions`, `calcExpectedPartitions` — classify regions into promoter / UTR / exon / intron / intergenic. |
+| [**Signal matrix**](wasm/signal.md) | `SignalMatrix` and `calcSummarySignal` — overlay region sets on region × condition signal matrices. |
+| [**LOLA enrichment**](wasm/lola.md) | `LolaRegionDB`, `runLOLA`, `checkUniverseAppropriateness` — Fisher's exact test enrichment against a database of reference region sets. |
 
-const query = ['chr1', 180, 220];
-const overlaps = overlapper.find(query);
+## Limitations vs. the Rust/Python APIs
 
-console.log(`Overlaps for ${query}:`, overlaps);
-```
+Wasm is a constrained runtime — a few crate features are not exposed:
 
-## Integration
+- **No filesystem loaders.** `GeneModel::from_gtf`, `GeneModel::from_bed_files`, `SignalMatrix::from_tsv`, and `RegionDB::from_lola_folder` are not available. Everything is constructed from in-memory JS data or from packed binary buffers (e.g. `.sigm`, `.fab`, `.gda`) fetched over the network.
+- **No `redefine_user_sets` / `build_restricted_universe`.** These LOLA universe helpers are Rust-only at the moment; use `checkUniverseAppropriateness` for diagnostics and replicate the set-algebra logic client-side with `RegionSetList` operations if you need the rewriting behavior.
+- **Refget is exposed as a separate module** — see the refget binding docs (currently being written).
 
-### An example React component
-```jsx
+## Integration example — React
+
+```tsx
 import { useEffect, useState } from 'react';
-import init, { TreeTokenizer } from 'gtars-js';
+import init, { RegionSet, ConsensusBuilder } from '@databio/gtars-js';
 
-function TokenizerComponent() {
-  const [tokenizer, setTokenizer] = useState(null);
+function ConsensusComponent({ replicates }: { replicates: Array<[string, number, number, string][]> }) {
+  const [ready, setReady] = useState(false);
+  const [robust, setRobust] = useState<number | null>(null);
 
   useEffect(() => {
-    init().then(() => {
-      setTokenizer(new TreeTokenizer());
-    });
+    init().then(() => setReady(true));
   }, []);
 
-  // Use tokenizer...
+  useEffect(() => {
+    if (!ready) return;
+    const cb = new ConsensusBuilder();
+    for (const rep of replicates) {
+      cb.add(new RegionSet(rep));
+    }
+    const cons = cb.compute();
+    setRobust(cons.filter((r) => r.count >= 2).length);
+  }, [ready, replicates]);
+
+  if (!ready) return <p>Loading gtars-js…</p>;
+  return <p>{robust} regions present in ≥ 2 replicates</p>;
 }
 ```
+
+## Performance notes
+
+- `RegionSet` construction sorts on load. Repeated operations don't re-sort.
+- Overlap queries (`jaccard`, `setdiff`, `intersect`, etc.) route through the AIList index under the hood, giving O(log n) per-query lookups.
+- Statistics methods are implemented in Rust and called via zero-copy boundary — they're substantially faster than pure-JS equivalents for typical peak-set sizes.
+- For very large numeric outputs, the Wasm glue converts between Rust `Vec<u32>` and JS arrays — if you're hitting GC pressure, consider batching calls or keeping `RegionSet` handles around instead of returning raw arrays.
diff --git a/docs/gtars/wasm/lola.md b/docs/gtars/wasm/lola.md
new file mode 100644
index 00000000..88c87b30
--- /dev/null
+++ b/docs/gtars/wasm/lola.md
@@ -0,0 +1,190 @@
+# LOLA enrichment (Wasm)
+
+Wasm bindings for [gtars-lola](../lola.md) — LOLA (Locus Overlap Analysis) enrichment testing, runnable entirely in the browser. Given a user set of regions, a universe, and a region database (built in memory from API-served BED data), computes Fisher's exact test enrichment of the user set against every database region set.
+
+See the [Rust gtars-lola reference](../lola.md) for the full statistical details — Fisher's exact test via hypergeometric survival function, CMLE odds ratio, contingency table semantics, and compatibility notes with R LOLA. This page is a Wasm-focused binding reference.
+
+!!! note "In-memory only — no folder loading"
+    The Wasm binding's `LolaRegionDB` is built entirely from in-memory data (`{regions, name}` entries passed in at construction). The Rust `RegionDB::from_lola_folder` loader is not available in Wasm — it requires a filesystem. In practice you'd fetch the region database from an API and pass the regions directly into the constructor, or build it from a pre-indexed IGD on the server side.
+
+## Import
+
+```ts
+import init, {
+  LolaRegionDB,
+  runLOLA,
+  checkUniverseAppropriateness,
+} from '@databio/gtars-js';
+
+await init();
+```
+
+## `LolaRegionDB`
+
+### Construction
+
+Construct from an array of `{ regions, name }` objects, where `regions` is an array of `[chr, start, end]` tuples and `name` is the filename-like identifier:
+
+```ts
+const dbEntries = [
+  {
+    name: 'encode_k562_ctcf.bed',
+    regions: [
+      ['chr1', 100, 200],
+      ['chr1', 400, 500],
+      ['chr2', 1000, 1100],
+    ],
+  },
+  {
+    name: 'encode_hela_ctcf.bed',
+    regions: [
+      ['chr1', 150, 250],
+      ['chr2', 1050, 1150],
+    ],
+  },
+];
+
+const db = new LolaRegionDB(dbEntries);
+```
+
+The constructor builds an IGD overlap index internally and wraps it with minimal annotation (filename only). To attach richer per-file metadata — cell type, tissue, antibody, etc. — build the annotated `RegionDB` on the Rust side and serve it through a different transport (e.g. a custom binary format), or expose a method that accepts `RegionSetAnno` objects alongside the regions.
+
+### Accessors
+
+```ts
+db.numRegionSets              // number
+db.listRegionSets()           // string[] — filenames
+db.collectionAnno             // Array of { collectionname, collector, date, source, description }
+
+// Extract region sets by 0-based index as a RegionSetList
+// (pass null for "all sets"; names are populated from filenames)
+const rsl = db.getRegionSets(null);
+const rsl2 = db.getRegionSets([0, 5, 12]);
+```
+
+`getRegionSets()` returns a [`RegionSetList`](regionset.md#regionsetlist) with `.names` populated from the database filenames — the one path in Wasm where a `RegionSetList` has non-null names.
+
+## `runLOLA`
+
+Runs Fisher's exact test for every `(user_set, db_set)` pair and returns a column-oriented object suitable for dropping directly into a DataFrame or table.
+
+```ts
+runLOLA(
+  userSets: Array<Array<[string, number, number]>>,
+  universe: Array<[string, number, number]>,
+  regionDb: LolaRegionDB,
+  minOverlap?: number,       // default 1
+  direction?: string,         // "enrichment" (default) | "depletion"
+): LolaResultColumnar
+```
+
+**Parameters:**
+- `userSets` — **array of user sets**, where each user set is an array of `[chr, start, end]` tuples. Pass a single-element outer array `[peaks]` for a one-user-set analysis.
+- `universe` — a single array of `[chr, start, end]` tuples representing the background.
+- `regionDb` — a `LolaRegionDB`.
+- `minOverlap` — minimum bp overlap to count as overlapping (default 1).
+- `direction` — `"enrichment"` (default, P(X ≥ a), alternative "greater") or `"depletion"` (P(X ≤ a), alternative "less"). The strings `"greater"` / `"less"` are accepted as aliases on the Python side but Wasm only recognizes `"enrichment"` / `"depletion"` explicitly; anything else falls back to enrichment.
+
+**Returns** an object with parallel arrays (one entry per `(user_set, db_set)` pair) using camelCase keys:
+
+| key | type | meaning |
+|---|---|---|
+| `userSet` | `number[]` | 0-based user-set index |
+| `dbSet` | `number[]` | 0-based db-set index |
+| `collection` | `(string \| null)[]` | per-file annotation, usually null in Wasm |
+| `pValueLog` | `number[]` | `-log10(p)` from Fisher's exact test, capped at ~322 |
+| `oddsRatio` | `number[]` | CMLE odds ratio (matches R `fisher.test()$estimate`) |
+| `support` | `number[]` | overlap count between user set and db set |
+| `rnkPv`, `rnkOr`, `rnkSup` | `number[]` | 1-based per-metric ranks within the user set |
+| `maxRnk` | `number[]` | max of the three ranks |
+| `meanRnk` | `number[]` | mean of the three ranks |
+| `b`, `c`, `d` | `number[]` | signed contingency values (can be negative) |
+| `qValue` | `(number \| null)[]` | BH-adjusted p-value (applied automatically inside `runLOLA`) |
+| `description`, `cellType`, `tissue`, `antibody`, `treatment`, `dataSource` | `(string \| null)[]` | per-file metadata |
+| `filename` | `string[]` | db set file name |
+| `size` | `number[]` | number of regions in the db set |
+
+Like the Python binding, Wasm `runLOLA` **auto-applies** `annotate_results` + `apply_fdr_correction` internally — you don't need to call them separately. Rows come back sorted by descending `pValueLog`, then ascending `meanRnk` (matching R LOLA output order).
+
+### Usage example
+
+```ts
+import init, {
+  LolaRegionDB,
+  runLOLA,
+  RegionSet,
+} from '@databio/gtars-js';
+
+await init();
+
+// Helper to convert a RegionSet to the tuple form LOLA expects
+function toTuples(rs: RegionSet): Array<[string, number, number]> {
+  // Iterate rs and pull chr/start/end — or keep the original BedEntries
+  // if you still have them in scope from constructing the RegionSet.
+  // ...
+}
+
+// 1. Build the region database from API-fetched data
+const dbEntries = await fetchDatabaseRegions();  // [{ name, regions }, ...]
+const db = new LolaRegionDB(dbEntries);
+
+// 2. Prepare user sets and universe as tuple arrays
+const userSets = [userPeakTuples];
+const universe = universeTuples;
+
+// 3. Run enrichment
+const results = runLOLA(userSets, universe, db, 1, 'enrichment');
+
+// 4. Pivot to row-oriented view for rendering
+const n = results.pValueLog.length;
+for (let i = 0; i < n; i++) {
+  console.log(
+    `${results.filename[i]}  ` +
+    `p=${results.pValueLog[i].toFixed(2)}  ` +
+    `OR=${results.oddsRatio[i].toFixed(2)}  ` +
+    `support=${results.support[i]}`
+  );
+}
+```
+
+!!! note "Negative contingency values"
+    If your user set contains regions *outside* the universe, the contingency table produces negative `b`/`c`/`d`. The binding matches R LOLA's behavior — it logs a warning, stores the signed values, and returns `pValueLog = 0.0`, `oddsRatio = NaN` for that row. Pre-process with `checkUniverseAppropriateness` (below) to detect this before running enrichment.
+
+## `checkUniverseAppropriateness`
+
+Diagnostic check — for each user set, reports what fraction of user-set regions overlap the universe and whether there are many-to-many mappings.
+
+```ts
+const report = checkUniverseAppropriateness(userSets, universe);
+// Returns a column-oriented object:
+// {
+//   userSet: number[],
+//   totalRegions: number[],
+//   regionsInUniverse: number[],
+//   coverage: number[],        // 0.0 to 1.0 per user set
+//   manyToMany: number[],      // count of user regions overlapping >1 universe region
+//   warnings: string[]         // free-form warning messages, pooled across user sets
+// }
+
+const n = report.userSet.length;
+for (let i = 0; i < n; i++) {
+  console.log(
+    `user set ${report.userSet[i]}: ` +
+    `${(report.coverage[i] * 100).toFixed(1)}% coverage, ` +
+    `${report.manyToMany[i]} many-to-many`
+  );
+}
+for (const w of report.warnings) {
+  console.warn('⚠', w);
+}
+```
+
+Warnings fire when coverage drops below 50% (severe) or 90% (moderate), or when any user region overlaps more than one universe region.
+
+!!! note "Rust-only helpers not exposed in Wasm"
+    The Rust universe module has two additional helpers — `redefine_user_sets` (rewrite user sets in terms of universe regions) and `build_restricted_universe` (disjoint union of all user sets, for differential analysis). Neither is currently exposed in the Wasm binding. If you need them, you can replicate the logic client-side using `RegionSetList` operations from [wasm/regionset](regionset.md), or request a binding.
+
+## See also
+
+- **[wasm/regionset](regionset.md)** — `RegionSet`, `RegionSetList`, and the `ConsensusBuilder` used upstream of LOLA analyses.
+- **[gtars-lola](../lola.md)** — full Rust API reference, including the contingency table math, p-value computation, and CMLE odds ratio details.
diff --git a/docs/gtars/wasm/partitions.md b/docs/gtars/wasm/partitions.md
new file mode 100644
index 00000000..d86074aa
--- /dev/null
+++ b/docs/gtars/wasm/partitions.md
@@ -0,0 +1,152 @@
+# Partitions (Wasm)
+
+Wasm bindings for the gtars-genomicdist partition system — classify query regions into promoter / UTR / exon / intron / intergenic buckets given a gene model. Everything is in-memory; there is no GTF loader on the JS side (because GTF loading requires filesystem access), so you either build a `GeneModel` from pre-parsed region arrays or lift one out of a `GenomicDistAnnotation` served from the API.
+
+See [gtars-genomicdist → Partitions](../genomicdist.md#partitions) for the full semantics — priority order, mutually-exclusive vs. bp-proportion modes, R GenomicDistributions divergences, and the chi-square p-value caveat.
+
+## Import
+
+```ts
+import init, {
+  GeneModel,
+  PartitionList,
+  calcPartitions,
+  calcExpectedPartitions,
+} from '@databio/gtars-js';
+
+await init();
+```
+
+## `GeneModel`
+
+Constructed from four JS arrays of region tuples: `genes`, `exons`, and optional `three_utr` / `five_utr`. Each region tuple is `[chr, start, end]` or `[chr, start, end, strand]`. When strand is provided, `+`/`-` become stranded; any other character becomes `Unstranded`.
+
+```ts
+const genes = [
+  ['chr1', 1000, 5000, '+'],
+  ['chr1', 8000, 12000, '-'],
+];
+const exons = [
+  ['chr1', 1000, 2000, '+'],
+  ['chr1', 4500, 5000, '+'],
+  ['chr1', 11500, 12000, '-'],
+];
+const threeUtr = [['chr1', 4800, 5000, '+']];
+const fiveUtr  = [['chr1', 1000, 1100, '+']];
+
+// Any of three_utr / five_utr can be null if you don't have UTR annotations
+const model = new GeneModel(genes, exons, threeUtr, fiveUtr);
+```
+
+!!! note "No GTF loader in Wasm"
+    The Rust `GeneModel::from_gtf` and `GeneModel::from_bed_files` methods are **not** available in Wasm — they require filesystem access. In the browser, typically:
+
+    - Fetch a pre-built `.gda` (Genomic Dist Annotation) binary from the BEDbase API and decode it, **or**
+    - Parse regions yourself from JSON / TSV over the network and pass them into the `GeneModel` constructor.
+
+## `PartitionList`
+
+Build an ordered, priority-based partition list from a `GeneModel`. Partition priority order is hard-coded: `promoterCore` → `promoterProx` → `threeUTR` → `fiveUTR` → `exon` → `intron` (with `intergenic` added implicitly by `calcPartitions`).
+
+```ts
+const chromSizes = { chr1: 248956422 };
+
+const pl = PartitionList.fromGeneModel(
+  model,
+  100,        // core promoter size (bp upstream of gene start)
+  2000,       // proximal promoter size
+  chromSizes, // optional — pass null to skip boundary trimming
+);
+```
+
+Pass `chromSizes: null` if you don't need promoter trimming at chromosome boundaries.
+
+## `calcPartitions`
+
+Classify query regions into partition categories. Two modes:
+
+- **Priority mode** (`bpProportion = false`): each query region is assigned to the first partition it overlaps; everything else goes to `intergenic`. Mutually exclusive, one region → one partition.
+- **BP proportion mode** (`bpProportion = true`): for each partition, computes the total overlapping base pairs of the query set. **Not** mutually exclusive — a region that straddles two partitions contributes bp to each.
+
+```ts
+const peaks: RegionSet = /* ... */;
+
+// Priority mode (region counts)
+const result = calcPartitions(peaks, pl, false);
+// {
+//   partitions: [{ name: 'promoterCore', count: 42 }, ...],
+//   total: 523
+// }
+
+for (const { name, count } of result.partitions) {
+  const pct = (count / result.total) * 100;
+  console.log(`${name.padEnd(15)} ${String(count).padStart(6)}  (${pct.toFixed(1)}%)`);
+}
+```
+
+## `calcExpectedPartitions`
+
+Observed vs. expected partition enrichment with a chi-square test. Requires chromosome sizes to compute the expected fraction based on each partition's share of the genome.
+
+```ts
+const chromSizes = { chr1: 248956422, chr2: 242193529 };
+
+const expected = calcExpectedPartitions(peaks, pl, chromSizes, false);
+// Array of { partition, observed, expected, log10Oe, pvalue }
+
+for (const row of expected) {
+  console.log(
+    `${row.partition.padEnd(15)} ` +
+    `obs=${row.observed.toFixed(0).padStart(6)} ` +
+    `exp=${row.expected.toFixed(1).padStart(10)} ` +
+    `log10(O/E)=${row.log10Oe >= 0 ? '+' : ''}${row.log10Oe.toFixed(2)} ` +
+    `p=${row.pvalue.toExponential(2)}`
+  );
+}
+```
+
+Result fields use camelCase: `partition`, `observed`, `expected`, `log10Oe`, `pvalue`.
+
+!!! warning "p-values will differ slightly from R GenomicDistributions"
+    `calcExpectedPartitions` uses a goodness-of-fit `(O-E)²/E` formula. R's `chisq.test()` computes a 2×2 test of independence with optional Yates correction, so p-values will not match R output byte-for-byte. The `log10Oe` column is directly comparable.
+
+## End-to-end example
+
+```ts
+import init, {
+  RegionSet,
+  GeneModel,
+  PartitionList,
+  calcExpectedPartitions,
+} from '@databio/gtars-js';
+
+await init();
+
+// 1. Build query RegionSet from BED-like entries
+const peaks = new RegionSet([
+  ['chr1', 1500, 1800, ''],
+  ['chr1', 4600, 4900, ''],
+  ['chr1', 11700, 12000, ''],
+]);
+
+// 2. Build GeneModel — in a real app you'd fetch annotations from an API
+const model = new GeneModel(
+  [['chr1', 1000, 5000, '+'], ['chr1', 8000, 12000, '-']],
+  [['chr1', 1000, 2000, '+'], ['chr1', 4500, 5000, '+'], ['chr1', 11500, 12000, '-']],
+  null,
+  null,
+);
+
+const chromSizes = { chr1: 248956422 };
+const pl = PartitionList.fromGeneModel(model, 100, 2000, chromSizes);
+
+// 3. Compute observed vs expected partition enrichment
+const enriched = calcExpectedPartitions(peaks, pl, chromSizes, false);
+console.table(enriched);
+```
+
+## See also
+
+- **[wasm/regionset](regionset.md)** — the `RegionSet` type used as the query input.
+- **[wasm/signal](signal.md)** — signal matrix overlap.
+- **[gtars-genomicdist → Partitions](../genomicdist.md#partitions)** — Rust reference with the priority-order and mutually-exclusive-vs-bp-proportion semantics.
diff --git a/docs/gtars/wasm/regionset.md b/docs/gtars/wasm/regionset.md
new file mode 100644
index 00000000..bef2a9e4
--- /dev/null
+++ b/docs/gtars/wasm/regionset.md
@@ -0,0 +1,255 @@
+# RegionSet & models (Wasm)
+
+The `@databio/gtars-js` Wasm bindings expose gtars' core interval-analysis types directly to JavaScript and TypeScript, with no server round-trip. This page covers everything in the `regionset` Wasm module — `RegionSet`, `RegionSetList`, `ConsensusBuilder`, and the result types for statistics and BED classification.
+
+See the Rust reference pages for the underlying semantics:
+
+- [gtars-core](../core.md) — `Region`, `RegionSet`, `RegionSetList`
+- [gtars-genomicdist](../genomicdist.md) — the stats, interval algebra, and consensus/classifier methods that `RegionSet` gains in Wasm
+
+## Import
+
+```ts
+import init, {
+  RegionSet,
+  RegionSetList,
+  ConsensusBuilder,
+} from '@databio/gtars-js';
+
+await init();  // initializes the wasm module; top-level await in ESM environments
+```
+
+## `RegionSet`
+
+A sorted collection of genomic regions constructed from an array of `[chr, start, end, rest]` tuples. Construction sorts by `(chr, start)`.
+
+### Construction
+
+```ts
+// BED-like entries: [chr, start, end, rest]
+// `rest` is any trailing BED metadata — pass "" if you have BED3 data
+const entries: [string, number, number, string][] = [
+  ['chr1', 100, 200, 'peak1'],
+  ['chr1', 300, 400, 'peak2'],
+  ['chr2', 500, 600, 'peak3'],
+];
+
+const rs = new RegionSet(entries);
+```
+
+### Properties
+
+```ts
+rs.numberOfRegions    // number
+rs.meanRegionWidth    // number
+rs.nucleotidesLength  // number (total bp)
+rs.identifier         // string — canonical MD5 id over first 3 columns
+rs.firstRegion        // string — debug repr of the first region
+rs.classify           // BedClassificationOutput (see below)
+```
+
+### Statistics
+
+```ts
+// Per-chromosome summary stats — returns { [chr]: ChromosomeStatistics }
+const stats = rs.chromosomeStatistics();
+
+// Region widths (end - start)
+const widths: number[] = rs.calcWidths();
+
+// Signed gaps between consecutive regions (positive gaps only)
+const gaps: number[] = rs.calcNeighborDistances();
+
+// Per-region min-neighbor distance
+const nn: number[] = rs.calcNearestNeighbors();
+
+// Single-linkage clusters at a stitching radius; returns a cluster id per region
+const clusterIds: number[] = rs.cluster(5_000);
+```
+
+!!! warning "Output length for `calcNeighborDistances` / `calcNearestNeighbors`"
+    Both methods **skip chromosomes with only one region** (matching R GenomicDistributions). The returned array is therefore **not aligned 1:1 with input regions** — it's shorter than `rs.numberOfRegions` whenever any chromosome has a single peak. No sentinel values are emitted.
+
+### Region distribution
+
+```ts
+// Bin counts for plotting; chrom_sizes is optional
+const chromSizes = { chr1: 248956422, chr2: 242193529 };
+
+// With chrom_sizes → reference-aligned bins (comparable across files)
+const dist = rs.regionDistribution(250, chromSizes);
+
+// Without chrom_sizes → bins derived from observed max end (NOT comparable)
+const distLocal = rs.regionDistribution(250, null);
+// Array of { chr, start, end, n, rid }
+```
+
+!!! warning "`n_bins` is a target, not a total"
+    When `chrom_sizes` is provided, `n_bins` is the target bin count for the **longest** chromosome in `chrom_sizes`. Bin width is derived from that, and every chromosome is tiled at the same bp width — so the total bin count across chromosomes is `sum(ceil(size / bin_width))`, which can substantially exceed `n_bins`.
+
+### Interval set algebra
+
+All operations return a new `RegionSet`. Metadata (`rest` fields) is dropped by operations that merge or synthesize new intervals.
+
+```ts
+const chromSizes = { chr1: 248956422 };
+
+const trimmed   = rs.trim(chromSizes);
+const merged    = rs.reduce();
+const disjoint  = rs.disjoin();
+const proms     = rs.promoters(2000, 0);
+const shifted   = rs.shift(100);
+const resized   = rs.resize(500, 'center');
+const flanked   = rs.flank(1000, true, false);
+const narrowed  = rs.narrow(100, 200, null);
+const gapped    = rs.gaps(chromSizes);
+
+const diff      = rs.setdiff(other);
+const pInter    = rs.pintersect(other);
+const inter     = rs.intersect(other);
+const combined  = rs.concat(other);
+const unioned   = rs.union(other);
+
+const jac: number = rs.jaccard(other);
+```
+
+## `RegionSetList`
+
+A collection of named `RegionSet`s — the gtars equivalent of Bioconductor's `GRangesList`. Provides indexed pairwise operations without copying whole `RegionSet`s on every call.
+
+### Construction
+
+```ts
+// Empty builder + add()
+const rsl = new RegionSetList();
+rsl.add(rs1, 'rep1');
+rsl.add(rs2, 'rep2');
+rsl.add(rs3, 'rep3');
+
+// Or build directly from BED entries for multiple sets at once
+const rsl2 = RegionSetList.fromEntries(
+  [
+    [['chr1', 100, 200, ''], ['chr1', 300, 400, '']],  // set 1
+    [['chr1', 150, 250, '']],                          // set 2
+    [['chr2', 500, 600, '']],                          // set 3
+  ],
+  ['rep1', 'rep2', 'rep3'],   // names; pass null to leave unnamed
+);
+```
+
+### Accessors
+
+```ts
+rsl.length                    // number of sets
+const rs = rsl.get(0);        // RegionSet at index (throws on out-of-range)
+rsl.names                     // string[] or null
+const flat = rsl.concat();    // flatten into a single RegionSet (no merge)
+```
+
+### Indexed pair operations
+
+These let you compute operations on pairs by index without shuttling full `RegionSet`s across the JS/Wasm boundary — useful for building N×N analysis grids:
+
+```ts
+const n = rsl.regionCount(0);               // regions in set 0
+const pCount = rsl.pintersectCount(0, 1);   // pairwise intersect count
+const jac    = rsl.jaccardAt(0, 1);         // Jaccard between sets 0 and 1
+const un     = rsl.unionAt(0, 1);           // union as a new RegionSet
+const diff   = rsl.setdiffAt(0, 1);         // set 0 minus set 1
+
+const except = rsl.unionExcept(2);          // union of everything but index 2
+```
+
+### Bulk operations
+
+O(n) N-way operations via prefix/suffix arrays.
+
+```ts
+// Union of all sets
+const unionAll = rsl.unionAll();
+
+// Intersection of all sets
+const interAll = rsl.intersectAll();
+
+// Prefix/suffix-based leave-one-out: all N "union except i" results in O(n) unions
+const bulk = rsl.bulkUnionExcept();
+// { union_regions, union_nucleotides, except_unique: number[] }
+// except_unique[i] = number of regions unique to set i vs. the union of all others
+```
+
+### Pairwise Jaccard matrix
+
+```ts
+const result = rsl.pairwiseJaccard();
+// { matrix: number[][], names: string[] | null }
+//
+// matrix is symmetric with 1.0 on the diagonal — one row per set, one col per set.
+```
+
+## `ConsensusBuilder`
+
+Builder pattern for consensus region analysis — given N input region sets, compute the reduced union annotated with per-region replicate support.
+
+```ts
+import { ConsensusBuilder } from '@databio/gtars-js';
+
+const cb = new ConsensusBuilder();
+cb.add(rep1);
+cb.add(rep2);
+cb.add(rep3);
+
+const consensus = cb.compute();
+// Array of { chr, start, end, count }
+
+// Keep regions present in ≥ 2/3 replicates
+const robust = consensus.filter(r => r.count >= 2);
+console.log(`${robust.length} robust regions`);
+```
+
+`count` is the number of input **sets** (not regions) that overlap each union region.
+
+## Result types
+
+### `ChromosomeStatistics`
+
+Returned by `rs.chromosomeStatistics()` → `{ [chr: string]: ChromosomeStatistics }`.
+
+| field | type |
+|---|---|
+| `chromosome` | `string` |
+| `number_of_regions` | `number` |
+| `start_nucleotide_position` | `number` — leftmost start |
+| `end_nucleotide_position` | `number` — rightmost end |
+| `minimum_region_length`, `maximum_region_length` | `number` |
+| `mean_region_length`, `median_region_length` | `number` |
+
+Fields are exposed as getters on a Wasm-bound class.
+
+### `RegionDistribution`
+
+Entries in the array returned by `rs.regionDistribution(n_bins, chrom_sizes)`.
+
+| field | type |
+|---|---|
+| `chr` | `string` |
+| `start`, `end`, `n`, `rid` | `number` |
+
+`n` is the count of regions whose midpoint falls in the bin; `rid` is the bin's row index within its chromosome.
+
+### `BedClassificationOutput`
+
+Returned by the `rs.classify` getter. Identifies the BED/ENCODE subtype based on column analysis (uses the `bedclassifier` feature, enabled by default in the Wasm build).
+
+| field | type |
+|---|---|
+| `bed_compliance` | `string` |
+| `data_format` | `string` — e.g. `"UcscBed"`, `"EncodeNarrowPeak"`, etc. |
+| `compliant_columns`, `non_compliant_columns` | `number` |
+
+## See also
+
+- **[wasm/partitions](partitions.md)** — `GeneModel`, `PartitionList`, `calcPartitions` / `calcExpectedPartitions`.
+- **[wasm/signal](signal.md)** — `SignalMatrix` and `calcSummarySignal`.
+- **[wasm/lola](lola.md)** — `LolaRegionDB` and `runLOLA`.
+- **[wasm/overlappers](overlappers.md)** — low-level overlap engine.
+- **[gtars-core](../core.md)** and **[gtars-genomicdist](../genomicdist.md)** — Rust reference with algorithmic details and caveats that apply identically in Wasm.
diff --git a/docs/gtars/wasm/signal.md b/docs/gtars/wasm/signal.md
new file mode 100644
index 00000000..75a3f225
--- /dev/null
+++ b/docs/gtars/wasm/signal.md
@@ -0,0 +1,140 @@
+# Signal matrix (Wasm)
+
+Wasm bindings for the gtars-genomicdist signal matrix overlap — overlay query regions on a region × condition matrix of signal values (e.g. a peak × cell-type ChIP intensity matrix), aggregate by MAX per query region, and compute Tukey boxplot statistics per condition.
+
+See [gtars-genomicdist → Signal matrix overlap](../genomicdist.md#signal-matrix-overlap) for the full algorithmic detail.
+
+## Import
+
+```ts
+import init, {
+  SignalMatrix,
+  calcSummarySignal,
+} from '@databio/gtars-js';
+
+await init();
+```
+
+## `SignalMatrix`
+
+A region × condition matrix of signal values, loaded from either the packed binary format or built directly from JS data.
+
+### `SignalMatrix.fromBin`
+
+Load from a packed binary buffer (the `.sigm` format produced by the Rust `SignalMatrix::save_bin` or the CLI). This is the fast path — drop the bytes of a `.sigm` file served from your API into the constructor.
+
+```ts
+const response = await fetch('/api/signal-matrix/my_matrix.sigm');
+const bytes = new Uint8Array(await response.arrayBuffer());
+
+const sm = SignalMatrix.fromBin(bytes);
+```
+
+### From JS arrays
+
+For the rare case where you have the signal data in memory already (e.g. parsed from a TSV fetched separately), you can build a `SignalMatrix` directly via the `new SignalMatrix(...)` constructor:
+
+```ts
+const regionIds = [
+  'chr1_100_200',
+  'chr1_300_400',
+  'chr2_500_600',
+];
+const conditionNames = ['K562', 'HeLa', 'GM12878'];
+
+// Flat row-major array: row i, condition j = values[i * nConditions + j]
+const values = new Float64Array([
+  1.2, 0.8, 2.3,   // region 0
+  0.4, 1.5, 0.9,   // region 1
+  3.1, 2.7, 1.8,   // region 2
+]);
+
+const sm = new SignalMatrix(
+  regionIds,
+  conditionNames,
+  values,
+  3,  // nRegions
+  3,  // nConditions
+);
+```
+
+!!! note "Region IDs must be `chr_start_end`"
+    The constructor parses each `regionIds` entry by splitting on `_` and expects exactly three parts: chromosome, start, end. IDs with more or fewer underscores error out. This matches the R GenomicDistributions convention used by `signal_matrix.tsv` files.
+
+## `calcSummarySignal`
+
+Overlap a query region set against a `SignalMatrix`, take the MAX signal per query region per condition, and compute Tukey boxplot statistics per condition.
+
+```ts
+const peaks = new RegionSet([
+  ['chr1', 150, 250, ''],
+  ['chr2', 550, 580, ''],
+]);
+
+const result = calcSummarySignal(peaks, sm);
+
+console.log(result.conditionNames);
+// ['K562', 'HeLa', 'GM12878']
+
+console.log(result.signalMatrix);
+// Array of { region, values } — per-query-region max signal vector
+// [
+//   { region: "chr1_150_250", values: [1.2, 0.8, 2.3] },
+//   { region: "chr2_550_580", values: [3.1, 2.7, 1.8] },
+// ]
+
+console.log(result.matrixStats);
+// Per-condition Tukey stats
+// [
+//   { condition: 'K562',    lowerWhisker, lowerHinge, median, upperHinge, upperWhisker },
+//   { condition: 'HeLa',    ... },
+//   { condition: 'GM12878', ... },
+// ]
+```
+
+### Result schema
+
+**Top level:**
+- `signalMatrix: { region: string, values: number[] }[]` — one entry per query region that matched at least one signal row. `region` is the query region label in `chr_start_end` form; `values` are the per-condition max signals.
+- `matrixStats: ConditionStats[]` — one entry per condition, in the order of `conditionNames`.
+- `conditionNames: string[]` — column labels, same as the input `SignalMatrix.conditionNames`.
+
+**`ConditionStats`** (per condition, standard Tukey 5-number summary):
+
+| field | type |
+|---|---|
+| `condition` | `string` |
+| `lowerWhisker` | `number` |
+| `lowerHinge` | `number` — Q1 |
+| `median` | `number` |
+| `upperHinge` | `number` — Q3 |
+| `upperWhisker` | `number` |
+
+## End-to-end example
+
+```ts
+import init, { RegionSet, SignalMatrix, calcSummarySignal } from '@databio/gtars-js';
+
+await init();
+
+// 1. Fetch a packed binary signal matrix from the API
+const response = await fetch('/api/signal/encode_k562_hela.sigm');
+const sm = SignalMatrix.fromBin(new Uint8Array(await response.arrayBuffer()));
+
+// 2. Build a query RegionSet from user-provided peaks
+const peaks = new RegionSet(userPeakEntries);
+
+// 3. Compute summary signal
+const result = calcSummarySignal(peaks, sm);
+
+// 4. Render per-condition boxplot
+for (const stats of result.matrixStats) {
+  renderBoxplot(stats.condition, stats);
+}
+```
+
+## See also
+
+- **[wasm/regionset](regionset.md)** — the `RegionSet` type used as the query input.
+- **[wasm/partitions](partitions.md)** — partition classification.
+- **[gtars-genomicdist → Signal matrix overlap](../genomicdist.md#signal-matrix-overlap)** — Rust reference for the underlying algorithm and the packed `.sigm` binary format.
diff --git a/mkdocs.yml b/mkdocs.yml
index 1711b382..ebb40d42 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -175,7 +175,7 @@ nav:
       - Overview: gtars/modules.md
       - gtars-core: gtars/core.md
       # - gtars-io: gtars/io.md
-      - gtars-models: gtars/regionSet.md
+      - Core models: gtars/regionSet.md
       - gtars-overlaprs: gtars/overlaprs.md
       - gtars-uniwig: gtars/uniwig.md
       - gtars-igd: gtars/igd.md
@@ -183,11 +183,16 @@ nav:
       - gtars-fragsplit: gtars/fragsplit.md
       - gtars-tokenizers: gtars/tokenizers.md
       - gtars-scatrs: gtars/scatrs.md
+      - gtars-genomicdist: gtars/genomicdist.md
+      - gtars-lola: gtars/lola.md
       - gtars-refget: gtars/refget.md
       - gtars-bbcache: gtars/bbcache.md
     - Bindings:
       - Python:
         - Overview: gtars/python-overview.md
+        - Models: gtars/python/models.md
+        - GenomicDist: gtars/python/genomic_distributions.md
+        - LOLA: gtars/python/lola.md
         - Digests: gtars/python/digests.md
         - RefgetStore Tutorial: gtars/python/refgetstore.ipynb
         - RefgetStore Python Reference: gtars/python/refget-store.md
@@ -195,9 +200,17 @@ nav:
       - Wasm:
         - Overview: gtars/wasm.md
         - Overlappers: gtars/wasm/overlappers.md
+        - RegionSet: gtars/wasm/regionset.md
+        - Partitions: gtars/wasm/partitions.md
+        - Signal: gtars/wasm/signal.md
+        - LOLA: gtars/wasm/lola.md
       - CLI: gtars/cli.md
       - R:
         - R: gtars/r.md
+        - RegionSet: gtars/r/regionset.md
+        - GenomicDist: gtars/r/genomicdist.md
+        - LOLA: gtars/r/lola.md
+        - IGD: gtars/r/igd.md
         - RefgetStore: gtars/r/refgetstore.md
     - Reference:
       - Changelog: gtars/changelog.md