
0.9.0 release working PR #256

Open

sanghoonio wants to merge 37 commits into master from dev


Conversation

@sanghoonio
Member

Opening a PR to track work on dev.

khoroshevskyi and others added 30 commits February 11, 2026 01:06
Enables JS/TS consumers to build a RegionSetList from individual
RegionSet objects, unlocking batch pairwiseJaccard() for uploaded
collections in bedbase-ui.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds RegionSetListOps trait in gtars-genomicdist with indexed pair
operations (pintersect_at, jaccard_at, union_at, setdiff_at,
region_count, union_except) so all bindings (wasm, Python, R) can
operate on pairs by index without cloning across the FFI boundary.

Also adds RegionSetList.fromEntries() to wasm bindings for zero-clone
construction directly from BED entry arrays.
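A minimal sketch of the index-based pattern, assuming simplified stand-in types: `Region`, `RegionSet` and `RegionSetList` below are hypothetical, not the gtars-core definitions, and only `region_count` and `jaccard_at` are shown. The commit does not say which Jaccard definition `jaccard_at` uses; this sketch assumes a base-pair Jaccard (shared bp over union bp). The point is that the binding hands over `(i, j)` and the work runs on borrowed data, with no `RegionSet` clones.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-ins for the gtars types (half-open [start, end) intervals).
#[derive(Clone, Debug)]
pub struct Region {
    pub chrom: String,
    pub start: u32,
    pub end: u32,
}

pub struct RegionSet {
    pub regions: Vec<Region>,
}

pub struct RegionSetList {
    pub sets: Vec<RegionSet>,
}

/// Sketch of index-based pair operations: callers pass indices, never whole sets.
pub trait RegionSetListOps {
    fn region_count(&self, i: usize) -> Option<usize>;
    fn jaccard_at(&self, i: usize, j: usize) -> Option<f64>;
}

type Merged = BTreeMap<String, Vec<(u32, u32)>>;

/// Sort and merge each chromosome's intervals so base pairs are never double-counted.
fn merge(set: &RegionSet) -> Merged {
    let mut by_chrom: Merged = BTreeMap::new();
    for r in &set.regions {
        by_chrom.entry(r.chrom.clone()).or_default().push((r.start, r.end));
    }
    for ivs in by_chrom.values_mut() {
        ivs.sort();
        let mut out: Vec<(u32, u32)> = Vec::new();
        for &(s, e) in ivs.iter() {
            if let Some(last) = out.last_mut() {
                if s <= last.1 {
                    last.1 = last.1.max(e);
                    continue;
                }
            }
            out.push((s, e));
        }
        *ivs = out;
    }
    by_chrom
}

/// Base pairs shared by two sorted, merged interval lists (two-pointer sweep).
fn overlap_bp(a: &[(u32, u32)], b: &[(u32, u32)]) -> u64 {
    let (mut i, mut j, mut total) = (0, 0, 0u64);
    while i < a.len() && j < b.len() {
        let (s, e) = (a[i].0.max(b[j].0), a[i].1.min(b[j].1));
        if s < e {
            total += u64::from(e - s);
        }
        // Advance whichever interval ends first.
        if a[i].1 < b[j].1 { i += 1 } else { j += 1 }
    }
    total
}

fn covered_bp(m: &Merged) -> u64 {
    m.values().flatten().map(|&(s, e)| u64::from(e - s)).sum()
}

impl RegionSetListOps for RegionSetList {
    fn region_count(&self, i: usize) -> Option<usize> {
        self.sets.get(i).map(|s| s.regions.len())
    }

    /// Returns None for an out-of-range index instead of panicking.
    fn jaccard_at(&self, i: usize, j: usize) -> Option<f64> {
        let a = merge(self.sets.get(i)?);
        let b = merge(self.sets.get(j)?);
        let inter: u64 = a
            .iter()
            .filter_map(|(chrom, ivs)| b.get(chrom).map(|other| overlap_bp(ivs, other)))
            .sum();
        let union_bp = covered_bp(&a) + covered_bp(&b) - inter;
        // Two empty sets: this sketch defines the index as 0.0 (an assumption).
        Some(if union_bp == 0 { 0.0 } else { inter as f64 / union_bp as f64 })
    }
}
```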

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add LolaColumnar struct and results_to_columns() in gtars-lola/output.rs
  so all bindings share a single row→column pivot implementation
- Simplify WASM, Python, and R bindings to use results_to_columns()
  instead of copy-pasting the 22-field iteration (~100 lines each)
- Remove empty_to_none/empty_to_na helpers from each binding (now in core)
- Remove greet() scaffolding from gtars-wasm/lib.rs
- Remove unused write_results_to_file() from gtars-lola/output.rs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add bulk_union_except() using prefix/suffix arrays: O(n) unions
  instead of O(n²) for computing per-file unique region counts
- Add union_all() and intersect_all() fold methods
- Add wasm bindings: bulkUnionExcept (returns union stats + per-file
  unique counts in a single call), unionAll, intersectAll
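The prefix/suffix idea can be sketched on single-chromosome interval lists (a simplifying assumption; `union_ivs`, `bp`, `union_all` and `bulk_union_except` here are hypothetical helpers, not the gtars signatures). `prefix[i]` is the union of the sets before `i` and `suffix[i + 1]` the union of the sets after it, so "everything except `i`" costs one more union per file: roughly 3n unions in total instead of O(n²). The suffix array is built incrementally from the back, in the spirit of the later "incremental suffix construction" fix.

```rust
/// Half-open [start, end) intervals on a single chromosome (simplifying assumption).
type Ivs = Vec<(u32, u32)>;

/// Union of two interval lists, returned sorted and merged.
fn union_ivs(a: &[(u32, u32)], b: &[(u32, u32)]) -> Ivs {
    let mut all: Ivs = a.iter().chain(b.iter()).copied().collect();
    all.sort();
    let mut out: Ivs = Vec::new();
    for (s, e) in all {
        if let Some(last) = out.last_mut() {
            if s <= last.1 {
                last.1 = last.1.max(e);
                continue;
            }
        }
        out.push((s, e));
    }
    out
}

/// Total base pairs covered by a merged list.
fn bp(ivs: &[(u32, u32)]) -> u64 {
    ivs.iter().map(|&(s, e)| u64::from(e - s)).sum()
}

/// Fold every set into one union.
fn union_all(sets: &[Ivs]) -> Ivs {
    sets.iter().fold(Vec::new(), |acc, s| union_ivs(&acc, s))
}

/// For each i, the union of all sets except sets[i].
fn bulk_union_except(sets: &[Ivs]) -> Vec<Ivs> {
    let n = sets.len();
    // prefix[i] = union of sets[..i]; prefix[0] is empty.
    let mut prefix: Vec<Ivs> = vec![Vec::new(); n + 1];
    for i in 0..n {
        prefix[i + 1] = union_ivs(&prefix[i], &sets[i]);
    }
    // suffix[i] = union of sets[i..]; suffix[n] is empty.
    let mut suffix: Vec<Ivs> = vec![Vec::new(); n + 1];
    for i in (0..n).rev() {
        suffix[i] = union_ivs(&suffix[i + 1], &sets[i]);
    }
    // Everything before i joined with everything after i.
    (0..n).map(|i| union_ivs(&prefix[i], &suffix[i + 1])).collect()
}
```

A per-file "unique" measure then falls out as the total union minus the union-except for that file. The commit reports unique region counts; the base-pair version shown in the test is only illustrative.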

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…etListOps and results_to_columns

intersect_all() was incorrectly using pintersect (positional pairing)
instead of intersect (range-level), producing wrong results when sets
have different numbers of regions. Added 21 tests covering all
RegionSetListOps trait methods and 3 tests for results_to_columns.
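The difference between the two operations, sketched on sorted, non-overlapping single-chromosome intervals (hypothetical helpers, not the gtars signatures). Positional pairing only ever compares `a[k]` with `b[k]`, so it misses overlaps whenever the matching regions sit at different indices, which is guaranteed to happen once the sets have different lengths. A range-level fold avoids that.

```rust
/// Positional pairing: intersect a[k] with b[k] only; extra regions are ignored.
fn pintersect(a: &[(u32, u32)], b: &[(u32, u32)]) -> Vec<(u32, u32)> {
    a.iter()
        .zip(b.iter())
        .filter_map(|(&(s1, e1), &(s2, e2))| {
            let (s, e) = (s1.max(s2), e1.min(e2));
            (s < e).then_some((s, e))
        })
        .collect()
}

/// Range-level intersection of two sorted, non-overlapping lists (two-pointer sweep).
fn intersect(a: &[(u32, u32)], b: &[(u32, u32)]) -> Vec<(u32, u32)> {
    let (mut i, mut j, mut out) = (0, 0, Vec::new());
    while i < a.len() && j < b.len() {
        let (s, e) = (a[i].0.max(b[j].0), a[i].1.min(b[j].1));
        if s < e {
            out.push((s, e));
        }
        // Advance whichever interval ends first.
        if a[i].1 < b[j].1 { i += 1 } else { j += 1 }
    }
    out
}

/// Fold over all sets with the range-level intersect; empty input gives an empty result.
fn intersect_all(sets: &[Vec<(u32, u32)>]) -> Vec<(u32, u32)> {
    match sets.split_first() {
        Some((first, rest)) => rest.iter().fold(first.clone(), |acc, s| intersect(&acc, s)),
        None => Vec::new(),
    }
}
```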

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ze suffix construction

- Always push a name in JsRegionSetList::add() to keep names/region_sets aligned
- Return None from union_except when skip >= n instead of silently ignoring
- Build bulk_union_except suffix array incrementally to avoid redundant clones

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The refactor in eb8ef5a removed this function but r_regiondb_anno still
depends on it for converting empty annotation strings to R NA values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace empty-string-as-sentinel pattern with Option<String> for all
annotation fields (cell_type, description, tissue, data_source, antibody,
treatment, collection). This eliminates empty_to_na/empty_to_none helpers
from all bindings — R gets NA, Python gets None, and WASM gets null
natively from the type system.
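A minimal sketch of a single row→column pivot with `Option<String>` annotation columns. `ResultRow`, `Columnar`, `rows_to_columns` and the `user_set` / `p_value_log` fields are hypothetical; only `cell_type` and `antibody` come from the field list above, and the real `LolaColumnar` carries 22 fields.

```rust
/// Hypothetical row: two illustrative numeric fields plus two of the
/// annotation fields named above, typed Option<String>.
#[derive(Clone, Debug)]
pub struct ResultRow {
    pub user_set: usize,
    pub p_value_log: f64,
    pub cell_type: Option<String>,
    pub antibody: Option<String>,
}

/// Column-oriented view: one Vec per field, all the same length.
#[derive(Debug, Default)]
pub struct Columnar {
    pub user_set: Vec<usize>,
    pub p_value_log: Vec<f64>,
    pub cell_type: Vec<Option<String>>,
    pub antibody: Vec<Option<String>>,
}

/// One row→column pivot that every binding can share.
pub fn rows_to_columns(rows: &[ResultRow]) -> Columnar {
    let mut cols = Columnar::default();
    for r in rows {
        cols.user_set.push(r.user_set);
        cols.p_value_log.push(r.p_value_log);
        // Missing annotations stay None; there is no "" sentinel to translate later.
        cols.cell_type.push(r.cell_type.clone());
        cols.antibody.push(r.antibody.clone());
    }
    cols
}
```

Each binding can then map `None` straight to its own missing value (R `NA`, Python `None`, JS `null`) with no per-binding sentinel check.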

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add RegionSetList bindings and indexed operations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WASM builds from the local workspace, not crates.io, so it doesn't
need publish-all-crates to succeed first.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
The Rust core genomicdist crate already has both region_distribution_with_bins
and region_distribution_with_chrom_sizes, but only the R binding exposed the
chrom_sizes variant. Without reference-derived per-chrom bin sizes, every BED
file gets its own bin width (from max observed coordinate), making outputs
incomparable across files. Server-side SQL aggregation is impossible under
this scheme.

This wires the chrom_sizes variant through CLI, Python, and WASM so that
bin widths are a property of the reference genome, not the BED file.

Changes per binding:
  CLI:    --chrom-sizes dispatches to with_chrom_sizes. When absent, prints a
          loud warning to stderr noting that outputs won't be comparable
          across files. Existing --chrom-sizes flag (previously only used
          for partitions/promoter trimming) now also controls region_dist.
  Python: distribution() accepts optional chrom_sizes=None kwarg.
  WASM:   regionDistribution(n_bins, chrom_sizes) dispatches on
          null/undefined. Breaking change for the JS signature.
  R:      already supported, but updated to handle new result type.

Bonus fix: region_distribution_with_chrom_sizes had a latent bug where
regions whose midpoint fell beyond the stated chromosome size produced
invalid bins (end < start, rid >= n_bins). This happens with assembly
mismatches (e.g. hg38 BED paired with hg19 chrom_sizes). Fixed by:
  - Changing return type to RegionDistResult { bins, out_of_range }
  - Skipping regions with mid >= chrom_size, tracking per-chrom counts
  - CLI warns with per-chrom breakdown when out_of_range is non-empty
  - Python/WASM/R surface out_of_range alongside bins
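The out-of-range guard can be sketched as follows. `RegionDistResult` mirrors the `{ bins, out_of_range }` shape described above, but the function name, the `(chrom, start, end)` tuple type, skipping chromosomes absent from `chrom_sizes`, and clamping the final remainder into the last bin are assumptions of this sketch. Bin width is per chromosome (`chrom_size / n_bins`), the scheme in use at this commit.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical mirror of the result shape: bins plus per-chromosome skip counts.
#[derive(Debug, Default)]
pub struct RegionDistResult {
    /// (chrom, bin index) -> number of region midpoints in that bin
    pub bins: BTreeMap<(String, u32), u32>,
    /// chrom -> number of regions skipped because mid >= chrom_size
    pub out_of_range: BTreeMap<String, u32>,
}

/// `regions` are (chrom, start, end) with end >= start; `n_bins` must be non-zero.
pub fn region_distribution_sketch(
    regions: &[(String, u32, u32)],
    chrom_sizes: &HashMap<String, u32>,
    n_bins: u32,
) -> RegionDistResult {
    let mut res = RegionDistResult::default();
    for (chrom, start, end) in regions {
        // Chromosomes missing from chrom_sizes are ignored in this sketch.
        let Some(&size) = chrom_sizes.get(chrom) else { continue };
        let mid = start + (end - start) / 2;
        if mid >= size {
            // Assembly mismatch: count it instead of emitting an invalid bin.
            *res.out_of_range.entry(chrom.clone()).or_insert(0) += 1;
            continue;
        }
        let bin_width = (size / n_bins).max(1);
        // Clamp so any remainder at the chromosome end folds into the last bin.
        let rid = (mid / bin_width).min(n_bins - 1);
        *res.bins.entry((chrom.clone(), rid)).or_insert(0) += 1;
    }
    res
}
```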

Tests added:
  - gtars-genomicdist: 2 new tests covering out-of-range skip + happy path
  - gtars-python: 3 new tests covering new result shape + chrom_sizes kwarg

Test results (all green):
  cargo test -p gtars-genomicdist:  262 passed (260 original + 2 new)
  pytest tests/ (gtars-python):     147 passed (146 original + 1 new)
  wasm-pack test --node:            2 passed
  CLI smoke: dispatch + warning confirmed on hg38 BED + hg19 chrom_sizes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cross-binding alignment for genomicdist functions:
- Add ignore_unk_chroms parameter to calc_dinucl_freq (core, Python, R,
  CLI), matching calc_gc_content's existing behavior for alt chromosomes
- Fold calc_dinucl_freq variants into single R-aligned function with
  raw_counts and ignore_unk_chroms parameters
- Align region_distribution to return RegionDistResult with out_of_range
  tracking across Python, R, and WASM bindings
- Simplify Python neighbor_distances/nearest_neighbors (remove dead
  Option<> wrapping)
- Update R type casts, test coverage, and rextendr-generated wrappers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Set default features to include all subcommands so cargo install
  produces a functional binary (previously default = [] gave an empty
  CLI with no subcommands visible in --help)
- Fix overlaprs description ("Tokenize data into a universe" → accurate
  description of overlap-based tokenization)
- Add --query and --universe long-form flags to overlaprs (previously
  only -q/-u short forms existed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Introduces the .fab binary FASTA format — sequences stored contiguously
without line wrapping, enabling mmap + zero-copy &[u8] slices. This
gives both instant construction (~ms) and fast per-region access,
replacing the tradeoff between HashMap (fast access, slow load) and
line-wrapped mmap (fast load, slow access due to newline filtering).

Benchmarked at 83.9s for 30 files vs 151.2s with HashMap-per-subprocess.
The .fab format eliminates the need for a mixed CLI+Python backend.

Core changes:
- BinaryGenomeAssembly: mmap-backed .fab reader with SequenceAccess impl
- BinaryGenomeAssembly::write_from_fasta: one-time FASTA → .fab conversion
- SequenceAccess trait: calc_gc_content/calc_dinucl_freq accept either
  GenomeAssembly (HashMap fallback) or BinaryGenomeAssembly (.fab)
- memmap2 dependency added
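A sketch of the `SequenceAccess` idea under stated assumptions: the trait, both backends and `gc_content` below are hypothetical simplifications. `ContiguousAssembly` keeps its bytes in a `Vec<u8>`, where a real `.fab` reader would map the file (e.g. via memmap2) and validate a header/magic. What it shows is that a newline-free, back-to-back layout lets region access return a borrowed `&[u8]` with no filtering or copying, and that one GC routine can serve either backend.

```rust
use std::collections::HashMap;

/// Backend-agnostic accessor (hypothetical name and signature).
pub trait SequenceAccess {
    /// Half-open [start, end) slice of `chrom`, or None if unknown / out of bounds.
    fn seq(&self, chrom: &str, start: usize, end: usize) -> Option<&[u8]>;
}

/// HashMap fallback: one owned Vec<u8> per chromosome.
pub struct MapAssembly(pub HashMap<String, Vec<u8>>);

impl SequenceAccess for MapAssembly {
    fn seq(&self, chrom: &str, start: usize, end: usize) -> Option<&[u8]> {
        self.0.get(chrom)?.get(start..end)
    }
}

/// Contiguous layout: all chromosomes back to back, plus (offset, len) per name.
pub struct ContiguousAssembly {
    data: Vec<u8>,
    index: HashMap<String, (usize, usize)>,
}

impl ContiguousAssembly {
    pub fn from_records(records: &[(&str, &[u8])]) -> Self {
        let mut data = Vec::new();
        let mut index = HashMap::new();
        for (name, seq) in records {
            index.insert(name.to_string(), (data.len(), seq.len()));
            data.extend_from_slice(seq);
        }
        Self { data, index }
    }
}

impl SequenceAccess for ContiguousAssembly {
    fn seq(&self, chrom: &str, start: usize, end: usize) -> Option<&[u8]> {
        let &(off, len) = self.index.get(chrom)?;
        if start > end || end > len {
            return None;
        }
        // Zero-copy: a borrowed slice straight out of the shared buffer.
        Some(&self.data[off + start..off + end])
    }
}

/// GC fraction of one region, generic over the backend.
pub fn gc_content<A: SequenceAccess + ?Sized>(asm: &A, chrom: &str, start: usize, end: usize) -> Option<f64> {
    let s = asm.seq(chrom, start, end)?;
    if s.is_empty() {
        return None;
    }
    let gc = s.iter().filter(|&&b| matches!(b, b'G' | b'C' | b'g' | b'c')).count();
    Some(gc as f64 / s.len() as f64)
}
```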

CLI changes:
- `gtars prep --fasta` command creates .fab files
- `gtars genomicdist --fasta` auto-detects .fab vs .fa by extension
- --fasta help text updated to mention .fab format
- --dinucl-freq made opt-in (expensive for wide regions)

Binding changes:
- Python: BinaryGenomeAssembly exposed, calc_gc_content/calc_dinucl_freq
  accept either type via PyAny dispatch
- R: load_binary_genome_assembly() exposed, with_assembly! macro handles
  both types via dyn SequenceAccess
- Type stubs updated (Union types, missing ignore_unk_chroms parameter)

Tests: 265 total (262 existing + 3 new .fab round-trip/bad-magic/GC-parity)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Binary FASTA format, binding alignment, CLI UX
…osomes

Previously each chromosome used bin_size = chrom_size / n_bins, giving every
chromosome 250 bins regardless of size. This mismatched the compression and
UI layers which expect proportional bins (longest chrom gets n_bins, shorter
ones get fewer). The fix computes a uniform bin_width from the longest
chromosome so that bin index N maps to the same genomic position across all
files for a given genome, enabling valid cross-file aggregation.
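The two binning schemes side by side, as a sketch (function names are hypothetical; `n_bins` must be non-zero). Under the uniform width, bin index k covers the same coordinates, `[k * width, (k + 1) * width)`, on every chromosome and in every file that uses the same chrom_sizes.

```rust
use std::collections::HashMap;

/// Old scheme: each chromosome split into n_bins equal parts.
pub fn per_chrom_width(chrom_size: u32, n_bins: u32) -> u32 {
    (chrom_size / n_bins).max(1)
}

/// New scheme: one width for the whole genome, derived from the longest
/// chromosome (floored, minimum 1 bp), so the longest chromosome gets n_bins bins.
pub fn uniform_bin_width(chrom_sizes: &HashMap<String, u32>, n_bins: u32) -> u32 {
    let longest = chrom_sizes.values().copied().max().unwrap_or(0);
    (longest / n_bins).max(1)
}

/// Bin index of a position for a given width.
pub fn bin_index(pos: u32, bin_width: u32) -> u32 {
    pos / bin_width
}
```

With a 1000 bp chr1, a 500 bp chr2 and `n_bins = 10`, position 250 on chr2 lands in bin 5 under the old scheme (width 50) but in bin 2 under the uniform one (width 100), the same bin that position 250 on chr1 uses.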

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix region distribution binning to use uniform bin width across chrom…
Adds four new per-file summary statistics to GenomicIntervalSetStatistics,
rewrites gaps() to require chrom_sizes for correct boundary handling,
closes several pre-existing binding asymmetries, and exposes all of this
through the CLI, Python, R, and Wasm layers.

## New Rust functions (gtars-genomicdist)

- `calc_inter_peak_spacing` → SpacingStats (mean / median / std / IQR /
  log_mean / log_std) over calc_neighbor_distances(). NaN for empty,
  singleton, or all-overlapping inputs.
- `calc_peak_clusters(radius_bp)` → ClusterStats (n_clusters,
  n_clustered_peaks, mean/max cluster size over size-gt-1 clusters,
  fraction_clustered) wrapping the existing cluster() primitive.
- `calc_density_vector(chrom_sizes, n_bins)` → DensityVector. Dense
  zero-padded per-window count vector ordered karyotypically, with
  chrom_offsets so callers recover per-chromosome slices without
  re-binning.
- `calc_density_homogeneity(chrom_sizes, n_bins)` → DensityHomogeneity
  (n_windows, n_nonzero_windows, mean_count, variance, cv, Gini).
  Delegates to calc_density_vector to avoid the zero-window correctness
  trap present in region_distribution_with_chrom_sizes.

28 new Rust unit tests cover edge cases (empty, singleton, per-chrom,
cross-chrom, out-of-bounds, zero-padding).

## gaps() rewritten

`IntervalRanges::gaps` now takes a required `&HashMap<String, u32>` of
chromosome sizes and emits:
- leading gaps from 0 to first peak
- inter-region gaps (unchanged)
- trailing gaps from last peak to chrom_size
- full-chromosome gaps for any chromosome in chrom_sizes with no regions
  (matches bedtools complement)

Output is karyotypically ordered. Regions past chrom_size are clipped.
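A sketch of the complement semantics listed above, assuming `(chrom, start, end)` tuples with half-open coordinates. `gaps_sketch` is not the gtars `IntervalRanges::gaps` implementation: it visits chromosomes in lexicographic rather than karyotypic order, and it drops regions on chromosomes missing from `chrom_sizes`.

```rust
use std::collections::HashMap;

/// Gaps as (chrom, start, end): leading, inter-region, trailing, and whole-chromosome.
pub fn gaps_sketch(
    regions: &[(String, u32, u32)],
    chrom_sizes: &HashMap<String, u32>,
) -> Vec<(String, u32, u32)> {
    // Group region coordinates by chromosome.
    let mut by_chrom: HashMap<&str, Vec<(u32, u32)>> = HashMap::new();
    for (chrom, start, end) in regions {
        by_chrom.entry(chrom.as_str()).or_default().push((*start, *end));
    }
    // Deterministic (lexicographic) chromosome order for this sketch.
    let mut chroms: Vec<(&String, &u32)> = chrom_sizes.iter().collect();
    chroms.sort();

    let mut out = Vec::new();
    for (chrom, &size) in chroms {
        let mut ivs = by_chrom.remove(chrom.as_str()).unwrap_or_default();
        ivs.sort();
        // `cursor` is the end of the territory covered so far.
        let mut cursor = 0u32;
        for (start, end) in ivs {
            // Clip regions that run past the chromosome end.
            let (start, end) = (start.min(size), end.min(size));
            if start > cursor {
                out.push((chrom.clone(), cursor, start)); // leading or inter-region gap
            }
            cursor = cursor.max(end);
        }
        if cursor < size {
            // Trailing gap, or the whole chromosome when it has no regions.
            out.push((chrom.clone(), cursor, size));
        }
    }
    out
}
```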

This is a **breaking API change** but corrects a real bug: the old
gaps() couldn't include trailing-gap regions (no access to chrom_size)
so the output was always wrong at the end of every chromosome.

The new signature subsumes the proposed `largest_gaps` feature, which
is dropped from scope — callers wanting top-N gaps filter gaps() output
themselves.

Updates the 3 existing callers (gtars-cli ranges, gtars-r, gtars-wasm)
and adds 10 Rust unit tests for the new behavior.

## Pre-existing binding gaps closed

- Python: expose `gaps()` on PyRegionSet (previously R/Wasm only)
- R: expose `cluster()` as `clusterRegions(maxGap)` (previously Py only)
- Wasm: expose `cluster(max_gap)` (previously Py only)

## CLI

- `gtars ranges gaps`: new required `--chrom-sizes` flag.
- `gtars genomicdist`: four new JSON output fields —
  `inter_peak_spacing`, `peak_clusters` (array over stitching radii),
  `density_vector`, `density_homogeneity`. Density fields gated on
  `--chrom-sizes` (warning when skipped, matching expected_partitions).
- New `--cluster-radii` flag (default "500,5000,50000") probes
  promoter / enhancer / domain scales in one invocation.

## Bindings

- **Python** (gtars-python): PyRegionSet gains gaps(), inter_peak_spacing(),
  peak_clusters(), density_vector(), density_homogeneity(). Four new
  #[pyclass] result wrappers with getters and __repr__. Module
  registration + .pyi stubs updated. 11 new pytest cases.
- **R** (gtars-r): new S4 generics interPeakSpacing, peakClusters,
  densityVector, densityHomogeneity. gaps() method updated to take
  chrom_sizes (start/end retained for IRanges generic compatibility,
  ignored). extendr-wrappers.R and man/*.Rd regenerated via
  rextendr::document(). 11 new testthat cases.
- **Wasm** (gtars-wasm): cluster(), interPeakSpacing(), peakClusters(),
  densityVector(), densityHomogeneity() on JsRegionSet. gaps() signature
  updated. Results via serde_wasm_bindgen. Compiles clean against
  wasm32-unknown-unknown.

## Side fix: calcDinuclFreq R wrapper arity

`rextendr::document()` exposed a pre-existing bug: R/extendr-wrappers.R
was checked in with a stale 3-arg signature for r_calc_dinucl_freq,
while the Rust source has taken 4 args since commit bd1e5b7. The
regenerated wrapper file is now correct, and R/genomicdist.R is updated
to pass the 4th ignoreUnknownChroms argument so calcDinuclFreq()
dispatches without a runtime arity error.

## Verification

- cargo test -p gtars-genomicdist — 296 passed (+28 new)
- cargo check --workspace — clean
- cargo check -p gtars-js --target wasm32-unknown-unknown — clean
- pytest tests/test_regionset.py — 25 passed (+11 new)
- Rscript -e 'devtools::test()' — 288 passed (+11 new; 2 pre-existing
  calcDinuclFreq failures fixed by the side fix above)
- End-to-end CLI smoke against ENCODE ENCFF730DLS (23K regions, hg19):
  all four new JSON fields populate with plausible values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses review feedback on #254 — no code changes, documentation only.

Reviewer found that `n_bins=249` against canonical UCSC hg38.chrom.sizes
(455 entries) produced 3585 total windows because every contig shorter
than the derived bin_width contributes a single narrow bin. The
parameter semantic ("target bin count for the longest chromosome")
wasn't clear from the existing docstrings, and the consequence for
short contigs wasn't documented at all.

Updated docstrings now state:

- `n_bins` is the target bin count for the **longest chromosome** in
  `chrom_sizes`, not the total length of `counts`. Bin width is
  `max(chrom_sizes) / n_bins` (floored, minimum 1 bp). Total bins
  returned is `sum(ceil(size / bin_width))` across all chromosomes and
  can substantially exceed `n_bins`. To target a specific bin width,
  pass `n_bins = max_chrom_len / desired_bin_width_bp`.

- The last bin on each chromosome is narrower than `bin_width` whenever
  `chrom_size` is not an exact multiple. Chromosomes shorter than
  `bin_width` (common with UCSC alt / random / unplaced contigs) reduce
  to a single bin whose effective width equals the chromosome size.
  Counts are per-bin, not per-bp — bins of different effective widths
  are not directly comparable as densities.
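The window arithmetic in these docstrings, as a hedged sketch: the helpers are hypothetical and `bin_width` is assumed to be already computed as `max(chrom_sizes) / n_bins` (floored, minimum 1 bp). Every bin is `bin_width` wide except a narrower last bin, a contig shorter than `bin_width` becomes one bin of its own length, and the total is `sum(ceil(size / bin_width))`.

```rust
/// Effective widths of the bins covering one chromosome under a shared bin_width.
pub fn effective_bin_widths(chrom_size: u32, bin_width: u32) -> Vec<u32> {
    assert!(bin_width > 0, "bin_width must be at least 1 bp");
    let n = chrom_size.div_ceil(bin_width);
    // Bin k starts at k * bin_width; only the last one can be cut short.
    (0..n).map(|k| bin_width.min(chrom_size - k * bin_width)).collect()
}

/// Total windows across all chromosomes: sum(ceil(size / bin_width)).
pub fn total_windows(chrom_sizes: &[u32], bin_width: u32) -> u64 {
    assert!(bin_width > 0, "bin_width must be at least 1 bp");
    chrom_sizes.iter().map(|&s| u64::from(s.div_ceil(bin_width))).sum()
}
```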

The doc framing is a technical description of behavior, not a "you
should pre-filter your chrom_sizes" prescription — users decide what to
do with the information.

Mirrored the same language across every layer so the caller sees the
same explanation regardless of binding:

- gtars-genomicdist/src/statistics.rs — trait method docstrings for
  calc_density_vector and calc_density_homogeneity
- gtars-genomicdist/src/models.rs — struct-level docstrings for
  DensityVector and DensityHomogeneity, with field-level notes on
  bin_width and n_windows
- gtars-python/py_src/gtars/models/__init__.pyi — class-level and
  method-level docstrings
- gtars-python/src/models/region_set.rs — PyRegionSet method docs
- gtars-r/R/RegionSet-methods.R — roxygen Details sections on
  densityVector and densityHomogeneity generics
- gtars-r/src/rust/src/genomicdist.rs — extendr doc comments for
  r_calc_density_vector and r_calc_density_homogeneity (become Rd
  man pages)
- gtars-wasm/src/regionset.rs — wasm-bindgen doc comments for
  densityVector and densityHomogeneity
- gtars-cli/src/genomicdist/cli.rs — --bins help text now explains the
  semantic and how to compute a value from a target bin width
- gtars-cli/src/genomicdist/handlers.rs — GenomicDistOutput field docs
  for density_vector and density_homogeneity

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses review feedback on #254 — Sanghoon found that
calc_peak_clusters mixed two populations: `n_clusters` counted all
connected components including singletons, while `mean_cluster_size`
averaged only over clusters of size > 1. The naive product
`n_clusters * mean_cluster_size` was not `n_clustered_peaks`, which
caused real confusion during integration testing.

## Fix

Add `min_cluster_size: usize` parameter to `calc_peak_clusters` and
apply it **uniformly** to every size-dependent field:

- `n_clusters`          = count of clusters with size >= min_cluster_size
- `n_clustered_peaks`   = peaks belonging to those clusters
- `mean_cluster_size`   = mean size of those clusters
- `fraction_clustered`  = n_clustered_peaks / total_peaks (raw total)
- `max_cluster_size`    = max over **all** clusters (inherently unfiltered)

This preserves the arithmetic identity
`n_clusters * mean_cluster_size == n_clustered_peaks` at any threshold.
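A sketch of the uniform filter, assuming single-chromosome peaks and a stitching rule where a peak joins the current cluster if it starts within `radius_bp` of that cluster's end. The exact rule used by gtars' `cluster()` is not stated here, and `cluster_sizes` / `cluster_stats` are hypothetical helpers. Because `n_clusters`, `n_clustered_peaks` and `mean_cluster_size` all come from the same filtered list, the identity holds by construction (up to floating-point rounding in the mean), while `max_cluster_size` stays unfiltered.

```rust
/// Cluster sizes (singletons included) for peaks on one chromosome.
pub fn cluster_sizes(peaks: &[(u32, u32)], radius_bp: u32) -> Vec<usize> {
    let mut sorted = peaks.to_vec();
    sorted.sort();
    let mut sizes: Vec<usize> = Vec::new();
    let mut cluster_end: Option<u32> = None;
    for (s, e) in sorted {
        match cluster_end {
            // Within radius of the current cluster: stitch this peak in.
            Some(end) if s <= end.saturating_add(radius_bp) => {
                if let Some(last) = sizes.last_mut() {
                    *last += 1;
                }
                cluster_end = Some(end.max(e));
            }
            // Otherwise start a new cluster.
            _ => {
                sizes.push(1);
                cluster_end = Some(e);
            }
        }
    }
    sizes
}

#[derive(Debug)]
pub struct ClusterStats {
    pub n_clusters: usize,
    pub n_clustered_peaks: usize,
    pub mean_cluster_size: f64,
    pub max_cluster_size: usize,
    pub fraction_clustered: f64,
}

/// Apply `min_cluster_size` uniformly to every size-dependent field.
pub fn cluster_stats(sizes: &[usize], min_cluster_size: usize) -> ClusterStats {
    let total_peaks: usize = sizes.iter().sum();
    let kept: Vec<usize> = sizes.iter().copied().filter(|&s| s >= min_cluster_size).collect();
    let n_clusters = kept.len();
    let n_clustered_peaks: usize = kept.iter().sum();
    ClusterStats {
        n_clusters,
        n_clustered_peaks,
        // 0 / 0 gives NaN when no cluster passes the filter.
        mean_cluster_size: n_clustered_peaks as f64 / n_clusters as f64,
        // Max over all clusters, unfiltered.
        max_cluster_size: sizes.iter().copied().max().unwrap_or(0),
        // Denominator is the raw peak total (NaN for empty input).
        fraction_clustered: n_clustered_peaks as f64 / total_peaks as f64,
    }
}
```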

## Default value

Bindings default `min_cluster_size` to **2**. At this threshold every
field describes "clusters of at least 2 peaks" — the scientifically
meaningful view matching typical enhancer-clustering / super-enhancer
stitching analyses. `fraction_clustered` is then the fraction of peaks
with at least one neighbor within `radius_bp`.

Callers who want the simple-average view pass `min_cluster_size = 1`:
`mean_cluster_size` becomes the simple `total_peaks / n_clusters`, but
`n_clustered_peaks` degenerates to `total_peaks` and
`fraction_clustered` to `1.0` by definition (every peak is in a
cluster of size >= 1). Useful but less informative — the size-count
fields become tautological at this threshold.

Explored but rejected:
- Leaving min_cluster_size to affect only the mean (preserves existing
  asymmetry).
- Defaulting to 1 (simple mean): Sanghoon's identity trap returns
  because n_clusters and mean_cluster_size use different populations
  at any non-default value.

## Empty / edge cases

- Empty input: all fields zero, mean and fraction NaN (unchanged).
- All singletons at default min=2: n_clusters=0, n_clustered_peaks=0,
  max_cluster_size=1 (inherent max), mean=NaN, fraction=0.
- All singletons at min=1: n_clusters=n_peaks, all three peaks each
  contribute a size-1 cluster, mean=1.0, fraction=1.0.

## Changes by layer

- **Rust core** (gtars-genomicdist): trait method signature updated,
  impl rewritten to apply the filter uniformly, ClusterStats struct
  docstring rewritten, 8 unit tests updated/added covering default,
  min=1, min=3, all-singletons-both-thresholds, and cross-chromosome.
- **Python** (gtars-python): `peak_clusters(radius_bp, min_cluster_size=2)`
  with pyo3 default. .pyi stub updated. 3 new pytest cases
  (default, min=1 simple average, empty).
- **R** (gtars-r): S4 generic and methods default `min_cluster_size = 2L`.
  Rust extendr binding takes the parameter explicitly. testthat tests
  rewritten. extendr-wrappers.R regenerated via `rextendr::document()`;
  peakClusters.Rd and r_calc_peak_clusters.Rd also regenerated.
- **Wasm** (gtars-wasm): `peakClusters(radius_bp, min_cluster_size)` —
  no default at the Wasm layer, users pass both. Doc comment updated.
- **CLI** (gtars-cli genomicdist): new `--cluster-min-size` flag with
  default 2, applied uniformly across all --cluster-radii. Handler
  parses and threads through to `calc_peak_clusters`.

## Verification

- cargo test -p gtars-genomicdist — 299 passed (+3 new over last
  baseline: min=3 case, all-singletons-min=1, all-singletons-min=2)
- cargo check --workspace — clean
- Rscript -e 'devtools::test()' — 294 passed (no regressions)
- pytest tests/test_regionset.py — 26 passed
- End-to-end CLI against real ENCFF730DLS (23420 peaks):
    default (min=2), r=5000: n_clusters=5614, n_clustered=16555,
                              mean=2.949, fraction=0.707
    identity: 5614 * 2.949 == 16555 ✓
    min=1, r=5000:             n_clusters=12479, n_clustered=23420,
                              mean=1.877, fraction=1.000

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sanghoonio and others added 5 commits April 13, 2026 20:18
Add spatial-arrangement stats and fix gaps() chrom_sizes signature
…, R 0.9.0

Prep for next release after PR #254 (spatial-arrangement stats + gaps()
signature fix). Genomicdist gets a minor bump for new public API and the
breaking gaps() change; bindings and CLI bump to 0.9.0 since they re-export
the new functions and inherit the breaking change.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace PyResult<PyObject> with PyResult<Py<PyAny>> (PyObject is a
deprecated alias) and Python::with_gil with Python::attach. Both are
mechanical renames in pyo3 0.27 with no behavioral change; will become
hard errors in pyo3 0.28+.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove unused `wasm_bindgen::prelude::*` import from lib.rs (re-exports
don't need it; submodules import it themselves). Comment out the
[profile.release] block in Cargo.toml: cargo ignores profile sections in
non-root workspace members (emitting only a warning), so this block has been dead since
the wasm crate was created. Left commented with a note explaining the
correct placement options for any future opt-level work.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PathBuf is only used inside `from_pretrained`, which is itself
`#[cfg(feature = "huggingface")]`. Without huggingface enabled (e.g. in
the wasm build, which doesn't use that feature), the import becomes
unused and triggers an `unused_imports` warning. Gate the import with
the same cfg as its call site.
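The pattern in miniature: `Tokenizer` is a hypothetical type, while the `huggingface` feature name and `from_pretrained` come from the commit. The import carries the same `#[cfg]` as its only user, so a build without the feature (such as the wasm build) has no unused import to warn about.

```rust
// Compiled only when the `huggingface` feature is on, matching its sole call site.
#[cfg(feature = "huggingface")]
use std::path::PathBuf;

pub struct Tokenizer;

impl Tokenizer {
    // Without the feature, both this method and the PathBuf import disappear together.
    #[cfg(feature = "huggingface")]
    pub fn from_pretrained(path: &str) -> Self {
        let _resolved = PathBuf::from(path);
        Tokenizer
    }
}
```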

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 93.04623% with 373 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.44%. Comparing base (00405cc) to head (d6840e3).
⚠️ Report is 152 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| gtars-genomicdist/src/partitions.rs | 88.44% | 58 Missing ⚠️ |
| gtars-genomicdist/src/models.rs | 87.41% | 57 Missing ⚠️ |
| gtars-igd/src/igd.rs | 93.83% | 57 Missing ⚠️ |
| gtars-core/src/models/region_set_list.rs | 88.21% | 33 Missing ⚠️ |
| gtars-lola/src/database.rs | 93.68% | 32 Missing ⚠️ |
| gtars-genomicdist/src/signal.rs | 92.28% | 25 Missing ⚠️ |
| gtars-lola/src/output.rs | 92.68% | 24 Missing ⚠️ |
| gtars-genomicdist/src/statistics.rs | 87.77% | 22 Missing ⚠️ |
| gtars-lola/src/enrichment.rs | 97.68% | 19 Missing ⚠️ |
| gtars-genomicdist/src/asset.rs | 94.66% | 15 Missing ⚠️ |
| ... and 7 more | | |
Additional details and impacted files
```diff
@@             Coverage Diff             @@
##           master     #256       +/-   ##
===========================================
+ Coverage   57.62%   82.44%   +24.81%
===========================================
  Files          91       74       -17
  Lines       13049    20117     +7068
===========================================
+ Hits         7520    16585     +9065
+ Misses       5529     3532     -1997
```


sanghoonio and others added 2 commits April 19, 2026 16:41
Surgically reverts the four per-file summary statistics added in #254
(`calc_inter_peak_spacing`, `calc_peak_clusters`, `calc_density_vector`,
`calc_density_homogeneity`) and the `--cluster-radii` / `--cluster-min-size`
CLI flags that drove them. On review these APIs are either thin scalar
reductions over existing primitives (`calc_neighbor_distances`,
`region_distribution_with_chrom_sizes`) or — in the case of
`calc_peak_clusters` — parameterized by a biologically-underdetermined
stitching radius. Not load-bearing for any current consumer.

## Removed

- gtars-genomicdist: 4 trait methods and 4 result structs (`SpacingStats`,
  `ClusterStats`, `DensityVector`, `DensityHomogeneity`), ~28 unit tests.
- gtars-cli: `--cluster-radii`, `--cluster-min-size` flags and the four
  JSON output fields in `gtars genomicdist`.
- gtars-python: `inter_peak_spacing()`, `peak_clusters()`, `density_vector()`,
  `density_homogeneity()` on `PyRegionSet`; 4 `#[pyclass]` result wrappers;
  `.pyi` stubs; 11 pytest cases.
- gtars-r: `interPeakSpacing`, `peakClusters`, `densityVector`,
  `densityHomogeneity` S4 generics and methods; 4 `r_calc_*` Rust wrappers;
  8 `man/*.Rd` files; NAMESPACE exports; testthat cases.
- gtars-wasm: `interPeakSpacing`, `peakClusters`, `densityVector`,
  `densityHomogeneity` methods on `JsRegionSet`.

## Kept (also from #254, orthogonally good)

- `IntervalRanges::gaps(chrom_sizes)` rewrite. Real bug fix — the old
  signature couldn't emit trailing gaps and was silently wrong at every
  chromosome end. The breaking API change stands.
- `cluster()` binding parity: R `clusterRegions(maxGap)` and WASM
  `cluster(max_gap)`. These expose a pre-existing Rust primitive
  (`IntervalRanges::cluster` from #241) to R and WASM for parity with
  Python. Independent of the stats wrapper critique.
- `calcDinuclFreq` R wrapper arity fix (side fix in #254).
- Version bumps (genomicdist 0.8.0, cli/python/wasm/R 0.9.0). The `gaps()`
  signature change is breaking on its own and independently warrants
  the minor bump.

## Verification

- `cargo test -p gtars-genomicdist` — 275 passed, 0 failed
- `cargo check --workspace` — clean
- `cargo check -p gtars-js --target wasm32-unknown-unknown` — clean
- `pytest tests/test_regionset.py` — 17 passed
- `Rscript -e 'devtools::test()'` — 259 passed, 0 failed

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove spatial-arrangement stats from genomicdist

Labels

None yet

Projects

None yet


3 participants