Use seqair pileup accumulator for base-depth by sstadick · Pull Request #108 · sstadick/perbase

sstadick · 2026-05-09T10:59:54Z

Summary 🤖

This is a downstream experiment for the seqair pileup aggregation API in my seqair fork: sstadick/seqair#1.

The branch pins seqair / seqair-types to my seqair fork commit 6d9251c and changes only the non-mate base-depth --seqair-pileup path to use the new custom accumulator API:

PileupEngine::pileup_with(...)
SeqairPileupPositionAccumulator

The existing mate-aware base-depth -m --seqair-pileup path still uses materialized PileupColumns because mate fixing needs per-column grouping by QNAME.
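To illustrate why the mate-aware path wants the whole column in hand, here is a minimal sketch of per-column grouping by QNAME. It does not use seqair or perbase code: `Aln` and `pick_one_per_qname` are hypothetical names, and the tie-break rule (keep the higher MAPQ, first seen wins on ties) is an assumption chosen for the sketch, not necessarily the rule perbase applies.

```rust
use std::collections::HashMap;

/// One alignment covering the current position (hypothetical type for this sketch).
struct Aln<'a> {
    qname: &'a str,
    mapq: u8,
    base: u8,
}

/// Keep a single alignment per QNAME so overlapping mates are counted once.
/// Assumed rule for the sketch: higher MAPQ wins, first seen wins on ties.
fn pick_one_per_qname<'a>(column: &'a [Aln<'a>]) -> HashMap<&'a str, &'a Aln<'a>> {
    let mut kept: HashMap<&'a str, &'a Aln<'a>> = HashMap::new();
    for aln in column {
        kept.entry(aln.qname)
            .and_modify(|current| {
                if aln.mapq > current.mapq {
                    *current = aln;
                }
            })
            .or_insert(aln);
    }
    kept
}
```

The decision for a read cannot be made until every alignment at the position has been seen, which is why a streaming per-observation accumulator is a less natural fit here.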

Why

Previously, non-mate base-depth --seqair-pileup materialized a PileupColumn and then perbase looped over column.raw_alignments() to compute depth, base counts, insertions, deletions, refskips, and fail counts.

With the accumulator API, perbase computes those row counts while seqair is already walking emitted pileup observations. For simple non-mate base-depth, this avoids materializing a public pileup column and avoids a downstream second pass over alignments.
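The difference between the two shapes can be sketched without seqair at all. Everything below is hypothetical and for illustration only: `Obs`, `Counts`, `PositionAccumulator`, `walk_with`, and `count_materialized` are invented names, the counting rule is an assumption, and the real `PileupEngine::pileup_with(...)` signature may look quite different.

```rust
/// One pileup observation at a single reference position (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Obs {
    Base(u8),
    Deletion,
    RefSkip,
}

/// Per-position counts. Assumed rule for the sketch: bases and deletions
/// contribute to depth, reference skips do not.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    depth: u32,
    a: u32,
    c: u32,
    g: u32,
    t: u32,
    n: u32,
    deletions: u32,
    ref_skips: u32,
}

impl Counts {
    fn add(&mut self, obs: Obs) {
        match obs {
            Obs::Base(b) => {
                self.depth += 1;
                match b.to_ascii_uppercase() {
                    b'A' => self.a += 1,
                    b'C' => self.c += 1,
                    b'G' => self.g += 1,
                    b'T' => self.t += 1,
                    _ => self.n += 1,
                }
            }
            Obs::Deletion => {
                self.depth += 1;
                self.deletions += 1;
            }
            Obs::RefSkip => self.ref_skips += 1,
        }
    }
}

/// Accumulator shape: the engine pushes each observation as it is produced.
trait PositionAccumulator {
    type Output;
    fn observe(&mut self, obs: Obs);
    fn finish(self) -> Self::Output;
}

struct CountingAccumulator(Counts);

impl PositionAccumulator for CountingAccumulator {
    type Output = Counts;
    fn observe(&mut self, obs: Obs) {
        self.0.add(obs);
    }
    fn finish(self) -> Counts {
        self.0
    }
}

/// Stand-in for the engine walk: a single pass, no intermediate column.
fn walk_with<A: PositionAccumulator>(
    observations: impl IntoIterator<Item = Obs>,
    mut acc: A,
) -> A::Output {
    for obs in observations {
        acc.observe(obs);
    }
    acc.finish()
}

/// Materialized shape: build the column first, then loop over it again.
fn count_materialized(observations: impl IntoIterator<Item = Obs>) -> Counts {
    let column: Vec<Obs> = observations.into_iter().collect();
    let mut counts = Counts::default();
    for obs in &column {
        counts.add(*obs);
    }
    counts
}
```

Both functions produce the same `Counts` for the same input; the accumulator version simply skips the intermediate `Vec` and the second loop.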

Correctness checks

The existing htslib-vs-seqair process-region parity test now exercises the accumulator path for non-mate cases and the old materialized path for mate-aware cases.

The empty-SEQ regression test now compares all three paths:

htslib pileup
seqair materialized column path
seqair accumulator path

Local validation

cargo fmt --check

SDKROOT="$(xcrun --show-sdk-path)" \
BINDGEN_EXTRA_CLANG_ARGS="-isysroot $(xcrun --show-sdk-path)" \
cargo check

SDKROOT="$(xcrun --show-sdk-path)" \
BINDGEN_EXTRA_CLANG_ARGS="-isysroot $(xcrun --show-sdk-path)" \
cargo check --features seqair-pileup

SDKROOT="$(xcrun --show-sdk-path)" \
BINDGEN_EXTRA_CLANG_ARGS="-isysroot $(xcrun --show-sdk-path)" \
cargo test --features seqair-pileup

Results:

lib tests: 23 passed
bin tests: 117 passed
doctests: 1 passed

Benchmarking

Benchmark numbers are posted as a PR comment.

sstadick · 2026-05-09T11:05:50Z

Benchmark update for this branch using the seqair accumulator API from sstadick/seqair#1 (sstadick/seqair@6d9251c).

Setup: release build with --features seqair-pileup, HG00157 chr1 10 Mb BAM subset (paper/data/HG00157.chr1_10mb.bam) and BED (paper/data/hg00157_chr1_10mb.bed), writing TSV output. Non-mate base-depth --seqair-pileup uses the new accumulator path; mate-aware base-depth -m --seqair-pileup and only-depth --seqair are unchanged paths.

| Mode | htslib | seqair | Notes |
| --- | --- | --- | --- |
| `base-depth` | `5.147 ± 0.461 s` | `5.104 ± 0.166 s` | 3 runs; seqair accumulator `1.01x` faster; exact output parity, SHA-256 `150f6165...` |
| `base-depth -m` | `51.045 s` | `33.402 s` | 1 run; seqair `1.53x` faster; same 9,892,897 row count; still the known 12 sparse default `-F 0` mate-order diffs |
| `base-depth -m -F 2304` | `51.072 s` | `33.770 s` | 1 run; seqair `1.51x` faster; exact output parity, SHA-256 `393a5787...` |
| `only-depth` | `1.203 ± 0.157 s` | `1.263 ± 0.125 s` | 3 runs; exact output parity, SHA-256 `0d98d1ab...`; unchanged path |
| `only-depth -x` | `859.3 ± 15.6 ms` | `1.200 ± 0.053 s` | 3 runs; exact output parity, SHA-256 `876b7692...`; unchanged path |
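The speedup figures in the Notes column are the htslib time divided by the seqair time. A small sketch of that arithmetic follows, using the mean wall times from the table; the helper names `speedup` and `format_speedup` are made up for this example. A ratio below 1 means seqair was slower on the mean, and with overlapping ± ranges the small differences should be read as noise rather than a measured win or loss.

```rust
/// Ratio of baseline time to candidate time; values above 1.0 mean the
/// candidate was faster.
fn speedup(baseline_secs: f64, candidate_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}

/// Format the ratio the way the table does, e.g. "1.53x".
fn format_speedup(baseline_secs: f64, candidate_secs: f64) -> String {
    format!("{:.2}x", speedup(baseline_secs, candidate_secs))
}
```

For example, `format_speedup(51.045, 33.402)` gives `1.53x` for `base-depth -m`, and `format_speedup(0.8593, 1.200)` gives `0.72x` for `only-depth -x`.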

Output checks:

base-depth htslib              9,892,897 rows  150f61653f0cd9225787248fecbbd9d46bde1a9644632bcba8f1898ee780c572
base-depth seqair accumulator  9,892,897 rows  150f61653f0cd9225787248fecbbd9d46bde1a9644632bcba8f1898ee780c572

base-depth -m htslib           9,892,897 rows  07b50f9750b8760e49b997622d878d8f159276d0b39644bfc4e57bd4e88a04d7
base-depth -m seqair           9,892,897 rows  c73f7b661170eecfdb623c041d126e63cba638e2fed73eceeaeb58afb53e6b7f
base-depth -m diff lines: 12

base-depth -m -F2304 htslib    9,892,897 rows  393a578788a83f589de5e712aeb1abfc74bd4bbed464ab59ec0353a0ece81045
base-depth -m -F2304 seqair    9,892,897 rows  393a578788a83f589de5e712aeb1abfc74bd4bbed464ab59ec0353a0ece81045

only-depth htslib              3,282,664 rows  0d98d1abfe60c79d4bbb51bfa14fb00e5f5123311e216d28ba34cd0ea3c680b1
only-depth seqair              3,282,664 rows  0d98d1abfe60c79d4bbb51bfa14fb00e5f5123311e216d28ba34cd0ea3c680b1

only-depth -x htslib           3,285,180 rows  876b76926203980e3a9900f2d6e7e05cc7f77898fe4df413689ac39b60e9a030
only-depth -x seqair           3,285,180 rows  876b76926203980e3a9900f2d6e7e05cc7f77898fe4df413689ac39b60e9a030
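For readers who want to reproduce this kind of check, here is a minimal sketch of a row-count and differing-line comparison over two TSV outputs held in memory. It is not the script used for the numbers above: `row_count` and `differing_rows` are hypothetical helpers, and comparing rows position by position is an assumption about how a "diff lines" figure could be obtained.

```rust
/// Number of rows in a TSV output.
fn row_count(text: &str) -> usize {
    text.lines().count()
}

/// Count rows that differ when the two outputs are compared line by line.
/// Returns None when the row counts differ, since a positional comparison
/// is not meaningful in that case.
fn differing_rows(a: &str, b: &str) -> Option<usize> {
    if row_count(a) != row_count(b) {
        return None;
    }
    Some(a.lines().zip(b.lines()).filter(|(x, y)| x != y).count())
}
```

A result of `Some(0)` corresponds to exact parity, which the matching SHA-256 digests above establish more cheaply for whole files.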

Compared with the previous seqair v0.1.0 materialized-column run on this same benchmark, non-mate base-depth --seqair-pileup moved from 5.323 ± 0.127 s to 5.104 ± 0.166 s while preserving exact parity. The main intended win here is avoiding the downstream second pass/materialized column path for simple non-mate counting.

sstadick added 2 commits May 9, 2026 06:21

chore: use seqair 0.1.0 APIs

92fa593

feat: use seqair pileup accumulator

6d55e59

sstadick mentioned this pull request May 9, 2026

Add experimental pileup column aggregation API sstadick/seqair#1

Open

sstadick wants to merge 2 commits into feat/seqair from feat/seqair-pileup-aggregation