Skip to content

Add experimental seqair pileup backend#107

Open
sstadick wants to merge 5 commits intomasterfrom
feat/seqair
Open

Add experimental seqair pileup backend#107
sstadick wants to merge 5 commits intomasterfrom
feat/seqair

Conversation

@sstadick
Copy link
Copy Markdown
Owner

@sstadick sstadick commented May 3, 2026

Summary

  • Add an optional seqair-pileup feature and base-depth --seqair-pileup experimental backend while keeping htslib as the default path.
  • Add an experimental only-depth --seqair path that uses seqair filter_raw to apply perbase read filters before slab writes/base decoding for rejected BAM records.
  • Refactor read filtering/mate-fix code around shared read-view traits so htslib and seqair pileups can feed the same counting logic.
  • Keep the htslib pileup/counting paths specialized to avoid regressions versus master.
  • Add a raw-PileupAlignment seqair non-mate counting path; this avoids the qname/store-backed AlignmentView wrapper for base-depth when mate fixing is not enabled.
  • Add feature-gated seqair parity tests and README notes, including the CRAM --ref-fasta requirement for the experimental base-depth backend.

Preliminary benchmark: small paper subset

Setup: paper-derived HG00157 10 Mb BAM subset (chr1:10,000,000-20,000,000), 3 hyperfine runs, writing TSV output files under /tmp. This was run on a busy laptop on battery, so treat as smoke-test level rather than formal benchmarking.

base-depth

Mode master htslib PR htslib PR seqair Notes
base-depth 4.647 ± 0.117 s 4.589 ± 0.086 s 4.828 ± 0.066 s htslib path has no regression; all outputs exact (9,892,897 rows, SHA-256 150f6165...)
base-depth -m 50.224 ± 0.261 s 49.343 ± 0.104 s 32.331 ± 0.151 s htslib path has no regression; seqair is 1.55x faster than master htslib

The raw seqair non-mate specialization improved the seqair base-depth smoke result versus the previous generic AlignmentView path in the same run (5.098 ± 0.174 s4.828 ± 0.066 s, about 1.06x faster), but it still did not clearly beat the optimized htslib path in this output-writing benchmark.

For mate-fix with default -F 0, htslib/seqair can still differ sparsely when same-qname secondary/supplementary observations are present and tie-breaking depends on backend iteration order. Rerunning mate-fix with secondary/supplementary excluded (-F 2304) produced identical outputs in the earlier investigation.

only-depth htslib regression smoke check

This checks that the read-filter refactor did not regress the existing htslib path.

Mode master PR Output parity
only-depth 947.9 ± 36.8 ms 901.9 ± 28.9 ms exact (3,282,664 rows)
only-depth -m 915.3 ± 17.1 ms 908.6 ± 11.5 ms exact (3,200,602 rows)
only-depth -x 803.4 ± 76.6 ms 770.2 ± 17.3 ms exact (3,285,180 rows)
only-depth -x -m 823.2 ± 35.8 ms 805.7 ± 20.3 ms exact (3,200,602 rows)

Experimental only-depth --seqair

The first seqair only-depth path is intentionally specialized separately for normal/fast and mate-aware/non-mate modes, and applies filter_raw before seqair RecordStore slab writes for rejected BAM records.

Caveat: seqair's indexed reader drops unmapped records before filter_raw, while the current htslib only-depth -x path can count fetched unmapped records if the user does not exclude them. To preserve parity, only-depth --seqair currently requires the unmapped bit to be excluded (-F 4, or -F 3852 when you would otherwise use -F 3848).

Smoke result with -F 3852 on the same subset, comparing PR htslib vs PR seqair:

Mode PR htslib PR seqair Output parity
only-depth -F 3852 870.9 ± 9.4 ms 943.8 ± 20.7 ms exact (3,227,777 rows, SHA-256 a7e0ac95...)
only-depth -F 3852 -m 884.5 ± 12.1 ms 944.1 ± 3.9 ms exact (3,147,857 rows, SHA-256 5cd0fa30...)
only-depth -F 3852 -x 758.8 ± 8.8 ms 902.3 ± 13.3 ms exact (3,227,777 rows, SHA-256 a7e0ac95...)
only-depth -F 3852 -x -m 796.7 ± 35.9 ms 940.8 ± 15.0 ms exact (3,147,857 rows, SHA-256 5cd0fa30...)

Takeaway: filter_raw works and outputs match when unmapped reads are excluded, but this first seqair only-depth path is not faster. Kept records still go through seqair's pileup-oriented RecordStore, including bases/qualities, so a real only-depth speedup likely needs an upstream/raw CIGAR-only visitor API that skips SEQ/QUAL/AUX for kept records too.

Testing

  • cargo fmt --check
  • SDKROOT=$(xcrun --show-sdk-path) BINDGEN_EXTRA_CLANG_ARGS="--sysroot=$(xcrun --show-sdk-path)" cargo check
  • SDKROOT=$(xcrun --show-sdk-path) BINDGEN_EXTRA_CLANG_ARGS="--sysroot=$(xcrun --show-sdk-path)" cargo test --features seqair-pileup

@killercup
Copy link
Copy Markdown

Hi! I did some small changes to the API before publishing 0.1 after seeing this! It's mainly in Softleif/seqair#45 and not super orderly but here's the gist:

  • IndexedBamReader.keep_unmapped(true)
  • RecordStore.records() gives you an Iterator now
  • A seq_len=0 record now emit Base::Unknowns for all matches

I also started on Softleif/seqair#48 but didn't merge that in yet, let me know your opinion that if you have time :)

@sstadick
Copy link
Copy Markdown
Owner Author

sstadick commented May 9, 2026

Thanks @killercup! Looks like you already fixed the things I had in mind!

A few other small things:

  • It seems that htslib gives a ref_end position for unmapped reads of ref_start+1, where seqair gives the ref_end as the full expansion of the cigar string. I didn't actually realize htslib did that till this diff popped up. I'm not sure which is more correct, but leaning toward htslib compat is probably best.
  • The ordering of reads that is returned from PileupColumn::alignments does not match htslib. This is something I need to fix in perbase, when tie-breaking between two reads if all else was equal I just keep whichever was returned to me first. This leads to small diffs between htslib and seqair when running with totally open filters (-F 0). Again, something that perbase should handle better, but a notable diff either way for seqair.

Some wishlist items:

  • pileups_into(&mut buffer) to allow the caller to pass a vec to be reused by the internals of seqair to avoid the Vec allocation for each call their
  • Even more configuration regarding what gets decoded, i.e. the ability to skip bases/quals/aux tags, or even just provide a RecordDecodeProfile to select what to decode. This one may not be realistic at all! But the only-depth path in perbase, and many other tools really, iterate over records as Intervals, not as sequences, and don't actually use a lot of that info.

@sstadick
Copy link
Copy Markdown
Owner Author

sstadick commented May 9, 2026

🤖 Benchmark update after switching this branch to crates.io seqair = 0.1.0 / seqair-types = 0.1.0 and wiring the new APIs (keep_unmapped(true), RecordStore::records(), empty-SEQ parity test).

Setup: same paper-derived HG00157 10 Mb BAM subset (chr1:10,000,000-20,000,000), release build with --features seqair-pileup, writing TSV outputs.

Mode htslib seqair Notes
base-depth 4.992 ± 0.207 s 5.323 ± 0.127 s 3 runs; exact output parity, SHA-256 150f6165...
base-depth -m 49.455 s 32.429 s 1 run; seqair 1.53x faster; same 9,892,897 row count; default -F 0 still has the known 12 sparse same-qname secondary/supplementary tie-order diffs
only-depth 914.5 ± 38.7 ms 1.024 ± 0.072 s 3 runs; default -F 0; exact output parity, SHA-256 0d98d1ab...
only-depth -x 757.2 ± 6.5 ms 1.012 ± 0.088 s 3 runs; default -F 0; exact output parity, SHA-256 876b7692...

A couple of correctness notes from the v0.1.0 API switch:

  • Empty SEQ=* is now fixed upstream for our use case. The ignored TODO is now a real htslib-vs-seqair parity test.
  • only-depth --seqair no longer requires users to exclude unmapped reads with -F 4; perbase now uses IndexedBamReader.keep_unmapped(true) and applies its normal read-filter semantics.
  • The only-depth seqair path now matches htslib at default -F 0, but is still slower on this benchmark. The remaining speedup opportunity still looks like a CIGAR/interval-only path that can avoid materializing SEQ/QUAL/AUX for kept records.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants