Hi! I made some small changes to the API before publishing 0.1 after seeing this! It's mainly in Softleif/seqair#45 and not super orderly, but here's the gist:
I also started on Softleif/seqair#48 but haven't merged that in yet; let me know your opinion on that if you have time :)
Thanks @killercup! Looks like you already fixed the things I had in mind! A few other small things:
Some wishlist items:
🤖 Benchmark update after switching this branch to crates.io

Setup: same paper-derived HG00157 10 Mb BAM subset (
A couple of correctness notes from the v0.1.0 API switch:
Summary
- `seqair-pileup` feature and `base-depth --seqair-pileup` experimental backend while keeping htslib as the default path.
- `only-depth --seqair` path that uses seqair `filter_raw` to apply perbase read filters before slab writes/base decoding for rejected BAM records.
- `master`.
- `PileupAlignment` seqair non-mate counting path; this avoids the qname/store-backed `AlignmentView` wrapper for `base-depth` when mate fixing is not enabled.
- `--ref-fasta` requirement for the experimental `base-depth` backend.

Preliminary benchmark: small paper subset
Setup: paper-derived HG00157 10 Mb BAM subset (`chr1:10,000,000-20,000,000`), 3 hyperfine runs, writing TSV output files under `/tmp`. This was run on a busy laptop on battery, so treat as smoke-test level rather than formal benchmarking.

`base-depth`

| Command | `master` htslib | PR htslib | PR seqair | Notes |
| --- | --- | --- | --- | --- |
| `base-depth` | 4.647 ± 0.117 s | 4.589 ± 0.086 s | 4.828 ± 0.066 s | 9,892,897 rows, SHA-256 `150f6165...` |
| `base-depth -m` | 50.224 ± 0.261 s | 49.343 ± 0.104 s | 32.331 ± 0.151 s | seqair 1.55x faster than `master` htslib |

The raw seqair non-mate specialization improved the seqair `base-depth` smoke result versus the previous generic `AlignmentView` path in the same run (5.098 ± 0.174 s → 4.828 ± 0.066 s, about 1.06x faster), but it still did not clearly beat the optimized htslib path in this output-writing benchmark.

For mate-fix with default `-F 0`, htslib/seqair can still differ sparsely when same-qname secondary/supplementary observations are present and tie-breaking depends on backend iteration order. Rerunning mate-fix with secondary/supplementary excluded (`-F 2304`) produced identical outputs in the earlier investigation.

`only-depth` htslib regression smoke check

This checks that the read-filter refactor did not regress the existing htslib path.
| Command | `master` | PR | Notes |
| --- | --- | --- | --- |
| `only-depth` | 947.9 ± 36.8 ms | 901.9 ± 28.9 ms | 3,282,664 rows |
| `only-depth -m` | 915.3 ± 17.1 ms | 908.6 ± 11.5 ms | 3,200,602 rows |
| `only-depth -x` | 803.4 ± 76.6 ms | 770.2 ± 17.3 ms | 3,285,180 rows |
| `only-depth -x -m` | 823.2 ± 35.8 ms | 805.7 ± 20.3 ms | 3,200,602 rows |

Experimental `only-depth --seqair`

The first seqair `only-depth` path is intentionally specialized separately for normal/fast and mate-aware/non-mate modes, and applies `filter_raw` before seqair `RecordStore` slab writes for rejected BAM records.

Caveat: seqair's indexed reader drops unmapped records before `filter_raw`, while the current htslib `only-depth -x` path can count fetched unmapped records if the user does not exclude them. To preserve parity, `only-depth --seqair` currently requires the unmapped bit to be excluded (`-F 4`, or `-F 3852` when you would otherwise use `-F 3848`).

Smoke result with `-F 3852` on the same subset, comparing PR htslib vs PR seqair:

| Command | PR htslib | PR seqair | Notes |
| --- | --- | --- | --- |
| `only-depth -F 3852` | 870.9 ± 9.4 ms | 943.8 ± 20.7 ms | 3,227,777 rows, SHA-256 `a7e0ac95...` |
| `only-depth -F 3852 -m` | 884.5 ± 12.1 ms | 944.1 ± 3.9 ms | 3,147,857 rows, SHA-256 `5cd0fa30...` |
| `only-depth -F 3852 -x` | 758.8 ± 8.8 ms | 902.3 ± 13.3 ms | 3,227,777 rows, SHA-256 `a7e0ac95...` |
| `only-depth -F 3852 -x -m` | 796.7 ± 35.9 ms | 940.8 ± 15.0 ms | 3,147,857 rows, SHA-256 `5cd0fa30...` |

Takeaway: `filter_raw` works and outputs match when unmapped reads are excluded, but this first seqair `only-depth` path is not faster. Kept records still go through seqair's pileup-oriented `RecordStore`, including bases/qualities, so a real `only-depth` speedup likely needs an upstream/raw CIGAR-only visitor API that skips SEQ/QUAL/AUX for kept records too.

Testing

- `cargo fmt --check`
- `SDKROOT=$(xcrun --show-sdk-path) BINDGEN_EXTRA_CLANG_ARGS="--sysroot=$(xcrun --show-sdk-path)" cargo check`
- `SDKROOT=$(xcrun --show-sdk-path) BINDGEN_EXTRA_CLANG_ARGS="--sysroot=$(xcrun --show-sdk-path)" cargo test --features seqair-pileup`
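
For reviewers checking the `-F` values quoted in the mate-fix note and the unmapped-read caveat, here is a minimal sketch of how those exclude masks decompose into SAM flag bits. The bit values come from the SAM specification; the constant names and the `is_excluded` / `with_unmapped_excluded` helpers are hypothetical illustrations, not perbase or seqair API.

```rust
// SAM FLAG bits, as defined in the SAM specification.
const UNMAPPED: u16 = 0x4;
const MATE_UNMAPPED: u16 = 0x8;
const SECONDARY: u16 = 0x100;
const QC_FAIL: u16 = 0x200;
const DUPLICATE: u16 = 0x400;
const SUPPLEMENTARY: u16 = 0x800;

/// Hypothetical helper: a record is rejected when any bit of the
/// `-F` exclude mask is set in its FLAG field.
fn is_excluded(record_flags: u16, exclude_mask: u16) -> bool {
    (record_flags & exclude_mask) != 0
}

/// Hypothetical helper: add the unmapped bit to an existing exclude mask,
/// which is what the caveat asks users of `only-depth --seqair` to do.
fn with_unmapped_excluded(exclude_mask: u16) -> u16 {
    exclude_mask | UNMAPPED
}

fn main() {
    // 3848 = mate unmapped + secondary + QC fail + duplicate + supplementary.
    let base_mask = MATE_UNMAPPED | SECONDARY | QC_FAIL | DUPLICATE | SUPPLEMENTARY;
    assert_eq!(base_mask, 3848);

    // Adding the unmapped bit (4) yields the 3852 mask from the caveat.
    assert_eq!(with_unmapped_excluded(base_mask), 3852);

    // 2304 excludes exactly secondary and supplementary alignments.
    assert_eq!(SECONDARY | SUPPLEMENTARY, 2304);

    // An unmapped record passes the 3848 mask but is rejected by 3852.
    assert!(!is_excluded(UNMAPPED, 3848));
    assert!(is_excluded(UNMAPPED, 3852));

    println!("3848 | 4 = {}", with_unmapped_excluded(base_mask));
}
```

This shows why `-F 3848` alone is not enough for parity: it leaves bit 4 clear, so an unmapped record would still pass the filter on a backend that sees it.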