@d4straub
Collaborator

@d4straub d4straub commented Nov 14, 2025

Adds SemiBin2, related to #874

SemiBin2 handles cases where it finds no bins gracefully (exit code 0). It does error, though, when there are too few contigs (I could not find out how many are required). Therefore I think errors don't need to be ignored.

Whenever long reads are used in the assembly (if meta.lr_platform), long-read binning mode is activated. However, I am not sure that is supposed to be used for SPAdes assemblies, because they are (at least last time I checked) short-read assemblies with long-read stitching. But I follow the SemiBin FAQ entry "What should I do if I have hybrid data (short- and long-reads)?", which says: "From SemiBin's point-of-view, you should generally treat this using the long-reads pipeline (--sequencing-type=long_read)."
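The mode selection described above can be sketched roughly as follows. This is only an illustration in Python, not the Nextflow code from this PR: the function name `semibin_extra_args` is hypothetical, and it assumes `meta` is a dict whose `lr_platform` entry is set only when long reads went into the assembly. The only SemiBin2 option used is the `--sequencing-type=long_read` flag quoted from the FAQ; in the short-read case nothing is added and the tool's default is left in place.

```python
def semibin_extra_args(meta: dict) -> list:
    """Return extra SemiBin2 arguments for one sample (hypothetical helper).

    Assumption: meta["lr_platform"] is truthy only when long reads were
    used for the assembly (long-read-only or hybrid).
    """
    if meta.get("lr_platform"):
        # Hybrid and long-read assemblies are both treated as long-read data,
        # following the FAQ recommendation quoted above.
        return ["--sequencing-type=long_read"]
    # Short-read assemblies: add nothing and keep the tool's default mode.
    return []
```

With this sketch, a hybrid SPAdes assembly would be routed to long-read mode as well, which is exactly the case I am unsure about.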

About CI tests:

  • 8/8 succeed: 3 with SemiBin2 activated; see post below
  • test_alternatives should work as well and could be activated (requires snap update though)
  • longreadonly & longreadonly_alternatives produce assemblies with only 3 contigs, and SemiBin2 complains; I couldn't find any required minimum contig number though.

Still to do:

  • confirmation for successful -profile test_full pending (test running atm)

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@d4straub d4straub changed the title from "Integrate semibin/singleeasybin" to "Add binner SemiBin2" Nov 14, 2025
@d4straub
Collaborator Author

@nf-core-bot fix linting

@d4straub
Collaborator Author

d4straub commented Nov 17, 2025

Investigated CI test outcomes:

  • 1 '-profile test' -> apparently no new bin files (locally SemiBin does produce bins), summary files changed -> snap mismatch causes fail
  • [skipped] 2 '-profile test_alternatives' has warning:
WARN: Input tuple does not match tuple declaration in process `NFCORE_MAG:MAG:DOMAIN_CLASSIFICATION:TIARA:TIARA_SUMMARY` -- offending value: [[:], [/home/runner/_work/mag/mag/~/tests/c090c86a5a5a2ac52952ba3cc887e719/work/e0/bf0d5166fd4ce157c46c583edeb3c7/MEGAHIT-SemiBin2-bins-test_minigut.binclassification.tsv], [:], [/home/runner/_work/mag/mag/~/tests/c090c86a5a5a2ac52952ba3cc887e719/work/d4/b3e898cef3fa26b001a9a0d38099dc/MEGAHIT-MetaBAT2-bins-test_minigut.binclassification.tsv]]
-> TIARA was not executed
-> new bin files, summary files changed -> snap mismatch causes fail
-> locally I can confirm the warning, but the pipeline executes all expected processes
  • 3 '-profile assembly_input' new bin files, summary files not changed -> snap mismatch causes fail
  • 4 '-profile hybrid' no new bin files! but process was executed and summary files changed -> snap mismatch causes fail
  • [skipped: assembly with too few contigs] 5 '-profile longreadonly' fails with:
Process `NFCORE_MAG:MAG:BINNING:SEMIBIN_SINGLEEASYBIN (minigut)` terminated with an error exit status (1)
2025-11-14 14:46:41 99ff53988499 SemiBin2[32] ERROR There are 3 contigs in input file FLYE-minigut.assembly.fasta, but only 3 contain(s) at least 1500 basepairs.
  • [skipped: assembly with too few contigs] 6 '-profile longreadonly_alternatives' fails as above
  • [skipped: because minimal test] 7 '-profile test_minimal'
  • [skipped: contigs too short] 8 '-profile test_single_end' fails with:
Process `NFCORE_MAG:MAG:BINNING:SEMIBIN_SINGLEEASYBIN (test_minigut_sample2)` terminated with an error exit status (1)
2025-11-14 14:44:43 bf456a6a163f SemiBin2[32] ERROR There are 215 contigs in input file MEGAHIT-test_minigut_sample2.fa, but only 1 contain(s) at least 1500 basepairs.
-> skip that because the data isn't appropriate.
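To see in advance whether an assembly is likely to trigger the error above, one could count the contigs that reach the 1500 bp length mentioned in the log. A minimal sketch in Python, not part of this PR: the function name `count_contigs` is hypothetical, the 1500 default is taken from the error message, and since I could not find how many such contigs SemiBin2 actually requires, the sketch only reports the counts and does not decide anything.

```python
from typing import Iterable, Tuple


def count_contigs(lines: Iterable[str], min_len: int = 1500) -> Tuple[int, int]:
    """Return (total contigs, contigs with at least min_len bases).

    `lines` is any iterable of FASTA lines, e.g. an open file handle.
    The 1500 bp default mirrors the length named in the SemiBin2 error log.
    """
    total = 0
    long_enough = 0
    current_len = None  # None until the first header line is seen

    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current_len is not None:
                total += 1
                long_enough += current_len >= min_len
            current_len = 0
        elif current_len is not None:
            # Sequence lines may be wrapped, so lengths are accumulated.
            current_len += len(line)

    # Close the last record in the file.
    if current_len is not None:
        total += 1
        long_enough += current_len >= min_len

    return total, long_enough
```

For example, `count_contigs(open("MEGAHIT-test_minigut_sample2.fa"))` would return both numbers that the error message reports for that file.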

Conclusion: Most tests might be fine, but not the long-read-only tests. I'll see what I can do to make them pass.

@d4straub d4straub marked this pull request as ready for review November 18, 2025 10:51
@d4straub d4straub requested a review from SPPearce November 18, 2025 13:07
Contributor

@dialvarezs dialvarezs left a comment

Looks very clean!

Minor comments from my end.

@d4straub
Collaborator Author

Profile test_full finished successfully. I think, though, that SemiBin2 and COMEBin take relatively long to run; maybe we have to make them opt-in in the future.

Thanks for all the comments and the reviews!

@d4straub d4straub merged commit 142342b into nf-core:dev Nov 19, 2025
33 checks passed
@d4straub d4straub deleted the add-semibin branch November 19, 2025 16:10