
Conversation


@sainsachiko sainsachiko commented Nov 25, 2025

Add pbmarkdup, which identifies and marks duplicate reads in PacBio HiFi (CCS) data.
#9456
Thank you for reviewing!

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@muffato muffato removed their request for review November 26, 2025 10:17
@sainsachiko sainsachiko requested a review from mashehu November 26, 2025 11:06

@DLBPointon DLBPointon left a comment


Well done!

I just have a few comments to flesh it out more.

It'll mean adding another test, sorry. Feel free to get another opinion though.

- pbmarkdup:
description: |
pbmarkdup identifies and marks duplicate reads in PacBio HiFi (CCS) data. It clusters
highly similar CCS reads to detect PCR duplicates and flags them in the BAM output

Change the bit about it being BAM output, as there are 3 formats it can output.


output:
tuple val(meta), path("${prefix}.${suffix}"), emit: markduped
path "versions.yml" , emit: versions
@DLBPointon DLBPointon Nov 26, 2025


I feel like the --dup-file dups.fasta flag needs better support, otherwise the duplicates file won't be captured and emitted as an output.

Perhaps an output channel, and something to capture the name the user will use in the config?

output:
tuple val(meta), path("${dup_file}"), optional: true, emit: duplicates

script:

// This little chunk is solely to have a string to give to the output tuple;
// it's not needed in the script itself as it already exists in the args.
def matcher = (task.ext.args =~ /--dup-file\s+(\S+)/)
dup_file = matcher.find() ? matcher[0][1] : ""

"""
    pbmarkdup \\
        -j ${task.cpus} \\
        $input \\
        ${prefix}.${suffix} \\
        $args
"""

Feel free to get another reviewer here. Otherwise, you can't capture the dups file unless you simplify your existing output by removing the suffix, and then you'd have to deal with a tuple(meta, file, file) in the workflow.
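For illustration only, the flag-extraction idea above can be sketched in Python (the module itself would stay in Groovy; the args strings below are hypothetical examples, not real pipeline config):

```python
import re


def extract_dup_file(ext_args: str) -> str:
    """Return the filename passed to --dup-file, or "" if the flag is absent."""
    match = re.search(r"--dup-file\s+(\S+)", ext_args)
    return match.group(1) if match else ""


# Hypothetical task.ext.args strings:
print(extract_dup_file("--log-level INFO --dup-file dups.fasta"))  # dups.fasta
print(extract_dup_file("--log-level INFO"))                        # empty string
```

Returning an empty string when the flag is missing is what lets the `optional: true` output channel stay empty instead of failing.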


script:
def args = task.ext.args ?: ''


pbmarkdup can take a list of files as input, so I think you may just need to do some double checks to make sure names don't conflict.

Make sure input and output file names don't conflict, which seems like it could happen.

Adapted from CAT_CAT

    if (file_list.contains("${prefix}.${suffix}")) {
        error "PBMARKDUP: The name of the input file can't be the same as the output. " +
        "Change the prefix to avoid the conflict."
    }

As it can take a list of input files, perhaps a check to make sure they are in fact unique files too? However, I think this can be optional, as it should be checked in the input_check or workflow of the pipeline.

    def input_files = input.collect { it.baseName }
    if (input_files.size() != input_files.unique(false).size()) {
        error "PBMARKDUP: Input files must have unique names. Found duplicates: ${input_files}. " +
        "Check your input reads."
    }
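As a rough Python stand-in for that uniqueness check (the real check stays in Groovy; the file paths below are made up), comparing base names without extensions mirrors Groovy's `baseName`:

```python
from pathlib import Path


def check_unique_basenames(inputs: list) -> None:
    """Raise if any two input files share a base name (extension stripped)."""
    names = [Path(p).stem for p in inputs]
    if len(names) != len(set(names)):
        raise ValueError(
            f"PBMARKDUP: Input files must have unique names. Found: {names}"
        )


check_unique_basenames(["a/reads1.bam", "b/reads2.bam"])    # passes silently
# check_unique_basenames(["a/reads.bam", "b/reads.fastq"])  # would raise:
# the base names collide even though the extensions differ
```

Note that comparing stems rather than full names also catches `reads.bam` vs `reads.fastq`, which would otherwise collide once the tool writes per-input intermediates.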


Thanks @DLBPointon for kindly reviewing this. I have updated the code regarding your reviews, please take a look, thank you!
