Add module PBMARKDUP #9457
base: master
DLBPointon left a comment:
Great work!
I just have a few comments to flesh it out more.
It'll mean adding another test, sorry. Feel free to get another opinion though.
- pbmarkdup:
    description: |
      pbmarkdup identifies and marks duplicate reads in PacBio HiFi (CCS) data. It clusters
      highly similar CCS reads to detect PCR duplicates and flags them in the BAM output
Change the part about BAM output, as pbmarkdup can write three output formats.
output:
tuple val(meta), path("${prefix}.${suffix}"), emit: markduped
path "versions.yml"                         , emit: versions
I feel like the --dup-file dups.fasta flag needs better support; otherwise the duplicates file won't be captured as an output.
Perhaps add an output channel, plus something to capture the file name the user sets in the config?
output:
tuple val(meta), path("${dup_file}"), optional: true, emit: duplicates

script:
// This chunk exists solely to give the output tuple a file name to match;
// the command itself doesn't need it, because the flag is already in the args.
// (task.ext.args ?: '') guards against args not being set in the config.
def matcher = ((task.ext.args ?: '') =~ /--dup-file\s+(\S+)/)
// No `def` here, so the output block can see dup_file.
dup_file = matcher.find() ? matcher.group(1) : ""
"""
pbmarkdup \\
    -j ${task.cpus} \\
    $input \\
    ${prefix}.${suffix} \\
    $args
"""
Feel free to get another reviewer's view here. Without something like this, the only way to capture the duplicates file is to simplify your existing output by removing the suffix, but then you'd have to deal with a tuple[meta, file, file] in the workflow.
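To try the regex capture from the suggestion above outside of Nextflow, here is a minimal plain-Groovy sketch. It assumes the flag is written as `--dup-file <name>` with whitespace before the value; the helper name `dupFileFromArgs` is hypothetical and not part of the module.

```groovy
// Hypothetical helper: return the value of `--dup-file <name>` from an args string,
// or '' when the flag is absent. Assumes whitespace between the flag and its value;
// a `--dup-file=name` spelling is not matched by this pattern.
String dupFileFromArgs(String args) {
    // Treat a missing task.ext.args (null) the same as an empty string.
    def matcher = ((args ?: '') =~ /--dup-file\s+(\S+)/)
    // find() locates the first match; group(1) is the captured file name.
    return matcher.find() ? matcher.group(1) : ''
}

println dupFileFromArgs('--log-level INFO --dup-file dups.fasta')  // prints dups.fasta
println dupFileFromArgs(null).isEmpty()                             // prints true
```

Returning an empty string rather than null keeps the value safe to interpolate, which matches the `""` fallback in the suggested script block.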
script:
def args = task.ext.args ?: ''
pbmarkdup can take a list of files as input, so I think you need a couple of extra checks to make sure file names don't conflict.
First, make sure the input and output file names can't clash, which seems like it could happen here.
Adapted from CAT_CAT:
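// input may be a single path or a list of paths
def file_list = [input].flatten().collect { it.toString() }
// toString() makes this a String-to-String comparison (a GString never equals a String)
if (file_list.contains("${prefix}.${suffix}".toString())) {
    error "PBMARKDUP: The name of the input file can't be the same as the output. " +
        "Change the prefix to avoid the conflict."
}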
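A caveat for this check in Groovy: `List.contains()` compares with `equals()`, and an interpolated GString such as `"${prefix}.${suffix}"` is never equal to a plain `String`, so the name needs a `.toString()`. A minimal plain-Groovy sketch, assuming input names are compared as plain file-name strings; `clashesWithInput` is a hypothetical helper:

```groovy
// Hypothetical helper in the spirit of the CAT_CAT check: does the planned output
// name collide with any input file name?
boolean clashesWithInput(List inputNames, String prefix, String suffix) {
    // Build the output name as a String: List.contains() uses equals(), and a
    // GString such as "${prefix}.${suffix}" never equals a String.
    String outputName = "${prefix}.${suffix}".toString()
    return inputNames.collect { it.toString() }.contains(outputName)
}

def inputs = ['sample.bam', 'other.bam']
println clashesWithInput(inputs, 'sample', 'bam')          // prints true
println clashesWithInput(inputs, 'sample.markdup', 'bam')  // prints false

// The pitfall: the same lookup with an unconverted GString misses the clash.
def prefix = 'sample'
println inputs.contains("${prefix}.bam")                   // prints false
```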
Since it can take a list of input files, perhaps also check that they are in fact unique files? However, I think this check is optional, as it should already be handled in the input_check step or workflow of the pipeline.
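// Collect base names from a single path or a list of paths
def input_files = [input].flatten().collect { it.baseName }
// unique(false) returns a de-duplicated copy instead of modifying input_files in place
if (input_files.size() != input_files.unique(false).size()) {
    error "PBMARKDUP: Input files must have unique names. Found duplicates in: ${input_files}. " +
        "Check your input reads."
}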
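If the error should name only the repeated entries, one option is to count occurrences instead of comparing list sizes. A minimal plain-Groovy sketch, assuming the base names have already been collected into a list; `duplicateNames` is a hypothetical helper, and `println` stands in for Nextflow's `error`:

```groovy
// Hypothetical helper: list the names that occur more than once, in first-seen order.
List duplicateNames(List names) {
    // countBy builds a name -> count map without touching `names`; the no-argument
    // unique() would instead de-duplicate the list in place.
    def counts = names.countBy { it }
    return counts.findAll { name, count -> count > 1 }.keySet().toList()
}

def baseNames = ['reads_a', 'reads_b', 'reads_a']
def dupes = duplicateNames(baseNames)
if (dupes) {
    // Inside a Nextflow process this would be error(...); here we just print it.
    println "PBMARKDUP: Input files must have unique names. Found duplicates: ${dupes}. " +
            "Check your input reads."
}
```

Because `countBy` returns an insertion-ordered map, the reported names follow the order of the input list.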
Thanks @DLBPointon for kindly reviewing this. I have updated the code based on your review; please take a look. Thank you!
Add pbmarkdup, which identifies and marks duplicate reads in PacBio HiFi (CCS) data.
#9456
Thank you for reviewing!
PR checklist
Closes #XXX
versions.yml file.
label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda