Skip to content

Conversation

@cmeesters
Copy link
Collaborator

@cmeesters cmeesters commented Nov 17, 2025

Summary by CodeRabbit

  • Chores
    • Reduced persistent intermediate artifacts by marking several pipeline outputs as temporary to lower disk usage and improve automatic cleanup.
  • Bug Fixes
    • Improved robustness of an unzip step by adding error handling and logging; failures now produce a safe empty output and a logged warning.
  • Style
    • Cosmetic formatting and logging-message improvements for clearer pipeline logs.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 17, 2025

Walkthrough

Intermediate outputs in preparation and docking Snakemake rules were marked temporary by wrapping rule output entries with temp(), and prepare_docking_ligand was adjusted to write its copied file to output[0] to match the temporary list-like output.

Changes

Cohort / File(s) Summary
Preparation rules
workflow/rules/preparation.smk
Wrapped outputs of makeReceptorPDBQT, prepareGeometry, and prepareDocking with temp(...); added try/except around gunzip to log exceptions and produce an empty output on failure; minor formatting changes.
Docking rules
workflow/rules/docking.smk
Wrapped outputs for prepare_docking_ligand and mergeDocking with temp(...); changed copy target in prepare_docking_ligand to output[0] to accommodate temporary/list-like outputs; minor formatting/message tweaks.
Analysis rules (formatting only)
workflow/rules/analyse.smk
Cosmetic reformatting and multiline wrapping of arguments/strings with no functional changes.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Homogeneous changes (mostly wrapping outputs in temp()).
  • Pay attention to the prepare_docking_ligand copy target change (output[0]) and the gunzip exception handling for correctness and logging behavior.

Possibly related PRs

  • fix: profile and localizing #7 — Modifies the same Snakemake rules (workflow/rules/docking.smk and workflow/rules/preparation.smk) and includes related changes to temp() output handling.

Poem

🐰 I hopped through rules both neat and small,

temp() wrapped outputs, I watched them fall,
Copies pointed true to output[0] fair,
Logs catch the stumble, empty files prepared,
A tidy trail of files — cleaned with care.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: more output files labelled temp' directly reflects the main changes across multiple Snakemake files, where temporary file declarations via temp(...) are added to several rule outputs.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/tempfiles

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8f2d1c and 83b37a8.

📒 Files selected for processing (3)
  • workflow/rules/analyse.smk (10 hunks)
  • workflow/rules/docking.smk (6 hunks)
  • workflow/rules/preparation.smk (5 hunks)
🔇 Additional comments (9)
workflow/rules/analyse.smk (3)

12-18: LGTM! Formatting improvements enhance readability.

The multiline formatting of localrules, logger.warning, list comprehensions, and function calls improves code readability without changing behavior.

Also applies to: 57-59, 95-97, 137-157, 204-206


325-331: Flag inconsistency in AI-generated summary.

The AI summary states "all changes are formatting-related and do not modify runtime behavior," but the addition of temp() at lines 325-331 is a functional change that affects file lifecycle management. This output directory will now be automatically cleaned up by Snakemake.


325-331: Temporary directory handling is correct and properly integrated.

The verification confirms that the temp(directory(...)) wrapper on the split2 checkpoint output is correctly implemented. The downstream checkpoint aggregation via aggregate_in2() (line 403: checkpoints.split2.get(**wildcards).output[0]) properly handles the temporary directory, and both prepareLigands2 and mergeDocking2 rules consume the output before cleanup occurs. This follows the standard Snakemake pattern for temporary intermediate outputs.

workflow/rules/docking.smk (3)

1-4: LGTM! Formatting improvements enhance readability.

The multiline formatting of localrules, message strings, lambda functions, and shell commands improves code readability without changing behavior.

Also applies to: 48-48, 51-51, 110-112, 116-116, 123-123


42-43: LGTM! Intermediate docking preparation files correctly marked as temporary.

The receptor and geometry files are copies created for each docking job and are correctly marked as temporary for automatic cleanup after downstream consumption.


144-150: LGTM! Merged docking output correctly marked as temporary.

The merged docking results are intermediate files that are consumed by downstream analysis rules and are appropriately marked as temporary for automatic cleanup.

workflow/rules/preparation.smk (3)

35-38: LGTM! Formatting improvements enhance readability.

The multiline formatting of log and message strings improves code readability without changing behavior.


125-125: No issues found—temp() marking on receptor PDBQT is safe.

Verification confirms that only two rules consume rules.makeReceptorPDBQT.output: prepareDocking (line 256) and docking (line 39). Snakemake's DAG ensures that both rules execute before the temporary file is cleaned up, so marking the receptor PDBQT file as temporary is correct.


220-220: Temp file marking verified as safe for downstream docking pipeline.

Verification confirms the temporary file declarations are safe:

  • prepareGeometry temp file (grid/{receptorID}_grid.txt): Properly integrated with explicit downstream dependency via prepare_docking_local rule, which takes it as input and copies it to the docking directory before further processing.
  • prepareDocking temp file (receptor/{receptorID}.txt): No downstream dependencies found, making the temp() marking harmless (file is unused throughout the workflow).

Snakemake resolves path-based dependencies correctly and ensures all rules consuming these files complete before cleanup.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
workflow/rules/docking.smk (2)

66-71: Critical syntax error: Missing closing parenthesis for temp().

The temp() call is missing its closing parenthesis. Additionally, the comma after line 71 suggests this might have been intended as a named output, but the output name is missing.

Apply this diff to fix the syntax:

     output:
-        temp(path.join(
+        temp(
+            path.join(
             "docking",
             "{receptorID}",
             "{dataset}",
             "{database}_{dataset}_{name}_{i}.txt",
-        ),
+            )
+        ),

Note: If this was intended to be a named output, you'll need to add the name, e.g., lig=temp(path.join(...)).


136-146: The merged docking output is orphaned and will be deleted.

The verification confirms the original concern: mergeDocking produces temporary output at docking/{receptorID}/{receptorID}_{database}_{dataset}_{name}.pdbqt.gz, but no downstream rules consume this path. The workflow's final outputs (in results/) are generated independently by rules in analyse.smk, not from the merged docking files. With temp() marking, these merged results will be automatically deleted after the workflow completes.

Either remove the temp() wrapper if users need to keep merged docking results, or delete the mergeDocking rule entirely if it serves no purpose.

workflow/rules/preparation.smk (1)

252-260: Remove temp() wrapper from prepareDocking output or ensure this rule's output is actually consumed.

Verification confirms this temporary file is never consumed downstream. The docking.smk rule prepare_docking_local takes its receptor input directly from rules.makeReceptorPDBQT.output, not from the receptor/{receptorID}.txt file created by prepareDocking. Since no downstream rule references this output, marking it as temporary will cause Snakemake to delete it prematurely. Either remove the temp() wrapper if this is a persistent metadata file, or remove the rule entirely if it's unused.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5c8a5e3 and b8f2d1c.

📒 Files selected for processing (2)
  • workflow/rules/docking.smk (3 hunks)
  • workflow/rules/preparation.smk (3 hunks)
🔇 Additional comments (2)
workflow/rules/preparation.smk (2)

120-133: LGTM! Appropriate use of temp() for intermediate receptor files.

Marking the receptor PDBQT file as temporary is a good practice for reducing disk space usage. Snakemake will automatically preserve this file until all downstream rules (like prepareDocking and rules in docking.smk) have consumed it.


214-239: LGTM! Grid geometry files appropriately marked as temporary.

The grid geometry file is an intermediate artifact that can be safely removed after dependent rules consume it. This change aligns well with the PR's objective of reducing disk space usage.

os.makedirs(outdir, exist_ok=True)

shutil.copy(input.ligands, output.lig)
shutil.copy(input.ligands, output[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Output reference change is correct, but breaks downstream rule.

The change from output.lig to output[0] is correct since temp() returns a positional output. However, this breaks the docking rule at line 94, which still references rules.prepare_docking_ligand.output.lig.

Apply this diff to fix the broken reference:

     input:
         rec=rules.prepare_docking_local.output[0],
         geo=rules.prepare_docking_local.output[1],
-        lig=rules.prepare_docking_ligand.output.lig,
+        lig=rules.prepare_docking_ligand.output[0],

Alternatively, if you prefer named outputs for clarity, modify the output definition at lines 66-71 to:

    output:
        lig=temp(
            path.join(
                "docking",
                "{receptorID}",
                "{dataset}",
                "{database}_{dataset}_{name}_{i}.txt",
            )
        ),

And keep line 94 as is. This approach provides better readability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants