Commit 6da7216

pinin4fjords and claude authored
Optimize bbsplit index handling to avoid large file copies (#9447)
* Optimize bbsplit index handling to avoid large file copies

  Replace full directory copy with symlink-based approach: symlink all index files except summary.txt (which must be copied to apply timestamp fixes). This significantly reduces I/O overhead while preserving the critical timestamp correction functionality.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* Simplify bbsplit index handling implementation

  Refactored the bash script sections for index handling to be more concise and maintainable while preserving all functionality:

  - Consolidated directory creation and file operations into single find loops
  - Replaced readlink -f with realpath for better macOS compatibility
  - Used conditional operators to reduce if/else verbosity
  - Inlined variables where appropriate to reduce line count

  These changes improve code clarity and cross-platform compatibility without affecting the underlying logic of file moves, symlink handling, or timestamp management.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: Claude <[email protected]>
1 parent c1408b5 commit 6da7216
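
The symlink-based approach described in the commit message can be tried outside Nextflow. The following is a minimal sketch, not the module's code: it is plain bash (so the `\$` escapes used inside the Nextflow script block are dropped), it builds a throwaway `input_index` tree in a temporary directory, and the file names `data.bin` and `block.bin` and the `source` path are hypothetical placeholders rather than real bbmap index files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch layout, for illustration only.
work=$(mktemp -d)
mkdir -p "$work/input_index/ref/genome/1" "$work/input_index/ref/index/1"
printf 'source\t/old/path/ref/genome/1\n' > "$work/input_index/ref/genome/1/summary.txt"
echo "chunk" > "$work/input_index/ref/genome/1/data.bin"
echo "idx" > "$work/input_index/ref/index/1/block.bin"

cd "$work"

# Mirror the tree under index_writable: copy summary.txt (it will be
# edited later), symlink every other file to avoid duplicating its data.
find input_index/ref -type f | while read -r f; do
    target="index_writable/${f#input_index/}"
    mkdir -p "$(dirname "$target")"
    if [[ $(basename "$f") == "summary.txt" ]]; then
        cp "$f" "$target"
    else
        ln -s "$(realpath "$f")" "$target"
    fi
done
```

The sketch uses an explicit `if`/`else` where the commit uses `A && B || C`. The two behave the same when `cp` succeeds; with the short form, a failing `cp` would fall through to the `ln -s` branch.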

1 file changed, +12 -14 lines changed, in modules/nf-core/bbmap/bbsplit

modules/nf-core/bbmap/bbsplit/main.nf

Lines changed: 12 additions & 14 deletions
@@ -68,15 +68,16 @@ process BBMAP_BBSPLIT {
     }
     """

-    # If using a pre-built index, copy it to avoid modifying input files in place,
-    # then fix timestamps. When we stage in the index files the time stamps get
-    # disturbed, which bbsplit doesn't like. Fix the time stamps in its summaries.
-    # This needs to be done via Java to match what bbmap does.
+    # If using a pre-built index, create writable structure: symlink all files except
+    # summary.txt (which we copy to modify). When we stage in the index files the time
+    # stamps get disturbed, which bbsplit doesn't like. Fix the time stamps in summaries.
     if [ "$use_index" == "true" ]; then
-        cp -rL input_index index_writable
-
-        for summary_file in \$(find index_writable/ref/genome -name summary.txt); do
-            # Extract the path from summary.txt and update it to point to index_writable
+        find input_index/ref -type f | while read -r f; do
+            target="index_writable/\${f#input_index/}"
+            mkdir -p "\$(dirname "\$target")"
+            [[ \$(basename "\$f") == "summary.txt" ]] && cp "\$f" "\$target" || ln -s "\$(realpath "\$f")" "\$target"
+        done
+        find index_writable/ref/genome -name summary.txt | while read -r summary_file; do
             src=\$(grep '^source' "\$summary_file" | cut -f2- -d\$'\\t' | sed 's|.*/ref/|index_writable/ref/|')
             mod=\$(echo "System.out.println(java.nio.file.Files.getLastModifiedTime(java.nio.file.Paths.get(\\"\$src\\")).toMillis());" | jshell -J-Djdk.lang.Process.launchMechanism=vfork -)
             sed -e 's|bbsplit_index/ref|index_writable/ref|' -e "s|^last modified.*|last modified\\t\$mod|" "\$summary_file" > \${summary_file}.tmp && mv \${summary_file}.tmp \${summary_file}
@@ -95,14 +96,11 @@ process BBMAP_BBSPLIT {
         $args 2>| >(tee ${prefix}.log >&2)

     # Summary files will have an absolute path that will make the index
-    # impossible to use in other processes- we can fix that
+    # impossible to use in other processes - fix paths and rename atomically
     if [ -d bbsplit_build/ref/genome ]; then
-        for summary_file in \$(find bbsplit_build/ref/genome -name summary.txt); do
-            src=\$(grep '^source' "\$summary_file" | cut -f2- -d\$'\\t' | sed 's|.*/bbsplit_build|bbsplit_index|')
-            sed "s|^source.*|source\\t\$src|" "\$summary_file" > \${summary_file}.tmp && mv \${summary_file}.tmp \${summary_file}
+        find bbsplit_build/ref/genome -name summary.txt | while read -r summary_file; do
+            sed "s|^source.*|source\\t\$(grep '^source' "\$summary_file" | cut -f2- -d\$'\\t' | sed 's|.*/bbsplit_build|bbsplit_index|')|" "\$summary_file" > \${summary_file}.tmp && mv \${summary_file}.tmp \${summary_file}
         done
-
-        # Atomically rename the completed index
         mv bbsplit_build bbsplit_index
     fi
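
The `source` path rewrite in the second hunk can be checked in isolation. This is a rough sketch under stated assumptions, not the module's code: it is plain bash without the Nextflow `\$` escapes, the `summary.txt` contents are a hypothetical two-line stand-in (only the tab-separated `source` and `last modified` keys are taken from the diff), and it keeps the intermediate `src` variable for readability. A literal tab is built with `printf` so the replacement does not depend on how a given `sed` interprets `\t`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical summary file with an absolute work-directory path.
work=$(mktemp -d)
summary_file="$work/summary.txt"
printf 'source\t/abs/work/dir/bbsplit_build/ref/genome/1\nlast modified\t0\n' > "$summary_file"

tab=$(printf '\t')

# Take the value of the "source" line and replace everything up to and
# including ".../bbsplit_build" with the relative name "bbsplit_index".
src=$(grep '^source' "$summary_file" | cut -f2- -d"$tab" | sed 's|.*/bbsplit_build|bbsplit_index|')

# Write to a temporary file, then move it over the original.
sed "s|^source.*|source${tab}${src}|" "$summary_file" > "${summary_file}.tmp" \
    && mv "${summary_file}.tmp" "$summary_file"

cat "$summary_file"
```

After the rewrite the `source` line holds a relative `bbsplit_index/...` path, which is what lets the index be staged into another process's work directory.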
