@rroutsong rroutsong commented Dec 5, 2025

Globus + Slurm transfer CLI


This CLI makes it easy to transfer data from Biowulf to Skyline.

Features:

  • Automated Globus transfers via Slurm batch jobs with up to 7-day runtime support
  • Fixed endpoint configuration from Biowulf to Skyline (prevents misrouting)
  • Auto-detects file vs directory sources and sets recursive transfer appropriately
  • Path depth validation enforces minimum 4-level depth to prevent accidental high-level deletions
  • Large file count warning alerts when directories contain ≥5,000 files with interactive prompt
  • Pre-flight validation runs before authentication to fast-fail invalid requests
  • OAuth2 Globus authentication with NIH.gov domain restriction and session-based tokens
  • Organized destination paths auto-constructed as /archive/<sanitized_label>/<source_name>
  • Path and job name sanitization removes invalid characters and enforces length limits
  • Optional cleanup mode (--cleanup flag) safely deletes source after verified successful transfer
  • Multi-stage cleanup verification checks Slurm exit code, Globus task status, and transfer output
  • Automatic token revocation after cleanup completes for security
  • Slurm job dependencies ensure cleanup only runs after a successful transfer (afterok)
  • Fixed resource allocation (16GB RAM, 4 CPUs, 7-day limit on norm partition)
  • Timestamped script generation prevents filename collisions and provides audit trail
  • Comprehensive logging with separate output/error files for each job
  • Transfer monitoring polls Globus every 30 seconds until completion
  • Cleanup monitoring polls Globus every 10 seconds for deletion status
  • Progress feedback during file counting (updates every 10,000 files)
  • Destination pre-creation attempts to create directories before transfer
  • Detailed status output shows endpoints, paths, UUIDs, and resource settings
  • Interactive prompts for authentication and large file count confirmations
  • Graceful error handling with clear messages and appropriate exit codes
  • Safety-first approach aborts cleanup on any verification failure to preserve source files
  • JSON parsing from Globus CLI using jmespath for reliable value extraction
  • Slurm output parsing extracts task IDs from transfer job output for verification
  • Generated scripts are automatically made executable (permissions 0o755)
  • Google-style docstrings on all functions for auto-documentation
  • Command-line help with examples and resource allocation summary
  • Embedded bash scripts with all Slurm directives and Globus commands
  • Working directory management ensures scripts execute in correct location
  • Handles both single files and directories with appropriate transfer flags
  • Status extraction parses Globus JSON responses and strips formatting
  • Minimal dependencies (only requires globus_sdk beyond standard library)
  • Module structure can be imported as library or run as script
  • Audit trail preserves all generated scripts, outputs, and errors with timestamps
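
To illustrate the path depth validation, label sanitization, and destination path construction listed above, here is a minimal sketch. It is not the code from this PR: the function names, the allowed character set, and the 64-character limit are assumptions made for the example. Only the 4-level minimum and the `/archive/<sanitized_label>/<source_name>` layout come from the feature list.

```python
import re
from pathlib import PurePosixPath

MIN_DEPTH = 4        # from the feature list: minimum 4-level depth
MAX_LABEL_LEN = 64   # assumed limit; the PR does not state the real value


def path_depth(path: str) -> int:
    """Count the named components of a POSIX path, ignoring the root."""
    p = PurePosixPath(path)
    return len(p.parts) - (1 if p.anchor else 0)


def validate_depth(path: str, min_depth: int = MIN_DEPTH) -> None:
    """Raise ValueError if the path is too shallow to transfer or delete safely."""
    depth = path_depth(path)
    if depth < min_depth:
        raise ValueError(f"{path!r} has depth {depth}; need at least {min_depth}")


def sanitize_label(label: str, max_len: int = MAX_LABEL_LEN) -> str:
    """Replace runs of characters outside [A-Za-z0-9._-] with '_' and truncate."""
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", label).strip("_")
    if not cleaned:
        raise ValueError("label contains no usable characters")
    return cleaned[:max_len]


def build_destination(label: str, source: str) -> str:
    """Build /archive/<sanitized_label>/<source_name> from a label and a source path."""
    source_name = PurePosixPath(source).name
    return f"/archive/{sanitize_label(label)}/{source_name}"
```

Running these checks before authentication matches the pre-flight idea in the list: a path such as `/data/lab` is rejected immediately, without prompting the user to log in first.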
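
The multi-stage cleanup verification and the `afterok` dependency can be sketched in the same spirit. Again, this is an illustration under stated assumptions rather than the merged implementation: the helper names are hypothetical, and the sketch assumes the transfer job output contains a `Task ID: <uuid>` line and that a successful Globus task reports the status `SUCCEEDED`.

```python
import re
from typing import List, Optional

# Assumed format of the task ID line in the transfer job output.
TASK_ID_RE = re.compile(
    r"Task ID:\s*([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def extract_task_id(transfer_output: str) -> Optional[str]:
    """Return the Globus task UUID found in the transfer output, or None."""
    match = TASK_ID_RE.search(transfer_output)
    return match.group(1) if match else None


def cleanup_allowed(slurm_exit_code: int, globus_status: str, transfer_output: str) -> bool:
    """Allow cleanup only if every verification stage passes."""
    if slurm_exit_code != 0:
        return False  # the transfer job itself failed
    # Strip whitespace and JSON-style quotes before comparing the status.
    if globus_status.strip().strip('"').upper() != "SUCCEEDED":
        return False  # Globus did not report a successful task
    # Without a task ID in the output there is nothing to verify against.
    return extract_task_id(transfer_output) is not None


def cleanup_sbatch_command(transfer_job_id: int, script: str) -> List[str]:
    """Build an sbatch command that runs only after the transfer job exits with code 0."""
    return ["sbatch", f"--dependency=afterok:{transfer_job_id}", script]
```

Any failed stage returns `False`, so the source files are left in place unless all checks agree that the transfer succeeded.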

@rroutsong rroutsong merged commit b5b16e7 into main Dec 8, 2025
1 check passed