diff --git a/.github/workflows/README.md b/.github/workflows/README.md new file mode 100644 index 0000000..9df9834 --- /dev/null +++ b/.github/workflows/README.md @@ -0,0 +1,185 @@ +# GitHub Actions Documentation Deployment Workflow + +## Overview + +The `.github/workflows/docs.yml` workflow automatically builds and deploys MkDocs Material documentation to GitHub Pages when changes are merged into the `main` branch. + +## Trigger Conditions + +The workflow runs when: + +1. **Manual trigger**: Via the Actions tab (`workflow_dispatch`) +2. **Automatic trigger**: Push to `main` branch when any of these paths change: + - `docs/**` - Documentation files + - `bin/**` - Source code with docstrings (e.g., `slurm_transfer.py`) + - `src/**` - Additional source code + - `mkdocs.yml` - MkDocs configuration file + +## What the Workflow Does + +### Step 1: Checkout Repository +```yaml +uses: actions/checkout@v4 +with: + fetch-depth: 0 +``` +- Checks out the repository with full history +- `fetch-depth: 0` ensures all git history is available (required for git revision date plugin) + +### Step 2: Set Up Python +```yaml +uses: actions/setup-python@v5 +with: + python-version: '3.11' + cache: 'pip' +``` +- Installs Python 3.11 +- Enables pip caching for faster builds + +### Step 3: Install Dependencies +```bash +pip install --upgrade pip +pip install -r docs/requirements.txt +``` +- Installs all required packages including: + - mkdocs-material + - mkdocstrings + - mkdocstrings-python + - All other documentation dependencies + +### Step 4: Configure Git +```bash +git config user.name github-actions[bot] +git config user.email github-actions[bot]@users.noreply.github.com +``` +- Configures Git for the gh-deploy command +- Uses GitHub Actions bot identity + +### Step 5: Deploy Documentation +```bash +mkdocs gh-deploy --force --clean --verbose +``` +- Builds the documentation +- Deploys to the `gh-pages` branch +- `--force`: Overwrites existing deployment +- `--clean`: Removes old files not in current build +- `--verbose`: Detailed output for debugging + +## Permissions + +```yaml +permissions: + contents: write +``` + +The workflow has `contents: write` permission, which allows it to: +- Push to the `gh-pages` branch +- Update GitHub Pages deployment + +## GitHub Pages Setup + +To enable the documentation site: + +1. Go to repository **Settings** → **Pages** +2. Under "Build and deployment": + - **Source**: Deploy from a branch + - **Branch**: `gh-pages` / `(root)` +3. Click **Save** + +The site will be available at: `https://openomics.github.io/arc/` + +## Viewing Deployment Status + +Monitor deployment in several places: + +### 1. Actions Tab +- View workflow runs: `https://github.com/OpenOmics/arc/actions` +- See build logs and any errors +- Check deployment status + +### 2. Commits +- Successful deployments show a ✓ next to the commit +- Click the checkmark to see workflow details + +### 3. 
Environments +- Go to repository main page +- Look for "Environments" in the right sidebar +- Shows active deployments and URLs + +## Testing Locally + +Before pushing changes, test documentation locally: + +```bash +# Install dependencies +pip install -r docs/requirements.txt + +# Serve locally with live reload +mkdocs serve + +# Build to verify no errors +mkdocs build --strict +``` + +Access local docs at: http://127.0.0.1:8000 + +## Troubleshooting + +### Build Fails on Missing Dependencies + +**Problem**: Workflow fails with "ModuleNotFoundError" + +**Solution**: Ensure all required packages are in `docs/requirements.txt` + +### Documentation Not Updating + +**Problem**: Changes pushed but site doesn't update + +**Check**: +1. Did changes affect monitored paths? +2. Was push to `main` branch? +3. Check Actions tab for errors +4. Verify GitHub Pages is enabled + +### mkdocstrings Import Errors + +**Problem**: Cannot import modules for docstring extraction + +**Solution**: Ensure `bin/__init__.py` exists and modules are importable + +### Permission Denied + +**Problem**: Workflow fails with permission errors + +**Solution**: +1. Verify `permissions: contents: write` is set +2. Check repository Settings → Actions → General → Workflow permissions +3. Ensure "Read and write permissions" is selected + +## Manual Deployment + +To manually trigger deployment: + +1. Go to **Actions** tab +2. Select **docs** workflow +3. Click **Run workflow** +4. Select `main` branch +5. Click **Run workflow** + +## Workflow Updates + +When updating the workflow: + +1. Edit `.github/workflows/docs.yml` +2. Test changes on a feature branch first +3. Merge to main to activate new workflow + +## Auto-Documentation Feature + +The workflow automatically extracts and renders: +- Function docstrings from `bin/slurm_transfer.py` +- Any other Python modules in `bin/` and `src/` +- Google-style docstrings → formatted HTML +- Type hints → displayed in signatures + +Changes to Python source code docstrings will automatically update the documentation when merged to main! 
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index f0c554b..6ce187c 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -1,18 +1,43 @@ name: docs + on: workflow_dispatch: push: + branches: + - main paths: - 'docs/**' + - 'bin/**' + - 'src/**' + - 'mkdocs.yml' + +permissions: + contents: write jobs: deploy: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v2 - - uses: actions/setup-python@v2 + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 + + - name: Set up Python + uses: actions/setup-python@v5 with: - python-version: 3.9 - - run: pip install --upgrade pip - - run: pip install -r docs/requirements.txt - - run: mkdocs gh-deploy --force + python-version: '3.11' + cache: 'pip' + + - name: Install dependencies + run: | + pip install --upgrade pip + pip install -r docs/requirements.txt + + - name: Configure Git + run: | + git config user.name github-actions[bot] + git config user.email github-actions[bot]@users.noreply.github.com + + - name: Deploy documentation + run: mkdocs gh-deploy --force --clean --verbose diff --git a/bin/__init__.py b/bin/__init__.py new file mode 100644 index 0000000..5668f11 --- /dev/null +++ b/bin/__init__.py @@ -0,0 +1 @@ +"""Bin module containing utility scripts for arc.""" diff --git a/bin/slurm_transfer.py b/bin/slurm_transfer.py new file mode 100755 index 0000000..51b3490 --- /dev/null +++ b/bin/slurm_transfer.py @@ -0,0 +1,714 @@ +#!/usr/bin/env python3 +# -*- coding: UTF-8 -*- +"""Slurm-based Globus Transfer Script. + +This script submits Slurm jobs to perform Globus transfers from Biowulf to Skyline +with optional cleanup. It generates Slurm batch scripts and submits them using sbatch. + +Resource Allocation: +- Partition: norm +- Memory: 16g +- CPUs: 4 +- Time: 7 days +""" + +import argparse +import sys +import os +import subprocess +import re +from datetime import datetime +from textwrap import dedent +from globus_sdk import NativeAppAuthClient + +SOURCE_ENDPOINT = 'e2620047-6d04-11e5-ba46-22000b92c6ec' +DEST_ENDPOINT = 'e2dff37b-3468-4c3a-b5d7-5a00802e20ab' +SOURCE_ALIAS = 'biowulf' +DEST_ALIAS = 'skyline' +DEST_BASE_DIR = '/rtb_idss/starfish/archive' +SLURM_PARTITION = 'norm' +SLURM_MEMORY = '16g' +SLURM_TIME = '7-00:00:00' +SLURM_CPUS = '4' +CLIENT_ID = "786b3f1d-d775-4631-8b31-6b2e7fbc9897" +PREFERRED_LOG_BASE = '/data/OpenOmics/dev/globus' +FALLBACK_LOG_BASE = os.path.expanduser('~/.globus') + + +def get_log_base_directory(): + """Determine the base directory for Slurm logs. + + Checks if the preferred directory (/data/OpenOmics/dev/globus/) is writable. + If not accessible or writable, falls back to ~/.globus in the user's home directory. + + Returns: + Path to the base directory for storing Slurm logs and job scripts. + """ + # Try preferred directory first + if os.path.exists(PREFERRED_LOG_BASE): + if os.access(PREFERRED_LOG_BASE, os.W_OK): + return PREFERRED_LOG_BASE + else: + print(f"Warning: {PREFERRED_LOG_BASE} exists but is not writable.") + print(f"Falling back to {FALLBACK_LOG_BASE}") + else: + print(f"Warning: {PREFERRED_LOG_BASE} does not exist.") + print(f"Falling back to {FALLBACK_LOG_BASE}") + + # Use fallback directory + os.makedirs(FALLBACK_LOG_BASE, exist_ok=True) + return FALLBACK_LOG_BASE + + +def sanitize_directory_name(name): + """Sanitize a string for use as a directory name. + + Removes or replaces invalid filesystem characters with underscores, + strips leading/trailing spaces and periods, and enforces a maximum length. 
+ + Args: + name: The name to sanitize. Typically from argparse `--label`. + + Returns: + Sanitized directory name suitable for filesystem use. + """ + invalid_chars_pattern = r'[\\/:*?"<>|]' + sanitized_name = re.sub(invalid_chars_pattern, '_', name) + sanitized_name = sanitized_name.strip(' .') + max_length = 250 + if len(sanitized_name) > max_length: + sanitized_name = sanitized_name[:max_length] + return sanitized_name + + +def sanitize_job_name(name): + """Sanitize a string for use as a Slurm job name. + + More restrictive than directory names. Replaces spaces and invalid characters + with underscores, removes leading/trailing underscores and periods, and collapses + consecutive underscores. + + Args: + name: The name to sanitize. Typically from argparse `--label`. + + Returns: + Sanitized job name suitable for Slurm. + """ + invalid_chars_pattern = r'[\\/:*?"<>|\s]' + sanitized_name = re.sub(invalid_chars_pattern, '_', name) + sanitized_name = sanitized_name.strip('_.') + sanitized_name = re.sub(r'_+', '_', sanitized_name) + return sanitized_name + + +def authenticate_globus(): + """Authenticate with Globus and obtain access tokens. + + Initiates OAuth2 flow for Globus authentication with extended session-based tokens. + Requires NIH.gov domain authentication and requests refresh tokens for long-running jobs. + + Returns: + Tuple of (access_token, refresh_token, expires_at) on success, + or (None, None, None) on failure. + """ + print("\n" + "=" * 70) + print("Globus Authentication Required") + print("=" * 70) + + client = NativeAppAuthClient(CLIENT_ID) + client.oauth2_start_flow( + requested_scopes=[ + "urn:globus:auth:scope:transfer.api.globus.org:all", + f"urn:globus:auth:scope:transfer.api.globus.org:all[*https://auth.globus.org/scopes/{DEST_ENDPOINT}/data_access *https://auth.globus.org/scopes/{SOURCE_ENDPOINT}/data_access]" + ], + refresh_tokens=True + ) + + authorize_url = client.oauth2_get_authorize_url( + query_params={ + "session_required_single_domain": "nih.gov", + "session_message": "Please authenticate for extended access" + } + ) + + print(f"\nPlease visit this URL to authenticate:\n{authorize_url}\n") + + try: + auth_code = input("Enter the authorization code: ").strip() + except (KeyboardInterrupt, EOFError): + print("\nAuthentication cancelled.") + return None, None, None + + if not auth_code: + print("Error: No authorization code provided.") + return None, None, None + + try: + token_response = client.oauth2_exchange_code_for_tokens(auth_code) + transfer_tokens = token_response.by_resource_server["transfer.api.globus.org"] + access_token = transfer_tokens["access_token"] + refresh_token = transfer_tokens.get("refresh_token") + expires_at = transfer_tokens.get("expires_at_seconds") + + print("\nAuthentication successful!") + print(f"Access token expires at: {expires_at}") + print("=" * 70 + "\n") + + return access_token, refresh_token, expires_at + except Exception as e: + print(f"\nError during authentication: {e}", file=sys.stderr) + return None, None, None + + +def validate_source_path(source_path): + """Validate source path meets safety requirements. + + Performs pre-flight checks to ensure the source path is safe to transfer: + 1. Path must be at least 4 levels deep to prevent accidental high-level transfers + 2. Warns if directory contains >= 5000 files (potential cruft) + + Args: + source_path: Source file or directory path. From argparse `--source`. + + Raises: + ValueError: If path is not at least 4 levels deep. 
+ SystemExit: If user declines to proceed with large file count. + """ + # Check path depth (must be at least 4 nodes deep) + # Example: /data/user/project/subdir = 4 nodes + path_parts = [p for p in source_path.rstrip('/').split('/') if p] + if len(path_parts) < 4: + raise ValueError( + f"Error: Source path '{source_path}' is not deep enough.\n" + f"Path has {len(path_parts)} level(s), but must be at least 4 levels deep.\n" + f"This safety check prevents accidental transfer/deletion of high-level directories.\n" + f"Example of valid path: /data/username/project/dataset (4 levels)" + ) + + # Check file count if source is a directory + if os.path.isdir(source_path): + print("\nScanning source directory for file count...") + file_count = 0 + for root, dirs, files in os.walk(source_path): + file_count += len(files) + # Provide progress feedback for large directories + if file_count % 10000 == 0: + print(f" Scanned {file_count} files so far...") + + print(f"Total files found: {file_count:,}") + + if file_count >= 5000: + print("\n" + "!" * 70) + print("WARNING: Large Number of Files Detected") + print("!" * 70) + print(f"The source directory contains {file_count:,} files.") + print("\nThis is a significant number of files to transfer. Please consider:") + print(" - Removing temporary files (*.tmp, *.temp)") + print(" - Cleaning up intermediate pipeline outputs") + print(" - Removing log files or consolidating them") + print(" - Archiving or removing old working directories") + print("\nTransferring many small files can be slow and may impact") + print("storage quotas and file system performance.") + print("!" * 70) + + response = input("\nDo you want to proceed with the transfer anyway? (yes/no): ").strip().lower() + if response not in ['yes', 'y']: + print("\nTransfer cancelled. Please clean up unnecessary files and try again.") + sys.exit(0) + print("\nProceeding with transfer...") + + +def construct_dest_path(source_path, label): + """Construct destination path from source and label. + + Creates destination path structure: DEST_BASE_DIR/sanitized_label/source_basename. + + Args: + source_path: Source file or directory path. From argparse `--source`. + label: Transfer label used for directory naming. From argparse `--label`. + + Returns: + Full destination path for the transfer. + """ + source_bn = os.path.basename(source_path.rstrip('/')) + sanitized_label = sanitize_directory_name(label) + return os.path.join(DEST_BASE_DIR, sanitized_label, source_bn) + + +def create_transfer_script(source_endpoint, dest_endpoint, source_path, dest_path, + label, access_token, recursive=False): + """Create bash script for Globus transfer Slurm job. + + Generates a complete Slurm batch script that authenticates with Globus, + submits a transfer task, monitors completion, and reports final status. + + Args: + source_endpoint: Source endpoint UUID. + dest_endpoint: Destination endpoint UUID. + source_path: Source file/directory path. From argparse `--source`. + dest_path: Destination file/directory path. + label: Transfer task label. From argparse `--label`. + access_token: Globus access token for authentication. + recursive: Whether to transfer recursively (auto-detected for directories). + + Returns: + Complete bash script content with Slurm directives. 
+ """ + recursive_flag = "--recursive \\\n " if recursive else "" + + script = dedent(f"""\ +#!/bin/bash +#SBATCH --partition={SLURM_PARTITION} +#SBATCH --time={SLURM_TIME} +#SBATCH --mem={SLURM_MEMORY} +#SBATCH --cpus-per-task={SLURM_CPUS} + +echo "Starting Globus transfer job" +echo "Label: {label}" +echo "Source: {source_endpoint}:{source_path}" +echo "Destination: {dest_endpoint}:{dest_path}" +echo "Started at: $(date)" + +export GLOBUS_CLI_ACCESS_TOKEN="{access_token}" + +echo "Ensuring destination directory exists..." +globus mkdir {dest_endpoint}:{dest_path} 2>/dev/null || echo "Directory already exists or cannot be created (will attempt transfer anyway)" + +TASK_ID=$(globus transfer \\ + {source_endpoint}:{source_path} \\ + {dest_endpoint}:{dest_path} \\ + --label "{label}" \\ + {recursive_flag}-F json \\ + --jmespath 'task_id') + +if [ -z "$TASK_ID" ]; then + echo "Error: Failed to submit transfer" + exit 1 +fi + +TASK_ID=${{TASK_ID#\\"}} +TASK_ID=${{TASK_ID%\\"}} +TASK_ID=$(echo "$TASK_ID" | xargs) + +echo "Transfer task ID: $TASK_ID" + +echo "Waiting for transfer to complete..." +globus task wait "$TASK_ID" --polling-interval 30 + +STATUS=$(globus task show "$TASK_ID" -F json --jmespath 'status') +STATUS=${{STATUS#\\"}} +STATUS=${{STATUS%\\"}} +STATUS=$(echo "$STATUS" | xargs) +echo "Transfer status: $STATUS" +echo "Completed at: $(date)" + +if [ "$STATUS" = "SUCCEEDED" ]; then + echo "Transfer completed successfully" + exit 0 +else + echo "Transfer failed or was cancelled" + exit 1 +fi +""") + return script + + +def create_cleanup_script(endpoint, paths, access_token, recursive=False): + """Create bash script for cleanup Slurm job. + + Generates a Slurm batch script that verifies transfer success via Slurm job + dependency and Globus task status, then deletes source files and revokes the + access token. Only executes if the transfer job succeeded. + + Args: + endpoint: Endpoint UUID where files should be deleted. + paths: List of paths to delete. Currently supports single path only. + access_token: Globus access token for authentication. + recursive: Whether to delete recursively (auto-detected for directories). + + Returns: + Complete bash script content with Slurm directives. + + Raises: + NotImplementedError: If multiple paths are provided (batch mode not implemented). + """ + recursive_flag = "--recursive \\\n" if recursive else "" + + if len(paths) == 1: + delete_target = f'{endpoint}:"{paths[0]}"' + else: + raise NotImplementedError("Batch deletion of multiple paths not yet implemented") + + script = dedent(f"""\ +#!/bin/bash +#SBATCH --partition={SLURM_PARTITION} +#SBATCH --time={SLURM_TIME} +#SBATCH --mem={SLURM_MEMORY} +#SBATCH --cpus-per-task={SLURM_CPUS} + +echo "Starting cleanup job" +echo "Endpoint: {endpoint}" +echo "Paths to delete: {' '.join(paths)}" +echo "Started at: $(date)" + +export GLOBUS_CLI_ACCESS_TOKEN="{access_token}" + +TRANSFER_JOB_ID=$(echo "$SLURM_JOB_DEPENDENCY" | grep -oP 'afterok:\\K[0-9]+' | head -n1) + +if [ -z "$TRANSFER_JOB_ID" ]; then + echo "Error: Could not determine transfer job ID from dependency" + exit 1 +fi + +echo "Transfer job ID: $TRANSFER_JOB_ID" + +TRANSFER_OUTPUT=$(scontrol show job "$TRANSFER_JOB_ID" | grep -oP 'StdOut=\\K[^ ]+') + +if [ -z "$TRANSFER_OUTPUT" ] || [ ! 
-f "$TRANSFER_OUTPUT" ]; then + echo "Error: Cannot find transfer job output file: $TRANSFER_OUTPUT" + exit 1 +fi + +echo "Transfer job output file: $TRANSFER_OUTPUT" + +TRANSFER_EXIT_CODE=$(scontrol show job "$TRANSFER_JOB_ID" | grep -oP 'ExitCode=\\K[0-9]+:[0-9]+' | cut -d: -f1) + +if [ "$TRANSFER_EXIT_CODE" != "0" ]; then + echo "Error: Transfer job failed with exit code: $TRANSFER_EXIT_CODE" + echo "Aborting cleanup to preserve source files." + exit 1 +fi + +GLOBUS_TASK_ID=$(grep "Transfer task ID:" "$TRANSFER_OUTPUT" | tail -n1 | awk '{{print $NF}}') + +if [ -z "$GLOBUS_TASK_ID" ]; then + echo "Error: Could not extract Globus task ID from transfer output" + exit 1 +fi + +echo "Globus task ID: $GLOBUS_TASK_ID" + +echo "Verifying Globus transfer status..." +GLOBUS_STATUS=$(globus task show "$GLOBUS_TASK_ID" -F json --jmespath 'status') +GLOBUS_STATUS=${{GLOBUS_STATUS#\\"}} +GLOBUS_STATUS=${{GLOBUS_STATUS%\\"}} +GLOBUS_STATUS=$(echo "$GLOBUS_STATUS" | xargs) + +if [ "$GLOBUS_STATUS" != "SUCCEEDED" ]; then + echo "Error: Globus transfer status is not SUCCEEDED: $GLOBUS_STATUS" + echo "Aborting cleanup to preserve source files." + exit 1 +fi + +echo "Transfer verified as successful. Proceeding with cleanup..." + +TASK_ID=$(globus delete \\ + --label "Cleanup after transfer" \\ + {recursive_flag}-F json \\ + --jmespath 'task_id' \\ + {delete_target}) + +if [ -z "$TASK_ID" ]; then + echo "Error: Failed to submit delete task" + exit 1 +fi + +TASK_ID=${{TASK_ID#\\"}} +TASK_ID=${{TASK_ID%\\"}} +TASK_ID=$(echo "$TASK_ID" | xargs) +echo "Delete task ID: $TASK_ID" + +echo "Waiting for deletion to complete..." +globus task wait "$TASK_ID" --polling-interval 10 + +STATUS=$(globus task show "$TASK_ID" -F json --jmespath 'status') +STATUS=${{STATUS#\\"}} +STATUS=${{STATUS%\\"}} +STATUS=$(echo "$STATUS" | xargs) +echo "Deletion status: $STATUS" +echo "Completed at: $(date)" + +if [ "$STATUS" = "SUCCEEDED" ]; then + echo "Cleanup completed successfully" + + echo "Revoking Globus access token..." + globus session revoke --force 2>/dev/null || echo "Token revocation completed (or already revoked)" + echo "Token revoked successfully" + + exit 0 +else + echo "Cleanup failed" + exit 1 +fi +""") + return script + + +def submit_slurm_job(script_content, job_name, dependency=None, log_dir=None): + """Submit a Slurm job using sbatch. + + Writes script content to a timestamped file, adds Slurm directives for job name, + output/error files, working directory, and optional job dependency, then submits + via sbatch command. + + Args: + script_content: Content of the bash script to execute. + job_name: Name for the Slurm job. + dependency: Job ID to depend on (afterok dependency). + log_dir: Directory for job scripts and output files. If None, uses current directory. + + Returns: + Job ID of submitted job, or None on failure. + """ + if log_dir is None: + log_dir = "." 
+ + # Ensure log directory exists + os.makedirs(log_dir, exist_ok=True) + + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + script_filename = os.path.join(log_dir, f"{job_name}_{timestamp}.sh") + output_file = os.path.join(log_dir, f"{job_name}_{timestamp}.out") + error_file = os.path.join(log_dir, f"{job_name}_{timestamp}.err") + + if "#SBATCH" not in script_content: + slurm_header = dedent(f"""#!/bin/bash + #SBATCH --job-name={job_name} + #SBATCH --output={output_file} + #SBATCH --error={error_file} + #SBATCH --chdir={os.path.abspath(log_dir)} + """) + if dependency is not None: + slurm_header += f"#SBATCH --dependency=afterok:{dependency}\n" + script_content = slurm_header + "\n" + script_content + else: + lines = script_content.split('\n') + insert_index = 1 + for i, line in enumerate(lines): + if line.strip().startswith('#SBATCH'): + insert_index = i + 1 + additional_directives = [ + f"#SBATCH --job-name={job_name}", + f"#SBATCH --output={output_file}", + f"#SBATCH --error={error_file}", + f"#SBATCH --chdir={os.path.abspath(log_dir)}" + ] + if dependency is not None: + additional_directives.append(f"#SBATCH --dependency=afterok:{dependency}") + lines[insert_index:insert_index] = additional_directives + script_content = '\n'.join(lines) + + with open(script_filename, 'w') as f: + f.write(script_content) + os.chmod(script_filename, 0o755) + + try: + result = subprocess.run( + ['sbatch', script_filename], + capture_output=True, + text=True, + check=True + ) + output = result.stdout.strip() + job_id = None + if "Submitted batch job" in output: + job_id = int(output.split()[-1]) + else: + try: + job_id = int(output) + except ValueError: + job_id = None + print(f"Submitted job {job_name} with ID: {job_id}") + print(f"Script: {script_filename}") + print(f"Output: {output_file}") + print(f"Error: {error_file}") + return job_id + except subprocess.CalledProcessError as e: + print(f"Error submitting job {job_name}: {e}", file=sys.stderr) + print(f"sbatch output: {e.stdout}", file=sys.stderr) + print(f"sbatch error: {e.stderr}", file=sys.stderr) + return None + except Exception as e: + print(f"Error submitting job {job_name}: {e}", file=sys.stderr) + return None + + +def main(): + """Main function orchestrating Globus transfer workflow. + + Parses command-line arguments, authenticates with Globus, detects source type + (file or directory), constructs destination path, and submits Slurm jobs for + transfer and optional cleanup. + """ + parser = argparse.ArgumentParser( + description=f'Submit Slurm jobs for Globus transfers from Biowulf to Skyline with optional cleanup\n\n' + f'Destination: {DEST_BASE_DIR}/\n' + f'Log Directory: {PREFERRED_LOG_BASE}/ (or {FALLBACK_LOG_BASE}/)\n\n' + f'Resource Allocation:\n' + f' Partition: {SLURM_PARTITION}\n' + f' Memory: {SLURM_MEMORY}\n' + f' CPUs: {SLURM_CPUS}\n' + f' Time: {SLURM_TIME}', + formatter_class=argparse.RawDescriptionHelpFormatter, + epilog=dedent(""" + This script transfers files from Biowulf to Skyline using Slurm-managed Globus transfers. + The destination path is automatically constructed as: /data/rtb_idss/starfish/archive/ + + Log files (job scripts, stdout, stderr) are stored in a subdirectory named after the sanitized + label. The script first attempts to use /data/OpenOmics/dev/globus/ as the base directory. + If this location is not writable, it falls back to ~/.globus in your home directory. + + The script automatically detects whether the source is a file or directory and sets + the recursive flag accordingly. 
Destination directories are created automatically. + + Examples: + # Transfer a directory from Biowulf to Skyline + %(prog)s --source /data/user/project --label "Project Archive" + + # Transfer with cleanup (delete source after successful transfer) + %(prog)s --source /data/user/project --label "Project Archive" --cleanup + + # Transfer a single file + %(prog)s --source /data/user/file.txt --label "File backup" + """) + ) + + parser.add_argument( + '--source', + required=True, + help=f'Source path on {SOURCE_ALIAS} (Biowulf /data/)' + ) + + parser.add_argument( + '--label', + required=True, + help='Label for the transfer task (used to construct destination directory name)' + ) + + parser.add_argument( + '--cleanup', + action='store_true', + help='Submit a cleanup job to delete source files after successful transfer' + ) + + args = parser.parse_args() + + # Pre-flight checks on source path + source_path = os.path.abspath(args.source) + if not os.path.exists(source_path): + print(f"Error: Source path '{source_path}' does not exist or is not accessible.", file=sys.stderr) + sys.exit(1) + + try: + validate_source_path(source_path) + except ValueError as e: + print(str(e), file=sys.stderr) + sys.exit(1) + + # Authenticate with Globus + access_token, refresh_token, expires_at = authenticate_globus() + + if not access_token: + print("Error: Authentication failed. Cannot proceed with transfer.", file=sys.stderr) + sys.exit(1) + + # Set up log directory structure + log_base = get_log_base_directory() + sanitized_label_dir = sanitize_directory_name(args.label) + log_dir = os.path.join(log_base, sanitized_label_dir) + + # Create log directory if it doesn't exist + try: + os.makedirs(log_dir, exist_ok=True) + print(f"\nLog directory: {log_dir}") + except OSError as e: + print(f"Error: Could not create log directory '{log_dir}': {e}", file=sys.stderr) + sys.exit(1) + + if os.path.isdir(source_path): + recursive = True + elif os.path.isfile(source_path): + recursive = False + else: + print(f"Error: Source path '{source_path}' does not exist or is not accessible.", file=sys.stderr) + sys.exit(1) + + dest_path = construct_dest_path(args.source, args.label) + + print("=" * 70) + print("Globus Transfer Job Submission (Biowulf -> Skyline)") + print("=" * 70) + print(f"Source: {SOURCE_ALIAS} ({SOURCE_ENDPOINT}):{args.source}") + print(f"Source Type: {'Directory' if recursive else 'File'}") + print(f"Destination: {DEST_ALIAS} ({DEST_ENDPOINT}):{dest_path}") + print(f"Label: {args.label}") + print(f"Recursive: {recursive}") + print(f"Cleanup: {args.cleanup}") + print(f"Partition: {SLURM_PARTITION}") + print(f"Memory: {SLURM_MEMORY}") + print(f"CPUs: {SLURM_CPUS}") + print(f"Time: {SLURM_TIME}") + print("=" * 70) + + transfer_script = create_transfer_script( + SOURCE_ENDPOINT, + DEST_ENDPOINT, + args.source, + dest_path, + args.label, + access_token, + recursive + ) + + sanitized_label = sanitize_job_name(args.label)[:50] + transfer_job_name = f"GLOBUS_transfer_{sanitized_label}" + + transfer_job_id = submit_slurm_job( + transfer_script, + transfer_job_name, + log_dir=log_dir + ) + + if transfer_job_id is None: + print("Failed to submit transfer job", file=sys.stderr) + sys.exit(1) + + print(f"\nTransfer job submitted: {transfer_job_id}") + + if args.cleanup: + print("\nPreparing cleanup job...") + cleanup_script = create_cleanup_script( + SOURCE_ENDPOINT, + [args.source], + access_token, + recursive + ) + cleanup_job_name = f"GLOBUS_cleanup_{sanitized_label}" + + cleanup_job_id = submit_slurm_job( + cleanup_script,
+ cleanup_job_name, + dependency=transfer_job_id, + log_dir=log_dir + ) + + if cleanup_job_id is None: + print("Failed to submit cleanup job", file=sys.stderr) + print(f"Transfer job {transfer_job_id} is still running") + sys.exit(1) + + print(f"\nCleanup job submitted: {cleanup_job_id}") + print(f"Cleanup job will run after transfer job {transfer_job_id} completes successfully") + print("Note: Globus access token will be revoked after cleanup completes") + + print("\n" + "=" * 70) + print("Job submission complete!") + print("=" * 70) + print("\nMonitor jobs with: squeue -u $USER") + print(f"Cancel jobs with: scancel {transfer_job_id}" + + (f" {cleanup_job_id}" if args.cleanup else "")) + print(f"\nJob scripts and output files are in: {log_dir}") + + +if __name__ == "__main__": + main() diff --git a/docs/requirements.txt b/docs/requirements.txt index dbe98a1..4abc60c 100644 --- a/docs/requirements.txt +++ b/docs/requirements.txt @@ -1,34 +1,28 @@ babel>=2.9.1 -click==7.1.2 -future==0.18.2 -gitdb==4.0.5 -GitPython==3.1.7 -htmlmin==0.1.12 +click +future +gitdb +GitPython importlib-metadata>=3.10 -Jinja2==2.11.3 -joblib==0.16.0 -jsmin==3.0.0 -livereload==2.6.1 -lunr==0.5.8 -Markdown==3.2.2 -MarkupSafe==1.1.1 +Jinja2 +joblib +jsmin +livereload +lunr +Markdown +MarkupSafe mkdocs>=1.3.0 -mkdocs-awesome-pages-plugin==2.2.1 -mkdocs-git-revision-date-localized-plugin==0.7 +mkdocs-awesome-pages-plugin +mkdocs-git-revision-date-localized-plugin mkdocs-material mkdocs-material-extensions -mkdocs-minify-plugin==0.3.0 -mkdocs-redirects==1.0.1 +mkdocs-minify-plugin +mkdocs-redirects nltk>=3.6.6 pygments>=2.12 pymdown-extensions -pytz==2020.1 -PyYAML>=5.4 -regex -six==1.15.0 -smmap==3.0.4 -tornado==6.0.4 -tqdm==4.48.2 -zipp==3.1.0 mkdocs-git-revision-date-plugin mike +mkdocstrings +mkdocstrings-python +griffe>=0.25.0 diff --git a/docs/usage/globus_transfer.md b/docs/usage/globus_transfer.md new file mode 100644 index 0000000..d1d89c5 --- /dev/null +++ b/docs/usage/globus_transfer.md @@ -0,0 +1,485 @@ +# Globus Transfer with Slurm + +## Introduction + +The `slurm_transfer.py` script provides an automated, managed approach to transferring large datasets between Biowulf and Skyline using Globus. This script leverages Slurm job scheduling to handle long-running transfers that can span up to 7 days, with optional automated cleanup of source files after successful transfer completion. 
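+
+Under the hood, the script submits the transfer as one Slurm batch job and, when requested, a cleanup job chained to it with an `afterok` dependency. A minimal sketch of that submission pattern (with placeholder script names, not the script's actual code):
+
+```python
+import subprocess
+
+# Submit the transfer job; sbatch replies "Submitted batch job <id>".
+result = subprocess.run(
+    ["sbatch", "transfer.sh"],  # placeholder batch script
+    capture_output=True, text=True, check=True,
+)
+transfer_id = result.stdout.strip().split()[-1]
+
+# The cleanup job is held until the transfer job exits with status 0.
+subprocess.run(
+    ["sbatch", f"--dependency=afterok:{transfer_id}", "cleanup.sh"],
+    check=True,
+)
+```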
+ +**Key Features:** + +- **Automated Slurm Job Management**: Submits Globus transfers as Slurm batch jobs with predefined resource allocations +- **Fixed Endpoint Support**: Transfers only from Biowulf to Skyline endpoints +- **Pre-Flight Validation**: Checks path depth and warns about large file counts before transfer +- **Smart Path Detection**: Automatically detects whether the source is a file or directory +- **Destination Path Construction**: Builds organized destination paths based on user-provided labels +- **Optional Cleanup**: Safely deletes source files after verifying successful transfer +- **Token Management**: Handles Globus authentication and automatic token revocation +- **Job Dependencies**: Ensures cleanup only runs after successful transfer completion +- **Comprehensive Logging**: Generates detailed output and error logs for all operations + +## Supported Endpoints + +This script supports **only** the following fixed Globus endpoint pair: + +| Endpoint | UUID | Alias | Purpose | +|----------|------|-------|---------| +| **Biowulf** | `e2620047-6d04-11e5-ba46-22000b92c6ec` | biowulf | Source endpoint | +| **Skyline** | `e2dff37b-3468-4c3a-b5d7-5a00802e20ab` | skyline | Destination endpoint | + +**Note**: The Skyline endpoint is mounted at `/data/`. The destination base directory is set to `/rtb_idss/starfish/archive` (excluding the `/data/` prefix). + +## Resource Allocation + +All Slurm jobs submitted by this script use the following fixed resource allocation: + +| Resource | Value | Description | +|----------|-------|-------------| +| Partition | `norm` | Standard partition for regular jobs | +| Memory | `16g` | 16 GB of RAM | +| CPUs | `4` | 4 CPU cores per task | +| Time Limit | `7-00:00:00` | 7 days maximum runtime | + +## Pre-Flight Validation + +Before submitting any jobs or requesting Globus authentication, the script performs safety checks: + +### Path Depth Requirement + +**Required**: Source path must be at least 4 levels deep. + +✅ **Valid**: `/data/username/project/dataset` (4 levels) +❌ **Invalid**: `/data/username/project` (3 levels) + +**Why?** This prevents accidental transfer/deletion of high-level directories, especially important when using `--cleanup`. + +### Large File Count Warning + +If the source directory contains **5,000 or more files**, you'll receive a warning: + +``` +Scanning source directory for file count... +Total files found: 12,345 + +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +WARNING: Large Number of Files Detected +!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! +The source directory contains 12,345 files. + +This is a significant number of files to transfer. Please consider: + - Removing temporary files (*.tmp, *.temp) + - Cleaning up intermediate pipeline outputs + - Removing log files or consolidating them + - Archiving or removing old working directories +... + +Do you want to proceed with the transfer anyway? (yes/no): +``` + +You can choose to: +- **Proceed**: Type `yes` to continue with the transfer +- **Cancel**: Type `no` to cancel and clean up files first + +!!! tip "Cleaning Up Before Transfer" + Consider removing pipeline temporary files, logs, and intermediate outputs to reduce transfer time and storage usage. See [Pre-Flight Checks Documentation](../PRE_FLIGHT_CHECKS.md) for cleanup strategies. 
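+
+For reference, a simplified sketch of the two pre-flight checks described above; the actual script prints a fuller warning and progress messages:
+
+```python
+import os
+import sys
+
+def preflight(source_path: str, max_files: int = 5000) -> None:
+    """Simplified version of the script's pre-flight validation."""
+    # Depth check: require at least 4 non-empty path components.
+    parts = [p for p in source_path.rstrip("/").split("/") if p]
+    if len(parts) < 4:
+        sys.exit(f"'{source_path}' is {len(parts)} level(s) deep; 4 required.")
+    # File-count check: confirm before proceeding past the threshold.
+    count = sum(len(files) for _, _, files in os.walk(source_path))
+    if count >= max_files:
+        answer = input(f"{count:,} files found. Proceed? (yes/no): ")
+        if answer.strip().lower() not in ("yes", "y"):
+            sys.exit("Transfer cancelled.")
+```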
+ +## Usage + +### Basic Transfer + +Transfer a directory from Biowulf to Skyline: + +```bash +./slurm_transfer.py \ + --source /data/user/project \ + --label "Project Archive" +``` + +This will: + +1. Prompt for Globus authentication +2. Create destination path: `/data/rtb_idss/starfish/archive/Project_Archive/project` +3. Submit a Slurm transfer job +4. Monitor the transfer until completion + +### Transfer with Cleanup + +Transfer a directory and automatically delete the source after successful completion: + +```bash +./slurm_transfer.py \ + --source /data/user/project \ + --label "Project Archive" \ + --cleanup +``` + +This will submit two Slurm jobs: + +1. **Transfer job**: Copies files from Biowulf to Skyline +2. **Cleanup job**: Runs only after successful transfer, verifies completion, deletes source files, and revokes the Globus token + +### Transfer a Single File + +The script automatically detects file vs. directory sources: + +```bash +./slurm_transfer.py \ + --source /data/user/important_file.txt \ + --label "File Backup" +``` + +## Authentication Workflow + +When you run the script, it will initiate Globus authentication: + +``` +====================================================================== +Globus Authentication Required +====================================================================== + +Please visit this URL to authenticate: +https://auth.globus.org/v2/oauth2/authorize?... + +Enter the authorization code: +``` + +**Steps:** + +1. Open the provided URL in your browser +2. Authenticate with your NIH credentials +3. Copy the authorization code +4. Paste it into the terminal prompt +5. The script will exchange the code for access and refresh tokens + +**Token Lifespan:** Tokens are session-based with extended expiry suitable for multi-day transfers. When using `--cleanup`, tokens are automatically revoked after all operations complete. + +## Log Directory Structure + +All job scripts, output files, and error logs are organized in a structured directory hierarchy: + +### Primary Log Location + +By default, logs are stored at: +``` +/data/OpenOmics/dev/globus/<sanitized_label>/ +``` + +**Example**: For a transfer with label `"Project Archive 2023"`, logs will be in: +``` +/data/OpenOmics/dev/globus/Project_Archive_2023/ +``` + +### Fallback Location + +If `/data/OpenOmics/dev/globus/` is not writable (permission denied or doesn't exist), the script automatically falls back to: +``` +~/.globus/<sanitized_label>/ +``` + +The script will create this directory if it doesn't exist. + +### Log Files Generated + +Within the log directory, you'll find timestamped files: + +| File Pattern | Description | +|--------------|-------------| +| `GLOBUS_transfer_