1.9.0-dev #64

Open
mikedarcy wants to merge 7 commits into master from 1.9.0-dev
mikedarcy (Collaborator) commented Jan 30, 2026

  • Implement #44 (Parallel/concurrent fetch support): add support for concurrent/parallel file fetching via ThreadPoolExecutor.
    Opt-in via CLI --fetch-concurrency N argument or API fetch_concurrency=N parameter on resolve_fetch() and materialize(). Default remains serial (concurrency=1) for full backward compatibility.

  • New configuration keys in bdbag.json:
    max_concurrent_fetches (default 8): ceiling for the effective concurrency. The requested concurrency is clamped to this value.
    concurrent_fetch_exclude_schemes (default ["globus"]): transport schemes that are always fetched serially, even when concurrent fetching is enabled.

  • Thread-safety improvements: os.makedirs in fetch/__init__.py now uses exist_ok=True to avoid race conditions.

  • HTTP session creation in fetch_http.py is now protected by a threading.Lock.

  • Fetcher instance creation uses double-checked locking to prevent duplicate transport instantiation.

  • Ctrl+C (SIGINT) handling during concurrent fetches: a custom signal handler sets a cancellation event that causes in-flight workers to abort promptly, with clean shutdown and KeyboardInterrupt re-raised to the caller.

  • CLI now catches KeyboardInterrupt and exits cleanly with an "Interrupted by user." message instead of a traceback.

  • Dropped support for Python < 3.9.

  • Fixed SSL certificate verification bypass whitelist matching to use origin-based comparison instead of substring matching, preventing unintended domain matches.

  • Fixed keychain authentication entry matching to use URL prefix comparison instead of substring matching, preventing credential leakage to unintended hosts.

  • Replaced use of eval() for numeric filter comparisons with operator module functions.

  • Removed obsolete Python version checks throughout the codebase.
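
To illustrate why the whitelist fix above matters, here is a minimal sketch of origin-based comparison next to the substring check it replaces. The helper names (`origin`, `bypass_matches`) and the example hosts are hypothetical and are not taken from the bdbag code; the sketch only assumes that an "origin" means scheme, host, and port.

```python
from urllib.parse import urlsplit

# Default ports, so "https://host" and "https://host:443" compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return (scheme, host, port) for a URL, filling in default ports."""
    parts = urlsplit(url)
    scheme = (parts.scheme or "").lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def bypass_matches(url, whitelist):
    """True only if the URL's origin equals the origin of a whitelist entry."""
    return any(origin(url) == origin(entry) for entry in whitelist)


whitelist = ["https://example.org"]
hostile = "https://example.org.evil.test/data.bin"

# Substring matching accepts the hostile host because the entry is a prefix of it.
substring_result = any(entry in hostile for entry in whitelist)
# Origin matching compares the parsed host exactly and rejects it.
origin_result = bypass_matches(hostile, whitelist)
```

With these inputs the substring check matches the hostile URL while the origin check does not, which is the class of unintended domain match the description refers to.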

mikedarcy self-assigned this Jan 30, 2026
mikedarcy marked this pull request as draft January 30, 2026 03:23
mikedarcy added this to the 1.9 milestone Jan 30, 2026
coveralls commented Jan 30, 2026

Coverage Status

coverage: 92.682% (+1.4%) from 91.25% when pulling 4de3a48 on 1.9.0-dev into 539a9ba on master.

…a ThreadPoolExecutor.

  Opt-in via CLI --fetch-concurrency N argument or API fetch_concurrency=N parameter on resolve_fetch() and materialize(). Default remains serial (concurrency=1) for full backward compatibility.
* New configuration keys in bdbag.json:
  max_concurrent_fetches (default 8): ceiling for the effective concurrency. The requested concurrency is clamped to this value.
  concurrent_fetch_exclude_schemes (default ["globus"]): transport schemes that are always fetched serially, even when concurrent fetching is enabled.
* Thread-safety improvements: os.makedirs in fetch/__init__.py now uses exist_ok=True to avoid race conditions.
* HTTP session creation in fetch_http.py is now protected by a threading.Lock.
* Fetcher instance creation uses double-checked locking to prevent duplicate transport instantiation.
* Ctrl+C (SIGINT) handling during concurrent fetches: a custom signal handler sets a cancellation event that causes in-flight workers to abort promptly, with clean shutdown and KeyboardInterrupt re-raised to the caller.
* CLI now catches KeyboardInterrupt and exits cleanly with an "Interrupted by user." message instead of a traceback.
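
The opt-in concurrency, the clamping to max_concurrent_fetches, and the cancellation event described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the bdbag implementation: `effective_concurrency`, `fetch_all`, and the `fetch_one` callback are hypothetical names, and the signal handler that would set the event on Ctrl+C is omitted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def effective_concurrency(requested, max_concurrent_fetches=8):
    """Clamp the requested worker count to the range [1, max_concurrent_fetches]."""
    return max(1, min(requested, max_concurrent_fetches))


def fetch_all(urls, fetch_one, requested=1, cancel=None):
    """Call fetch_one(url) for every URL and return results in input order."""
    if cancel is None:
        cancel = threading.Event()

    def worker(url):
        # Each worker checks the shared event so a cancellation skips remaining work.
        if cancel.is_set():
            return None
        return fetch_one(url)

    workers = effective_concurrency(requested)
    if workers == 1:
        # Serial path: the default, kept for backward compatibility.
        return [worker(url) for url in urls]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order even when workers finish out of order.
        return list(pool.map(worker, urls))


# A request for 32 workers is clamped to the default ceiling of 8.
results = fetch_all(["a", "b", "c"], str.upper, requested=32)
```

In a real handler the event would be set from the SIGINT callback, and the caller would re-raise KeyboardInterrupt after the pool shuts down.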
  Concurrent fetches shared a single requests.Session dict on the transport
  instance, causing 401 failures and auth header corruption under load (e.g.
  244/382 requests failing at fetch_concurrency=8 with cookie auth). Fix uses
  threading.local() so each worker thread gets its own session cache. A
  central registry (_all_thread_session_dicts) lets cleanup() close all
  sessions regardless of which thread created them. Adds a regression test
  verifying that concurrent calls to get_session() return distinct Session
  objects per thread.
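
The per-thread session cache with a central registry might look roughly like the sketch below. Only `threading.local()`, `_all_thread_session_dicts`, `get_session()`, and `cleanup()` are named in the commit message; the class name, the lock, the factory argument, and the `FakeSession` stand-in (used here instead of `requests.Session` to keep the example dependency-free) are assumptions.

```python
import threading


class SessionCache:
    """Per-thread session cache with a central registry used for cleanup."""

    def __init__(self, session_factory):
        self._factory = session_factory
        self._local = threading.local()
        self._registry_lock = threading.Lock()
        self._all_thread_session_dicts = []

    def _sessions(self):
        # Lazily create this thread's private dict and record it in the registry.
        sessions = getattr(self._local, "sessions", None)
        if sessions is None:
            sessions = self._local.sessions = {}
            with self._registry_lock:
                self._all_thread_session_dicts.append(sessions)
        return sessions

    def get_session(self, base_url):
        sessions = self._sessions()
        if base_url not in sessions:
            sessions[base_url] = self._factory()
        return sessions[base_url]

    def cleanup(self):
        # Close every session, regardless of which thread created it.
        with self._registry_lock:
            dicts = list(self._all_thread_session_dicts)
        for sessions in dicts:
            for session in list(sessions.values()):
                session.close()
            sessions.clear()


class FakeSession:
    """Stand-in for a real HTTP session object (hypothetical)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


cache = SessionCache(FakeSession)
seen = [None, None]


def worker(index):
    seen[index] = cache.get_session("https://example.org")


threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the registry holds a reference to each thread's dict, the sessions stay reachable for `cleanup()` even after the worker threads have exited.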

Update Pipfile.lock
  Using thread.ident as a dict key caused the test to fail on Linux (Python
  3.9-3.12) because short-lived threads can have their OS thread ID recycled
  before the next thread starts, producing dict collisions. Replaced with a
  pre-allocated list indexed by thread ordinal, which is stable regardless
  of OS thread scheduling.
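
The test fix described above, collecting per-thread results in a pre-allocated list instead of a dict keyed by thread.ident, can be sketched like this. The function name `collect_per_thread` and its arguments are hypothetical; the actual regression test may be structured differently.

```python
import threading


def collect_per_thread(n_threads, make_value):
    """Run make_value() once per thread and return the results by thread ordinal."""
    # One pre-allocated slot per ordinal: no key can collide, even if the OS
    # reuses a thread ID after a short-lived thread exits.
    results = [None] * n_threads

    def worker(ordinal):
        results[ordinal] = make_value()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


values = collect_per_thread(4, object)
```

Each thread writes only to its own index, so the result count always equals the thread count regardless of scheduling.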
mikedarcy marked this pull request as ready for review March 23, 2026 16:56