Concurrent fetches shared a single requests.Session dict on the transport instance, causing 401 failures and auth header corruption under load (e.g. 244/382 requests failing at fetch_concurrency=8 with cookie auth). Fix uses threading.local() so each worker thread gets its own session cache. A central registry (_all_thread_session_dicts) lets cleanup() close all sessions regardless of which thread created them. Adds a regression test verifying that concurrent calls to get_session() return distinct Session objects per thread.

Update Pipfile.lock
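The per-thread session cache from the session fix can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `ThreadLocalSessionCache`, the `DummySession` stand-in for `requests.Session`, and keying the cache by base URL are all assumptions made for the example. Only `get_session()`, `cleanup()`, and `_all_thread_session_dicts` are names taken from the commit message.

```python
import threading


class DummySession:
    """Hypothetical stand-in for requests.Session, so the sketch has no dependencies."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ThreadLocalSessionCache:
    """Gives each thread its own {key: session} dict, tracked in a shared registry."""

    def __init__(self, session_factory=DummySession):
        self._session_factory = session_factory
        self._local = threading.local()
        self._registry_lock = threading.Lock()
        # Every per-thread dict is recorded here so cleanup() can reach all of them.
        self._all_thread_session_dicts = []

    def _thread_sessions(self):
        sessions = getattr(self._local, "sessions", None)
        if sessions is None:
            sessions = {}
            self._local.sessions = sessions
            with self._registry_lock:
                self._all_thread_session_dicts.append(sessions)
        return sessions

    def get_session(self, key):
        # Only the calling thread touches its own dict, so no lock is needed here.
        sessions = self._thread_sessions()
        if key not in sessions:
            sessions[key] = self._session_factory()
        return sessions[key]

    def cleanup(self):
        # Close sessions created by any thread; the dicts stay registered for reuse.
        with self._registry_lock:
            dicts = list(self._all_thread_session_dicts)
        for sessions in dicts:
            for session in sessions.values():
                session.close()
            sessions.clear()
```

Two calls to `get_session()` with the same key from one thread return the same object, while calls from different threads return different objects, so auth headers and cookies are never shared across workers.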
Using thread.ident as a dict key caused the test to fail on Linux (Python 3.9-3.12) because short-lived threads can have their OS thread ID recycled before the next thread starts, producing dict collisions. Replaced with a pre-allocated list indexed by thread ordinal, which is stable regardless of OS thread scheduling.
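A sketch of the result-collection pattern the test fix describes, assuming a hypothetical helper named `run_in_threads` (not a function from this PR). Each thread writes into a slot chosen by its launch ordinal, so the outcome does not depend on thread identifiers, which Python may reuse once a thread has exited.

```python
import threading


def run_in_threads(fn, count):
    """Run fn() in `count` threads and return the results in launch order."""
    results = [None] * count  # one pre-allocated slot per thread ordinal

    def worker(ordinal):
        # The ordinal is fixed at thread creation, so slots can never collide.
        results[ordinal] = fn()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Collecting into a dict keyed by `threading.get_ident()` instead could yield fewer than `count` entries if an identifier is reused, which is the failure mode the commit message reports.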
Implements #44 (Parallel/concurrent fetch support): adds support for concurrent/parallel file fetching via ThreadPoolExecutor.
Opt-in via CLI --fetch-concurrency N argument or API fetch_concurrency=N parameter on resolve_fetch() and materialize(). Default remains serial (concurrency=1) for full backward compatibility.
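The opt-in behaviour can be sketched as below. This is an illustration under stated assumptions rather than bdbag's implementation: `fetch_all` and `fetch_one` are hypothetical names, and only the `fetch_concurrency` parameter name and its serial default of 1 come from the description above.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(entries, fetch_one, fetch_concurrency=1):
    """Fetch every entry, serially by default, concurrently when asked to."""
    if fetch_concurrency <= 1:
        # Default path: identical to the pre-existing serial behaviour.
        return [fetch_one(entry) for entry in entries]
    with ThreadPoolExecutor(max_workers=fetch_concurrency) as pool:
        # pool.map yields results in input order, regardless of completion order.
        return list(pool.map(fetch_one, entries))
```

Because the serial branch never creates a pool, callers that do not pass `fetch_concurrency` see no change in behaviour.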
* New configuration keys in bdbag.json:
  * max_concurrent_fetches (default 8): ceiling for the effective concurrency. The requested concurrency is clamped to this value.
  * concurrent_fetch_exclude_schemes (default ["globus"]): transport schemes that are always fetched serially, even when concurrent fetching is enabled.
* Thread-safety improvements: os.makedirs in fetch/__init__.py now uses exist_ok=True to avoid race conditions.
* HTTP session creation in fetch_http.py is now protected by a threading.Lock.
* Fetcher instance creation uses double-checked locking to prevent duplicate transport instantiation.
* Ctrl+C (SIGINT) handling during concurrent fetches: a custom signal handler sets a cancellation event that causes in-flight workers to abort promptly, with clean shutdown and KeyboardInterrupt re-raised to the caller.
* CLI now catches KeyboardInterrupt and exits cleanly with an "Interrupted by user." message instead of a traceback.
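The SIGINT handling described above might look roughly like this. It is a sketch, not the PR's code: `fetch_worker`, `fetch_all_interruptible`, and the chunked sleep loop that simulates a download are all hypothetical. The handler is installed only when the function runs on the main thread, since `signal.signal` cannot be called from any other thread.

```python
import signal
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_worker(item, cancel_event, steps=5, delay=0.01):
    """Hypothetical worker: simulates a chunked download, checking for cancellation."""
    for _ in range(steps):
        if cancel_event.is_set():
            return (item, "cancelled")
        time.sleep(delay)
    return (item, "done")


def fetch_all_interruptible(items, concurrency=4, worker=fetch_worker):
    cancel_event = threading.Event()

    def on_sigint(signum, frame):
        # Only set a flag here; the workers notice it and stop on their own.
        cancel_event.set()

    on_main_thread = threading.current_thread() is threading.main_thread()
    previous = signal.signal(signal.SIGINT, on_sigint) if on_main_thread else None
    try:
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            futures = [pool.submit(worker, item, cancel_event) for item in items]
            results = [future.result() for future in futures]
    finally:
        # Restore the previous handler once the pool has shut down.
        if on_main_thread and previous is not None:
            signal.signal(signal.SIGINT, previous)
    if cancel_event.is_set():
        # Re-raise to the caller after the workers have stopped cleanly.
        raise KeyboardInterrupt()
    return results
```

A caller such as a CLI entry point can then wrap the call in `try`/`except KeyboardInterrupt` and print a short message instead of a traceback.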
* Dropped support for Python < 3.9.
* Fixed SSL certificate verification bypass whitelist matching to use origin-based comparison instead of substring matching, preventing unintended domain matches.
* Fixed keychain authentication entry matching to use URL prefix comparison instead of substring matching, preventing credential leakage to unintended hosts.
* Replaced use of `eval()` for numeric filter comparisons with `operator` module functions.
* Removed obsolete Python version checks throughout the codebase.
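Two of the fixes above lend themselves to a short illustration: origin-based matching in place of substring matching, and `operator` functions in place of `eval()`. The helper names (`origin`, `same_origin`, `numeric_compare`) and the set of supported comparison symbols are assumptions for this sketch, not identifiers from the codebase.

```python
import operator
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return (scheme, host, port), filling in the default port when omitted."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or _DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url, allowed):
    # Compares whole origins, so "example.org.evil.test" cannot match "example.org".
    return origin(url) == origin(allowed)


# Hypothetical mapping from comparison symbols to operator functions.
_COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def numeric_compare(left, op, right):
    """Compare two values numerically without evaluating any user-supplied code."""
    try:
        compare = _COMPARATORS[op]
    except KeyError:
        raise ValueError(f"unsupported operator: {op!r}") from None
    return compare(float(left), float(right))
```

A substring test would accept `https://example.org.evil.test/x` for an allowed entry of `https://example.org`, because the allowed string occurs inside the URL; comparing parsed origins rejects it. Likewise, looking the comparison up in a fixed table means an unexpected operator string raises `ValueError` instead of being executed.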