Release: season span-fetch + correctness/pool fixes #129
Merged
…m, cap workers to GEE pool (#128)

Two issues surfaced by a real 16-model season-analysis run:

1. Per-year fetching: seasons.fetch_and_analyze_years_fixed made one GEE call per year (N_years calls per model). Refactor to fetch the whole period (plus any year-crossing tail) in ONE call and slice per year in memory; the per-year helpers already filter by year/window, so results are identical. ~21x fewer fetches per model on a 21-year run.

2. Unsafe parallelism in season_analysis/ensemble.py: analyze_one_model relies on process-global state, which is not thread-safe: use_nex_gddp() monkeypatches seasons.get_climate_data, and redirect_stdout() swaps sys.stdout. Concurrent models clobbered each other's model/scenario binding (observed as a scenario=ssp245 leak under an ssp585 run). Revert this module to serial; the span-fetch above is the real speedup here and applies regardless of concurrency.

Also cap the auto worker count at 10 (was 16) across the parameter-based ensembles (hazards, climatology, statistics, periods) to match GEE's HTTP connection pool size and eliminate 'connection pool is full' churn. These four pass model/scenario as parameters and remain safely parallel.
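The span-fetch idea in item 1 can be illustrated with a minimal sketch. This is not the code from seasons.py: `fetch_span`, `per_year_windows`, the `calls` counter and the 60-day tail are hypothetical stand-ins, and the remote GEE call is replaced by a local function that only generates daily records.

```python
from datetime import date, timedelta

calls = {"n": 0}  # counts how many "remote" fetches were made


def fetch_span(start: date, end: date) -> list[tuple[date, float]]:
    """Hypothetical stand-in for one remote fetch; returns daily (date, value) pairs."""
    calls["n"] += 1
    n_days = (end - start).days + 1
    return [(start + timedelta(days=i), float(i)) for i in range(n_days)]


def per_year_windows(records, first_year, last_year, tail_days):
    """Slice one fetched span into per-year windows, each extended by a year-crossing tail."""
    out = {}
    for year in range(first_year, last_year + 1):
        lo = date(year, 1, 1)
        hi = date(year, 12, 31) + timedelta(days=tail_days)
        out[year] = [(d, v) for d, v in records if lo <= d <= hi]
    return out


first_year, last_year, tail_days = 2000, 2020, 60

# One fetch covering the whole period plus the tail of the final year.
span = fetch_span(
    date(first_year, 1, 1),
    date(last_year, 12, 31) + timedelta(days=tail_days),
)
by_year = per_year_windows(span, first_year, last_year, tail_days)

print(calls["n"], len(by_year))
```

Because each year's window is selected by date, the per-year slices should match what separate per-year fetches would have returned, while the number of remote calls drops from one per year to one per span.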
Summary
Promotes #128 from staging to main: span-fetch the season-analysis path (one GEE call per model instead of one per year), revert the unsafe parallelism in season_analysis/ensemble.py to serial (its global monkeypatch/redirect_stdout caused a scenario leak under threads), and cap auto workers at 10 to match the GEE connection pool across the parameter-based ensembles.
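A rough sketch of the pattern the parameter-based ensembles are described as using: each task receives its model and scenario as arguments instead of reading patched global state, and the worker count is capped at the pool size. The names `GEE_POOL_SIZE`, `auto_workers` and `analyze` are hypothetical, and the analysis body is a placeholder rather than the repository's code.

```python
from concurrent.futures import ThreadPoolExecutor

GEE_POOL_SIZE = 10  # assumption: the cap of 10 stated in the summary above


def auto_workers(n_tasks: int, cap: int = GEE_POOL_SIZE) -> int:
    """Pick a worker count no larger than the connection pool, and at least 1."""
    return max(1, min(n_tasks, cap))


def analyze(model: str, scenario: str) -> tuple[str, str]:
    """Placeholder analysis; model and scenario arrive as parameters, not globals."""
    return (model, scenario)


models = [f"model_{i:02d}" for i in range(16)]
scenario = "ssp585"

with ThreadPoolExecutor(max_workers=auto_workers(len(models))) as pool:
    # pool.map preserves input order, so results line up with `models`.
    results = list(pool.map(lambda m: analyze(m, scenario), models))

print(auto_workers(len(models)), len(results))
```

Since nothing shared is mutated per task, threads cannot overwrite each other's model or scenario binding, which is the failure mode that led to the serial revert in season_analysis/ensemble.py.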