Add cache versioning and dataset-local pre-cache lookup by cbyrohl · Pull Request #230 · cbyrohl/scida

cbyrohl · 2026-02-27T21:21:20Z

Summary

Introduces CACHE_FORMAT_VERSION = 1 constant that is stamped into every new cache file (HDF5 attribute + JSON field), enabling automatic invalidation when the format changes in future releases
Adds find_precached_file() to search ancestor directories for admin-pre-placed cache files at {ancestor}/postprocessing/scida/{hash}.{ext}, eliminating the need for every user to independently rebuild caches for shared datasets
Pre-cache is strictly read-only: scida never writes to or deletes pre-cache files
Precedence: user cache > dataset-local pre-cache > create from scratch
Invalid pre-caches (version mismatch, corrupt) fall through gracefully to user cache creation
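
For readers unfamiliar with the lookup, the behaviour described above can be sketched roughly as follows. This is an illustration only, not the code in src/scida/misc.py: the function names, arguments, and the `is_valid` callback are hypothetical. Checking the dataset directory itself before its ancestors, and accepting the user cache without validation, are assumptions made to keep the sketch short.

```python
from pathlib import Path
from typing import Callable, Optional


def find_precached_file_sketch(dataset_path: str, cache_hash: str, ext: str) -> Optional[Path]:
    """Return the nearest {ancestor}/postprocessing/scida/{hash}.{ext}, or None."""
    start = Path(dataset_path).resolve()
    # Assumption: the starting directory is checked first, then each ancestor.
    for ancestor in [start, *start.parents]:
        candidate = ancestor / "postprocessing" / "scida" / f"{cache_hash}.{ext}"
        if candidate.is_file():
            return candidate
    return None


def resolve_cache_sketch(
    user_cache: Path,
    dataset_path: str,
    cache_hash: str,
    ext: str,
    is_valid: Callable[[Path], bool],
) -> Optional[Path]:
    """Apply the precedence: user cache > dataset-local pre-cache > create from scratch."""
    # Simplification: the user cache is accepted as soon as it exists.
    if user_cache.is_file():
        return user_cache
    pre = find_precached_file_sketch(dataset_path, cache_hash, ext)
    # The pre-cache is only ever read; an invalid one is skipped, never deleted.
    if pre is not None and is_valid(pre):
        return pre
    # None tells the caller to build a fresh cache in the user cache location.
    return None
```

A return value of `None` corresponds to the "create from scratch" branch; the caller would then write the new cache to the user cache path only.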

Files changed

src/scida/misc.py — CACHE_FORMAT_VERSION, find_precached_file()
src/scida/io/_base.py — Version write/validate in ChunkedHDF5Loader, pre-cache fallback in load() and load_metadata()
src/scida/series.py — JSON version write/validate in DatasetSeries, pre-cache fallback in __init__(), safe deletion logic (never deletes pre-cache)
tests/test_precache.py — 14 new tests covering versioning and pre-cache for HDF5 and JSON paths
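
The JSON side of the version stamp could look something like the sketch below. Only the constant name and its value of 1 come from this PR; the field name `cache_format_version` and both helper functions are hypothetical, and the HDF5 path (which stores the version as an attribute) is not shown.

```python
import json
from pathlib import Path
from typing import Optional

CACHE_FORMAT_VERSION = 1  # name and value as stated in the PR summary


def write_json_cache_sketch(path: Path, payload: dict) -> None:
    """Write payload to path with the current format version stamped in."""
    doc = dict(payload)
    # Hypothetical field name; the real key used by scida may differ.
    doc["cache_format_version"] = CACHE_FORMAT_VERSION
    Path(path).write_text(json.dumps(doc))


def read_json_cache_sketch(path: Path) -> Optional[dict]:
    """Return the cached document, or None if it is unreadable or outdated."""
    try:
        doc = json.loads(Path(path).read_text())
    except (OSError, ValueError):
        # Missing, unreadable, or corrupt file: treat as "no usable cache".
        return None
    if not isinstance(doc, dict):
        return None
    if doc.get("cache_format_version") != CACHE_FORMAT_VERSION:
        # Version mismatch (or no stamp at all): the caller should rebuild.
        return None
    return doc
```

Returning `None` instead of raising lets the caller fall through to the next cache source, in line with the precedence described in the summary.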

Closes #195

Test plan

14 new unit/integration tests in tests/test_precache.py
160 non-external tests pass (no regressions)
345 external tests pass with real simulation data
Verify admin-placed pre-cache works on a shared filesystem (manual)

🤖 Generated with Claude Code

Introduces CACHE_FORMAT_VERSION to stamp every cache file (HDF5 and JSON), enabling automatic invalidation when the format changes. Adds find_precached_file() to search parent directories for admin-pre-placed cache files at {ancestor}/postprocessing/scida/{hash}.{ext}, with strict read-only semantics (pre-cache is never written to or deleted).

Closes #195

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add scida.devtools package with:
- _cache_deploy.py: deploy_precache() and deploy_series_precache() functions for copying local cache files to a shared basefolder, plus MPCDF_TARGETS list
- __main__.py: typer CLI (python -m scida.devtools) with build, deploy, build-deploy, and *-all commands for batch cache operations
- __init__.py: re-exports public API for backwards-compatible imports

Tests: 9 new tests covering deploy functions (single/multi path, error cases, overwrite, round-trip loadability for both HDF5 and series JSON caches).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
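
As a rough picture of what deploying a cache file involves, the sketch below copies a local cache file into the pre-cache location under a shared basefolder. It is not the code in _cache_deploy.py: the function name, its signature, the choice to keep the source file name, and the exceptions raised are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def deploy_precache_sketch(cache_file: Path, basefolder: Path, overwrite: bool = False) -> Path:
    """Copy cache_file to {basefolder}/postprocessing/scida/ and return the target path."""
    src = Path(cache_file)
    if not src.is_file():
        raise FileNotFoundError(f"no local cache file at {src}")

    target_dir = Path(basefolder) / "postprocessing" / "scida"
    target_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: the deployed file keeps its {hash}.{ext} name unchanged.
    target = target_dir / src.name
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; pass overwrite=True to replace it")

    # copy2 also preserves timestamps and permission bits where possible.
    shutil.copy2(src, target)
    return target
```

Refusing to overwrite by default is a conservative choice for a shared filesystem, where an existing pre-cache may already be in use by other users.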

lazy=False on DatasetSeries doesn't force dataset init: delay_init is always applied. Instead, explicitly access .data on each dataset to trigger initialization and cache creation. Verified with real data (TNGvariation_simulation): build creates 5 HDF5 caches + 1 series JSON, deploy copies all to basefolder.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

delay_init was always applied regardless of the lazy flag, so lazy=False was a no-op. Now when lazy=False, evaluate_lazy() is called on each dataset after construction, triggering full init and cache creation. This also simplifies the CLI back to using scida.load(lazy=False) instead of the manual per-dataset workaround.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Commands are now scoped under `python -m scida.devtools cache`: cache build, cache deploy, cache build-deploy, cache build-all, cache deploy-all, cache build-deploy-all. This makes the naming clearer (operations are on caches) and leaves room for future devtools subcommand groups.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

cbyrohl and others added 5 commits February 27, 2026 13:21

cbyrohl wants to merge 5 commits into main from precache-lookup
