Add cache versioning and dataset-local pre-cache lookup #230
Draft
cbyrohl wants to merge 5 commits into
Introduces CACHE_FORMAT_VERSION to stamp every cache file (HDF5 and JSON),
enabling automatic invalidation when the format changes. Adds
find_precached_file() to search parent directories for admin-pre-placed
cache files at {ancestor}/postprocessing/scida/{hash}.{ext}, with strict
read-only semantics (pre-cache is never written to or deleted).
Closes #195
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
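The ancestor lookup described in this commit could be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the name `find_precached_file_sketch`, its parameters, and the choice to check the dataset directory itself before its ancestors are assumptions. Only the `{ancestor}/postprocessing/scida/{hash}.{ext}` layout comes from the description above.

```python
from pathlib import Path
from typing import Optional


def find_precached_file_sketch(dataset_path: str, hash_value: str, ext: str) -> Optional[Path]:
    """Return the nearest pre-placed cache file, or None if none exists.

    Hypothetical stand-in for find_precached_file(); the real signature may differ.
    """
    start = Path(dataset_path).resolve()
    # Assumption: look in the dataset directory first, then walk up to the root.
    for ancestor in (start, *start.parents):
        candidate = ancestor / "postprocessing" / "scida" / f"{hash_value}.{ext}"
        if candidate.is_file():
            # Read-only semantics: callers only read this file, never write or delete it.
            return candidate
    return None
```

Returning the nearest match first means a cache placed close to a dataset would take precedence over one placed higher up the tree, which is one plausible reading of "dataset-local".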
Add scida.devtools package with:
- _cache_deploy.py: deploy_precache() and deploy_series_precache() functions for copying local cache files to a shared basefolder, plus MPCDF_TARGETS list
- __main__.py: typer CLI (python -m scida.devtools) with build, deploy, build-deploy, and *-all commands for batch cache operations
- __init__.py: re-exports public API for backwards-compatible imports

Tests: 9 new tests covering deploy functions (single/multi path, error cases, overwrite, round-trip loadability for both HDF5 and series JSON caches).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
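A deploy step of this kind might look like the sketch below. The function name, parameters, and error behaviour are assumptions chosen to mirror the cases the tests are said to cover (error cases, overwrite); the actual `deploy_precache()` signature is not shown in this PR text. The target layout `postprocessing/scida/` is taken from the pre-cache lookup path.

```python
import shutil
from pathlib import Path


def deploy_precache_sketch(cache_file: str, basefolder: str, overwrite: bool = False) -> Path:
    """Copy a local cache file into <basefolder>/postprocessing/scida/.

    Hypothetical stand-in for deploy_precache(); names and defaults are assumed.
    """
    src = Path(cache_file)
    if not src.is_file():
        raise FileNotFoundError(f"no local cache file at {src}")
    target_dir = Path(basefolder) / "postprocessing" / "scida"
    target_dir.mkdir(parents=True, exist_ok=True)
    dst = target_dir / src.name  # keep the {hash}.{ext} file name unchanged
    if dst.exists() and not overwrite:
        raise FileExistsError(f"{dst} already exists; pass overwrite=True to replace it")
    shutil.copy2(src, dst)  # copy2 also preserves timestamps
    return dst
```

Keeping the file name unchanged matters because the lookup side finds the file by `{hash}.{ext}`.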
lazy=False on DatasetSeries doesn't force dataset init: delay_init is always applied. Instead, explicitly access .data on each dataset to trigger initialization and cache creation.

Verified with real data (TNGvariation_simulation): build creates 5 HDF5 caches + 1 series JSON, deploy copies all to basefolder.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
delay_init was always applied regardless of the lazy flag, so lazy=False was a no-op. Now when lazy=False, evaluate_lazy() is called on each dataset after construction, triggering full init and cache creation.

This also simplifies the CLI back to using scida.load(lazy=False) instead of the manual per-dataset workaround.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
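The behaviour change in this commit can be illustrated with a toy model. The classes below are hypothetical and do not reproduce scida's `DatasetSeries`; only the `lazy` flag and the `evaluate_lazy()` method name come from the commit message.

```python
class LazyDatasetSketch:
    """Toy dataset whose expensive initialization is deferred."""

    def __init__(self, path: str):
        self.path = path
        self.initialized = False

    def evaluate_lazy(self) -> None:
        # In the real code this is where full init and cache creation would happen.
        if not self.initialized:
            self.initialized = True


class SeriesSketch:
    """Toy series: construction is always deferred, then optionally forced."""

    def __init__(self, paths, lazy: bool = True):
        self.datasets = [LazyDatasetSketch(p) for p in paths]
        if not lazy:
            # The fix: honour lazy=False by evaluating every dataset up front.
            for ds in self.datasets:
                ds.evaluate_lazy()
```

With `lazy=True` no dataset is initialized until something calls `evaluate_lazy()`; with `lazy=False` all of them are initialized by the time the constructor returns.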
Commands are now scoped under `python -m scida.devtools cache`: cache build, cache deploy, cache build-deploy, cache build-all, cache deploy-all, cache build-deploy-all.

This makes the naming clearer (operations are on caches) and leaves room for future devtools subcommand groups.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
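The nested command layout can be sketched as below. Note the swap: the PR's CLI is built with typer, whereas this illustration uses the standard library's `argparse` to stay dependency-free. The `path` argument is an assumption; only the group name `cache` and the six command names come from the commit message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a 'cache' group holding the six cache commands."""
    parser = argparse.ArgumentParser(prog="python -m scida.devtools")
    groups = parser.add_subparsers(dest="group", required=True)

    # One sub-parser per command group; more groups can be added alongside 'cache'.
    cache = groups.add_parser("cache", help="operations on cache files")
    commands = cache.add_subparsers(dest="command", required=True)

    for name in ("build", "deploy", "build-deploy"):
        cmd = commands.add_parser(name)
        cmd.add_argument("path")  # hypothetical: dataset path to operate on
    for name in ("build-all", "deploy-all", "build-deploy-all"):
        commands.add_parser(name)  # batch variants take no path in this sketch
    return parser
```

For example, `build_parser().parse_args(["cache", "build", "/data/sim"])` yields a namespace with `group="cache"`, `command="build"`, and `path="/data/sim"`.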
Summary
- `CACHE_FORMAT_VERSION = 1` constant that is stamped into every new cache file (HDF5 attribute + JSON field), enabling automatic invalidation when the format changes in future releases
- `find_precached_file()` to search ancestor directories for admin-pre-placed cache files at `{ancestor}/postprocessing/scida/{hash}.{ext}`, eliminating the need for every user to independently rebuild caches for shared datasets

Files changed
- `src/scida/misc.py`: `CACHE_FORMAT_VERSION`, `find_precached_file()`
- `src/scida/io/_base.py`: Version write/validate in `ChunkedHDF5Loader`, pre-cache fallback in `load()` and `load_metadata()`
- `src/scida/series.py`: JSON version write/validate in `DatasetSeries`, pre-cache fallback in `__init__()`, safe deletion logic (never deletes pre-cache)
- `tests/test_precache.py`: 14 new tests covering versioning and pre-cache for HDF5 and JSON paths

Closes #195
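For the JSON path, version stamping and validation could work along these lines. This is a sketch under assumptions: the field name `cache_format_version`, the helper names, and the choice to delete a stale user-local cache are all hypothetical. The PR text only states that the version is written and validated and that a pre-cache is never deleted.

```python
import json
from pathlib import Path
from typing import Optional

CACHE_FORMAT_VERSION = 1
VERSION_KEY = "cache_format_version"  # hypothetical field name


def write_json_cache(path: str, payload: dict) -> None:
    """Write payload to path, stamped with the current format version."""
    data = dict(payload)
    data[VERSION_KEY] = CACHE_FORMAT_VERSION
    Path(path).write_text(json.dumps(data))


def read_json_cache(path: str, is_precache: bool = False) -> Optional[dict]:
    """Return the cached dict if its version matches, else None."""
    p = Path(path)
    try:
        data = json.loads(p.read_text())
    except (OSError, json.JSONDecodeError):
        data = None
    if isinstance(data, dict) and data.get(VERSION_KEY) == CACHE_FORMAT_VERSION:
        return data
    # Stale or unreadable: drop a user-local cache so it gets rebuilt,
    # but never touch an admin-placed pre-cache.
    if not is_precache:
        p.unlink(missing_ok=True)
    return None
```

A cache written before versioning existed has no version field, so `data.get(VERSION_KEY)` returns `None` and the file is treated as stale, which is how a missing stamp and an outdated stamp end up handled the same way.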
Test plan
- `tests/test_precache.py`

🤖 Generated with Claude Code