This repository contains benchmarks for creating, reading, and storing huge 3D images in Zarr arrays. It was done as part of the HEFTIE project.
The goal is to benchmark writing data to Zarr with a range of different configurations (e.g., compression codec, chunk size...), to guide the choice of options for reading and writing 3D imaging data.
The final write-up can be found at https://heftieproject.github.io/zarr-benchmarks/.
Related projects:
- zarr-developers/zarr-benchmark
- LDeakin/zarr_benchmarks
- Zarr Visualization Report
- icechunk benchmarks
Install the relevant dependencies with:
# Run from the top level of this repository
pip install -e .[plots]
If using uv, you can also install the dependencies with:
uv pip install -e ".[plots]"
Note: there are a number of optional dependencies that can be installed, if required. See the development dependencies section.
To run all benchmarks (with all images) run the following tox commands:
# Run with an image of a heart from the Human Organ Atlas
tox -- --benchmark-only --image=heart --config=all --benchmark-storage=data/results/heart
# Run with a dense segmentation (small subset of C3 segmentation data from the H01 release)
tox -- --benchmark-only --image=dense --config=all --benchmark-storage=data/results/dense
# Run with a sparse segmentation (small subset of '104 proofread cells' segmentation data from the H01 release)
tox -- --benchmark-only --image=sparse --config=all --benchmark-storage=data/results/sparse
This will run all benchmarks via zarr-python version 2 + 3 and tensorstore
with the given images. Each tox command will generate three result .json files
in the given --benchmark-storage directory - one for zarr-python version 2
({id}_zarr-python-v2.json), one for zarr-python version 3
({id}_zarr-python-v3.json) and one for tensorstore ({id}_tensorstore.json).
{id} is a four digit number (e.g. 0001) that increments automatically for
every new tox run.
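The naming scheme above makes it straightforward to locate the most recent result for each package. The following is a minimal sketch, not part of this repository: the function name `latest_result_ids` is hypothetical, and the recursive search is an assumption that covers the case where result files sit in a sub-directory of the storage directory.

```python
import re
from pathlib import Path

# Matches names such as "0001_zarr-python-v2.json" (four-digit id, then a suffix).
RESULT_NAME = re.compile(r"^(\d{4})_(.+)\.json$")


def latest_result_ids(storage_dir):
    """Return {suffix: highest id} for result files found under storage_dir."""
    latest = {}
    # Search recursively, in case results sit in a sub-directory (an assumption).
    for path in Path(storage_dir).rglob("*.json"):
        match = RESULT_NAME.match(path.name)
        if match is None:
            continue  # Ignore JSON files that do not follow the naming scheme.
        run_id, suffix = match.groups()
        # Zero-padded ids of equal width compare correctly as strings.
        if suffix not in latest or run_id > latest[suffix]:
            latest[suffix] = run_id
    return latest
```

For example, `latest_result_ids("data/results/heart")` would return a dictionary keyed by `zarr-python-v2`, `zarr-python-v3` and `tensorstore` once the heart benchmarks have been run at least once.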
If --benchmark-storage isn't specified, the JSON files will be saved to the
default .benchmarks directory. We recommend setting --benchmark-storage to
an appropriately named sub-directory within data/results (as in the examples
above).
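For a quick look at a result file without generating plots, the JSON can be read directly. This sketch assumes the files are written by pytest-benchmark (inferred from the --benchmark-* options used here) and therefore contain a top-level "benchmarks" list whose entries have a "name" and a "stats" dictionary with a "mean" value in seconds. The function name `mean_times` is hypothetical.

```python
import json
from pathlib import Path


def mean_times(result_file):
    """Map each benchmark name to its mean run time in seconds."""
    data = json.loads(Path(result_file).read_text())
    # Assumed layout: a top-level "benchmarks" list whose entries carry
    # a "name" and a "stats" dict with a "mean" value.
    return {entry["name"]: entry["stats"]["mean"] for entry in data["benchmarks"]}
```

Sorting the returned dictionary by value is a simple way to spot the slowest configurations before moving on to the plotting script.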
Note: the first time these commands are run, the required datasets will be
downloaded from
HEFTIE's Zenodo repository and cached
locally on your computer. Later runs will re-use this data, and should be
faster. Information about the source of these datasets is provided in the
LICENSE file within each .zarr file on Zenodo.
--config=all will use parameters from all configuration files under
tests/benchmarks/benchmark_configs (except for dev which contains a small
selection of parameters for quick test runs). To run with parameters from a
single config file use e.g.
tox -- --benchmark-only --image=heart --config=shuffle --benchmark-storage=data/results/heart
To only run benchmarks for a specific package, use the -e option:
# tensorstore only
tox run -e py313-tensorstore -- --benchmark-only --image=heart --config=all --benchmark-storage=data/results/heart
# zarr-python v2 only
tox run -e py313-zarrv2 -- --benchmark-only --image=heart --config=all --benchmark-storage=data/results/heart
# zarr-python v3 only
tox run -e py313-zarrv3 -- --benchmark-only --image=heart --config=all --benchmark-storage=data/results/heart
To see a list of available environments, use tox -l.
Removing the --config option will use a small dev config to test a small
selection of parameters:
tox -- --benchmark-only --image=heart --benchmark-storage=data/results/heart
You can also use a smaller image (128x128x128 numpy array) by using
--image=dev (this is also the default if no --image option is provided):
tox -- --benchmark-only --image=dev --benchmark-storage=data/results/dev
You can also override the default number of rounds / warmup rounds for each benchmark with:
tox -- --benchmark-only --image=dev --rounds=1 --warmup-rounds=0 --benchmark-storage=data/results/dev
As described in the specific package section, you can also run with a single tox environment via e.g.:
tox run -e py313-tensorstore -- --benchmark-only --image=dev --benchmark-storage=data/results/dev
Everything after the first -- will be passed to the internal pytest call, so
you can add any pytest options you require.
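To make the role of the -- separator concrete, the sketch below splits an argument list at the first -- in the way described above. It is only an illustration of the convention, not tox's actual implementation, and the function name `split_at_separator` is hypothetical.

```python
def split_at_separator(argv):
    """Split argv at the first "--" into (tox arguments, pytest arguments)."""
    if "--" not in argv:
        # No separator: nothing is forwarded to pytest.
        return argv, []
    index = argv.index("--")
    # Only the first "--" acts as the separator; the rest is forwarded unchanged.
    return argv[:index], argv[index + 1:]


tox_args, pytest_args = split_at_separator(
    ["run", "-e", "py313-tensorstore", "--", "--benchmark-only", "--image=dev"]
)
print(tox_args)     # ['run', '-e', 'py313-tensorstore']
print(pytest_args)  # ['--benchmark-only', '--image=dev']
```

In this example, -e py313-tensorstore selects the tox environment, while --benchmark-only and --image=dev reach pytest.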
Running tox without --benchmark-only will run both the tests and the benchmarks.
To only run the tests use:
tox -- --benchmark-skip
Once in your virtual environment, you can create plots with:
python src/zarr_benchmarks/create_plots.py
This will process the latest benchmark results from data/results and create
plots as .png files under data/plots. If you want to process older benchmark
results, you can explicitly provide the ids of the zarr-python-v2,
zarr-python-v3 and tensorstore jsons:
python src/zarr_benchmarks/create_plots.py --json_ids 0001 0002 0003
To see more info about what these values represent and additional options run:
python src/zarr_benchmarks/create_plots.py -h
If required, you can install all tensorstore + zarr-python dependencies with:
pip install .[plots,tensorstore,zarr-python-v3]
Use zarr-python-v2 in place of zarr-python-v3 if you need version 2 instead.
Further information about the code structure and implementation is provided in the developer docs.