This is the final report for the Handling Enormous Files from Tomography Imaging Experiments (HEFTIE) project, an EU-funded project to improve tools and educational resources for handling huge imaging datasets. We hope these new resources will accelerate the analysis of 3D organ scans and unlock discoveries in the life sciences, while also benefiting other fields such as neuroscience, archaeology, and the earth sciences.
We created a new digital textbook, called Handling Enormous Files from 3D Imaging Experiments. This textbook gives scientists:
- an introduction to the motivation and theory behind chunked datasets
- a practical introduction to creating chunked datasets, and the configuration options available (minimal sketches of this and of chunk-aligned parallel processing follow this list)
- a guide to designing parallel processing algorithms to work efficiently with chunked datasets
- a guide to exporting chunked datasets to other 'traditional' formats
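As a taste of what the textbook covers, here is a minimal sketch of creating a chunked Zarr array with zarr-python. The path, shape, chunk size, and dtype are illustrative values rather than recommendations from the textbook, and the exact keyword arguments vary slightly between zarr-python versions.

```python
import numpy as np
import zarr

# Create a chunked Zarr array on disk; each 128^3 chunk is stored
# and compressed independently.
z = zarr.open(
    "example.zarr",
    mode="w",
    shape=(1024, 1024, 1024),
    chunks=(128, 128, 128),
    dtype="uint16",
)

# Writing (or reading) a slice only touches the chunks that overlap
# it, so the full volume never has to fit in memory.
z[:128, :128, :128] = np.random.randint(
    0, 2**16, size=(128, 128, 128), dtype="uint16"
)
```

Chunk-aligned parallel processing can then be sketched with dask.array, which adopts the Zarr chunk layout so that each chunk becomes an independent task:

```python
import dask.array as da

# Lazily wrap the chunked array created above.
arr = da.from_zarr("example.zarr")

# A chunk-aligned reduction: the global mean is computed chunk by
# chunk, in parallel, without loading the whole volume into memory.
print(arr.mean().compute())
```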
We created a set of benchmarks for reading and writing Zarr data with a range of different configurations. These benchmarks provide guidance on how different configuration choices affect data size and read/write performance; a sketch showing how these options map onto code follows the list. The parameters varied were:
- Type of image
  - Heart: a HiP-CT scan of a heart from the Human Organ Atlas
  - Dense: segmented neurons from electron microscopy
  - Sparse: a few selected segmented neurons from electron microscopy
- Software libraries
  - TensorStore (fastest for both reading and writing data)
  - zarr-python version 3
  - zarr-python version 2 (slowest for both reading and writing data)
- Compressor
  - blosc-zstd provides the best compression ratio for both image and segmentation data (options were blosc-blosclz, blosc-lz4, blosc-lz4hc, blosc-zlib, and blosc-zstd, as well as gzip and zstd)
- Compression level
  - Setting compression levels beyond ~3 results in slightly better compression but much longer write times; compression level does not affect read time
- Shuffle
  - Enabling the shuffle option increases compression with no adverse effect on read/write times (the three options were shuffle, bitshuffle, and noshuffle)
- Zarr format version
  - There was no noticeable difference between Zarr format 2 and Zarr format 3 data
- Chunk size
  - Setting a low chunk size (below around 90) has an adverse effect on read and write times
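To make these parameters concrete, the sketch below shows how they map onto array creation with zarr-python's v2-style API and the numcodecs Blosc codec. The path, shape, and dtype are made up for illustration, and the keyword names differ somewhat for zarr-python 3 with Zarr format 3 (e.g. compressors rather than compressor).

```python
import numcodecs
import zarr

# blosc-zstd at a moderate level with shuffle enabled, reflecting the
# benchmark findings above: levels beyond ~3 buy little extra
# compression but slow down writes, and shuffle is essentially free.
compressor = numcodecs.Blosc(
    cname="zstd",
    clevel=3,
    shuffle=numcodecs.Blosc.SHUFFLE,
)

z = zarr.open(
    "benchmark.zarr",
    mode="w",
    shape=(512, 512, 512),
    chunks=(128, 128, 128),  # comfortably above the ~90 threshold
    dtype="uint16",
    compressor=compressor,
)
```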
Contributions have been made to the zarr-python repository:
- Added a CLI for converting v2 metadata to v3
- Added ArrayNotFoundError
- Better documented the acceptable values for StoreLike
PRs have been opened in the zarr-python repository:
- Prevent creation of arrays/groups under a parent array
- [Holding space - LRUStoreCache]
PRs have also been opened in other repositories:
- Document supported file formats for dask_image.imread
- Document supported file formats for skimage.io
- Evaluated a "Segment Anything 2"-based quick-select feature for 3D annotation of X-ray tomography data
- Added arbitrary rotation of orthogonal viewports
- Added selective segment visibility (so that not all segments are always shown)
- Added key-value metadata fields for segments and skeleton trees