Commit 7143f40

TomNicholas, pre-commit-ci[bot], and chuckwondo authored
Docs and API follow-ups to #601 (#619)
* move Parser definition to new parser.typing module
* add API docs for Parser protocol and parser classes
* avoid extra .hdf namespace for only one parser
* rename reader -> parser
* update custom parsers page
* update usage docs
* update roadmap to reflect where we actually are
* update faq
* note about the renaming of readers->parsers
* minor qualification
* release notes
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* change import
* ignore lint
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Nits from Chunk's review
  Co-authored-by: Chuck Daniels <[email protected]>
* add note about using context managers
* Add context manager for assertion
  Co-authored-by: Chuck Daniels <[email protected]>
* fix nav
* remove sphinx-style page link
* syntax for note
* correct link syntax
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* can't link to private zarr metadata class
* can't link to private zarr metadata class
* try following syntax for nested lists given here https://squidfunk.github.io/mkdocs-material/reference/lists/#using-ordered-lists

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Chuck Daniels <[email protected]>
1 parent 4cd6629 commit 7143f40

14 files changed: +356 -203 lines changed

README.md

Lines changed: 9 additions & 2 deletions
@@ -28,6 +28,7 @@ Please see the [documentation](https://virtualizarr.readthedocs.io/en/stable/ind

 * Create virtual references pointing to bytes inside an archival file with [`open_virtual_dataset`](https://virtualizarr.readthedocs.io/en/latest/usage.html#opening-files-as-virtual-datasets).
 * Supports a [range of archival file formats](https://virtualizarr.readthedocs.io/en/latest/faq.html#how-do-virtualizarr-and-kerchunk-compare), including netCDF4 and HDF5, and has a pluggable system for supporting new formats.
+* Access data via the zarr-python API by reading from the zarr-compatible [`ManifestStore`](https://virtualizarr.readthedocs.io/en/latest/generated/virtualizarr.manifests.ManifestStore.html).
 * [Combine data from multiple files](https://virtualizarr.readthedocs.io/en/latest/usage.html#combining-virtual-datasets) into one larger datacube using [xarray's combining functions](https://docs.xarray.dev/en/stable/user-guide/combining.html), such as [`xarray.concat`](https://docs.xarray.dev/en/stable/generated/xarray.concat.html).
 * Commit the virtual references to storage either using the [Kerchunk references](https://fsspec.github.io/kerchunk/spec.html) specification or the [Icechunk](https://icechunk.io/) transactional storage engine.
 * Users access the virtual datacube simply as a single zarr-compatible store using [`xarray.open_zarr`](https://docs.xarray.dev/en/stable/generated/xarray.open_zarr.html).
@@ -42,15 +43,21 @@ You now have a choice between using VirtualiZarr and Kerchunk: VirtualiZarr prov

 VirtualiZarr version 1 (mostly) achieves [feature parity](https://virtualizarr.readthedocs.io/en/latest/faq.html#how-do-virtualizarr-and-kerchunk-compare) with kerchunk's logic for combining datasets, providing an easier way to manipulate kerchunk references in memory and generate kerchunk reference files on disk.

+VirtualiZarr version 2 (unreleased) will bring:
+
+- Zarr v3 support,
+- A pluggable system of "parsers" for virtualizing custom file formats,
+- The `ManifestStore` abstraction, which allows for loading data without serializing to Kerchunk/Icechunk first,
+- Integration with [`obstore`](https://developmentseed.org/obstore/latest/),
+- Reference parsing that doesn't rely on kerchunk under the hood.
+
 Future VirtualiZarr development will focus on generalizing and upstreaming useful concepts into the Zarr specification, the Zarr-Python library, Xarray, and possibly some new packages.

 We have a lot of ideas, including:
-- [Zarr v3 support](https://github.com/zarr-developers/VirtualiZarr/issues/17)
 - [Zarr-native on-disk chunk manifest format](https://github.com/zarr-developers/zarr-specs/issues/287)
 - ["Virtual concatenation"](https://github.com/zarr-developers/zarr-specs/issues/288) of separate Zarr arrays
 - ManifestArrays as an [intermediate layer in-memory](https://github.com/zarr-developers/VirtualiZarr/issues/71) in Zarr-Python
 - [Separating CF-related Codecs from xarray](https://github.com/zarr-developers/VirtualiZarr/issues/68#issuecomment-2197682388)
-- [Generating references without kerchunk](https://github.com/zarr-developers/VirtualiZarr/issues/78)

 If you see other opportunities then we would love to hear your ideas!
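
The README changes above describe "virtual references pointing to bytes inside an archival file". As a rough illustration of that idea only, the sketch below models a reference as a path, a byte offset, and a length, then resolves it by reading that byte range. `ChunkRef`, `read_chunk`, the chunk keys, and the demo file are all hypothetical stand-ins written for this example; none of them are VirtualiZarr classes or functions.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical stand-in for one virtual reference (not a VirtualiZarr class)."""

    path: str    # file that holds the chunk bytes
    offset: int  # byte position where the chunk starts
    length: int  # number of bytes in the chunk


def read_chunk(ref: ChunkRef) -> bytes:
    """Resolve a reference by reading only the byte range it points at."""
    with open(ref.path, "rb") as f:
        f.seek(ref.offset)
        return f.read(ref.length)


# Build a tiny "archival file": a 6-byte header followed by two 4-byte chunks.
fd, demo_path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as f:
    f.write(b"HEADERaaaabbbb")

# The manifest maps chunk keys to byte ranges; no chunk data is copied into it.
manifest = {
    "0.0": ChunkRef(demo_path, offset=6, length=4),
    "0.1": ChunkRef(demo_path, offset=10, length=4),
}

chunks = {key: read_chunk(ref) for key, ref in manifest.items()}
os.remove(demo_path)
```

The point of the sketch is that the manifest stays small regardless of how large the underlying file is, because it stores locations rather than data.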

docs/api.md

Lines changed: 23 additions & 1 deletion
@@ -10,6 +10,18 @@ Users can use xarray for every step apart from reading and serializing virtual r
 ::: virtualizarr.open_virtual_dataset
 ::: virtualizarr.open_virtual_mfdataset

+### Parsers
+
+Each parser understands how to read a specific file format, and a parser must be passed to `virtualizarr.open_virtual_dataset`.
+
+::: virtualizarr.parsers.DMRPPParser
+::: virtualizarr.parsers.FITSParser
+::: virtualizarr.parsers.HDFParser
+::: virtualizarr.parsers.NetCDF3Parser
+::: virtualizarr.parsers.KerchunkJSONParser
+::: virtualizarr.parsers.KerchunkParquetParser
+::: virtualizarr.parsers.ZarrParser
+
 ### Serialization

 ::: virtualizarr.accessor.VirtualiZarrDatasetAccessor
@@ -25,14 +37,18 @@ Users can use xarray for every step apart from reading and serializing virtual r

 ### Developer API

-If you want to write a new reader to create virtual references pointing to a custom file format, you will need to use VirtualiZarr's internal classes.
+If you want to write a new parser to create virtual references pointing to a custom file format, you will need to use VirtualiZarr's internal classes.
+See the page on custom parsers for more information.

 #### Manifests

 VirtualiZarr uses these classes to store virtual references internally.
+See the page on data structures for more information.

 ::: virtualizarr.manifests.ChunkManifest
 ::: virtualizarr.manifests.ManifestArray
+::: virtualizarr.manifests.ManifestGroup
+::: virtualizarr.manifests.ManifestStore

 #### Array API

@@ -43,6 +59,12 @@ VirtualiZarr's [virtualizarr.manifests.ManifestArray][] objects support a limite
 ::: virtualizarr.manifests.array_api.expand_dims
 ::: virtualizarr.manifests.array_api.broadcast_to

+#### Parser typing protocol
+
+All custom parsers must follow the `virtualizarr.parsers.typing.Parser` typing protocol.
+
+::: virtualizarr.parsers.typing.Parser
+
 #### Parallelization

 Parallelizing virtual reference generation can be done using a number of parallel execution frameworks.
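
The api.md changes above say that custom parsers "must follow" a typing protocol and that a parser is passed to the opening function. The sketch below shows what structural typing of this kind can look like using Python's `typing.Protocol`. It is an illustration under stated assumptions, not the library's definition: `ParserLike`, `FakeManifestStore`, `WholeFileParser`, `open_virtual`, the single-argument call signature, and the example URL are all hypothetical. Consult the documented `virtualizarr.parsers.typing.Parser` for the real signature.

```python
from typing import Protocol, runtime_checkable


class FakeManifestStore:
    """Hypothetical stand-in for whatever object a parser returns."""

    def __init__(self, refs: dict[str, tuple[str, int, int]]):
        # Maps a chunk key to (path, offset, length).
        self.refs = refs


@runtime_checkable
class ParserLike(Protocol):
    """Assumed shape: a parser is any callable taking a file URL."""

    def __call__(self, file_url: str) -> FakeManifestStore: ...


class WholeFileParser:
    """Toy parser that treats a file as one chunk of a caller-supplied size."""

    def __init__(self, nbytes: int):
        self.nbytes = nbytes

    def __call__(self, file_url: str) -> FakeManifestStore:
        return FakeManifestStore({"0": (file_url, 0, self.nbytes)})


def open_virtual(file_url: str, parser: ParserLike) -> FakeManifestStore:
    """Hypothetical opener: delegates all format knowledge to the parser."""
    # runtime_checkable only verifies that __call__ exists, not its signature.
    if not isinstance(parser, ParserLike):
        raise TypeError("parser must be callable as parser(file_url)")
    return parser(file_url)


store = open_virtual("s3://bucket/data.bin", WholeFileParser(nbytes=1024))
```

Note that `WholeFileParser` never inherits from `ParserLike`. A protocol matches on shape alone, which is what lets third-party parsers plug in without importing a base class.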
