-
-
Couldn't load subscription status.
- Fork 365
feat: Implement ZEP 8 URL syntax support for zarr-python #3369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add comprehensive ZEP 8 URL parsing and resolution system - Implement StoreAdapter ABC for extensible storage adapters - Add built-in adapters for file, memory, S3, GCS, HTTPS schemes - Support pipe-chained URLs like s3://bucket/data.zip|zip:|zarr3: - Add URLSegment parsing with validation - Integrate with zarr.open_group and zarr.open_array APIs - Include demo script and comprehensive test suite - Pass all existing tests + 35 new ZEP 8-specific tests
…into feature/zep8-url-support
|
One tricky thing about The "path" syntax is generally the default when running a regular s3-compatible server, but the "virtual host" syntax can commonly occur when someone defines a CNAME DNS entry to map their own domain or subdomain to an AWS S3 bucket. When designing this syntax for Neuroglancer, it seemed like it would be annoying to require users to use separate syntax to disambiguate the two cases. Instead, for operations where it matters (namely List), Neuroglancer just automatically determines which of the two cases applies by trying both ways and seeing which one succeeds, and then caching the result so that subsequent list operations don't require two requests. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3369 +/- ##
===========================================
- Coverage 94.92% 60.87% -34.06%
===========================================
Files 79 86 +7
Lines 9500 10231 +731
===========================================
- Hits 9018 6228 -2790
- Misses 482 4003 +3521
🚀 New features to boost your workflow:
|
| Examples:: | ||
|
|
||
| # Basic ZIP file storage | ||
| zarr.open_array("file:data.zip|zip", mode='w', shape=(10, 10), dtype="f4") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like a bad idea to break file uris.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is considered in the spec doc this PR is building on: zarr-developers/zeps#48
And later is says:
Implementations SHOULD not support `file://relative/path` since that
is ambiguous with the `file://hostname/path` syntax defined by
[RFC8089](https://datatracker.ietf.org/doc/html/rfc8089).
If you forsee serious issues here I'd encourage commenting on that PR on the standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partway through the code at this point (currently following the logic of store resolution). Posting some comments now so I don't lose them (i've been burned by this before)
| **In-memory storage:** | ||
|
|
||
| >>> # Create array in memory | ||
| >>> z = zarr.open_array("memory:", mode='w', shape=(5, 5), dtype="f4") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I then access this from somewhere else using this syntax?
e.g. memory:aesr80s9e8ra?
| .. warning:: | ||
| The :class:`zarr.storage.ObjectStore` class is experimental. | ||
|
|
||
| URL-based Storage (ZEP 8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what this section is missing is a showcasing of what the equivalent zarr-pyhthon code would be to put it in terms people are more familiar with.
So each section would be:
zarr.open_array("file:zep8-data.zip|zip" ....)
# is equivalent to
zarr.open_array(zarr.storage.ZipStore(...)...)| .. note:: | ||
| When using ZEP 8 URLs with third-party libraries like xarray, the URL syntax allows | ||
| seamless integration without requiring zarr-specific store creation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| .. note:: | |
| When using ZEP 8 URLs with third-party libraries like xarray, the URL syntax allows | |
| seamless integration without requiring zarr-specific store creation. |
This is already effectively stated above.
| URL-based Storage (ZEP 8) | ||
| ~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| Zarr supports URL-based storage following the ZEP 8 specification, which allows you to specify storage locations using URLs with chained adapters:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a link to the more extensive docs.
|
|
||
| - ``file:path.zip|zip`` - ZIP file on local filesystem | ||
| - ``s3://bucket/data.zip|zip`` - ZIP file in S3 bucket | ||
| - ``memory:`` - In-memory storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| - ``memory:`` - In-memory storage |
not an example of piping
| from zarr.abc.store_adapter import StoreAdapter | ||
| from zarr.storage._local import LocalStore | ||
| from zarr.storage._logging import LoggingStore | ||
| from zarr.storage._memory import MemoryStore | ||
| from zarr.storage._zep8 import URLStoreResolver | ||
| from zarr.storage._zip import ZipStore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth considering makign these lazy, or not a big enough gain?
| >>> is_zep8_url("s3://bucket/data.zarr") | ||
| False | ||
| >>> is_zep8_url("file:///data.zarr") | ||
| False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really follow these returning false. seems like a downgrade in functionality, also having looked a the spec a bit I don't see where this is explicitly disallowed (though very possible i misread or misunderstood)
| if not is_zep8_url(url): | ||
| # Check if it's a simple scheme URL that we can handle | ||
| if "://" in url or ((":" in url) and not url.startswith("/")): | ||
| # Parse as a single segment URL - the parser should handle this | ||
| try: | ||
| segments = self.parser.parse(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh I think this answers my question above. Since these are valid urls I think ideally is_zep8_url would handle these simple cases correctly.
| except Exception: | ||
| raise ValueError(f"Not a valid URL: {url}") from None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the expected exception's we are catching here. As written this might silently quash an exception that is a real bug.
I think this whole section can be consolidated and simplified a bit, especially as we don't actually do anyhthing differently here then in the else branch that calls the same parse
| for i, segment in enumerate(segments): | ||
| if segment.adapter in ("zarr2", "zarr3"): | ||
| # Skip zarr format segments - they don't create stores | ||
| # TODO: these should propagate to the open call somehow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling out this TODO as importnat before merging
This PR implements support for the ZEP 8 URL syntax in Zarr Python.
Some examples of what now works:
TODO:
docs/user-guide/*.rstchanges/closes #2943
fixes #2831
xref: zarr-developers/zeps#48
cc @jbms