Add support for NETCDF4_CLASSIC to h5netcdf engine #10686

huard · 2025-09-03T05:02:22Z

Added logic in the h5netcdf engine to write pseudo NETCDF4_CLASSIC files, reusing the encoding logic used by the `netcdf4` engine.

The files generated with the PR using the latest h5netcdf release (1.6.4) won't be recognized by third party software as genuine NETCDF4_CLASSIC files, in part because they have no _nc3_strict hidden global attribute. There are other differences with netCDF4 generated files, including string attributes padding, how _FillValue is stored, etc. Changes to h5netcdf will be necessary to make netCDF files fully compliant with the CLASSIC format.

[x] Closes Support "NETCDF4_CLASSIC" format with engine h5netcdf #10676
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst


shoyer · 2025-09-03T18:17:33Z

Xarray currently doesn't have any logic to build these metadata attributes; this is all handled in h5netcdf.

We should also make sure that trying to use NetCDF4-only features (e.g., groups) results in an error.

huard · 2025-09-03T18:48:40Z

The last commit uses h5dump to display differences between the expected and actual content of the HDF5 file. I was also able to add a _nc3_strict global attribute.

I can try to raise an error if groups are used.

The remaining differences are related to the SUPERBLOCK version, the STRPAD character, and the _FillValue. Not sure I'll be able to resolve those.

    SUPER_BLOCK {
  -    SUPERBLOCK_VERSION 0
  ?                       ^
  +    SUPERBLOCK_VERSION 2
  ?                       ^
...
         ATTRIBUTE "foo" {
             DATATYPE  H5T_STRING {
                STRSIZE 8;
  -             STRPAD H5T_STR_NULLPAD;
  ?                                ^^^
  +             STRPAD H5T_STR_NULLTERM;
  ?                                ^^^^
                CSET H5T_CSET_ASCII;
                CTYPE H5T_C_S1;
             }
...
          ATTRIBUTE "_FillValue" {
             DATATYPE  H5T_IEEE_F64LE
  -          DATASPACE  SCALAR
  +          DATASPACE  SIMPLE { ( 1 ) / ( 1 ) }
             DATA {
             (0): nan
             }
          }

shoyer · 2025-09-03T20:04:08Z

If you really want to get metadata attributes and precise HDF5 types right, that should all be handled in h5netcdf. I think that's also the right place for h5dump tests.

In Xarray, all we should be doing for NETCDF4_CLASSIC is coercing some dtypes (using Xarray's encoders) to NetCDF3 compatible types.
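To make the coercion step concrete, here is a minimal sketch in Python, assuming NumPy. The function name `coerce_to_classic_dtype` and the mapping table are hypothetical and only cover two cases (64-bit integers and booleans); this is an illustration of the idea, not Xarray's actual encoder.

```python
import numpy as np

# Hypothetical mapping from dtypes the classic data model lacks to ones it has.
# Assumption for illustration only; a real encoder would cover more cases.
_CLASSIC_COERCIONS = {"int64": "int32", "bool": "int8"}


def coerce_to_classic_dtype(arr):
    """Cast arr to a classic-compatible dtype, refusing lossy casts."""
    arr = np.asarray(arr)
    target = _CLASSIC_COERCIONS.get(arr.dtype.name)
    if target is None:
        return arr  # no coercion listed for this dtype, return unchanged
    cast = arr.astype(target)
    # Round-trip check: a value that does not survive the cast would be
    # silently corrupted in the output file, so refuse instead.
    if not (cast == arr).all():
        raise ValueError(f"could not safely cast {arr.dtype.name} data to {target}")
    return cast
```

The round-trip comparison is the important design choice here: small int64 values are written as int32 without complaint, while values outside the int32 range raise an error rather than wrapping around.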

kmuehlbauer · 2025-09-03T22:51:13Z

@huard Thanks for pushing this!

For the superblock issue, please add kwarg libver="earliest" when opening the file for writing. This will create the file with superblock version 0 for maximum backwards compatibility.
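For reference, a minimal sketch of passing that keyword, assuming h5py is installed. The helper names `classic_open_kwargs` and `create_file` are hypothetical; `libver` is an existing `h5py.File` argument. Whether this alone yields superblock version 0 can depend on other file-creation settings, so the sketch only shows how the keyword is passed.

```python
def classic_open_kwargs():
    """Keyword arguments for opening an HDF5 file for writing (illustrative helper)."""
    # libver="earliest" asks HDF5 to use the oldest file-format features it can.
    return {"mode": "w", "libver": "earliest"}


def create_file(path):
    """Create an HDF5 file with the keyword arguments above."""
    import h5py  # imported lazily so the helper above works without h5py installed

    return h5py.File(path, **classic_open_kwargs())
```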

For the NULLPAD vs. NULLTERM there is some reading material here PyTables/PyTables#264 and here h5netcdf/h5netcdf#116. This one would need to be implemented in h5netcdf, if need be.

huard · 2025-09-04T15:25:27Z

@kmuehlbauer Thanks for the references, this is really helpful!

I'll remove the low-level stuff from this branch (_nc3_strict) and bring it into h5netcdf.


huard · 2025-09-09T02:50:51Z

This is ready for review.

While I added a test doing a roundtrip between netCDF4 and h5netcdf CLASSIC format, it does not check how the data are actually written inside the HDF5 file, just that files can be written and read consistently by xarray. Non-standard reading rules can hide non-standard writing rules. I'm planning to add "binary compatibility" tests within h5netcdf.

shoyer · 2025-09-09T03:50:31Z

xarray/backends/h5netcdf_.py

+            if isinstance(value, bytes):
+                value = np.bytes_(value)


Why this special logic only for converting bytes? This seems unrelated to what we need for NETCDF4_CLASSIC.

To make sure strings are written as NC_CHAR, and not NC_STRING. See https://engee.com/helpcenter/stable/en/julia/NetCDF/strings.html

This is in fact the detail that our third party software in C++ choked on. The netCDF C library has both nc_get_att_text and nc_get_att_string functions. Calling nc_get_att_text on an NC_STRING raises an error.

@huard I'm just going over this again. I'm on board with @shoyer here.

Even without this addition:

    if isinstance(value, bytes):
        value = np.bytes_(value)

    ds = xr.Dataset(
        data_vars=dict(temp=("x", [1, 2, 3])),
        coords=dict(x=[0, 1, 2]),
        attrs=dict(
            plain_bytes=b"hello",
            numpy_bytes=np.bytes_(b"hello"),
        ),
    )

encodes properly for both engines to fixed size strings (aka NC_CHAR)

    ATTRIBUTE "numpy_bytes" {
       DATATYPE  H5T_STRING {
          STRSIZE 5;
          STRPAD H5T_STR_NULLTERM;
          CSET H5T_CSET_ASCII;
          CTYPE H5T_C_S1;
       }
       DATASPACE  SCALAR
       DATA {
       (0): "hello"
       }
    }
    ATTRIBUTE "plain_bytes" {
       DATATYPE  H5T_STRING {
          STRSIZE 5;
          STRPAD H5T_STR_NULLTERM;
          CSET H5T_CSET_ASCII;
          CTYPE H5T_C_S1;
       }
       DATASPACE  SCALAR
       DATA {
       (0): "hello"
       }
    }

I think removing the helper function and just conditionally running encode_nc3_attr_value(value) should be enough.

When I remove the cast and test with h5netcdf v1.6.4, plain_bytes is saved as a variable length string.

Is the plan to pin h5netcdf >=1.7 for the next xarray release?

@huard, you would need to test against h5netcdf main. Isn't the check for > 1.6.4?

No, h5netcdf is not pinned at all. But we have the version check in place. So all good, or am I missing something?

My objective was to have some basic CLASSIC functionality working with older h5netcdf releases. If we keep this line in, xarray is able to save CLASSIC "passing" files even without h5netcdf's main branch.

xarray (this branch) / h5netcdf 1.6.4 -> CLASSIC files won't be recognized as such by the netCDF library, but there is a fair chance 3rd party applications won't choke.

xarray (this branch minus the bytes_ cast) / h5netcdf 1.6.4 -> 3rd party apps likely to crash when reading attributes.

xarray (this branch minus the bytes_ cast) / h5netcdf 1.7.0 -> Fully compliant NETCDF4_CLASSIC format

shoyer · 2025-09-09T03:53:36Z

xarray/tests/test_backends.py

+    def test_string_attributes_stored_as_char(self, tmp_path):
+        import h5netcdf
+
+        original = Dataset(attrs={"foo": "bar"})
+        store_path = tmp_path / "tmp.nc"
+        original.to_netcdf(store_path, engine=self.engine, format=self.file_format)
+        with h5netcdf.File(store_path, "r") as ds:
+            # Check that the attribute is stored as a char array
+            assert ds._h5file.attrs["foo"].dtype == np.dtype("S3")


NumPy's S dtype actually corresponds to bytes, not str. I don't think we want to use it for storing attributes in general.

Using fixed width chars replicates the behavior of the netCDF4 backend for the CLASSIC format. Again, this has to do with the NC_CHAR vs NC_STRING formats.

Sticking as close as possible to netCDF4 output increases my confidence that the h5netcdf outputs will be compatible with 3rd party software expecting the CLASSIC format.
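The distinction discussed in this thread can be seen at the NumPy level. The following is a minimal sketch, assuming only NumPy; it shows the dtype of the in-memory values and does not demonstrate how h5py or h5netcdf map them to HDF5 types.

```python
import numpy as np

text = "bar"

# Encoding to ASCII and wrapping in np.bytes_ gives a fixed-width bytes scalar.
as_char = np.bytes_(text.encode("ascii"))

# Leaving the value as str gives a NumPy unicode scalar instead.
as_str = np.str_(text)

print(as_char.dtype)      # fixed-width bytes dtype, 3 bytes wide
print(as_str.dtype.kind)  # "U" for unicode, not "S"
```

This is why the test above compares against `np.dtype("S3")`: the three ASCII characters of "bar" occupy exactly three bytes in a fixed-width bytes dtype.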

shoyer · 2025-09-09T03:58:32Z

xarray/backends/h5netcdf_.py

+        if format == "NETCDF4_CLASSIC" and group is not None:
+            raise ValueError("Cannot create sub-groups in `NETCDF4_CLASSIC` format.")


Does h5netcdf give a suitable error message here already?

h5netcdf.File does not even have a format argument, so no.
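Since the check has to live on the xarray side, it can be isolated as a small guard. This is a minimal sketch of the check in the diff above; the standalone function `check_classic_group` and its parameter names are hypothetical and not part of xarray or h5netcdf.

```python
def check_classic_group(file_format, group):
    """Raise if a sub-group is requested for a NETCDF4_CLASSIC file."""
    # The classic data model has no groups, so any named group is rejected
    # before anything is written.
    if file_format == "NETCDF4_CLASSIC" and group is not None:
        raise ValueError("Cannot create sub-groups in `NETCDF4_CLASSIC` format.")
```

Calling `check_classic_group("NETCDF4", "forecast")` returns silently, while the same group name with `"NETCDF4_CLASSIC"` raises `ValueError`.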

huard

Thanks for the review. Made the suggested changes, but I'm afraid the string attributes need to be saved as fixed width char arrays to be compliant with the CLASSIC file format.


kmuehlbauer

@huard Sorry for letting this wait for so long. Thanks @dcherian for the reminder. This is looking good to me; one minor change is needed, though.

huard · 2025-10-14T13:49:01Z

I'm happy to remove the cast to bytes if the next xarray release pins h5netcdf >=1.7. If not, I think keeping the line is useful.

dcherian · 2025-10-14T14:14:57Z

Can we simply require h5netcdf >= 1.7.0 for classic writes instead?

huard · 2025-10-14T14:22:52Z

My original intent was to try to get as much mileage as possible within xarray, not knowing how the h5netcdf PR would fare. @dcherian if an h5netcdf release is planned before the next xarray release, I think your suggestion makes a lot of sense.

Something like this?

        if Version(h5netcdf.__version__) > Version("1.6.4"):
            kwargs["format"] = format
        elif format == "NETCDF4_CLASSIC":
            raise ValueError("h5netcdf >= 1.7.0 is required to save output in NETCDF4_CLASSIC format.")
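The snippet above can be exercised in isolation. Below is a minimal, self-contained sketch of the same gate. It makes two assumptions: the helper names `release_tuple` and `h5netcdf_open_kwargs` are hypothetical, and the `Version` comparison (presumably `packaging.version.Version`) is replaced by a simplified standard-library parser that only looks at the leading numeric components.

```python
def release_tuple(version):
    """Parse the leading numeric components of a version string into a tuple."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev3' or 'post1'
        parts.append(int(piece))
    return tuple(parts)


def h5netcdf_open_kwargs(h5netcdf_version, file_format):
    """Build extra keyword arguments for opening the file, or raise."""
    kwargs = {}
    if release_tuple(h5netcdf_version) > (1, 6, 4):
        # Newer releases accept a format argument, so pass it through.
        kwargs["format"] = file_format
    elif file_format == "NETCDF4_CLASSIC":
        raise ValueError(
            "h5netcdf >= 1.7.0 is required to save output in NETCDF4_CLASSIC format."
        )
    return kwargs
```

With an older release, plain NETCDF4 output still works because no `format` keyword is passed; only a CLASSIC request is rejected.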

kmuehlbauer

Maybe we can just remove convert_string?

kmuehlbauer · 2025-10-15T12:26:54Z

@huard FYI: I'll have h5netcdf 1.7.0 out later today. Just waiting for this one here to get in.

kmuehlbauer · 2025-10-15T13:14:02Z

Thanks @huard!

huard · 2025-10-15T17:11:47Z

Happy to contribute, thanks for your support.

huard added 6 commits September 2, 2025 14:03

Support NETCDF4_CLASSIC in the h5engine backend

68d5c73

convert bytes attributes to numpy.bytes_ in NETCDF4_CLASSIC format with h5netcdf engine

120af07

added test to confirm string attributes are stored as numpy char arrays with NETCDF4_CLASSIC

f8f44f0

Added change to whats-new

522d37d

run pre-commit

783c407

Added test comparing CDL representation of test data written with netcdf4 and h5netcdf

2f1c781

github-actions bot added topic-backends io labels Sep 3, 2025

huard added 3 commits September 3, 2025 01:02

Added global attribute to test data. Apply CLASSIC conversion to variable attributes as well.

b142d38

Merge branch 'main' into fix_10676

b14c373

Use h5dump to compare file content instead of CDL. Add _nc3_strict attribute.

909a96c

Merge branch 'fix_10676' of github.com:Ouranosinc/xarray into fix_10676

14d22a1

huard mentioned this pull request Sep 3, 2025

Option to write netcdf in "classic" mode h5netcdf/h5netcdf#280

Closed

raise error if writing groups to CLASSIC file.

35b50ce

huard added 5 commits September 5, 2025 13:14

remove h5dump test. Remove _nc3_strict attribute (should go into h5netcdf).

a187c8d

fix h5netcdf version check.

0c1bc08

Merge branch 'main' into fix_10676

10707ee

try to fix tests

03ea2de

Set default format to NETCDF4 instead of None, because passing None to `netCDF4_.get_datatype` skips required conversions. Remove global attribute from create_test_data because it impacts other tests in other files.

592b98b

shoyer reviewed Sep 9, 2025

huard and others added 3 commits September 9, 2025 11:21

Apply suggestions from code review

13b60d0

Co-authored-by: Stephan Hoyer <[email protected]>

Suggestions from review.

cf8b4be

Merge branch 'fix_10676' of github.com:Ouranosinc/xarray into fix_10676

d0b0948

huard commented Sep 9, 2025

huard mentioned this pull request Sep 9, 2025

Add partial support for NETCDF4_CLASSIC format h5netcdf/h5netcdf#283

Merged

3 tasks

huard added 4 commits September 22, 2025 10:25

Merge branch 'main' into fix_10676

6351e66

Raise an error within get_child_store rather that __init__ with format=NETCDF4_CLASSIC group not None.

b2a21d2

Merge branch 'main' into fix_10676

50dbd36

Merge branch 'main' into fix_10676

6d686d4

dcherian requested a review from kmuehlbauer October 13, 2025 20:44

Merge branch 'main' into fix_10676

18e4bd0

kmuehlbauer requested changes Oct 14, 2025

huard and others added 2 commits October 14, 2025 11:27

Remove casting of bytes to np._bytes_. Require h5netcdf 1.7.0 to save to NETCDF4_CLASSIC format.

d9c4879

Merge branch 'main' into fix_10676

99c90a5

kmuehlbauer reviewed Oct 15, 2025

huard and others added 3 commits October 15, 2025 08:22

Apply suggestions from code review

28bd354

Co-authored-by: Kai Mühlbauer <[email protected]>

Update xarray/backends/h5netcdf_.py

f2a29ca

Co-authored-by: Kai Mühlbauer <[email protected]>

Update xarray/backends/h5netcdf_.py

f1c1819

Co-authored-by: Kai Mühlbauer <[email protected]>

Merge branch 'main' into fix_10676

88d0841

kmuehlbauer approved these changes Oct 15, 2025

kmuehlbauer merged commit 58f26f9 into pydata:main Oct 15, 2025
34 of 37 checks passed

kmuehlbauer mentioned this pull request Oct 15, 2025

fix h5netcdf backend for format=None, use same rule as netcdf4 backend #10859

Merged

1 task
