


pp-mo commented Oct 25, 2025

Closes #6309

So far, just some ideas brewing


pp-mo commented Oct 25, 2025

Older notes

Issues for iris char data

  • read + write, with + without encodings
  • ? option to view cube/coord data either as strings or as the underlying byte array
  • ?? writing char coords works, but writing char cube data does not
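To make the "strings or underlying byte array" choice concrete, here is a minimal pure-Python sketch of the two views of the same data. The helper names `strings_to_chars` and `chars_to_strings` are hypothetical (not Iris or netCDF4-python API), and NUL-padding to a fixed string length is an assumption about how the char variable is laid out.

```python
# Hypothetical helpers (not Iris or netCDF4-python API): convert between a list
# of strings and the fixed-width "char" form of a netCDF char variable, i.e. one
# row of single bytes per string, padded with NUL bytes to the string length.


def strings_to_chars(strings, strlen, encoding="utf-8"):
    """Encode each string and pad it to `strlen` single-byte chars."""
    rows = []
    for s in strings:
        encoded = s.encode(encoding)
        if len(encoded) > strlen:
            raise ValueError(f"{s!r} needs {len(encoded)} bytes > strlen={strlen}")
        padded = encoded.ljust(strlen, b"\0")
        # Split into length-1 bytes objects, like one row of an 'S1' array.
        rows.append([padded[i:i + 1] for i in range(strlen)])
    return rows


def chars_to_strings(rows, encoding="utf-8"):
    """Join each row of single-byte chars, strip trailing NULs and decode."""
    return [b"".join(row).rstrip(b"\0").decode(encoding) for row in rows]
```

Note that the string length counts bytes, not characters: with UTF-8, "é" occupies two chars of the row, so the choice of encoding also affects how long the string dimension must be on write.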

=========================
testing dimensions (FOR READS)

  • encoding can be None, "ascii" or "utf-8"
    • we should also test alternative spellings of utf-8 / ascii
    • but not fuss too much?
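The read-testing dimensions listed above could be enumerated as a small test matrix. This is a sketch under assumptions: the particular alternative spellings, and crossing them with "coord" vs "cube" targets, are illustrative choices; the resulting tuples could feed something like `pytest.mark.parametrize`.

```python
import itertools

# Spellings of each encoding to try as the "_Encoding" attribute value on read.
# None stands for "no _Encoding attribute".  The alternative spellings are
# illustrative choices, not a list taken from Iris or netCDF4-python.
SPELLINGS = {
    None: [None],
    "ascii": ["ascii", "ASCII", "US-ASCII"],
    "utf-8": ["utf-8", "UTF-8", "utf8"],
}
# Which kind of variable holds the char data (assumed test targets).
TARGETS = ["coord", "cube"]


def read_test_cases(all_spellings=True):
    """Return (target, encoding, attribute spelling) combinations to test.

    With all_spellings=False only the first spelling of each encoding is
    used, i.e. the "don't fuss too much" option.
    """
    cases = []
    for target, (encoding, spellings) in itertools.product(
        TARGETS, SPELLINGS.items()
    ):
        chosen = spellings if all_spellings else spellings[:1]
        cases.extend((target, encoding, s) for s in chosen)
    return cases
```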

EXISTING behaviour

  • is ok for ascii
  • but results depend on the presence of the "_Encoding" attribute
    • since that is how netCDF4-python behaves by default
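The netCDF4-python docs for `Variable.set_auto_chartostring` describe this: with auto-conversion on (the default), a char (`S1`) variable that has an `_Encoding` attribute is returned as strings, with the last dimension taken as the string length. Without the attribute, it comes back as single chars. The function below is a pure-Python emulation of that rule for illustration only. `emulate_read` is a hypothetical name, not netCDF4-python code.

```python
# Pure-Python emulation (not netCDF4-python code) of the read rule: char rows
# are only converted to strings when the variable has an "_Encoding" attribute.


def emulate_read(char_rows, attributes):
    encoding = attributes.get("_Encoding")
    if encoding is None:
        # No _Encoding: data stays as single-byte chars (like dtype 'S1').
        return char_rows
    # _Encoding present: collapse the last (string-length) dimension.
    return [b"".join(row).rstrip(b"\0").decode(encoding) for row in char_rows]
```

So the same file content yields different types depending on one attribute, which is why the read tests need the "with/without `_Encoding`" dimension.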

ASIDE: Python "standard encodings": https://docs.python.org/3/library/codecs.html#standard-encodings
This page has a table of codec names and their aliases.
Names can be normalised like this...

    >>> import codecs
    >>> codecs.lookup("u8").name
    'utf-8'
  • this returns the canonical codec name for any of its aliases, as listed in the table
  • it also fails when given junk (raises LookupError)
    • it does not accept "" (LookupError) or None (TypeError)
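A minimal sketch of wrapping `codecs.lookup` so it can be applied directly to an `_Encoding` value. The name `normalise_encoding` is hypothetical. Treating None and "" as "no encoding", and re-raising unknown names as ValueError, are assumed policies, not existing Iris behaviour.

```python
import codecs


def normalise_encoding(name):
    """Return the canonical Python codec name for `name`, or None.

    codecs.lookup() raises TypeError for None and LookupError for "" or an
    unknown name, so those cases are handled explicitly here.  Mapping None
    and "" to None (meaning "no encoding") is an assumed policy.
    """
    if name is None or name == "":
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        raise ValueError(f"unrecognised encoding: {name!r}") from None
```

For example, `normalise_encoding("u8")` and `normalise_encoding("UTF8")` both give `'utf-8'`, and `normalise_encoding("US-ASCII")` gives `'ascii'`.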

Old discussion in netcdf4-python, referenced by the xarray docs: Unidata/netcdf4-python#654 (comment)
From that specific comment by jswhit (quoting an old version of the NCUG?):

Applications writing string data using the char data type are encouraged to add
the special variable attribute "_Encoding" with a value that the netCDF libraries
recognize.
Currently those valid values are "UTF-8" or "ASCII", case insensitive.
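The quoted rule is narrower than Python's alias table. It covers only "UTF-8" or "ASCII", case-insensitively. A strict check along those lines might look like the sketch below. `netcdf_encoding_to_codec` is a hypothetical helper, and what a caller does with an unrecognised value (warn, fall back to bytes) is left open.

```python
# Values the quoted netCDF text says are valid for "_Encoding" (compared
# case-insensitively), mapped to the equivalent Python codec names.
NETCDF_ENCODINGS = {"utf-8": "utf-8", "ascii": "ascii"}


def netcdf_encoding_to_codec(attr_value):
    """Map an "_Encoding" string to a Python codec name, or None if the value
    is not one of the netCDF-recognised ones."""
    return NETCDF_ENCODINGS.get(attr_value.lower())
```

Under this strict reading, "utf8" is rejected even though Python's `codecs.lookup` would accept it.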

In the Unidata docs, the reference is hard to find.
There is STILL NOTHING in the Attributes Appendix (A).
In: https://docs.unidata.ucar.edu/netcdf-c/current/file_format_specifications.html

Note on char data: Although the characters used in netCDF names must be encoded
as UTF-8, character data may use other encodings.
The variable attribute “_Encoding” is reserved for this purpose in future implementations.
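Why the encoding matters on read: identical stored bytes decode to different strings, or fail to decode, depending on the codec. Latin-1 is used here purely as an example of an "other" encoding.

```python
# The same stored char bytes give different strings (or fail) depending on
# the encoding used to read them.
data = "café".encode("utf-8")  # b'caf\xc3\xa9'

as_utf8 = data.decode("utf-8")  # 'café'
as_latin1 = data.decode("latin-1")  # 'cafÃ©': each byte read as one character
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False  # bytes >= 0x80 are not valid ASCII
```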

Outstanding issues

  • the assumption that the string-length dimension of a char coord cannot be a cube data dimension
  • how to manage a backwards-compatible approach for coords and cubes:
    • == expecting cube data to contain strings ??
    • == OR converting automatically, with a FUTURE control to turn it off ??
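The second option could follow the existing `iris.FUTURE` pattern of a switch plus a temporary `context(...)` override. The sketch below is a stand-alone imitation. `CharFuture`, the flag `convert_char_data` and `load_char_data` are all hypothetical, and defaulting the flag to False (legacy bytes behaviour) is an assumption, since the notes leave the default open.

```python
import contextlib


class CharFuture:
    """Hypothetical stand-in for an iris.FUTURE-style switch (not Iris API)."""

    def __init__(self):
        # Assumed default: False keeps the legacy behaviour (raw char bytes).
        self.convert_char_data = False

    @contextlib.contextmanager
    def context(self, **kwargs):
        """Temporarily set flags; unknown names raise AttributeError."""
        old = {key: getattr(self, key) for key in kwargs}
        for key, value in kwargs.items():
            setattr(self, key, value)
        try:
            yield
        finally:
            for key, value in old.items():
                setattr(self, key, value)


FUTURE = CharFuture()


def load_char_data(char_rows, encoding="ascii"):
    """Return strings if conversion is switched on, else the raw byte rows."""
    if not FUTURE.convert_char_data:
        return char_rows
    return [b"".join(row).rstrip(b"\0").decode(encoding) for row in char_rows]
```

Usage: `with FUTURE.context(convert_char_data=True): load_char_data(rows)` returns strings, and the legacy behaviour is restored on exit.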


Development: merging this pull request may close the issue "Fix iris handling of netcdf character array variables".