Skip to content

Conversation

@lukaspie
Copy link
Collaborator

@lukaspie lukaspie commented Mar 6, 2025

  • Update definitions to latest fairmat
  • Implement nameType
  • open enumerations
  • base class inheritance
    • read_nexus
    • validation
    • NOMAD parser

@rettigl
Copy link
Collaborator

rettigl commented Mar 6, 2025

Also the chemical formula normalization needs to be updated

@lukaspie
Copy link
Collaborator Author

lukaspie commented Mar 6, 2025

Also the chemical formula normalization needs to be updated

We could do that together with #485

@lukaspie lukaspie force-pushed the update-defs-and-related-tools branch from 0428e2b to 9f5dcfe Compare March 7, 2025 10:46
@lukaspie lukaspie changed the base branch from master to example_update March 7, 2025 10:46
@lukaspie lukaspie force-pushed the update-defs-and-related-tools branch 2 times, most recently from 547edd2 to 10f79e0 Compare March 10, 2025 14:50
@lukaspie lukaspie changed the base branch from example_update to master March 10, 2025 14:50
Copy link
Collaborator

@sanbrock sanbrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lukaspie lukaspie linked an issue Mar 12, 2025 that may be closed by this pull request
@lukaspie lukaspie mentioned this pull request Jan 31, 2025
12 tasks
@lukaspie lukaspie linked an issue Mar 12, 2025 that may be closed by this pull request
@lukaspie lukaspie force-pushed the update-defs-and-related-tools branch from 44d7651 to e2cae1f Compare March 12, 2025 14:08
@lukaspie
Copy link
Collaborator Author

@rettigl since this will lead to additional warnings in the mpes reader, we can make the following workaround for now: the testing framework allows you to have a *_ref_output.txt together with your test files. There, you can store the expected output of the reader. We then ignore those messages that are captured by the caplog level, but for which we have no know fix. The logic is implemented here.

Essentially, the only things we need to change for now is that we change this line to

dataconverter xarray_saved_small_calibration.h5 config_file.json eln_data.yaml --reader $READER --nxdl $NXDL --output example_eln.nxs &> eln_ref_output.txt

and then we pass that file to the test data in the test_reader.py (same for the other example).

For the mpes reader, I imagine this can be an intermediate solution until the sibling inheritance works. For the raman reader, among others, this needs to be implemented.

@rettigl
Copy link
Collaborator

rettigl commented Mar 19, 2025

This is an overall bigger problem that the inheritance from siblings does not work properly yet for fields and attributes. That is, identifier_1 does not actually have identifierNAME in its inheritance. Currently, we only add inherited fields/attributes if they have the exact same name (the code is here). Instead, we should add to the inheritance all the entities that match the name, type, etc. I am already workshipping something, but this breaks in other places.

Is there actually a way to do this correctly? How would you specify which variadic field a field should inherit from, or if at all? I would say we only do this explicit inheritance by name, and otherwise just ignore a concept name for fields/attributes if the corresponding name is defined as well.
I.e. that e.g. also NXdata/DATA[title] should become NXdata/title. We can maybe emitt an info for that case.
The same behavior should actually be for groups, I would say. E.g. in NXmpes, if you add ENTRY/SAMPLE[data], it should become ENTRY/data, i.e. NXdata, maybe with an info. Or how do you see this?
The last part is currently not working, i.e. ENTRY/SAMPLE[data] is treated as NXsample, and complains that ENTRY/data is missing.

@rettigl
Copy link
Collaborator

rettigl commented Mar 19, 2025

we can make the following workaround for now

As mentioned above, I would simply omitt the concept name if the instance name is literally defined on the same level, and issue an info.

@rettigl
Copy link
Collaborator

rettigl commented Mar 19, 2025

we can make the following workaround for now

As mentioned above, I would simply omitt the concept name if the instance name is literally defined on the same level, and issue an info.

After merging FAIRmat-NFDI/nexus_definitions#361, mpes should pass now. One weird thing is that we still get from read_nexus:
DEBUG: @HDF5_version - IS NOT IN SCHEMA
DEBUG: @delay_indices - IS NOT IN SCHEMA
DEBUG: @kx_indices - IS NOT IN SCHEMA
DEBUG: @ky_indices - IS NOT IN SCHEMA
DEBUG: @type - IS NOT IN SCHEMA

ENTRY/data/@energy_indices also does not have a docstring (we did not provide one in NXmpes, maybe should add).

@lukaspie
Copy link
Collaborator Author

This is an overall bigger problem that the inheritance from siblings does not work properly yet for fields and attributes. That is, identifier_1 does not actually have identifierNAME in its inheritance. Currently, we only add inherited fields/attributes if they have the exact same name (the code is here). Instead, we should add to the inheritance all the entities that match the name, type, etc. I am already workshipping something, but this breaks in other places.

Is there actually a way to do this correctly? How would you specify which variadic field a field should inherit from, or if at all?

Well, we want identifier_1 to inherit from NXobject/identifierNAME, right? So that identifier_1/@type is documented. The correct way to do this is to first check all possible siblings (based on the inheritance of a field's/attribute's parent) and then filter those where the name/nameType as well as the type match. For the type, it means that a field's type must be a subset of the inherited field's type. This is exactly what this ToDo is saying.

I would say we only do this explicit inheritance by name, and otherwise just ignore a concept name for fields/attributes if the corresponding name is defined as well. I.e. that e.g. also NXdata/DATA[title] should become NXdata/title. We can maybe emitt an info for that case.

Do you wish to replace that in the template then? How would we do this? We cannot really do this in the MultiFormatReader because we don't handle the nxdl files there and so far, the validation only checks, it doesn't actually modify the template.

The same behavior should actually be for groups, I would say. E.g. in NXmpes, if you add ENTRY/SAMPLE[data], it should become ENTRY/data, i.e. NXdata, maybe with an info. Or how do you see this? The last part is currently not working, i.e. ENTRY/SAMPLE[data] is treated as NXsample, and complains that ENTRY/data is missing.

I see the issue if you have an instance name that matches that of a different class than what you give in the concept name. In my opinion, we should not blindly replace ENTRY/SAMPLE[data] by ENTRY/data, because the fields/attributes in that group will not match those of NXmpes/NXentry/data. For this case, we should probably actually raise an error since this really messes up the whole structure.

@lukaspie
Copy link
Collaborator Author

The same behavior should actually be for groups, I would say. E.g. in NXmpes, if you add ENTRY/SAMPLE[data], it should become ENTRY/data, i.e. NXdata, maybe with an info. Or how do you see this? The last part is currently not working, i.e. ENTRY/SAMPLE[data] is treated as NXsample, and complains that ENTRY/data is missing.

I see the issue if you have an instance name that matches that of a different class than what you give in the concept name. In my opinion, we should not blindly replace ENTRY/SAMPLE[data] by ENTRY/data, because the fields/attributes in that group will not match those of NXmpes/NXentry/data. For this case, we should probably actually raise an error since this really messes up the whole structure.

In this seperate PR (#591), I made it so that if you give a concept name and a name that matches with a non-variadic name on the same level, a warning is raised and class is written as undocumented. That is ENTRY/SAMPLE[data] would still be written, but with the additional warning "Given group name 'SAMPLE' conflicts with the non-variadic name 'specified_group (req)', which should be of type NXdata."

What do you think of this solution? Note that this currently blocks AXISNAME[energy] from being used, because energy is not actually an AXISNAME. If we have proper inheritance for fields/attributes, we can adjust this again.

@lukaspie
Copy link
Collaborator Author

we can make the following workaround for now

As mentioned above, I would simply omitt the concept name if the instance name is literally defined on the same level, and issue an info.

After merging FAIRmat-NFDI/nexus_definitions#361, mpes should pass now.
One weird thing is that we still get from read_nexus:
DEBUG: @HDF5_version - IS NOT IN SCHEMA
DEBUG: @delay_indices - IS NOT IN SCHEMA
DEBUG: @kx_indices - IS NOT IN SCHEMA
DEBUG: @ky_indices - IS NOT IN SCHEMA
DEBUG: @type - IS NOT IN SCHEMA

ENTRY/data/@energy_indices also does not have a docstring (we did not provide one in NXmpes, maybe should add).

DEBUG: @HDF5_version - IS NOT IN SCHEMA was coming from a bug in the NXroot attributes writing. It is actually @HDF5_Version with an uppercase V. I fixed it in the mpes branch already, not it is documented.

The _indices are actually a bug. Apparently, the axes are not properly assigned to AXISNAME, so AXISNAME_indices is also not found. Will fix that later, when/if we have a resolution for nexusformat/definitions#1533.

@lukaspie lukaspie force-pushed the update-defs-and-related-tools branch from b6f4894 to 4bce7ac Compare March 20, 2025 11:12
@lukaspie lukaspie force-pushed the update-defs-and-related-tools branch from 4bce7ac to 9ac4396 Compare March 20, 2025 11:30
@rettigl
Copy link
Collaborator

rettigl commented Mar 20, 2025

Well, we want identifier_1 to inherit from NXobject/identifierNAME, right? So that identifier_1/@type is documented. The correct way to do this is to first check all possible siblings (based on the inheritance of a field's/attribute's parent) and then filter those where the name/nameType as well as the type match. For the type, it means that a field's type must be a subset of the inherited field's type. This is exactly what this ToDo is saying.

In the case of a partially variadic name, this might work, yes. But does it solve the DATA/AXISNAME issue?

@rettigl
Copy link
Collaborator

rettigl commented Mar 20, 2025

Do you wish to replace that in the template then? How would we do this? We cannot really do this in the MultiFormatReader because we don't handle the nxdl files there and so far, the validation only checks, it doesn't actually modify the template.

I think we don't need to replace anything, just accept that to fulfill a required field in the verification, no?

I see the issue if you have an instance name that matches that of a different class than what you give in the concept name. In my opinion, we should not blindly replace ENTRY/SAMPLE[data] by ENTRY/data, because the fields/attributes in that group will not match those of NXmpes/NXentry/data. For this case, we should probably actually raise an error since this really messes up the whole structure.

That is maybe the more proper solution, yes.

@rettigl
Copy link
Collaborator

rettigl commented Mar 20, 2025

In this seperate PR (#591), I made it so that if you give a concept name and a name that matches with a non-variadic name on the same level, a warning is raised and class is written as undocumented. That is ENTRY/SAMPLE[data] would still be written, but with the additional warning "Given group name 'SAMPLE' conflicts with the non-variadic name 'specified_group (req)', which should be of type NXdata."

What do you think of this solution? Note that this currently blocks AXISNAME[energy] from being used, because energy is not actually an AXISNAME. If we have proper inheritance for fields/attributes, we can adjust this again.

I think this is a good idea. But maybe we only merge this after we have fixed the sibling inheritance issue, so that it does not break our tests?

As we added now all relevant axes as named axes anyways, we can also remove the AXISNAME, and avoid the problem for now altogether. It anyways only shows up because of the flexibility with the "*" notation in the mpes reader to add whatever axes are present.

@rettigl
Copy link
Collaborator

rettigl commented Mar 20, 2025

Units inside NXcollection are still reported as undocumented:

The unit, /ENTRY[0.23_mJcm2]/INSTRUMENT[instrument]/DETECTOR[detector]/COLLECTION[detector_parameters]/amplifier_voltage/@units = V, is being written but has no documentation.

@rettigl
Copy link
Collaborator

rettigl commented Mar 20, 2025

My impression is that the validation has become much slower compared to earlier versions, I will try to verify this and see where the time is being spent.

@rettigl
Copy link
Collaborator

rettigl commented Mar 20, 2025

I also encountered a few times now the issue, that NX_FLOAT seems a bit restrictive. If I have e.g. from my machine metadata some field as int, which is supposed to be float, I suggest that we do a conversion in that case, rather than issuing a warning.

@lukaspie
Copy link
Collaborator Author

lukaspie commented Mar 20, 2025

Units inside NXcollection are still reported as undocumented:

The unit, /ENTRY[0.23_mJcm2]/INSTRUMENT[instrument]/DETECTOR[detector]/COLLECTION[detector_parameters]/amplifier_voltage/@units = V, is being written but has no documentation.

Should be okay now, can you double check, please?

Are you happy with the branch for now? We would like to make a new pynxtools release either today or tomorrow, because Christoph needs a new version for a conferene. I suggest we merge this first, then we check for further improvements after the release.

@lukaspie
Copy link
Collaborator Author

After discussion with @sanbrock and @mkuehbach, we merge here to get closer to a new release. Will review sibling inheritance as well as several other issues after the release is made.

@lukaspie lukaspie merged commit 8cc3e91 into master Mar 20, 2025
11 of 17 checks passed
@lukaspie lukaspie deleted the update-defs-and-related-tools branch March 20, 2025 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NXroot unit examples in the nxdl file, such as eV/mm Implement nameType="partial" NIAC resolution for uppercases

4 participants