@karpob
Collaborator

@karpob karpob commented Jun 6, 2025

Description

Reading GSI diag files can be pretty slow, especially for hyperspectral sounders (IASI being the worst at the moment). Several things contribute to this, but the big one is that every dataset in the diag file is either reshaped or thinned. Another factor is how that choice is made: each variable is read for a count of nchan, and the code decides whether to thin or reshape it based on whether the first nchan values are all equal.

This seems like a bad idea. It's not hard to imagine a sensor going crazy or having fill values across all channels for an observation. If that happened on the first observation, the current scheme would choose to thin the variable rather than reshape it. Instead, I broke the variables out into two lists: those to be thinned and those to be reshaped. If a variable is in neither list, it falls back to the old scheme.
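To make the failure mode concrete, here is a minimal sketch in plain Python (not the actual gsi_obs_space code). The fill value, the variable names in `THIN_VARS` and `RESHAPE_VARS`, and all function names are assumptions made up for illustration; the real lists are in the diff.

```python
FILL = 1.0e11  # hypothetical fill value, for illustration only


def thin(flat, nchan):
    """Keep one value per observation (every nchan-th element)."""
    return flat[::nchan]


def reshape(flat, nchan):
    """Split a flat list into rows of nchan values, one row per observation."""
    return [flat[i:i + nchan] for i in range(0, len(flat), nchan)]


def old_choice(flat, nchan):
    """Old scheme as described above: inspect only the first nchan values."""
    first = flat[:nchan]
    return "thin" if all(v == first[0] for v in first) else "reshape"


# Hypothetical lists; the names are placeholders, not the ones in the PR.
THIN_VARS = {"Latitude", "Longitude"}
RESHAPE_VARS = {"Observation", "QC_Flag"}


def new_choice(name, flat, nchan):
    """List-based scheme with a fallback to the old check."""
    if name in THIN_VARS:
        return "thin"
    if name in RESHAPE_VARS:
        return "reshape"
    return old_choice(flat, nchan)


nchan = 3
# First observation is all fill values, second has real per-channel values.
obs = [FILL, FILL, FILL, 250.1, 251.7, 249.3]

print(old_choice(obs, nchan))                 # thin
print(thin(obs, nchan))                       # channels 2 and 3 are dropped
print(new_choice("Observation", obs, nchan))  # reshape
print(reshape(obs, nchan))                    # both observations kept whole
```

With the old check, the all-fill first observation makes a per-channel variable look constant, so two of the three channels are silently discarded. Naming the variable explicitly avoids that.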

For a channel summary comparison between 3 runs that I have right now, the current scheme takes ~45 minutes. With the changes here, this drops to ~22 minutes on a dedicated node.

Timing before:

INFO Timers: Generate Dictionary                           0.04 seconds | Instances count: 001 | Per instance:     0.04 | Percent of total:  0.0%
INFO Timers: DataDriverExecute                          2759.65 seconds | Instances count: 003 | Per instance:   919.88 | Percent of total: 98.1%
INFO Timers: DataObjectConstructor                         0.05 seconds | Instances count: 003 | Per instance:     0.02 | Percent of total:  0.0%
INFO Timers: EvaDatasetFactory import: ...                 0.02 seconds | Instances count: 003 | Per instance:     0.01 | Percent of total:  0.0%
INFO Timers:   GsiObsSpace from ...
INFO Timers:   eva.data.gsi_obs_space
INFO Timers: DataObjectExecute                          2759.60 seconds | Instances count: 003 | Per instance:   919.87 | Percent of total: 98.1%
INFO Timers: TransformDriverExecute                       20.49 seconds | Instances count: 001 | Per instance:    20.49 | Percent of total:  0.7%
INFO Timers: Transform: accept_where                      10.95 seconds | Instances count: 006 | Per instance:     1.83 | Percent of total:  0.4%
INFO Timers: Transform: channel_stats                      9.40 seconds | Instances count: 005 | Per instance:     1.88 | Percent of total:  0.3%
INFO Timers: Transform: arithmetic                         0.01 seconds | Instances count: 006 | Per instance:     0.00 | Percent of total:  0.0%
INFO Timers: FigureDriverExecute                          15.92 seconds | Instances count: 001 | Per instance:    15.92 | Percent of total:  0.6%
INFO Timers: Graphics Loop                                 0.69 seconds | Instances count: 001 | Per instance:     0.69 | Percent of total:  0.0%
INFO Timers:
INFO Timers: Total time taken 2812.55 seconds.

Timing after:

INFO Timers:
INFO Timers: Generate Dictionary                           0.06 seconds | Instances count: 001 | Per instance:     0.06 | Percent of total:  0.0%
INFO Timers: DataDriverExecute                           873.36 seconds | Instances count: 003 | Per instance:   291.12 | Percent of total: 64.9%
INFO Timers: DataObjectConstructor                         0.03 seconds | Instances count: 003 | Per instance:     0.01 | Percent of total:  0.0%
INFO Timers: EvaDatasetFactory import: ...                 0.03 seconds | Instances count: 003 | Per instance:     0.01 | Percent of total:  0.0%
INFO Timers:   GsiObsSpace from ...
INFO Timers:   eva.data.gsi_obs_space
INFO Timers: DataObjectExecute                           873.33 seconds | Instances count: 003 | Per instance:   291.11 | Percent of total: 64.9%
INFO Timers: TransformDriverExecute                      426.81 seconds | Instances count: 001 | Per instance:   426.81 | Percent of total: 31.7%
INFO Timers: Transform: accept_where                     398.35 seconds | Instances count: 003 | Per instance:   132.78 | Percent of total: 29.6%
INFO Timers: Transform: arithmetic                         8.75 seconds | Instances count: 036 | Per instance:     0.24 | Percent of total:  0.6%
INFO Timers: Transform: channel_stats                     19.44 seconds | Instances count: 005 | Per instance:     3.89 | Percent of total:  1.4%
INFO Timers: FigureDriverExecute                          13.82 seconds | Instances count: 001 | Per instance:    13.82 | Percent of total:  1.0%
INFO Timers: Graphics Loop                                 1.28 seconds | Instances count: 001 | Per instance:     1.28 | Percent of total:  0.1%
INFO Timers:
INFO Timers: Total time taken 1345.72 seconds.
INFO Timers:
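As a quick check on the two timer dumps above, the totals and the DataObjectExecute lines can be converted to minutes and a speedup ratio. This is just arithmetic on the logged numbers; the variable names are mine.

```python
# Values copied from the "Timing before" and "Timing after" logs above.
total_before_s = 2812.55
total_after_s = 1345.72
data_before_s = 2759.60  # DataObjectExecute, before
data_after_s = 873.33    # DataObjectExecute, after

total_before_min = total_before_s / 60.0
total_after_min = total_after_s / 60.0
total_speedup = total_before_s / total_after_s
data_speedup = data_before_s / data_after_s

print(f"total: {total_before_min:.1f} min -> {total_after_min:.1f} min "
      f"({total_speedup:.2f}x)")
print(f"DataObjectExecute: {data_speedup:.2f}x")
```

The data-reading step itself speeds up by more than the total does, because the accept_where transform takes much longer in the second log (398.35 seconds versus 10.95 seconds).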


Dependencies

None

Impact

Speeds up gsi_obs_space and makes it more consistent/safe across sensors.

@karpob karpob marked this pull request as draft June 7, 2025 18:43
@karpob
Collaborator Author

karpob commented Jun 7, 2025

Making this a draft as I had some thoughts on how to change this. Fundamentally, the set of variables I have there doesn't make sense. It came about because I was combining a sensor that the algorithm said had the same lat/lons for every channel with a sensor that supposedly had independent lat/lons per channel. I had some ideas on how to do this properly (check QC, check for lat/lon repeats, and reshape or thin as appropriate). The other option is to just be safe and reshape everything: everything is flat in the GSI, so in my mind it makes sense to reshape everything to preserve the alignment of all fields.

@Dooruk
Collaborator

Dooruk commented Jun 9, 2025

Is this related to what @rtodling mentioned today in terms of calculating stats and sorting between different channels? Or is this even the exact same problem?

@karpob
Collaborator Author

karpob commented Jun 9, 2025

No, this is completely different, I think. The problem here is the slowness of gsi_obs_space, which reshapes or thins every variable, even if the user doesn't request it. A secondary problem is that the method used to decide whether to reshape or thin seems like it could easily fail to do the right thing.

…tch gsi_obs_space_reshape_all the user can turn on.
@karpob
Collaborator Author

karpob commented Jun 10, 2025

What I ended up doing here is adding a switch, gsi_obs_space_reshape_all, that users can set if they care about getting ancillary data correct for microwave sensors (pretty much all of them use the per-channel footprint size to get surface variables), or if they are just paranoid. To quell my own paranoia, I hardcoded a list of variables that must be reshaped by channel because they inherently have a spectral dimension. I took lat/lon out of the forced reshaping so maps remain easy to plot for most sensors (AMSU-A being the exception, as it has a spectrally dependent lat/lon).

The flag is set to false by default (or if unspecified) so behavior remains the same for all tests.
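A rough sketch of how the switch could interact with the hardcoded list, under my reading of the description above rather than the merged code. Only the name gsi_obs_space_reshape_all comes from this PR; the `FORCE_RESHAPE` contents, the `plan_variable` helper, and the assumption that the flag is read from a plain dictionary are hypothetical.

```python
# Hypothetical list of variables assumed to carry a spectral dimension.
# Latitude/Longitude are deliberately left out, as described above.
FORCE_RESHAPE = {"Observation", "QC_Flag"}


def plan_variable(name, config, heuristic_choice):
    """Return 'reshape' or 'thin' for one flat diag variable.

    heuristic_choice is whatever the existing first-nchan check would pick.
    """
    # Flag defaults to False when unspecified, so behavior is unchanged.
    reshape_all = bool(config.get("gsi_obs_space_reshape_all", False))
    if reshape_all or name in FORCE_RESHAPE:
        return "reshape"
    return heuristic_choice


default_cfg = {}
paranoid_cfg = {"gsi_obs_space_reshape_all": True}

print(plan_variable("Latitude", default_cfg, "thin"))     # thin
print(plan_variable("Latitude", paranoid_cfg, "thin"))    # reshape
print(plan_variable("Observation", default_cfg, "thin"))  # reshape
```

Here the flag being on is assumed to reshape every variable, lat/lon included, which is what a sensor like AMSU-A would need.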

@karpob karpob marked this pull request as ready for review June 10, 2025 15:45
@CoryMartin-NOAA CoryMartin-NOAA merged commit 6e6d1ea into develop Jul 25, 2025
5 checks passed
4 participants