speedup for gsi_obs_space #247

karpob · 2025-06-06T20:58:21Z

Description

Reading GSI diag files can be pretty slow, especially for hyperspectral sounders (IASI being the worst at the moment). Several things contribute to this, but the big one is that every dataset in the diag is either reshaped or thinned. A related problem is how that choice gets made: each variable is read for a count of nchan, and the decision to thin or reshape is based on whether the first nchan values are all equal.

This seems like a bad idea. It's not hard to imagine a sensor going crazy or having fill values across all channels for an observation. If that happened on the first observation, the current scheme would choose to thin that variable rather than reshape it. Instead, I broke the variables out into two lists: those to be thinned and those to be reshaped. If a variable is in neither list, it falls back to the old scheme.
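To make the failure mode concrete, here is a minimal sketch of the first-nchan check and the list-based dispatch described above. It is not the code in this PR: the function names, the fill value, and the variable names in the two sets are hypothetical placeholders, and plain Python lists stand in for the diag arrays.

```python
FILL = -9999.0  # hypothetical fill value, for illustration only

# Hypothetical variable lists; the real lists live in the PR diff.
THIN_VARS = {"Latitude", "Longitude"}
RESHAPE_VARS = {"Observation"}


def first_obs_heuristic(flat, nchan):
    """Old scheme: thin if the first nchan values are all equal."""
    first = flat[:nchan]
    return "thin" if all(v == first[0] for v in first) else "reshape"


def choose_layout(name, flat, nchan):
    """Consult the explicit lists first, then fall back to the old scheme."""
    if name in THIN_VARS:
        return "thin"
    if name in RESHAPE_VARS:
        return "reshape"
    return first_obs_heuristic(flat, nchan)


def apply_layout(flat, nchan, layout):
    """Thin keeps one value per observation; reshape gives nobs rows of nchan."""
    if layout == "thin":
        return flat[::nchan]
    return [flat[i:i + nchan] for i in range(0, len(flat), nchan)]


nchan = 4
# Two observations; the first one is all fill values across channels.
obs = [FILL] * nchan + [250.1, 251.3, 249.8, 252.0]

print(first_obs_heuristic(obs, nchan))           # old scheme picks "thin"
print(choose_layout("Observation", obs, nchan))  # list lookup picks "reshape"
```

With the old check alone, the second observation's channel values would be discarded by thinning; the list lookup avoids that for any variable it names.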

For a channel summary comparison between 3 runs that I have right now, the current scheme takes ~45 minutes; with the changes here, that drops to ~22 minutes on a dedicated node.

Timing before:

INFO Timers: Generate Dictionary                           0.04 seconds | Instances count: 001 | Per instance:     0.04 | Percent of total:  0.0%
INFO Timers: DataDriverExecute                          2759.65 seconds | Instances count: 003 | Per instance:   919.88 | Percent of total: 98.1%
INFO Timers: DataObjectConstructor                         0.05 seconds | Instances count: 003 | Per instance:     0.02 | Percent of total:  0.0%
INFO Timers: EvaDatasetFactory import: ...                 0.02 seconds | Instances count: 003 | Per instance:     0.01 | Percent of total:  0.0%
INFO Timers:   GsiObsSpace from ...
INFO Timers:   eva.data.gsi_obs_space
INFO Timers: DataObjectExecute                          2759.60 seconds | Instances count: 003 | Per instance:   919.87 | Percent of total: 98.1%
INFO Timers: TransformDriverExecute                       20.49 seconds | Instances count: 001 | Per instance:    20.49 | Percent of total:  0.7%
INFO Timers: Transform: accept_where                      10.95 seconds | Instances count: 006 | Per instance:     1.83 | Percent of total:  0.4%
INFO Timers: Transform: channel_stats                      9.40 seconds | Instances count: 005 | Per instance:     1.88 | Percent of total:  0.3%
INFO Timers: Transform: arithmetic                         0.01 seconds | Instances count: 006 | Per instance:     0.00 | Percent of total:  0.0%
INFO Timers: FigureDriverExecute                          15.92 seconds | Instances count: 001 | Per instance:    15.92 | Percent of total:  0.6%
INFO Timers: Graphics Loop                                 0.69 seconds | Instances count: 001 | Per instance:     0.69 | Percent of total:  0.0%
INFO Timers:
INFO Timers: Total time taken 2812.55 seconds.

Timing after:

INFO Timers:
INFO Timers: Generate Dictionary                           0.06 seconds | Instances count: 001 | Per instance:     0.06 | Percent of total:  0.0%
INFO Timers: DataDriverExecute                           873.36 seconds | Instances count: 003 | Per instance:   291.12 | Percent of total: 64.9%
INFO Timers: DataObjectConstructor                         0.03 seconds | Instances count: 003 | Per instance:     0.01 | Percent of total:  0.0%
INFO Timers: EvaDatasetFactory import: ...                 0.03 seconds | Instances count: 003 | Per instance:     0.01 | Percent of total:  0.0%
INFO Timers:   GsiObsSpace from ...
INFO Timers:   eva.data.gsi_obs_space
INFO Timers: DataObjectExecute                           873.33 seconds | Instances count: 003 | Per instance:   291.11 | Percent of total: 64.9%
INFO Timers: TransformDriverExecute                      426.81 seconds | Instances count: 001 | Per instance:   426.81 | Percent of total: 31.7%
INFO Timers: Transform: accept_where                     398.35 seconds | Instances count: 003 | Per instance:   132.78 | Percent of total: 29.6%
INFO Timers: Transform: arithmetic                         8.75 seconds | Instances count: 036 | Per instance:     0.24 | Percent of total:  0.6%
INFO Timers: Transform: channel_stats                     19.44 seconds | Instances count: 005 | Per instance:     3.89 | Percent of total:  1.4%
INFO Timers: FigureDriverExecute                          13.82 seconds | Instances count: 001 | Per instance:    13.82 | Percent of total:  1.0%
INFO Timers: Graphics Loop                                 1.28 seconds | Instances count: 001 | Per instance:     1.28 | Percent of total:  0.1%
INFO Timers:
INFO Timers: Total time taken 1345.72 seconds.
INFO Timers:
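For reference, the speedup implied by the two logs can be worked out directly from the numbers above. This is just arithmetic on the reported timers; note that the transform timers are not like-for-like between the two logs (the accept_where and arithmetic instance counts differ), so the data-read ratio is the cleaner comparison.

```python
# Timer values copied from the "before" and "after" logs above (seconds).
before = {"DataDriverExecute": 2759.65, "Total": 2812.55}
after = {"DataDriverExecute": 873.36, "Total": 1345.72}

speedup = {name: before[name] / after[name] for name in before}

for name, ratio in speedup.items():
    print(f"{name}: {before[name] / 60:.1f} min -> {after[name] / 60:.1f} min "
          f"({ratio:.2f}x)")
```

The data-read step is roughly 3.2x faster, and the end-to-end run roughly 2.1x faster.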

Dependencies

None

Impact

Speeds up gsi_obs_space and makes it more consistent and safer across sensors.

karpob · 2025-06-07T18:52:48Z

Making this a draft as I had some thoughts on how to change this. Fundamentally, the set of variables I have there doesn't make sense. It came about as I was combining a sensor that the algorithm said had the same lat/lons per channel with a sensor that supposedly had independent lat/lons per channel. I see two options: do this properly (check QC, check for lat/lon repeats, and reshape or thin as appropriate), or just be safe and reshape everything. Everything is flat in the GSI, and to my mind reshaping everything makes sense because it preserves the alignment of all fields.
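As a rough illustration of the "check for lat/lon repeats" idea, the check could look at every observation block instead of only the first. This is a sketch of one possible approach, not something implemented in this PR; the function name and sample values are hypothetical.

```python
def repeats_within_every_obs(flat, nchan):
    """True only if each block of nchan values holds a single repeated value."""
    if nchan <= 0 or len(flat) % nchan != 0:
        raise ValueError("flat length must be a positive multiple of nchan")
    return all(
        len(set(flat[i:i + nchan])) == 1
        for i in range(0, len(flat), nchan)
    )


nchan = 4
shared_lat = [10.0] * nchan + [20.0] * nchan          # same lat for all channels
per_chan_lat = [10.0, 10.1, 10.0, 10.2] + [20.0] * nchan  # lat varies by channel
fill_first = [-9999.0] * nchan + [1.0, 2.0, 3.0, 4.0]     # first obs is all fill

print(repeats_within_every_obs(shared_lat, nchan))    # safe to thin
print(repeats_within_every_obs(per_chan_lat, nchan))  # must reshape
print(repeats_within_every_obs(fill_first, nchan))    # must reshape
```

Scanning every block costs more than checking the first nchan values, which is the trade-off against simply reshaping everything.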

Dooruk · 2025-06-09T17:59:11Z

Is this related to what @rtodling mentioned today in terms of calculating stats and sorting between different channels? Or is this even the exact same problem?

karpob · 2025-06-09T18:03:24Z

No, this is completely different, I think. The problem here is the slowness of gsi_obs_space, which reshapes or thins every variable even if the user doesn't request it. A secondary problem is that the method used to decide between reshaping and thinning seems like it could easily do the wrong thing.


karpob · 2025-06-10T15:41:13Z

What I ended up doing here is adding a switch, gsi_obs_space_reshape_all, that users can set if they care about getting ancillary data correct for microwave sensors (pretty much all of them use a per-channel footprint size to get surface variables), or if they are just paranoid. To quell my own paranoia, I hardcoded a list of variables that must be reshaped by channel because they inherently have a spectral dimension. I took lat/lon out of the forced reshaping so maps remain easy to plot for most sensors (AMSU-A being the exception, as its lat/lon is spectrally dependent).

The flag is set to false by default (or if unspecified) so behavior remains the same for all tests.
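The precedence described above could be sketched roughly as follows. Only the flag name gsi_obs_space_reshape_all comes from this PR; the config dictionary, the function names, and the contents of the hardcoded set are hypothetical. The sketch also assumes that turning the switch on reshapes lat/lon as well, which is one reading of the description.

```python
# Hypothetical stand-in for the hardcoded list of spectral variables.
ALWAYS_RESHAPE = {"Observation", "QC_Flag"}


def first_obs_heuristic(flat, nchan):
    """Existing behavior: thin if the first nchan values are all equal."""
    first = flat[:nchan]
    return "thin" if all(v == first[0] for v in first) else "reshape"


def choose_layout(name, flat, nchan, config):
    """Spectral variables and the user switch win; otherwise use the old check."""
    # Missing flag means False, so default behavior is unchanged.
    reshape_all = bool(config.get("gsi_obs_space_reshape_all", False))
    if name in ALWAYS_RESHAPE or reshape_all:
        return "reshape"
    return first_obs_heuristic(flat, nchan)


nchan = 4
lat = [10.0] * nchan + [20.0] * nchan
obs = [-9999.0] * nchan + [250.1, 251.3, 249.8, 252.0]

print(choose_layout("Latitude", lat, nchan, {}))     # default: thinned
print(choose_layout("Latitude", lat, nchan,
                    {"gsi_obs_space_reshape_all": True}))  # switch on: reshaped
print(choose_layout("Observation", obs, nchan, {}))  # always reshaped
```

Leaving lat/lon out of the hardcoded set means they are still thinned by default whenever they repeat across channels, which keeps map plotting simple.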

safety fixes and speedup for gsi

b4009bc

karpob requested review from CoryMartin-NOAA and kevindougherty-noaa June 6, 2025 20:58

karpob added 4 commits June 6, 2025 17:06

norms.

7778cf3

norms.

05fc4e9

norms.

2ca1bde

norms.

d44aa88

karpob marked this pull request as draft June 7, 2025 18:43

address potential problems with ancillary data for microwave with swi…

df1e8c1

…tch gsi_obs_space_reshape_all the user can turn on.

karpob marked this pull request as ready for review June 10, 2025 15:45

CoryMartin-NOAA approved these changes Jul 25, 2025

View reviewed changes

CoryMartin-NOAA merged commit 6e6d1ea into develop Jul 25, 2025
5 checks passed
