Add `grouped_stats` for categorical or continuous binning by rhugonnet · Pull Request #774 · GlacioHack/geoutils

rhugonnet · 2025-11-29T07:05:03Z

This PR opens a discussion on how to implement binning, following #668, by adding a minimal example based on pandas.
As a reminder, see the discussion in that PR. For the reasoning behind this implementation, see the points I raised at the bottom of it, in these three comments here.

Here's an example of the function's input and output (values would be passed as a 1D array for point clouds, or as a 2D array flattened with ravel() for rasters):

arrays = {"slope": np.random.normal(size=100), "aspect": np.random.normal(size=100)}
values = {"band1": np.random.normal(size=100), "band2": np.random.normal(size=100)}
statistics = ["mean", "std"]
bins = [np.linspace(-2, 2, 10), 10]

df = _grouped_stats(arrays, bins, values, statistics)

                                                   band1         band2    
                                                    mean std      mean std
bin_slope        bin_aspect                                               
(-2.001, -1.556] (-2.3529999999999998, -1.925]       NaN NaN       NaN NaN
                 (-1.925, -1.501]                    NaN NaN       NaN NaN
                 (-1.501, -1.078]              -1.121366 NaN  0.936404 NaN
                 (-1.078, -0.654]                    NaN NaN       NaN NaN
                 (-0.654, -0.231]               1.140788 NaN  1.200292 NaN
...                                                  ...  ..       ...  ..
(1.556, 2.0]     (-0.231, 0.192]                0.502442 NaN -0.083028 NaN
                 (0.192, 0.616]                      NaN NaN       NaN NaN
                 (0.616, 1.039]                 2.094497 NaN  1.168949 NaN
                 (1.039, 1.463]                      NaN NaN       NaN NaN
                 (1.463, 1.886]                      NaN NaN       NaN NaN
[90 rows x 4 columns]
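For readers who want to reproduce a table of this shape, here is a minimal sketch of how such a function could be built on pandas. It is not the code from this PR: `grouped_stats_sketch` is a hypothetical name, and the sketch only assumes that binning goes through `pd.cut()` followed by a `groupby` and `agg`.

```python
import numpy as np
import pandas as pd


def grouped_stats_sketch(arrays, bins, values, statistics):
    """Hypothetical stand-in for _grouped_stats: bin with pd.cut, then group and aggregate."""
    df = pd.DataFrame(values)
    group_cols = []
    for (name, arr), b in zip(arrays.items(), bins):
        col = f"bin_{name}"
        # pd.cut accepts either an integer number of bins or an array of bin edges
        df[col] = pd.cut(arr, bins=b)
        group_cols.append(col)
    # observed=False keeps empty bin combinations, which show up as NaN rows
    return df.groupby(group_cols, observed=False)[list(values)].agg(statistics)


rng = np.random.default_rng(42)
arrays = {"slope": rng.normal(size=100), "aspect": rng.normal(size=100)}
values = {"band1": rng.normal(size=100), "band2": rng.normal(size=100)}
bins = [np.linspace(-2, 2, 10), 10]

df = grouped_stats_sketch(arrays, bins, values, ["mean", "std"])
```

The 10 edges from `np.linspace(-2, 2, 10)` give 9 bins for slope, and the integer `10` gives 10 bins for aspect, hence 90 bin combinations. Samples that fall outside the explicit slope edges get a NaN bin and are dropped by the `groupby`.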

rhugonnet · 2025-11-29T07:16:28Z

For rasters, we would add an option return_masks=True that also computes 2D masks for each bin combination derived with pd.cut(), and returns them as a dictionary of masks.
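A rough sketch of what computing such masks could look like, under stated assumptions: the toy rasters, the bin edges, and the choice of keying the dictionary by tuples of pandas intervals are all illustrative, and none of this is the PR's actual `return_masks` behaviour.

```python
import itertools

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
shape = (4, 5)
# Toy 2D rasters and bin edges, both made up for illustration
rasters = {"slope": rng.uniform(0, 40, size=shape), "aspect": rng.uniform(0, 360, size=shape)}
edges = {"slope": [0, 20, 40], "aspect": [0, 180, 360]}

# Bin each flattened raster, then reshape the integer bin codes back to 2D
cuts = {k: pd.cut(r.ravel(), bins=edges[k], include_lowest=True) for k, r in rasters.items()}
codes = {k: c.codes.reshape(shape) for k, c in cuts.items()}  # -1 would mark "outside all bins"

# One boolean mask per combination of bins, keyed by a tuple of pandas Intervals
masks = {}
for combo in itertools.product(*(range(len(c.categories)) for c in cuts.values())):
    key = tuple(c.categories[i] for c, i in zip(cuts.values(), combo))
    masks[key] = np.logical_and.reduce([codes[k] == i for k, i in zip(codes, combo)])
```

Each pixel ends up in exactly one mask as long as its values fall inside the bin edges of every variable.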

rhugonnet · 2025-11-29T07:16:53Z

@belletva @adebardo @adehecq

adebardo · 2025-12-15T14:30:01Z

If we want to support mask-based classification (for example, land cover), do you think we would need a new function?
In that case, could we have a single pandas-based function for the different processing steps?
And if I want to combine information between masks and binning, should that also go in this module?

rhugonnet · 2025-12-15T20:35:28Z

For a classification (categorical binning), the exact same function works; we just need to set the number of bins equal to the number of categories if we want to keep them all separate (this can be the default):

# HERE: Binning arrays are now categorical
arrays = {
    "classif1": np.random.randint(0, 11, size=100),  # integers 0..10 inclusive
    "classif2": np.random.randint(10, 21, size=100),  # integers 10..20 inclusive
}
values = {"band1": np.random.normal(size=100), "band2": np.random.normal(size=100)}
statistics = ["mean", "std"]

# HERE: Enforce bins of length 10
bins = [10, 10]

df = _grouped_stats(arrays, bins, values, statistics)
df

                                band1               band2          
                                 mean       std      mean       std
bin_classif1  bin_classif2                                         
(-0.011, 1.0] (9.989, 11.0]  0.087436  1.723041 -0.835955  1.061464
              (11.0, 12.0]  -0.301942  1.084940 -0.527191  1.107538
              (12.0, 13.0]   1.185885       NaN  1.460575       NaN
              (13.0, 14.0]   0.713294  0.948106  0.073826  0.609808
              (14.0, 15.0]   0.817307       NaN -0.717526       NaN
...                               ...       ...       ...       ...
(9.0, 10.0]   (15.0, 16.0]        NaN       NaN       NaN       NaN
              (16.0, 17.0]        NaN       NaN       NaN       NaN
              (17.0, 18.0]   0.238242  0.557863  0.488529  1.221224
              (18.0, 19.0]  -0.912399       NaN  2.245666       NaN
              (19.0, 20.0]  -0.517997       NaN  0.346709       NaN
[100 rows x 4 columns]

If we want, we can also overwrite the output to show only the center value of the categories (instead of an interval; looks like the first bin is slightly shifted by 0.01 by default).
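On the shifted first bin: when `bins` is an integer, `pd.cut()` slightly lowers the first edge so that the minimum value is included, which is where the small offset comes from. Note also that integer classes 0 to 10 are 11 values, so 10 equal-width bins put classes 0 and 1 together in the first interval. A possible way around both, sketched below with made-up data, is to pass half-integer edges explicitly and then relabel each interval by its midpoint. This is only an illustration, not what the PR currently does.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
classif = rng.integers(0, 11, size=100)  # 11 integer classes, 0..10 inclusive

# Half-integer edges give one bin per class: (-0.5, 0.5], (0.5, 1.5], ..., (9.5, 10.5]
edges = np.arange(-0.5, 11.0, 1.0)
binned = pd.cut(classif, bins=edges)

# Replace the interval labels by their center value, which is the class itself
centers = binned.categories.mid.astype(int)
labels = binned.rename_categories(centers)
```

With these edges, every class gets its own bin and the index of the grouped output would show `0, 1, ..., 10` instead of intervals.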

I'm not sure I understand "combine information between masks and binning": do you mean doing both simultaneously? If so, then yes, we can have any number of categorical + continuous variables binned simultaneously, following the above. 🙂

rhugonnet added 2 commits November 28, 2025 21:59

Initial commit on grouped_stats

ef422e2

Small changes

1cc24fe

adebardo mentioned this pull request Jan 8, 2026

Add grouped_stats for categorical or continuous binning #815

Open

rhugonnet wants to merge 2 commits into GlacioHack:main from rhugonnet:add_grouped_stats
