Add grouped_stats for categorical or continuous binning#815

Open
adebardo wants to merge 4 commits into GlacioHack:main from adebardo:774-grouped-stats

Contributor

@adebardo adebardo commented Jan 8, 2026

Resolves #774

Context

The purpose of this PR is to offer users a simplified API for computing grouped statistics on their rasters.

To do this, we rely on pandas' ability to work on 1D arrays of values together with their associated classification.

We implemented the function directly in the base.py file, along with its unit tests.
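
As a rough illustration of the idea (not the code in this PR), grouping flattened raster values by a same-shaped classification array with pandas could look like the sketch below. The name `grouped_mean_sketch` and the small arrays are hypothetical and only serve the example.

```python
import numpy as np
import pandas as pd


def grouped_mean_sketch(values: np.ndarray, classes: np.ndarray) -> pd.DataFrame:
    """Hypothetical helper: mean and count of `values` per class in `classes`."""
    if values.shape != classes.shape:
        raise ValueError("values and classes must have the same shape")
    # Flatten both 2D arrays to 1D so pandas can group one by the other.
    df = pd.DataFrame({"value": values.ravel(), "group": classes.ravel()})
    # One row per class, with the requested statistics as columns.
    return df.groupby("group")["value"].agg(["mean", "count"])


values = np.array([[1.0, 2.0], [3.0, 4.0]])
classes = np.array([[0, 0], [1, 1]])
result = grouped_mean_sketch(values, classes)
```

Here `result` has one row per class, which is the kind of table a grouped-statistics API would plausibly return.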

Code

  1. Input checks have been added to ensure that users pass their inputs in the correct format, since pandas does not catch every invalid case.
  2. Special case: if a user passes a single number (such as a threshold), the intervals are created between -inf and +inf.
  3. Boolean masks are supported, as are multi-class masks (e.g. land cover).
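
One possible reading of items 2 and 3, sketched with pandas only: a single threshold is expanded into bin edges running from -inf to +inf, giving two intervals, and a boolean mask is used directly as the grouping key. This is an assumption about the intended behaviour, not the PR's implementation; `bins_from_threshold` and the sample arrays are hypothetical.

```python
import numpy as np
import pandas as pd


def bins_from_threshold(threshold: float) -> list[float]:
    """Hypothetical helper: expand one number into edges spanning -inf to +inf."""
    return [-np.inf, threshold, np.inf]


elevation = np.array([100.0, 250.0, 400.0, 900.0])

# pd.cut assigns each value to a right-closed interval: (-inf, 300], (300, inf].
groups = pd.cut(elevation, bins=bins_from_threshold(300.0))
stats = pd.Series(elevation).groupby(groups, observed=False).mean()

# A boolean mask of the same length can serve as the grouping key as well.
mask = np.array([True, False, True, False])
mask_stats = pd.Series(elevation).groupby(mask).mean()
```

The same pattern extends to a multi-class mask by passing an integer array of class labels as the grouping key.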

Tests

For testing purposes, we provide a synthetic raster so that the ground truth can be calculated by hand.
We have implemented one test per bin type.

We also add pandas DataFrame equality checks to this file, along with the necessary tests.
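
A test of this kind could be sketched as follows, using `pandas.testing.assert_frame_equal` against values worked out by hand. The arrays and column names are made up for illustration and are not taken from the PR's test suite.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# Tiny synthetic "raster" whose per-class statistics are easy to compute by hand.
values = np.array([[1.0, 3.0], [10.0, 30.0]])
classes = np.array([[1, 1], [2, 2]], dtype="int64")

df = pd.DataFrame({"value": values.ravel(), "group": classes.ravel()})
computed = df.groupby("group")["value"].agg(["mean", "max"])

# Hand-computed ground truth: class 1 -> mean 2, max 3; class 2 -> mean 20, max 30.
expected = pd.DataFrame(
    {"mean": [2.0, 20.0], "max": [3.0, 30.0]},
    index=pd.Index([1, 2], dtype="int64", name="group"),
)

# Raises AssertionError with a readable diff if the two frames differ.
assert_frame_equal(computed, expected)
```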

Documentation

https://adebardo-geoutils.readthedocs.io/en/774-grouped-stats/stats.html

We have added documentation to the statistics tab. Currently, the examples are run on elevation bins of a raster against that same raster, and on a binarised glacier mask. We have opened a ticket to propose adding more data.

@adebardo adebardo changed the title from "774 grouped stats" to "Add grouped_stats for categorical or continuous binning" on Jan 8, 2026
Contributor Author

adebardo commented Jan 14, 2026

  • saving generated masks for bins
  • rename to zonal_stats

Member

rhugonnet commented Feb 6, 2026

@adebardo Thanks for all this work! Nice that it's taking shape. I think it's getting close, and we should be able to add zonal stats shortly after 🙂. (I'm actually working on separate code that should make it fairly easy to add Dask/Multiprocessing support there.)
I started a quick review (I realized I forgot after the meeting, sorry! thankfully @belletva's comment reminded me).

On the technical aspects of the implementation: To be honest, I had a lot of trouble understanding them. I hope I didn't miss anything obvious 😅
I think this is a comment we made before: The PR description really needs to detail the logic behind the changes, how things work and why the design is chosen a certain way. It's usually 5min to write it up when you work on the PR. And otherwise it's 5 times more work for me (and any other reader) to understand what was done just from the code (I just spent 30min failing at understanding it), and we often end up going in circles... 😕
(EDIT: I realize you didn't tag for review on the PR yet, maybe it wasn't ready, sorry!).

@adebardo adebardo force-pushed the 774-grouped-stats branch 2 times, most recently from fb75a3e to 51aecef on March 3, 2026 16:09
@adebardo adebardo force-pushed the 774-grouped-stats branch from 64a4858 to 7458b53 on March 4, 2026 17:03
@adebardo adebardo force-pushed the 774-grouped-stats branch 3 times, most recently from 7323009 to 5fe1718 on March 5, 2026 09:58
@adebardo adebardo force-pushed the 774-grouped-stats branch 3 times, most recently from abc131f to 55ed6bb on March 5, 2026 12:38
@adebardo adebardo force-pushed the 774-grouped-stats branch from 55ed6bb to c5d2a85 on March 5, 2026 12:51
Comment thread doc/source/stats.md
Using GeoUtils functions makes it very easy to visualise them.

```{code-cell} ipython3
group_by = {"raster": rast.data}
```
Contributor

Why is there no file linked to rast?

Contributor Author

Because we only cover NDArrayNum here. Do you want it to be more inclusive?

Contributor

It is in case the user wants to reproduce the example, which is not possible as it is written.

Contributor Author

I'm not sure I understand the problem, but if you look at the beginning of the stats.md file, you'll see a file attached to the raster.
