Add grouped_stats for categorical or continuous binning #774
For rasters, we would add an option
For a classification (categorical binning), the exact same function works; we just need to enforce a number of bins equal to the number of categories if we want to keep them all separate (this can be the default):

```python
import numpy as np

# HERE: Binning arrays are now categorical
# (np.random.random_integers is deprecated; randint's upper bound is exclusive)
arrays = {
    "classif1": np.random.randint(0, 11, size=100),
    "classif2": np.random.randint(10, 21, size=100),
}
values = {"band1": np.random.normal(size=100), "band2": np.random.normal(size=100)}
statistics = ["mean", "std"]
# HERE: Enforce bins of length 10
bins = [10, 10]
df = _grouped_stats(arrays, bins, values, statistics)
df
```

If we want, we can also overwrite the output to show only the center value of each category (instead of an interval; it looks like the first bin is slightly shifted by 0.01 by default).

I'm not sure I understand "combine information between masks and binning": do you mean doing both simultaneously? If so, then yes, we can have any number of categorical and continuous variables binned simultaneously, following the above. 🙂
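As a rough illustration of binning a categorical and a continuous variable at the same time, here is a minimal pandas sketch. It is not the PR's `_grouped_stats` implementation: the function name `grouped_stats_sketch`, the convention of passing `None` as the bin specification for a categorical variable, and the variable names `landcover` and `slope` are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def grouped_stats_sketch(arrays, bins, values, statistics):
    """Hypothetical stand-in for grouped statistics (not the PR's actual code).

    arrays: dict of 1D grouping variables; bins: one entry per variable,
    None for categorical, or an int / list of edges for continuous binning.
    """
    df = pd.DataFrame(values)
    group_keys = []
    for name, b in zip(arrays, bins):
        series = pd.Series(arrays[name], name=name)
        if b is None:
            # Assumed convention: group directly on the raw category values
            group_keys.append(series)
        else:
            # Continuous variable: cut into intervals before grouping
            group_keys.append(pd.cut(series, bins=b))
    # observed=True keeps only the bin combinations that contain data
    return df.groupby(group_keys, observed=True).agg(statistics)


rng = np.random.default_rng(42)
arrays = {
    "landcover": rng.integers(0, 3, size=200),  # categorical: classes 0, 1, 2
    "slope": rng.uniform(0, 40, size=200),      # continuous
}
values = {"band1": rng.normal(size=200), "band2": rng.normal(size=200)}
df = grouped_stats_sketch(arrays, [None, 4], values, ["mean", "std"])
```

The result is indexed by (category, interval) pairs, with one column per (value, statistic) combination. Grouping on raw values avoids having to match the number of bins to the number of categories, at the cost of a slightly different `bins` argument than the one discussed above.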
This PR is to discuss the implementation of binning following #668, by adding a minimal example using pandas.
As a reminder, see the discussion in that PR. For my justifications of the following implementation, see the points I raised at the bottom, in these three comments here.
Here's an example of the function output/input (`values` would be passed as a 1D array from point clouds, or as a 2D array flattened using `ravel()` from rasters):