Add ClassificationLayer for Raster Classification#668
Conversation
|
Thanks, this PR will be used as support for 28/03/2025 and debate with the all team in person :) |
adebardo
left a comment
There was a problem hiding this comment.
Its not a real review, I just look a it for a minute, thanks for work again
1a10549 to
d687665
Compare
86a9747 to
9c59796
Compare
77fe41a to
bf7fd44
Compare
| :raise ValueError: if req_stats_classes are not class names. | ||
| """ | ||
| if self.classification_masks is None or self.class_names is None: | ||
| raise ValueError("Classification has not been applied yet. Call apply_classification() first.") |
There was a problem hiding this comment.
Replace "apply_classification" -> apply
|
Thanks @vschaffn for the implementation. This will certainly be useful for mainy users!
|
|
I was about to write my feedback in the issue following previous comments there (GlacioHack/xdem#692 (comment)), but seeing @adehecq's comment I'll follow up here as well. Taking stock of existing tools: So I think the big question is: What's different in our case that would justify developping a new functionality? There are two aspects: So, in my view, it's only 2 that is relevant. But then, I don't think we want to create a completely new API when there is already a good abstraction covering all of our use cases that exists in Pandas/Xarray. So I'd be in favour of adding a stats, masks = dem.groupby(slope, slope_bins, return_masks=True)
stats.to_csv(out_csv)
masks.save(out_tif)If that API is still deemed too difficult by CNES, we could maintain the classes above to wrap up the operations, but it would duplicate things into a second API. |
|
Thinking a bit more about this, a couple additional points:
So, to simplify the API for a Raster and avoid the more complex logic of Pandas, we could immediately have a higher-level function class Raster
def grouped_statistic(self, band=1, vars, bins, apply_funcs):
df_self = pd.DataFrame([self.data[band-1].ravel(), *(v.ravel() for v in vars)])
return df_self.groupby([var1, pd.cut(var1, bins1, ...)], [...]).agg(apply_funcs)For this reason (and why it's like that in xDEM's |
|
So the question we need to answer to move forward is: What would be missing in the functionalities of such a For getting/saving the georeferenced masks, there could be an optional dictionary output using the same group name logic coded in this PR, with group key copied from the output dataframe, and all could be saved doing: stats, mask_dict = dem.grouped_statistics(slope, slope_bins, ...)
for name, mask in mask_dict.items():
mask.save("mask_"+name+".tif")But, if we wanted to have tailored plotting functions (to visualize the stats of 1D/2D groups), it'd be something hard to fit as an output of a function call (and we use it a lot in xDEM), and a class would be really useful. Finally, having a CLI functionality that wraps this (reading + binning + plotting + saving) all at once might also simplify things for end-users? These are all my thoughts! 😄 |
|
Just a note on my suggestion of adding a landcover classification. I think we could use the ESA worldcover. An example for accessing the data through the AWS bucket is shown here: https://github.com/ESA-WorldCover/esa-worldcover-datasets. |
|
Thank you @rhugonnet and @adehecq for your comments. It is nice to see that the topic is generating a lot of interest. However, I suggest that these ideas should correspond to new issues. It might be interesting to merge if there are no blocking points and propose new issues to be addressed later. |
|
@belletva Thanks for wrapping things up! 🙂 From my point of view, based on existing functionalities in SciPy/Pandas described above that already cover all use cases, I think we should have a wrapper method In short: If there is a need for a class structure, I haven't really grasped it yet. This rather fits a method-type functionality (which would be very fast to implement, as simply wrapping Pandas/SciPy). |
Resolves GlacioHack/xdem#692.
Context:
This PR introduces the
ClassificationLayerfunctionality within the geoutils library. The goal is to provide tools for classifying raster data into discrete classes, whether based on numeric intervals (e.g., binning elevation ranges) or categorical classes (e.g., land cover types). This enhancement improves geospatial workflows by allowing users to easily classify raster data and compute statistics for each class.Features:
ClassificationLayerAbstract Base Class:apply_classification), computing statistics (get_stats, based onRaster.get_stats), and saving results (save).BinningClass:ClassificationLayerfor classifying continuous raster data using custom binning.SegmentationClass:FusionClass:Save/Export Functionality:
Tests:
Unit tests for Binning to verify:
Unit tests for
Segmentation.Unit tests for
Fusion.Documentation:
Example: