Skip to content

Compute area-weighted statistics#640

Merged
vincentsarago merged 4 commits intomainfrom
CoverageStatistics
Sep 27, 2023
Merged

Compute area-weighted statistics#640
vincentsarago merged 4 commits intomainfrom
CoverageStatistics

Conversation

@vincentsarago
Copy link
Copy Markdown
Member

@vincentsarago vincentsarago commented Sep 20, 2023

ref NASA-IMPACT/veda-backend#226

This PR adds a coverage options to the get_array_statistics function to enable area weighted statistics. This PR do not take care of the coverage array creation which should be done by the client application (maybe with some helper in rio-tiler)

cc @kylebarron @j08lue

To Do

  • more tests
  • docs
  • validate the logic (why some values are defined using weight or not)
  • add coverage utility functions

Comment thread rio_tiler/utils.py
"minority": float(keys[counts.tolist().index(counts.min())].tolist())
if valid_pixels
else numpy.nan,
"mean": float(array.mean()),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't use the weighted array for the min and max (like https://github.com/isciences/exactextract#supported-statistics)

Comment thread rio_tiler/utils.py
Comment thread rio_tiler/utils.py
"majority": majority,
"minority": minority,
"unique": float(counts.size),
**dict(zip(percentiles_names, percentiles_values)),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 maybe the percentiles should use the weighted array?

Comment thread rio_tiler/utils.py Outdated
Comment thread tests/test_utils.py
# 3, 4
data = np.ma.array((1, 2, 3, 4)).reshape((1, 2, 2))

# Coverage Array
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should call this the weight array instead of coverage? coverage doesn't seem to make the most sense to me here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in https://github.com/isciences/exactextract, weight are really weight, while the coverage is called cell coverage fractions. I didn't want to confuse people 🤷

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, coverage came from exactextract because their weights are usually derived from the coverage of each polygon in the cell. I think weight is more general than coverage though. There could be use cases for weighted zonal stats that aren't partial-coverage

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coverage_weights ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we ever have both coverage and weights?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like exactextract does support both coverage and weight in effect, because it allows weight as a parameter and it computes coverage itself

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we ever have both coverage and weights?

Not planned right now but I don't want to be blocked in the future. I agree that coverage is not a perfect name but I don't want to use weights because it's not it's not weights but spatial fraction. I'm open to change to better name if you have ideas :D

Comment thread rio_tiler/models.py
(self.height, cover_scale, self.width, cover_scale)
).astype("float32")

return cover_array.sum(-1).sum(1) / (cover_scale**2)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slightly modified version of perrygeo/python-rasterstats#136

@sgoodm I know ☝️ is a 7 years old PR (😅) but I hope you don't mind that I reused some of the code here 🙏

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at all, @jacobwhall and I have been pushing use of COGs in our work at @aiddata and are fans of these projects, so happy to share some code to support 👍 👍

@vincentsarago
Copy link
Copy Markdown
Member Author

Note: we don't have a BaseReader method that returns statistics for a GeoJSON Feature but this is how it could look like in the client (e.g TiTiler):

with Reader(path) as src:
    data = src_dst.feature(
        shape,
        shape_crs=WGS84_CRS,
    )

    coverage_array = data.get_coverage_array(
        shape, shape_crs=WGS84_CRS
    )

    stats = data.statistics(coverage=coverage_array)

@vincentsarago vincentsarago marked this pull request as ready for review September 25, 2023 19:16
@vincentsarago vincentsarago merged commit 7bed9de into main Sep 27, 2023
@vincentsarago vincentsarago deleted the CoverageStatistics branch September 27, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants