CoverageSimplify: add progress callback and add GEOSCoverageSimplifyVWWithProgress_r() #1268
ee6b86f to
c1e2f2f
Compare
|
We already have a mechanism for requesting/checking interrupts, so I think a progress callback should handle progress reporting only. I'm not sure about adding new C API signatures with a callback vs separately setting a progress callback in the way we do for interrupt handling. In part that may depend on what other operations would benefit from a progress callback, and how easy it is to pass the callback to the portion of the code that would invoke it. It seems fairly straightforward to pass it through the call chain here but I think other functions such as @pramsey am I right that PostGIS has no way to take advantage of progress reporting, so it would largely come down to what is useful for GDAL / QGIS / others? |
|
Correct, there's no progress handling in PgSQL, so no benefit to it, and we already have the interrupt stuff. |
I thought a bit about that, but that seems a bit messy to me. Passing all along the call chain is admittedly a bit more tedious but there is less ambiguity about which APIs are able to notify, and that could avoid issues if compositing progress of different subparts of the API was needed that would otherwise individually compete for a GEOS-wide progress callback attached to the GEOS context handle. I'm fine removing the interrupt part of the callback if preferred. |
now done |
|
Any extra changes needed? |
|
It's just going to take some time to review (1) does this provide a reasonable progress estimate for different datasets / parameters, and (2) is this the signature we want to use? Which means looking at other long-running functions and thinking about whether they can work with such a signature. |
|
So, running the following: Hangs at zero for 2:05, and then goes to 100% in 11 seconds. (subdivisions shapefile from https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/index2021-eng.cfm?year=21) |
I've just improved that with a new commit that adds sub-progress for the 2 createEdges() calls done in TPVWSimplifier::simplify().
It is strongly inspired by GDALProgressFunc. But I believe we should add back the boolean flag to cancel. The GEOS interrupt logic is quite inappropriate for multithreaded clients like GDAL since it is global (not per context). It is much more practical to hand over the cancel part to the progress function. |
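To illustrate the idea of folding cancellation into the progress function, here is a minimal Python sketch that loosely follows the GDALProgressFunc convention (return true to continue, false to cancel). The names `long_operation` and `make_cancel_after` are hypothetical and not part of GEOS or GDAL; the loop only stands in for a long-running algorithm.

```python
from typing import Callable, List

# Hypothetical callback type: receives a completion ratio in [0, 1] and
# returns True to continue or False to request cancellation.
ProgressFunc = Callable[[float], bool]


def long_operation(n_steps: int, progress: ProgressFunc) -> bool:
    """Stand-in for a long-running algorithm; returns False if cancelled."""
    for i in range(n_steps):
        # ... one unit of real work would happen here ...
        if not progress((i + 1) / n_steps):
            return False  # the caller asked to stop
    return True


def make_cancel_after(limit: float, seen: List[float]) -> ProgressFunc:
    """Build a callback that records progress and cancels once `limit` is reached."""
    def callback(ratio: float) -> bool:
        seen.append(ratio)
        return ratio < limit
    return callback


seen: List[float] = []
completed = long_operation(10, make_cancel_after(0.5, seen))
```

Because the cancel decision travels through the callback, each caller (and each thread) controls only its own operation, with no global interrupt state involved.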
Here I'm referring not to the signature of the callback itself, but whether we want functions to accept a progress callback or if we want to instead set a per-context progress handler. I can see that you are able to pass the progress handler to the spot where it needs to be called in this case, but I'm not confident that it's practical to do so for other algorithms that would benefit from progress reporting.
Some discussion in #761
If starting from scratch, yes, but we don't want to have one function that is using a different interrupt mechanism than the rest of GEOS. |
What are you thinking of specifically? I'm really skeptical about the per-context model for progress callbacks. In my refinement of the progress reporting for CoverageSimplify I've used the trick of creating scaled sub-progress callbacks like GDAL does: if function A receives a progress function and calls B and C, it passes B a progress scaled from 0 to, let's say, 30% and C a progress scaled from 30% to 100%, and B and C can then do the same trick if they need to. With a progress callback registered per context I don't see how you would do that. |
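The scaled sub-progress trick described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the code in this PR: `scaled_progress`, `function_a`, `step_b` and `step_c` are hypothetical names, and the 30%/70% split is the example figure from the comment.

```python
from typing import Callable, List

ProgressFunc = Callable[[float], None]


def scaled_progress(parent: ProgressFunc, lo: float, hi: float) -> ProgressFunc:
    """Map a child's 0..1 progress onto the parent's lo..hi range."""
    def child(ratio: float) -> None:
        parent(lo + (hi - lo) * ratio)
    return child


def step_b(progress: ProgressFunc) -> None:
    # Hypothetical phase B: three equal units of work.
    for i in range(1, 4):
        progress(i / 3)


def step_c(progress: ProgressFunc) -> None:
    # Hypothetical phase C: two equal units of work.
    for i in range(1, 3):
        progress(i / 2)


def function_a(progress: ProgressFunc) -> None:
    # A assumes B takes about 30% of the total time and C the remaining 70%.
    step_b(scaled_progress(progress, 0.0, 0.3))
    step_c(scaled_progress(progress, 0.3, 1.0))


reported: List[float] = []
function_a(reported.append)
```

Since `scaled_progress` returns an ordinary callback, B and C can wrap it again for their own sub-phases without knowing where they sit in the overall range.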
Right, that would have to be done directly by each algorithm, using essentially the same strategy (i.e. deciding how much of the total time each algorithm phase takes, and then pro-rating reported time based on that). |
|
I'm reluctant to introduce performance monitoring args all the way down the call stack in each algorithm. Having a thread-local context I think would avoid that? |
I don't have an alternate solution, but I'm hesitant to hurry and merge this without seeing how the same approach works in other functions, or understanding if/how other GEOS users would take advantage of it. I apologize if I was unclear in OSGeo/gdal#12483 (comment), but I was trying to convey some hesitation about the idea of progress reporting in general. I'm by no means opposed to it but I think this PR demonstrates that it's a tricky problem. |
Are there other GEOS functions that come to mind where a single invocation typically operates on a huge object, or takes a long time, and that would be worth investigating? |
|
|
d9701e9 to
e4b8c7b
Compare
Here's a (complete?) list. Some of them (e.g. coverage functions, The clustering functions are usually run on full datasets too. |
I've added a commit adding GEOSUnaryUnionWithProgress_r(). For now I've restricted the progress computation to the phase where polygons are unioned. I've given it a try with the following OGR Python script:

from osgeo import gdal, ogr
g = ogr.Geometry(ogr.wkbGeometryCollection)
ds = ogr.Open("lcsd000b21a_e.shp.zip")
lyr = ds.GetLayer(0)
for f in lyr:
    g.AddGeometry(f.GetGeometryRef())
print('unary union...')
g.UnaryUnion(gdal.TermProgress)

using GDAL branch https://github.com/rouault/gdal/tree/OGR_G_UnaryUnionEx and I get a rather smooth progression report. |
|
Any thoughts on this topic and my proofs of concept (with GEOSCoverageSimplifyVW and GEOSUnaryUnion), and on whether this is something that can ultimately be merged? I believe the interface for progress reporting is reasonable, although a bit chatty for GEOS internals, but I don't see a workable alternative. CC'ing @nyalldawson in case he has opinions on this and on whether progress callbacks in GEOS operations could be usable by QGIS. |
I would definitely like to see this land (along with #761). From a QGIS perspective I often see users start geos-based processes on complex geometries, which look like they've hung (and can't be canceled in any safe way). Progress reporting, even if it's just an incredibly rough estimate, would be of immense value to QGIS users. |
|
@rouault OK if I push dbaston@def9816, to keep the call sites a bit cleaner? |
yes please! |
|
c3d5a9e reduces the null checks and simplifies the creation of subprogress callbacks a bit. |
|
I'm curious what people think about an API like: The C API would store the callback in a variable to be used during the next function invocation and then cleared. If that function supports progress monitoring, it will be provided with the callback. Otherwise, the callback will be called with 1.0 (100%) after the operation is completed. It's not super elegant, and I'm not sure it's a good idea, but I'm trying to think about whether we can avoid introducing two more signatures for each function that supports progress monitoring, and client code having to have a ton of preprocessor GEOS version checks. |
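The behaviour described in this comment (a callback stored for the next invocation only, then cleared, with a single 100% report from functions that lack progress support) can be modeled roughly as follows. This is a Python sketch with hypothetical names (`Context`, `set_progress_callback`, `op_with_progress`, `op_without_progress`); it does not reproduce the C signature proposed in the comment.

```python
from typing import Callable, List, Optional

ProgressFunc = Callable[[float], None]


class Context:
    """Hypothetical stand-in for a GEOS context handle."""

    def __init__(self) -> None:
        self._pending: Optional[ProgressFunc] = None

    def set_progress_callback(self, callback: ProgressFunc) -> None:
        """Store a callback to be consumed by the next operation only."""
        self._pending = callback

    def _take_callback(self) -> Optional[ProgressFunc]:
        # Hand the callback to the current operation and clear the slot.
        callback, self._pending = self._pending, None
        return callback

    def op_with_progress(self, n_steps: int) -> None:
        """An operation that supports progress monitoring."""
        callback = self._take_callback()
        for i in range(n_steps):
            # ... one unit of real work would happen here ...
            if callback is not None:
                callback((i + 1) / n_steps)

    def op_without_progress(self) -> None:
        """An operation with no progress support: report 100% once at the end."""
        callback = self._take_callback()
        # ... the actual work would run here ...
        if callback is not None:
            callback(1.0)


ctx = Context()
calls: List[float] = []
ctx.set_progress_callback(calls.append)
ctx.op_with_progress(4)      # consumes the callback
ctx.op_with_progress(4)      # no callback registered any more: silent
ctx.set_progress_callback(calls.append)
ctx.op_without_progress()    # reports 1.0 once
```

In this model client code needs no version-specific function names: it registers a callback and gets either fine-grained reports or a single completion report, depending on what the operation supports.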
|
I am fine with it, it seems no more or less aesthetically displeasing than the alternative. I assume the variable being set will be on the context, so hopefully this translates into multi-threaded situations safely. |
I'm also fine with it, pending:
It would be nice if GEOS also checked for interruption before/after invoking the progress callback, because in a GDAL context the GDAL progress function could typically ask GEOS to interrupt. |
|
I tagged 3.14.0beta but if this completes I think we should add it before RC (I'm turning into @robe2), as I am mostly tagging the beta to get the packagers to kick the tires on the build / package machinery, not to exercise the CAPI in any way. |
|
As an experiment, dbaston@0e4aae4 follows the pattern of #803 and allows for a progress callback to be registered with a context. This is a bit less magical than what I proposed above, in that the callback remains registered after use. If this concept is taken to completion, we would use comments in the C API to note a stable set of non-trivial functions that can report progress. (I'm thinking this would be everything outside of getters/setters/area/etc.) We can use the C API |
|
@dbaston you have thumbs up from me and @rouault, maybe @nyalldawson has an opinion as a prospective user of this API? |
Implements what was discussed with @dr-jts in OSGeo/gdal#12483 (comment)
GDAL use of it in rouault/gdal@algorithm_progress...rouault:gdal:GEOSCoverageSimplifyVWWithProgress_r
CC @dbaston