Skip to content

Conversation

@rouault
Copy link
Contributor

@rouault rouault commented May 30, 2025

@rouault rouault force-pushed the GEOSCoverageSimplifyVWWithProgress branch 7 times, most recently from ee6b86f to c1e2f2f Compare May 30, 2025 12:57
@dbaston
Copy link
Member

dbaston commented May 30, 2025

We already have a mechanism for requesting/checking interrupts, so I think a progress callback should handle progress reporting only.

I'm not sure about adding new C API signatures with a callback vs separately setting a progress callback in the way we do for interrupt handling. In part that may depend on what other operations would benefit from a progress callback, and how easy it is to pass the callback to the portion of the code that would invoke it. It seems fairly straightforward to pass it through the call chain here but I think other functions such as GEOSUnaryUnion would be more challenging.

@pramsey am I right that PostGIS has no way to take advantage of progress reporting, so it would largely come down to what is useful for GDAL / QGIS / others?

@pramsey
Copy link
Member

pramsey commented May 30, 2025

Correct, there's no progress handling in PgSQL, so benefit to it, and we already have the interrupt stuff.

@rouault
Copy link
Contributor Author

rouault commented May 30, 2025

I'm not sure about adding new C API signatures with a callback vs separately setting a progress callback

I thought a bit about that, but that seems a bit messy to me. Passing all along the call chain is admittedly a bit more tedious but there is less ambiguity about which APIs are able to notify, and that could avoid issues if compositing progress of different subparts of the API was needed that would otherwise individually compete for a GEOS-wide progress callback attached to the GEOS context handle.

I'm fine removing the interrupt part of the callback if preferred.

@rouault
Copy link
Contributor Author

rouault commented May 31, 2025

I'm fine removing the interrupt part of the callback if preferred.

now done

@rouault
Copy link
Contributor Author

rouault commented Jun 4, 2025

Any extra changes needed ?

@dbaston
Copy link
Member

dbaston commented Jun 4, 2025

It's just going to take some time to review (1) does this provide a reasonable progress estimate for different datasets / parameters, and (2) is this the signature we want to use? Which means looking at other long-running functions and thinking about whether they can work with such a signature.

@dbaston
Copy link
Member

dbaston commented Jun 4, 2025

So, running the following:

gdal vector simplify-coverage ~/data/statscan/lcsd000b21a_e.shp /tmp/simp.gpkg --tolerance 1e-3 --overwrite --progress

Hangs at zero for 2:05, and then goes to 100% in 11 seconds.

(subdivisions shapefile form https://www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/index2021-eng.cfm?year=21)

@rouault
Copy link
Contributor Author

rouault commented Jun 4, 2025

does this provide a reasonable progress estimate for different datasets / parameters

I've just improved that with a new commit that adds sub-progress for the 2 createEdges() calls done in TPVWSimplifier::simplify().

is this the signature we want to use?

It is strongly inspired from GDALProgressFunc. But I believe we should add back the boolean flag to cancel. GEOS Interrupt logic is quite inappropriate for multithreaded clients like GDAL since it is a global one (not per context). It is much more practical to hand over the cancel part to the progress function

@dbaston
Copy link
Member

dbaston commented Jun 4, 2025

is this the signature we want to use?

Here I'm referring not to the signature of the callback itself, but whether we want functions to accept a progress callback or if we want to instead set a per-context progress handler. I can see that you are able to pass the progress handler to the spot where it needs to be called in this case, but I'm not confident that it's practical to do so for other algorithms that would benefit from progress reporting.

GEOS Interrupt logic is quite inappropriate for multithreaded clients like GDAL since it is a global one (not per context).

Some discussion in #761

It is much more practical to hand over the cancel part to the progress function

If starting from scratch, yes, but we don't want to have one function that is using a different interrupt mechanism than the rest of GEOS.

@rouault
Copy link
Contributor Author

rouault commented Jun 4, 2025

but I'm not confident that it's practical to do so for other algorithms that would benefit from progress reporting.

what are you thinking to specifically ? I'm really skeptical about the per-context model for progress callback. In my refinement of the progression for CoverageSimplify I've used the trick of created scaled sub-progress callbacks like GDAL does, so if function A receives a progress function and function A calls B and C, then it passes a scaled progress for B from 0 to let's say 30% and a scale progress for C from 30% to 100%, and then B and C can do the same trick if they need. With a progression callback registered per-context I don't see how you would do that.

@dr-jts
Copy link
Contributor

dr-jts commented Jun 4, 2025

In my refinement of the progression for CoverageSimplify I've used the trick of created scaled sub-progress callbacks like GDAL does

Right, that would have to be done directly by each algorithm, using essentially the same strategy (i.e. deciding how much of the total time each algorithm phase takes, and then pro-rating reported time based on that).

@dr-jts
Copy link
Contributor

dr-jts commented Jun 4, 2025

I'm reluctant to introduce performance monitoring args all the way down the call stack in each algorithm. Having a thread-local context I think would avoid that?

@dbaston
Copy link
Member

dbaston commented Jun 4, 2025

what are you thinking to specifically ?

I don't have an alternate solution, but I'm hesitant to hurry and merge this without seeing how the same approach works in other functions, or understanding if/how other GEOS users would take advantage of it. I apologize if I was unclear in OSGeo/gdal#12483 (comment), but I was trying to convey some hesitation about the idea of progress reporting in general. I'm by no means opposed to it but I think this PR demonstrates that it's a tricky problem.

@rouault
Copy link
Contributor Author

rouault commented Jun 4, 2025

how the same approach works in other functions

Are there other GEOS functions in top of readers' heads where a single invokation of them is done typically on a huge object, or take a long time and that would be worth investigating ?

@dbaston
Copy link
Member

dbaston commented Jun 4, 2025

GEOSUnaryUnion would top my list for "single invocation on a huge object."

@rouault rouault force-pushed the GEOSCoverageSimplifyVWWithProgress branch from d9701e9 to e4b8c7b Compare June 4, 2025 23:49
@dr-jts
Copy link
Contributor

dr-jts commented Jun 5, 2025

other GEOS functions in top of readers' heads where a single invokation of them is done typically on a huge object, or take a long time

Here's a (complete?) list. Some of them (e.g. coverage functions, GEOSPolygonize) are designed to be run on full datasets. Others have an API which accepts single geometries as input, but are often run on large datasets by aggregating a set of geometries into a single collection (e.g. Convex Hull, Delaunay Triangulation)

GEOSConvsxHull
GEOSConcaveHull
GEOSDelaunayTriangulation
GEOSConstrainedDelaunayTriangulation
GEOSVoronoiDiagram
GEOSNode
GEOSPolygonize
GEOSTopologyPreserveSimplify

GEOSCoverageUnion
GEOSCoverageIsValid
GEOSCoverageClean et al
GEOSCoverageSimplifyVW

GEOSClusterDBSCAN
GEOSClusterGeometryDistance
GEOSClusterGeometryIntersects
GEOSClusterEnvelopeDistance
GEOSClusterEnvelopeIntersects

The clustering functions are usally run on full datasets too.

@rouault
Copy link
Contributor Author

rouault commented Jun 13, 2025

GEOSUnaryUnion would top my list for "single invocation on a huge object."

I've added a commit adding GEOSUnaryUnionWithProgress_r(). For now I've restriced the progress computation on the phase where polygons are unioned. I've given it a try with the following OGR Python script:

from osgeo import gdal, ogr

g = ogr.Geometry(ogr.wkbGeometryCollection)
ds = ogr.Open("lcsd000b21a_e.shp.zip")
lyr = ds.GetLayer(0)
for f in lyr:
    g.AddGeometry(f.GetGeometryRef())
print('unary union...')
g.UnaryUnion(gdal.TermProgress)

using GDAL branch https://github.com/rouault/gdal/tree/OGR_G_UnaryUnionEx and I get a rather smooth progression report

@rouault
Copy link
Contributor Author

rouault commented Jun 25, 2025

Any thoughts about how this topic and my proof of concepts (with GEOSCoverageSimplifyVW and GEOSUnaryUnion) and if this is something that can be ultimately merged ? I believe the interface for progress report is reasonable, although a bit chatty for GEOS internals, but I don't see much doable alternative.

CC'ing @nyalldawson if he has opinions regarding this and if progress callback in GEOS operations could be something usable by QGIS?

@nyalldawson
Copy link

if he has opinions regarding this and if progress callback in GEOS operations could be something usable by QGIS?

I would definitely like to see this land (along with #761). From a QGIS perspective I often see users start geos-based processes on complex geometries, which look like they've hung (and can't be canceled in any safe way). Progress reporting, even if it's just an incredibly rough estimate, would be of immense value to QGIS users.

@dbaston
Copy link
Member

dbaston commented Jul 8, 2025

@rouault OK if I push dbaston@def9816, to keep the call sites a bit cleaner?

@rouault
Copy link
Contributor Author

rouault commented Jul 8, 2025

to keep the call sites a bit cleaner?

yes please!

@dbaston
Copy link
Member

dbaston commented Jul 8, 2025

c3d5a9e reduces the null checks and simplifies the creation of subprogress callbacks a bit.

@dbaston
Copy link
Member

dbaston commented Jul 9, 2025

I'm curious what people think about an API like:

GEOSSetProgressCallback(pFn, pUserData)
GEOSCoverageSimplifyVW(geom, tol, preserve)

The C API would store the callback in a variable to be used during the next function invocation and then cleared. If that function supports progress monitoring, it will be provided with the callback. Otherwise, the callback will be called with 1.0 (100%) after the operation is completed.

It's not super elegant, and I'm not sure it's a good idea, but I'm trying to think about whether we can avoid introducing two more signatures for each function that supports progress monitoring, and client code having to have a ton of preprocessor GEOS version checks.

@pramsey
Copy link
Member

pramsey commented Jul 9, 2025

I am fine with it, it seems no more or less aesthetically displeasing than the alternative. I assume the variable being set will be on the context, so hopefully this translates into multi-threaded situations safely.

@rouault
Copy link
Contributor Author

rouault commented Jul 9, 2025

I am fine with it, it seems no more or less aesthetically displeasing than the alternative

I'm also fine with it, pending:

  • the documentation of the function where it is actually implemented mentions it (so people don't have false expectations)
  • and that the progress callback is really cleared in all situations after the GEOS function call, including when it exists following a GEOS C++ exception. It is important because otherwise if the progress callback remained installed and was later called in another GEOS function, the progress callback might be referencing objects (its user data) that are no longer alive.

It would be nice that GEOS after/before invoking the progress also checks for the interruption, because in a GDAL context, the GDAL progress function could typically ask GEOS for interruption.

@pramsey
Copy link
Member

pramsey commented Jul 21, 2025

I tagged 3.14.0beta but if this completes I think we should add it before RC (I'm turning into @robe2), as I am mostly tagging the beta to get the packagers to kick the tires on the build / package machinery, not to exercise the CAPI in any way.

@dbaston
Copy link
Member

dbaston commented Jul 31, 2025

As an experiment, dbaston@0e4aae4 follows the pattern of #803 and allows for a progress callback to be registered with a context. This is a bit less magical than what I proposed above, in that the callback remains registered after use. If this concept is taken to completion, we would use comments in the C API to note a stable set of non-trivial functions that can report progress. (I'm thinking this would be everything outside of getters/setters/area/etc.)

We can use the C API execute wrapper to tag functions as either having an actual progress reporting capability (coverage simplify) or falling back to a default progress reporter that will simplify emit 100% progress when the function finishes. (I've done this for unary union as an example.) Over time we could expand the set of functions that can report meaningful progress without making API changes.

@pramsey
Copy link
Member

pramsey commented Aug 1, 2025

@dbaston you have thumbs up from me and @rouault, maybe @nyalldawson has an opinion as a prospective user of this API?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants