Conversation

@brontolosone (Contributor) commented Dec 14, 2025

Towards getodk/central#1439

Stacks onto:

and as such contains code from those PRs that is not yet in master. This PR should not be merged yet, so I'm leaving it as a draft.

Also, I'm still thinking about how to handle the fact that the OData filter may refer to columns not present in the tables used in the etag calculation (case in point: ffgeo.path), and about how to make that calculation more reusable.

Anyway, the commit to actually look at is fabc4d0!

The blessed alternative to:

This introduces the first use of the submissions table's per-row eventstamps as etags.
It's a micro-PR to validate the approach.

Approach, reasoning:

  1. Ideally, the etag for a response would be generated in the same statement as the (constituents of) the response body itself, so that it is 100% representative of it. But with_etag() doesn't really lend itself to that, and it would be hard to accomplish anyway: for revalidation you'd essentially have to feed the etag to the DB and have it perform some awkward-to-express early return to avoid recalculation when the etag matches, while also somehow returning the new etag together with the result when it doesn't.
  2. So, etag revalidation and response generation will take place in separate statements, which means that (absent a repeatable-read or higher transaction isolation level) the etag might not represent the resource accurately under concurrency. Is this a problem? No: it may result in over-invalidation, but not in incorrectness. This has already been explained in the doc linked from "Event counters for fast revalidation (etags)" central#1439.
  3. The two-step approach seems wasteful, and it is: we have to eat the cost of running the projection twice in case of invalidation. Thus we need to make sure the filters are fast, so that this isn't much of a problem.
  4. In this case, I clocked 303 revalidations/s for a projection with a simple daterange filter applied. The approach from "Fast eventcounter etags for geoextracts" #1657 did 478 revalidations/s, but the difference is, IMO, a cheap price to pay for the much increased granularity. The hash-based default etag did 11 revalidations/s on my geocollection-du-jour, so the approach taken here is definitely a good performance increase over that default.
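The flow in points 1–3 can be sketched as follows. This is a minimal sketch only; `handleConditionalGet`, `computeEtag` and `renderBody` are hypothetical names for illustration, not Central's actual API:

```javascript
// Sketch of the two-step etag scheme: a cheap revalidation query first,
// and the (expensive) projection only when the etag no longer matches.
// All names here are illustrative, not Central's actual API.
const handleConditionalGet = async (req, computeEtag, renderBody) => {
  // Step 1: cheap etag (e.g. max(event) + row count over the filtered rows).
  const etag = await computeEtag();
  if (req.headers['if-none-match'] === etag)
    return { status: 304 }; // revalidated without running the projection
  // Step 2: only on a mismatch do we pay for the full projection.
  const body = await renderBody();
  return { status: 200, headers: { etag }, body };
};
```

Note that on a mismatch the etag query runs in addition to the projection, which is exactly the "running it twice" cost from point 3.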

What has been done to verify that this works as intended?

So far, reasoning.

Why is this the best possible solution? Were any other approaches considered?

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

N/A

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

N/A

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources is properly credited in comments, or that everything is internally sourced

```js
const getSubmissionSelectionEtag = (formPK, odataQuery) => ({ db }) =>
  db.oneFirst(sql`
    SELECT
      format('%s.%s', coalesce(max(sub.event), 0), count(sub.event))
```
A Contributor commented:
I think count(*) would be faster than count(sub.event). Output-wise both are the same, in the sense that sub.event can never be null, right?

@brontolosone (Contributor, Author) replied Dec 16, 2025:

IIUC, count(*) lets PG choose which index to use for the count (to avoid a row scan). If it's being very, very smart, that could indeed be faster (perhaps it would just read the PK index or something). I just happen to know that event has an index on it, so the count() wouldn't result in a row scan. But yeah, it's unlikely we'd ever get into a situation in which this count results in a row scan; we'd have to lose the PK index. I'll just change it to the less specific count(*) then.

@brontolosone (Contributor, Author) replied:

> Output-wise both are the same, in the sense that sub.event can never be null, right?

They might be the same at this very start, but they won't stay the same: a) because the counter is global, and b) because of rows being deleted.
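A small worked example of why both components matter. This is a pure-JS mirror of the SQL expression `format('%s.%s', coalesce(max(sub.event), 0), count(*))`; the event values are made up:

```javascript
// JS mirror of format('%s.%s', coalesce(max(sub.event), 0), count(*)).
const etagOf = (events) =>
  `${events.length ? Math.max(...events) : 0}.${events.length}`;

// Three submissions stamped by the global event counter:
etagOf([3, 7, 9]);    // "9.3"
// Deleting the row stamped 7 leaves max(event) unchanged; the count
// component is what invalidates the etag:
etagOf([3, 9]);       // "9.2"
// A mutation restamps a row with a fresh value of the global counter,
// bumping max(event) even though the count is unchanged:
etagOf([3, 9, 12]);   // "12.3"
```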

```js
    FROM
      submissions sub
    WHERE
      sub."formId" = ${formPK}
```
@sadiqkhoja (Contributor) commented Dec 16, 2025:
What will happen if the Form or Project is deleted?

@brontolosone (Contributor, Author) replied:

There's this FK:

```sql
"submissions_formid_foreign" FOREIGN KEY ("formId") REFERENCES forms(id) ON DELETE CASCADE
```

so there shouldn't be any orphaned submissions. The next (potentially conditional) request would 404; we wouldn't even get to the etag calculation. Is that what you mean?

@sadiqkhoja (Contributor) left a review comment:

Can you please add tests to ensure that the ETag gets updated on the events that should trigger cache invalidation?

@brontolosone (Contributor, Author) commented:

> Can you please add tests to ensure that the ETag gets updated on the events that should trigger cache invalidation?

As an alternative to maintaining a whole comprehensive compendium of if-this-then-thats at the API level, I propose to just test the trigger itself, and thus whether insertions/mutations are indeed eventstamped.

And then later, once (or if?) we add transparent caching through nginx, I'd like to test the behaviour of the caching infra as a whole, i.e. with nginx and its rather specific configuration in the loop.

@brontolosone (Contributor, Author) commented:

I've added a test with 0186bf1 (to #1699) so that the DB mechanism itself is tested.
