Conversation

@brontolosone (Contributor) commented Dec 14, 2025

Towards getodk/central#1439

Stacks onto:

and as such contains code from those PRs that is not yet in master. This PR should not be merged yet, so I'm leaving it as a draft.

Also, I'm still thinking about how to handle the fact that the OData filter may refer to columns not present in the tables used in the etag calculation (case in point: ffgeo.path), and about how to make that calculation more reusable.

Anyway, the commit to actually look at is fabc4d0!

The blessed alternative to:

This introduces the first use of the submissions table's per-row eventstamps as etags.
It's a micro-PR to validate the approach.

Approach, reasoning:

  1. Ideally, the etag for a response would be generated in the same statement as the (constituents of) the response body itself, so that it is 100% representative of it. But with_etag() doesn't really lend itself to that, and it would be hard to accomplish anyway: for revalidation you'd essentially have to feed the etag to the DB and have it perform some awkward-to-express early return to avoid recalculation when the etag matches, while also somehow returning the new etag together with the result when it doesn't.
  2. So, etag revalidation and response generation will take place in separate statements, which means that (absent a repeatable-read or higher transaction isolation level) the etag might not represent the resource accurately under concurrency. Is this a problem? No: it may result in over-invalidation, but not in incorrectness. This has already been explained in the doc linked from "Event counters for fast revalidation (etags)" central#1439.
  3. The two-step approach seems wasteful, and it is: we have to eat the cost of running the projection twice in case of invalidation. Thus we need to make sure the filters are fast, so that this isn't much of a problem.
  4. In this case, I clocked 303 revalidations/s for a projection with a simple daterange filter applied. The approach from "Fast eventcounter etags for geoextracts" #1657 did 478 revalidations/s, but the difference is, IMO, a cheap price to pay for the much increased granularity. The hash-based default etag did 11 revalidations/s on my geocollection-du-jour, so the approach taken here is definitely a good performance increase over that default.
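The flow in points 1–3 can be sketched as follows. This is a minimal sketch only; `handleConditionalGet`, `computeEtag` and `renderBody` are hypothetical names for illustration, not Central's actual API:

```javascript
// Sketch of the two-step etag scheme: a cheap revalidation query first,
// and the (expensive) projection only when the etag no longer matches.
// All names here are illustrative, not Central's actual API.
const handleConditionalGet = async (req, computeEtag, renderBody) => {
  // Step 1: cheap etag (e.g. max(event) + row count over the filtered rows).
  const etag = await computeEtag();
  if (req.headers['if-none-match'] === etag)
    return { status: 304 }; // revalidated without running the projection
  // Step 2: only on a mismatch do we pay for the full projection.
  const body = await renderBody();
  return { status: 200, headers: { etag }, body };
};
```

Note that on a mismatch the etag query runs in addition to the projection, which is exactly the "running it twice" cost from point 3.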

What has been done to verify that this works as intended?

So far, reasoning.

Why is this the best possible solution? Were any other approaches considered?

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

N/A

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

N/A

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources is properly credited in comments, or that everything is internally sourced

```js
const getSubmissionSelectionEtag = (formPK, odataQuery) => ({ db }) =>
  db.oneFirst(sql`
    SELECT
      format('%s.%s', coalesce(max(sub.event), 0), count(sub.event))
```
A Contributor commented:
I think count(*) would be faster than count(sub.event). Output-wise both are the same, in the sense that sub.event can never be null, right?

@brontolosone (Contributor, Author) replied Dec 16, 2025:

IIUC, count(*) lets PG choose which index to use for the count (to avoid a row scan). If it's being very, very smart, that could indeed be faster (perhaps it would just read the PK index or something). I just happen to know that event has an index on it, so the count() wouldn't result in a row scan. But yeah, it's unlikely we'd ever get into a situation in which this count results in a row scan; we'd have to lose the PK index. I'll just change it to the less specific count(*) then.

@brontolosone (Contributor, Author) replied:

> Output-wise both are the same, in the sense that sub.event can never be null, right?

They might be the same at this very start, but they won't stay the same: a) because the counter is global, and b) because of rows being deleted.
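A small worked example of why both components matter. This is a pure-JS mirror of the SQL expression `format('%s.%s', coalesce(max(sub.event), 0), count(*))`; the event values are made up:

```javascript
// JS mirror of format('%s.%s', coalesce(max(sub.event), 0), count(*)).
const etagOf = (events) =>
  `${events.length ? Math.max(...events) : 0}.${events.length}`;

// Three submissions stamped by the global event counter:
etagOf([3, 7, 9]);    // "9.3"
// Deleting the row stamped 7 leaves max(event) unchanged; the count
// component is what invalidates the etag:
etagOf([3, 9]);       // "9.2"
// A mutation restamps a row with a fresh value of the global counter,
// bumping max(event) even though the count is unchanged:
etagOf([3, 9, 12]);   // "12.3"
```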

```js
    FROM
      submissions sub
    WHERE
      sub."formId" = ${formPK}
```
@sadiqkhoja (Contributor) commented Dec 16, 2025:
What will happen if the Form or Project is deleted?

@brontolosone (Contributor, Author) replied:

There's this FK:

```sql
"submissions_formid_foreign" FOREIGN KEY ("formId") REFERENCES forms(id) ON DELETE CASCADE
```

so there shouldn't be any orphaned submissions. The next (potentially conditional) request would 404; we wouldn't even get to the etag calculation. Is that what you mean?

@sadiqkhoja (Contributor) left a review comment:

Can you please add tests to ensure that the ETag gets updated on the events that should trigger cache invalidation?

@brontolosone (Contributor, Author) commented:

> Can you please add tests to ensure that the ETag gets updated on the events that should trigger cache invalidation?

As an alternative to maintaining a whole comprehensive compendium of if-this-then-thats at the API level, I propose to just test the trigger itself, and thus whether insertions/mutations are indeed eventstamped.

And then later, once (or if?) we add transparent caching through nginx, I'd like to test the behaviour of the caching infra as a whole, i.e. with nginx and its rather specific configuration in the loop.

@brontolosone (Contributor, Author) commented:

I've added a test with 0186bf1 (to #1699) so that the DB mechanism itself is tested.
