fix(query): re-probe Parquet schema when cache changes underneath a running engine by Lazare-42 · Pull Request #422 · kenn-io/msgvault

Lazare-42 · 2026-06-26T07:04:08Z

What

The DuckDB query engine probes each Parquet table's optional columns once at construction and trusts that snapshot for the rest of the process lifetime. A long-running process (the serve daemon, the MCP server) goes stale when build-cache/sync rewrites the analytics cache with a different column set.

When that happens, the cached "column present" verdict puts a now-absent column into a SELECT * REPLACE (... AS message_type) list, which DuckDB rejects:

Binder Error: Column "message_type" in REPLACE list not found in FROM clause

Every Search/Aggregate call then fails until the process is restarted.

Why

Mixed-version writers (or full rebuilds with an older builder) can drop optional columns from the cache. The engine assumed the schema was immutable for its lifetime; it isn't for a server whose cache is rebuilt out of band.

How

Fingerprint the cache (per-probed-table file count + size + mtime).
When the fingerprint changes, re-probe optional columns and re-register the SQL views on demand (cheap no-op on the common unchanged path).
Guard optionalCols with an RWMutex since refresh can now race with reads.
Regression test rebuilds the cache underneath a live engine and asserts the query recovers instead of crashing.

🤖 Generated with Claude Code

The DuckDB engine probed each Parquet table's optional columns once at construction and trusted that snapshot for the process lifetime. A long-running mcp-http server therefore went stale when build-cache/sync rewrote the analytics cache with a different column set: the cached "column present" verdict put a now-absent column into a SELECT * REPLACE list, which DuckDB rejects with Binder Error: Column "message_type" in REPLACE list not found in FROM clause crashing every search_messages and aggregate call until restart. Fingerprint the cache (per-table file count + size + mtime) and re-probe optional columns (and re-register views) on demand when it changes. Guard optionalCols with a RWMutex since refresh now happens concurrently with reads. Add a regression test that rebuilds the cache underneath a live engine and asserts the query recovers instead of crashing. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

roborev-ci · 2026-06-26T12:44:00Z

roborev: Combined Review (`393f632`)

Stale DuckDB search cache remains a Medium correctness issue; no security findings reported.

Medium

internal/query/duckdb.go:2446 - SearchFastWithStats serves cached temp tables keyed only by search conditions. After ensureFreshOptionalCols detects an analytics cache rebuild, repeating the same search can return stale rows, counts, and stats from pre-rebuild Parquet data. Refresh/check the cache fingerprint before the cache-hit branch and include it in the search cache key, or drop/re-materialize the search cache whenever the Parquet fingerprint changes.

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 2m33s), codex_security (codex/security, done, 40s) | Total: 3m19s

wesm · 2026-06-26T22:58:07Z

looking

SearchFastWithStats kept a materialized temp table alive for pagination and keyed it only by search predicates. When build-cache or sync rewrote the Parquet analytics files under a long-running engine, the optional-column probe could refresh while the search path still served rows, counts, and stats from the pre-rebuild temp table.\n\nInclude the observed Parquet cache fingerprint in the search cache key and refresh that fingerprint before considering a cache hit, so repeated searches rematerialize against current analytics data after an out-of-band cache rebuild. The regression rewrites Parquet beneath a live engine and repeats the same fast search to prove counts, stats, and row ordering come from the rebuilt cache.\n\nGenerated with Codex (GPT-5)\nCo-authored-by: Codex <codex@openai.com>

roborev-ci · 2026-06-26T23:16:05Z

roborev: Combined Review (`1dd0bb7`)

Summary verdict: Changes need fixes before merge due to stale cache invalidation risks.

Medium

internal/query/duckdb.go:177 — NewDuckDBEngine records the cache fingerprint after probing Parquet schemas. If the cache is rebuilt between the probe and assignment, optionalCols can describe old files while cacheFP matches new files, causing ensureFreshOptionalCols to treat stale schema data as fresh.
- Fix: Capture the fingerprint before probing, or compare fingerprints before and after the probe and retry until stable before storing optionalCols and cacheFP.
internal/query/duckdb.go:287 — The fingerprint used to invalidate SearchFastWithStats cache only includes messages, participants, conversations, and sources. Results and cached stats also depend on message_recipients, labels, message_labels, and attachments, so rebuilding only those files can leave cached rows or stats stale.
- Fix: Include all RequiredParquetDirs in the fingerprint, with the messages directory using the partitioned glob.

Panel: ci_default_security | Synthesis: codex, 12s | Members: codex_default (codex/default, done, 3m56s), codex_security (codex/security, done, 1m4s) | Total: 5m12s

Search cache invalidation now depends on the same Parquet directory set required for a complete analytics cache. Without the auxiliary tables in the fingerprint, attachment, label, recipient, or relationship rebuilds could leave SearchFastWithStats serving stale cached stats or rows for repeated searches.\n\nSchema probing now also records optional columns only from a stable fingerprint window. If build-cache rewrites files while a long-running engine is probing schemas, the probe retries instead of storing columns from one cache generation with the fingerprint from another.\n\nGenerated with Codex (GPT-5)\nCo-authored-by: Codex <codex@openai.com>

roborev-ci · 2026-06-27T00:36:11Z

roborev: Combined Review (`e673436`)

No issues found.

Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 4m23s), codex_security (codex/security, done, 12s) | Total: 4m35s

CI runs the custom testify-helper-check analyzer as part of lint-ci, while the local lint target only runs golangci-lint. The cache drift regression tests crossed the direct-package-call threshold, so the CI-only analyzer rejected them even though the tests themselves passed.\n\nUse local assert and require helpers in the assertion-heavy tests so the drift coverage follows the repository's testify style rule.\n\nGenerated with Codex (GPT-5)\nCo-authored-by: Codex <codex@openai.com>

roborev-ci · 2026-06-27T18:51:12Z

roborev: Combined Review (`d52c39e`)

No Medium, High, or Critical findings were reported.

Both reviews are clean at the requested severity threshold.

Panel: ci_default_security | Synthesis: codex, 5s | Members: codex_default (codex/default, done, 6m21s), codex_security (codex/security, done, 14s) | Total: 6m40s

wesm merged commit a120738 into kenn-io:main Jun 27, 2026
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(query): re-probe Parquet schema when cache changes underneath a running engine#422

fix(query): re-probe Parquet schema when cache changes underneath a running engine#422
wesm merged 4 commits into
kenn-io:mainfrom
Lazare-42:fix/parquet-reprobe-on-cache-change

Lazare-42 commented Jun 26, 2026

Uh oh!

roborev-ci Bot commented Jun 26, 2026

Uh oh!

wesm commented Jun 26, 2026

Uh oh!

roborev-ci Bot commented Jun 26, 2026

Uh oh!

roborev-ci Bot commented Jun 27, 2026

Uh oh!

roborev-ci Bot commented Jun 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Lazare-42 commented Jun 26, 2026

What

Why

How

Uh oh!

roborev-ci Bot commented Jun 26, 2026

roborev: Combined Review (393f632)

Medium

Uh oh!

wesm commented Jun 26, 2026

Uh oh!

roborev-ci Bot commented Jun 26, 2026

roborev: Combined Review (1dd0bb7)

Medium

Uh oh!

roborev-ci Bot commented Jun 27, 2026

roborev: Combined Review (e673436)

Uh oh!

roborev-ci Bot commented Jun 27, 2026

roborev: Combined Review (d52c39e)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

roborev: Combined Review (`393f632`)

roborev: Combined Review (`1dd0bb7`)

roborev: Combined Review (`e673436`)

roborev: Combined Review (`d52c39e`)