[recipes] Synthesis capture — Query-as-Ingest pattern #212
Draft
alanshurafa wants to merge 7 commits into
Adds a `capture_synthesis` MCP tool and a `POST /synthesis` REST endpoint so that complex query results can be captured back as new thoughts with full provenance. Enforces the anti-loop (no synthesis-of-synthesis), primary-parent, and 3+ source constraints.
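The three constraints can be read as one validation pass over the requested sources. The sketch below is a hedged illustration, not the recipe's code: only `source_thought_ids` and `source_type='synthesis'` come from this PR, while `primary_parent_id`, the `SourceThought` shape, and the reading of "primary parent" as "must be one of the cited sources" are assumptions.

```typescript
// Sketch only: field names other than source_thought_ids / source_type are hypothetical.
interface SourceThought {
  id: string;
  source_type: string | null; // "synthesis" marks a previously captured synthesis
}

interface SynthesisRequest {
  source_thought_ids: string[];
  primary_parent_id: string; // hypothetical; assumed to be one of source_thought_ids
}

const MIN_SOURCES = 3;

// Returns human-readable violations; an empty array means the request passes.
function checkSynthesisConstraints(
  req: SynthesisRequest,
  loaded: SourceThought[], // rows fetched for req.source_thought_ids
): string[] {
  const errors: string[] = [];
  const requested = new Set(req.source_thought_ids);

  // 3+ sources: count distinct IDs so duplicates cannot pad the total.
  if (requested.size < MIN_SOURCES) {
    errors.push(`need at least ${MIN_SOURCES} distinct sources, got ${requested.size}`);
  }

  // Primary parent (as interpreted here): it must be one of the cited sources.
  if (!requested.has(req.primary_parent_id)) {
    errors.push(`primary parent ${req.primary_parent_id} is not a cited source`);
  }

  // Every requested ID must resolve to an existing thought.
  const foundIds = new Set(loaded.map((t) => t.id));
  requested.forEach((id) => {
    if (!foundIds.has(id)) errors.push(`source thought ${id} not found`);
  });

  // Anti-loop: a synthesis may not be built on top of another synthesis.
  for (const t of loaded) {
    if (requested.has(t.id) && t.source_type === "synthesis") {
      errors.push(`source thought ${t.id} is itself a synthesis`);
    }
  }
  return errors;
}

const result = checkSynthesisConstraints(
  { source_thought_ids: ["a", "b", "c"], primary_parent_id: "a" },
  [
    { id: "a", source_type: "mcp" },
    { id: "b", source_type: "mcp" },
    { id: "c", source_type: null },
  ],
);
console.log(result); // []
```

Counting distinct IDs rather than array length is a deliberate choice in this sketch: otherwise `["a", "a", "a"]` would satisfy a "3+ sources" rule while citing a single thought.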
…erge

Previously, the REST handler set `source: rest_synthesis` and then called `Object.assign(mergedMetadata, body.metadata)`, which let a caller overwrite the reserved source identifier (and, by extension, spoof the write channel seen by downstream filtering/reporting recipes). Restructure the merge so that `body.metadata` is applied first, handler-controlled fields (`question`/`topics`/`tags`) next, and the reserved provenance keys (`source`, `source_type`, `derivation_layer`, `derivation_method`, `derived_from`) are stamped last via explicit key assignment. Caller-supplied metadata can no longer spoof provenance fields.
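The three-step merge order can be sketched as follows. This is a minimal illustration assuming a plain object body; `buildMetadata` and `SynthesisBody` are hypothetical names, and the `derivation_layer`/`derivation_method` values in the usage are illustrative only.

```typescript
type Metadata = Record<string, unknown>;

// Reserved provenance keys named in this PR; the handler always has the last word on these.
const RESERVED_KEYS = [
  "source",
  "source_type",
  "derivation_layer",
  "derivation_method",
  "derived_from",
] as const;
type ReservedKey = (typeof RESERVED_KEYS)[number];

// Hypothetical request body shape.
interface SynthesisBody {
  question?: string;
  topics?: string[];
  tags?: string[];
  metadata?: Metadata;
}

function buildMetadata(body: SynthesisBody, reserved: Record<ReservedKey, unknown>): Metadata {
  // 1. Caller-supplied metadata goes in first, so every later step overrides it.
  const merged: Metadata = { ...(body.metadata ?? {}) };

  // 2. Handler-controlled fields next.
  if (body.question !== undefined) merged.question = body.question;
  if (body.topics !== undefined) merged.topics = body.topics;
  if (body.tags !== undefined) merged.tags = body.tags;

  // 3. Reserved provenance keys last, assigned explicitly one key at a time.
  for (const key of RESERVED_KEYS) {
    merged[key] = reserved[key];
  }
  return merged;
}

// A caller tries to spoof `source`; the handler's value wins and unrelated keys survive.
const stamped = buildMetadata(
  { question: "What changed?", metadata: { source: "mcp", note: "kept" } },
  {
    source: "rest_synthesis",
    source_type: "synthesis",
    derivation_layer: 1, // illustrative value
    derivation_method: "llm_synthesis", // illustrative value
    derived_from: ["a", "b", "c"],
  },
);
console.log(stamped.source, stamped.note); // rest_synthesis kept
```

Assigning the reserved keys from a fixed list, rather than spreading a second object, means a reserved key always ends up with the handler's value even if the caller's metadata contains it.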
Add upper bounds on user/LLM-controlled input to prevent accidental or adversarial floods into Postgres and the embedding API:
- `content`: max 50KB (~10k words)
- `source_thought_ids`: max 50 items (keeps the `.in()` query plan sane)
- `question`: max 2000 chars
- `topics`/`tags`: max 20 entries, 200 chars each

The MCP path enforces these via Zod `max()` / `array.max()`; the REST path has no Zod schema, so it mirrors the same limits imperatively and returns 413 Payload Too Large. Both sets of caps are documented inline; adjust both sides together if longer syntheses are needed.
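On the REST side, the imperative mirror of those caps might look like the sketch below. It is an assumption-laden illustration: `checkInputCaps` and `SynthesisInput` are hypothetical names, and "50KB" is approximated here as 50,000 characters of string length (matching how a Zod string `max()` counts), not as encoded bytes.

```typescript
// Limits from this PR; keep in sync with the MCP-side Zod schema.
const LIMITS = {
  contentChars: 50_000, // "50KB", measured here as string length (an assumption)
  sourceThoughtIds: 50,
  questionChars: 2_000,
  listEntries: 20, // applies to topics and tags
  listEntryChars: 200,
} as const;

// Hypothetical shape of the parsed REST body.
interface SynthesisInput {
  content: string;
  source_thought_ids: string[];
  question?: string;
  topics?: string[];
  tags?: string[];
}

type CapResult = { ok: true } | { ok: false; status: 413; error: string };

function checkInputCaps(body: SynthesisInput): CapResult {
  const tooLarge = (error: string): CapResult => ({ ok: false, status: 413, error });

  if (body.content.length > LIMITS.contentChars) {
    return tooLarge(`content exceeds ${LIMITS.contentChars} characters`);
  }
  if (body.source_thought_ids.length > LIMITS.sourceThoughtIds) {
    return tooLarge(`source_thought_ids exceeds ${LIMITS.sourceThoughtIds} items`);
  }
  if (body.question !== undefined && body.question.length > LIMITS.questionChars) {
    return tooLarge(`question exceeds ${LIMITS.questionChars} characters`);
  }

  // topics and tags share the same per-list and per-entry limits.
  const lists: Array<[string, string[] | undefined]> = [
    ["topics", body.topics],
    ["tags", body.tags],
  ];
  for (const [name, list] of lists) {
    if (list === undefined) continue;
    if (list.length > LIMITS.listEntries) {
      return tooLarge(`${name} has more than ${LIMITS.listEntries} entries`);
    }
    if (list.some((entry) => entry.length > LIMITS.listEntryChars)) {
      return tooLarge(`${name} entry exceeds ${LIMITS.listEntryChars} characters`);
    }
  }
  return { ok: true };
}

const check = checkInputCaps({ content: "x".repeat(50_001), source_thought_ids: ["a"] });
if (!check.ok) console.log(check.status, check.error); // 413 content exceeds 50000 characters
```

In a handler, a failed check would be turned into an HTTP response carrying `check.status` and `check.error` before any database or embedding call is made.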
…bility

The stock `upsert_thought` RPC on `origin/main` persists only `p_payload.metadata` and silently drops top-level keys such as `source_type`, `derivation_layer`, `derivation_method`, and `derived_from`. Until the sibling provenance-chains recipe lands with an updated RPC, every synthesis written against a stock install loses its provenance fields, and the anti-loop safety check becomes vacuous because no synthesis row is ever tagged `source_type='synthesis'` in a queryable column.

Fix: mirror the same four provenance fields into `metadata.provenance.*` in both handlers. On the patched RPC, the top-level fields still populate the dedicated columns (no regression). On the stock RPC, the metadata mirror is the ONLY durable copy; callers and future readers can fall back to `thoughts.metadata->'provenance'` to reconstruct chains.

Also aligns MCP-side embedding soft-fail with REST: both paths now return a success result with a warning message when the embedding patch write fails. Previously, MCP returned `isError: true`, which would trigger spurious retries of an already-successful capture.
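A hedged sketch of the mirror-plus-fallback idea: the payload carries the provenance fields both at the top level (for the patched RPC's dedicated columns) and under `metadata.provenance` (the copy a stock RPC keeps), and a reader checks the column first, then the mirror. `withProvenanceMirror`, `readSourceType`, and the payload/row shapes are hypothetical; only the field names come from this PR.

```typescript
// The four provenance fields named in this PR.
interface Provenance {
  source_type: string;
  derivation_layer: number;
  derivation_method: string;
  derived_from: string[];
}

// Hypothetical payload shape: top-level provenance plus metadata.
interface ThoughtPayload extends Partial<Provenance> {
  content: string;
  metadata: Record<string, unknown>;
}

function withProvenanceMirror(
  content: string,
  metadata: Record<string, unknown>,
  prov: Provenance,
): ThoughtPayload {
  return {
    content,
    ...prov, // top level: fills the dedicated columns on the patched RPC
    metadata: { ...metadata, provenance: { ...prov } }, // the only durable copy on the stock RPC
  };
}

// Hypothetical row read back from `thoughts`; dedicated columns may be absent on stock installs.
interface ThoughtRow {
  source_type?: string | null;
  metadata?: Record<string, unknown> | null;
}

// Prefer the dedicated column, else fall back to metadata.provenance.source_type.
function readSourceType(row: ThoughtRow): string | null {
  if (row.source_type) return row.source_type;
  const prov = row.metadata?.provenance as Partial<Provenance> | undefined;
  return prov?.source_type ?? null;
}

const payload = withProvenanceMirror("Synthesis text", { question: "q" }, {
  source_type: "synthesis",
  derivation_layer: 1, // illustrative value
  derivation_method: "llm_synthesis", // illustrative value
  derived_from: ["a", "b", "c"],
});
// Simulate a stock install, which keeps only payload.metadata.
const stockRow: ThoughtRow = { metadata: payload.metadata };
console.log(readSourceType(stockRow)); // synthesis
```

Routing the anti-loop check through a reader like `readSourceType` is what keeps it meaningful on stock installs: a prior synthesis is still recognized even though its `source_type` column was never written.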
Add `DEPENDENCIES.md` covering two unresolved couplings:
1. The sibling recipe provenance-chains is unmerged. Explains how the stock `upsert_thought` RPC drops top-level keys, the `metadata.provenance.*` mirror that keeps this recipe workable in the meantime, and the exact cleanup path (`TODO(synthesis-capture)` markers) once the patched RPC lands.
2. The stock `search_thoughts` / `list_thoughts` tools don't expose row IDs, so the advertised MCP flow (search, synthesize, capture) can't actually be completed by a model on its own. Document workarounds (manual ID injection, a custom read tool) until a base update lands.

Also add a Known Limitations section to `README.md` summarizing both blockers plus the new input caps and embedding soft-fail semantics, linking out to `DEPENDENCIES.md` for detail.
Depends on
This is opened as a draft. Flip to ready-for-review once its dependencies land on main.
What this adds
`recipes/synthesis-capture/` — Karpathy-inspired Query-as-Ingest pattern:
Known limitation (documented)
Stock `search_thoughts` / `list_thoughts` MCP tools don't expose thought IDs, so a model following the README's "search → synthesize → capture" flow can't populate `source_thought_ids` without another tool. Until the base tools are updated, users must supply IDs manually or through a custom read tool that returns them.
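The "custom read tool" workaround mostly comes down to including each row's ID in the text the model sees. A minimal sketch of that formatting step, assuming a hypothetical `SearchHit` shape (the stock tools' actual return format is not shown here):

```typescript
// Hypothetical search hit shape; not the stock tools' actual output format.
interface SearchHit {
  id: string;
  content: string;
  similarity: number;
}

// Render hits with their IDs so a model can copy them into source_thought_ids.
function formatHitsWithIds(hits: SearchHit[]): string {
  return hits
    .map((h, i) => `${i + 1}. [id: ${h.id}] (similarity ${h.similarity.toFixed(2)}) ${h.content}`)
    .join("\n");
}

console.log(formatHitsWithIds([{ id: "t1", content: "Postgres caps", similarity: 0.8123 }]));
// 1. [id: t1] (similarity 0.81) Postgres caps
```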
Review history
Two fix rounds, two Codex verification rounds, and one Claude review; the final Codex pass came back clean.
See `recipes/synthesis-capture/README.md` and `DEPENDENCIES.md` for details.