Skip to content

feat(parser): reusable source-set framework#877

Open
mariusvniekerk wants to merge 7 commits into
fam/facade-corefrom
fam/source-set-framework
Open

feat(parser): reusable source-set framework#877
mariusvniekerk wants to merge 7 commits into
fam/facade-corefrom
fam/source-set-framework

Conversation

@mariusvniekerk

@mariusvniekerk mariusvniekerk commented Jun 26, 2026

Copy link
Copy Markdown
Collaborator

Adds the reusable source-set bases and functional options that concrete providers compose for discovery, lookup, fingerprinting, and changed-path classification. The framework symbols are exported so they can be consumed across the package without tripping unused-symbol lint at intermediate branches in the stack.

Also adds the WithCompanionFiles option, unifies the path-under-root helper onto a single root-first signature, and removes the dead SQLiteFanoutSourceSet.

Where to look: internal/parser/jsonl_source_set.go, single_file_source_set.go, and the source-set base types.

@roborev-ci

roborev-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

roborev: Combined Review (37dc43b)

Summary verdict: one medium correctness issue should be fixed before merge; no security issues were found.

Medium

  • internal/parser/single_file_source_set.go:192 - SourcesForChangedPath returns nil as soon as the first configured root contains WatchRoot but its classifier rejects the path. With overlapping configured roots, a broad root can prevent a later, more specific root from classifying a valid change, so live updates can be missed.
    • Fix: continue checking other matching configured roots when classifyPath returns false; only return nil after all candidates fail. Add an overlapping-root test.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 9m17s), codex_security (codex/security, done, 1m13s) | Total: 10m37s

@roborev-ci

roborev-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

roborev: Combined Review (f708cf2)

Medium finding only: the new parser source-set fallback paths can bypass freshness checks.

Medium

  • internal/parser/jsonl_source_set.go:302 and internal/parser/multi_session_container.go:260
    RequireFreshSource is bypassed on stored-path fallback paths. These branches can return a source ref without verifying that the file/member still exists, so normal provider-authoritative sync can proceed to fingerprint/parse stale sources instead of returning not found.
    Fix: When RequireFreshSource is true, validate the fallback source before returning it: stat/reuse source inclusion checks for JSONL, and apply memberPresent for multi-session fallback matches.

Panel: ci_default_security | Synthesis: codex, 6s | Members: codex_default (codex/default, done, 6m43s), codex_security (codex/security, done, 2m27s) | Total: 9m16s

@mariusvniekerk mariusvniekerk force-pushed the fam/source-set-framework branch from f708cf2 to 5c6be8e Compare June 26, 2026 18:54
@roborev-ci

roborev-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

roborev: Combined Review (5c6be8e)

Summary verdict: Medium issues remain in parser source-set integration; no High or Critical findings were reported.

Medium

  • Location: internal/parser/source_set.go:116
    Problem: SourceSetFactory only passes ProviderConfig with roots/machine into source-set builders, so migrated providers cannot access EngineConfig.PathRewriter. Providers whose IDs or stored paths depend on a canonical source path, notably Aider remote sync, may hash temp extraction paths instead of rewritten remote paths and create unstable session IDs.
    Fix: Add a path rewriter to ProviderConfig, pass it from all engine provider construction sites, and have source-set parse/lookup closures use it where legacy processing currently does.

  • Location: internal/parser/multi_session_container.go:230
    Problem: SourcesForChangedPath ignores ChangedPathRequest.StoredSourcePaths, even though shared-container providers need persisted virtual paths for DB/WAL/root deletion events where individual members can no longer be rediscovered. This can collapse an event to the physical container or no source, leaving stale virtual sessions unprocessed.
    Fix: When a container/sibling event classifies, also classify and emit matching StoredSourcePaths as member sources, deduped, before falling back to the direct classified source.


Panel: ci_default_security | Synthesis: codex, 9s | Members: codex_default (codex/default, done, 7m46s), codex_security (codex/security, done, 49s) | Total: 8m44s

Introduce the SourceSet bases (JSONL, directory JSONL, single-file,
multi-session container, sibling-metadata, SQLite fan-out), the
functional with*() option set, the generic SourceSet provider/factory
plumbing, and the virtual-path and source-identity helpers up front, so
every provider migration constructs its source set through options
instead of a struct literal.
SQLiteFanoutSourceSet had no production callers -- it was referenced only by its own definition and duplicated multiSessionContainerSourceSet. The package-level helpers it relied on (cleanJSONLRoots, addJSONLSource, sortJSONLSources, and friends) are defined and used elsewhere, so removing it orphans nothing.
… order

pathIsUnderRoot took (path, root), reversing the root-first convention
used by pathUnderRoot(root, candidate) and relUnder(dir, child). Its only
caller already had root and path available, and pathUnderRoot has the same
containment semantics (root==path is not "under"; rejects ".." escape;
handles trailing separators via filepath.Clean). Drop the duplicate and
repoint the caller so all containment checks share one root-first helper.
Providers whose transcript freshness depends on a sidecar file had no
base-level hook: only the SiblingMetadataSourceSet wrapper handled
companions, forcing a separate wrapper type instead of a plain option.

Add withCompanionFiles(transcriptPath -> companions) so the JSONLSourceSet
base folds companions into the three places they matter: their basenames
join the watch-plan include globs, their size/mtime (and content when
hashing is enabled) fold into the SourceFingerprint, and a changed
companion path maps back to its owning transcript so a sidecar write
re-parses the session. The wiring reuses the existing sibling-metadata
helpers rather than adding a third independent sidecar mechanism.
writeSourceFile and the generic source-set tests were introduced by the qwenpaw migration but are framework-level helpers used by ~20 provider test files. Placing them on the source-set-framework branch lets every family branch build its tests.
The reusable source-set scaffolding (functional options, source-set constructors, and the generic source-set provider/factory) was unexported. Because providers don't consume it until higher branches in the stack, staticcheck's unused linter flagged ~62 of these symbols as dead at every mid-stack branch, and the always-run golangci-lint pre-commit hook failed on each such commit. Exporting the API makes the unused analyzer ignore them (it never reports exported identifiers), eliminating the spurious findings stack-wide with no import cycle.
@mariusvniekerk mariusvniekerk force-pushed the fam/source-set-framework branch from 5c6be8e to 0bf59bc Compare June 26, 2026 19:06
@roborev-ci

roborev-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

roborev: Combined Review (0bf59bc)

Findings require changes before merge.

Medium

  • internal/parser/multi_session_container.go:260 - storedPathFallback returns a source without honoring RequireFreshSource / memberPresent, unlike the normal classified-path branch. A stale fallback can be accepted during normal sync and proceed to fingerprint/parse after the member has disappeared. Apply the same req.RequireFreshSource && !s.memberPresent(match.toSource(root)) check before returning fallback matches.

  • internal/parser/jsonl_source_set.go:302 - StoredPathFallbackRoot bypasses the existing-file checks used by sourceForPath, so FindSource can return a stale or missing path even when callers requested RequireFreshSource. When RequireFreshSource is set, validate that the fallback path still exists and is an includable source before returning it.


Panel: ci_default_security | Synthesis: codex, 8s | Members: codex_default (codex/default, done, 8m1s), codex_security (codex/security, done, 49s) | Total: 8m58s

filepath.Rel returns OS-native separators, so on Windows the JSONLSource
RelPath came back with backslashes (nested\deleted.jsonl) while the rest
of the parser keys, display paths, and tests use forward-slash relative
paths. Normalize with filepath.ToSlash so RelPath is platform-stable; this
is a no-op on Unix and fixes the Windows Go Test failure in
TestJSONLSourceSetChangedPathClassifiesDeletedFiles.
@roborev-ci

roborev-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

roborev: Combined Review (1f9da0b)

No issues found.

Summary: The change adds reusable parser source-set foundations for JSONL, single-file, multi-session, companion sidecar, provider lookup, and virtual source-path handling with focused JSONL tests.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 8m2s), codex_security (codex/security, failed, 3s) | Total: 8m5s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant