Teach federate_auto to recognize schema and component names #57
maxkle1nz wants to merge 1 commit into
…ct artifacts

OpenAPI and proto contract discovery was already useful at the service and operation level, but many real cross-repo seams are carried by schema names rather than only by routes or service identifiers. This change teaches federate_auto to extract and match those schema/component names too.

The new behavior stays lightweight and local-first: .proto messages/enums and OpenAPI components.schemas become additional contract tokens that can promote a nearby repo into the federation set when the current workspace already references them.

Constraint: Must keep schema discovery heuristic and dependency-free inside the current MCP crate
Rejected: Full schema graph/model parser layer in this slice | too large for the next grounded step
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Schema/component tokens should strengthen contract discovery, not replace stronger path/manifest evidence or future semantic graph approaches
Tested: cargo fmt --check; cargo check -p m1nd-mcp -p m1nd-ingest; cargo test -p m1nd-ingest -p m1nd-mcp -- --nocapture; cargo clippy -p m1nd-mcp -p m1nd-ingest -- -D warnings; real MCP stdio smoke with openapi schema/component fixture
Not-tested: Live field smoke against a real external OpenAPI/proto schema repo
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ffc8a786e2
    }

    if lower.contains("components:") && lower.contains("schemas:") {
        if let Ok(schema_regex) = Regex::new(r#"(?m)^\s{4}([A-Za-z0-9_.-]+)\s*:\s*$"#) {
Restrict schema token regex to components.schemas block
The new OpenAPI schema extractor matches any key at exactly four spaces (^\s{4}...:) once components: and schemas: exist anywhere in the file, so it also captures path-method keys like get/post in normal OpenAPI YAML. Those generic tokens are then fed into contract_token_appears_in_content, which allows plain substring matches, so unrelated repos can be flagged as openapi_contract_match candidates just because their code contains common words like get.
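One way to act on this, sketched under assumptions: walk the YAML line by line, enter the block only after seeing `components:` followed by `schemas:`, and leave it as soon as indentation drops back. The function name `extract_component_schema_names` is hypothetical and not part of the PR; the sketch uses only the standard library, in line with the commit's dependency-free constraint, and remains a heuristic rather than a YAML parser.

```rust
/// Hypothetical helper (not the PR's code): collect the keys that sit
/// directly under `components:` -> `schemas:` in an OpenAPI YAML document.
/// Heuristic only: it does not handle quoted keys, flow-style mappings,
/// or a nested `schemas:` key deeper inside `components:`.
fn extract_component_schema_names(content: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut components_indent: Option<usize> = None;
    let mut schemas_indent: Option<usize> = None;
    let mut child_indent: Option<usize> = None;

    for line in content.lines() {
        let trimmed = line.trim_start();
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }
        let indent = line.len() - trimmed.len();
        let key = trimmed.trim_end();

        // Leave a block once indentation returns to its header's level.
        if schemas_indent.map_or(false, |s| indent <= s) {
            schemas_indent = None;
            child_indent = None;
        }
        if components_indent.map_or(false, |c| indent <= c) {
            components_indent = None;
            schemas_indent = None;
            child_indent = None;
        }

        match (components_indent, schemas_indent) {
            (None, _) => {
                if key == "components:" {
                    components_indent = Some(indent);
                }
            }
            (Some(_), None) => {
                if key == "schemas:" {
                    schemas_indent = Some(indent);
                }
            }
            (Some(_), Some(_)) => {
                // Only direct children of `schemas:` are schema names;
                // their indentation is fixed by the first child seen.
                let first = *child_indent.get_or_insert(indent);
                if indent == first {
                    if let Some(name) = key.strip_suffix(':') {
                        let valid = !name.is_empty()
                            && name
                                .chars()
                                .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-'));
                        if valid {
                            names.push(name.to_string());
                        }
                    }
                }
            }
        }
    }
    names
}
```

With this shape, a `get:` key under `paths` is never seen as a schema name, because the scanner has not yet entered the `components` block when it reads that line.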
Pull request overview
This PR strengthens federate_auto’s contract-artifact discovery by promoting schema-/component-level signals (protobuf message/enum names and OpenAPI components.schemas) into the evidence used to identify sibling repos, and updates documentation to reflect the expanded “schema lane”.
Changes:
- Extend protobuf contract token extraction to include `message` and `enum` identifiers.
- Add OpenAPI contract token extraction (operationIds + attempted `components.schemas` keys) and route token inclusion, and use it for `yaml|yml|json` artifacts.
- Update README/wiki/changelog/tasknotes and regenerate `docs/wiki-build` outputs accordingly.
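For readers who want a concrete picture of the protobuf part of these changes, here is a rough, dependency-free sketch of `message`/`enum` name extraction. The function name `extract_proto_type_names` is an assumption made for illustration; the actual code in `m1nd-mcp/src/audit_handlers.rs` is not shown in this excerpt and may differ.

```rust
/// Hypothetical sketch: collect `message` and `enum` identifiers from
/// .proto source text. Line-based heuristic, not a protobuf parser;
/// block comments and declarations split across lines are not handled.
fn extract_proto_type_names(content: &str) -> Vec<String> {
    let mut names: Vec<String> = Vec::new();
    for line in content.lines() {
        // Drop `//` line comments, then look at the first two words.
        let code = line.split("//").next().unwrap_or("").trim();
        let mut words = code.split_whitespace();
        if matches!(words.next(), Some("message") | Some("enum")) {
            if let Some(raw) = words.next() {
                // Keep only the identifier part, so `Empty{}` yields `Empty`.
                let name: String = raw
                    .chars()
                    .take_while(|c| c.is_ascii_alphanumeric() || *c == '_')
                    .collect();
                if !name.is_empty() && !names.contains(&name) {
                    names.push(name);
                }
            }
        }
    }
    names
}
```

Nested declarations are picked up as well, since each line is examined on its own after trimming.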
Reviewed changes
Copilot reviewed 32 out of 34 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| README.md | Updates federate_auto description to mention schema/component discovery. |
| m1nd-mcp/src/audit_handlers.rs | Adds proto message/enum extraction, introduces OpenAPI token extraction, updates artifact handling, and adds tests. |
| docs/wiki/src/api-reference/exploration.md | Updates federate_auto documentation wording to mention schema/components. |
| docs/wiki-build/tutorials/quickstart.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/tutorials/multi-agent.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/tutorials/first-query.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/searcher-c2a407aa.js | Regenerated mdBook output (searchindex filename reference update). |
| docs/wiki-build/print.html | Regenerated mdBook output (searchindex hash + federate_auto wording update). |
| docs/wiki-build/introduction.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/index.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/faq.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/concepts/xlr-noise-cancellation.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/concepts/structural-holes.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/concepts/spreading-activation.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/concepts/hebbian-plasticity.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/changelog.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/benchmarks.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/architecture/overview.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/architecture/mcp-server.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/architecture/ingest.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/architecture/graph-engine.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/perspectives.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/overview.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/memory.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/lifecycle.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/exploration.html | Regenerated mdBook output (searchindex hash + federate_auto wording update). |
| docs/wiki-build/api-reference/analysis.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/activation.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/404.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/AGENT-TASKNOTES.md | Updates tasknotes to reflect schema/component token discovery scope. |
| CHANGELOG.md | Documents new contract-artifact discovery, but currently includes duplicated bullet blocks. |
| .github/wiki/API-Reference.md | Updates tool list wording for m1nd_federate_auto to mention schema/components. |
    if lower.contains("components:") && lower.contains("schemas:") {
        if let Ok(schema_regex) = Regex::new(r#"(?m)^\s{4}([A-Za-z0-9_.-]+)\s*:\s*$"#) {
            for captures in schema_regex.captures_iter(content) {
                if let Some(value) = captures.get(1).map(|m| m.as_str().trim()) {
                    if !value.is_empty() {
extract_openapi_contract_tokens collects schema/component names via a regex that scans the entire document for any line with exactly 4-space indentation and a trailing :. In typical OpenAPI YAML this will also match HTTP method keys like get:/post: under paths, which are very common tokens and can cause many false-positive repo matches (because contract_token_appears_in_content starts with a substring contains). Consider restricting extraction to the components.schemas block (e.g., parse YAML/JSON and read components.schemas keys, or at least slice the text to the schemas: section and stop when indentation decreases).
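The substring-matching half of this concern can be narrowed separately from the extraction fix. A minimal sketch, assuming a hypothetical helper named `token_appears_as_identifier` (the real `contract_token_appears_in_content` is not shown here and may already apply further checks after its initial `contains`): accept a hit only when the token is not embedded in a longer identifier.

```rust
/// Hypothetical helper: true when `token` occurs in `content` as a whole
/// identifier, i.e. with no identifier character directly before or after.
fn token_appears_as_identifier(content: &str, token: &str) -> bool {
    if token.is_empty() {
        return false;
    }
    let is_ident = |c: char| c.is_alphanumeric() || c == '_';
    let mut start = 0;
    while let Some(pos) = content[start..].find(token) {
        let begin = start + pos;
        let end = begin + token.len();
        let before_ok = content[..begin]
            .chars()
            .next_back()
            .map_or(true, |c| !is_ident(c));
        let after_ok = content[end..].chars().next().map_or(true, |c| !is_ident(c));
        if before_ok && after_ok {
            return true;
        }
        // Advance by one character (not one byte) to stay on a UTF-8 boundary.
        start = begin + content[begin..].chars().next().map_or(1, |c| c.len_utf8());
    }
    false
}
```

This rejects `get` inside `get_user` or `target`, but a call such as `client.get(url)` still matches, so boundary checks reduce noise without replacing the block-scoped extraction suggested above.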
    "yaml" | "yml" | "json" => {
        let openapi_tokens = extract_openapi_contract_tokens(&content);
        if openapi_tokens.is_empty() {
            (
                extract_mcp_tool_tokens(&content),
                "mcp_tool_contract_match",
                "medium",
            )
        } else {
            (openapi_tokens, "openapi_contract_match", "medium")
        }
OpenAPI schema discovery is applied to yaml|yml|json, but the current schema-name regex only matches YAML indentation patterns and won't find components.schemas keys in JSON OpenAPI documents. If JSON support is intended here, consider parsing JSON and extracting components.schemas keys (or add a JSON-specific pattern) so .json specs get the same schema/component evidence as .yaml/.yml.
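If a JSON dependency is off the table because of the commit's dependency-free constraint, a small hand-rolled scanner is one possible route. The sketch below is an assumption-laden illustration, not the PR's code: `extract_json_component_schema_names` is a hypothetical name, the scanner expects well-formed JSON, and escape sequences in strings are handled only in simplified form.

```rust
/// Hypothetical sketch: collect the keys of the top-level
/// `components.schemas` object in a JSON OpenAPI document.
/// `stack` holds, for every open object or array, the most recent key
/// read inside it (arrays keep an empty slot).
fn extract_json_component_schema_names(content: &str) -> Vec<String> {
    let mut names = Vec::new();
    let mut stack: Vec<String> = Vec::new();
    let mut chars = content.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            '{' | '[' => stack.push(String::new()),
            '}' | ']' => {
                stack.pop();
            }
            '"' => {
                // Read the string literal; an escaped character is kept
                // verbatim, which is enough for plain schema names.
                let mut text = String::new();
                while let Some(ch) = chars.next() {
                    match ch {
                        '\\' => {
                            if let Some(escaped) = chars.next() {
                                text.push(escaped);
                            }
                        }
                        '"' => break,
                        other => text.push(other),
                    }
                }
                while chars.peek().map_or(false, |ch| ch.is_whitespace()) {
                    chars.next();
                }
                // A string followed by ':' is an object key.
                if chars.peek() == Some(&':') {
                    let in_schemas = stack.len() == 3
                        && stack[0] == "components"
                        && stack[1] == "schemas";
                    if in_schemas {
                        names.push(text.clone());
                    }
                    if let Some(slot) = stack.last_mut() {
                        *slot = text;
                    }
                }
            }
            _ => {}
        }
    }
    names
}
```

Because the key path is checked at exactly three levels deep, property names inside a schema and keys under `paths` or `components.responses` are not collected.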
    - now also discovers sibling repos from local manifest/workspace signals such as:
      - `Cargo.toml` path dependencies
      - `Cargo.toml` workspace members
      - `package.json` workspaces and `file:` dependencies
      - `pnpm-workspace.yaml` package globs
      - `pyproject.toml` workspace/path dependencies
      - `go.work` use directives
    - now also discovers sibling repos from import/package-name matches against
      nearby repo identities, even when no path-style hint exists
    - now also discovers sibling repos from contract artifacts such as:
      - `.proto` package/service definitions
      - `.proto` messages and enums
      - MCP tool-name surfaces in nearby providers
      - OpenAPI/Swagger `operationId`, routes, and `components.schemas`
      - `pyproject.toml` workspace/path dependencies
      - `go.work` use directives
    - now also discovers sibling repos from import/package-name matches against
      nearby repo identities, even when no path-style hint exists
The new federate_auto changelog bullets are duplicated: manifest/workspace signals and import/package-name discovery are already described above, and the contract-artifacts list is repeated multiple times in this same section. Please dedupe this block so each capability is listed once (otherwise the changelog reads like repeated copy/paste).
Summary
- `federate_auto` with schema/component discovery for contract artifacts
- `.proto` messages/enums and OpenAPI `components.schemas` into contract evidence

Why this matters
Service and contract discovery improved a lot with `.proto`, MCP tools, and OpenAPI `operationId`s, but many real repo boundaries are expressed at the schema/component layer. This PR adds that next step.

What changed
- `m1nd-mcp/src/audit_handlers.rs`
  - `message` and `enum` names
  - `components.schemas`

Validation
- `cargo fmt --check`
- `cargo check -p m1nd-mcp -p m1nd-ingest`
- `cargo test -p m1nd-ingest -p m1nd-mcp -- --nocapture`
- `cargo clippy -p m1nd-mcp -p m1nd-ingest -- -D warnings`

Real MCP smoke
OpenAPI schema fixture via real stdio MCP:
- `UserProfile` and `listUsers`
- `openapi.yaml` with `operationId: listUsers` and `components.schemas.UserProfile`
- `federate_auto(execute=false)` discovered one repo
- `evidence_types = ["openapi_contract_match"]`
- `sampled_paths` included both `UserProfile` and `listUsers`
- `skipped_paths = 0`

Remaining scope intentionally left out