
Teach federate_auto to recognize schema and component names #57

Open
maxkle1nz wants to merge 1 commit into main from codex/m1nd-schema-contracts

Conversation

@maxkle1nz
Owner

Summary

  • extend federate_auto with schema/component discovery for contract artifacts
  • promote .proto messages/enums and OpenAPI components.schemas into contract evidence
  • align README, changelog, tasknotes, wiki source, and wiki build with the stronger schema lane

Why this matters

Service and contract discovery already improved substantially with .proto, MCP tool, and OpenAPI operationId matching, but many real repo boundaries are expressed at the schema/component layer. This PR adds discovery at that layer.

What changed

  • m1nd-mcp/src/audit_handlers.rs
    • extend proto contract extraction with message and enum names
    • extend OpenAPI contract extraction with components.schemas
    • add focused tests for proto message matching and OpenAPI schema matching
  • docs
    • refresh public wording and tasknotes to reflect schema/component discovery
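
As an illustration of the proto side of this change, here is a minimal, dependency-free sketch of pulling message and enum names out of .proto text. The function name `extract_proto_type_names` is hypothetical and this is not the code in audit_handlers.rs; it only assumes that a declaration starts a line with the keyword `message` or `enum` followed by the type name.

```rust
/// Collects identifiers declared with `message` or `enum` in .proto text.
/// Heuristic only: it scans line by line and does not parse the proto grammar.
fn extract_proto_type_names(content: &str) -> Vec<String> {
    let mut names: Vec<String> = Vec::new();
    for line in content.lines() {
        let mut words = line.split_whitespace();
        // Only lines whose first word is `message` or `enum` declare a type.
        if !matches!(words.next(), Some("message") | Some("enum")) {
            continue;
        }
        if let Some(raw) = words.next() {
            // Drop a trailing `{` when the brace is glued to the name.
            let name = raw.trim_end_matches('{');
            let valid = !name.is_empty()
                && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_');
            // Keep first-seen order and skip duplicates.
            if valid && !names.iter().any(|n| n.as_str() == name) {
                names.push(name.to_string());
            }
        }
    }
    names
}
```

Calling it on a file that declares `message UserProfile {` and `enum Role{` yields both names in declaration order; field lines and comment lines are skipped because their first word is neither keyword.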

Validation

  • cargo fmt --check
  • cargo check -p m1nd-mcp -p m1nd-ingest
  • cargo test -p m1nd-ingest -p m1nd-mcp -- --nocapture
  • cargo clippy -p m1nd-mcp -p m1nd-ingest -- -D warnings

Real MCP smoke

OpenAPI schema fixture via real stdio MCP:

  • current repo contains UserProfile and listUsers
  • sibling repo contains openapi.yaml with operationId: listUsers and components.schemas.UserProfile
  • federate_auto(execute=false) discovered one repo
  • evidence_types = ["openapi_contract_match"]
  • sampled_paths included both UserProfile and listUsers
  • skipped_paths = 0

Remaining scope intentionally left out

  • full schema graph/model semantics
  • richer field/property-level matching inside OpenAPI/proto
  • live field smoke against a real external schema repo

…ct artifacts

OpenAPI and proto contract discovery was already useful at the service and operation level, but many real cross-repo seams are carried by schema names rather than only by routes or service identifiers. This change teaches federate_auto to extract and match those schema/component names too.

The new behavior stays lightweight and local-first: .proto messages/enums and OpenAPI components.schemas become additional contract tokens that can promote a nearby repo into the federation set when the current workspace already references them.

Constraint: Must keep schema discovery heuristic and dependency-free inside the current MCP crate
Rejected: Full schema graph/model parser layer in this slice | too large for the next grounded step
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: Schema/component tokens should strengthen contract discovery, not replace stronger path/manifest evidence or future semantic graph approaches
Tested: cargo fmt --check; cargo check -p m1nd-mcp -p m1nd-ingest; cargo test -p m1nd-ingest -p m1nd-mcp -- --nocapture; cargo clippy -p m1nd-mcp -p m1nd-ingest -- -D warnings; real MCP stdio smoke with openapi schema/component fixture
Not-tested: Live field smoke against a real external OpenAPI/proto schema repo
Copilot AI review requested due to automatic review settings April 5, 2026 18:12

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ffc8a786e2

}

if lower.contains("components:") && lower.contains("schemas:") {
    if let Ok(schema_regex) = Regex::new(r#"(?m)^\s{4}([A-Za-z0-9_.-]+)\s*:\s*$"#) {

[P1] Restrict schema token regex to components.schemas block

The new OpenAPI schema extractor matches any key at exactly four spaces (^\s{4}...:) once components: and schemas: exist anywhere in the file, so it also captures path-method keys like get/post in normal OpenAPI YAML. Those generic tokens are then fed into contract_token_appears_in_content, which allows plain substring matches, so unrelated repos can be flagged as openapi_contract_match candidates just because their code contains common words like get.
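
One way to soften the substring problem described above is to require that a token match as a whole identifier. The sketch below is an assumption-laden illustration: `token_appears_as_identifier` is a hypothetical helper, not the body of `contract_token_appears_in_content`, whose implementation is not shown in this thread.

```rust
/// Returns true when `token` occurs in `content` as a whole identifier,
/// i.e. not embedded in a longer run of identifier characters.
/// Hypothetical helper for illustration; not the crate's actual matcher.
fn token_appears_as_identifier(content: &str, token: &str) -> bool {
    if token.is_empty() {
        return false;
    }
    let is_ident = |c: char| c.is_ascii_alphanumeric() || c == '_';
    for (start, _) in content.match_indices(token) {
        let end = start + token.len();
        // The characters directly around the match must not extend the identifier.
        let before_ok = content[..start]
            .chars()
            .next_back()
            .map_or(true, |c| !is_ident(c));
        let after_ok = content[end..].chars().next().map_or(true, |c| !is_ident(c));
        if before_ok && after_ok {
            return true;
        }
    }
    false
}
```

With this check, `get` no longer matches inside `target` or `widget`, but it still matches a call such as `client.get(url)`, so boundary matching narrows the false positives without removing the need to keep generic keys out of the token set in the first place.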


Copilot AI left a comment


Pull request overview

This PR strengthens federate_auto’s contract-artifact discovery by promoting schema-/component-level signals (protobuf message/enum names and OpenAPI components.schemas) into the evidence used to identify sibling repos, and updates documentation to reflect the expanded “schema lane”.

Changes:

  • Extend protobuf contract token extraction to include message and enum identifiers.
  • Add OpenAPI contract token extraction (operationIds, route tokens, and an attempted extraction of components.schemas keys), and apply it to yaml|yml|json artifacts.
  • Update README/wiki/changelog/tasknotes and regenerate docs/wiki-build outputs accordingly.

Reviewed changes

Copilot reviewed 32 out of 34 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| README.md | Updates federate_auto description to mention schema/component discovery. |
| m1nd-mcp/src/audit_handlers.rs | Adds proto message/enum extraction, introduces OpenAPI token extraction, updates artifact handling, and adds tests. |
| docs/wiki/src/api-reference/exploration.md | Updates federate_auto documentation wording to mention schema/components. |
| docs/wiki-build/tutorials/quickstart.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/tutorials/multi-agent.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/tutorials/first-query.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/searcher-c2a407aa.js | Regenerated mdBook output (searchindex filename reference update). |
| docs/wiki-build/print.html | Regenerated mdBook output (searchindex hash + federate_auto wording update). |
| docs/wiki-build/introduction.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/index.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/faq.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/concepts/xlr-noise-cancellation.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/concepts/structural-holes.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/concepts/spreading-activation.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/concepts/hebbian-plasticity.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/changelog.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/benchmarks.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/architecture/overview.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/architecture/mcp-server.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/architecture/ingest.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/architecture/graph-engine.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/perspectives.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/overview.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/memory.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/lifecycle.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/exploration.html | Regenerated mdBook output (searchindex hash + federate_auto wording update). |
| docs/wiki-build/api-reference/analysis.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/api-reference/activation.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/wiki-build/404.html | Regenerated mdBook output (searchindex hash reference update). |
| docs/AGENT-TASKNOTES.md | Updates tasknotes to reflect schema/component token discovery scope. |
| CHANGELOG.md | Documents new contract-artifact discovery, but currently includes duplicated bullet blocks. |
| .github/wiki/API-Reference.md | Updates tool list wording for m1nd_federate_auto to mention schema/components. |


Comment on lines +1634 to +1638
if lower.contains("components:") && lower.contains("schemas:") {
    if let Ok(schema_regex) = Regex::new(r#"(?m)^\s{4}([A-Za-z0-9_.-]+)\s*:\s*$"#) {
        for captures in schema_regex.captures_iter(content) {
            if let Some(value) = captures.get(1).map(|m| m.as_str().trim()) {
                if !value.is_empty() {

Copilot AI Apr 5, 2026


extract_openapi_contract_tokens collects schema/component names via a regex that scans the entire document for any line with exactly 4-space indentation and a trailing :. In typical OpenAPI YAML this will also match HTTP method keys like get:/post: under paths, which are very common tokens and can cause many false-positive repo matches (because contract_token_appears_in_content begins with a plain substring contains check). Consider restricting extraction to the components.schemas block (e.g., parse YAML/JSON and read components.schemas keys, or at least slice the text to the schemas: section and stop when indentation decreases).
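
The "stop when indentation decreases" suggestion can be sketched without a YAML parser by tracking the stack of enclosing bare keys. This is a heuristic illustration under stated assumptions: `extract_yaml_schema_names` and `bare_key` are hypothetical names, `components:` is assumed to sit at column 0, indentation is assumed to use spaces, and quoted keys, flow-style mappings, and block scalars are not handled.

```rust
/// Returns the key of a bare `key:` line (a mapping key with no inline value).
fn bare_key(trimmed: &str) -> Option<&str> {
    trimmed.strip_suffix(':').filter(|key| !key.is_empty())
}

/// Collects only the direct children of a top-level `components:` -> `schemas:` block.
/// Hypothetical sketch; not the extractor used in the PR.
fn extract_yaml_schema_names(content: &str) -> Vec<String> {
    let mut names = Vec::new();
    // Stack of (indentation, key) for the bare keys enclosing the current line.
    let mut path: Vec<(usize, &str)> = Vec::new();
    for line in content.lines() {
        let trimmed = line.trim();
        // Blank lines and comments never change the nesting.
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }
        let indent = line.len() - line.trim_start().len();
        // Close every block that is not a strict ancestor of this line.
        while path.last().map_or(false, |&(open_indent, _)| open_indent >= indent) {
            path.pop();
        }
        if let Some(key) = bare_key(trimmed) {
            let in_schemas = path.len() == 2
                && path[0] == (0, "components")
                && path[1].1 == "schemas";
            // Same character set as the regex quoted in the diff.
            let valid = key
                .chars()
                .all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '.' | '-'));
            if in_schemas && valid {
                names.push(key.to_string());
            }
            path.push((indent, key));
        }
    }
    names
}
```

Because a key is only recorded when its enclosing path is exactly `components` then `schemas`, method keys such as `get:` under `paths` and entries under sibling blocks such as `components.parameters` are ignored regardless of how deeply they are indented.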

Comment on lines +1768 to +1778
"yaml" | "yml" | "json" => {
    let openapi_tokens = extract_openapi_contract_tokens(&content);
    if openapi_tokens.is_empty() {
        (
            extract_mcp_tool_tokens(&content),
            "mcp_tool_contract_match",
            "medium",
        )
    } else {
        (openapi_tokens, "openapi_contract_match", "medium")
    }

Copilot AI Apr 5, 2026


OpenAPI schema discovery is applied to yaml|yml|json, but the current schema-name regex only matches YAML indentation patterns and won't find components.schemas keys in JSON OpenAPI documents. If JSON support is intended here, consider parsing JSON and extracting components.schemas keys (or add a JSON-specific pattern) so .json specs get the same schema/component evidence as .yaml/.yml.
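
For the JSON case, a JSON-specific pass could look roughly like the following. It is a sketch that stays dependency-free (in line with the constraint stated in the commit message) by scanning characters and tracking which key each open object or array belongs to; a real JSON parser would be more robust. The name `extract_json_schema_names` is hypothetical, and escape sequences inside strings are skipped rather than decoded.

```rust
/// Collects the keys of the root-level `components.schemas` object in a JSON document.
/// Hypothetical sketch: a character scanner, not a validating JSON parser.
fn extract_json_schema_names(content: &str) -> Vec<String> {
    let mut names = Vec::new();
    // One entry per open object/array: the key under which it was opened.
    let mut path: Vec<Option<String>> = Vec::new();
    let mut last_key: Option<String> = None;
    let mut chars = content.chars().peekable();

    while let Some(c) = chars.next() {
        match c {
            '"' => {
                // Read the string literal; an escaped character is kept verbatim.
                let mut text = String::new();
                while let Some(s) = chars.next() {
                    match s {
                        '\\' => {
                            if let Some(escaped) = chars.next() {
                                text.push(escaped);
                            }
                        }
                        '"' => break,
                        other => text.push(other),
                    }
                }
                // A string followed by `:` is an object key.
                while chars.peek().map_or(false, |w| w.is_whitespace()) {
                    chars.next();
                }
                if chars.peek() == Some(&':') {
                    let in_schemas = path.len() == 3
                        && path[1].as_deref() == Some("components")
                        && path[2].as_deref() == Some("schemas");
                    if in_schemas {
                        names.push(text.clone());
                    }
                    last_key = Some(text);
                }
            }
            '{' | '[' => path.push(last_key.take()),
            '}' | ']' => {
                path.pop();
                last_key = None;
            }
            // A comma ends the current member, so its key must not leak forward.
            ',' => last_key = None,
            _ => {}
        }
    }
    names
}
```

The path check requires exactly three open containers (the root object, `components`, and `schemas`), so property names nested inside a schema and keys under `paths` or `components.parameters` are not reported.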

Comment thread CHANGELOG.md
Comment on lines 20 to +37
- now also discovers sibling repos from local manifest/workspace signals such as:
  - `Cargo.toml` path dependencies
  - `Cargo.toml` workspace members
  - `package.json` workspaces and `file:` dependencies
  - `pnpm-workspace.yaml` package globs
  - `pyproject.toml` workspace/path dependencies
  - `go.work` use directives
- now also discovers sibling repos from import/package-name matches against
  nearby repo identities, even when no path-style hint exists
- now also discovers sibling repos from contract artifacts such as:
  - `.proto` package/service definitions
  - `.proto` messages and enums
  - MCP tool-name surfaces in nearby providers
  - OpenAPI/Swagger `operationId`, routes, and `components.schemas`
  - `pyproject.toml` workspace/path dependencies
  - `go.work` use directives
- now also discovers sibling repos from import/package-name matches against
  nearby repo identities, even when no path-style hint exists

Copilot AI Apr 5, 2026


The new federate_auto changelog bullets are duplicated: manifest/workspace signals and import/package-name discovery are already described above, and the contract-artifacts list is repeated multiple times in this same section. Please dedupe this block so each capability is listed once (otherwise the changelog reads like repeated copy/paste).
