
Refs #406: Add public discovery routes #552

Open
ifanatics-media wants to merge 3 commits into ramimbo:main from ifanatics-media:codex/406-discovery-routes


@ifanatics-media commented May 28, 2026

Refs #406

Summary

  • Add bounded public discovery routes for /robots.txt, /sitemap.xml, and /favicon.ico.
  • Link the base template to /favicon.ico so browsers stop probing a missing icon path.
  • Keep the sitemap conservative: stable public entry points only, using MERGEWORK_PUBLIC_BASE_URL rather than the request/test host.
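As a rough illustration of the conservative sitemap described above, the sketch below renders a sitemaps.org urlset from a fixed list of public paths and always uses the configured public origin, never the request host. It is not the PR's code: build_sitemap is a hypothetical name, the path list is illustrative (the real PUBLIC_SITEMAP_PATHS entries are not shown in this thread), and xml_escape is assumed to be the standard library's xml.sax.saxutils.escape.

```python
from xml.sax.saxutils import escape as xml_escape

# Hypothetical entries: the PR defines PUBLIC_SITEMAP_PATHS, but its real contents
# are not shown here, so these paths only mirror the page types named in the PR.
PUBLIC_SITEMAP_PATHS: tuple[str, ...] = ("/", "/docs", "/bounties", "/ledger", "/wallet", "/status")


def build_sitemap(public_base_url: str, paths: tuple[str, ...] = PUBLIC_SITEMAP_PATHS) -> str:
    """Render a minimal sitemaps.org <urlset> rooted at the configured public origin."""
    base = public_base_url.rstrip("/")  # "https://host/" and "https://host" behave alike
    entries = "".join(f"  <url><loc>{xml_escape(base + path)}</loc></url>\n" for path in paths)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}"
        "</urlset>\n"
    )
```

Because the origin comes from configuration rather than the incoming request, the same sitemap is produced whether the app is hit through production, a proxy, or a test client.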

Evidence

Live production smoke before the fix, using unauthenticated public requests only:

  • GET https://mrwk.ltclab.site/robots.txt -> HTTP 404 JSON {"detail":"Not Found"}
  • GET https://mrwk.ltclab.site/sitemap.xml -> HTTP 404 JSON {"detail":"Not Found"}
  • GET https://mrwk.ltclab.site/favicon.ico -> HTTP 404 JSON {"detail":"Not Found"}
  • API-host discovery routes also returned bounded JSON 404s.

This is a small browser/crawler polish fix: standard discovery URLs should either exist intentionally or fail closed. The current behavior is bounded, but it creates routine browser 404 noise and gives crawlers/agents no sitemap entry point for public docs, bounties, ledger, wallet, and status pages.
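A robots.txt only gives crawlers a sitemap entry point if it names the sitemap with an absolute URL, which is why the configured public base URL matters here. A minimal sketch, assuming a permissive allow-all policy (the PR's actual directives are not quoted in this thread); build_robots_txt is a hypothetical name.

```python
def build_robots_txt(public_base_url: str) -> str:
    """Return a permissive robots.txt that points crawlers at the public sitemap."""
    base = public_base_url.rstrip("/")
    lines = [
        "User-agent: *",
        "Allow: /",  # assumed policy; the PR's actual directives are not shown
        f"Sitemap: {base}/sitemap.xml",  # Sitemap lines take an absolute URL
    ]
    return "\n".join(lines) + "\n"
```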

Live bounty preflight for #406 / internal bounty 66 showed status=open, awards_remaining=15, and no active attempts.

Validation

  • .\.venv\Scripts\python.exe -m pytest tests\test_api_mcp.py::test_public_discovery_routes_are_bounded_and_use_public_origin tests\test_api_mcp.py::test_head_requests_match_get_routes_without_body -q -> 2 passed
  • .\.venv\Scripts\python.exe -m pytest tests\test_api_mcp.py tests\test_hub.py -q -> 81 passed
  • .\.venv\Scripts\python.exe -m pytest -q -> 415 passed
  • .\.venv\Scripts\python.exe -m ruff check . -> passed
  • .\.venv\Scripts\python.exe -m ruff format --check . -> 79 files already formatted
  • .\.venv\Scripts\python.exe -m mypy app -> success
  • .\.venv\Scripts\python.exe scripts\docs_smoke.py -> docs smoke ok
  • git diff --check -> clean

No secrets, wallet material, private keys, tokens, cookies, OAuth state, private data, production mutation, price claims, liquidity claims, exchange claims, bridge promises, or private security details are included.

Summary by CodeRabbit

  • New Features

    • Added /sitemap.xml, /robots.txt, and /favicon.ico endpoints to improve SEO and site discoverability
    • Added favicon link in site header
    • Discovery endpoints use a configured public base URL and normalize it to avoid malformed or double-slash URLs
  • Tests

    • Added tests verifying discovery endpoints return correct content and headers, respond to HEAD with empty bodies, are excluded from the API schema, and respect base URL normalization
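The "public origin" part of those tests can be sketched as a parser-based check rather than a substring match, so a stray host anywhere in a loc value is caught. This assumes the sitemap uses the sitemaps.org 0.9 namespace; sitemap_locs and assert_sitemap_uses_public_origin are hypothetical helper names, not functions from tests/test_api_mcp.py.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(body: str) -> list[str]:
    """Extract every <loc> value from a sitemaps.org <urlset> document."""
    root = ET.fromstring(body.encode("utf-8"))
    return [(loc.text or "").strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def assert_sitemap_uses_public_origin(body: str, public_base_url: str) -> None:
    """Fail if any sitemap URL points somewhere other than the configured origin."""
    expected = urlsplit(public_base_url)
    locs = sitemap_locs(body)
    assert locs, "sitemap should list at least one URL"
    for loc in locs:
        parts = urlsplit(loc)
        # "testserver" is the default host of Starlette's TestClient; seeing it
        # means the route echoed the request host instead of the configured origin.
        assert parts.netloc != "testserver", loc
        assert (parts.scheme, parts.netloc) == (expected.scheme, expected.netloc), loc
```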


coderabbitai Bot commented May 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: f82189fb-8e43-4643-8f81-160e248d3554

📥 Commits

Reviewing files that changed from the base of the PR and between 2a1de8a and d405c15.

📒 Files selected for processing (2)
  • app/main.py
  • tests/test_api_mcp.py

📝 Walkthrough


Adds three SEO discovery endpoints (/robots.txt, /sitemap.xml, /favicon.ico) to FastAPI with inline favicon content and sitemap configuration, links the favicon in the base template, and tests that all routes return expected content using the public base URL.

Changes

Public discovery routes

  • SEO route implementation and constants (app/main.py): Imports xml_escape, defines the PUBLIC_SITEMAP_PATHS list and FAVICON_SVG content, and registers three non-schema routes that serve robots directives, a dynamically generated XML sitemap, and an SVG favicon using the configured public base URL.
  • Template favicon link (app/templates/base.html): Adds a <link rel="icon"> tag referencing /favicon.ico with the SVG image type in the template head.
  • Discovery route test coverage (tests/test_api_mcp.py): Tests all three endpoints for correct status codes, content types, exact/partial body matches (including public URLs and absence of the test host), normalization of the configured public base URL, and omission from OpenAPI paths, and validates that HEAD requests return empty bodies.
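For the favicon, the walkthrough describes inline SVG served at /favicon.ico plus a matching link tag in the base template. A sketch of that shape as plain data, with an illustrative SVG (the real FAVICON_SVG artwork is not shown) and a hypothetical favicon_response() helper; the image/svg+xml Content-Type, not the .ico extension, is what describes the payload.

```python
# Hypothetical stand-in for the PR's FAVICON_SVG constant; the real artwork is not shown.
FAVICON_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 32 32">'
    '<rect width="32" height="32" rx="6" fill="#1f2937"/>'
    "</svg>"
)
FAVICON_MEDIA_TYPE = "image/svg+xml"

# Head tag in the spirit of the base.html change: .ico path, SVG type hint.
FAVICON_LINK_TAG = f'<link rel="icon" href="/favicon.ico" type="{FAVICON_MEDIA_TYPE}">'


def favicon_response() -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for GET /favicon.ico as plain data."""
    body = FAVICON_SVG.encode("utf-8")
    headers = {"content-type": FAVICON_MEDIA_TYPE, "content-length": str(len(body))}
    return 200, headers, body
```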
🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
  • Title check: ✅ Passed. The title 'Refs #406: Add public discovery routes' clearly names the changed surface (discovery routes) and is directly related to the main changeset.
  • Description check: ✅ Passed. The description includes all required template sections with substantive content: Summary (features added), Evidence (production behavior and bounty preflight), and Test Evidence (all checks marked and detailed validation results provided).
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Mergework Public Artifact Hygiene: ✅ Passed. No investment claims, price claims, cash-out/off-ramp claims, fabricated payouts, or private security details found in PR files (main.py, base.html, test_api_mcp.py) or description.
  • Bounty Pr Focus: ✅ Passed. PR #552 (Refs #406) correctly implements discovery routes with focused scope: three routes, one template change, two tests. No unrelated changes detected.

coderabbitai Bot left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: f12fc1fa-de83-4ee4-8420-87ea4d8ec7a8

📥 Commits

Reviewing files that changed from the base of the PR and between d8532d4 and 33f56be.

📒 Files selected for processing (3)
  • app/main.py
  • app/templates/base.html
  • tests/test_api_mcp.py

Comment thread tests/test_api_mcp.py

@Baijack-star left a comment

Requesting changes for a narrow test-coverage gap before merge.

The route implementation itself looks correct in the slice I checked: /robots.txt, /sitemap.xml, and /favicon.ico return bounded 200 responses; the sitemap uses MERGEWORK_PUBLIC_BASE_URL rather than the test/request host; the base template links /favicon.ico; and the routes are currently absent from /openapi.json because they are registered with include_in_schema=False.

The missing piece is regression coverage for that last contract. Since these are public browser/crawler discovery routes and intentionally hidden from the API schema, please add assertions to test_public_discovery_routes_are_bounded_and_use_public_origin (or a focused companion test) that /robots.txt, /sitemap.xml, and /favicon.ico are not present in client.get("/openapi.json").json()["paths"]. This matches the CodeRabbit pre-merge warning and prevents a future refactor from accidentally exposing these non-API routes in OpenAPI.
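The requested regression check reduces to a membership test on the OpenAPI document's paths object. A minimal sketch with a hypothetical helper name; in the actual test the spec would come from the client.get("/openapi.json").json() call quoted above.

```python
from typing import Any

DISCOVERY_PATHS = ("/robots.txt", "/sitemap.xml", "/favicon.ico")


def leaked_discovery_paths(openapi_spec: dict[str, Any]) -> list[str]:
    """Return discovery routes that wrongly appear in an OpenAPI document's paths."""
    documented = openapi_spec.get("paths", {})
    return [path for path in DISCOVERY_PATHS if path in documented]


# In the real test the spec would come from the app, for example:
#     spec = client.get("/openapi.json").json()
#     assert leaked_discovery_paths(spec) == []
```

Returning the offending paths instead of a bare boolean makes a failing assertion show which route leaked.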

Validation I ran on head 33f56be6bffbaa16110d3af5b9b7cec51c537a62:

  • focused discovery/HEAD tests -> 2 passed
  • tests/test_api_mcp.py tests/test_hub.py -> 81 passed
  • route smoke: all three discovery routes returned 200 with expected content types and / rendered the favicon link
  • OpenAPI probe: /robots.txt, /sitemap.xml, and /favicon.ico are currently absent from paths
  • scoped Ruff check/format on Python files passed
  • mypy app/main.py passed
  • docs smoke passed
  • git diff --check origin/main...HEAD clean

@ifanatics-media (Author)

Addressed in 2a1de8a by extending test_public_discovery_routes_are_bounded_and_use_public_origin to assert /robots.txt, /sitemap.xml, and /favicon.ico stay absent from /openapi.json paths.

Validation after the update:

  • .\.venv\Scripts\python.exe -m pytest tests\test_api_mcp.py::test_public_discovery_routes_are_bounded_and_use_public_origin tests\test_api_mcp.py::test_head_requests_match_get_routes_without_body -q -> 2 passed
  • .\.venv\Scripts\python.exe -m pytest tests\test_api_mcp.py tests\test_hub.py -q -> 81 passed
  • .\.venv\Scripts\python.exe -m pytest -q -> 415 passed
  • .\.venv\Scripts\python.exe -m ruff check . -> passed
  • .\.venv\Scripts\python.exe -m ruff format --check . -> 79 files already formatted
  • .\.venv\Scripts\python.exe -m mypy app -> success
  • .\.venv\Scripts\python.exe scripts\docs_smoke.py -> docs smoke ok
  • git diff --check -> clean

@yunrongy424-oss left a comment

Requesting changes for one small URL-normalization edge case.

/sitemap.xml already normalizes settings.public_base_url with rstrip("/"), but /robots.txt builds the sitemap URL from the raw setting. Deploy validation accepts a MERGEWORK_PUBLIC_BASE_URL whose path is /, so https://mrwk.example.test/ is a valid origin-style setting. With that setting, the current route returns:

Sitemap: https://mrwk.example.test//sitemap.xml

That is avoidable crawler/discovery noise and inconsistent with the sitemap route's normalized origin. Please reuse a stripped base URL in robots_txt() and add a regression assertion for the trailing-slash setting.
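The reported failure and the requested fix can be shown side by side. All function names below are hypothetical; the normalization is the same rstrip("/") the sitemap route already applies, and routing both routes through one shared helper is one way to keep them from drifting apart again.

```python
def normalized_public_base_url(raw: str) -> str:
    """Strip trailing slashes so paths can be appended with a single '/'."""
    return raw.rstrip("/")


def robots_sitemap_line_raw(public_base_url: str) -> str:
    # Reproduces the reported bug: the raw setting is concatenated unchanged.
    return f"Sitemap: {public_base_url}/sitemap.xml"


def robots_sitemap_line(public_base_url: str) -> str:
    # Uses the shared helper, matching what the /sitemap.xml route already does.
    return f"Sitemap: {normalized_public_base_url(public_base_url)}/sitemap.xml"
```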

Evidence on head 33f56be6bffbaa16110d3af5b9b7cec51c537a62:

  • focused discovery/HEAD tests -> 2 passed
  • scoped Ruff check/format on app/main.py and tests/test_api_mcp.py -> passed
  • mypy app/main.py -> passed
  • docs smoke -> ok
  • git diff --check origin/main...HEAD -> clean
  • extra probe with MERGEWORK_PUBLIC_BASE_URL=https://mrwk.example.test/ reproduced the doubled sitemap slash above

No secrets, wallet material, private deployment values, private vulnerability details, live mutation, price claims, liquidity claims, or off-ramp claims were used.

@ifanatics-media (Author)

Addressed the trailing-slash base URL edge in d405c15 by normalizing the base URL in robots_txt() before appending /sitemap.xml, matching the sitemap route. I also added test_public_discovery_routes_normalize_public_base_url, which sets MERGEWORK_PUBLIC_BASE_URL=https://mrwk.example.test/ and verifies neither robots nor sitemap emits doubled slashes.
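A regression check of this kind has to tell a doubled slash in the path apart from the // that follows the URL scheme. One way to do that, sketched with hypothetical helper names (the actual body of test_public_discovery_routes_normalize_public_base_url is not shown), is to split the URL first and inspect only its path.

```python
from urllib.parse import urlsplit


def has_doubled_slash(url: str) -> bool:
    """True if the URL's path contains '//' (the scheme's '//' is not part of the path)."""
    return "//" in urlsplit(url).path


def robots_sitemap_urls(robots_body: str) -> list[str]:
    """Collect the values of all Sitemap: lines in a robots.txt body."""
    return [
        line.split(":", 1)[1].strip()
        for line in robots_body.splitlines()
        if line.lower().startswith("sitemap:")
    ]
```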

Validation after this update:

  • .\.venv\Scripts\python.exe -m pytest tests\test_api_mcp.py::test_public_discovery_routes_are_bounded_and_use_public_origin tests\test_api_mcp.py::test_public_discovery_routes_normalize_public_base_url tests\test_api_mcp.py::test_head_requests_match_get_routes_without_body -q -> 3 passed
  • .\.venv\Scripts\python.exe -m pytest tests\test_api_mcp.py tests\test_hub.py -q -> 82 passed
  • .\.venv\Scripts\python.exe -m pytest -q -> 416 passed
  • .\.venv\Scripts\python.exe -m ruff check . -> passed
  • .\.venv\Scripts\python.exe -m ruff format --check . -> 79 files already formatted
  • .\.venv\Scripts\python.exe -m mypy app -> success
  • .\.venv\Scripts\python.exe scripts\docs_smoke.py -> docs smoke ok
  • git diff --check -> clean

@yunrongy424-oss left a comment

Re-reviewed current head d405c15ab50b962fa4c76346bab5da87ef8bf13d after the follow-up fix.

The trailing-slash edge I flagged is resolved: robots_txt() now normalizes settings.public_base_url before appending /sitemap.xml, and the new regression test covers MERGEWORK_PUBLIC_BASE_URL=https://mrwk.example.test/ without emitting doubled slashes in either robots or sitemap output.

Validation on the current head:

  • focused discovery/normalization/HEAD tests -> 3 passed
  • Ruff check/format on app/main.py and tests/test_api_mcp.py -> passed
  • docs smoke -> ok

No remaining blocker in my reviewed slice.

@Baijack-star left a comment

Re-reviewed current head d405c15ab50b962fa4c76346bab5da87ef8bf13d after the two follow-up commits.

My previous blocker is resolved: test_public_discovery_routes_are_bounded_and_use_public_origin now asserts /robots.txt, /sitemap.xml, and /favicon.ico stay absent from /openapi.json, so these public browser/crawler routes remain outside the API schema contract.

I also checked the later trailing-slash fix from this head. robots_txt() now normalizes settings.public_base_url before appending /sitemap.xml, matching the sitemap route, and the new regression covers MERGEWORK_PUBLIC_BASE_URL=https://mrwk.example.test/ without doubled slashes in robots or sitemap output.

Validation run locally on current head:

  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 ./.venv/bin/python -m pytest tests/test_api_mcp.py::test_public_discovery_routes_are_bounded_and_use_public_origin tests/test_api_mcp.py::test_public_discovery_routes_normalize_public_base_url tests/test_api_mcp.py::test_head_requests_match_get_routes_without_body -q -> 3 passed
  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 ./.venv/bin/python -m pytest tests/test_api_mcp.py tests/test_hub.py -q -> 82 passed
  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 ./.venv/bin/python scripts/docs_smoke.py -> docs smoke ok
  • ./.venv/bin/python -m mypy app/main.py -> success
  • ./.venv/bin/python -m ruff check app/main.py tests/test_api_mcp.py -> passed
  • ./.venv/bin/python -m ruff format --check app/main.py tests/test_api_mcp.py -> already formatted
  • git diff --check origin/main...HEAD -> clean
  • ad hoc TestClient smoke with MERGEWORK_PUBLIC_BASE_URL=https://mrwk.example.test/ confirmed all three discovery routes return 200, no doubled sitemap URL is emitted, and all three remain absent from OpenAPI.

No remaining blocker in my reviewed slice.

@tinyopsstudio

Reviewed PR #552 at d405c15ab50b962fa4c76346bab5da87ef8bf13d for the public discovery routes.

Evidence checked:

  • inspected app/main.py, app/templates/base.html, and tests/test_api_mcp.py;
  • confirmed /robots.txt, /sitemap.xml, and /favicon.ico are registered with include_in_schema=False, so they stay out of the OpenAPI contract;
  • confirmed robots and sitemap use settings.public_base_url.rstrip("/"), avoiding test-host leakage and doubled slashes when MERGEWORK_PUBLIC_BASE_URL has a trailing slash;
  • confirmed the base template points browsers at /favicon.ico, and the HEAD middleware leaves discovery-route bodies empty on HEAD requests.
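The project's actual HEAD middleware is not shown in this thread, and there are several ways to get that behavior. As one possible sketch, assuming a plain ASGI stack: a wrapper that forwards HEAD as GET so GET-only routes match, passes the response start message (status and headers) through, and blanks every body chunk. StripBodyOnHead, demo_get_only_app, and capture are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable

Message = dict[str, Any]
Scope = dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


class StripBodyOnHead:
    """Hypothetical ASGI middleware: answer HEAD like GET, but send no body bytes."""

    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope.get("type") != "http" or scope.get("method") != "HEAD":
            await self.app(scope, receive, send)
            return

        async def send_without_body(message: Message) -> None:
            if message["type"] == "http.response.body":
                # Keep "more_body" so streaming responses still terminate normally.
                message = {**message, "body": b""}
            await send(message)

        # Route as GET so GET-only handlers match; the start message (status,
        # content-type, content-length) passes through unchanged.
        await self.app({**scope, "method": "GET"}, receive, send_without_body)


async def demo_get_only_app(scope: Scope, receive: Receive, send: Send) -> None:
    """Stand-in for a GET-only discovery route such as /robots.txt."""
    if scope["method"] != "GET":
        await send({"type": "http.response.start", "status": 405, "headers": []})
        await send({"type": "http.response.body", "body": b""})
        return
    body = b"User-agent: *\n"
    headers = [(b"content-type", b"text/plain"), (b"content-length", str(len(body)).encode())]
    await send({"type": "http.response.start", "status": 200, "headers": headers})
    await send({"type": "http.response.body", "body": body})


def capture(method: str) -> list[Message]:
    """Drive the wrapped demo app once and return every message it sent."""
    sent: list[Message] = []

    async def receive() -> Message:
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message: Message) -> None:
        sent.append(message)

    scope: Scope = {"type": "http", "method": method, "path": "/robots.txt"}
    asyncio.run(StripBodyOnHead(demo_get_only_app)(scope, receive, send))
    return sent
```

Calling demo_get_only_app directly with a HEAD scope would produce a 405, which is the kind of mismatch a test like test_head_requests_match_get_routes_without_body is meant to catch.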

Validation:

  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run --extra dev python -m pytest tests/test_api_mcp.py::test_public_discovery_routes_are_bounded_and_use_public_origin tests/test_api_mcp.py::test_public_discovery_routes_normalize_public_base_url tests/test_api_mcp.py::test_head_requests_match_get_routes_without_body -q -> 3 passed
  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run --extra dev python -m pytest tests/test_api_mcp.py tests/test_hub.py -q -> 82 passed
  • PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run --extra dev python -m pytest -q -> 416 passed
  • uv run --extra dev ruff check app/main.py tests/test_api_mcp.py -> passed
  • uv run --extra dev ruff format --check app/main.py tests/test_api_mcp.py -> 2 files already formatted
  • git diff --check origin/main...HEAD -> clean

Assessment: no blocker found. The change is bounded to browser/crawler discovery surfaces and keeps the API schema unchanged.


4 participants