Clawmark: lower catalog trust gate to the 0-5 scale (threshold 3.0)#12

Merged: ryan10sa-star merged 3 commits into main from claude/laughing-thompson-QNxrw on May 22, 2026

ryan10sa-star (Collaborator) commented May 22, 2026

Summary

Companion to aria-registry#8. ARIA is migrating Clawmark scores to the canonical 0–5 scale (CANONICAL_CLAWMARK_STANDARD.md). The Carapace SDK catalog gate compared clawmark_score against 80 (the old 0–100 scale) — once ARIA serves 0–5 scores, run_gate_check / runGateCheck would fail-closed for every tool.

Scope note: the original Phase B spec said "do not change the Carapace SDK". The maintainer explicitly authorized this change — leaving the threshold at 80 guarantees a broken gate the moment ARIA flips to 0–5, so the SDK gate must move in lockstep.

The gate threshold is lowered to 3.0 (= beta or above).
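
As a minimal sketch of the comparison this change adjusts (not the SDK's actual code): the helper name `passes_clawmark_gate` is hypothetical, and the inclusive `>=` comparison is an assumption drawn from the "beta or above" wording.

```python
from typing import Optional

# New default on the 0-5 scale; the old default was 80 on the 0-100 scale.
DEFAULT_SCORE_THRESHOLD = 3.0


def passes_clawmark_gate(
    score: Optional[float],
    score_threshold: float = DEFAULT_SCORE_THRESHOLD,
) -> bool:
    """Hypothetical helper: True when the score meets the threshold.

    An unscored tool (None) is treated as 0, mirroring the PR's description.
    The inclusive comparison is an assumption, not confirmed by the source.
    """
    effective = 0.0 if score is None else float(score)
    return effective >= score_threshold


if __name__ == "__main__":
    # A 0-5 score can never reach the old threshold of 80,
    # which is why every tool would have been rejected.
    print(passes_clawmark_gate(4.5, score_threshold=80))
    print(passes_clawmark_gate(4.5))
```

Under these assumptions, any score served on the 0–5 scale fails against the old threshold, while the same score is evaluated sensibly against 3.0.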

Changes

  • carapace/catalog.py: CatalogEntry.clawmark_score is now float; from_dict coerces ARIA's null (unscored) to 0.0; run_gate_check score_threshold 80 → 3.0.
  • typescript/src/catalog.ts: CatalogEntry.clawmark_score is now number | null; runGateCheck scoreThreshold 80 → 3.0; the gate treats null/unscored as 0.
  • tests/test_v05_phase_b.py: catalog fixtures converted 0–100 → 0–5 (85→4.5, 50→2.0, 90→4.8, 75/80→4.0); custom-threshold test uses 1.0; new test_from_dict_handles_null_score.
  • typescript/test/v05_receipts.test.js: makeCatalogState fixtures converted 0–100 → 0–5.
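
The null-score coercion on the Python side could look roughly like the sketch below. `CatalogEntrySketch` and its `tool_id` field are hypothetical stand-ins; only `clawmark_score`, `certification_tier`, and the `from_dict` name come from the PR, and the real `CatalogEntry` likely carries more fields.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class CatalogEntrySketch:
    """Hypothetical, trimmed-down stand-in for CatalogEntry."""

    tool_id: str  # illustrative field, not taken from the SDK
    clawmark_score: float = 0.0
    certification_tier: str = ""  # kept for backward compatibility

    @classmethod
    def from_dict(cls, data: Mapping[str, Any]) -> "CatalogEntrySketch":
        raw_score = data.get("clawmark_score")
        return cls(
            tool_id=str(data.get("tool_id", "")),
            # ARIA sends null for unscored tools; coerce it to 0.0.
            clawmark_score=0.0 if raw_score is None else float(raw_score),
            # ARIA no longer emits this field, so default to "".
            certification_tier=data.get("certification_tier") or "",
        )
```

Coercing at parse time keeps the gate comparison free of `None` checks, at the cost of making an unscored tool indistinguishable from one scored 0.0.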

Notes

  • certification_tier is left on CatalogEntry for backward compatibility — ARIA no longer emits it (it becomes ""), but nothing breaks.
  • No change to gate ordering, fail-open semantics, or the receipt API.

Test plan

  • pytest tests/test_v05_phase_b.py
  • node --test typescript/test/v05_receipts.test.js (after tsc build of typescript/)
  • Confirm against aria-registry#8 that /aria/v1/catalog emits 0–5 clawmark_score

https://claude.ai/code/session_01Sm3am5LeiJqJgEkW7oD2vu


Generated by Claude Code


ARIA now serves Clawmark scores on the canonical 0-5 scale
(CANONICAL_CLAWMARK_STANDARD.md). The Carapace SDK catalog gate compared
against 80 (the old 0-100 scale), which would fail-closed for every tool
once ARIA flips to 0-5.

- catalog.py: CatalogEntry.clawmark_score is now float; from_dict coerces
  ARIA's null (unscored) to 0.0; run_gate_check score_threshold 80 -> 3.0.
- catalog.ts: CatalogEntry.clawmark_score is now `number | null`;
  runGateCheck scoreThreshold 80 -> 3.0; gate treats null as 0.
- test_v05_phase_b.py: catalog fixtures converted 0-100 -> 0-5
  (85->4.5, 50->2.0, 90->4.8, 75/80->4.0); custom-threshold test uses
  1.0; new test for the null-score coercion.

Companion to aria-registry#8.

makeCatalogState used 0-100 scores (85/50/90); on the 0-5 scale a 50
would pass the gate, so the low-score fixture no longer exercised the
clawmark_gate failure. 85->4.5, 50->2.0, 90->4.8.

Companion to aria-registry#8.

The CI `python` job runs pytest in `python/`, which holds a second copy
of the SDK (python/carapace, python/tests) byte-identical to the repo
root. The previous commit only updated the root copy, so the CI-tested
package still gated at 80 against the 0-5 test fixtures — the python
job failed.

Apply the same 0-5 threshold change (score_threshold 80 -> 3.0, float
clawmark_score, null-score coercion, fixtures converted to 0-5) to
python/carapace/catalog.py and python/tests/test_v05_phase_b.py so the
two copies stay in sync.
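
One way to catch this kind of drift between the two SDK copies is a small comparison script. This is a sketch only: the function name `differing_files` is hypothetical, and nothing in the PR indicates that such a check exists in the repository or in CI.

```python
import filecmp
from pathlib import Path
from typing import List, Sequence


def differing_files(
    root_copy: Path, mirror_copy: Path, names: Sequence[str]
) -> List[str]:
    """Return the names whose contents differ or that cannot be compared.

    shallow=False forces a byte-by-byte comparison instead of relying
    on os.stat() signatures alone.
    """
    _match, mismatch, errors = filecmp.cmpfiles(
        root_copy, mirror_copy, list(names), shallow=False
    )
    return sorted(mismatch + errors)


if __name__ == "__main__":
    # Paths follow the layout described in the commit message.
    drift = differing_files(
        Path("carapace"), Path("python/carapace"), ["catalog.py"]
    )
    print(drift or "copies are in sync")
```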
@ryan10sa-star ryan10sa-star marked this pull request as ready for review May 22, 2026 02:32
Copilot AI review requested due to automatic review settings May 22, 2026 02:32
@ryan10sa-star ryan10sa-star merged commit 32b8a0e into main May 22, 2026
1 of 2 checks passed

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@ryan10sa-star ryan10sa-star deleted the claude/laughing-thompson-QNxrw branch June 14, 2026 21:46