From 3e5fa91d9966574010d8f8f1301f3f1ca2384230 Mon Sep 17 00:00:00 2001
From: jlin2
Date: Thu, 28 May 2026 09:56:31 -0700
Subject: [PATCH 1/2] Add deprecate action and staleness check to BitLesson workflow

BitLesson's knowledge base was effectively append-only: the round stop gate
validates only the Delta format, with no way to retire a superseded lesson and
no check that an existing lesson's cited paths still resolve. After a refactor,
entries silently rot, and a stale lesson handed to an implementer is worse than
none.

deprecate:
- bitlesson-validate-delta.sh accepts `Action: deprecate`, routed through the
  existing concrete-ID + Notes checks so it references a real entry.
- Deprecation is a tombstone, not a delete: mark the entry `Status: deprecated`
  and keep it for history; bitlesson-select.sh never selects a deprecated entry.
- Contract text updated across docs/bitlesson.md, commands/start-rlcr-loop.md,
  templates/bitlesson.md, setup-rlcr-loop.sh, loop-codex-stop-hook.sh, and the
  bitlesson-delta-{invalid,missing} block templates.

staleness:
- New scripts/bitlesson-staleness.sh reports entries whose cited file
  references no longer resolve under the project root. Advisory by default,
  --strict exits non-zero, deprecated entries skipped.
- Detection is extension-anchored for precision: it ignores prose slashes and
  ratios (GO/NO-GO, 248/275), fenced template blocks, and ellipses.

tests:
- +4 validator cases (deprecate) and a new staleness suite (8 cases), both
  registered in run-all-tests.sh. Full suite: 2265 pass, 0 fail.

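A minimal sketch of the extension-anchored check described above, not the
scanner itself. The helper name `classify_token` and the shortened extension
list are assumptions made for illustration; the real script does this inside
awk with a longer list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one whitespace-delimited token.
# Assumption: a shortened extension list, for illustration only.
EXT='(py|sh|md|json|yaml|yml|toml|txt)'
JOIN_RE="\\.${EXT}/"          # known extension followed by "/" (prose join)
END_RE="\\.${EXT}\$"          # known extension at the very end
CHARSET_RE='^[A-Za-z0-9._/-]+$'

classify_token() {
  local t="$1"
  [[ "$t" == *...* ]] && { echo skip; return; }        # ellipsis placeholder
  [[ "$t" =~ $CHARSET_RE ]] || { echo skip; return; }  # path characters only
  [[ "$t" =~ $JOIN_RE ]] && { echo skip; return; }     # e.g. score.py/labeler.py
  [[ "$t" =~ $END_RE ]] || { echo skip; return; }      # must end in a known extension
  if [[ "$t" == */* ]]; then echo path; else echo bare; fi
}

for tok in GO/NO-GO 248/275 scripts/audio_cot/manifest_io.py run_infer.py; do
  printf '%s -> %s\n' "$tok" "$(classify_token "$tok")"
done
```

Anchoring on the extension is what lets prose such as ratios and slash-joined
acronyms fall through without a dedicated rule for each.
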
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 commands/start-rlcr-loop.md                   |   2 +-
 docs/bitlesson.md                             |  37 ++-
 hooks/loop-codex-stop-hook.sh                 |   4 +-
 .../block/bitlesson-delta-invalid.md          |   1 +
 .../block/bitlesson-delta-missing.md          |   2 +-
 scripts/bitlesson-select.sh                   |   3 +-
 scripts/bitlesson-staleness.sh                | 254 ++++++++++++++++++
 scripts/bitlesson-validate-delta.sh           |   7 +-
 scripts/setup-rlcr-loop.sh                    |   6 +-
 templates/bitlesson.md                        |  11 +
 tests/run-all-tests.sh                        |   4 +-
 tests/test-bitlesson-staleness.sh             | 122 +++++++++
 tests/test-bitlesson-validate-delta.sh        |  27 ++
 13 files changed, 466 insertions(+), 14 deletions(-)
 create mode 100755 scripts/bitlesson-staleness.sh
 create mode 100755 tests/test-bitlesson-staleness.sh

diff --git a/commands/start-rlcr-loop.md b/commands/start-rlcr-loop.md
index 0c74c07b..cc89ee68 100644
--- a/commands/start-rlcr-loop.md
+++ b/commands/start-rlcr-loop.md
@@ -174,7 +174,7 @@ Per round requirements:
 1. Read `.humanize/bitlesson.md` before execution
 2. Run `bitlesson-selector` for each task/sub-task
 3. Apply selected lesson IDs (or `NONE`) during implementation
-4. Include `## BitLesson Delta` in the round summary with `Action: none|add|update`
+4. Include `## BitLesson Delta` in the round summary with `Action: none|add|update|deprecate`
 
 If a problem is solved only after multiple rounds, add or update a precise lesson entry in `.humanize/bitlesson.md` (specific problem + specific solution). By default, empty `.humanize/bitlesson.md` does not block `Action: none`; use `--require-bitlesson-entry-for-none` to enforce strict blocking.
diff --git a/docs/bitlesson.md b/docs/bitlesson.md
index 01bb32e5..d60884b3 100644
--- a/docs/bitlesson.md
+++ b/docs/bitlesson.md
@@ -39,7 +39,7 @@ Required summary shape:
 
 ```markdown
 ## BitLesson Delta
-- Action: none|add|update
+- Action: none|add|update|deprecate
 - Lesson ID(s):
 - Notes:
 ```
@@ -47,5 +47,38 @@ Required summary shape:
 Validation rules are strict:
 
 - `Action: none` must use `Lesson ID(s): NONE` or leave the field empty
-- `Action: add` and `Action: update` must reference concrete `BL-YYYYMMDD-short-name` IDs that exist in `.humanize/bitlesson.md`
+- `Action: add`, `Action: update`, and `Action: deprecate` must reference concrete `BL-YYYYMMDD-short-name` IDs that exist in `.humanize/bitlesson.md`
 - `--require-bitlesson-entry-for-none` can be used to block empty knowledge bases from repeatedly reporting `none`
+
+## Deprecating lessons
+
+The knowledge base would otherwise only grow: when a subsystem is removed or a lesson is
+superseded, the entry becomes misleading but there is no contracted way to retire it.
+`Action: deprecate` fills that gap. Deprecation is a **tombstone, not a delete**:
+
+- Keep the entry (so its ID still resolves and the history is preserved) and add a
+  `Status: deprecated — ` line to it.
+- The selector (`scripts/bitlesson-select.sh`) treats any entry with a `Status: deprecated`
+  line as retired and never selects it for a sub-task.
+
+## Staleness check
+
+Lesson *content* (the bug→fix knowledge) usually stays valid across refactors, but the
+*references* it cites (`Scope:` paths, `path/to/file.py`, `dir:line`) drift when code moves.
+The stop gate validates Delta *format* only — it does not re-check that existing lessons still
+point at real files — so after a reorg a lesson can silently rot and a rotted lesson handed to
+an implementer is worse than none.
+ +`scripts/bitlesson-staleness.sh` scans the knowledge base and reports entries whose cited +paths no longer resolve under the project root: + +```bash +scripts/bitlesson-staleness.sh --bitlesson-file .humanize/bitlesson.md +# add --strict to exit non-zero when any entry has unresolved references +``` + +It is **advisory by default** (exit 0). Deprecated entries are skipped. Path detection is +heuristic: it checks slash-bearing paths against the project root and bare filenames +(e.g. `run_infer.py`) anywhere under the root, and ignores glob/brace tokens and illustrative +snippets it cannot resolve. Use it at loop start (or periodically) to find entries that need an +`update` (fix the references) or a `deprecate`. diff --git a/hooks/loop-codex-stop-hook.sh b/hooks/loop-codex-stop-hook.sh index bd35a5dd..ac55a45c 100755 --- a/hooks/loop-codex-stop-hook.sh +++ b/hooks/loop-codex-stop-hook.sh @@ -1536,7 +1536,7 @@ continue_review_loop_with_issues() { - [List unresolved items, if any] ## BitLesson Delta -- Action: none|add|update +- Action: none|add|update|deprecate - Lesson ID(s): NONE - Notes: [what changed and why] EOF @@ -2026,7 +2026,7 @@ if [[ ! 
-f "$NEXT_SUMMARY_FILE" ]]; then - [List unresolved items, if any] ## BitLesson Delta -- Action: none|add|update +- Action: none|add|update|deprecate - Lesson ID(s): NONE - Notes: [what changed and why] EOF diff --git a/prompt-template/block/bitlesson-delta-invalid.md b/prompt-template/block/bitlesson-delta-invalid.md index 8fb838da..91932b8b 100644 --- a/prompt-template/block/bitlesson-delta-invalid.md +++ b/prompt-template/block/bitlesson-delta-invalid.md @@ -5,3 +5,4 @@ Your `## BitLesson Delta` section exists, but must include one action: - `none` - `add` - `update` +- `deprecate` (retire a superseded lesson: mark its entry `Status: deprecated` and keep it for history) diff --git a/prompt-template/block/bitlesson-delta-missing.md b/prompt-template/block/bitlesson-delta-missing.md index 228024e8..3c0f9cba 100644 --- a/prompt-template/block/bitlesson-delta-missing.md +++ b/prompt-template/block/bitlesson-delta-missing.md @@ -6,7 +6,7 @@ Required minimal format: ```markdown ## BitLesson Delta -- Action: none|add|update +- Action: none|add|update|deprecate - Lesson ID(s): - Notes: ``` diff --git a/scripts/bitlesson-select.sh b/scripts/bitlesson-select.sh index 07f90a30..b8d401a4 100755 --- a/scripts/bitlesson-select.sh +++ b/scripts/bitlesson-select.sh @@ -167,7 +167,8 @@ $BITLESSON_CONTENT 1. Match only lessons that are directly relevant to the sub-task scope and failure mode. 2. Prefer precision over recall: do not include weakly related lessons. 3. If nothing is relevant, return \`NONE\`. -4. Use only the information in this prompt. Do not use tools, shell commands, browser access, MCP servers, or repository inspection. +4. Never select a lesson whose entry contains a \`Status: deprecated\` line; treat it as retired. +5. Use only the information in this prompt. Do not use tools, shell commands, browser access, MCP servers, or repository inspection. 
## Output Format (Stable) diff --git a/scripts/bitlesson-staleness.sh b/scripts/bitlesson-staleness.sh new file mode 100755 index 00000000..231c7ea4 --- /dev/null +++ b/scripts/bitlesson-staleness.sh @@ -0,0 +1,254 @@ +#!/usr/bin/env bash +# +# bitlesson-staleness.sh +# +# Advisory scan of a BitLesson knowledge base for entries whose cited file +# references no longer resolve under the project root. Lesson *content* usually +# stays valid across refactors, but the *references* it cites drift when code +# moves. The stop gate validates only the Delta *format*, so a lesson can +# silently rot; this script surfaces it. +# +# Precision over recall (a noisy advisory gets ignored). Detection is anchored +# on a known file extension, because prose almost never produces `word.ext` +# tokens whereas "GO/NO-GO", "validators/gates", or "248/275" are common: +# - `dir/sub/file.py` -> checked verbatim against the project root +# - bare `file.py` -> checked anywhere under the root +# - tokens inside ``` fenced blocks ```, glob/brace tokens, ellipsis +# placeholders, and extensionless prose are ignored +# - entries marked `Status: deprecated` are skipped +# +# Trade-off: extensionless directory references (e.g. a Scope of `src/foo`) are +# not verified; reference a concrete file when you want it checked. +# +# Exit codes: 0 (advisory, default). With --strict: 2 if any entry has +# unresolved references. 1 on usage/IO error. + +set -euo pipefail + +usage() { + cat <<'EOF' >&2 +Usage: + bitlesson-staleness.sh --bitlesson-file [--project-root ] [--strict] + +Options: + --bitlesson-file BitLesson knowledge base (e.g. 
.humanize/bitlesson.md) + --project-root Root to resolve references against (default: derived + from the bitlesson file's git toplevel / .humanize parent) + --strict Exit 2 when any entry has unresolved references +EOF +} + +BITLESSON_FILE="" +PROJECT_ROOT="" +STRICT="false" + +while [[ $# -gt 0 ]]; do + case "$1" in + --bitlesson-file) + BITLESSON_FILE="${2:-}" + shift 2 + ;; + --project-root) + PROJECT_ROOT="${2:-}" + shift 2 + ;; + --strict) + STRICT="true" + shift + ;; + -h|--help) + usage + exit 0 + ;; + *) + echo "Error: Unknown argument: $1" >&2 + usage + exit 1 + ;; + esac +done + +if [[ -z "$BITLESSON_FILE" ]]; then + echo "Error: --bitlesson-file is required" >&2 + usage + exit 1 +fi + +if [[ ! -f "$BITLESSON_FILE" ]]; then + echo "Error: BitLesson file not found: $BITLESSON_FILE" >&2 + exit 1 +fi + +# Derive project root the same way bitlesson-select.sh does. +if [[ -z "$PROJECT_ROOT" ]]; then + BITLESSON_DIR="$(cd "$(dirname "$BITLESSON_FILE")" && pwd -P)" + if git -C "$BITLESSON_DIR" rev-parse --show-toplevel >/dev/null 2>&1; then + PROJECT_ROOT="$(git -C "$BITLESSON_DIR" rev-parse --show-toplevel)" + elif [[ "$(basename "$BITLESSON_DIR")" == ".humanize" ]]; then + PROJECT_ROOT="$(cd "$BITLESSON_DIR/.." && pwd -P)" + else + PROJECT_ROOT="$BITLESSON_DIR" + fi +fi + +if [[ ! -d "$PROJECT_ROOT" ]]; then + echo "Error: Project root is not a directory: $PROJECT_ROOT" >&2 + exit 1 +fi + +# Extract candidate file references per lesson block. Emits tab-separated records: +# META +# CAND +# where S = slash-bearing path (checked verbatim), F = bare filename (found anywhere). +extract_candidates() { + awk ' + BEGIN { + EXT = "(py|sh|md|json|js|ts|tsx|jsx|yaml|yml|toml|txt|sql|cfg|ini|c|cc|cpp|h|hpp|go|rs|rb|java)" + in_fence = 0 + } + + function flush( i) { + if (label == "") return + key = (id != "" ? 
id : label) + printf "META\t%s\t%d\n", key, dep + if (!dep) { + for (i = 1; i <= nc; i++) { + printf "CAND\t%s\t%s\t%s\n", ctype[i], key, ctok[i] + } + } + delete ctok; delete ctype; delete seen + nc = 0; dep = 0; id = ""; label = "" + } + + /^```/ { in_fence = !in_fence; next } + /^~~~/ { in_fence = !in_fence; next } + in_fence { next } + + /^##[[:space:]]*Lesson:/ { + flush() + label = $0 + sub(/^##[[:space:]]*Lesson:[[:space:]]*/, "", label) + next + } + + label != "" { + if ($0 ~ /^Lesson ID:/) { + id = $0 + sub(/^Lesson ID:[[:space:]]*/, "", id) + gsub(/^[[:space:]]+|[[:space:]]+$/, "", id) + } + if (tolower($0) ~ /^status:[[:space:]]*deprecated/) { + dep = 1 + } + + line = $0 + # split on markdown/punctuation delimiters (backtick, parens, comma, + # double-quote, angle brackets, semicolon, apostrophe=\047) + gsub(/[`(),"<>;\047]/, " ", line) + n = split(line, toks, /[[:space:]]+/) + for (j = 1; j <= n; j++) { + t = toks[j] + sub(/:[0-9]+(-[0-9]+)?$/, "", t) # strip trailing :line / :line-range + sub(/[.,:;]+$/, "", t) # strip trailing punctuation + sub(/^[.,:;]+/, "", t) # strip leading punctuation + if (t == "") continue + if (index(t, "...") > 0) continue # ellipsis / abbreviated example path + if (index(t, "//") > 0) continue # URL-ish + if (t ~ /^-/) continue # flag-like + if (index(t,"*")||index(t,"?")||index(t,"[")||index(t,"]")|| \ + index(t,"{")||index(t,"}")||index(t,"=")||index(t,"|")|| \ + index(t,"$")||index(t,"@")||index(t,"!")) continue + # require a known file extension at the END only; a known + # extension followed by "/" means prose joined files (score.py/labeler.py) + if (t ~ ("\\." EXT "/")) continue + if (t !~ ("\\." EXT "$")) continue + if (t !~ /^[A-Za-z0-9._\/-]+$/) continue + + ttype = (index(t, "/") > 0) ? 
"S" : "F" + if ((ttype "\t" t) in seen) continue + seen[ttype "\t" t] = 1 + nc++ + ctype[nc] = ttype + ctok[nc] = t + } + } + + END { flush() } + ' "$BITLESSON_FILE" +} + +declare -A FILE_CACHE + +file_exists_somewhere() { + local name="$1" hit + if [[ -n "${FILE_CACHE[$name]+x}" ]]; then + [[ "${FILE_CACHE[$name]}" == "1" ]] + return $? + fi + hit=$(find "$PROJECT_ROOT" -path '*/.git' -prune -o -type f -name "$name" -print 2>/dev/null | head -n1 || true) + if [[ -n "$hit" ]]; then + FILE_CACHE[$name]=1 + return 0 + fi + FILE_CACHE[$name]=0 + return 1 +} + +TOTAL=0 +DEPRECATED=0 +STALE_LESSONS=0 +CURRENT_KEY="" +CURRENT_UNRESOLVED="" + +emit_lesson_report() { + [[ -n "$CURRENT_KEY" ]] || return 0 + if [[ -n "$CURRENT_UNRESOLVED" ]]; then + STALE_LESSONS=$((STALE_LESSONS + 1)) + echo "STALE: $CURRENT_KEY" + # shellcheck disable=SC2001 + echo "$CURRENT_UNRESOLVED" | sed 's/^/ - /' + fi + CURRENT_KEY="" + CURRENT_UNRESOLVED="" +} + +while IFS=$'\t' read -r rec a b c; do + case "$rec" in + META) + # a=key, b=deprecated + emit_lesson_report + TOTAL=$((TOTAL + 1)) + [[ "$b" == "1" ]] && DEPRECATED=$((DEPRECATED + 1)) + CURRENT_KEY="$a" + ;; + CAND) + # a=type, b=key, c=token + resolved=0 + if [[ "$a" == "S" ]]; then + if [[ -e "$PROJECT_ROOT/$c" || -e "$c" ]]; then + resolved=1 + fi + else + if file_exists_somewhere "$c"; then + resolved=1 + fi + fi + if [[ "$resolved" -eq 0 ]]; then + if [[ -n "$CURRENT_UNRESOLVED" ]]; then + CURRENT_UNRESOLVED="$CURRENT_UNRESOLVED"$'\n'"$c" + else + CURRENT_UNRESOLVED="$c" + fi + fi + ;; + esac +done < <(extract_candidates) +emit_lesson_report + +echo "" +echo "BitLesson staleness: scanned $TOTAL entr$([[ "$TOTAL" -eq 1 ]] && echo y || echo ies) ($DEPRECATED deprecated, skipped); $STALE_LESSONS with unresolved references." 
+ +if [[ "$STRICT" == "true" && "$STALE_LESSONS" -gt 0 ]]; then + exit 2 +fi +exit 0 diff --git a/scripts/bitlesson-validate-delta.sh b/scripts/bitlesson-validate-delta.sh index 648303b0..b8156d23 100755 --- a/scripts/bitlesson-validate-delta.sh +++ b/scripts/bitlesson-validate-delta.sh @@ -187,7 +187,7 @@ Your summary is missing the required `## BitLesson Delta` section. Required minimal format: ```markdown ## BitLesson Delta -- Action: none|add|update +- Action: none|add|update|deprecate - Lesson ID(s): - Notes: ``` @@ -203,7 +203,7 @@ BITLESSON_ACTION_CANDIDATES=$(echo "$BITLESSON_DELTA_BLOCK" | sed -nE 's/^[[:spa BITLESSON_ACTION_COUNT=$(echo "$BITLESSON_ACTION_CANDIDATES" | awk 'NF{c++} END{print c+0}') BITLESSON_ACTION=$(echo "$BITLESSON_ACTION_CANDIDATES" | awk 'NF{print; exit}') -if [[ "$BITLESSON_ACTION_COUNT" -ne 1 ]] || [[ "$BITLESSON_ACTION" != "none" && "$BITLESSON_ACTION" != "add" && "$BITLESSON_ACTION" != "update" ]]; then +if [[ "$BITLESSON_ACTION_COUNT" -ne 1 ]] || [[ "$BITLESSON_ACTION" != "none" && "$BITLESSON_ACTION" != "add" && "$BITLESSON_ACTION" != "update" && "$BITLESSON_ACTION" != "deprecate" ]]; then FALLBACK=$(cat <<'EOF' # Invalid BitLesson Delta Action @@ -211,10 +211,11 @@ Your `## BitLesson Delta` section exists, but it must include one action: - `none` - `add` - `update` +- `deprecate` EOF ) REASON=$(load_and_render_safe "$TEMPLATE_DIR" "block/bitlesson-delta-invalid.md" "$FALLBACK") - block_exit "$REASON" "Loop: BitLesson Delta must include action none/add/update (round $CURRENT_ROUND)" + block_exit "$REASON" "Loop: BitLesson Delta must include action none/add/update/deprecate (round $CURRENT_ROUND)" fi BITLESSON_IDS_RAW=$(echo "$BITLESSON_DELTA_BLOCK" | sed -nE 's/^[[:space:]-]*Lesson ID\(s\):[[:space:]]*(.*)$/\1/p' | head -n1) diff --git a/scripts/setup-rlcr-loop.sh b/scripts/setup-rlcr-loop.sh index eb775b14..d9d3fa44 100755 --- a/scripts/setup-rlcr-loop.sh +++ b/scripts/setup-rlcr-loop.sh @@ -1365,9 +1365,9 @@ Before executing 
each task or sub-task, you MUST: 3. Follow the selected lesson IDs (or \`NONE\`) during implementation Include a \`## BitLesson Delta\` section in your summary with: -- Action: none|add|update +- Action: none|add|update|deprecate - Lesson ID(s): NONE or comma-separated IDs -- Notes: what changed and why (required if action is add or update) +- Notes: what changed and why (required if action is add, update, or deprecate) Reference: @$BITLESSON_FILE EOF @@ -1537,7 +1537,7 @@ echo " - What was implemented" echo " - Files created/modified" echo " - Tests added/passed" echo " - Any remaining items" -echo " - ## BitLesson Delta section (Action: none|add|update)" +echo " - ## BitLesson Delta section (Action: none|add|update|deprecate)" echo "" echo "Codex will review this summary to determine if work is complete." echo "===========================================" diff --git a/templates/bitlesson.md b/templates/bitlesson.md index 3723a46c..5238bc6a 100644 --- a/templates/bitlesson.md +++ b/templates/bitlesson.md @@ -18,6 +18,17 @@ Validation Evidence: Source Rounds: ``` +## Deprecation + +To retire a superseded or obsolete lesson, do not delete it. Keep the entry (its ID must +still resolve) and append a status line so the selector skips it and the history is preserved: + +```markdown +Status: deprecated — +``` + +Report the retirement in the round summary with `Action: deprecate` and the Lesson ID(s). 
+ ## Entries diff --git a/tests/run-all-tests.sh b/tests/run-all-tests.sh index bc38a7e5..1fdbbe00 100755 --- a/tests/run-all-tests.sh +++ b/tests/run-all-tests.sh @@ -102,8 +102,10 @@ TEST_SUITES=( "test-explore-command-structure.sh" # Ask Codex tests "test-ask-codex.sh" - # Bitlesson routing tests + # Bitlesson tests "test-bitlesson-select-routing.sh" + "test-bitlesson-validate-delta.sh" + "test-bitlesson-staleness.sh" # Provider routing tests "test-model-router.sh" # Skill monitor tests diff --git a/tests/test-bitlesson-staleness.sh b/tests/test-bitlesson-staleness.sh new file mode 100755 index 00000000..663cedf0 --- /dev/null +++ b/tests/test-bitlesson-staleness.sh @@ -0,0 +1,122 @@ +#!/usr/bin/env bash +# +# Tests for bitlesson-staleness.sh reference-resolution scan +# + +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]:-$0}")" && pwd)" +PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)" +source "$SCRIPT_DIR/test-helpers.sh" + +SCANNER="$PROJECT_ROOT/scripts/bitlesson-staleness.sh" + +echo "========================================" +echo "BitLesson Staleness Scanner Tests" +echo "========================================" +echo "" + +setup_test_dir +PROJ="$TEST_DIR/proj" +mkdir -p "$PROJ/scripts/audio_cot" "$PROJ/scripts/emotion" +echo "# manifest_io" > "$PROJ/scripts/audio_cot/manifest_io.py" +echo "# run_infer" > "$PROJ/scripts/emotion/run_infer.py" + +KB="$PROJ/.humanize/bitlesson.md" +mkdir -p "$(dirname "$KB")" +cat > "$KB" <<'EOF' +# BitLesson Knowledge Base + +## Lesson: good +Lesson ID: BL-20260101-good +Scope: scripts/audio_cot data-generating scripts (manifest_io.py) +Solution: Derive roots in `scripts/audio_cot/manifest_io.py` from __file__. +Source Rounds: round-1. + +## Lesson: stale +Lesson ID: BL-20260102-stale +Scope: eval_audio_cot/scripts data-generating scripts +Solution: Canonical path is `outputs/analysis/score.json`. +Source Rounds: round-2. 
+ +## Lesson: dep +Lesson ID: BL-20260103-dep +Scope: scripts/gone things +Status: deprecated — subsystem removed +Solution: Refers to `scripts/gone/removed.py` which no longer exists. +Source Rounds: round-3. + +## Lesson: bare +Lesson ID: BL-20260104-bare +Scope: the inference runner +Solution: `run_infer.py` emits the contract rows. +Source Rounds: round-4. + +## Lesson: prose +Lesson ID: BL-20260105-prose +Scope: cross-cutting reporting conventions +Solution: Report GO/NO-GO verdicts and ratios like 248/275 and 2/7; cover SSU/SSR/SDP. +Source Rounds: round-5. +EOF + +OUT=$(bash "$SCANNER" --bitlesson-file "$KB" --project-root "$PROJ") + +if echo "$OUT" | grep -q "STALE: BL-20260102-stale"; then + pass "flags a lesson whose cited paths do not resolve" +else + fail "flags a lesson whose cited paths do not resolve" "STALE: BL-20260102-stale" "$OUT" +fi + +if echo "$OUT" | grep -q "STALE: BL-20260101-good"; then + fail "does not flag a lesson whose refs all resolve" "no STALE for good" "$OUT" +else + pass "does not flag a lesson whose refs all resolve" +fi + +if echo "$OUT" | grep -q "STALE: BL-20260104-bare"; then + fail "resolves bare filenames found anywhere under root" "no STALE for bare" "$OUT" +else + pass "resolves bare filenames found anywhere under root" +fi + +if echo "$OUT" | grep -q "STALE: BL-20260103-dep"; then + fail "skips deprecated entries" "no STALE for dep" "$OUT" +else + pass "skips deprecated entries" +fi + +if echo "$OUT" | grep -q "STALE: BL-20260105-prose"; then + fail "ignores prose slashes and ratios (GO/NO-GO, 248/275, SSU/SSR/SDP)" "no STALE for prose" "$OUT" +else + pass "ignores prose slashes and ratios (GO/NO-GO, 248/275, SSU/SSR/SDP)" +fi + +if echo "$OUT" | grep -q "1 deprecated"; then + pass "summary reports the deprecated/skipped count" +else + fail "summary reports the deprecated/skipped count" "1 deprecated" "$OUT" +fi + +# default is advisory: exit 0 even with a stale entry +set +e +bash "$SCANNER" --bitlesson-file "$KB" 
--project-root "$PROJ" >/dev/null
+RC=$?
+set -e
+if [[ "$RC" -eq 0 ]]; then
+    pass "advisory by default (exit 0) even when stale entries exist"
+else
+    fail "advisory by default (exit 0) even when stale entries exist" "0" "$RC"
+fi
+
+# --strict exits 2 when a stale entry exists
+set +e
+bash "$SCANNER" --bitlesson-file "$KB" --project-root "$PROJ" --strict >/dev/null
+RC=$?
+set -e
+if [[ "$RC" -eq 2 ]]; then
+    pass "--strict exits 2 when an entry has unresolved references"
+else
+    fail "--strict exits 2 when an entry has unresolved references" "2" "$RC"
+fi
+
+print_test_summary "BitLesson Staleness Scanner Test Summary"
diff --git a/tests/test-bitlesson-validate-delta.sh b/tests/test-bitlesson-validate-delta.sh
index c63b2128..001f949f 100755
--- a/tests/test-bitlesson-validate-delta.sh
+++ b/tests/test-bitlesson-validate-delta.sh
@@ -151,4 +151,31 @@ make_summary_file "$SUMMARY_FILE" "update" "Normal text flow still exposes the B
 RESULT=$(run_validator "$SUMMARY_FILE" "$BITLESSON_FILE")
 assert_passes "BitLesson Delta in normal text still passes validation" "$RESULT"
 
+SUMMARY_FILE="$TEST_DIR/deprecate-valid.md"
+make_summary_file "$SUMMARY_FILE" "deprecate" "Subsystem removed; lesson superseded and tombstoned."
+RESULT=$(run_validator "$SUMMARY_FILE" "$BITLESSON_FILE")
+assert_passes "deprecate action passes with a concrete ID and Notes" "$RESULT"
+
+SUMMARY_FILE="$TEST_DIR/deprecate-none-ids.md"
+cat > "$SUMMARY_FILE" <
Date: Thu, 28 May 2026 10:59:20 -0700
Subject: [PATCH 2/2] Simplify bitlesson-staleness.sh

Behavior unchanged (same flags on the real KB, all 8 tests green); ~254 -> 132
lines.

- Drop guards now subsumed by extension-anchoring + the path charset
  (glob/special-char chain, URL `//`, ratio, leading-dash checks).
- Replace the per-token `find` + cache with a single repo scan: collect file
  basenames once, resolve bare filenames via membership test.
- Replace the report state machine with an ordered map printed at the end.

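The single-scan membership test in the second bullet can be sketched as
follows. This is an illustration under assumptions, not the script itself: the
temporary tree, the file names in it, and the helper name `has_basename` are
all made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: resolve bare filenames with one directory scan instead of one
# `find` per token. The tree below is a throwaway stand-in for a project root.
set -euo pipefail

root="$(mktemp -d)"
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/scripts/emotion" "$root/.git"
: > "$root/scripts/emotion/run_infer.py"
: > "$root/.git/ignored.py"          # must not count: .git is pruned

# One pass: every file basename outside .git, de-duplicated.
all_basenames="$(find "$root" -path '*/.git' -prune -o -type f -print \
  | sed 's#.*/##' | sort -u)"

# Whole-line, fixed-string match; `--` guards names that start with a dash.
has_basename() { grep -qxF -- "$1" <<<"$all_basenames"; }

for name in run_infer.py ignored.py missing.py; do
  if has_basename "$name"; then
    echo "$name: resolves"
  else
    echo "$name: unresolved"
  fi
done
```

The `-x` and `-F` flags matter here: without them a name like `infer.py` would
match inside `run_infer.py`, and the dot would act as a regex wildcard.
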
Co-Authored-By: Claude Opus 4.7 (1M context) --- scripts/bitlesson-staleness.sh | 264 +++++++++------------------------ 1 file changed, 71 insertions(+), 193 deletions(-) diff --git a/scripts/bitlesson-staleness.sh b/scripts/bitlesson-staleness.sh index 231c7ea4..98dbdd6f 100755 --- a/scripts/bitlesson-staleness.sh +++ b/scripts/bitlesson-staleness.sh @@ -1,93 +1,50 @@ #!/usr/bin/env bash # -# bitlesson-staleness.sh +# bitlesson-staleness.sh — advisory scan of a BitLesson knowledge base for +# entries whose cited file references no longer resolve under the project root. +# Lesson content usually survives a refactor; the paths it cites do not, and the +# stop gate only checks Delta format, so entries silently rot. # -# Advisory scan of a BitLesson knowledge base for entries whose cited file -# references no longer resolve under the project root. Lesson *content* usually -# stays valid across refactors, but the *references* it cites drift when code -# moves. The stop gate validates only the Delta *format*, so a lesson can -# silently rot; this script surfaces it. +# Detection is anchored on a known file extension (prose rarely produces +# `word.ext` tokens, unlike "GO/NO-GO" or "248/275"): `dir/file.py` is checked +# verbatim against the root, bare `file.py` anywhere under it. Fenced blocks, +# ellipses, and entries marked `Status: deprecated` are skipped. Extensionless +# directory references are not verified — cite a concrete file to have it checked. # -# Precision over recall (a noisy advisory gets ignored). 
Detection is anchored -# on a known file extension, because prose almost never produces `word.ext` -# tokens whereas "GO/NO-GO", "validators/gates", or "248/275" are common: -# - `dir/sub/file.py` -> checked verbatim against the project root -# - bare `file.py` -> checked anywhere under the root -# - tokens inside ``` fenced blocks ```, glob/brace tokens, ellipsis -# placeholders, and extensionless prose are ignored -# - entries marked `Status: deprecated` are skipped -# -# Trade-off: extensionless directory references (e.g. a Scope of `src/foo`) are -# not verified; reference a concrete file when you want it checked. -# -# Exit codes: 0 (advisory, default). With --strict: 2 if any entry has -# unresolved references. 1 on usage/IO error. +# Exit: 0 (advisory). With --strict: 2 if any entry has unresolved references. set -euo pipefail -usage() { - cat <<'EOF' >&2 -Usage: - bitlesson-staleness.sh --bitlesson-file [--project-root ] [--strict] - -Options: - --bitlesson-file BitLesson knowledge base (e.g. 
.humanize/bitlesson.md) - --project-root Root to resolve references against (default: derived - from the bitlesson file's git toplevel / .humanize parent) - --strict Exit 2 when any entry has unresolved references -EOF -} - BITLESSON_FILE="" PROJECT_ROOT="" STRICT="false" while [[ $# -gt 0 ]]; do case "$1" in - --bitlesson-file) - BITLESSON_FILE="${2:-}" - shift 2 - ;; - --project-root) - PROJECT_ROOT="${2:-}" - shift 2 - ;; - --strict) - STRICT="true" - shift - ;; + --bitlesson-file) BITLESSON_FILE="${2:-}"; shift 2 ;; + --project-root) PROJECT_ROOT="${2:-}"; shift 2 ;; + --strict) STRICT="true"; shift ;; -h|--help) - usage - exit 0 - ;; - *) - echo "Error: Unknown argument: $1" >&2 - usage - exit 1 - ;; + echo "Usage: bitlesson-staleness.sh --bitlesson-file [--project-root ] [--strict]" + exit 0 ;; + *) echo "Error: Unknown argument: $1" >&2; exit 1 ;; esac done -if [[ -z "$BITLESSON_FILE" ]]; then - echo "Error: --bitlesson-file is required" >&2 - usage +if [[ -z "$BITLESSON_FILE" || ! -f "$BITLESSON_FILE" ]]; then + echo "Error: --bitlesson-file must point to an existing file" >&2 exit 1 fi -if [[ ! -f "$BITLESSON_FILE" ]]; then - echo "Error: BitLesson file not found: $BITLESSON_FILE" >&2 - exit 1 -fi - -# Derive project root the same way bitlesson-select.sh does. +# Derive the project root the same way bitlesson-select.sh does. if [[ -z "$PROJECT_ROOT" ]]; then - BITLESSON_DIR="$(cd "$(dirname "$BITLESSON_FILE")" && pwd -P)" - if git -C "$BITLESSON_DIR" rev-parse --show-toplevel >/dev/null 2>&1; then - PROJECT_ROOT="$(git -C "$BITLESSON_DIR" rev-parse --show-toplevel)" - elif [[ "$(basename "$BITLESSON_DIR")" == ".humanize" ]]; then - PROJECT_ROOT="$(cd "$BITLESSON_DIR/.." && pwd -P)" + dir="$(cd "$(dirname "$BITLESSON_FILE")" && pwd -P)" + if git -C "$dir" rev-parse --show-toplevel >/dev/null 2>&1; then + PROJECT_ROOT="$(git -C "$dir" rev-parse --show-toplevel)" + elif [[ "$(basename "$dir")" == ".humanize" ]]; then + PROJECT_ROOT="$(cd "$dir/.." 
&& pwd -P)" else - PROJECT_ROOT="$BITLESSON_DIR" + PROJECT_ROOT="$dir" fi fi @@ -96,159 +53,80 @@ if [[ ! -d "$PROJECT_ROOT" ]]; then exit 1 fi -# Extract candidate file references per lesson block. Emits tab-separated records: +# All file basenames under the root (one pass), used to resolve bare filenames. +ALL_BASENAMES="$(find "$PROJECT_ROOT" -path '*/.git' -prune -o -type f -print 2>/dev/null | sed 's#.*/##' | sort -u || true)" + +# Per lesson block emit tab-separated records: # META -# CAND -# where S = slash-bearing path (checked verbatim), F = bare filename (found anywhere). +# CAND (S = has a slash, checked verbatim; F = bare name) extract_candidates() { awk ' - BEGIN { - EXT = "(py|sh|md|json|js|ts|tsx|jsx|yaml|yml|toml|txt|sql|cfg|ini|c|cc|cpp|h|hpp|go|rs|rb|java)" - in_fence = 0 - } - - function flush( i) { + BEGIN { EXT = "\\.(py|sh|md|json|js|ts|tsx|jsx|yaml|yml|toml|txt|sql|cfg|ini|c|cc|cpp|h|hpp|go|rs|rb|java)" } + function flush( i) { if (label == "") return key = (id != "" ? 
id : label) printf "META\t%s\t%d\n", key, dep - if (!dep) { - for (i = 1; i <= nc; i++) { - printf "CAND\t%s\t%s\t%s\n", ctype[i], key, ctok[i] - } - } - delete ctok; delete ctype; delete seen - nc = 0; dep = 0; id = ""; label = "" - } - - /^```/ { in_fence = !in_fence; next } - /^~~~/ { in_fence = !in_fence; next } - in_fence { next } - - /^##[[:space:]]*Lesson:/ { - flush() - label = $0 - sub(/^##[[:space:]]*Lesson:[[:space:]]*/, "", label) - next + if (!dep) for (i = 1; i <= nc; i++) printf "CAND\t%s\t%s\t%s\n", ctype[i], key, ctok[i] + delete ctok; delete ctype; delete seen; nc = 0; dep = 0; id = ""; label = "" } - - label != "" { - if ($0 ~ /^Lesson ID:/) { - id = $0 - sub(/^Lesson ID:[[:space:]]*/, "", id) - gsub(/^[[:space:]]+|[[:space:]]+$/, "", id) - } - if (tolower($0) ~ /^status:[[:space:]]*deprecated/) { - dep = 1 - } - + /^```/ || /^~~~/ { fence = !fence; next } + fence { next } + /^##[[:space:]]*Lesson:/ { flush(); label = $0; sub(/^##[[:space:]]*Lesson:[[:space:]]*/, "", label); next } + label == "" { next } + { + if ($0 ~ /^Lesson ID:/) { id = $0; sub(/^Lesson ID:[[:space:]]*/, "", id); gsub(/^[[:space:]]+|[[:space:]]+$/, "", id) } + if (tolower($0) ~ /^status:[[:space:]]*deprecated/) dep = 1 line = $0 - # split on markdown/punctuation delimiters (backtick, parens, comma, - # double-quote, angle brackets, semicolon, apostrophe=\047) - gsub(/[`(),"<>;\047]/, " ", line) + gsub(/[`(),"<>;\047]/, " ", line) # markdown/punct delimiters incl. 
backtick, apostrophe n = split(line, toks, /[[:space:]]+/) for (j = 1; j <= n; j++) { t = toks[j] - sub(/:[0-9]+(-[0-9]+)?$/, "", t) # strip trailing :line / :line-range - sub(/[.,:;]+$/, "", t) # strip trailing punctuation - sub(/^[.,:;]+/, "", t) # strip leading punctuation + sub(/:[0-9]+(-[0-9]+)?$/, "", t) # trailing :line / :line-range + gsub(/^[.,:;]+|[.,:;]+$/, "", t) # surrounding punctuation if (t == "") continue - if (index(t, "...") > 0) continue # ellipsis / abbreviated example path - if (index(t, "//") > 0) continue # URL-ish - if (t ~ /^-/) continue # flag-like - if (index(t,"*")||index(t,"?")||index(t,"[")||index(t,"]")|| \ - index(t,"{")||index(t,"}")||index(t,"=")||index(t,"|")|| \ - index(t,"$")||index(t,"@")||index(t,"!")) continue - # require a known file extension at the END only; a known - # extension followed by "/" means prose joined files (score.py/labeler.py) - if (t ~ ("\\." EXT "/")) continue - if (t !~ ("\\." EXT "$")) continue + if (index(t, "...") > 0) continue # ellipsis / abbreviated path + if (t ~ (EXT "/")) continue # prose join e.g. score.py/labeler.py + if (t !~ (EXT "$")) continue # must end in a known extension if (t !~ /^[A-Za-z0-9._\/-]+$/) continue - ttype = (index(t, "/") > 0) ? "S" : "F" - if ((ttype "\t" t) in seen) continue - seen[ttype "\t" t] = 1 - nc++ - ctype[nc] = ttype - ctok[nc] = t + if ((ttype t) in seen) continue + seen[ttype t] = 1 + ctype[++nc] = ttype; ctok[nc] = t } } - END { flush() } ' "$BITLESSON_FILE" } -declare -A FILE_CACHE - -file_exists_somewhere() { - local name="$1" hit - if [[ -n "${FILE_CACHE[$name]+x}" ]]; then - [[ "${FILE_CACHE[$name]}" == "1" ]] - return $? 
- fi - hit=$(find "$PROJECT_ROOT" -path '*/.git' -prune -o -type f -name "$name" -print 2>/dev/null | head -n1 || true) - if [[ -n "$hit" ]]; then - FILE_CACHE[$name]=1 - return 0 - fi - FILE_CACHE[$name]=0 - return 1 -} - +declare -A UNRESOLVED +ORDER=() TOTAL=0 DEPRECATED=0 -STALE_LESSONS=0 -CURRENT_KEY="" -CURRENT_UNRESOLVED="" - -emit_lesson_report() { - [[ -n "$CURRENT_KEY" ]] || return 0 - if [[ -n "$CURRENT_UNRESOLVED" ]]; then - STALE_LESSONS=$((STALE_LESSONS + 1)) - echo "STALE: $CURRENT_KEY" - # shellcheck disable=SC2001 - echo "$CURRENT_UNRESOLVED" | sed 's/^/ - /' - fi - CURRENT_KEY="" - CURRENT_UNRESOLVED="" -} while IFS=$'\t' read -r rec a b c; do - case "$rec" in - META) - # a=key, b=deprecated - emit_lesson_report - TOTAL=$((TOTAL + 1)) - [[ "$b" == "1" ]] && DEPRECATED=$((DEPRECATED + 1)) - CURRENT_KEY="$a" - ;; - CAND) - # a=type, b=key, c=token - resolved=0 - if [[ "$a" == "S" ]]; then - if [[ -e "$PROJECT_ROOT/$c" || -e "$c" ]]; then - resolved=1 - fi - else - if file_exists_somewhere "$c"; then - resolved=1 - fi - fi - if [[ "$resolved" -eq 0 ]]; then - if [[ -n "$CURRENT_UNRESOLVED" ]]; then - CURRENT_UNRESOLVED="$CURRENT_UNRESOLVED"$'\n'"$c" - else - CURRENT_UNRESOLVED="$c" - fi - fi - ;; - esac + if [[ "$rec" == "META" ]]; then + TOTAL=$((TOTAL + 1)) + [[ "$b" == "1" ]] && DEPRECATED=$((DEPRECATED + 1)) + else # CAND: a=type, b=key, c=token + if [[ "$a" == "S" ]]; then + [[ -e "$PROJECT_ROOT/$c" || -e "$c" ]] && continue + else + grep -qxF -- "$c" <<<"$ALL_BASENAMES" && continue + fi + [[ -n "${UNRESOLVED[$b]:-}" ]] || ORDER+=("$b") + UNRESOLVED[$b]+="${UNRESOLVED[$b]:+$'\n'}$c" + fi done < <(extract_candidates) -emit_lesson_report + +for key in ${ORDER[@]+"${ORDER[@]}"}; do + echo "STALE: $key" + printf '%s\n' "${UNRESOLVED[$key]}" | sed 's/^/ - /' +done echo "" -echo "BitLesson staleness: scanned $TOTAL entr$([[ "$TOTAL" -eq 1 ]] && echo y || echo ies) ($DEPRECATED deprecated, skipped); $STALE_LESSONS with unresolved references." 
+echo "BitLesson staleness: scanned $TOTAL entries ($DEPRECATED deprecated, skipped); ${#ORDER[@]} with unresolved references."
 
-if [[ "$STRICT" == "true" && "$STALE_LESSONS" -gt 0 ]]; then
+if [[ "$STRICT" == "true" && "${#ORDER[@]}" -gt 0 ]]; then
   exit 2
 fi
 exit 0
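
The ordered-map reporting that replaces the state machine can be sketched on
its own. This is a simplified illustration assuming bash 4 or newer; the
function names `record` and `report`, the second lesson ID, and the sample
paths are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the "ordered map" pattern: an associative array holds the values,
# a plain array remembers first-insertion order (bash maps are unordered).
# Assumption: bash 4+ for `declare -A`.
set -euo pipefail

declare -A unresolved=()   # lesson key -> newline-joined unresolved tokens
order=()                   # lesson keys in first-seen order
nl=$'\n'

record() {                 # record KEY TOKEN
  local key="$1" tok="$2"
  [[ -n "${unresolved[$key]:-}" ]] || order+=("$key")
  unresolved[$key]="${unresolved[$key]:+${unresolved[$key]}$nl}$tok"
}

report() {
  local key
  # The ${order[@]+...} guard keeps `set -u` quiet when the array is empty.
  for key in ${order[@]+"${order[@]}"}; do
    echo "STALE: $key"
    printf '%s\n' "${unresolved[$key]}" | sed 's/^/  - /'
  done
}

record BL-20260102-stale outputs/analysis/score.json
record BL-20990101-example old/module.py      # hypothetical lesson ID
record BL-20260102-stale eval/old_path.py     # hypothetical path
report
```

Because the count of stale lessons is just the length of the order array, no
separate counter has to be kept in sync with the report.
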