Cut Claude Code token usage by 40.9%. A 400-line Python hook that builds a static graph of your repo (symbols + imports + git-hot files) and injects ranked file:line candidates into the prompt before Claude sees it, so the first turn opens the right file instead of grepping for it.
No embeddings. No server. No model call. ~50 ms.
Need this applied to a private repo this week? Fund the $1,000 48-hour implementation sprint or read the audit scope. The OSS tool stays free; the sprint is for teams that want a private report, CI leak gate, and one concrete repo/workflow patch.
curl -fsSL https://raw.githubusercontent.com/sravan27/context-os/main/setup.sh | bash
60-second demo: graph stats → autocontext block with import counts → cross-repo eval (auto_context 0.545 winning) → 9/9 CI floors PASS. Reproduce with bash docs/distribution/demo.sh.
If your team is already spending heavily on Claude Code, Codex, Cursor, or other coding agents and wants a private cost-leak report, I am doing a small number of 48-hour audits this week: AI Agent Cost Leak Audit.
The open-source hook stays MIT and free. The paid audit is for teams that want the same measurement discipline applied to their own repo, prompts, and agent workflows.
What the paid sprint ships:
- a private repo scorecard using the same leak signals as the Action
- a short report on the highest-cost agent loops and file-noise sources
- one concrete CI, ignore-rule, or repo-guidance patch where the fix is clear
- a handoff note your team can reuse when running Claude Code, Codex, Cursor, or internal agents
Quick local preview:
python3 python/agent_cost_leak_check.py --repo . --json
CI recipe: docs/AGENT-COST-LEAK-CHECKER.md. For public intake without sharing private code, use the private audit request template.
Versioned GitHub Action:
- uses: sravan27/context-os@v2.9.0
  with:
    max-score: "40"
Live A/B on 36 real claude --print calls, identical fixture, identical model, only difference is whether the hook is active:
| Metric | Value |
|---|---|
| Aggregate tokens | −40.9% |
| Prompt-level wins | 6/6 |
| Bootstrap 95% CI | 32.7%–48.9% |
| Paired t-test | p = 5.1e-7 |
| Wall-clock | −35.3% |
Raw JSON for every call: python/evals/reports/live-session-bench-raw.json · methodology: docs/METHODOLOGY.md.
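
For readers who want to see how an interval like the one in the table can be computed, here is a minimal sketch of a percentile bootstrap over paired calls. It is an illustration, not the procedure in docs/METHODOLOGY.md: the function name, the resample count, and the token numbers in the usage note are all hypothetical.

```python
import random


def bootstrap_savings_ci(baseline, treated, n_boot=2000, seed=0):
    """Percentile bootstrap CI for aggregate token savings over paired calls.

    baseline[i] and treated[i] are token counts for the same prompt without
    and with the hook. Returns (point_estimate, lower, upper) as fractions.
    """
    if len(baseline) != len(treated) or not baseline:
        raise ValueError("need two non-empty lists of equal length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pairs = list(zip(baseline, treated))
    stats = []
    for _ in range(n_boot):
        # Resample whole pairs with replacement to keep the pairing intact.
        sample = [rng.choice(pairs) for _ in pairs]
        base_total = sum(b for b, _ in sample)
        treated_total = sum(t for _, t in sample)
        stats.append(1.0 - treated_total / base_total)
    stats.sort()
    lower = stats[int(0.025 * n_boot)]
    upper = stats[int(0.975 * n_boot) - 1]
    point = 1.0 - sum(treated) / sum(baseline)
    return point, lower, upper
```

Calling `bootstrap_savings_ci([1000, 1200, 900, 1100], [600, 700, 500, 650])` on these made-up counts returns the aggregate saving together with a 95% interval around it. Resampling pairs rather than individual calls is what keeps the comparison "same prompt, hook on vs. off".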
Cross-repo: 36 hand-labeled prompts × 3 unseen OSS repos (axios, ripgrep, requests). Weighted MRR 0.545 vs 0.461 best lexical baseline — +18.2%. Beats every baseline in every language. Report: multi-repo-eval.md.
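
As a rough illustration of the metric, the sketch below computes mean reciprocal rank per repo and then combines repos weighted by their number of prompts. The weighting scheme is an assumption; the report may weight differently, and the function names and sample data are hypothetical.

```python
def reciprocal_rank(ranked, relevant):
    """1/rank of the first relevant item in `ranked`, or 0.0 if none appears."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mrr(queries):
    """Mean reciprocal rank over (ranked_list, relevant_set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def weighted_mrr(per_repo):
    """Combine per-repo MRR, weighting each repo by its prompt count (assumed)."""
    total = sum(len(queries) for queries in per_repo.values())
    return sum(len(q) * mrr(q) for q in per_repo.values()) / total


# Hypothetical data: two prompts on one repo, one on another.
example = {
    "repo_a": [(["a.py", "b.py"], {"a.py"}), (["a.py", "b.py"], {"b.py"})],
    "repo_b": [(["x.rs", "y.rs"], {"z.rs"})],
}
score = weighted_mrr(example)
```

The relative gain quoted above follows directly from the two scores: `0.545 / 0.461 - 1` is about 0.182.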
Before:
user: where is the gitignore parser
claude: Glob → Grep → Read → Read → Read → "found it in walk.rs"
After:
<context-os:autocontext>
crates/ignore/src/gitignore.rs:42 · Gitignore (struct)
crates/ignore/src/gitignore.rs:118 · matched (fn) · imports: …
</context-os:autocontext>
claude: Read crates/ignore/src/gitignore.rs → done
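
A minimal sketch of how a block in this shape could be rendered from ranked candidates. The `Candidate` type, the `score` field, and the `limit` default are hypothetical, and the elided `imports: …` field is left out because the example above does not show its format.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    line: int
    symbol: str
    kind: str
    score: float  # hypothetical ranking score; higher is better


def render_autocontext(candidates, limit=5):
    """Format the top-scoring candidates as an autocontext block."""
    if not candidates:
        return ""  # nothing to inject
    top = sorted(candidates, key=lambda c: c.score, reverse=True)[:limit]
    lines = [f"{c.path}:{c.line} · {c.symbol} ({c.kind})" for c in top]
    return "\n".join(
        ["<context-os:autocontext>", *lines, "</context-os:autocontext>"]
    )
```

Capping the block at a handful of lines keeps the injected text small, which matters because the block is paid for on every turn.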
Per-project:
curl -fsSL https://raw.githubusercontent.com/sravan27/context-os/main/setup.sh | bash
Global response-shaping + env vars to ~/.claude/:
curl -fsSL https://raw.githubusercontent.com/sravan27/context-os/main/setup.sh | bash -s -- --global
Reproduce the eval locally:
git clone https://github.com/sravan27/context-os && cd context-os
python3 python/evals/runners/ranker_floor.py # 9 CI-enforced floors, ~45s
python3 python/evals/runners/multi_repo_eval.py # cross-repo eval, ~2 min
setup.sh writes 28 techniques across CLAUDE.md, .claudeignore, .claude/settings.json, eleven slash commands, an output style, a Haiku explorer subagent, and six stdlib-Python hooks under .claude/hooks/. Full list with evidence per row: docs/TECHNIQUES.md.
The centerpiece is auto_context.py (UserPromptSubmit hook) plus build_repo_graph.py (install-time graph builder). All hooks fail-open — if they break, your session keeps going.
- No LLM routing, model swapping, prompt rewriting.
- No proxy. Claude Code talks to Anthropic directly.
- No telemetry, no phone-home, no analytics. Read setup.sh.
curl -fsSL https://raw.githubusercontent.com/sravan27/context-os/main/setup.sh | bash -s -- --uninstall
Removes only the <!-- context-os --> block from CLAUDE.md and files context-os wrote. Idempotent.
- On repos where prompts already name the exact class (psf/requests calling out PreparedRequest), well-tuned BM25 ties us. Lexical-ceiling regime.
- Live A/B is 36 calls on 6 prompts: p < 1e-6 is real but not Anthropic-scale.
- Symbol extraction is regex-based and ships handlers for Python, TS/JS, Rust, Go. Other languages fall back to path-only ranking.
- Hook adds ~12–15% input overhead per turn; amortizes in 1–2 turns on non-trivial repos.
- Hook p99 latency 118 ms at 10k files, 589 ms at 50k.
Full caveats: docs/limitations.md.
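
The regex-based extraction and its fallback can be pictured with a small sketch like the one below. The patterns are simplified illustrations for two of the listed languages, not the handlers shipped in the repo, and `extract_symbols` is a hypothetical name.

```python
import re
from pathlib import PurePosixPath

# Simplified, illustrative patterns; real handlers would cover more syntax.
PATTERNS = {
    ".py": re.compile(r"^\s*(?:async\s+def|def|class)\s+([A-Za-z_]\w*)", re.M),
    ".rs": re.compile(
        r"^\s*(?:pub(?:\([^)]*\))?\s+)?(?:fn|struct|enum|trait)\s+([A-Za-z_]\w*)",
        re.M,
    ),
}


def extract_symbols(path, source):
    """Return (name, line) pairs, or [] when no handler exists for the suffix."""
    pattern = PATTERNS.get(PurePosixPath(path).suffix)
    if pattern is None:
        return []  # unsupported language: caller ranks by path only
    symbols = []
    for match in pattern.finditer(source):
        # Line number of the symbol name itself, 1-indexed.
        line = source.count("\n", 0, match.start(1)) + 1
        symbols.append((match.group(1), line))
    return symbols
```

For example, `extract_symbols("a.py", "import os\n\nclass Foo:\n    def bar(self):\n        pass\n")` yields `[("Foo", 3), ("bar", 4)]`, while a `.kt` file yields an empty list. Regexes miss nested or macro-generated definitions, which is the trade-off for staying stdlib-only and fast.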
Claude Code on macOS + Linux. Requires python3 (stdlib only). Optional Rust binary (apps/cli) adds output compression and session-memory hooks.
MIT. See LICENSE.
