headroom-sidecar

Token-compression sidecar for LLM gateways. Drop it next to any OpenAI-compatible gateway (Hivemind, LiteLLM, custom proxy) to cut prompt tokens by 15-86%, depending on content, before they reach your backend.

Built from the compression modules of chopratejas/headroom v0.23.0 and repackaged as a standalone HTTP sidecar service.

How it works

Client → LLM Gateway → headroom-sidecar :9100/compress → Compressed → Backend
                        (HTTP call, graceful fallback on failure)

The gateway sends the request body to the sidecar before forwarding it to the LLM. If the sidecar is down or compression doesn't help, the original body passes through unchanged. A sidecar failure therefore never drops or blocks a request; at worst, the request goes out uncompressed.

Compression pipeline

Stage	What it does
`content_router`	Detects content type (plain text, JSON, code)
`compressor`	SmartCrusher for JSON, Kompress (abbreviation/filler removal) for plain text
`cache_aligner`	Separates static prefix from dynamic content for caching alignment

All stages include size guards — if a stage would increase content size, it's skipped.
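A size guard can be written as a loop that runs each stage and keeps its output only when the content did not grow. The Go sketch below assumes each stage is a plain `func(string) string`. The `Stage` type and the `RunPipeline` name are hypothetical and are not the sidecar's Python API.

```go
package main

import (
	"fmt"
	"strings"
)

// Stage is a hypothetical transform; real stages likely carry more metadata.
type Stage struct {
	Name string
	Run  func(string) string
}

// RunPipeline applies stages in order and skips any stage whose output is
// larger than its input (the size guard). It returns the final text and the
// names of the stages whose output was kept.
func RunPipeline(input string, stages []Stage) (string, []string) {
	current := input
	var kept []string
	for _, s := range stages {
		out := s.Run(current)
		if len(out) > len(current) {
			continue // guard: skip a stage that would grow the content
		}
		current = out
		kept = append(kept, s.Name)
	}
	return current, kept
}

func main() {
	stages := []Stage{
		// Shrinks: collapses runs of whitespace into single spaces.
		{"squash", func(s string) string { return strings.Join(strings.Fields(s), " ") }},
		// Grows: rejected by the guard.
		{"pad", func(s string) string { return s + " (annotated)" }},
	}
	out, kept := RunPipeline("please   review   this", stages)
	fmt.Printf("%q %v\n", out, kept)
}
```

Each stage sees the output of the last stage that was kept, so a rejected stage leaves no trace on the text passed down the pipeline.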

Quick start

# Docker (build the image from the repo's Dockerfile first)
docker build -t headroom-sidecar .
docker run -d --name headroom-sidecar \
  -p 9100:9100 \
  -e HCP_MIN_BODY_SIZE=1000 \
  -e HCP_RATE_LIMIT_RPM=600 \
  headroom-sidecar

# Or docker-compose
docker compose up -d

Endpoints

Endpoint	Method	Description
`/health`	GET	Status, uptime, stats
`/compress`	POST	Compress request body
`/expand`	POST	Expand `<<ccr:HASH>>` markers
`/metrics`	GET	Prometheus-compatible counters

Compression request

curl -X POST http://localhost:9100/compress \
  -H 'Content-Type: application/json' \
  -d '{
    "body": {
      "messages": [
        {"role": "user", "content": "Please review this code carefully..."}
      ]
    },
    "target": "messages"
  }'

Response:

{
  "body": {"messages": [{"role": "user", "content": "Review code..."}]},
  "metadata": {
    "action": "compressed",
    "original_chars": 1128,
    "compressed_chars": 157,
    "savings_pct": 86.08,
    "stages_run": ["content_router", "compressor", "cache_aligner"],
    "elapsed_ms": 4.2
  }
}
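The sample numbers fit one formula: `savings_pct` is the percentage of characters removed, rounded to two decimals. Here that gives (1 - 157/1128) × 100 ≈ 86.08. This formula is inferred from the example, not documented. A gateway that wants to log or alert on savings could recompute it like this; the `SavingsPct` helper is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// SavingsPct returns the share of characters removed, in percent, rounded to
// two decimals. The formula is inferred from the sample response, not documented.
func SavingsPct(originalChars, compressedChars int) float64 {
	if originalChars <= 0 {
		return 0
	}
	pct := 100 * (1 - float64(compressedChars)/float64(originalChars))
	return math.Round(pct*100) / 100
}

func main() {
	fmt.Println(SavingsPct(1128, 157)) // 86.08
}
```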

Environment variables

Variable	Default	Description
`HCP_HOST`	`0.0.0.0`	Listen address
`HCP_PORT`	`9100`	Listen port
`HCP_MIN_BODY_SIZE`	`1000`	Skip messages shorter than this
`HCP_TIMEOUT`	`3.0`	Pipeline timeout (seconds)
`HCP_RATE_LIMIT_RPM`	`600`	Max requests per minute
`HCP_RATE_LIMIT_BURST`	`50`	Burst allowance
`HCP_DB_PATH`	`/var/lib/headroom/ccr.db`	CCR cache database path

Integration examples

Go gateway (Hivemind pattern)

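import (
	"bytes"
	"encoding/json"
	"net/http"
	"time"
)

// Bound each call so a slow sidecar cannot stall the gateway.
var sidecar = &http.Client{Timeout: time.Second}

func CompressBody(endpoint string, body []byte) []byte {
	// /compress expects the request wrapped as {"body": ..., "target": ...}.
	req, err := json.Marshal(map[string]any{
		"body": json.RawMessage(body), "target": "messages"})
	if err != nil {
		return body // not valid JSON: forward unchanged
	}
	resp, err := sidecar.Post(endpoint+"/compress",
		"application/json", bytes.NewReader(req))
	if err != nil {
		return body // graceful fallback: sidecar down or timed out
	}
	defer resp.Body.Close()
	var result struct {
		Body     json.RawMessage `json:"body"`
		Metadata struct {
			Action string `json:"action"`
		} `json:"metadata"`
	}
	if resp.StatusCode != http.StatusOK ||
		json.NewDecoder(resp.Body).Decode(&result) != nil {
		return body
	}
	if result.Metadata.Action == "compressed" {
		return result.Body
	}
	return body
}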

Any gateway via config

Point your gateway's compression endpoint at the sidecar:

[compression]
enabled = true
endpoint = "http://127.0.0.1:9100"
min_body_size = 1000
timeout_ms = 1000

Tests

python3 -m pytest tests/ -q    # 657 tests

Modules included

Cherry-picked from headroom v0.23.0:

compression/ — SmartCrusher, ContentRouter, CacheAligner, TagProtector, Kompress
ccr/ — Compress-Cache-Retrieve pipeline
scoring/ — Line importance scoring, TOIN observer
memory/ — Intelligent context injection, bubbling, hierarchy
validation/ — Input validation, injection detection
resilience/ — Circuit breaker, auth routing
simulation/ — Token cost prediction
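The `<<ccr:HASH>>` marker format suggests a content-addressed store. The original text is saved under a hash and replaced by a marker, and `/expand` later swaps the markers back. The in-memory Go sketch below shows that round trip under two stated assumptions, neither documented for headroom's CCR module:

- the hash is the first 12 hex characters of SHA-256;
- storage is a map rather than the database at `HCP_DB_PATH`.

`Store`, `Compress` and `Expand` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

// markerRe matches <<ccr:HASH>> markers; assumes HASH is lowercase hex.
var markerRe = regexp.MustCompile(`<<ccr:([0-9a-f]+)>>`)

// Store is a hypothetical in-memory stand-in for the CCR cache.
type Store struct{ originals map[string]string }

func NewStore() *Store { return &Store{originals: map[string]string{}} }

// Compress saves the original and returns a marker to put in its place.
func (s *Store) Compress(original string) string {
	sum := sha256.Sum256([]byte(original))
	hash := hex.EncodeToString(sum[:])[:12] // assumed truncation length
	s.originals[hash] = original
	return "<<ccr:" + hash + ">>"
}

// Expand replaces every known marker with its original text; unknown
// markers are left untouched rather than dropped.
func (s *Store) Expand(text string) string {
	return markerRe.ReplaceAllStringFunc(text, func(m string) string {
		hash := markerRe.FindStringSubmatch(m)[1]
		if orig, ok := s.originals[hash]; ok {
			return orig
		}
		return m
	})
}

func main() {
	s := NewStore()
	marker := s.Compress("long tool output")
	fmt.Println(s.Expand("See " + marker + " for details."))
	// See long tool output for details.
}
```

Because the key is derived from the content, storing the same text twice reuses one entry and one marker.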

License

Apache 2.0 (inherited from chopratejas/headroom)
