Fetch models from OpenRouter, filter them by price, then benchmark each for latency and capability to help you pick the right model for your budget.
```mermaid
flowchart TD
    CLI[CLI Args\nclap] -->|config path, flags| Main
    Main -->|load + validate| Config[config.toml]
    Main -->|fetch| OR[OpenRouter /v1/models]
    OR -->|Vec<Model>| Filter[Price Filter\nmodels.rs]
    Filter -->|filtered models| Bench[Benchmark Engine\nbenchmark.rs]
    Bench -->|concurrent, semaphore-capped| OR2[OpenRouter /v1/chat/completions]
    OR2 -->|response + latency| Scorer[Keyword Scorer]
    Scorer -->|TestScore per tier| TierCalc[Tier Calculator\nLow / Medium / High]
    TierCalc -->|BenchmarkResult| Display[Display\ndisplay.rs]
    Display -->|table, JSON, or model ID| STDOUT[stdout]
    Main -->|tracing events| STDERR[stderr]
```
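The "concurrent, semaphore-capped" edge means at most N chat-completion requests are in flight at once. A minimal sketch of that bound, using batches of std threads rather than the async semaphore the real engine presumably uses (the function and field names here are illustrative, not the tool's actual API):

```rust
use std::thread;
use std::time::Instant;

// Hypothetical stand-in for one chat-completion request; the real tool
// sends a prompt to OpenRouter's /v1/chat/completions endpoint here.
fn probe_model(id: &str) -> (String, u128) {
    let start = Instant::now();
    // ...send request, wait for response...
    (id.to_string(), start.elapsed().as_millis())
}

// Cap in-flight requests by running models in batches of `cap` threads.
// A tokio Semaphore would achieve the same bound without the batching.
fn benchmark_all(model_ids: &[String], cap: usize) -> Vec<(String, u128)> {
    let mut results = Vec::new();
    for batch in model_ids.chunks(cap.max(1)) {
        let handles: Vec<_> = batch
            .iter()
            .cloned()
            .map(|id| thread::spawn(move || probe_model(&id)))
            .collect();
        for h in handles {
            results.push(h.join().expect("worker panicked"));
        }
    }
    results
}
```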
| Module | Role |
|---|---|
| cli.rs | CLI argument definitions (clap derive) |
| config.rs | TOML config structs + validation |
| models.rs | OpenRouter models API, price filtering |
| benchmark.rs | Capability tests, latency measurement, tier assignment |
| display.rs | Human-readable table, JSON, and --best-model output |
| error.rs | Typed AppError / ConfigError hierarchy |
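The price filter in models.rs can be sketched as a simple predicate over per-token prices. The struct fields and threshold semantics below are assumptions for illustration, not the tool's actual types:

```rust
// Assumed shape of a model record; the real struct is defined in models.rs.
#[derive(Debug, Clone)]
struct Model {
    id: String,
    prompt_price: f64,     // USD per token, prompt side (assumed unit)
    completion_price: f64, // USD per token, completion side (assumed unit)
}

// Keep only models at or below both price ceilings.
fn filter_by_price(models: Vec<Model>, max_prompt: f64, max_completion: f64) -> Vec<Model> {
    models
        .into_iter()
        .filter(|m| m.prompt_price <= max_prompt && m.completion_price <= max_completion)
        .collect()
}
```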
Each test is sent --queries-per-model times (default 3). A test passes if
the majority of queries return the expected output. Latency statistics
(min / mean / p50 / max / stddev) are computed across all queries.
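The statistics above can be sketched as follows. This is a minimal version assuming population (not sample) standard deviation and a midpoint-averaged median; the actual implementation in benchmark.rs may differ:

```rust
#[derive(Debug, PartialEq)]
struct LatencyStats {
    min: f64,
    mean: f64,
    p50: f64,
    max: f64,
    stddev: f64,
}

// Compute min / mean / p50 / max / stddev over latency samples (ms).
fn latency_stats(mut samples: Vec<f64>) -> Option<LatencyStats> {
    if samples.is_empty() {
        return None;
    }
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Population variance; a sample variance would divide by n - 1.
    let variance = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let mid = samples.len() / 2;
    let p50 = if samples.len() % 2 == 0 {
        (samples[mid - 1] + samples[mid]) / 2.0
    } else {
        samples[mid]
    };
    Some(LatencyStats {
        min: samples[0],
        mean,
        p50,
        max: *samples.last().unwrap(),
        stddev: variance.sqrt(),
    })
}
```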
| Tier | Tests | Pass condition |
|---|---|---|
| Low | 2+2=?, capital of France | Required keywords in response |
| Medium | Write a Python reverse function, list 3 planets | Required keywords in response |
| High | Apple-sharing reasoning puzzle, syllogism | Required keywords in response |
A model's overall tier = highest tier where all tests of that tier pass.
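The scoring and tier rules above can be sketched in a few small functions. This follows the stated rules (case-insensitive keyword match, strict-majority pass, highest tier whose tests all pass) but is an illustration, not the code from benchmark.rs:

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Tier {
    None,
    Low,
    Medium,
    High,
}

// A single query passes when every required keyword appears in the
// response (case-insensitive comparison is an assumption here).
fn query_passes(response: &str, required: &[&str]) -> bool {
    let lower = response.to_lowercase();
    required.iter().all(|k| lower.contains(k.to_lowercase().as_str()))
}

// A test passes if a strict majority of its queries passed.
fn test_passes(query_results: &[bool]) -> bool {
    let passed = query_results.iter().filter(|&&p| p).count();
    passed * 2 > query_results.len()
}

// Overall tier = highest tier where all tests of that tier pass.
// Each slice holds per-test pass/fail results for that tier.
fn overall_tier(low: &[bool], medium: &[bool], high: &[bool]) -> Tier {
    let all = |t: &[bool]| !t.is_empty() && t.iter().all(|&p| p);
    if all(high) {
        Tier::High
    } else if all(medium) {
        Tier::Medium
    } else if all(low) {
        Tier::Low
    } else {
        Tier::None
    }
}
```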
```bash
# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Copy and fill in your config
cp config.toml.example config.toml
# edit config.toml — add your OpenRouter API key

# Build
cargo build --release
```

```bash
# Benchmark top 10 models matching price filter (default: table output, 3 queries per model)
cargo run --release

# Print only the best model name — useful in scripts
cargo run --release -- --best-model
MODEL=$(choose-model --best-model)

# List filtered models without benchmarking
cargo run --release -- --list-only

# JSON output (logs go to stderr, results to stdout — safe to pipe)
cargo run --release -- --output json | jq .

# More queries per model for more reliable statistics
cargo run --release -- --queries-per-model 5

# Benchmark more models
cargo run --release -- --limit 20

# Benchmark all models matching price filter (no limit — may take a while)
cargo run --release -- --limit 0

# Increase concurrency
cargo run --release -- --concurrency 8

# Show/hide recommendation line (overrides config)
cargo run --release -- --recommend true

# Use a different config file
cargo run --release -- --config /path/to/other.toml
```

Set RUST_LOG=debug for verbose tracing output (has no effect with --best-model).
| Target | Description |
|---|---|
| make build | Debug build |
| make release | Optimised release build |
| make run | Benchmark top 10 models (override: LIMIT=20 QUERIES_PER_MODEL=5) |
| make run-all | Benchmark all models matching price filter |
| make list | List filtered models without benchmarking |
| make test | Unit + integration tests (no network) |
| make test-live | All tests including live OpenRouter call |
| make lint | clippy with warnings as errors |
| make fmt | Auto-format with cargo fmt |
| make check | Format check + clippy (CI-friendly) |
| make clean | Remove build artifacts |
```bash
# Unit + integration tests (no network)
cargo test

# Live API smoke test (requires API key)
OPENROUTER_API_KEY=sk-or-... cargo test -- --ignored
```