
misza222/choose-model


choose-model

Fetch models from OpenRouter, filter them by price, then benchmark each for latency and capability to help you pick the right model for your budget.

Architecture

flowchart TD
    CLI[CLI Args\nclap] -->|config path, flags| Main
    Main -->|load + validate| Config[config.toml]
    Main -->|fetch| OR[OpenRouter /v1/models]
    OR -->|Vec<Model>| Filter[Price Filter\nmodels.rs]
    Filter -->|filtered models| Bench[Benchmark Engine\nbenchmark.rs]
    Bench -->|concurrent, semaphore-capped| OR2[OpenRouter /v1/chat/completions]
    OR2 -->|response + latency| Scorer[Keyword Scorer]
    Scorer -->|TestScore per tier| TierCalc[Tier Calculator\nLow / Medium / High]
    TierCalc -->|BenchmarkResult| Display[Display\ndisplay.rs]
    Display -->|table, JSON, or model ID| STDOUT[stdout]
    Main -->|tracing events| STDERR[stderr]

Module responsibilities

| Module | Role |
| --- | --- |
| cli.rs | CLI argument definitions (clap derive) |
| config.rs | TOML config structs + validation |
| models.rs | OpenRouter models API, price filtering |
| benchmark.rs | Capability tests, latency measurement, tier assignment |
| display.rs | Human-readable table, JSON, and --best-model output |
| error.rs | Typed AppError / ConfigError hierarchy |
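To make the price-filtering step in models.rs more concrete, here is a minimal sketch of what such a filter might look like. It is not the project's actual code: the `Model` and `PriceFilter` structs, their field names, and `filter_by_price` are all hypothetical, and the price unit is whatever the config happens to use. The one behaviour taken from this README is that a limit of 0 means "no limit".

```rust
/// Hypothetical, simplified model record (the real struct is not shown in this README).
#[derive(Debug, Clone, PartialEq)]
pub struct Model {
    pub id: String,
    pub prompt_price: f64,
    pub completion_price: f64,
}

/// Hypothetical price ceilings, e.g. as loaded from config.toml.
pub struct PriceFilter {
    pub max_prompt: f64,
    pub max_completion: f64,
}

/// Keeps models whose prices are at or below both ceilings, in input order.
/// A `limit` of 0 means "no limit", mirroring `--limit 0`.
pub fn filter_by_price(models: Vec<Model>, filter: &PriceFilter, limit: usize) -> Vec<Model> {
    let kept = models.into_iter().filter(|m| {
        m.prompt_price <= filter.max_prompt && m.completion_price <= filter.max_completion
    });
    if limit == 0 {
        kept.collect()
    } else {
        kept.take(limit).collect()
    }
}
```

The sketch preserves the order in which models arrive; how the real tool decides which models count as the "top" ones is not stated here.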

Capability tiers

Each test is sent --queries-per-model times (default 3). A test passes if the majority of queries return the expected output. Latency statistics (min / mean / p50 / max / stddev) are computed across all queries.
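The majority rule and the latency statistics can be sketched as follows. This is an illustration under assumptions, not the code in benchmark.rs: the names `LatencyStats`, `majority_passed`, and `latency_stats` are hypothetical, "majority" is read as strictly more than half, p50 is taken as the median, and the standard deviation is the population form (dividing by n).

```rust
/// Summary statistics over per-query latencies, in milliseconds (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct LatencyStats {
    pub min: f64,
    pub mean: f64,
    pub p50: f64,
    pub max: f64,
    pub stddev: f64,
}

/// A test passes when strictly more than half of its queries succeeded.
/// Assumption: a tie (or an empty slice) counts as a failure.
pub fn majority_passed(outcomes: &[bool]) -> bool {
    let passed = outcomes.iter().filter(|&&ok| ok).count();
    passed * 2 > outcomes.len()
}

/// Computes min / mean / p50 / max / stddev; returns None for an empty sample.
pub fn latency_stats(samples_ms: &[f64]) -> Option<LatencyStats> {
    if samples_ms.is_empty() {
        return None;
    }
    let mut sorted = samples_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    let mean = sorted.iter().sum::<f64>() / n as f64;
    // Median: the middle element, or the average of the two middle elements.
    let p50 = if n % 2 == 1 {
        sorted[n / 2]
    } else {
        (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0
    };
    // Population variance: divide by n rather than n - 1.
    let variance = sorted.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n as f64;
    Some(LatencyStats {
        min: sorted[0],
        mean,
        p50,
        max: sorted[n - 1],
        stddev: variance.sqrt(),
    })
}
```

With the default of 3 queries, `majority_passed` requires at least 2 successful responses. Raising `--queries-per-model` gives the statistics more samples to work with.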

| Tier | Tests | Pass condition |
| --- | --- | --- |
| Low | 2+2=?, capital of France | Required keywords in response |
| Medium | Write a Python reverse function, list 3 planets | Required keywords in response |
| High | Apple-sharing reasoning puzzle, syllogism | Required keywords in response |

A model's overall tier is the highest tier in which all of that tier's tests pass.
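Keyword scoring and tier assignment might be sketched like this. The names `Tier`, `contains_all_keywords`, and `overall_tier` are hypothetical, case-insensitive matching is an assumption, and the tier rule is read literally: each tier is judged on its own tests, so the sketch does not require lower tiers to pass as well. The real benchmark.rs may differ on any of these points.

```rust
/// Capability tiers, ordered from least to most demanding.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Low,
    Medium,
    High,
}

/// True when every required keyword occurs in the response.
/// Assumption: matching is case-insensitive substring search.
pub fn contains_all_keywords(response: &str, keywords: &[&str]) -> bool {
    let haystack = response.to_lowercase();
    keywords.iter().all(|k| haystack.contains(&k.to_lowercase()))
}

/// Takes one `(tier, passed)` pair per test and returns the highest tier
/// whose tests all passed, or None when no tier qualifies.
pub fn overall_tier(results: &[(Tier, bool)]) -> Option<Tier> {
    [Tier::High, Tier::Medium, Tier::Low]
        .iter()
        .copied()
        .find(|tier| {
            let mut has_tests = false;
            let mut all_passed = true;
            for (t, ok) in results {
                if t == tier {
                    has_tests = true;
                    all_passed &= *ok;
                }
            }
            // A tier with no tests never qualifies.
            has_tests && all_passed
        })
}
```

For example, a model that passes both Low tests but only one Medium test and no High tests would be rated `Some(Tier::Low)` by this sketch.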

Setup

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Copy and fill in your config
cp config.toml.example config.toml
# edit config.toml — add your OpenRouter API key

# Build
cargo build --release

Running

# Benchmark top 10 models matching price filter (default: table output, 3 queries per model)
cargo run --release

# Print only the best model name — useful in scripts
cargo run --release -- --best-model
MODEL=$(choose-model --best-model)

# List filtered models without benchmarking
cargo run --release -- --list-only

# JSON output (logs go to stderr, results to stdout — safe to pipe)
cargo run --release -- --output json | jq .

# More queries per model for more reliable statistics
cargo run --release -- --queries-per-model 5

# Benchmark more models
cargo run --release -- --limit 20

# Benchmark all models matching price filter (no limit — may take a while)
cargo run --release -- --limit 0

# Increase concurrency
cargo run --release -- --concurrency 8

# Show/hide recommendation line (overrides config)
cargo run --release -- --recommend true

# Use a different config file
cargo run --release -- --config /path/to/other.toml

Set RUST_LOG=debug for verbose tracing output (has no effect with --best-model).

make targets

| Target | Description |
| --- | --- |
| make build | Debug build |
| make release | Optimised release build |
| make run | Benchmark top 10 models (override: LIMIT=20 QUERIES_PER_MODEL=5) |
| make run-all | Benchmark all models matching price filter |
| make list | List filtered models without benchmarking |
| make test | Unit + integration tests (no network) |
| make test-live | All tests including live OpenRouter call |
| make lint | clippy with warnings as errors |
| make fmt | Auto-format with cargo fmt |
| make check | Format check + clippy (CI-friendly) |
| make clean | Remove build artifacts |

Testing

# Unit + integration tests (no network)
cargo test

# Live API smoke test (requires API key)
OPENROUTER_API_KEY=sk-or-... cargo test -- --ignored
