feat: mock mode for CI + Docker-based eval mode (#4)

## Priority: Medium

PawBench currently requires a live LLM endpoint, so it cannot run in CI without a real server.

### Proposal

1. **`--mock` mode**: Ship recorded responses for each built-in scenario so tests can run without an endpoint. Useful for CI and for contributors who don't have a model server available.

2. **`--docker` mode**: Spin up a local vLLM/Ollama container with a small model (qwen3-0.6b) for integration testing. Slow but fully self-contained.

3. **Record mode**: `pawbench --record responses/` saves actual API responses as fixtures for future `--mock` runs.
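A minimal sketch of how record mode and `--mock` mode could share one fixture format. It assumes responses are JSON and that requests are deterministic enough to be keyed by a hash of the scenario name plus the canonicalised request body. `FixtureClient`, `request_key`, the `Transport` callable, and the one-file-per-request layout are all hypothetical illustrations, not PawBench's actual API.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path
from typing import Any, Callable

# Hypothetical: a callable that sends one request to a live endpoint
# and returns the parsed JSON response.
Transport = Callable[[dict[str, Any]], dict[str, Any]]


def request_key(scenario: str, request: dict[str, Any]) -> str:
    """Stable fixture key: scenario name plus a hash of the canonical request JSON."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f"{scenario}-{digest}"


class FixtureClient:
    """Hypothetical client that runs live, records fixtures, or replays them."""

    def __init__(self, mode: str, fixture_dir: Path, transport: Transport | None = None):
        if mode not in ("live", "record", "mock"):
            raise ValueError(f"unknown mode: {mode}")
        if mode in ("live", "record") and transport is None:
            raise ValueError(f"{mode} mode needs a transport")
        self.mode = mode
        self.fixture_dir = fixture_dir
        self.transport = transport

    def _path(self, scenario: str, request: dict[str, Any]) -> Path:
        return self.fixture_dir / f"{request_key(scenario, request)}.json"

    def send(self, scenario: str, request: dict[str, Any]) -> dict[str, Any]:
        path = self._path(scenario, request)
        if self.mode == "mock":
            # Fail loudly instead of silently falling back to the network.
            if not path.exists():
                raise FileNotFoundError(
                    f"no recorded response for {scenario} ({path.name}); re-record fixtures"
                )
            return json.loads(path.read_text(encoding="utf-8"))["response"]

        response = self.transport(request)
        if self.mode == "record":
            # Store the request alongside the response so stale fixtures are easy to diagnose.
            self.fixture_dir.mkdir(parents=True, exist_ok=True)
            payload = {"request": request, "response": response}
            path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
        return response
```

Keying on a request hash means any prompt or parameter change invalidates the matching fixture, which is why mock mode raises rather than guessing; re-running in record mode regenerates it.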
