Skip to content

feat: mock mode for CI + Docker-based eval mode #4

@zenprocess

Description

@zenprocess

Priority: Medium

PawBench requires a live LLM endpoint. Can't run in CI without a real server.

Proposal

  1. --mock mode: Ship recorded responses for each built-in scenario. Tests run against these without needing an endpoint. Good for CI, contributor testing.

  2. --docker mode: Spin up a local vLLM/Ollama container with a small model (qwen3-0.6b) for integration testing. Slow but fully self-contained.

  3. Record mode: pawbench --record responses/ saves actual API responses as fixtures for future --mock runs.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions