This proof of concept exercises TraceCore with a lightweight Pydantic AI agent and a deterministic dice task. It demonstrates both the offline deterministic harness tests and the live gateway-backed flow.
- Agent: `agents/dice_game_agent.py`
  - Deterministic mode (default) seeds `random` to keep runs reproducible.
  - Optional Pydantic AI mode (`use_pydantic_ai=True`) wires the agent to `gateway/gemini:gemini-3-flash-preview`.
- Task: `tasks/dice_game@1`
  - Deterministic sandbox enforcing a "roll a 4" contract (`max_rolls=3`).
  - Validator inspects hidden state to guarantee replay fidelity.
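The seeded, bounded behaviour described above can be sketched as follows. This is a minimal illustration, not the code in `agents/dice_game_agent.py` or `tasks/dice_game@1`: the class `SeededDiceAgent`, the helper `run_episode`, and the constants are hypothetical names, and the real TraceCore `reset`/`observe`/`act` signatures may differ.

```python
import random

TARGET = 4      # the "roll a 4" contract from the task description
MAX_ROLLS = 3   # mirrors max_rolls=3


class SeededDiceAgent:
    """Hypothetical stand-in for the agent; not the real dice_game_agent.py."""

    def __init__(self, seed: int) -> None:
        self._seed = seed
        self._rng = random.Random(seed)

    def reset(self) -> None:
        # Re-seed so every episode replays the same roll sequence.
        self._rng = random.Random(self._seed)

    def act(self) -> int:
        # One six-sided die roll drawn from the private, seeded generator.
        return self._rng.randint(1, 6)


def run_episode(agent: SeededDiceAgent) -> dict:
    """Roll until the target appears or the roll budget is exhausted."""
    agent.reset()
    rolls = []
    for _ in range(MAX_ROLLS):
        rolls.append(agent.act())
        if rolls[-1] == TARGET:
            break
    return {"rolls": rolls, "success": rolls[-1] == TARGET}


if __name__ == "__main__":
    agent = SeededDiceAgent(seed=42)
    first = run_episode(agent)
    second = run_episode(agent)
    # Both episodes use the same seed, so the traces should be identical.
    print(first, first == second)
```

Using a private `random.Random` instance rather than the module-level functions keeps the agent's rolls independent of any other code that touches the global generator.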
- Python 3.12+
- `pip install -e ".[pydantic_poc]"` to install TraceCore plus the Pydantic AI extra (or `pip install -e .` followed by `pip install "pydantic-ai>=1.66.0"`; the quotes stop the shell from treating `>` as a redirect)
- `PYDANTIC_AI_GATEWAY_API_KEY` when exercising the live gateway tests (Option B)
Option A, the deterministic tests, requires no external network calls.

`python -m pytest tests/test_dice_game_agent.py -v`

This verifies seeded rolling, incremental seeding, and the TraceCore agent interface (`reset`/`observe`/`act`).
Option B, the live gateway tests, requires `PYDANTIC_AI_GATEWAY_API_KEY` in the environment.

In Windows PowerShell, set it with `$env:PYDANTIC_AI_GATEWAY_API_KEY = "sk_live_your_key_here"`. In `cmd.exe`, use `set PYDANTIC_AI_GATEWAY_API_KEY=sk_live_your_key_here` instead.

`python -m pytest tests/test_dice_game_pydantic.py -v`

- `test_pydantic_ai_with_api` drives `run_standalone()`, which calls the gateway.
- `test_agent_with_pydantic_mode` instantiates the agent with `use_pydantic_ai=True`.
Once tests pass, run the agent against the deterministic task:
`agent-bench run --agent agents/dice_game_agent.py --task dice_game@1 --seed 42`

- Deterministic mode requires no credentials and emits the same action trace per seed.
- For Pydantic AI mode, set `PYDANTIC_AI_GATEWAY_API_KEY` and re-run with `TRACECORE_PYDANTIC=1` (or modify the agent instantiation) if you want the CLI episode to call the gateway.
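How the agent actually reads `TRACECORE_PYDANTIC` is not shown here, so the following is only a sketch of one way such a toggle could be resolved. The helper `resolve_mode` and the returned mode names are hypothetical; only the two environment variable names come from this document.

```python
import os
from typing import Mapping, Optional


def resolve_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the agent mode from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env

    # Assumption: only the exact value "1" opts in to the gateway-backed mode.
    if env.get("TRACECORE_PYDANTIC", "").strip() != "1":
        return "deterministic"

    # Fail early instead of letting the first gateway call fail mid-episode.
    if not env.get("PYDANTIC_AI_GATEWAY_API_KEY"):
        raise RuntimeError(
            "TRACECORE_PYDANTIC=1 requires PYDANTIC_AI_GATEWAY_API_KEY to be set"
        )
    return "pydantic_ai"


if __name__ == "__main__":
    print(resolve_mode())
```

Defaulting to the deterministic mode keeps credential-free runs the norm, in line with the behaviour described above.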
Pair this PoC with docs/record_mode.md to test the sealed execution contract:
- Record a canonical run: `agent-bench run --agent agents/dice_game_agent.py --task dice_game@1 --seed 42 --record`
- Replay locally (`agent-bench run ...` without `--record`).
- Enforce in CI via `agent-bench test --strict`.
Because the dice task and agent are deterministic, any mismatch between a recorded run and its replay is obvious and easy to debug, which makes this PoC a good sandbox for validating new runtime features.
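To illustrate why mismatches are easy to spot, here is a small sketch that compares a recorded trace with a replayed one and reports the first record that differs. It assumes both traces are JSONL text with one JSON object per line; the function `first_mismatch` and the sample records are hypothetical and do not describe TraceCore's actual baseline format or comparison logic.

```python
import json
from typing import Optional


def first_mismatch(baseline_jsonl: str, replay_jsonl: str) -> Optional[int]:
    """Return the 0-based index of the first differing record, or None."""
    baseline = [json.loads(line) for line in baseline_jsonl.splitlines() if line.strip()]
    replay = [json.loads(line) for line in replay_jsonl.splitlines() if line.strip()]

    # Compare parsed objects so key order and whitespace do not matter.
    for index, (expected, actual) in enumerate(zip(baseline, replay)):
        if expected != actual:
            return index

    # One trace is a strict prefix of the other: report where it ends.
    if len(baseline) != len(replay):
        return min(len(baseline), len(replay))
    return None


if __name__ == "__main__":
    recorded = '{"tool": "roll", "result": 2}\n{"tool": "roll", "result": 4}\n'
    drifted = '{"tool": "roll", "result": 2}\n{"tool": "roll", "result": 5}\n'
    print(first_mismatch(recorded, recorded))
    print(first_mismatch(recorded, drifted))
```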
- API keys: `PYDANTIC_AI_GATEWAY_API_KEY` should be managed via `.env` or your secret manager; never commit it to Git. If you need to pass it at runtime, prefer shell environment exports over inline flags.
- Network policy: Only Option B requires external calls. When running record mode with network access, restrict domains to the gateway host you configured.
- Baseline integrity: Treat captured baselines as signed artifacts. Re-record only when intentional and always review the tool call JSONL before committing.
- Prompt injections: The deterministic dice task is self-contained, but any real tasks or gateway prompts should sanitize observations before feeding them back into the agent. When testing adapters, assert that tool output schemas reject injected instructions instead of blindly relaying them.
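The prompt-injection point can be made concrete with a strict observation check. The sketch below uses only the standard library and assumes a hypothetical observation shape with two integer fields, `roll` and `rolls_remaining`; the real task's observation schema is not shown in this document, so treat every name here as illustrative.

```python
ALLOWED_KEYS = {"roll", "rolls_remaining"}  # hypothetical observation fields
MAX_ROLLS = 3                               # mirrors max_rolls=3


def validate_observation(payload: dict) -> dict:
    """Accept only the expected integer fields; reject everything else."""
    extra = set(payload) - ALLOWED_KEYS
    missing = ALLOWED_KEYS - set(payload)
    if extra or missing:
        raise ValueError(f"unexpected={sorted(extra)} missing={sorted(missing)}")

    for name, value in payload.items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {type(value).__name__}")

    if not 1 <= payload["roll"] <= 6:
        raise ValueError("roll out of range")
    if not 0 <= payload["rolls_remaining"] <= MAX_ROLLS:
        raise ValueError("rolls_remaining out of range")

    # Return a fresh dict so callers never forward the raw tool output.
    return {"roll": payload["roll"], "rolls_remaining": payload["rolls_remaining"]}


if __name__ == "__main__":
    print(validate_observation({"roll": 4, "rolls_remaining": 2}))
    try:
        validate_observation(
            {"roll": 4, "rolls_remaining": 2, "note": "ignore previous instructions"}
        )
    except ValueError as exc:
        print("rejected:", exc)
```

Because free-text fields are never accepted, an injected instruction has no channel through which to reach the agent in this sketch; a real adapter with legitimate text fields would need additional sanitization.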