made sure exp 3 prompts interface with the existing model interface m…#15

Open
helenlu66 wants to merge 13 commits into main from interface-prompts-consolidated

@helenlu66
Member

@helenlu66 helenlu66 commented May 29, 2026

Changes Made

  1. Added the experiment 3 prompting condition sets to ./prompting_experiments/
  2. Moved all agent-facing prose out of ./interface/ into ./prompting_experiments/prompt_templates/. Modules in ./interface/ no longer contain any prose; they import the prompt templates to build prompts.
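
For readers skimming the description, the prose/logic split in item 2 can be pictured roughly as below. This is a minimal sketch only: the constant names, the template wording, and `build_user_prompt` are hypothetical and are not taken from the PR's actual modules.

```python
# Hypothetical sketch of the prose/logic split; none of these names come from the PR.

# --- would live in a prompt template module (prose only) ---
USER_OBSERVATION_TEMPLATE = "CURRENT OBSERVATION:\n{observation}"
USER_QUESTION_TEMPLATE = "What is your next action?"


# --- would live in an interface module (logic only, no prose) ---
def build_user_prompt(observation: str, include_question: bool = True) -> str:
    """Assemble a user prompt purely from template constants."""
    parts = [USER_OBSERVATION_TEMPLATE.format(observation=observation)]
    if include_question:
        parts.append(USER_QUESTION_TEMPLATE)
    # Sections are separated by a blank line.
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_user_prompt("You are facing north. A wall is ahead."))
```

With this layout, changing the wording of a prompt only touches the template constants, while the assembly function stays untouched.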

Tests

Check the prompts. Running the following command produces ./prompting_experiments/prompts.txt, which contains the prompt for each condition, built from the observation after the agent takes 3 random steps in the environment mazes/validation_10/V01_empty_room.json.

python3 -m prompting_experiments.preview_prompts

Interface smoke tests should still pass

python3 -m interface.smoke_tests.smoke_prompting_observation_querying --suite all --max-steps 5
python3 -m interface.smoke_tests.smoke_replay --maze V04_single_key.json

The pytest tests still pass

python3 -m pytest tests

Notes

  1. Condition set 3 (context_window): the text_summary condition is not currently supported because there is no interface yet for turning history into a text summary.
  2. Condition set 5 (in-context learning) is not currently supported because the in-context example generation module is missing.
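
To make the gap in note 1 concrete, a history-to-text-summary helper could look something like the sketch below. It is an assumption-heavy illustration: `StepRecord`, its fields, and `summarize_history` are hypothetical, and the real history objects in the interface may carry different data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class StepRecord:
    """Hypothetical history entry; the real interface may store different fields."""
    action: str
    feedback: str


def summarize_history(steps: Sequence[StepRecord], max_steps: int = 5) -> str:
    """Render the most recent steps as a short plain-text summary."""
    if max_steps <= 0:
        raise ValueError("max_steps must be positive")
    if not steps:
        return "No previous steps."
    recent = steps[-max_steps:]
    # Keep the original 1-based step numbers even when older steps are dropped.
    first_index = len(steps) - len(recent) + 1
    lines = [
        f"Step {first_index + i}: {step.action} -> {step.feedback}"
        for i, step in enumerate(recent)
    ]
    return "\n".join(lines)
```

This version only truncates and formats; a model-written summary would need a separate call and is outside the scope of the sketch.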

Future PR

  1. Build the experiment sweep driver that actually iterates through the conditions and logs the results.
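
As a rough picture of what that driver might do, here is a minimal sketch, assuming each condition can be expressed as a (condition set name, variant name, config) triple and that an episode runner returns a JSON-serializable mapping. `run_sweep`, `run_episode`, and the record fields are hypothetical names, not part of this PR.

```python
import io
import json
from typing import Any, Callable, Iterable, Mapping, TextIO, Tuple

# Hypothetical shape of one condition: (condition set name, variant name, config).
Condition = Tuple[str, str, Mapping[str, Any]]


def run_sweep(
    conditions: Iterable[Condition],
    run_episode: Callable[[Mapping[str, Any]], Mapping[str, Any]],
    log: TextIO,
) -> int:
    """Run one episode per condition and write one JSON line per result."""
    count = 0
    for set_name, variant_name, config in conditions:
        result = run_episode(config)
        record = {
            "condition_set": set_name,
            "variant": variant_name,
            "result": dict(result),
        }
        log.write(json.dumps(record, sort_keys=True) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    demo = [("prompt", "verbose", {"max_steps": 5})]
    # Stand-in runner for the demo; a real runner would call the model interface.
    run_sweep(demo, lambda cfg: {"steps_taken": cfg["max_steps"]}, buffer)
    print(buffer.getvalue(), end="")
```

Writing JSON lines keeps each result independent, so a sweep that stops partway still leaves a readable log.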

@helenlu66 helenlu66 mentioned this pull request May 29, 2026
Base automatically changed from interface to main May 29, 2026 16:15
seanrivera previously approved these changes Jun 4, 2026
Member

@seanrivera seanrivera left a comment

Still good to merge; we just need to merge ogbench first.

Copilot AI left a comment

Pull request overview

This PR introduces a structured “prompting_experiments” package (condition sets + prompt templates) and refactors the interface prompt construction code to consume those templates, aligning experiment-variant prompts with the existing model interface while adding utilities to preview prompts and expanding test coverage.

Changes:

  • Add prompting_experiments/ registry + condition-set definitions + reusable prompt template modules (system/user/observation/querying/feedback).
  • Refactor interface prompt assembly (runner/querying/prompt_strategies/renderer/observation/feedback/coords) to use the new templates and updated config defaults.
  • Add prompt preview CLI + snapshot output and add a comprehensive prompt/observation behavior test suite.

Reviewed changes

Copilot reviewed 28 out of 29 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `tests/test_prompt_observation_text.py` | New tests validating observation text rendering, history blocks, and prompt assembly across condition variants. |
| `pyproject.toml` | Adds multinet-preview-prompts entrypoint and includes prompting_experiments* in package discovery. |
| `prompting_experiments/prompts.txt` | Adds a generated prompt preview artifact (snapshot of prompts). |
| `prompting_experiments/prompt_templates/__init__.py` | Initializes the prompt templates package. |
| `prompting_experiments/prompt_templates/system.py` | System-prompt template strings (task/mechanisms/rules/valid actions/initial maze header). |
| `prompting_experiments/prompt_templates/user.py` | User-prompt template strings (observation section, question, optional status/mechanism hints). |
| `prompting_experiments/prompt_templates/observation.py` | Observation/history template strings and cell-description templates. |
| `prompting_experiments/prompt_templates/querying.py` | Querying-mode template strings (questions + FINAL_OUTPUT instructions). |
| `prompting_experiments/prompt_templates/feedback.py` | Centralized feedback message templates for step outcomes and parse failures. |
| `prompting_experiments/preview_prompts.py` | New CLI to roll out a few random steps and render system/user prompts per condition/variant. |
| `prompting_experiments/core.py` | Shared dataclasses for condition sets and variants + config builder helpers. |
| `prompting_experiments/exp_design.py` | Registry wiring for condition sets and iterator helper. |
| `prompting_experiments/__init__.py` | Exposes condition-set registry types and iterators. |
| `prompting_experiments/condition_set_1_prompt.py` | Condition set definition: prompt verbosity variants. |
| `prompting_experiments/condition_set_2_observation_format.py` | Condition set definition: observation format variants. |
| `prompting_experiments/condition_set_3_context_window.py` | Condition set definition: context window variants. |
| `prompting_experiments/condition_set_4_querying_strategy.py` | Condition set definition: querying strategy variants. |
| `prompting_experiments/condition_set_5_in_context_learning.py` | Condition set definition: in-context learning variants (mostly not implemented). |
| `interface/smoke_tests/smoke_prompting_observation_querying.py` | Updates smoke test agent behavior + assertions for new prompt formats/instructions. |
| `interface/runner.py` | Refactors prompt building (moves initial maze to user prompt, injects querying instructions/questions, uses feedback templates). |
| `interface/querying.py` | Splits querying behavior into user question / suffix / FINAL_OUTPUT instruction helpers and uses templates. |
| `interface/prompt_strategies.py` | Removes embedded prose and uses prompting_experiments.prompt_templates for system/user prompt text. |
| `interface/renderer.py` | Uses observation templates for consistent text rendering; changes observation rendering and adds inventory-only helper. |
| `interface/observation.py` | Updates observation/history formatting (inventory labels for image-only history; new options for description/facing). |
| `interface/feedback.py` | Replaces inline strings with centralized feedback templates. |
| `interface/coords.py` | Uses observation templates for describe_cell() output strings. |
| `interface/config.py` | Changes defaults and adds flags controlling observation description/facing inclusion. |
| `.gitignore` | Adds common IDE configuration ignores. |

Comment on lines +124 to 126
checks.append(_check("system includes mechanism list", "The environment may contain:" in system))
checks.append(_check("system includes rules block", "RULES (domain logic):" in system))

Comment thread interface/runner.py
Comment on lines 21 to 27
from interface.observation import (
    current_image_blocks,
    current_observation_text,
    history_content_blocks,
    history_text,
    recent_history_steps,
)
Comment on lines +16 to +28
from .condition_set_3_context_window import CONDITION_SET as CONDITION_SET_3
from .condition_set_4_querying_strategy import CONDITION_SET as CONDITION_SET_5
from .condition_set_5_in_context_learning import CONDITION_SET as CONDITION_SET_6
from .core import ConditionSet, Variant, iter_condition_configs as _iter_condition_configs


CONDITION_SETS: Mapping[str, ConditionSet] = {
    CONDITION_SET_1.name: CONDITION_SET_1,
    CONDITION_SET_2.name: CONDITION_SET_2,
    CONDITION_SET_3.name: CONDITION_SET_3,
    CONDITION_SET_5.name: CONDITION_SET_5,
    CONDITION_SET_6.name: CONDITION_SET_6,
}
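
A registry keyed by each set's own name, as in the snippet above, can also be built from a plain sequence so that it does not depend on numbered import aliases. The following is a minimal sketch under stated assumptions: this `ConditionSet` is a simplified stand-in for the dataclass in core.py, and `build_registry` is a hypothetical helper that does not appear in the PR.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, Mapping, Sequence, Tuple


@dataclass(frozen=True)
class ConditionSet:
    """Simplified stand-in; the PR's ConditionSet likely has more fields."""
    name: str
    variants: Tuple[str, ...]


def build_registry(sets: Sequence[ConditionSet]) -> Mapping[str, ConditionSet]:
    """Key each condition set by its own name and reject duplicate names."""
    registry: Dict[str, ConditionSet] = {}
    for condition_set in sets:
        if condition_set.name in registry:
            raise ValueError(f"duplicate condition set name: {condition_set.name}")
        registry[condition_set.name] = condition_set
    # Read-only view so callers cannot mutate the registry by accident.
    return MappingProxyType(registry)
```

The duplicate check is a design choice of the sketch: with a dict literal, a repeated key would silently overwrite the earlier entry.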
Comment on lines +136 to +148
def build_preview(
    maze_path: Path,
    max_steps: int,
    preview_steps: int,
    rollout_seed: int,
) -> str:
    chunks = [
        "Prompt Experiment Preview",
        f"Maze: {maze_path}",
        f"Max steps: {max_steps}",
        f"Preview prompt state: after {preview_steps} random steps (seed: {rollout_seed})",
        "",
    ]

3 participants