made sure exp 3 prompts interface with the existing model interface m… #15
Open
helenlu66 wants to merge 13 commits into
Closed
seanrivera previously approved these changes on Jun 4, 2026.

seanrivera (Member) left a comment:

Still good to merge; just need to merge ogbench first.
…m prompt template
Pull request overview
This PR introduces a structured “prompting_experiments” package (condition sets + prompt templates) and refactors the interface prompt construction code to consume those templates, aligning experiment-variant prompts with the existing model interface while adding utilities to preview prompts and expanding test coverage.
Changes:
- Add `prompting_experiments/` registry + condition-set definitions + reusable prompt template modules (system/user/observation/querying/feedback).
- Refactor `interface/` prompt assembly (runner/querying/prompt_strategies/renderer/observation/feedback/coords) to use the new templates and updated config defaults.
- Add a prompt preview CLI + snapshot output, and add a comprehensive prompt/observation behavior test suite.
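The condition-set layout described in the overview (per-set variants, shared dataclasses, a config builder, and an iterator over the registry) could look roughly like the sketch below. It is a minimal sketch, not the PR's code: only the names `ConditionSet`, `Variant`, and `iter_condition_configs` appear in the PR; their fields, the hypothetical `build_config` helper, and the config keys (`prompt_verbosity`, `context_window`) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, Mapping, Tuple


@dataclass(frozen=True)
class Variant:
    # Hypothetical fields: a variant name plus the config keys it overrides.
    name: str
    overrides: Mapping[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class ConditionSet:
    # Hypothetical fields: one experimental factor and its variants.
    name: str
    variants: Tuple[Variant, ...]


def build_config(base: Mapping[str, Any], variant: Variant) -> Dict[str, Any]:
    # Hypothetical helper: the variant's overrides replace matching base defaults.
    return {**base, **variant.overrides}


def iter_condition_configs(
    condition_sets: Mapping[str, ConditionSet], base: Mapping[str, Any]
) -> Iterator[Tuple[str, str, Dict[str, Any]]]:
    # Yield (condition set name, variant name, merged config) for every variant.
    for condition_set in condition_sets.values():
        for variant in condition_set.variants:
            yield condition_set.name, variant.name, build_config(base, variant)


# Usage with made-up config keys.
BASE = {"prompt_verbosity": "full", "context_window": 5}
VERBOSITY = ConditionSet(
    name="prompt_verbosity",
    variants=(Variant("terse", {"prompt_verbosity": "terse"}), Variant("full")),
)
rows = list(iter_condition_configs({VERBOSITY.name: VERBOSITY}, BASE))
```

Keeping variants as pure override mappings means the base defaults stay in one place and every experimental cell is just "defaults plus a diff".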
Reviewed changes
Copilot reviewed 28 out of 29 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/test_prompt_observation_text.py | New tests validating observation text rendering, history blocks, and prompt assembly across condition variants. |
| pyproject.toml | Adds multinet-preview-prompts entrypoint and includes prompting_experiments* in package discovery. |
| prompting_experiments/prompts.txt | Adds a generated prompt preview artifact (snapshot of prompts). |
| `prompting_experiments/prompt_templates/__init__.py` | Initializes the prompt templates package. |
| prompting_experiments/prompt_templates/system.py | System-prompt template strings (task/mechanisms/rules/valid actions/initial maze header). |
| prompting_experiments/prompt_templates/user.py | User-prompt template strings (observation section, question, optional status/mechanism hints). |
| prompting_experiments/prompt_templates/observation.py | Observation/history template strings and cell-description templates. |
| prompting_experiments/prompt_templates/querying.py | Querying-mode template strings (questions + FINAL_OUTPUT instructions). |
| prompting_experiments/prompt_templates/feedback.py | Centralized feedback message templates for step outcomes and parse failures. |
| prompting_experiments/preview_prompts.py | New CLI to roll out a few random steps and render system/user prompts per condition/variant. |
| prompting_experiments/core.py | Shared dataclasses for condition sets and variants + config builder helpers. |
| prompting_experiments/exp_design.py | Registry wiring for condition sets and iterator helper. |
| `prompting_experiments/__init__.py` | Exposes condition-set registry types and iterators. |
| prompting_experiments/condition_set_1_prompt.py | Condition set definition: prompt verbosity variants. |
| prompting_experiments/condition_set_2_observation_format.py | Condition set definition: observation format variants. |
| prompting_experiments/condition_set_3_context_window.py | Condition set definition: context window variants. |
| prompting_experiments/condition_set_4_querying_strategy.py | Condition set definition: querying strategy variants. |
| prompting_experiments/condition_set_5_in_context_learning.py | Condition set definition: in-context learning variants (mostly not implemented). |
| interface/smoke_tests/smoke_prompting_observation_querying.py | Updates smoke test agent behavior + assertions for new prompt formats/instructions. |
| interface/runner.py | Refactors prompt building (moves initial maze to user prompt, injects querying instructions/questions, uses feedback templates). |
| interface/querying.py | Splits querying behavior into user question / suffix / FINAL_OUTPUT instruction helpers and uses templates. |
| interface/prompt_strategies.py | Removes embedded prose and uses prompting_experiments.prompt_templates for system/user prompt text. |
| interface/renderer.py | Uses observation templates for consistent text rendering; changes observation rendering and adds inventory-only helper. |
| interface/observation.py | Updates observation/history formatting (inventory labels for image-only history; new options for description/facing). |
| interface/feedback.py | Replaces inline strings with centralized feedback templates. |
| interface/coords.py | Uses observation templates for describe_cell() output strings. |
| interface/config.py | Changes defaults and adds flags controlling observation description/facing inclusion. |
| .gitignore | Adds common IDE configuration ignores. |
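Centralizing the prose means modules such as `interface/feedback.py` only choose a template and fill its placeholders. A minimal sketch of that split, assuming plain `str.format` templates; the constant names, message wording, and the `step_feedback`/`parse_failure_feedback` helpers are hypothetical, not the contents of `prompting_experiments/prompt_templates/feedback.py`.

```python
from typing import Sequence, Tuple

# --- template module: prose only (hypothetical wording) ---
MOVED = "You moved {direction}. You are now at {position}."
BLOCKED = "You tried to move {direction}, but something blocks the way."
PARSE_FAILURE = "Your reply could not be parsed. Answer with one of: {valid_actions}."


# --- interface module: logic only, no prose ---
def step_feedback(direction: str, moved: bool, position: Tuple[int, int]) -> str:
    # Choose the template from the step outcome and fill its placeholders.
    if moved:
        return MOVED.format(direction=direction, position=position)
    return BLOCKED.format(direction=direction)


def parse_failure_feedback(valid_actions: Sequence[str]) -> str:
    return PARSE_FAILURE.format(valid_actions=", ".join(valid_actions))


print(step_feedback("north", True, (2, 3)))
# You moved north. You are now at (2, 3).
```

With this split, a prompt-wording experiment only edits the template module, and the interface logic stays untouched.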
Comment on lines +124 to 126

```python
checks.append(_check("system includes mechanism list", "The environment may contain:" in system))
checks.append(_check("system includes rules block", "RULES (domain logic):" in system))
```
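The excerpt records each prompt property as a named check via `_check`, whose definition is not shown. The sketch below is one hypothetical way `_check` and a summary step could work; the `CheckResult` type, `failed_checks`, and the sample system prompt are assumptions, and only the two marker strings come from the diff.

```python
from typing import List, NamedTuple


class CheckResult(NamedTuple):
    name: str
    passed: bool


def _check(name: str, condition: bool) -> CheckResult:
    # Hypothetical: record the outcome instead of raising, so every check runs.
    return CheckResult(name, bool(condition))


def failed_checks(checks: List[CheckResult]) -> List[str]:
    return [check.name for check in checks if not check.passed]


# Made-up system prompt that has the mechanism list but no rules block.
system = "TASK: reach the goal.\nThe environment may contain: keys, doors.\n"
checks: List[CheckResult] = []
checks.append(_check("system includes mechanism list", "The environment may contain:" in system))
checks.append(_check("system includes rules block", "RULES (domain logic):" in system))
print(failed_checks(checks))  # ['system includes rules block']
```

Collecting results rather than asserting immediately lets a smoke run report every missing prompt section in one pass.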
|
Comment on lines 21 to 27

```python
from interface.observation import (
    current_image_blocks,
    current_observation_text,
    history_content_blocks,
    history_text,
    recent_history_steps,
)
```
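Condition set 3 varies the context window, which presumably governs how many past steps `recent_history_steps` keeps before `history_text` renders them. The excerpt only shows the imports, so this is a hypothetical sketch: the `Step` fields, both signatures, and the "`None` means unlimited" convention are assumptions. It guards `context_window == 0` explicitly because `history[-0:]` in Python returns the whole list, not an empty one.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Step:
    # Hypothetical history record.
    index: int
    action: str
    observation: str


def recent_history_steps(history: Sequence[Step], context_window: Optional[int]) -> List[Step]:
    if context_window is None:  # assumed convention: no limit
        return list(history)
    if context_window <= 0:  # history[-0:] would return everything
        return []
    return list(history[-context_window:])


def history_text(steps: Sequence[Step]) -> str:
    # Hypothetical one-line-per-step rendering.
    return "\n".join(f"Step {s.index}: {s.action} -> {s.observation}" for s in steps)


history = [Step(i, "up", f"obs {i}") for i in range(1, 5)]
print(history_text(recent_history_steps(history, 2)))
# Step 3: up -> obs 3
# Step 4: up -> obs 4
```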
Comment on lines +16 to +28

```python
from .condition_set_3_context_window import CONDITION_SET as CONDITION_SET_3
from .condition_set_4_querying_strategy import CONDITION_SET as CONDITION_SET_5
from .condition_set_5_in_context_learning import CONDITION_SET as CONDITION_SET_6
from .core import ConditionSet, Variant, iter_condition_configs as _iter_condition_configs


CONDITION_SETS: Mapping[str, ConditionSet] = {
    CONDITION_SET_1.name: CONDITION_SET_1,
    CONDITION_SET_2.name: CONDITION_SET_2,
    CONDITION_SET_3.name: CONDITION_SET_3,
    CONDITION_SET_5.name: CONDITION_SET_5,
    CONDITION_SET_6.name: CONDITION_SET_6,
}
```
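Because the registry is keyed by each set's `name` attribute, the import aliases in this excerpt (`CONDITION_SET_5` for the querying-strategy module, `CONDITION_SET_6` for in-context learning) do not affect lookups. However, two sets that share a `name` would silently overwrite each other in a dict literal, since the later key wins. A hypothetical alternative builds the registry from a list and rejects duplicates; `build_registry`, the stand-in `ConditionSet`, and the set names are assumptions.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, Iterable, Mapping


@dataclass(frozen=True)
class ConditionSet:
    # Stand-in for the PR's ConditionSet; only `name` matters here.
    name: str


def build_registry(condition_sets: Iterable[ConditionSet]) -> Mapping[str, ConditionSet]:
    registry: Dict[str, ConditionSet] = {}
    for condition_set in condition_sets:
        if condition_set.name in registry:
            raise ValueError(f"duplicate condition set name: {condition_set.name!r}")
        registry[condition_set.name] = condition_set
    # Read-only view so callers cannot mutate the registry after import.
    return MappingProxyType(registry)


CONDITION_SETS = build_registry(
    [
        ConditionSet("context_window"),
        ConditionSet("querying_strategy"),
        ConditionSet("in_context_learning"),
    ]
)
```

Building from a list also removes the need for numbered aliases at all, which sidesteps the mismatch between module numbers and alias numbers.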
Comment on lines +136 to +148

```python
def build_preview(
    maze_path: Path,
    max_steps: int,
    preview_steps: int,
    rollout_seed: int,
) -> str:
    chunks = [
        "Prompt Experiment Preview",
        f"Maze: {maze_path}",
        f"Max steps: {max_steps}",
        f"Preview prompt state: after {preview_steps} random steps (seed: {rollout_seed})",
        "",
    ]
```
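The preview header records the rollout seed, which only helps if the same seed reproduces the same steps. A hypothetical sketch of a seeded rollout: a local `random.Random(rollout_seed)` instance keeps the action sequence independent of any global `random.seed` calls elsewhere. The action names, the open-room grid, and `random_rollout` are assumptions, not the PR's environment API.

```python
import random
from typing import List, Tuple

# Hypothetical action set and moves on an open grid (y grows downward).
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def random_rollout(
    start: Tuple[int, int], width: int, height: int, preview_steps: int, rollout_seed: int
) -> Tuple[List[str], Tuple[int, int]]:
    rng = random.Random(rollout_seed)  # same seed -> same action sequence
    x, y = start
    actions: List[str] = []
    for _ in range(preview_steps):
        action = rng.choice(sorted(MOVES))  # fixed ordering before sampling
        dx, dy = MOVES[action]
        # Clamp at the room edges, standing in for wall collisions.
        x = min(max(x + dx, 0), width - 1)
        y = min(max(y + dy, 0), height - 1)
        actions.append(action)
    return actions, (x, y)


actions, position = random_rollout((2, 2), width=5, height=5, preview_steps=3, rollout_seed=0)
```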
Changes Made
- Added `./prompting_experiments/`.
- Moved the prompt prose from `./interface/` into `./prompting_experiments/prompt_templates/`. Modules in `./interface/` no longer contain any prose; they import the prompt templates to build prompts.

Tests
- Check the prompt. Running the following produces `./prompting_experiments/prompts.txt`. This file contains the prompt for each condition, built from the observation after the agent takes 3 random steps in the env `mazes/validation_10/V01_empty_room.json`.
- Interface smoke tests should still pass.
- pytests still pass.
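Since `prompts.txt` is committed, a test could compare a fresh rendering against it so that any prompt-text change shows up as a diff in review. A minimal snapshot-check sketch, assuming the rendered preview is available as a string; `check_snapshot` and its `update` flag are hypothetical, not part of the PR.

```python
import tempfile
from pathlib import Path


def check_snapshot(rendered: str, snapshot_path: Path, update: bool = False) -> bool:
    """Return True if `rendered` matches the stored snapshot.

    Writes the snapshot when it is missing or when `update` is True.
    """
    if update or not snapshot_path.exists():
        snapshot_path.write_text(rendered, encoding="utf-8")
        return True
    return snapshot_path.read_text(encoding="utf-8") == rendered


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "prompts.txt"
    first = check_snapshot("Prompt Experiment Preview\n", snapshot)  # creates the file
    same = check_snapshot("Prompt Experiment Preview\n", snapshot)
    changed = check_snapshot("Prompt Experiment Preview v2\n", snapshot)
```

The explicit `update` flag keeps regeneration a deliberate step rather than something a failing test does silently.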
Notes
Future PR