-
Notifications
You must be signed in to change notification settings - Fork 6k
Centralize truncation in conversation history #5652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
38241e5 to
df2fabc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
codex/codex-rs/core/src/client_common.rs
Lines 71 to 84 in 30faba1
| pub(crate) fn get_formatted_input(&self) -> Vec<ResponseItem> { | |
| let mut input = self.input.clone(); | |
| // when using the *Freeform* apply_patch tool specifically, tool outputs | |
| // should be structured text, not json. Do NOT reserialize when using | |
| // the Function tool - note that this differs from the check above for | |
| // instructions. We declare the result as a named variable for clarity. | |
| let is_freeform_apply_patch_tool_present = self.tools.iter().any(|tool| match tool { | |
| ToolSpec::Freeform(f) => f.name == "apply_patch", | |
| _ => false, | |
| }); | |
| if is_freeform_apply_patch_tool_present { | |
| reserialize_shell_outputs(&mut input); | |
| } |
The new truncation flow expects function_call_output content sent back to the model to be plain text with Exit code/Wall time headers, and the added integration tests assert that these outputs are not valid JSON. However, Prompt::get_formatted_input only converts the JSON payload produced by format_exec_output_for_model into a structured plain‑text string when a freeform apply_patch tool is present (client_common.rs lines 71‑84). For normal sessions that use only the standard shell tool, the reserialization step never runs, so the next prompt still contains the original JSON blob. This leaves the function_call_output valid JSON (and without the expected line prefixes), causing the new tests to fail and the model to see a different format than intended. The reserialization should happen whenever shell outputs are present, not only when the freeform apply_patch tool is configured.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| let output = mock2 | ||
| .single_request() | ||
| .function_call_output_text(call_id) | ||
| .expect("function_call_output present for rmcp call"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and anyhow::Result
| // Verifies that an MCP tool call result exceeding the model formatting limits | ||
| // is truncated before being sent back to the model. | ||
| #[tokio::test(flavor = "multi_thread", worker_threads = 1)] | ||
| async fn mcp_tool_call_output_exceeds_limit_truncated_for_model() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we have all truncation tests in one place or colocated with tool-specific tests ( next to other mcp tests)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's distributed now like the shell tool ones. I feel we should at least have some of them in one place to test the truncation functionality. wdut?
| 5 | ||
| 6 | ||
| .* | ||
| \[\.{3} omitted \d+ of 400 lines \.{3}\] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we fully hardcode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| .expect("function error output present"); | ||
|
|
||
| // Expect plaintext with byte-truncation marker and no omitted-lines marker | ||
| assert!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use regex match approach ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait should we hardcode or use regex?
| } | ||
| } | ||
|
|
||
| impl ConversationHistory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why second impl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be removed
| } | ||
| } | ||
|
|
||
| fn truncate_context_output(content: &str) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove
| format_exec_output(content) | ||
| } | ||
|
|
||
| pub(crate) fn format_exec_output(content: &str) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: rename, it's not exec specific
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. I just copied the code and forgot that lol. thanks
| ResponseItem::FunctionCallOutput { | ||
| call_id: call_id.clone(), | ||
| output: FunctionCallOutputPayload { | ||
| content: truncated, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't content sometimes JSON encoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it can be for MCPs for something. It won't affect our main tools tho. The model doesn't care because it's text but I guess it can lose understanding of the json schema. I think MCP servers should handle their output to not overwhelm the model. This is our line of defense against violators. thoughts?
| let truncated_by_bytes = content.len() > MODEL_FORMAT_MAX_BYTES; | ||
| let marker = if omitted > 0 { | ||
| Some(format!( | ||
| "\n[... omitted {omitted} of {total_lines} lines ...]\n\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a bit wrong. You are counting metadata lines and not just shell output lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true. it would be +3 or sth.
move the truncation logic to conversation history to use on any tool output. This will help us in avoiding edge cases while truncating the tool calls and mcp calls.