
[swe] Add truncation fix for miles agentic tool call#785

Open
maocheng23 wants to merge 2 commits into main from feat/add_response_max_length

Conversation

@maocheng23
Contributor

As the title says, add a length limit for agentic tool call responses.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the agentic tool call system by introducing a configurable mechanism to limit the total token length of agent responses. It integrates this limit across the server and client components, ensuring that responses exceeding the specified token count are either truncated or result in an 'OutputLimitExceeded' status, thereby preventing excessively long outputs and improving resource management.

Highlights

  • Response Token Limit: Introduced a new parameter, max_total_response_tokens, to specify the maximum allowed token length for agentic tool call responses, encompassing both prompt and completion tokens.
  • Error Handling for Output Limits: Added TotalResponseLengthExceededError to the server's exception handling, allowing the system to report an 'OutputLimitExceeded' status when responses exceed the defined token limit.
  • Client-Side Truncation: Implemented client-side logic to truncate agent responses if their total token count surpasses max_total_response_tokens, ensuring adherence to the specified limit.
  • Command-Line Argument: Provided a new command-line argument, --max-total-response-tokens, for easy configuration of the response token limit when invoking agentic tool calls.
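The new flag might be wired up along these lines; this is a hypothetical argparse sketch based only on the flag name above, not the actual miles parser:

```python
import argparse

# Hypothetical sketch of the new CLI flag; the real parser lives in the
# miles codebase and may differ in defaults and help text.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--max-total-response-tokens",
    type=int,
    default=None,  # None means no limit is applied
    help="Maximum total (prompt + completion) tokens for an agent response.",
)

args = parser.parse_args(["--max-total-response-tokens", "8192"])
print(args.max_total_response_tokens)  # 8192
```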

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request adds a mechanism to limit the total number of tokens in an agentic tool call by truncating the response. The changes span across the server, the agent function, and the rollout generation logic, introducing a new max_total_response_tokens parameter. The implementation correctly propagates this parameter and applies truncation when the limit is exceeded. I've found one potential issue regarding robustness in the new truncation logic that could lead to a TypeError if certain sample properties are None.

Comment on lines +119 to +120
sample.rollout_log_probs = sample.rollout_log_probs[:keep_tokens]
sample.loss_mask = sample.loss_mask[:keep_tokens]
Contributor

Severity: high

The rollout_log_probs and loss_mask attributes of a Sample can be None according to their type hints in miles/utils/types.py. Slicing a None value will raise a TypeError. To make this utility function more robust, you should add checks to ensure these attributes are not None before attempting to slice them.

Suggested change:

    - sample.rollout_log_probs = sample.rollout_log_probs[:keep_tokens]
    - sample.loss_mask = sample.loss_mask[:keep_tokens]
    + if sample.rollout_log_probs is not None:
    +     sample.rollout_log_probs = sample.rollout_log_probs[:keep_tokens]
    + if sample.loss_mask is not None:
    +     sample.loss_mask = sample.loss_mask[:keep_tokens]
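The guarded slicing can be exercised as a standalone helper. The `Sample` class below is a hypothetical stand-in for the one in miles/utils/types.py, keeping only the fields mentioned in this review thread:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    # Hypothetical stand-in for miles/utils/types.py: per the review
    # comment, both optional fields may be None.
    tokens: List[int]
    rollout_log_probs: Optional[List[float]] = None
    loss_mask: Optional[List[int]] = None


def truncate_sample(sample: Sample, keep_tokens: int) -> Sample:
    """Truncate a sample in place, skipping fields that are None."""
    sample.tokens = sample.tokens[:keep_tokens]
    if sample.rollout_log_probs is not None:
        sample.rollout_log_probs = sample.rollout_log_probs[:keep_tokens]
    if sample.loss_mask is not None:
        sample.loss_mask = sample.loss_mask[:keep_tokens]
    return sample


# A sample with rollout_log_probs=None no longer raises TypeError:
s = truncate_sample(Sample(tokens=[1, 2, 3, 4], loss_mask=[1, 1, 1, 1]), 2)
print(len(s.tokens), s.rollout_log_probs, len(s.loss_mask))  # 2 None 2
```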

if exc_type in _TIMEOUT_EXCEPTIONS:
    return "LimitsExceeded"
if exc_type in _OUTPUT_LIMIT_EXCEPTIONS:
    return "OutputLimitExceeded"


It's a bit weird to have both LimitsExceeded and OutputLimitExceeded. I'd propose changing LimitsExceeded to TimeLimitExceeded, and OutputLimitExceeded to SequenceLengthLimitExceeded.

More generally, I think passing literal strings around isn't really the best practice. I'm OK with it as-is, though, since everything is under experimental.
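One way to avoid passing literal strings around is a small string-valued enum, so the statuses are named constants while remaining plain strings on the wire. A minimal sketch using the renames proposed above; the exception classes and set names here are illustrative, not the PR's actual definitions:

```python
from enum import Enum


class RolloutStatus(str, Enum):
    # Names follow the renaming proposed in this review thread.
    TIME_LIMIT_EXCEEDED = "TimeLimitExceeded"
    OUTPUT_LIMIT_EXCEEDED = "OutputLimitExceeded"
    OK = "Ok"


# Illustrative exception classes standing in for the real ones.
class AgentTimeoutError(Exception): ...
class TotalResponseLengthExceededError(Exception): ...


_TIMEOUT_EXCEPTIONS = (AgentTimeoutError,)
_OUTPUT_LIMIT_EXCEPTIONS = (TotalResponseLengthExceededError,)


def classify(exc_type) -> RolloutStatus:
    """Map an exception class to a rollout status constant."""
    if exc_type in _TIMEOUT_EXCEPTIONS:
        return RolloutStatus.TIME_LIMIT_EXCEEDED
    if exc_type in _OUTPUT_LIMIT_EXCEPTIONS:
        return RolloutStatus.OUTPUT_LIMIT_EXCEEDED
    return RolloutStatus.OK
```

Because the enum subclasses `str`, existing consumers comparing against the literal strings keep working (`RolloutStatus.OUTPUT_LIMIT_EXCEEDED == "OutputLimitExceeded"` is true).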

Contributor Author

Updated


metadata = input.sample.metadata
if max_total_response_tokens is not None:
    metadata = {**metadata, "max_total_response_tokens": max_total_response_tokens}


Super nit: I think the Python style guide would recommend metadata |= {"max_total_response_tokens": max_total_response_tokens}, since this modifies the dictionary metadata in place instead of creating a new dictionary instance and assigning it to metadata.

Contributor Author

Here I would prefer to keep the copy, as it's safer.
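The trade-off under discussion can be demonstrated in isolation: `|=` mutates the dictionary that `input.sample.metadata` still refers to, while `{**metadata, ...}` leaves the original untouched. A standalone illustration (not the PR's actual code):

```python
# In-place merge: the caller's dict is mutated through the alias.
original = {"task": "swe"}
aliased = original
aliased |= {"max_total_response_tokens": 4096}
print("max_total_response_tokens" in original)  # True: the side effect leaks

# Copy-and-merge: the caller's dict is untouched.
original2 = {"task": "swe"}
merged = {**original2, "max_total_response_tokens": 4096}
print("max_total_response_tokens" in original2)  # False
print("max_total_response_tokens" in merged)     # True
```

So the copy is indeed the safer choice when the original dict is shared with the caller, at the cost of one extra allocation.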

Comment on lines +100 to +102
allowed_output = sample.response_length - overshoot
if allowed_output <= 0:
    break


Here, if I understand correctly, if the prompt part of a sample already exceeds the length limit, the entire sample is dropped. I think this is quite dangerous: if max_total_response_tokens is so small that the prompt of the first sample crosses the limit, the returned list will be empty, which crashes merge_samples in sample_utils.py.

I'm not sure whether we can do the following: if the prompt in a sample has 500 tokens, and the response in the same sample has 200 tokens, with token limit 400, then we clip the sample, keeping just the first 400 tokens in the prompt with no response.

Also, I'd recommend renaming max_total_response_tokens to max_total_sample_tokens or something similar, since we are counting both prompt and response tokens.
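The fallback proposed above could be sketched as follows; the function and its signature are hypothetical, not the PR's actual implementation:

```python
from typing import List, Tuple


def clip_to_limit(prompt: List[int], response: List[int],
                  max_seq_len: int) -> Tuple[List[int], List[int]]:
    """Clip (prompt, response) so their combined length fits max_seq_len.

    If the prompt alone meets or exceeds the limit, keep a clipped prompt
    with an empty response instead of dropping the sample entirely, so the
    caller never ends up with an empty sample list.
    """
    total = len(prompt) + len(response)
    if total <= max_seq_len:
        return prompt, response
    overshoot = total - max_seq_len
    allowed_output = len(response) - overshoot
    if allowed_output <= 0:
        # Prompt alone crosses the limit: clip the prompt, drop the response.
        return prompt[:max_seq_len], []
    return prompt, response[:allowed_output]


# The reviewer's example: 500-token prompt, 200-token response, limit 400.
p, r = clip_to_limit(list(range(500)), list(range(200)), 400)
print(len(p), len(r))  # 400 0
```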

Contributor Author

There are two parts:

  1. Here we actually collect all samples from a single trajectory and merge them later, so we will still have some data (unless the initial prompt is super long).
  2. Updated the name to max_seq_len.

@maocheng23 maocheng23 requested a review from Shi-Dong March 24, 2026 05:17
