
Conversation

@finbarrtimbers finbarrtimbers (Collaborator) commented Nov 24, 2025

This will make it easier to switch out API providers in the future (e.g. SGLang) and will enable us to use vLLM's native tool parsing (in a subsequent PR).

Runs:


Note

Switches generation to an internal vLLM OpenAI API server with new SamplingConfig/RequestOutput flow, updating GRPO pipeline and tests accordingly.

  • Inference pipeline (vllm_utils):
    • Start an embedded vLLM OpenAI API server per actor and use openai.AsyncOpenAI for completions with health checks and backoff (a client-side sketch follows this note).
    • Add dataclasses: SamplingConfig, CompletionOutput, RequestOutput; update process_completed_request and the tooling flow to use them.
    • Implement truncate_tool_output_tokens(tokens, current_prompt_len, current_response_len, max_model_len, max_tokens) and integrate it into tool handling (sketched, together with the new dataclasses, after this note).
    • Revise actor init and request processing: build server (build_app/serve_http), create client, manage seeds, accumulate outputs, and queue results.
    • Misc: expose _create_server_args, improve _should_stop, KV-cache concurrency retrieval, and bundle placement helpers.
  • Trainer integration (grpo_fast.py):
    • Replace vllm.SamplingParams with vllm_utils.SamplingConfig; create train/eval configs via dataclasses.replace.
    • Update accumulation to accept new generation config type.
  • Tests:
    • Add test_vllm_utils.py covering token truncation and process_completed_request with/without tools.
    • Update test_utils.py to mock vLLM config directly (no vLLM import) for ModelDims.from_vllm_config; remove unused imports.

Written by Cursor Bugbot for commit 10b2a9c. This will update automatically on new commits.
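For the embedded-server item above: a minimal client-side sketch of querying an OpenAI-compatible endpoint with openai.AsyncOpenAI plus simple backoff. The port, model name, retry budget, and sampling values are assumptions for illustration, not values from this PR.

```python
import asyncio

import openai

# Point the standard OpenAI client at a local OpenAI-compatible server
# (e.g. the embedded vLLM server each actor starts). Address is assumed.
client = openai.AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


async def generate(prompt: str) -> str:
    # Retry with exponential backoff while the server is still coming up.
    for attempt in range(5):
        try:
            response = await client.completions.create(
                model="Qwen/Qwen2.5-7B", prompt=prompt, max_tokens=256, temperature=1.0
            )
            return response.choices[0].text
        except openai.APIConnectionError:
            await asyncio.sleep(2**attempt)
    raise RuntimeError("vLLM server did not become healthy in time")
```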
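For the SamplingConfig/CompletionOutput/RequestOutput and truncate_tool_output_tokens items: a sketch of the rough shape these could take. Field names beyond those visible in the diff excerpts, and the exact truncation rule, are assumptions rather than the actual open_instruct implementation.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class SamplingConfig:
    # Provider-agnostic sampling settings; the field set is assumed.
    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: int = 256
    n: int = 1
    seed: Optional[int] = None
    stop: Optional[List[str]] = None
    include_stop_str_in_output: bool = True
    skip_special_tokens: bool = False


@dataclasses.dataclass
class CompletionOutput:
    # One sampled completion: token IDs, decoded text, finish metadata (assumed).
    token_ids: List[int]
    text: str
    finish_reason: Optional[str] = None


@dataclasses.dataclass
class RequestOutput:
    # All completions produced for one prompt (assumed shape).
    request_id: str
    prompt_token_ids: List[int]
    outputs: List[CompletionOutput]


def truncate_tool_output_tokens(
    tokens: List[int],
    current_prompt_len: int,
    current_response_len: int,
    max_model_len: int,
    max_tokens: int,
) -> List[int]:
    # Assumed behavior: keep only as many tool-output tokens as fit within
    # both the model context window and the remaining response budget.
    remaining_context = max_model_len - current_prompt_len - current_response_len
    remaining_response = max_tokens - current_response_len
    budget = max(0, min(remaining_context, remaining_response))
    return tokens[:budget]


# Train/eval configs derived from one base via dataclasses.replace, mirroring
# the grpo_fast.py change described above (values are illustrative).
train_config = SamplingConfig(temperature=1.0, top_p=0.95, max_tokens=512, n=4)
eval_config = dataclasses.replace(train_config, temperature=0.0, n=1)
```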

@finbarrtimbers finbarrtimbers self-assigned this Nov 24, 2025
vllm_config = engine_args.create_engine_config()
vllm_dims = utils.ModelDims.from_vllm_config(vllm_config)
vllm_dims.device_name = "h100"
expected_dims = MODEL_DIMS["Qwen/Qwen2.5-7B"]
@finbarrtimbers (Collaborator, Author) commented:

These changes are needed because we changed the import order elsewhere, so we have to change what we mock inside vllm.
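A minimal sketch of the "mock the vLLM config directly" approach the summary describes, assuming ModelDims.from_vllm_config only reads plain attributes off the config object. The attribute names set on the mock are illustrative, not the real vLLM config schema.

```python
from unittest import mock

from open_instruct import utils

# Build a stand-in config instead of importing vllm and calling
# engine_args.create_engine_config(). MagicMock auto-creates any nested
# attribute that from_vllm_config reads; the two set explicitly below are
# illustrative only.
fake_vllm_config = mock.MagicMock()
fake_vllm_config.model_config.model = "Qwen/Qwen2.5-7B"
fake_vllm_config.model_config.max_model_len = 4096

vllm_dims = utils.ModelDims.from_vllm_config(fake_vllm_config)
vllm_dims.device_name = "h100"
```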

top_p=args.vllm_top_p,
max_tokens=args.response_length,
include_stop_str_in_output=True,
skip_special_tokens=False,
A collaborator commented:

Still a bit unsure about changing this. Shouldn't we keep special tokens and the stop str so they are included in the loss later down the line?
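To make the concern concrete, a standalone illustration with a Hugging Face tokenizer (not code from this PR): if special tokens are stripped during detokenization, re-tokenizing the text later cannot recover the EOS token for the loss.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
ids = tok("Hello")["input_ids"] + [tok.eos_token_id]

print(tok.decode(ids, skip_special_tokens=True))   # "Hello" -- EOS text dropped
print(tok.decode(ids, skip_special_tokens=False))  # "Hello<|endoftext|>" -- EOS text kept
```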

@finbarrtimbers finbarrtimbers (Collaborator, Author) commented Nov 26, 2025 via email

@cursor cursor bot left a comment

Bug: Tests missing required `reward_fn` parameter in function calls

The tests remove reward_fn from calls to accumulate_inference_batches, but the function signature at line 1570 still requires reward_fn: Callable as a mandatory parameter (no default value). The function also actively uses reward_fn at line 1669 with asyncio.run(reward_fn(...)). This will cause a TypeError at runtime because a required positional argument is missing.

open_instruct/test_grpo_fast.py#L619-L621

tokenizer=tokenizer,
prompt_dataset=mock_dataset,
)

open_instruct/test_grpo_fast.py#L665-L667

tokenizer=tokenizer,
prompt_dataset=mock_dataset,
)

open_instruct/test_grpo_fast.py#L858-L860

tokenizer=tokenizer,
prompt_dataset=mock_dataset,
filter_zero_std_samples=True,
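The mismatch the bot describes, reduced to a minimal sketch. The simplified signatures and the optional-argument workaround below are illustrative only; the actual fix might instead be to keep passing reward_fn at these call sites.

```python
import asyncio
from typing import Callable, Optional


# Simplified stand-in for the current signature: reward_fn is still required.
def accumulate_current(batch, reward_fn: Callable):
    return asyncio.run(reward_fn(batch))


# accumulate_current(batch)  # TypeError: missing 1 required positional argument: 'reward_fn'


# One way to reconcile the new test call sites: make the argument optional
# and skip scoring when it is absent.
def accumulate_sketch(batch, reward_fn: Optional[Callable] = None):
    scores = asyncio.run(reward_fn(batch)) if reward_fn is not None else None
    return batch, scores
```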



@cursor cursor bot left a comment

Bug: Tests call function without required reward_fn parameter

The test changes remove the reward_fn parameter from calls to accumulate_inference_batches, but the function still declares reward_fn: Callable as a required positional parameter (no default value) and still uses it internally to compute scores. This will cause a TypeError: missing 1 required positional argument: 'reward_fn' when tests run.

open_instruct/test_grpo_fast.py#L619-L620

tokenizer=tokenizer,
prompt_dataset=mock_dataset,

open_instruct/test_grpo_fast.py#L665-L666

tokenizer=tokenizer,
prompt_dataset=mock_dataset,

open_instruct/test_grpo_fast.py#L858-L859

tokenizer=tokenizer,
prompt_dataset=mock_dataset,



@finbarrtimbers finbarrtimbers added this pull request to the merge queue Dec 4, 2025
Merged via the queue into main with commit 70beeac Dec 4, 2025
6 checks passed