Switches generation to use vllm's OpenAI API, rather than going through vLLM directly. #1226
Conversation
vllm_config = engine_args.create_engine_config()
vllm_dims = utils.ModelDims.from_vllm_config(vllm_config)
vllm_dims.device_name = "h100"
expected_dims = MODEL_DIMS["Qwen/Qwen2.5-7B"]
These changes are needed because we changed the import order elsewhere, and so we have to change what we mock inside vllm.
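For reference, a minimal sketch of mocking the vLLM engine config in a test without importing vLLM. The mock attribute names below are illustrative assumptions, not necessarily the fields `ModelDims.from_vllm_config` actually reads, and the import paths are assumed.

```python
from unittest import mock

from open_instruct import utils                  # assumed import path
from open_instruct.test_utils import MODEL_DIMS  # assumed location of MODEL_DIMS

# Stand-in for vLLM's engine config: a MagicMock lets the test supply only the
# attributes the helper touches, with no vLLM import or real engine setup.
mock_vllm_config = mock.MagicMock()
mock_vllm_config.model_config.hf_config.hidden_size = 3584       # illustrative
mock_vllm_config.model_config.hf_config.num_hidden_layers = 28   # illustrative

vllm_dims = utils.ModelDims.from_vllm_config(mock_vllm_config)
vllm_dims.device_name = "h100"
expected_dims = MODEL_DIMS["Qwen/Qwen2.5-7B"]
# The test would then compare vllm_dims against expected_dims.
```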
top_p=args.vllm_top_p,
max_tokens=args.response_length,
include_stop_str_in_output=True,
skip_special_tokens=False,
Still a bit unsure about changing this. Shouldn't we keep special tokens and the stop str so they are included in the loss later down the line?
Yes, that was a mistake. Adding back
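For context, a minimal sketch of the two settings being discussed, shown on vLLM's `SamplingParams`; the other values are placeholders rather than the PR's actual configuration.

```python
from vllm import SamplingParams

# Keep stop strings and special tokens (e.g. <|im_end|>) in the generated text
# so they remain part of the response and can contribute to the loss later.
sampling_params = SamplingParams(
    temperature=1.0,                  # placeholder
    top_p=1.0,                        # placeholder (args.vllm_top_p in the PR)
    max_tokens=512,                   # placeholder (args.response_length in the PR)
    include_stop_str_in_output=True,
    skip_special_tokens=False,
)
```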
Bug: Tests missing required `reward_fn` parameter in function calls
The tests remove reward_fn from calls to accumulate_inference_batches, but the function signature at line 1570 still requires reward_fn: Callable as a mandatory parameter (no default value). The function also actively uses reward_fn at line 1669 with asyncio.run(reward_fn(...)). This will cause TypeError at runtime because a required positional argument is missing.
open_instruct/test_grpo_fast.py, lines 619 to 621 (commit 30de6d2):
    tokenizer=tokenizer,
    prompt_dataset=mock_dataset,
)

open_instruct/test_grpo_fast.py, lines 665 to 667 (commit 30de6d2):
    tokenizer=tokenizer,
    prompt_dataset=mock_dataset,
)

open_instruct/test_grpo_fast.py, lines 858 to 860 (commit 30de6d2):
    tokenizer=tokenizer,
    prompt_dataset=mock_dataset,
    filter_zero_std_samples=True,
Bug: Tests call function without required reward_fn parameter
The test changes remove the reward_fn parameter from calls to accumulate_inference_batches, but the function still declares reward_fn: Callable as a required positional parameter (no default value) and still uses it internally to compute scores. This will cause a TypeError: missing 1 required positional argument: 'reward_fn' when tests run.
open_instruct/test_grpo_fast.py, lines 619 to 620 (commit 794f5bf):
    tokenizer=tokenizer,
    prompt_dataset=mock_dataset,

open_instruct/test_grpo_fast.py, lines 665 to 666 (commit 794f5bf):
    tokenizer=tokenizer,
    prompt_dataset=mock_dataset,

open_instruct/test_grpo_fast.py, lines 858 to 859 (commit 794f5bf):
    tokenizer=tokenizer,
    prompt_dataset=mock_dataset,
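A simplified, hypothetical sketch of the mismatch Bugbot reports: the real `accumulate_inference_batches` in `grpo_fast.py` takes many more arguments, but the point is that `reward_fn` has no default and is still called internally, so test calls that omit it raise a `TypeError`. Supplying a stub reward function is one way the tests could satisfy the current signature.

```python
import asyncio
from typing import Callable


def accumulate_inference_batches(tokenizer, prompt_dataset, reward_fn: Callable):
    # Hypothetical stand-in: the real function also accumulates batches,
    # filters samples, etc. Here we only show that reward_fn is required.
    return asyncio.run(reward_fn(prompt_dataset))


async def fake_reward_fn(batch):
    # Stub reward: one zero score per prompt.
    return [0.0 for _ in batch]


# As in the updated tests, omitting reward_fn fails:
try:
    accumulate_inference_batches(tokenizer=None, prompt_dataset=["p1", "p2"])
except TypeError as err:
    print(err)  # ... missing 1 required positional argument: 'reward_fn'

# Passing a stub keeps the call compatible with the required signature:
print(accumulate_inference_batches(None, ["p1", "p2"], reward_fn=fake_reward_fn))
```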
This will make it easier to switch out API providers in the future (e.g. SGLang) and will enable us to use vLLM's native tool parsing (in a subsequent PR).
Runs:
Note
Switches generation to an internal vLLM OpenAI API server with new SamplingConfig/RequestOutput flow, updating GRPO pipeline and tests accordingly.
- Use `openai.AsyncOpenAI` for completions with health checks and backoff (see the sketches after this list).
- Add `SamplingConfig`, `CompletionOutput`, and `RequestOutput`; update `process_completed_request` and the tooling flow to use them.
- Add `truncate_tool_output_tokens(tokens, current_prompt_len, current_response_len, max_model_len, max_tokens)` and integrate it into tool handling (sketched below).
- Serve the OpenAI-compatible API in-process (`build_app`/`serve_http`), create the client, manage seeds, accumulate outputs, and queue results.
- Add `_create_server_args`, improve `_should_stop`, and add KV-cache concurrency retrieval and bundle placement helpers.
- Replace `vllm.SamplingParams` with `vllm_utils.SamplingConfig`; create train/eval configs via `dataclasses.replace` (sketched below).
- Add `test_vllm_utils.py` covering token truncation and `process_completed_request` with and without tools.
- Update `test_utils.py` to mock the vLLM config directly (no vLLM import) for `ModelDims.from_vllm_config`; remove unused imports.

Written by Cursor Bugbot for commit 10b2a9c. This will update automatically on new commits.
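A few hedged sketches of the pieces summarized above. First, the client-side generation path: an `openai.AsyncOpenAI` client pointed at a locally served vLLM OpenAI-compatible endpoint, with a simple readiness check and exponential backoff. The URL, model name, and retry policy are assumptions for illustration, not the PR's exact values.

```python
import asyncio

from openai import APIConnectionError, AsyncOpenAI

# Assumed local endpoint for the in-process vLLM OpenAI-compatible server.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


async def wait_until_healthy(max_attempts: int = 10) -> None:
    # Readiness probe with exponential backoff: listing models succeeds once
    # the server is up and the model is loaded.
    for attempt in range(max_attempts):
        try:
            await client.models.list()
            return
        except APIConnectionError:
            await asyncio.sleep(min(2 ** attempt, 30))
    raise RuntimeError("vLLM OpenAI server did not become healthy")


async def generate(prompt: str) -> str:
    resp = await client.completions.create(
        model="Qwen/Qwen2.5-7B",  # illustrative model name
        prompt=prompt,
        max_tokens=256,           # illustrative
    )
    return resp.choices[0].text


async def main() -> None:
    await wait_until_healthy()
    print(await generate("What is GRPO?"))


if __name__ == "__main__":
    asyncio.run(main())
```

Second, a hypothetical reading of `truncate_tool_output_tokens`, based only on the signature given above: trim tool-output tokens so the sequence stays within both the model context window and the response budget. The PR's actual implementation may differ.

```python
def truncate_tool_output_tokens(
    tokens: list[int],
    current_prompt_len: int,
    current_response_len: int,
    max_model_len: int,
    max_tokens: int,
) -> list[int]:
    # Room left in the context window and in the per-response token budget.
    remaining_context = max_model_len - current_prompt_len - current_response_len
    remaining_response = max_tokens - current_response_len
    budget = max(0, min(remaining_context, remaining_response))
    return tokens[:budget]
```

Finally, the `dataclasses.replace` pattern for deriving an eval sampling config from the train config. The `SamplingConfig` fields here are hypothetical, not the actual `vllm_utils.SamplingConfig` definition.

```python
import dataclasses


@dataclasses.dataclass
class SamplingConfig:
    # Hypothetical fields, for illustration only.
    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: int = 512
    n: int = 1


train_config = SamplingConfig(temperature=1.0, top_p=0.95, max_tokens=512, n=4)
# Evaluation reuses the training config, overriding only what differs.
eval_config = dataclasses.replace(train_config, temperature=0.0, n=1)
```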