
Conversation

@finbarrtimbers finbarrtimbers (Collaborator) commented Nov 24, 2025

This will make it easier to switch out API providers in the future (e.g. SGLang) and will enable us to use vLLM's native tool parsing (in a subsequent PR).

Runs:


Note

Switches generation to an internal vLLM OpenAI API server with new SamplingConfig/RequestOutput flow, updating GRPO pipeline and tests accordingly.

  • Inference pipeline (vllm_utils):
    • Start an embedded vLLM OpenAI API server per actor and use openai.AsyncOpenAI for completions with health checks and backoff (a client-side sketch follows this note).
    • Add dataclasses: SamplingConfig, CompletionOutput, RequestOutput; update process_completed_request and the tooling flow to use them.
    • Implement truncate_tool_output_tokens(tokens, current_prompt_len, current_response_len, max_model_len, max_tokens) and integrate it into tool handling (sketched, together with the new dataclasses, after this note).
    • Revise actor init and request processing: build server (build_app/serve_http), create client, manage seeds, accumulate outputs, and queue results.
    • Misc: expose _create_server_args, improve _should_stop, KV-cache concurrency retrieval, and bundle placement helpers.
  • Trainer integration (grpo_fast.py):
    • Replace vllm.SamplingParams with vllm_utils.SamplingConfig; create train/eval configs via dataclasses.replace.
    • Update accumulation to accept new generation config type.
  • Tests:
    • Add test_vllm_utils.py covering token truncation and process_completed_request with/without tools.
    • Update test_utils.py to mock vLLM config directly (no vLLM import) for ModelDims.from_vllm_config; remove unused imports.

Written by Cursor Bugbot for commit 10b2a9c. This will update automatically on new commits.
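For the embedded-server item above: a minimal client-side sketch of querying an OpenAI-compatible endpoint with openai.AsyncOpenAI plus simple backoff. The port, model name, retry budget, and sampling values are assumptions for illustration, not values from this PR.

```python
import asyncio

import openai

# Point the standard OpenAI client at a local OpenAI-compatible server
# (e.g. the embedded vLLM server each actor starts). Address is assumed.
client = openai.AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


async def generate(prompt: str) -> str:
    # Retry with exponential backoff while the server is still coming up.
    for attempt in range(5):
        try:
            response = await client.completions.create(
                model="Qwen/Qwen2.5-7B", prompt=prompt, max_tokens=256, temperature=1.0
            )
            return response.choices[0].text
        except openai.APIConnectionError:
            await asyncio.sleep(2**attempt)
    raise RuntimeError("vLLM server did not become healthy in time")
```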
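For the SamplingConfig/CompletionOutput/RequestOutput and truncate_tool_output_tokens items: a sketch of the rough shape these could take. Field names beyond those visible in the diff excerpts, and the exact truncation rule, are assumptions rather than the actual open_instruct implementation.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class SamplingConfig:
    # Provider-agnostic sampling settings; the field set is assumed.
    temperature: float = 1.0
    top_p: float = 1.0
    max_tokens: int = 256
    n: int = 1
    seed: Optional[int] = None
    stop: Optional[List[str]] = None
    include_stop_str_in_output: bool = True
    skip_special_tokens: bool = False


@dataclasses.dataclass
class CompletionOutput:
    # One sampled completion: token IDs, decoded text, finish metadata (assumed).
    token_ids: List[int]
    text: str
    finish_reason: Optional[str] = None


@dataclasses.dataclass
class RequestOutput:
    # All completions produced for one prompt (assumed shape).
    request_id: str
    prompt_token_ids: List[int]
    outputs: List[CompletionOutput]


def truncate_tool_output_tokens(
    tokens: List[int],
    current_prompt_len: int,
    current_response_len: int,
    max_model_len: int,
    max_tokens: int,
) -> List[int]:
    # Assumed behavior: keep only as many tool-output tokens as fit within
    # both the model context window and the remaining response budget.
    remaining_context = max_model_len - current_prompt_len - current_response_len
    remaining_response = max_tokens - current_response_len
    budget = max(0, min(remaining_context, remaining_response))
    return tokens[:budget]


# Train/eval configs derived from one base via dataclasses.replace, mirroring
# the grpo_fast.py change described above (values are illustrative).
train_config = SamplingConfig(temperature=1.0, top_p=0.95, max_tokens=512, n=4)
eval_config = dataclasses.replace(train_config, temperature=0.0, n=1)
```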

@finbarrtimbers finbarrtimbers self-assigned this Nov 24, 2025
vllm_config = engine_args.create_engine_config()
vllm_dims = utils.ModelDims.from_vllm_config(vllm_config)
vllm_dims.device_name = "h100"
expected_dims = MODEL_DIMS["Qwen/Qwen2.5-7B"]
@finbarrtimbers (Collaborator, Author) commented:

These changes are needed because we changed the import order elsewhere, so we have to change what we mock inside vllm.
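A minimal sketch of the "mock the vLLM config directly" approach the summary describes, assuming ModelDims.from_vllm_config only reads plain attributes off the config object. The attribute names set on the mock are illustrative, not the real vLLM config schema.

```python
from unittest import mock

from open_instruct import utils

# Build a stand-in config instead of importing vllm and calling
# engine_args.create_engine_config(). MagicMock auto-creates any nested
# attribute that from_vllm_config reads; the two set explicitly below are
# illustrative only.
fake_vllm_config = mock.MagicMock()
fake_vllm_config.model_config.model = "Qwen/Qwen2.5-7B"
fake_vllm_config.model_config.max_model_len = 4096

vllm_dims = utils.ModelDims.from_vllm_config(fake_vllm_config)
vllm_dims.device_name = "h100"
```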

top_p=args.vllm_top_p,
max_tokens=args.response_length,
include_stop_str_in_output=True,
skip_special_tokens=False,
A collaborator commented:

Still a bit unsure about changing this. Shouldn't we keep special tokens and the stop str so they are included in the loss later down the line?
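To make the concern concrete, a standalone illustration with a Hugging Face tokenizer (not code from this PR): if special tokens are stripped during detokenization, re-tokenizing the text later cannot recover the EOS token for the loss.

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
ids = tok("Hello")["input_ids"] + [tok.eos_token_id]

print(tok.decode(ids, skip_special_tokens=True))   # "Hello" -- EOS text dropped
print(tok.decode(ids, skip_special_tokens=False))  # "Hello<|endoftext|>" -- EOS text kept
```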

@finbarrtimbers finbarrtimbers (Collaborator, Author) commented Nov 26, 2025 via email

@cursor cursor bot left a comment

Bug: Tests missing required `reward_fn` parameter in function calls

The tests remove reward_fn from calls to accumulate_inference_batches, but the function signature at line 1570 still requires reward_fn: Callable as a mandatory parameter (no default value). The function also actively uses reward_fn at line 1669 with asyncio.run(reward_fn(...)). This will cause a TypeError at runtime because a required positional argument is missing.

open_instruct/test_grpo_fast.py#L619-L621

tokenizer=tokenizer,
prompt_dataset=mock_dataset,
)

open_instruct/test_grpo_fast.py#L665-L667

tokenizer=tokenizer,
prompt_dataset=mock_dataset,
)

open_instruct/test_grpo_fast.py#L858-L860

tokenizer=tokenizer,
prompt_dataset=mock_dataset,
filter_zero_std_samples=True,
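The mismatch the bot describes, reduced to a minimal sketch. The simplified signatures and the optional-argument workaround below are illustrative only; the actual fix might instead be to keep passing reward_fn at these call sites.

```python
import asyncio
from typing import Callable, Optional


# Simplified stand-in for the current signature: reward_fn is still required.
def accumulate_current(batch, reward_fn: Callable):
    return asyncio.run(reward_fn(batch))


# accumulate_current(batch)  # TypeError: missing 1 required positional argument: 'reward_fn'


# One way to reconcile the new test call sites: make the argument optional
# and skip scoring when it is absent.
def accumulate_sketch(batch, reward_fn: Optional[Callable] = None):
    scores = asyncio.run(reward_fn(batch)) if reward_fn is not None else None
    return batch, scores
```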



@cursor cursor bot left a comment

Bug: Tests call function without required reward_fn parameter

The test changes remove the reward_fn parameter from calls to accumulate_inference_batches, but the function still declares reward_fn: Callable as a required positional parameter (no default value) and still uses it internally to compute scores. This will cause a TypeError: missing 1 required positional argument: 'reward_fn' when tests run.

open_instruct/test_grpo_fast.py#L619-L620

tokenizer=tokenizer,
prompt_dataset=mock_dataset,

open_instruct/test_grpo_fast.py#L665-L666

tokenizer=tokenizer,
prompt_dataset=mock_dataset,

open_instruct/test_grpo_fast.py#L858-L859

tokenizer=tokenizer,
prompt_dataset=mock_dataset,



@finbarrtimbers finbarrtimbers added this pull request to the merge queue Dec 4, 2025
Merged via the queue into main with commit 70beeac Dec 4, 2025
6 checks passed