feat(vllm): add local vLLM OpenAI-server backend and GPU tests #235
Merged
prabhuteja12 merged 2 commits into main on May 11, 2026
Conversation
Force-pushed from a06bd98 to 43f55e2
Force-pushed from 43f55e2 to e6061b9
prabhuteja12 approved these changes on May 11, 2026
Introduce `VLLMLocalServerOpenAIModel`, which launches `vllm serve` locally and routes eval requests through the existing `OpenAIModel(base_url=...)` HTTP client, providing a recordable interface boundary. Add session-scoped GPU tests covering batching, stop sequences, and parameter overrides, and document the new CLI usage.
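The launch-and-route flow described above can be sketched as follows. This is an illustrative sketch, not the PR's actual implementation: the function names (`vllm_serve_command`, `wait_until_ready`, `launch_vllm_server`) and the default port are assumptions; only the `vllm serve` CLI, its OpenAI-compatible `/v1` endpoint, and its `/health` probe come from vLLM itself.

```python
import subprocess
import time
import urllib.request


def vllm_serve_command(model: str, port: int = 8000) -> list[str]:
    # Build the `vllm serve` command line for a local OpenAI-compatible server.
    return ["vllm", "serve", model, "--port", str(port)]


def wait_until_ready(base_url: str, timeout: float = 120.0) -> bool:
    # Poll the server's /health endpoint until it responds or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2):
                return True
        except OSError:
            time.sleep(1.0)
    return False


def launch_vllm_server(model: str, port: int = 8000) -> subprocess.Popen:
    # Start `vllm serve` as a child process. Once ready, any OpenAI-compatible
    # HTTP client pointed at http://localhost:<port>/v1 (e.g. the existing
    # OpenAIModel with base_url set) can send requests to it, which is what
    # makes the boundary recordable.
    proc = subprocess.Popen(vllm_serve_command(model, port))
    if not wait_until_ready(f"http://localhost:{port}"):
        proc.terminate()
        raise RuntimeError("vLLM server did not become ready in time")
    return proc
```

Routing through a plain HTTP client rather than in-process vLLM calls means request/response traffic can be captured and replayed without a GPU present.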
Make token counting optional: if tiktoken can’t map the model name, skip local tokenization and rely on API usage (or return None for sequence-position metadata). Avoid misleading fallback encodings, skip concat-compression when no encoder is available, and fail logprobs() explicitly without a local encoder.
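The optional-token-counting behaviour might look like the sketch below. It is a hedged illustration under assumptions (the helper name `count_tokens` is invented); the only real API used is `tiktoken.encoding_for_model`, which raises `KeyError` for model names it cannot map.

```python
from typing import Optional


def count_tokens(model_name: str, text: str) -> Optional[int]:
    # Return a local token count, or None when no local encoder is available,
    # in which case callers should rely on API-reported usage instead of a
    # misleading fallback encoding.
    try:
        import tiktoken
    except ImportError:
        return None  # tiktoken not installed: skip local tokenization
    try:
        enc = tiktoken.encoding_for_model(model_name)
    except KeyError:
        return None  # unknown model name: no encoder, do not guess one
    return len(enc.encode(text))
```

Under this shape, features that require an encoder (concat-compression, `logprobs()`) can check for `None` and skip or fail explicitly rather than silently using the wrong tokenizer.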
Force-pushed from e6061b9 to 80ab678
Add local vLLM OpenAI-server backend and GPU tests
Introduce `VLLMLocalServerOpenAIModel` that launches `vllm serve` locally and routes eval requests through the existing `OpenAIModel(base_url=...)` HTTP client to enable a recordable interface boundary. Add session-scoped GPU tests covering batching, stop sequences, and parameter overrides, and document the new CLI usage.

PR Checklist
(/docs/).

What type of PR is this? (check all applicable)
Description
Related Tickets & Documents
QA Instructions, Screenshots, Recordings
Please replace this line with instructions on how to test your changes, a note
on the hardware and config this has been tested on, as well as any relevant
additional information.
Added/updated tests?
Tests have not been included.
[optional] Are there any post deployment tasks we need to perform?