feat: Support logprobs for vLLM models in OpenAI Frontend
#8538
Conversation
Title changed: "logprobs for vLLM models in OpenAI API" → "logprobs for vLLM models in OpenAI Frontend"
Pull request overview
This PR adds support for logprobs (log probabilities) functionality in the OpenAI-compatible frontend for vLLM models. The feature allows users to request detailed probability information for generated tokens, which is useful for understanding model confidence and exploring alternative completions.
Key changes:
- Added logprobs support for both chat completions and standard completions endpoints
- Implemented conversion from vLLM's logprobs format to OpenAI's format
- Added comprehensive test coverage for logprobs functionality including validation and streaming
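For orientation, here is a minimal client-side sketch of how these parameters might be exercised through the OpenAI Python client; the base URL, API key, and model name are placeholders rather than values taken from this PR.

```python
# Hypothetical usage sketch -- base_url, api_key, and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9000/v1", api_key="EMPTY")

# Chat completions: logprobs is a boolean and top_logprobs selects how many
# alternatives to return per generated token (per the OpenAI API reference).
chat = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello"}],
    logprobs=True,
    top_logprobs=2,
    max_tokens=16,
)
for token_info in chat.choices[0].logprobs.content:
    print(token_info.token, token_info.logprob)

# Legacy completions: logprobs is an integer count of alternatives per token.
completion = client.completions.create(
    model="llama-3.1-8b-instruct",
    prompt="Hello",
    logprobs=2,
    max_tokens=16,
)
print(completion.choices[0].logprobs.token_logprobs)
```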
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/openai/openai_frontend/engine/utils/triton.py | Added helper functions to parse and convert logprobs from vLLM responses to OpenAI format for both chat and completion endpoints |
| python/openai/openai_frontend/engine/triton_engine.py | Integrated logprobs support into request handling, validation, and response generation for both streaming and non-streaming modes |
| python/openai/tests/test_openai_client.py | Added async tests for logprobs functionality using the OpenAI client library, including validation tests |
| python/openai/tests/test_chat_completions.py | Added HTTP-level tests for chat completions with logprobs, including edge cases and validation |
| python/openai/tests/test_completions.py | Added HTTP-level tests for completions with logprobs, including edge cases and validation |
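The triton.py helpers listed in the table above convert vLLM's per-token logprob data into the OpenAI schema. As a rough illustration only (the function name and data shapes below are assumptions, not the PR's actual implementation), such a conversion might look like this:

```python
# Illustrative sketch only -- the helper name and input shape are assumptions,
# not the code added in python/openai/openai_frontend/engine/utils/triton.py.

def to_openai_chat_logprobs(sample_logprobs):
    """Map vLLM-style per-token logprob dicts to the OpenAI chat logprobs schema.

    `sample_logprobs` is assumed to be a list with one dict per generated token,
    mapping a decoded token string to its log probability, with the sampled
    token listed first.
    """
    content = []
    for alternatives in sample_logprobs:
        items = list(alternatives.items())
        chosen_token, chosen_logprob = items[0]
        content.append(
            {
                "token": chosen_token,
                "logprob": chosen_logprob,
                "bytes": list(chosen_token.encode("utf-8")),
                "top_logprobs": [
                    {"token": tok, "logprob": lp, "bytes": list(tok.encode("utf-8"))}
                    for tok, lp in items
                ],
            }
        )
    return {"content": content}
```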
whoisj left a comment
LGTM, thanks for delivering on this @pskiran1!
What does the PR do?
This PR adds support for `logprobs` functionality in the OpenAI-compatible frontend for vLLM models. The feature allows users to request detailed probability information for generated tokens, which is useful for understanding model confidence and exploring alternative completions.

Key changes:
- Added logprobs support for both chat completions and standard completions endpoints
- Implemented conversion from vLLM's logprobs format to OpenAI's format
- Added comprehensive test coverage for logprobs functionality, including validation and streaming
Background:
https://platform.openai.com/docs/api-reference/completions/create
https://platform.openai.com/docs/api-reference/chat/create
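For reference against those API docs, a hedged HTTP-level example in the spirit of the added tests (host, port, and model name are placeholders) could stream a chat completion and read logprobs from each chunk:

```python
# Hypothetical HTTP-level example; host, port, and model are placeholders.
import json
import requests

resp = requests.post(
    "http://localhost:9000/v1/chat/completions",
    json={
        "model": "llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "Hi"}],
        "logprobs": True,
        "top_logprobs": 2,
        "stream": True,
        "max_tokens": 8,
    },
    stream=True,
)
for line in resp.iter_lines():
    # Server-sent events: each data line looks like "data: {...}" or "data: [DONE]".
    if not line or not line.startswith(b"data: ") or line == b"data: [DONE]":
        continue
    chunk = json.loads(line[len(b"data: "):])
    logprobs = chunk["choices"][0].get("logprobs")
    if logprobs and logprobs.get("content"):
        token = logprobs["content"][0]
        print(token["token"], token["logprob"])
```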
Checklist
`<commit_type>: <Title>`

Commit Type:
Check the conventional commit type box here and add the label to the github PR.
Related PRs:
Where should the reviewer start?
Test plan:
Caveats:
Background
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)