feat: Support reasoning_content in OpenAI chat completions streaming response #100
Summary

Add support for the `reasoning_content` field in streaming responses from OpenAI-compatible APIs.

Problem
Some LLM inference engines (e.g., vLLM with reasoning models like DeepSeek-R1 or QwQ) return streaming content in the `reasoning_content` field instead of `content`. This causes the benchmark to incorrectly report `output_tokens = 1` because the actual generated text is not captured.
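For illustration, here is roughly what the two chunk shapes look like (field values are made up; only the `delta` key placement matters):

```python
# A chunk from a standard OpenAI-compatible endpoint carries the
# generated text in delta["content"]:
standard_chunk = {
    "choices": [{"index": 0, "delta": {"content": "Hello"}}]
}

# With a reasoning model (e.g., DeepSeek-R1 served by vLLM), the text
# arrives in delta["reasoning_content"] and "content" is absent or None:
reasoning_chunk = {
    "choices": [{"index": 0, "delta": {"reasoning_content": "Let me think..."}}]
}
```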
Solution

Check both the `content` and `reasoning_content` fields when processing streaming chunks, as sketched below. This maintains backward compatibility while adding support for reasoning models.
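A minimal sketch of the check (illustrative only; the actual helper and variable names in this repo may differ):

```python
def extract_delta_text(chunk: dict) -> str:
    """Return the text carried by a streaming chunk, whether the server
    put it in `content` or in `reasoning_content`."""
    choices = chunk.get("choices") or []
    if not choices:
        return ""
    delta = choices[0].get("delta") or {}
    # Prefer `content` to preserve existing behavior, then fall back to
    # `reasoning_content` for reasoning models. `or` also covers the
    # case where the field is present but None.
    return delta.get("content") or delta.get("reasoning_content") or ""
```

Because `content` is checked first, responses from standard endpoints are handled exactly as before; the fallback only kicks in when `content` is empty.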
Testing

Tested with:

- A reasoning model served via vLLM (output streamed in `reasoning_content`)
- A standard OpenAI-compatible endpoint (output streamed in `content`)

Both scenarios now correctly capture output tokens and metrics.