Summary

Add support for reasoning_content field in streaming responses from OpenAI-compatible APIs.

Problem

Some LLM inference engines (e.g., vLLM with reasoning models like DeepSeek-R1 or QwQ)
return streaming content in the reasoning_content field instead of content.
This causes the benchmark to incorrectly report output_tokens = 1 because
the actual generated text is not captured.
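
For illustration, the two delta shapes look roughly like this (values are invented; the field layout follows the OpenAI chat-completions streaming format that vLLM's server mirrors):

```python
# Illustrative only: approximate shape of a streamed delta from each kind of
# server (values are made up).

# Standard OpenAI-compatible server: generated text arrives in `content`.
standard_delta = {"role": "assistant", "content": "Hello"}

# vLLM serving a reasoning model: the text streams in `reasoning_content`
# while `content` stays empty, so a benchmark reading only `content`
# captures almost nothing.
reasoning_delta = {
    "role": "assistant",
    "content": None,
    "reasoning_content": "Let me think about this...",
}
```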

Solution

Check both content and reasoning_content fields when processing streaming chunks.
This maintains backward compatibility while adding support for reasoning models.
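
Below is a minimal sketch of that fallback, assuming the benchmark reads the stream through the official openai Python client; the base URL, model name, and prompt are placeholders, and reasoning_content is accessed defensively with getattr because only reasoning-capable servers emit it:

```python
from openai import OpenAI

# Placeholder endpoint and model; point these at the server under test.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="deepseek-r1",  # placeholder model name
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
)

output_chunks = []
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # Prefer the standard `content` field; fall back to `reasoning_content`,
    # which vLLM uses when streaming a reasoning model's output.
    text = delta.content or getattr(delta, "reasoning_content", None)
    if text:
        output_chunks.append(text)

output_text = "".join(output_chunks)
print(f"captured {len(output_text)} characters of output")
```

Because content is checked first, standard servers behave exactly as before; the fallback only takes effect when content is empty and reasoning_content is present.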

Testing

Tested with:

  • vLLM serving a 120B reasoning model (uses reasoning_content)
  • Ollama serving llama3:70b (uses standard content)

Both scenarios now correctly capture output tokens and metrics.
