[Feature] Qwen3 Reranker #695
@sigridjineth Thanks for the great work! Excited to see if this will get merged.
I wonder if it would be a simpler code change to support the model as a SequenceClassificationModel, as mentioned in this discussion?
I will try to work on #698.
Mark
I tried this PR with a "converted-to-classifier" Qwen3-reranker-0.6B and it literally explodes during warm-up, trying to allocate more than 80 GB (tested on the Metal version on my Mac). Is there something obvious I may have missed?

What does this PR do?
This PR adds support for Qwen3 reranker models to `text-embeddings-inference` (Issue: #643).
These models function as binary classifiers that determine the relevance between a query and a document. They output a single probability score, which makes them well suited to re-ranking search results.
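To illustrate how a "yes"/"no" classifier can yield one relevance score, here is a minimal Python sketch. It assumes the score is a softmax restricted to the "yes" and "no" logits at the final position, which this description does not spell out. The token IDs are the ones quoted in this PR; the function names `relevance_score` and `rerank` are hypothetical and are not part of `text-embeddings-inference`.

```python
import math
from typing import Mapping, Sequence

# Token IDs quoted in this PR description.
YES_TOKEN_ID = 9693
NO_TOKEN_ID = 2152


def relevance_score(last_token_logits: Mapping[int, float]) -> float:
    """Return P("yes") under a softmax over only the yes/no logits.

    A two-way softmax equals sigmoid(yes - no); the branches below
    avoid overflow in math.exp for large logit differences.
    """
    diff = last_token_logits[YES_TOKEN_ID] - last_token_logits[NO_TOKEN_ID]
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def rerank(
    documents: Sequence[str],
    logits_per_document: Sequence[Mapping[int, float]],
) -> list[tuple[float, str]]:
    """Pair each document with its score and sort by descending relevance."""
    scored = [
        (relevance_score(logits), doc)
        for doc, logits in zip(documents, logits_per_document)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In this sketch the logits are passed as a mapping from token ID to value so that the example stays small; a full vocabulary-sized list indexed by token ID would work the same way.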
Key Changes
- `ListwiseReranker` model type to properly distinguish these models from standard cross-encoder models.
- `predict` method to extract the logits for "yes" and "no" tokens.
- `is_reranker` flag is set in the model's config.
Technical Details
- "yes" (ID: 9693) and "no" (ID: 2152) tokens.
Who can review?
Anyone in the community is welcome to review the PR once the tests have passed. Feel free to tag anyone who might be interested.
@OlivierDehaene or @Narsil