Add Qwen3 Bidirectional encoder (voyage-4-nano support) #66
Open
agsuy wants to merge 1 commit into
New `qwen3_bidirec.py` module mirroring `llama_bidirec.py` for the Qwen3 architecture: same transformer building blocks as `qwen3.py` (reuses `Qwen3DecoderLayer`) but swaps the autoregressive causal mask for full bidirectional attention, mean-pools rather than last-token-pools, and optionally appends a `nn.Linear(hidden_size, num_labels)` projection head before pooling (used for Matryoshka outputs).

Motivating model: `voyageai/voyage-4-nano` (Voyage's first open-weights embedding model, Apache 2.0): Qwen3 base, 340M params, 2048d Matryoshka head. Upstream config declares `"model_type": "qwen3"` with `"use_bidirectional_attention": true` and `"num_labels": 2048`. The existing `qwen3.py` rejects it because `Model` has no slot for the projection weight (`linear.weight`).

Changes:

* `mlx_embeddings/models/qwen3_bidirec.py` (new): bidirectional Qwen3 with optional Matryoshka projection. Reuses `Qwen3DecoderLayer` and `mean_pooling`; no new runtime dependencies.
* `mlx_embeddings/utils.py` `_get_model_arch`: when `model_type == "qwen3"` and `use_bidirectional_attention=True`, route to the new module. The existing `qwen3.py` is untouched and continues to serve models that don't set the flag (e.g. `mlx-community/Qwen3-Embedding-0.6B-8bit`).
* `README.md`: list the new architecture.

See PR description for step-by-step validation and downstream benchmark numbers.
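To make the difference between the decoder and encoder paths concrete, here is a minimal pure-Python sketch of the two attention masks and the two pooling strategies named above. It is illustrative only: the function names (`causal_mask`, `bidirectional_mask`, `last_token_pool`, `mean_pool`) are hypothetical and are not the repository's MLX implementations, and plain lists stand in for arrays.

```python
from typing import List

Vector = List[float]


def causal_mask(seq_len: int) -> List[List[bool]]:
    """Decoder-style mask: query i may attend only to keys j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def bidirectional_mask(padding: List[int]) -> List[List[bool]]:
    """Encoder-style mask: every query may attend to every non-padding key."""
    n = len(padding)
    return [[bool(padding[j]) for j in range(n)] for _ in range(n)]


def last_token_pool(hidden: List[Vector], padding: List[int]) -> Vector:
    """Return the hidden state of the last non-padding token."""
    last = max(i for i, keep in enumerate(padding) if keep)
    return hidden[last]


def mean_pool(hidden: List[Vector], padding: List[int]) -> Vector:
    """Average the hidden states of all non-padding tokens."""
    kept = [h for h, keep in zip(hidden, padding) if keep]
    dim = len(hidden[0])
    return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]


# Toy input: three positions, the last one is padding (mask value 0).
hidden_states = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
padding_mask = [1, 1, 0]

pooled_mean = mean_pool(hidden_states, padding_mask)
pooled_last = last_token_pool(hidden_states, padding_mask)
```

In this toy input the padded third position is ignored by both pooling functions; mean pooling averages the first two rows, while last-token pooling returns only the second row.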
## Summary

Adds a new `qwen3_bidirec.py` model module that supports Qwen3-based bidirectional encoder embedding models, most notably voyageai/voyage-4-nano, Voyage AI's first open-weights release (Apache 2.0, January 2026, 340M params, 2048d Matryoshka head, 8K context).

Mirrors the existing `llama_bidirec.py` pattern: a separate module file alongside `qwen3.py`, dispatched via `model_type` in the HF config. Same idea, Qwen3 internals.

## What this is solving
The upstream voyage-4-nano config declares:

```json
{
  "model_type": "qwen3",
  "use_bidirectional_attention": true,
  "num_labels": 2048
}
```

The current `qwen3.py` Model class is decoder-only: causal mask, last-token pooling, no projection head. Trying to load voyage-4-nano (either via `mlx_embeddings.convert` or `load`) fails with:

…because the upstream HF module `Qwen3BidirectionalModel` stores its Matryoshka projection at `self.linear` (top-level, alongside `self.model`), and the existing `qwen3.Model` has no slot for it.

## What changed
* `mlx_embeddings/models/qwen3_bidirec.py` (new): reuses `Qwen3DecoderLayer` from `qwen3.py` and `mean_pooling` from `pooling.py`.
* `mlx_embeddings/utils.py` (`_get_model_arch`): when `model_type == "qwen3"` AND config has `use_bidirectional_attention=True`, route to `qwen3_bidirec`. Otherwise route as before.
* `README.md`: lists the new architecture.

No new runtime dependencies.
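The two-condition routing rule for `_get_model_arch` can be sketched roughly as below. This is a simplified stand-in, not the actual code in `utils.py`: `resolve_arch` is a hypothetical name, it takes the parsed `config.json` as a plain dict, and it returns only the module name instead of importing anything.

```python
from typing import Any, Dict


def resolve_arch(config: Dict[str, Any]) -> str:
    """Hypothetical stand-in for the routing decision; returns a module name."""
    model_type = config.get("model_type", "")
    # Route to the bidirectional module only when BOTH conditions hold.
    if model_type == "qwen3" and config.get("use_bidirectional_attention", False) is True:
        return "qwen3_bidirec"
    # Every other config keeps routing on model_type alone, as before.
    return model_type


voyage_config = {
    "model_type": "qwen3",
    "use_bidirectional_attention": True,
    "num_labels": 2048,
}
plain_qwen3_config = {"model_type": "qwen3"}

voyage_arch = resolve_arch(voyage_config)
plain_arch = resolve_arch(plain_qwen3_config)
```

Under these assumptions the voyage-style config resolves to `qwen3_bidirec`, while a Qwen3 config without the flag still resolves to `qwen3`, which is the property the regression check is meant to protect.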
`qwen3.py` is untouched: existing Qwen3 / Qwen3-Embedding-* models that don't set the flag continue to load via the original module.

## Design notes
* `ModelArgs` builds on `qwen3.ModelArgs` rather than redefining all fields, which keeps the schema in one place and means `Qwen3DecoderLayer` works on either ModelArgs without changes.
* The projection lives at `self.linear` (top-level, not nested under `self.model`) because that matches the upstream HF safetensors layout. `sanitize()` has a special case for `linear.weight` / `linear.bias` to prevent the existing `model.` prefix rule from wrongly rewriting it.
* A `MODEL_REMAPPING` entry would have been the smaller diff, but it can't condition on more than `model_type`. The two-condition check (`model_type == "qwen3"` AND `use_bidirectional_attention=True`) is the smallest extension that makes upstream voyage configs "just work" without users editing config.json.

## Validation
### 1. Convert the upstream snapshot

```bash
python -m mlx_embeddings.convert \
  --hf-path voyageai/voyage-4-nano \
  --mlx-path ./voyage-4-nano-bf16 \
  --dtype bfloat16
```

### 2. Load + embed a sentence and verify routing and output shape
### 3. Regression check on the existing Qwen3 decoder path