feat: add llama model_type support #2

Open
Krigsexe wants to merge 1 commit into xigh:main from Krigsexe:add-llama-support

@Krigsexe

Summary

  • Add "llama" to the supported model_type values in config.rs
  • Map it to the Qwen3 family, since the architecture is otherwise identical (RMSNorm, RoPE, SwiGLU, GQA, no bias, same SafeTensors layer naming)
  • Disable QK norms for llama models (unlike Qwen3, Llama does not use per-head QK RMS norms)
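
For reviewers, the mapping described above can be sketched roughly as follows. This is an illustrative sketch only, not the actual diff: `ModelFamily`, `ArchConfig`, `resolve_arch`, and the `use_qk_norm` field are hypothetical names, and the real types in config.rs may look different.

```rust
/// Hypothetical model-family enum; the real one in config.rs may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelFamily {
    Qwen3,
}

/// Hypothetical resolved architecture settings (not herbert-rs's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ArchConfig {
    pub family: ModelFamily,
    /// Whether per-head QK RMS norms are applied in attention.
    pub use_qk_norm: bool,
}

/// Resolve a `model_type` string from the model config into architecture settings.
pub fn resolve_arch(model_type: &str) -> Result<ArchConfig, String> {
    match model_type {
        // Qwen3 applies per-head QK norms.
        "qwen3" => Ok(ArchConfig { family: ModelFamily::Qwen3, use_qk_norm: true }),
        // Llama reuses the Qwen3 code path but has no QK norms.
        "llama" => Ok(ArchConfig { family: ModelFamily::Qwen3, use_qk_norm: false }),
        other => Err(format!("unsupported model_type: {}", other)),
    }
}
```

With this shape, `resolve_arch("llama")` selects the Qwen3 family with QK norms turned off, while unrecognized values still return an error.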

Motivation

Llama-architecture models such as PleIAs/Baguettotron (321M params, 80 layers, Apache 2.0) cannot currently run on herbert-rs because model_type: "llama" is not recognized. Apart from the missing per-head QK norms, the architecture matches Qwen3: same layer structure, same SafeTensors naming convention (model.layers.X.self_attn.q_proj.weight), same activation function.
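
To illustrate the shared naming convention, here is a minimal sketch of how a loader could build tensor names that resolve for both families. The helper `attn_proj_weight_name` is hypothetical and is not claimed to exist in herbert-rs. Only the `q_proj` pattern is quoted above; passing other projection names is an assumption based on the usual Hugging Face Llama checkpoint layout.

```rust
/// Build the SafeTensors key for an attention projection weight.
/// Hypothetical helper for illustration; `proj` is e.g. "q_proj".
pub fn attn_proj_weight_name(layer: usize, proj: &str) -> String {
    format!("model.layers.{}.self_attn.{}.weight", layer, proj)
}

/// Keys for the query projection of every layer in a model with `num_layers` layers.
pub fn all_q_proj_names(num_layers: usize) -> Vec<String> {
    (0..num_layers)
        .map(|i| attn_proj_weight_name(i, "q_proj"))
        .collect()
}
```

Because the same keys work for a Llama checkpoint and a Qwen3 checkpoint, no renaming or conversion step is needed before loading.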

Changes

  • crates/core/src/config.rs: 3 changes, 1 file, +7/-3 lines

Testing

Tested with PleIAs/Baguettotron (Q4 backend) on Ryzen 5 3600 (AVX2):

  • Decode: 38.8 tok/s
  • Prefill: 188.9 tok/s
  • Model load: 0.7s

Map the "llama" model_type to the Qwen3 family since the architecture is
otherwise identical (RMSNorm, RoPE, SwiGLU, GQA, no bias). The only difference
is that Llama models do not use per-head QK norms, which is handled
by checking the model_type string.

This enables running Llama-architecture models like PleIAs/Baguettotron
(321M params, 80 layers) directly on herbert-rs without conversion.

Tested with Baguettotron-Q4 on Ryzen 5 3600 (AVX2): 38.8 tok/s decode,
188.9 tok/s prefill.