[WIP] feat(config): runtime config decoupling (design for reference) #383
Draft
rjzhb wants to merge 3 commits into
Signed-off-by: rjzhb <rjzhb222@163.com>
This PR has been inactive for 14 days and is marked as stale. It will be closed in 3 days if there is no further activity.
Summary
Legacy config flows pass HuggingFace PretrainedConfig objects straight into ModelConfig and model code. When transformers changes field names, nesting, or defaults, bugs show up deep in the runtime and fixes spread across many files.
This PR adds an engine-owned EngineModelSpec IR between HF parsing and the runtime. HF config is translated once per model in an adapter (hf_config → EngineModelSpec). Engine code should depend on the spec, not the HF schema. After a transformers update, we usually only fix parsing or the adapter—not model/loader code.
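To make the "translate once in an adapter" idea concrete, here is a minimal sketch, not the PR's actual code. `AttentionFields` and `adapt_attention` are hypothetical names invented for illustration; the HF attribute names (`num_attention_heads`, `num_key_value_heads`, `head_dim`, `hidden_size`) are the ones commonly found on HF configs, and the fallback rules are assumptions.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class AttentionFields:
    """Hypothetical engine-owned view of the attention settings."""
    num_heads: int
    num_kv_heads: int
    head_dim: int


def adapt_attention(hf_config) -> AttentionFields:
    """Translate HF attribute names into engine names in one place (assumed mapping)."""
    num_heads = hf_config.num_attention_heads
    # Assumption: a missing num_key_value_heads means plain multi-head attention.
    num_kv_heads = getattr(hf_config, "num_key_value_heads", None) or num_heads
    # Assumption: fall back to hidden_size // num_heads when head_dim is absent.
    head_dim = getattr(hf_config, "head_dim", None) or hf_config.hidden_size // num_heads
    return AttentionFields(num_heads, num_kv_heads, head_dim)


# Stand-in for a parsed HF config; a real PretrainedConfig exposes attributes the same way.
hf_like = SimpleNamespace(num_attention_heads=32, num_key_value_heads=8, hidden_size=4096)
spec = adapt_attention(hf_like)
print(spec)
```

If a later transformers release renamed or re-nested one of these fields, only `adapt_attention` would need to change in this sketch; code that reads `spec.num_kv_heads` would be unaffected.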
Pilots: minimax_m2, qwen3_5 / qwen3_5_moe
Flag: TOKENSPEED_USE_ENGINE_SPEC=1 (default off; legacy path unchanged)
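One way such an opt-in flag could be read is sketched below. Only the variable name `TOKENSPEED_USE_ENGINE_SPEC` and its default-off behavior come from the description above; the helper `use_engine_spec` and the exact parsing rule (only the string "1" enables the new path) are assumptions.

```python
import os


def use_engine_spec(env=None) -> bool:
    """Return True only when the flag is explicitly set to "1" (assumed parsing rule)."""
    env = os.environ if env is None else env
    return env.get("TOKENSPEED_USE_ENGINE_SPEC", "0") == "1"


# Unset or any value other than "1" keeps the legacy path in this sketch.
print(use_engine_spec({}))
print(use_engine_spec({"TOKENSPEED_USE_ENGINE_SPEC": "1"}))
```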
Shared spec design
Both pilots use the same EngineModelSpec entry type, not separate per-model config systems.

Shared shell: every model produces the same top-level shape.

Shared components: architecture-specific bodies are built from reusable blocks:
- GQAAttentionSpec: attention heads / KV / RoPE dim
- MoEMLPSpec: expert count, top-k, routing
- RMSNormSpec, RopePositionSpec: norm and position encoding

MiniMax-M2 and Qwen3.5 both use these; only the adapter mapping from HF fields differs.

Typed body union: model-specific details live in body, not the shell:
- MiniMax-M2: num_mtp_modules, …
- Qwen3.5: GatedDeltaNetSpec, full_attention_interval, dense/MoE sizes, …

Single dispatch: build_engine_spec() routes by model_type to adapters/minimax_m2 or adapters/qwen3_5; both return EngineModelSpec. ModelConfig then branches on spec.body.type only for the RuntimeView bridge.

New models: add a body variant + adapter, reusing existing components where possible; no new flat config type per model.
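The shell, typed body union, and single dispatch described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: the body class names (`MiniMaxM2Body`, `Qwen35Body`), the shell's fields, the dict-based HF input, and the adapter function names are all hypothetical. Only `EngineModelSpec`, `GQAAttentionSpec`, `build_engine_spec`, `num_mtp_modules`, `full_attention_interval`, and the `body.type` discriminator appear in the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Literal, Union


@dataclass(frozen=True)
class GQAAttentionSpec:
    """Shared component reused by both bodies (field set is assumed)."""
    num_heads: int
    num_kv_heads: int
    head_dim: int


@dataclass(frozen=True)
class MiniMaxM2Body:  # hypothetical class name
    attention: GQAAttentionSpec
    num_mtp_modules: int
    type: Literal["minimax_m2"] = "minimax_m2"


@dataclass(frozen=True)
class Qwen35Body:  # hypothetical class name
    attention: GQAAttentionSpec
    full_attention_interval: int
    type: Literal["qwen3_5"] = "qwen3_5"


Body = Union[MiniMaxM2Body, Qwen35Body]


@dataclass(frozen=True)
class EngineModelSpec:
    """Shared shell; model-specific details live in `body` (shell fields assumed)."""
    model_type: str
    body: Body


def _gqa(hf: dict) -> GQAAttentionSpec:
    # The HF key names are the only place the HF schema leaks into this sketch.
    return GQAAttentionSpec(hf["num_attention_heads"], hf["num_key_value_heads"], hf["head_dim"])


def adapt_minimax_m2(hf: dict) -> EngineModelSpec:
    body = MiniMaxM2Body(_gqa(hf), hf.get("num_mtp_modules", 0))
    return EngineModelSpec(hf["model_type"], body)


def adapt_qwen3_5(hf: dict) -> EngineModelSpec:
    body = Qwen35Body(_gqa(hf), hf["full_attention_interval"])
    return EngineModelSpec(hf["model_type"], body)


_ADAPTERS: Dict[str, Callable[[dict], EngineModelSpec]] = {
    "minimax_m2": adapt_minimax_m2,
    "qwen3_5": adapt_qwen3_5,
    "qwen3_5_moe": adapt_qwen3_5,
}


def build_engine_spec(hf: dict) -> EngineModelSpec:
    """Route by model_type to the matching adapter; every adapter returns the same shell."""
    model_type = hf["model_type"]
    if model_type not in _ADAPTERS:
        raise ValueError(f"no engine spec adapter for model_type={model_type!r}")
    return _ADAPTERS[model_type](hf)


base = {"num_attention_heads": 48, "num_key_value_heads": 8, "head_dim": 128}
m2 = build_engine_spec({**base, "model_type": "minimax_m2", "num_mtp_modules": 3})
qw = build_engine_spec({**base, "model_type": "qwen3_5_moe", "full_attention_interval": 4})
print(m2.body.type, qw.body.type)
```

In this sketch, supporting a new model means adding one body dataclass to the `Body` union and one entry in `_ADAPTERS`; the shell and the shared components stay untouched.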