[WIP] feat(config): runtime config decoupling(design for reference) #383

Draft
rjzhb wants to merge 3 commits into lightseekorg:main from rjzhb:feat/runtime-config-decoupling

rjzhb commented Jun 8, 2026

Collaborator

Summary

Legacy config flows pass HuggingFace PretrainedConfig objects straight into ModelConfig and model code. When transformers changes field names, nesting, or defaults, bugs show up deep in the runtime and fixes spread across many files.

This PR adds an engine-owned EngineModelSpec IR between HF parsing and the runtime. HF config is translated once per model in an adapter (hf_config → EngineModelSpec). Engine code should depend on the spec, not the HF schema. After a transformers update, we usually only fix parsing or the adapter—not model/loader code.

Pilots: minimax_m2, qwen3_5 / qwen3_5_moe
Flag: TOKENSPEED_USE_ENGINE_SPEC=1 (default off; legacy path unchanged)
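
A minimal sketch of how an opt-in flag like this could be read. The helper name `use_engine_spec` is hypothetical, and treating only the literal value `1` as "on" is an assumption; the PR states only that the flag defaults to off.

```python
import os
from typing import Mapping, Optional


def use_engine_spec(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when TOKENSPEED_USE_ENGINE_SPEC is exactly "1".

    Hypothetical helper: the real flag handling is not shown in the PR.
    An unset variable means the legacy path is taken.
    """
    env = os.environ if env is None else env
    return env.get("TOKENSPEED_USE_ENGINE_SPEC", "0") == "1"


# Passing an explicit mapping keeps the check testable without
# mutating the process environment.
legacy = use_engine_spec({})
opted_in = use_engine_spec({"TOKENSPEED_USE_ENGINE_SPEC": "1"})
```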

Shared spec design

Both pilots use the same EngineModelSpec entry type—not separate per-model config systems.

Shared shell — every model produces the same top-level shape:

EngineModelSpec { schema_version, model_type, architecture, dtype, quantization, body }
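
As a rough illustration, the shell above could be written as a frozen dataclass. The field names come from the shape listed above; the field types, the immutability, and the placeholder values are assumptions rather than the PR's actual definitions.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class EngineModelSpec:
    # Field names follow the PR description; the types are assumed.
    schema_version: int
    model_type: str
    architecture: str
    dtype: str
    quantization: Optional[str]
    body: Any  # the typed per-model body in the real design


# Placeholder values for illustration only, not a real model config.
spec = EngineModelSpec(
    schema_version=1,
    model_type="minimax_m2",
    architecture="PlaceholderForCausalLM",
    dtype="bfloat16",
    quantization=None,
    body={"type": "minimax_m2"},
)
```

Freezing the dataclass is one way to keep runtime code from mutating the spec after the adapter has produced it.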

Shared components — architecture-specific bodies are built from reusable blocks:

  • GQAAttentionSpec — attention heads / KV / RoPE dim
  • MoEMLPSpec — expert count, top-k, routing
  • RMSNormSpec, RopePositionSpec — norm and position encoding

MiniMax-M2 and Qwen3.5 both use these; only the adapter mapping from HF fields differs.
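
To make the "same component, different mapping" idea concrete, here is a sketch with two hypothetical source schemas feeding one `GQAAttentionSpec`. The component's field names, both input layouts, and all numbers are invented for illustration; they are not the real MiniMax-M2 or Qwen3.5 HF schemas.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class GQAAttentionSpec:
    # Assumed fields, based on "attention heads / KV / RoPE dim".
    num_heads: int
    num_kv_heads: int
    head_dim: int
    rope_dim: int


def attention_from_schema_a(cfg: Mapping[str, Any]) -> GQAAttentionSpec:
    # Hypothetical schema A: flat keys with an explicit rotary dimension.
    return GQAAttentionSpec(
        num_heads=cfg["num_attention_heads"],
        num_kv_heads=cfg["num_key_value_heads"],
        head_dim=cfg["head_dim"],
        rope_dim=cfg["rotary_dim"],
    )


def attention_from_schema_b(cfg: Mapping[str, Any]) -> GQAAttentionSpec:
    # Hypothetical schema B: nested keys, rotary dimension derived
    # from a fractional factor of the head dimension.
    text = cfg["text_config"]
    head_dim = text["head_dim"]
    return GQAAttentionSpec(
        num_heads=text["num_attention_heads"],
        num_kv_heads=text["num_key_value_heads"],
        head_dim=head_dim,
        rope_dim=int(head_dim * text["partial_rotary_factor"]),
    )


a = attention_from_schema_a(
    {"num_attention_heads": 48, "num_key_value_heads": 8,
     "head_dim": 128, "rotary_dim": 64}
)
b = attention_from_schema_b(
    {"text_config": {"num_attention_heads": 16, "num_key_value_heads": 2,
                     "head_dim": 256, "partial_rotary_factor": 0.25}}
)
```

Code downstream of the adapters sees only `GQAAttentionSpec`, so a renamed or re-nested source field would be absorbed by one adapter function.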

Typed body union — model-specific details live in body, not the shell:

ModelBody = MinimaxM2ModelSpec | Qwen35ModelSpec
  • MiniMax-M2: shared components + MTP fields (num_mtp_modules, …)
  • Qwen3.5: shared components + hybrid extras (GatedDeltaNetSpec, full_attention_interval, dense/MoE sizes, …)
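
One way to express such a union in Python is a tagged union with a `type` discriminator. In this sketch the tag values and the single field kept per body are assumptions; the real bodies also carry the shared components and more fields.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class MinimaxM2ModelSpec:
    num_mtp_modules: int
    type: Literal["minimax_m2"] = "minimax_m2"  # assumed tag value


@dataclass(frozen=True)
class Qwen35ModelSpec:
    full_attention_interval: int
    type: Literal["qwen3_5"] = "qwen3_5"  # assumed tag value


ModelBody = Union[MinimaxM2ModelSpec, Qwen35ModelSpec]


def summarize(body: ModelBody) -> str:
    # Branch on the discriminator rather than probing for attributes.
    if body.type == "minimax_m2":
        return f"minimax_m2: num_mtp_modules={body.num_mtp_modules}"
    if body.type == "qwen3_5":
        return f"qwen3_5: full_attention_interval={body.full_attention_interval}"
    raise ValueError(f"unknown body type: {body.type!r}")
```

Under this layout, adding a model means adding one more dataclass to the union and one more branch wherever the body type matters.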

Single dispatch: build_engine_spec() routes by model_type to adapters/minimax_m2 or adapters/qwen3_5; both return EngineModelSpec. ModelConfig then branches on spec.body.type only for the RuntimeView bridge.
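
The routing described above can be sketched as a registry lookup. This is not the PR's implementation: the signature, the registry name `_ADAPTERS`, the stub adapters, and the choice to route `qwen3_5_moe` through the Qwen3.5 adapter are all assumptions. The stubs return plain dicts where the real adapters return `EngineModelSpec`.

```python
from typing import Any, Callable, Dict, Mapping


def adapt_minimax_m2(hf_config: Mapping[str, Any]) -> Dict[str, Any]:
    # Stub adapter; the real one would return an EngineModelSpec.
    return {"model_type": hf_config["model_type"], "body": {"type": "minimax_m2"}}


def adapt_qwen3_5(hf_config: Mapping[str, Any]) -> Dict[str, Any]:
    # Stub adapter shared by the dense and MoE variants (assumed).
    return {"model_type": hf_config["model_type"], "body": {"type": "qwen3_5"}}


# Hypothetical registry: one entry per supported model_type.
_ADAPTERS: Dict[str, Callable[[Mapping[str, Any]], Dict[str, Any]]] = {
    "minimax_m2": adapt_minimax_m2,
    "qwen3_5": adapt_qwen3_5,
    "qwen3_5_moe": adapt_qwen3_5,
}


def build_engine_spec(hf_config: Mapping[str, Any]) -> Dict[str, Any]:
    """Route by model_type to the matching adapter, failing loudly otherwise."""
    model_type = hf_config.get("model_type")
    adapter = _ADAPTERS.get(model_type)
    if adapter is None:
        raise ValueError(f"no engine-spec adapter for model_type={model_type!r}")
    return adapter(hf_config)
```

Raising on an unknown `model_type` keeps unsupported models from silently falling through to a mismatched adapter.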

New models: add a body variant and an adapter, reusing existing components where possible; no new flat config type per model.

rjzhb force-pushed the feat/runtime-config-decoupling branch 3 times, most recently from 1d7661a to 490b777 (June 8, 2026 21:04)
rjzhb force-pushed the feat/runtime-config-decoupling branch from 490b777 to 66150e6 (June 8, 2026 21:07)
Signed-off-by: rjzhb <rjzhb222@163.com>
rjzhb force-pushed the feat/runtime-config-decoupling branch from d02e72c to 4f0eea5 (June 8, 2026 21:34)
rjzhb changed the title from "[WIP] feat(config): runtime config decoupling" to "[WIP] feat(config): runtime config decoupling(design for reference)" (Jun 9, 2026)
@github-actions

This PR has been inactive for 14 days and is marked as stale. It will be closed in 3 days if there is no further activity.
