Conversation

@sairampillai commented Sep 22, 2025

Introduce standardized MoE calibration interface and deprecate legacy replace_modules_for_calibration

Summary

Implements a simplified, decorator-based registration system for MoE model calibration built around a single MoECalibrationModule base class, making MoE model integration easier, and deprecates the legacy replace_modules_for_calibration function.

Problem

MoE model calibration currently requires module replacement logic scattered across replace_modules_for_calibration and manual context management. This makes contributing new MoE model support difficult and error-prone. Additionally, each model requires a custom replacement function with duplicated boilerplate code.

Relevant Issues

Fixes #1829

Solution

MoECalibrationModule abstract base class implementation (a minimal sketch follows below)

  • Two methods: a required from_original() classmethod and an optional restore()
  • is_permanent flag to declare whether the replacement stays in place or is reverted via restore()
  • Clear contract: permanent modules stay in calibration form; non-permanent modules are restored after the context exits
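
A minimal sketch of the base class, assuming the names described in this PR (from_original, restore, is_permanent); the exact signatures in the merged code may differ:

from abc import ABC, abstractmethod

import torch
from transformers import PretrainedConfig


class MoECalibrationModule(torch.nn.Module, ABC):
    """Base class for MoE modules rewritten for calibration (sketch)."""

    # Permanent modules stay in calibration form after the context exits;
    # non-permanent modules are reverted via restore().
    is_permanent: bool = False

    @classmethod
    @abstractmethod
    def from_original(
        cls,
        original: torch.nn.Module,
        config: PretrainedConfig,
        calibrate_all_experts: bool = True,
    ) -> "MoECalibrationModule":
        """Build the calibration module from the original MoE module."""

    def restore(self, original: torch.nn.Module) -> torch.nn.Module:
        """Convert back to the original module (signature assumed here;
        only meaningful when is_permanent is False)."""
        return original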

Decorator-Based Registration: the @register_moe_calibration("ModuleName") decorator (see the sketch below)

  • Automatic registration in MOE_CALIBRATION_MODULES registry
  • Models self-register when their module is imported
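
The registry itself is a small mapping plus a decorator; MOE_CALIBRATION_MODULES and register_moe_calibration appear in the diff (typed there as Dict[str, Type[MoECalibrationModule]]), while the decorator body below is an assumed illustration:

from typing import Dict

# Maps original module class names (e.g. "DeepseekV3MoE") to their
# calibration replacement classes.
MOE_CALIBRATION_MODULES: Dict[str, type] = {}


def register_moe_calibration(module_class_name: str):
    """Register a calibration module class under the original module's class name."""

    def decorator(cls: type) -> type:
        MOE_CALIBRATION_MODULES[module_class_name] = cls
        return cls

    return decorator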

New Model Integration: Adding MoE support requires only:

@register_moe_calibration("YourMoEModule")
class CalibrationYourMoE(MoECalibrationModule):
    is_permanent = True  # or False
    
    @classmethod
    def from_original(cls, original, config, calibrate_all_experts=True):
        return cls(config, original, calibrate_all_experts)

Dataset Arguments: new option moe_calibrate_all_experts: bool = True controls whether all experts see all tokens during calibration (usage sketched below)

  • True (default): All experts receive all tokens for proper quantization statistics
  • False: Normal routing behavior (only routed experts are used)
  • Used by both oneshot() and DatasetArguments
  • Automatically passed to moe_calibration_context by pipelines
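
A minimal usage sketch, assuming the flag is passed straight through oneshot() as described above; the model, dataset, and recipe values are placeholders:

from llmcompressor import oneshot

oneshot(
    model="Qwen/Qwen3-30B-A3B",      # placeholder MoE checkpoint
    dataset="open_platypus",         # placeholder calibration dataset
    recipe="recipe.yaml",            # placeholder quantization recipe
    moe_calibrate_all_experts=True,  # every expert sees every calibration token
)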

Automatic Context Management: moe_calibration_context integrated into pipelines

  • Wraps calibration automatically in oneshot.py
  • Handles module replacement and restoration transparently
  • No manual context management required by users

Backward Compatibility: replace_modules_for_calibration is deprecated with warnings (see the sketch below)

  • Legacy function preserved for compatibility
  • Clear migration path documented in deprecation message
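
A hedged sketch of the deprecation shim described above; the actual warning text and module layout in the PR may differ:

import warnings


def replace_modules_for_calibration(model, calibrate_all_experts: bool = True):
    warnings.warn(
        "replace_modules_for_calibration is deprecated; MoE modules are now "
        "replaced automatically via moe_calibration_context.",
        DeprecationWarning,
        stacklevel=2,
    )
    # The legacy permanent-replacement behavior is preserved for existing
    # callers (implementation omitted in this sketch).
    return model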

Test Plan

  • ✅ Unit tests for contextual MoE calibration with automatic module restoration
  • ✅ Unit tests for permanent MoE calibration persistence
  • ✅ Integration tests with Qwen3, Llama4, and DeepSeek V3 models
  • ✅ Verification that all experts receive data during calibration
  • ✅ Deprecation warning verification for legacy functions

Testing

  • ✅ All unit tests pass
  • ✅ Calibration types working correctly
  • ✅ Model structure correctly modified and restored inside/outside contexts
  • ✅ Linting and type checking pass
  • ✅ Backward compatibility verified with deprecation warnings

Migration Guide

Before:

# Required defining MoEModelConfig entries and handling the context manually
from llmcompressor.modeling.prepare import replace_modules_for_calibration
model = replace_modules_for_calibration(model, calibrate_all_experts=True)

After:

# Automatic - just use moe_calibration_context
from llmcompressor.modeling import moe_calibration_context

with moe_calibration_context(model, calibrate_all_experts=True):
    # Run calibration - modules replaced automatically
    for batch in dataloader:
        model(**batch)
# Modules restored automatically (if not permanent)


@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@sairampillai (Author)

@kylesayrs @dsikka A few clarifications:

  • I pushed a couple of commits without signing; how do you suggest I fix that?
  • I have deprecated the calibrate_moe_context parameter; do we want to plan how to phase it out?
  • I have tested with unit tests but without a GPU (gpu poor); can you point me to the best way to test this change end-to-end?

@sairampillai force-pushed the moe_calibration_refactor branch from 7fefaac to ba42881 on September 24, 2025 17:07
@sairampillai marked this pull request as ready for review September 24, 2025 17:07

@brian-dellabetta (Collaborator)

@sairampillai, regarding DCO, you can ignore that. We can sign it via GitHub once reviewed/approved.

@sairampillai requested a review from dsikka September 26, 2025 13:26
@sairampillai changed the title from "Moe calibration refactor" to "[MoE Calibration] Simplify MoE calibration logic application and contribution" on Sep 26, 2025
@sairampillai changed the title from "[MoE Calibration] Simplify MoE calibration logic application and contribution" to "[MoE Calibration] Simplify MoE calibration interface" on Sep 26, 2025

@dsikka (Collaborator) commented Oct 1, 2025

@kylesayrs

@kylesayrs (Collaborator) left a comment

This looks good, but I worry that this implementation uses more abstraction than is necessary. I like the idea of "contextual" vs "permanent" changes, and we should definitely log which one is being used to the user.

Please consider simplifying to a single mapping dictionary, and a single ABC class to handle the from_original and restore functions. Don't be afraid to remove/ refactor existing code!

@kylesayrs (Collaborator)

Hey @sairampillai! Are you still interested in contributing to this PR? If not, please let me know and I can assign someone to pick up where you left off!

@sairampillai (Author)

@kylesayrs I am working on the updates; I will push an update soon for review!

@kylesayrs (Collaborator) left a comment

Looks great so far, thanks for following up!

@kylesayrs (Collaborator) left a comment

Looks awesome! Is this ready to be tested?

MOE_CALIBRATION_MODULES: Dict[str, Type[MoECalibrationModule]] = {}


def register_moe_calibration(module_class_name: str):

Collaborator:

Something like this is also implemented via the RegistryMixin, but we can standardize that in a follow up as well.

Collaborator:

Your registry is slightly different; let's leave this for a follow-up.

Author:

Right, I saw and didn't want to delay this any further. We can work on the follow up.

@kylesayrs self-requested a review October 21, 2025 14:33

@brian-dellabetta (Collaborator) left a comment

Thanks for the contribution! I think you can remove the from_original class methods and just use the constructors directly

Comment on lines +12 to +14
# MoE calibration is now handled automatically by the pipeline.
# The `SequentialLlama4TextMoe` modules will be applied during calibration
# to enable proper expert calibration and vLLM compatibility.

Collaborator:

Do we want to keep this note for all examples? It might be cleaner without them; what do people think?

Author:

I felt it helpful to have the note in varied examples since this would be a breaking change once we deprecate older methods. Open to recommendations.

@kylesayrs (Collaborator) left a comment

Hi @sairampillai!

I think the new interface looks great! It was easy for me to contribute qwen3_vl_moe. I think there's a small import issue here which I've fixed in this patch (along with adding qwen3_vl_moe and rearranging some imports). I think once the import issue is fixed, we should land this ASAP.

import_qwen3vl.patch

I agree with @brian-dellabetta that there's some overlap between __init__ and from_original, but we can revisit that later.

Comment on lines +34 to +47
@classmethod
def from_original(
cls,
original: OriginalDeepseekV3MoE,
config: DeepseekV3Config,
calibrate_all_experts: bool = True,
) -> "CalibrationDeepseekV3MoE":
"""Create calibration module from original DeepseekV3MoE."""
return cls(
config=config,
original=original,
calibrate_all_experts=calibrate_all_experts,
)

Collaborator:

We should be able to remove all of these; since it mimics the constructor, we can just use the constructor instead.

Suggested change
@classmethod
def from_original(
cls,
original: OriginalDeepseekV3MoE,
config: DeepseekV3Config,
calibrate_all_experts: bool = True,
) -> "CalibrationDeepseekV3MoE":
"""Create calibration module from original DeepseekV3MoE."""
return cls(
config=config,
original=original,
calibrate_all_experts=calibrate_all_experts,
)

@dsikka (Collaborator) left a comment

Overall looks good to me. Two small comments.
Thank you for the contribution

Do you mind creating issues to apply this to Qwen3 VL MoE and Qwen3 Next as follow-ups? I believe support for those models was added while you were working on this PR.

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = replace_modules_for_calibration(model)
# MoE calibration is now handled automatically by the pipeline.
# The `CalibrationDeepseekV3MoE` modules will be applied during calibration

Collaborator:

If we're going to make a note of this in every example, it is probably helpful to indicate where these custom modules are being defined, as they are not in the original model definition.

Author:

Makes sense, will add to the note

)

replace_modules_for_calibration(model, calibrate_all_experts=True)
with contextlib.ExitStack() as stack:

Collaborator:

Is your linter up-to-date with llmcompressor? Seems to just be adding spaces here?

Author:

Let me recheck; I did use make style to check for linting errors.

@sairampillai (Author)

@dsikka, I will create the issues for Qwen3-VL-MoE and Qwen3-Next, but it looks like @kylesayrs already has a patch ready for Qwen3-VL-MoE.
