Conversation

@sairampillai commented Sep 22, 2025

Introduce standardized MoE calibration interface and deprecate legacy replace_modules_for_calibration

Summary

Implements a simplified, decorator-based registration system for MoE model calibration built around a single MoECalibrationModule base class, making MoE model integration easier, and deprecates the legacy replace_modules_for_calibration function.

Problem

MoE model calibration currently requires module replacement logic scattered across replace_modules_for_calibration and manual context management. This makes contributing new MoE model support difficult and error-prone. Additionally, each model requires a custom replacement function with duplicated boilerplate code.

Relevant Issues

Fixes #1829

Solution

MoECalibrationModule abstract base class implementation (a minimal sketch follows below)

  • Two methods: a required from_original() classmethod and an optional restore()
  • is_permanent flag to declare whether the replacement stays in place or is reverted via restore()
  • Clear contract: permanent modules stay in calibration form; non-permanent modules are restored after the context exits
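
A minimal sketch of the base class, assuming the names described in this PR (from_original, restore, is_permanent); the exact signatures in the merged code may differ:

from abc import ABC, abstractmethod

import torch
from transformers import PretrainedConfig


class MoECalibrationModule(torch.nn.Module, ABC):
    """Base class for MoE modules rewritten for calibration (sketch)."""

    # Permanent modules stay in calibration form after the context exits;
    # non-permanent modules are reverted via restore().
    is_permanent: bool = False

    @classmethod
    @abstractmethod
    def from_original(
        cls,
        original: torch.nn.Module,
        config: PretrainedConfig,
        calibrate_all_experts: bool = True,
    ) -> "MoECalibrationModule":
        """Build the calibration module from the original MoE module."""

    def restore(self, original: torch.nn.Module) -> torch.nn.Module:
        """Convert back to the original module (signature assumed here;
        only meaningful when is_permanent is False)."""
        return original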

Decorator-Based Registration: the @register_moe_calibration("ModuleName") decorator (see the sketch below)

  • Automatic registration in MOE_CALIBRATION_MODULES registry
  • Models self-register when their module is imported
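
The registry itself is a small mapping plus a decorator; MOE_CALIBRATION_MODULES and register_moe_calibration appear in the diff (typed there as Dict[str, Type[MoECalibrationModule]]), while the decorator body below is an assumed illustration:

from typing import Dict

# Maps original module class names (e.g. "DeepseekV3MoE") to their
# calibration replacement classes.
MOE_CALIBRATION_MODULES: Dict[str, type] = {}


def register_moe_calibration(module_class_name: str):
    """Register a calibration module class under the original module's class name."""

    def decorator(cls: type) -> type:
        MOE_CALIBRATION_MODULES[module_class_name] = cls
        return cls

    return decorator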

New Model Integration: Adding MoE support requires only:

@register_moe_calibration("YourMoEModule")
class CalibrationYourMoE(MoECalibrationModule):
    is_permanent = True  # or False
    
    @classmethod
    def from_original(cls, original, config, calibrate_all_experts=True):
        return cls(config, original, calibrate_all_experts)

Dataset Arguments: new option moe_calibrate_all_experts: bool = True controls whether all experts see all tokens during calibration (usage sketched below)

  • True (default): All experts receive all tokens for proper quantization statistics
  • False: Normal routing behavior (only routed experts are used)
  • Used by both oneshot() and DatasetArguments
  • Automatically passed to moe_calibration_context by pipelines
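
A minimal usage sketch, assuming the flag is passed straight through oneshot() as described above; the model, dataset, and recipe values are placeholders:

from llmcompressor import oneshot

oneshot(
    model="Qwen/Qwen3-30B-A3B",      # placeholder MoE checkpoint
    dataset="open_platypus",         # placeholder calibration dataset
    recipe="recipe.yaml",            # placeholder quantization recipe
    moe_calibrate_all_experts=True,  # every expert sees every calibration token
)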

Automatic Context Management: moe_calibration_context integrated into pipelines

  • Wraps calibration automatically in oneshot.py
  • Handles module replacement and restoration transparently
  • No manual context management required by users

Backward Compatibility: replace_modules_for_calibration is deprecated with warnings (see the sketch below)

  • Legacy function preserved for compatibility
  • Clear migration path documented in deprecation message
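
A hedged sketch of the deprecation shim described above; the actual warning text and module layout in the PR may differ:

import warnings


def replace_modules_for_calibration(model, calibrate_all_experts: bool = True):
    warnings.warn(
        "replace_modules_for_calibration is deprecated; MoE modules are now "
        "replaced automatically via moe_calibration_context.",
        DeprecationWarning,
        stacklevel=2,
    )
    # The legacy permanent-replacement behavior is preserved for existing
    # callers (implementation omitted in this sketch).
    return model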

Test Plan

  • ✅ Unit tests for contextual MoE calibration with automatic module restoration
  • ✅ Unit tests for permanent MoE calibration persistence
  • ✅ Integration tests with Qwen3, Llama4, and DeepSeek V3 models
  • ✅ Verification that all experts receive data during calibration
  • ✅ Deprecation warning verification for legacy functions

Testing

  • ✅ All unit tests pass
  • ✅ Calibration types working correctly
  • ✅ Model structure correctly modified and restored inside/outside contexts
  • ✅ Linting and type checking pass
  • ✅ Backward compatibility verified with deprecation warnings

Migration Guide

Before:

# Required defining MoEModelConfig entries and handling the context manually
from llmcompressor.modeling.prepare import replace_modules_for_calibration
model = replace_modules_for_calibration(model, calibrate_all_experts=True)

After:

# Automatic - just use moe_calibration_context
from llmcompressor.modeling import moe_calibration_context

with moe_calibration_context(model, calibrate_all_experts=True):
    # Run calibration - modules replaced automatically
    for batch in dataloader:
        model(**batch)
# Modules restored automatically (if not permanent)


@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@sairampillai (Author)

@kylesayrs @dsikka A few clarifications:

  • I pushed a couple of commits without signing; how do you suggest I fix that?
  • I have deprecated the calibrate_moe_context parameter; do we want to plan how to phase it out?
  • I have tested with unit tests but without a GPU (gpu poor); can you point me to the best way to test this change end-to-end?

@sairampillai force-pushed the moe_calibration_refactor branch from 7fefaac to ba42881 on September 24, 2025 17:07
@sairampillai marked this pull request as ready for review September 24, 2025 17:07

@brian-dellabetta (Collaborator)

@sairampillai, regarding DCO, you can ignore that. We can sign it via GitHub once reviewed/approved.

@sairampillai requested a review from dsikka September 26, 2025 13:26
@sairampillai changed the title from "Moe calibration refactor" to "[MoE Calibration] Simplify MoE calibration logic application and contribution" on Sep 26, 2025
@sairampillai changed the title from "[MoE Calibration] Simplify MoE calibration logic application and contribution" to "[MoE Calibration] Simplify MoE calibration interface" on Sep 26, 2025

@dsikka (Collaborator) commented Oct 1, 2025

@kylesayrs

@kylesayrs (Collaborator) left a comment

This looks good, but I worry that this implementation uses more abstraction than is necessary. I like the idea of "contextual" vs "permanent" changes, and we should definitely log which one is being used to the user.

Please consider simplifying to a single mapping dictionary, and a single ABC class to handle the from_original and restore functions. Don't be afraid to remove/ refactor existing code!

@kylesayrs (Collaborator)

Hey @sairampillai! Are you still interested in contributing to this PR? If not, please let me know and I can assign someone to pick up where you left off!

@sairampillai (Author)

@kylesayrs I am working on the updates; I will push an update soon for review!

@kylesayrs (Collaborator) left a comment

Looks great so far, thanks for following up!

@kylesayrs (Collaborator) left a comment

Looks awesome! Is this ready to be tested?

MOE_CALIBRATION_MODULES: Dict[str, Type[MoECalibrationModule]] = {}


def register_moe_calibration(module_class_name: str):

Collaborator:

Something like this is also implemented via the RegistryMixin, but we can standardize that in a follow up as well.

Collaborator:

Your registry is slightly different; let's leave this for a follow-up.

Author:

Right, I saw and didn't want to delay this any further. We can work on the follow up.

@kylesayrs self-requested a review October 21, 2025 14:33

@brian-dellabetta (Collaborator) left a comment

Thanks for the contribution! I think you can remove the from_original class methods and just use the constructors directly

Comment on lines +12 to +14
# MoE calibration is now handled automatically by the pipeline.
# The `SequentialLlama4TextMoe` modules will be applied during calibration
# to enable proper expert calibration and vLLM compatibility.

Collaborator:

Do we want to keep this note for all examples? It might be cleaner without them; what do people think?

Author:

I felt it helpful to have the note in varied examples since this would be a breaking change once we deprecate older methods. Open to recommendations.

@kylesayrs (Collaborator) left a comment

Hi @sairampillai!

I think the new interface looks great! It was easy for me to contribute qwen3_vl_moe. I think there's a small import issue here which I've fixed in this patch (along with adding qwen3_vl_moe and rearranging some imports). I think once the import issue is fixed, we should land this ASAP.

import_qwen3vl.patch

I agree with @brian-dellabetta that there's some overlap between __init__ and from_original, but we can revisit that later.

Comment on lines +34 to +47
@classmethod
def from_original(
cls,
original: OriginalDeepseekV3MoE,
config: DeepseekV3Config,
calibrate_all_experts: bool = True,
) -> "CalibrationDeepseekV3MoE":
"""Create calibration module from original DeepseekV3MoE."""
return cls(
config=config,
original=original,
calibrate_all_experts=calibrate_all_experts,
)

Collaborator:

We should be able to remove all of these; since it mimics the constructor, we can just use the constructor instead.

Suggested change
@classmethod
def from_original(
cls,
original: OriginalDeepseekV3MoE,
config: DeepseekV3Config,
calibrate_all_experts: bool = True,
) -> "CalibrationDeepseekV3MoE":
"""Create calibration module from original DeepseekV3MoE."""
return cls(
config=config,
original=original,
calibrate_all_experts=calibrate_all_experts,
)

@dsikka (Collaborator) left a comment

Overall looks good to me. Two small comments.
Thank you for the contribution

Do you mind creating issues to apply this to Qwen3 VL MoE and Qwen3 Next as follow-ups? I believe support for those models was added while you were working on this PR.

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = replace_modules_for_calibration(model)
# MoE calibration is now handled automatically by the pipeline.
# The `CalibrationDeepseekV3MoE` modules will be applied during calibration

Collaborator:

If we're going to make a note of this in every example, it is probably helpful to indicate where these custom modules are being defined, as they are not in the original model definition.

Author:

Makes sense, will add to the note

)

replace_modules_for_calibration(model, calibrate_all_experts=True)
with contextlib.ExitStack() as stack:

Collaborator:

Is your linter up-to-date with llmcompressor? Seems to just be adding spaces here?

Author:

Let me recheck; I did use make style to check for linting errors.

@sairampillai (Author)

@dsikka, I will create the issues for Qwen3-VL-MoE and Qwen3-Next, but it looks like @kylesayrs already has a patch ready for Qwen3-VL-MoE.
