[MoE Calibration] Simplify MoE calibration interface #1851
base: main
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@kylesayrs @dsikka A few clarifications:
@sairampillai, regarding DCO, you can ignore that. We can sign it via GitHub once reviewed/approved.
…illai/llm-compressor into moe_calibration_refactor
This looks good, but I worry that this implementation uses more abstraction than is necessary. I like the idea of "contextual" vs "permanent" changes, and we should definitely log which one is being used to the user.
Please consider simplifying to a single mapping dictionary, and a single ABC class to handle the from_original and restore functions. Don't be afraid to remove/ refactor existing code!
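The contextual-vs-permanent distinction the reviewer likes can be sketched with plain Python stand-ins (the names and dict-based "model" here are illustrative only, not the PR's actual classes):

```python
import contextlib


@contextlib.contextmanager
def moe_calibration_swap(modules, name, replacement, permanent=False):
    """Swap modules[name] for a calibration replacement; restore the
    original on exit unless the swap is marked permanent.
    Illustrative stand-in for the PR's context management."""
    original = modules[name]
    modules[name] = replacement
    try:
        yield replacement
    finally:
        if not permanent:
            modules[name] = original


model = {"moe": "OriginalMoE"}

# Contextual: the original module is restored after calibration.
with moe_calibration_swap(model, "moe", "CalibrationMoE"):
    assert model["moe"] == "CalibrationMoE"
assert model["moe"] == "OriginalMoE"

# Permanent: the replacement stays in place after the context exits.
with moe_calibration_swap(model, "moe", "CalibrationMoE", permanent=True):
    pass
```

Logging which mode is active, as suggested above, would be a one-line addition inside the context manager.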
Hey @sairampillai! Are you still interested in contributing to this PR? If not, please let me know and I can assign someone to pick up where you left off!

@kylesayrs I am working on the updates; I will push an update soon for review!
Signed-off-by: Sairam Pillai <[email protected]>
Looks great so far, thanks for following up!
Signed-off-by: Sairam Pillai <[email protected]>
Looks awesome! Is this ready to be tested?
```python
MOE_CALIBRATION_MODULES: Dict[str, Type[MoECalibrationModule]] = {}


def register_moe_calibration(module_class_name: str):
```
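The decorator body is not shown in the excerpt above; a plausible, self-contained completion of this registration pattern looks roughly like this (a sketch of the likely shape, not the PR's exact code):

```python
from typing import Dict

# Registry mapping original-module class names to calibration replacements.
MOE_CALIBRATION_MODULES: Dict[str, type] = {}


def register_moe_calibration(module_class_name: str):
    """Class decorator: register a calibration module under the name of
    the original module class it replaces."""

    def decorator(calibration_cls: type) -> type:
        MOE_CALIBRATION_MODULES[module_class_name] = calibration_cls
        return calibration_cls

    return decorator


@register_moe_calibration("DeepseekV3MoE")
class CalibrationDeepseekV3MoE:
    pass
```

A lookup such as `MOE_CALIBRATION_MODULES["DeepseekV3MoE"]` then lets the pipeline find the replacement class by the original module's class name.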
Something like this is also implemented via the RegistryMixin, but we can standardize that in a follow up as well.
Your registry is slightly different, let's leave this for a follow up
Right, I saw and didn't want to delay this any further. We can work on the follow up.
Thanks for the contribution! I think you can remove the from_original class methods and just use the constructors directly
```python
# MoE calibration is now handled automatically by the pipeline.
# The `SequentialLlama4TextMoe` modules will be applied during calibration
# to enable proper expert calibration and vLLM compatibility.
```
Do we want to keep this note for all examples? It might be cleaner without them; what do people think?
I felt it helpful to have the note in varied examples since this would be a breaking change once we deprecate older methods. Open to recommendations.
Signed-off-by: Sairam Pillai <[email protected]>
…illai/llm-compressor into moe_calibration_refactor
Hi @sairampillai!
I think the new interface looks great! It was easy for me to contribute qwen3_vl_moe. I think there's a small import issue here, which I've fixed in this patch (along with adding qwen3_vl_moe and rearranging some imports). Once the import issue is fixed, we should land this ASAP.
I agree with @brian-dellabetta that there's some overlap between `__init__` and `from_original`, but we can revisit that later.
```python
@classmethod
def from_original(
    cls,
    original: OriginalDeepseekV3MoE,
    config: DeepseekV3Config,
    calibrate_all_experts: bool = True,
) -> "CalibrationDeepseekV3MoE":
    """Create calibration module from original DeepseekV3MoE."""
    return cls(
        config=config,
        original=original,
        calibrate_all_experts=calibrate_all_experts,
    )
```
We should be able to remove all of these; since it mimics the constructor, we can just use the constructor instead.
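Since `from_original` only forwards its arguments, the two call styles are equivalent, which is the reviewer's point. A minimal stand-in (the class body and argument values here are illustrative, not the PR's code) shows the equivalence:

```python
class CalibrationDeepseekV3MoE:
    """Illustrative stand-in for the PR's calibration module."""

    def __init__(self, config, original, calibrate_all_experts=True):
        self.config = config
        self.original = original
        self.calibrate_all_experts = calibrate_all_experts

    # Adds nothing over __init__; it only forwards its arguments.
    @classmethod
    def from_original(cls, original, config, calibrate_all_experts=True):
        return cls(
            config=config,
            original=original,
            calibrate_all_experts=calibrate_all_experts,
        )


# Equivalent call sites: the classmethod vs. the constructor directly.
a = CalibrationDeepseekV3MoE.from_original("moe", "cfg")
b = CalibrationDeepseekV3MoE(config="cfg", original="moe")
```

Dropping `from_original` in favor of the constructor removes one layer of indirection per model with no loss of functionality.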
Overall looks good to me. Two small comments.
Thank you for the contribution
Do you mind creating issues to apply this to Qwen3 VL MoE and Qwen3 Next as follow-ups? I believe support for those models was added while you were working on this PR.
```python
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = replace_modules_for_calibration(model)
# MoE calibration is now handled automatically by the pipeline.
# The `CalibrationDeepseekV3MoE` modules will be applied during calibration
```
If we're going to make a note of this in every example, it would probably be helpful to indicate where these custom modules are defined, as they are not in the original model definition.
Makes sense, will add to the note
```python
)

replace_modules_for_calibration(model, calibrate_all_experts=True)
with contextlib.ExitStack() as stack:
```
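The effect of `calibrate_all_experts=True` can be illustrated with a toy router (pure Python, unrelated to the actual model code): during calibration every expert runs on every token so each expert accumulates statistics, while only the routed expert's output is kept, preserving normal model outputs.

```python
def moe_forward(tokens, experts, route, calibrate_all_experts=False):
    """Toy MoE forward pass. `experts` maps expert id -> callable; `route`
    picks one expert id per token. With calibrate_all_experts=True, every
    expert runs on every token (so all collect statistics), but only the
    routed expert's output is returned."""
    outputs = []
    for tok in tokens:
        routed = route(tok)
        if calibrate_all_experts:
            results = {eid: fn(tok) for eid, fn in experts.items()}
            outputs.append(results[routed])
        else:
            outputs.append(experts[routed](tok))
    return outputs


calls = {"a": 0, "b": 0}


def make_expert(eid):
    def fn(tok):
        calls[eid] += 1  # stands in for an observer collecting statistics
        return f"{eid}:{tok}"
    return fn


experts = {"a": make_expert("a"), "b": make_expert("b")}
route = lambda tok: "a" if tok % 2 == 0 else "b"

out = moe_forward([0, 1, 2, 3], experts, route, calibrate_all_experts=True)
```

With the flag set, both experts are called on all four tokens (so quantization observers on every expert see data), yet the returned outputs match normal routing.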
Is your linter up-to-date with llmcompressor? Seems to just be adding spaces here?
Let me recheck; I did use `make style` to check for linting errors.
@dsikka, I will create the issues.
Introduce a standardized MoE calibration interface and deprecate the legacy `replace_modules_for_calibration`
Summary
Implements a simplified, decorator-based registration system for MoE model calibration using a single `MoECalibrationModule` base class, making MoE model integration easier, and deprecates the legacy `replace_modules_for_calibration` function.

Problem
MoE model calibration currently requires module replacement logic scattered across `replace_modules_for_calibration` and manual context management. This makes contributing new MoE model support difficult and error-prone. Additionally, each model required custom replacement functions with duplicated boilerplate code.

Relevant Issues
Fixes #1829
Solution
- `MoECalibrationModule` abstract base class implementation
- `from_original()` classmethod and optional `restore()`
- `is_permanent` flag to specify whether a module replacement is to be restored using `restore()`

Decorator-Based Registration:
- `@register_moe_calibration("ModuleName")` decorator
- `MOE_CALIBRATION_MODULES` registry

New Model Integration: Adding MoE support requires only:
Dataset Arguments:
- New: `moe_calibrate_all_experts: bool = True` controls whether all experts see all tokens during calibration
- `True` (default): all experts receive all tokens for proper quantization statistics
- `False`: normal routing behavior (only routed experts are used)
- Exposed through `oneshot()` and `DatasetArguments`, passed to `moe_calibration_context` by pipelines

Automatic Context Management:
- `moe_calibration_context` integrated into pipelines and `oneshot.py`

Backward Compatibility: Deprecation of `replace_modules_for_calibration` with warnings

Test Plan
Testing
Migration Guide
Before:
After: