Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix prepared fixes for 1 of the 1 bugs found in the latest run.
diff --git a/pyproject.toml b/pyproject.toml
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -128,7 +128,7 @@
"whisper-s2t==1.3.1",
"hqq==0.2.7.post1",
"torchao>=0.12.0,<0.16.0", # 0.16.0 breaks diffusers 0.36.0, torch+torch: https://github.com/pytorch/ao/issues/2919#issue-3375688762
- "llmcompressor",
+ "llmcompressor>0.8",
"gliner; python_version >= '3.10'",
"piq",
"opencv-python",
@@ -138,6 +138,7 @@
"imageio-ffmpeg",
"jaxtyping",
"peft>=0.17.1",
+ "compressed-tensors >= 0.13.0"
]
[project.optional-dependencies]
diff --git a/src/pruna/algorithms/llm_compressor.py b/src/pruna/algorithms/llm_compressor.py
--- a/src/pruna/algorithms/llm_compressor.py
+++ b/src/pruna/algorithms/llm_compressor.py
@@ -70,6 +70,12 @@
default_value="W4A16",
meta=dict(desc="Quantization scheme to use. Use symmetric quantization to avoid decompression issues."),
),
+ CategoricalHyperparameter(
+ "calibration_pipeline",
+ choices=["independent", "basic", "datafree", "sequential", "layer_sequential"],
+ default_value="independent",
+ meta=dict(desc="Pipeline to use for calibration."),
+ ),
TargetModules(
"target_modules",
default_value=None,
@@ -145,6 +151,8 @@
defaults = self.get_model_dependent_hyperparameter_defaults(model, smash_config)
target_modules = cast(TARGET_MODULES_TYPE, defaults["target_modules"])
+ calibration_pipeline = smash_config["calibration_pipeline"]
+
def quantize_language_model(
attr_name: str | None, language_model: torch.nn.Module, subpaths: list[str]
) -> torch.nn.Module:
@@ -173,7 +181,9 @@
targets=["Linear"],
)
]
- return imported["oneshot"](model=language_model, recipe=recipe, dataset=dataset, processor=processor)
+ return imported["oneshot"](
+ model=language_model, recipe=recipe, dataset=dataset, processor=processor, pipeline=calibration_pipeline
+ )
model = map_targeted_nn_roots(quantize_language_model, model, target_modules)
return model
diff --git a/tests/algorithms/testers/awq.py b/tests/algorithms/testers/awq.py
--- a/tests/algorithms/testers/awq.py
+++ b/tests/algorithms/testers/awq.py
@@ -14,3 +14,6 @@
allow_pickle_files = False
algorithm_class = LLMCompressor
metrics = ["perplexity"]
+ hyperparameters = {
+ "awq_calibration_pipeline": "basic",
+ }

Description
Nightly test for llm compressor was failing with:
torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow
Newer versions of transformers introduced masking utilities that rely on dynamic control flow, which breaks torch.fx tracing. Our LLaMA test model hits a code path that does not guard with is_tracing(), causing quantization to fail.
To avoid FX tracing in Oneshot, we introduced a new hyperparameter for the calibration pipeline in the algorithm. This sidesteps the tracing issue, but loading the saved model then failed: Hugging Face's lazy initialization creates meta tensors, and the compression logic attempted to move these empty tensors to a device, which is invalid.
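Both failure modes can be reproduced in isolation with plain torch, independent of llm-compressor. A minimal sketch (the function name `gated` is illustrative, not from the codebase):

```python
import torch
import torch.fx

# 1) Data-dependent control flow breaks torch.fx symbolic tracing:
#    the branch condition forces bool() on a traced Proxy value,
#    which raises the TraceError quoted above.
def gated(x: torch.Tensor) -> torch.Tensor:
    if x.sum() > 0:  # dynamic branch on the tensor's contents
        return x * 2
    return x

try:
    torch.fx.symbolic_trace(gated)
except torch.fx.proxy.TraceError as err:
    print(f"tracing failed: {err}")

# 2) Meta tensors are shape-only placeholders with no storage,
#    so materializing one with .to(device) is invalid -- the same
#    problem the compression logic hit on load.
meta_weight = torch.empty(4, 4, device="meta")
try:
    meta_weight.to("cpu")
except NotImplementedError as err:
    print(f"materialization failed: {err}")
```

Guarding the dynamic branch with `torch.fx._symbolic_trace.is_fx_tracing()` (or avoiding tracing entirely, as the new calibration pipeline option does) is what prevents the first error.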
There is also an upstream fix related to this in compressed-tensors:
vllm-project/compressed-tensors#376
After testing multiple combinations, the following versions proved stable, so we pin them:
compressed-tensors==0.13.0
llm-compressor==0.9.0.2
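A sketch of what these pins would look like in the pyproject.toml dependencies array (surrounding entries elided; exact placement assumed):

```toml
# Hypothetical excerpt of the [project] dependencies array
dependencies = [
    # ...
    "llmcompressor==0.9.0.2",
    "compressed-tensors==0.13.0",
    # ...
]
```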
Related Issue
Fixes #(issue number)
Type of Change
How Has This Been Tested?
LLMCompressor test passes now.
Checklist
Additional Notes