
Conversation

@mengniwang95 (Contributor) commented Oct 11, 2025

User description

Type of Change

update example

Description

Update the Llama4 AutoRound quantization example: add a main.py driver script, call it from run_quant.sh, and add the neural-compressor dependency.

Expected Behavior & Potential Risk

the expected behavior that triggered by this PR

How has this PR been tested?

how to reproduce the test (including hardware information)

Dependency Change?

neural-compressor is added to the example's requirements.txt.


PR Type

Enhancement


Description

  • Added new main.py script for Llama4 quantization

  • Updated run_quant.sh to use main.py

  • Added neural-compressor dependency


Diagram Walkthrough

flowchart LR
  A["Add main.py"] -- "Quantization script" --> B["Update run_quant.sh"]
  B -- "Use main.py" --> C["Add neural-compressor dependency"]

File Walkthrough

Relevant files
Enhancement
main.py: Add Llama4 Quantization Script

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/main.py

  • Added script for Llama4 quantization using AutoRoundConfig
  • Included argument parsing for model, scheme, device, etc.
  • Implemented model preparation and conversion (a minimal sketch follows after this entry)
+95/-0   
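
The bullets above describe the new script; the snippet below is a minimal sketch of the general AutoRound flow with neural-compressor's PyTorch API, not a copy of the PR's main.py. The model id, the scheme value, the model/tokenizer loading, and the prepare/convert calls are assumptions about typical INC 3.x usage; only the AutoRoundConfig arguments shown elsewhere in this PR (tokenizer, scheme, layer_config, export_format, is_mllm, output_dir) come from the change itself.

from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor.torch.quantization import AutoRoundConfig, convert, prepare

# Illustrative model id and scheme; the real script takes these from the CLI.
model_name = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# A multimodal model class/processor may be needed for Llama4; AutoModelForCausalLM
# is used here only to keep the sketch short.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

layer_config = {}  # e.g. pin selected fp_layers to {"bits": 16, "act_bits": 16}

qconfig = AutoRoundConfig(
    tokenizer=tokenizer,
    scheme="MXFP4",                  # placeholder for args.scheme
    layer_config=layer_config,
    export_format="llm_compressor",  # see the export_format discussion below
    is_mllm=True,
    output_dir="./quantized_model",
)

# prepare wraps the model for AutoRound; convert runs the tuning and exports the
# quantized model to output_dir in the requested format. Depending on the
# configuration, a calibration run may be required between the two calls.
model = prepare(model, qconfig)
model = convert(model)

With a script along these lines, run_quant.sh only needs to forward its arguments to python main.py instead of invoking the auto_round CLI directly, which is the run_quant.sh change described next.
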
run_quant.sh: Update run_quant.sh to Use main.py

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/run_quant.sh

  • Modified to call main.py instead of auto_round
+5/-6     
Dependencies
requirements.txt: Add neural-compressor Dependency

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/requirements.txt

  • Added neural-compressor dependency
+1/-0     

Signed-off-by: Mengni Wang <[email protected]>
@PRAgent4INC (Collaborator)

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

The layer_config keys are module names taken from model.named_modules(), but entries are selected by substring matching (name in n), so a short fp_layers entry can match unrelated modules and silently pin them to 16 bits. Verify that the matching is as strict as intended before the qconfig is used (a possible tightening is sketched after the snippet).

for n, m in model.named_modules():
    if not isinstance(m, (torch.nn.Linear)):
        continue
    for name in fp_layers:
        if name in n:
            layer_config[n] = {"bits": 16, "act_bits": 16}
            break
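
One possible tightening, sketched here and not part of the PR, is to match fp_layers against exact module names or dotted-name suffixes instead of arbitrary substrings:

# Sketch only: pin fp_layers to 16 bits using exact names or dotted-name suffixes,
# so a short entry cannot accidentally match unrelated modules.
fp_layer_names = {name for name in fp_layers if name}
for n, m in model.named_modules():
    if not isinstance(m, torch.nn.Linear):
        continue
    if n in fp_layer_names or any(n.endswith("." + name) for name in fp_layer_names):
        layer_config[n] = {"bits": 16, "act_bits": 16}
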
Hardcoded Value

The export format is already exposed as a command-line flag (--export_format, default "llm_compressor") in setup_parser, but the AutoRoundConfig call hardcodes export_format="llm_compressor" instead of passing args.export_format, so the flag has no effect. See the first code suggestion below.

self.add_argument("--export_format", default="llm_compressor", type=str,
                  help="the format to save the model"
                  )

@PRAgent4INC (Collaborator)

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Align export format

Ensure export_format consistency between arguments.

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/main.py [82-89]

 qconfig = AutoRoundConfig(
     tokenizer=tokenizer,
     scheme=args.scheme,
     layer_config=layer_config,
-    export_format="llm_compressor",
+    export_format=args.export_format,
     is_mllm=True,
     output_dir=args.output_dir,
 )
Suggestion importance[1-10]: 8


Why: Ensuring export_format consistency between arguments prevents potential mismatches and enhances reliability.

Impact: Medium
Optimize layer configuration

Use a set for fp_layers to improve lookup efficiency.

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/main.py [72-81]

-if len(fp_layers) > 0:
-    for n, m in model.named_modules():
-        if not isinstance(m, (torch.nn.Linear)):
-            continue
-        for name in fp_layers:
-            if name in n:
-                layer_config[n] = {"bits": 16, "act_bits": 16}
-                break
+fp_layers_set = set(fp_layers)
+for n, m in model.named_modules():
+    if not isinstance(m, (torch.nn.Linear)):
+        continue
+    if any(name in n for name in fp_layers_set):
+        layer_config[n] = {"bits": 16, "act_bits": 16}
Suggestion importance[1-10]: 6


Why: Converting fp_layers to a set can improve lookup efficiency, which is beneficial for larger models.

Impact: Low
Simplify string trimming

Use rstrip to remove trailing slashes more efficiently.

examples/pytorch/multimodal-modeling/quantization/auto_round/llama4/main.py [65-66]

-if model_name[-1] == "/":
-    model_name = model_name[:-1]
+model_name = model_name.rstrip("/")
Suggestion importance[1-10]: 5


Why: Using rstrip improves readability and efficiency slightly, but the impact is minimal.

Impact: Low

Signed-off-by: Mengni Wang <[email protected]>
@mengniwang95 (Contributor, Author)

@chensuyue please check the updated example results

@chensuyue chensuyue added this to the 3.6 milestone Oct 13, 2025
@chensuyue chensuyue merged commit ebddfee into master Oct 16, 2025
11 checks passed
@chensuyue chensuyue deleted the mengni/scout_update branch October 16, 2025 01:05