1 file changed: +6 −4

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/mix-precision

@@ -27,7 +27,7 @@ pip install -r requirements.txt
 ### Demo (`MXFP4`, `MXFP8`, `NVFP4`, `uNVFP4`)
 
 ```bash
-python quantize.py --model_name_or_path facebook/opt-125m --quantize --dtype MXFP4 --batch_size 8 --accuracy
+python quantize.py --model_name_or_path facebook/opt-125m --quantize --dtype MXFP4 --batch_size 8 --accuracy --enable_torch_compile
 ```
 
 ### Mix-precision Quantization (`MXFP4 + MXFP8`)
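The new `--enable_torch_compile` flag suggests quantize.py wraps the model with `torch.compile` before running calibration and evaluation. A minimal sketch of that pattern, purely as an illustration (hypothetical wiring; the script's actual implementation may differ):

```python
# Sketch of the torch.compile pattern a flag like --enable_torch_compile
# typically toggles (hypothetical; see quantize.py for the real wiring).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", torch_dtype=torch.bfloat16
)

# torch.compile fuses kernels and removes Python overhead, which speeds up
# the many repeated forward passes that calibration and accuracy runs make.
model = torch.compile(model)
```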
@@ -41,7 +41,8 @@ python quantize.py \
     --use_recipe \
     --recipe_file recipes/Meta-Llama-3.1-8B-Instruct_7bits.json \
     --accuracy \
-    --batch_size 32
+    --batch_size 32 \
+    --enable_torch_compile
 
 # Llama 3.3 70B
 deepspeed --include="localhost:0,1,2,3"
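The recipe file is what drives the per-layer dtype split. Its real schema lives in the repo's recipes/ directory; purely as a hypothetical illustration of the idea, loading and inspecting one might look like:

```python
# Hypothetical illustration of consuming a mixed-precision recipe; the real
# schema of recipes/*.json in this repo may differ.
import json

with open("recipes/Meta-Llama-3.1-8B-Instruct_7bits.json") as f:
    recipe = json.load(f)

# Assumed shape: a per-layer map, e.g. {"model.layers.0.mlp.down_proj": "MXFP8", ...},
# keeping accuracy-sensitive layers at MXFP8 and the rest at MXFP4 (the "_7bits"
# suffix presumably denotes the resulting average effective bit-width).
print(json.dumps(recipe, indent=2)[:500])
```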
@@ -112,13 +113,14 @@ Model with mixed precision is not supported in vLLM, but supported in transformers
 python quantize.py \
     --model_name_or_path meta-llama/Llama-3.1-8B-Instruct \
     --quantize \
-    --iters 0 \
     --dtype MXFP4 \
     --use_recipe \
     --recipe_file recipes/Meta-Llama-3.1-8B-Instruct_7bits.json \
     --save \
     --save_format auto_round \
-    --save_path Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR
+    --save_path Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR \
+    --enable_torch_compile
+
 # Command to run inference with transformers:
 python run_hf_inf.py Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR
 ```
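run_hf_inf.py is the repo's own inference script; a minimal stand-in using the plain transformers API, assuming the auto_round-format checkpoint loads through `from_pretrained` (which typically requires the auto-round package to be installed):

```python
# Minimal stand-in for run_hf_inf.py (sketch; the repo's script may differ).
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```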