
Commit 616c813

add torch_compile in command

Signed-off-by: He, Xin3 <[email protected]>
1 parent ccfea26

File tree: 1 file changed, +6 −4 lines

  • examples/pytorch/nlp/huggingface_models/language-modeling/quantization/mix-precision

examples/pytorch/nlp/huggingface_models/language-modeling/quantization/mix-precision/README.md

Lines changed: 6 additions & 4 deletions

````diff
@@ -27,7 +27,7 @@ pip install -r requirements.txt
 ### Demo (`MXFP4`, `MXFP8`, `NVFP4`, `uNVFP4`)
 
 ```bash
-python quantize.py --model_name_or_path facebook/opt-125m --quantize --dtype MXFP4 --batch_size 8 --accuracy
+python quantize.py --model_name_or_path facebook/opt-125m --quantize --dtype MXFP4 --batch_size 8 --accuracy --enable_torch_compile
 ```
 
 ### Mix-precision Quantization (`MXFP4 + MXFP8`)
@@ -41,7 +41,8 @@ python quantize.py \
   --use_recipe \
   --recipe_file recipes/Meta-Llama-3.1-8B-Instruct_7bits.json \
   --accuracy \
-  --batch_size 32
+  --batch_size 32 \
+  --enable_torch_compile
 
 # Llama 3.3 70B
 deepspeed --include="localhost:0,1,2,3" --master_port=29500 quantize.py \
@@ -112,13 +113,14 @@ Model with mixed precision is not supported in vLLM, but supported in transforme
 python quantize.py \
   --model_name_or_path meta-llama/Llama-3.1-8B-Instruct \
   --quantize \
-  --iters 0 \
   --dtype MXFP4 \
   --use_recipe \
   --recipe_file recipes/Meta-Llama-3.1-8B-Instruct_7bits.json \
   --save \
   --save_format auto_round \
-  --save_path Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR
+  --save_path Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR \
+  --enable_torch_compile
+
 # Command to inference with transformer:
 python run_hf_inf.py Llama-3.1-8B-Instruct-MXFP4-MXFP8-AR
 ```
````
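The `--enable_torch_compile` flag added throughout these commands presumably turns on `torch.compile` for the model being quantized. As a rough illustration of what that entails, here is a minimal PyTorch 2.x sketch; the tiny model and the `backend="eager"` choice are assumptions for portability, not taken from `quantize.py`:

```python
# Minimal sketch of torch.compile usage (PyTorch 2.x).
# The model and the "eager" backend are illustrative choices;
# quantize.py's --enable_torch_compile flag may wire this differently.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 8),
)

# torch.compile returns a wrapper that captures and optimizes the
# module's forward on its first call; backend="eager" skips code
# generation, which keeps this sketch runnable without a C++ toolchain.
compiled = torch.compile(model, backend="eager")

x = torch.randn(4, 16)
out = compiled(x)
print(tuple(out.shape))
```

The compiled wrapper is a drop-in replacement for the original module, so downstream accuracy or saving logic can call it exactly as it would call `model`.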
