nvfp4: support inference_mode and rank 3 #3240
base: gh/vkuzo/153/head
Conversation
Stack from ghstack (oldest at bottom):
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3240
Note: links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 1e3d3e3 with merge base 53b5efd. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Summary: adds support for inference mode and rank 3 inputs for nvfp4 inference.

Test Plan:
```
pytest test/prototype/mx_formats/test_inference_workflow.py -s -x -k test_inference_workflow_nvfp4
```

ghstack-source-id: 459143d
ghstack-comment-id: 3441037282
Pull-Request: #3240
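The two behaviors this PR enables can be illustrated with plain PyTorch. This is a hedged sketch of the usual pattern, not the PR's actual nvfp4 kernel code: rank-3 inputs to a linear op are commonly handled by flattening the leading dimensions to 2-D for the matmul and restoring them afterwards, and all of it must work under `torch.inference_mode()`, which disables autograd bookkeeping.

```python
import torch

# Illustrative sketch (not the nvfp4 implementation): a rank-3 input
# (batch, seq, features) is flattened to 2-D for the linear matmul,
# then reshaped back, all inside inference_mode.
linear = torch.nn.Linear(128, 64)

x = torch.randn(4, 32, 128)  # rank-3 input
with torch.inference_mode():
    # Flatten leading dims -> (batch*seq, features), matmul, restore.
    y2d = linear(x.reshape(-1, x.shape[-1]))
    y = y2d.reshape(*x.shape[:-1], -1)

print(tuple(y.shape))  # → (4, 32, 64)
```

A quantized inference path needs the same rank normalization plus tensor subclasses that tolerate inference-mode tensors, which is what the test plan above exercises.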
```
    ids=lambda s: f"{s[0]}x{s[1]}x{s[2]}",
)
@pytest.mark.parametrize("use_inference_mode", [False, True])
@pytest.mark.parametrize("x_rank", [2, 3])
```
nit: maybe instead of x_rank, we can follow this:
ao/test/quantization/quantize_/workflows/float8/test_float8_tensor.py, lines 76 to 77 in e9c7bea:

```
((128,), 256, 128),
((32, 128), 64, 256),
```
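The reviewer's suggestion could look roughly like the following, a hypothetical sketch in the style of the quoted float8 test: each case is an `(input_shape, in_features, out_features)` tuple, so the input rank is implied by `len(input_shape)` rather than carried in a separate `x_rank` parameter. The names `CASES` and `case_id` are illustrative, not from the PR.

```python
# Hypothetical cases in the float8-test style; the rank-3 entry is the
# shape class this PR adds support for.
CASES = [
    ((128,), 256, 128),       # rank-1 input
    ((32, 128), 64, 256),     # rank-2 input
    ((4, 32, 128), 64, 256),  # rank-3 input
]

def case_id(case):
    """Build a readable pytest id such as '4x32x128-64-256'."""
    shape, in_features, out_features = case
    dims = "x".join(str(d) for d in shape)
    return f"{dims}-{in_features}-{out_features}"

# In the real test this would feed the decorator:
# @pytest.mark.parametrize("case", CASES, ids=case_id)
```

Collapsing `x_rank` into the shape tuple keeps the parametrization in one place and makes each generated test id self-describing.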