[Benchmark] Add all gather matmul benchmark #400
base: joydddd/stack/21
stack-info: PR: #400, branch: joydddd/stack/22
Optimization implemented in Kraken but not supported in Helion: `(a, out) = ag_matmul(a_shared, b)`, where `a = all_gather(a_shared)` and `out = a @ b`. Helion does not support conditionally calculating the tile offset and conditionally using a different tensor descriptor for `tensor_descriptor.load`. The same access pattern can be implemented in Helion with an explicit branch, but this generates two `tensor_descriptor` loads, one in each branch, which breaks Triton data prefetching.
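To make the semantics and the access pattern concrete, here is a minimal pure-Python sketch. It assumes `all_gather` concatenates equally sized row shards along dim 0, and that the per-tile logic picks its A source by comparing the tile's owning rank with the local rank. All names (`all_gather_rows`, `ag_matmul_reference`, `ag_matmul_tiled`, `shards`, `my_rank`, `tile_m`) are hypothetical and are not taken from Kraken or Helion.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def all_gather_rows(shards: List[Matrix]) -> Matrix:
    """Simulate all_gather along dim 0 by concatenating every rank's shard."""
    return [row for shard in shards for row in shard]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-by-column product: out = a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def ag_matmul_reference(shards: List[Matrix], b: Matrix) -> Tuple[Matrix, Matrix]:
    """Reference semantics of (a, out) = ag_matmul(a_shared, b)."""
    a = all_gather_rows(shards)
    return a, matmul(a, b)


def ag_matmul_tiled(
    shards: List[Matrix], b: Matrix, my_rank: int, tile_m: int
) -> Tuple[Matrix, Matrix]:
    """Hypothetical tile loop that conditionally picks the A source per tile.

    Assumes every shard has the same row count and tile_m divides it,
    so no tile straddles two ranks.
    """
    shard_rows = len(shards[0])
    assert shard_rows % tile_m == 0
    gathered = all_gather_rows(shards)
    out: Matrix = []
    for m in range(0, len(gathered), tile_m):
        owner = m // shard_rows
        if owner == my_rank:
            # Local tile: recompute the offset relative to the local shard.
            local = m - owner * shard_rows
            tile = shards[my_rank][local:local + tile_m]
        else:
            # Remote tile: offset into the gathered buffer.
            tile = gathered[m:m + tile_m]
        out.extend(matmul(tile, b))
    return gathered, out
```

For any valid `my_rank` and `tile_m`, `ag_matmul_tiled` returns the same `(a, out)` pair as `ag_matmul_reference`. In a kernel, the goal described above would be for the branch to select only the source descriptor and offset and then issue a single load, rather than one load per branch.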
If
Yep. If
@yf225 Another distributed benchmark PR.
Stacked PRs:
[Benchmark] Add all gather matmul benchmark