[Benchmark] Add all reduce benchmark #393
stack-info: PR: #393, branch: joydddd/stack/21
Use custom cpp.
Benchmarking results for allreduce on 8x devices (time_us).
The performance gap between Helion and Kraken now exists only for shapes >= 512k, where the optimal config uses a persistent kernel with partial SMs + pre-log.
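For readers unfamiliar with how a per-shape `time_us` figure is typically produced, here is a minimal sketch. It is not the Helion or Kraken code from this PR: `all_reduce_sum` and `time_us` are hypothetical helpers, the "devices" are plain Python lists on the CPU, and the shapes are arbitrary small sizes chosen so the example runs quickly. A real GPU benchmark would also need device synchronization around the timed region.

```python
import time

WORLD_SIZE = 8  # assumption: mirrors the "8x devices" setup, simulated on CPU


def all_reduce_sum(rank_buffers):
    """Reference all-reduce: every rank ends up with the elementwise sum."""
    reduced = [sum(vals) for vals in zip(*rank_buffers)]
    return [list(reduced) for _ in rank_buffers]


def time_us(fn, *args, warmup=2, iters=5):
    """Median wall-clock time of fn(*args) in microseconds."""
    for _ in range(warmup):
        fn(*args)  # warm-up runs are not timed
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1e6)
    samples.sort()
    return samples[len(samples) // 2]


# Time the reference all-reduce for a few (hypothetical) buffer sizes.
results = {}
for numel in (1024, 4096, 16384):
    buffers = [[float(rank)] * numel for rank in range(WORLD_SIZE)]
    results[numel] = time_us(all_reduce_sum, buffers)

for numel, elapsed in results.items():
    print(f"numel={numel:>6}  time_us={elapsed:.1f}")
```

Taking the median over several timed iterations after a warm-up keeps one-off costs (allocation, caching) out of the reported number, which matters most for the small shapes where the kernel itself is short.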
@yf225 Will, leaving this distributed benchmark PR to you~

Stacked PRs:
[Benchmark] Add all reduce benchmark