-
- CUDA Warp-Level Primitives
- Parallel reduction
-
- Memory Coalescing
- Shared Memory
- Bank Conflict
- Swizzling
- CuTe
-
- Tile Size Tuning
- Shared Memory
- Bank Conflict
- Double Buffer
- Warp Divergence
- Vectorized memory access
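
As a rough illustration of two of the topics listed above (warp-level primitives and parallel reduction), here is a minimal sketch of a sum reduction built on `__shfl_down_sync`. It is not taken from this repository: the names `warp_reduce_sum` and `reduce_sum`, the launch configuration, and the use of managed memory are all assumptions made for the example. It also assumes the block size is a multiple of 32, so every warp is full.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: sums one value per lane across a full 32-lane warp.
__inline__ __device__ float warp_reduce_sum(float val) {
    // Each step adds the value held by the lane `offset` positions higher.
    for (int offset = 16; offset > 0; offset >>= 1) {
        val += __shfl_down_sync(0xffffffffu, val, offset);
    }
    return val;  // only lane 0 is guaranteed to hold the full warp sum
}

// Hypothetical kernel: each block reduces its share of `in`,
// then adds its partial sum to *out with one atomic per block.
// Assumes blockDim.x is a multiple of 32 and at most 1024.
__global__ void reduce_sum(const float* in, float* out, int n) {
    __shared__ float warp_sums[32];  // one slot per warp in the block

    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % 32;
    int warp = threadIdx.x / 32;

    // Grid-stride loop: works for any n and any grid size.
    float sum = 0.0f;
    for (int i = tid; i < n; i += gridDim.x * blockDim.x) {
        sum += in[i];
    }

    // Stage 1: reduce within each warp, no shared memory needed.
    sum = warp_reduce_sum(sum);
    if (lane == 0) warp_sums[warp] = sum;
    __syncthreads();

    // Stage 2: the first warp reduces the per-warp partial sums.
    if (warp == 0) {
        int num_warps = blockDim.x / 32;
        sum = (lane < num_warps) ? warp_sums[lane] : 0.0f;
        sum = warp_reduce_sum(sum);
        if (lane == 0) atomicAdd(out, sum);
    }
}

int main() {
    const int n = 1 << 20;
    float* in  = nullptr;
    float* out = nullptr;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;
    *out = 0.0f;

    reduce_sum<<<128, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("sum = %.1f (expected %.1f)\n", *out, static_cast<float>(n));
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The shuffle-based stage keeps the intra-warp reduction entirely in registers, so shared memory is touched only once per warp. A block size that is not a multiple of 32 would need a narrower participation mask than the full-warp one assumed here.
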
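
make build
make install <kernel_name>
make run <kernel_name>
make ncu <kernel_name>
make clean

The SGEMM test accepts the following options:

- --bench: enable benchmark mode
- --times N: set the number of benchmark iterations (default: 3)
- --help: show the help message

For example:

make run <kernel_name> -- --bench --times 10

- This project uses some utility functions from Chtholly-Boss/swizzle.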