

@PanZezhong1725
Collaborator

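  1. On DCU, a single block can have at most 512 threads.
  2. Kernels default to a maximum of 256 threads per block; __launch_bounds__ is needed to raise this to 512.
  3. The current approach is for xmake to treat DCU as a special variant of CUDA, so a DCU build cannot coexist with NVIDIA CUDA.
  4. DCU's cuBLAS matmul does not yet support fp16 internal computation; fp16 inputs are converted to fp32 internally.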
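The thread limits in items 1 and 2 can be handled on the host side when choosing a launch configuration. Below is a minimal C++ sketch, assuming a 1-D launch; the names `make_launch_config`, `needs_launch_bounds`, `LaunchConfig` and the two constants are hypothetical and not part of the PR, while the 512 and 256 values come from the notes above.

```cpp
#include <algorithm>
#include <cassert>

// Values taken from the PR notes (hypothetical constant names): DCU allows at
// most 512 threads per block, and kernels without __launch_bounds__ are
// limited to 256 threads per block by default.
constexpr int kDcuMaxThreadsPerBlock = 512;
constexpr int kDefaultLaunchBound = 256;

// Hypothetical 1-D launch configuration.
struct LaunchConfig {
    long long grid;  // number of blocks
    int block;       // threads per block
};

// Clamp the preferred block size to max_threads, then use ceiling division
// so that grid * block covers all n elements.
LaunchConfig make_launch_config(long long n, int preferred_block, int max_threads) {
    assert(n > 0 && preferred_block > 0 && max_threads > 0);
    int block = std::min(preferred_block, max_threads);
    long long grid = (n + block - 1) / block;
    return {grid, block};
}

// A kernel launched with more than the default 256 threads per block has to
// be declared with a matching bound, for example:
//   __global__ void __launch_bounds__(512) my_kernel(float* out, int n);
bool needs_launch_bounds(int block) {
    return block > kDefaultLaunchBound;
}
```

For example, `make_launch_config(1 << 20, 1024, kDcuMaxThreadsPerBlock)` clamps the requested 1024 threads to 512 and yields 2048 blocks; because 512 exceeds the default bound of 256, the kernel would need __launch_bounds__(512).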
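Item 4 means that DCU matmul results for fp16 inputs follow fp32 arithmetic, so a CPU reference used to check them should widen to fp32 as well. Below is a minimal C++ sketch, assuming row-major layout and fp32 output; the function names are hypothetical, and whether the library rounds the final result back to fp16 depends on the requested output type, which this sketch does not model.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Decode an IEEE 754 binary16 (fp16) bit pattern into a float (fp32).
float half_to_float(std::uint16_t h) {
    const bool negative = (h >> 15) & 0x1;
    const int exp = (h >> 10) & 0x1F;
    const int mant = h & 0x3FF;
    float value;
    if (exp == 0) {
        // Zero or subnormal: mant * 2^-24.
        value = std::ldexp(static_cast<float>(mant), -24);
    } else if (exp == 31) {
        // Infinity (mant == 0) or NaN (mant != 0).
        value = mant ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + mant) * 2^(exp - 15 - 10).
        value = std::ldexp(static_cast<float>(mant | 0x400), exp - 25);
    }
    return negative ? -value : value;
}

// Reference C = A * B for row-major fp16 inputs A (m x k) and B (k x n):
// every input is widened to fp32 and products are accumulated in fp32.
std::vector<float> matmul_f16_fp32_acc(const std::vector<std::uint16_t>& a,
                                       const std::vector<std::uint16_t>& b,
                                       std::size_t m, std::size_t k, std::size_t n) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += half_to_float(a[i * k + p]) * half_to_float(b[p * n + j]);
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}
```

With A = [2048, 1] (bit patterns 0x6800, 0x3C00) and B = [1; 1], the fp32 accumulator gives exactly 2049. In fp16 the spacing between representable values at 2048 is 2, so a pure-fp16 accumulation would round the same sum to 2048 under round-to-nearest-even; this is the kind of difference to expect when comparing DCU output against an fp16-only reference.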

@PanZezhong1725 PanZezhong1725 merged commit 93db3ad into dev Feb 10, 2025
1 check passed
@PanZezhong1725 PanZezhong1725 deleted the sugon_dcu branch February 10, 2025 07:02

Labels

Category: Development  New feature development  Urgent!  Needs to be merged as soon as possible


2 participants