Issue
For AllReduce and ReduceScatter, ncclSymkImplemented() returns true only for ncclDevSum on floating-point types other than ncclFloat64:
case ncclFuncAllReduce:
case ncclFuncReduceScatter:
return red == ncclDevSum && isFloat && ty != ncclFloat64;
However, ncclSymkMask() sets hasLDMC = true for ncclDevMinMax on multiple types:
case ncclInt32:
case ncclUint32:
case ncclInt64:
case ncclUint64:
case ncclFloat16:
case ncclBfloat16:
hasLDMC = red == ncclDevSum || red == ncclDevMinMax;
break;
case ncclFloat8e4m3:
case ncclFloat8e5m2:
hasLDMC = red == ncclDevSum || red == ncclDevMinMax;
hasLDMC &= comm->compCap >= 100;
break;
But since ncclSymkImplemented() returns false for:
- All min/max operations (any type)
- All integer types (any operation)
The checks in ncclSymkMask() for integers and min/max are dead code.
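The mismatch can be checked mechanically with a small standalone model of the two predicates. This is only a sketch: the `RedOp` and `Type` enums, `isFloatType`, `symkImplemented`, `maskHasLDMC`, and `countUnreachableMaskCombos` are hypothetical stand-ins, not NCCL symbols. It assumes that `isFloat` is true for every non-integer type listed, and it treats types whose `ncclSymkMask()` case is not quoted above (Float32, Float64) as not setting `hasLDMC`.

```cpp
#include <array>

// Hypothetical stand-ins for the NCCL enums; only values quoted in this issue are modeled.
enum class RedOp { DevSum, DevMinMax };
enum class Type { Int32, Uint32, Int64, Uint64, Float16, Bfloat16,
                  Float32, Float64, Float8e4m3, Float8e5m2 };

// Assumption: isFloat is true for every non-integer type listed above.
constexpr bool isFloatType(Type ty) {
  switch (ty) {
    case Type::Int32: case Type::Uint32: case Type::Int64: case Type::Uint64:
      return false;
    default:
      return true;
  }
}

// Mirrors the quoted AllReduce/ReduceScatter branch of ncclSymkImplemented().
constexpr bool symkImplemented(RedOp red, Type ty) {
  return red == RedOp::DevSum && isFloatType(ty) && ty != Type::Float64;
}

// Mirrors the quoted cases of ncclSymkMask(); unquoted types yield false in this model.
constexpr bool maskHasLDMC(RedOp red, Type ty, int compCap) {
  switch (ty) {
    case Type::Int32: case Type::Uint32: case Type::Int64: case Type::Uint64:
    case Type::Float16: case Type::Bfloat16:
      return red == RedOp::DevSum || red == RedOp::DevMinMax;
    case Type::Float8e4m3: case Type::Float8e5m2:
      return (red == RedOp::DevSum || red == RedOp::DevMinMax) && compCap >= 100;
    default:
      return false;
  }
}

// Counts (op, type) pairs the mask would accept but the implemented-gate rejects.
inline int countUnreachableMaskCombos(int compCap) {
  constexpr std::array<RedOp, 2> ops = {RedOp::DevSum, RedOp::DevMinMax};
  constexpr std::array<Type, 10> types = {
      Type::Int32, Type::Uint32, Type::Int64, Type::Uint64, Type::Float16,
      Type::Bfloat16, Type::Float32, Type::Float64, Type::Float8e4m3, Type::Float8e5m2};
  int n = 0;
  for (RedOp red : ops)
    for (Type ty : types)
      if (maskHasLDMC(red, ty, compCap) && !symkImplemented(red, ty)) ++n;
  return n;
}
```

Under these assumptions, every pair counted by `countUnreachableMaskCombos` is a mask branch that can never take effect, because the gate rejects the operation before the mask matters.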
Questions
- Is min/max support intended for symmetric memory kernels?
  - ncclSymkMask() has code supporting min/max for Float8/Float16/Bfloat16/Int32/Uint32/Int64/Uint64
  - But ncclSymkImplemented() returns false for min/max
- Is integer type support intended?
  - ncclSymkMask() checks integer types (int32/64, uint32/64)
  - But ncclSymkImplemented() returns false for all integer types
Required Changes
If the above features are intended:
- Update ncclSymkImplemented() to allow:
  - Min/max operations for supported types
  - Integer types for AllReduce/ReduceScatter
- Update generate.py to add integer types:
  all_tys = ["f32", "f16", "bf16", "f8e4m3", "f8e5m2", "i32", "i64", "u32", "u64"]
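One possible shape for the relaxed gate is sketched below, purely as an illustration of the first change. The enums and `symkImplementedProposed` are hypothetical names, not NCCL code. It assumes that sum should be allowed for every listed type except Float64 (which the current condition excludes), and that min/max should be allowed only for the types whose `ncclSymkMask()` cases are quoted in this issue. Whether Float32 min/max belongs here cannot be told from the quoted code, so the sketch leaves it out.

```cpp
// Hypothetical stand-ins for the NCCL enums; only values quoted in this issue are modeled.
enum class RedOp { DevSum, DevMinMax };
enum class Type { Int32, Uint32, Int64, Uint64, Float16, Bfloat16,
                  Float32, Float64, Float8e4m3, Float8e5m2 };

// Sketch of a relaxed AllReduce/ReduceScatter gate (an assumption, not the NCCL implementation).
constexpr bool symkImplementedProposed(RedOp red, Type ty) {
  // Float64 stays excluded, as in the current condition.
  if (ty == Type::Float64) return false;

  // Sum: keep the existing floating-point types and additionally admit integers.
  if (red == RedOp::DevSum) return true;

  // Min/max: admit only the types whose ncclSymkMask() cases are quoted in the issue.
  if (red == RedOp::DevMinMax) {
    switch (ty) {
      case Type::Int32: case Type::Uint32: case Type::Int64: case Type::Uint64:
      case Type::Float16: case Type::Bfloat16:
      case Type::Float8e4m3: case Type::Float8e5m2:
        return true;
      default:
        return false;
    }
  }
  return false;
}
```

Relaxing the gate alone would not be enough: the matching kernels also have to exist, which is why the generate.py change is listed alongside it.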
Impact
- Min/max operations: Cannot use symmetric memory kernels on ANY data type
- Integer types: Cannot use symmetric memory kernels for AllReduce/ReduceScatter