Skip to content

fix: inline kernel arg promotion to avoid wrapper function pattern#1214

Merged
pvelesko merged 2 commits intomainfrom
fix-kernel-arg-shuffle
Apr 14, 2026
Merged

fix: inline kernel arg promotion to avoid wrapper function pattern#1214
pvelesko merged 2 commits intomainfrom
fix-kernel-arg-shuffle

Conversation

@pvelesko
Copy link
Copy Markdown
Collaborator

@pvelesko pvelesko commented Mar 20, 2026

Summary

HipPromoteInts::promoteKernelArgs created a wrapper entry point with i32 params that called an inner .unpromoted function with the original narrow types (i1/i8/i16). Intel's GPU compiler (IGC) does not propagate OpExecutionMode SubgroupSize to called functions, so subgroup shuffle operations inside the inner function executed with the wrong subgroup size, producing incorrect results.

Changed promoteKernelArgs to modify the kernel function in-place: the function body is spliced into a new function with i32 params, and trunc instructions convert back to the original narrow type at the entry block. This produces a flat kernel without function call indirection.

Impact

Fixes ~50 library test failures across rocPRIM, hipCUB, and rocThrust. All multi-block parallel scan, sort, and reduce operations were broken because rocPRIM's lookback_scan_kernel has bool parameters (override_first_value, save_last_value).

Verified: thrust::inclusive_scan now passes at all sizes including N=4097, N=8192, N=65536 which were failing since the LLVM 22 native SPIR-V backend was first tested.

Test plan

  • Includes regression test: TestBoolParamShuffle — warp scan with int param (passes) vs bool param (was failing, now passes)
  • thrust::inclusive_scan N=128..65536 all PASS
  • chipStar unit tests unaffected
  • Re-run full rocPRIM/hipCUB/rocThrust test suites

@pvelesko pvelesko changed the title fix: inline kernel arg promotion to avoid IGC shuffle bug fix: inline kernel arg promotion to avoid wrapper function pattern Mar 20, 2026
@pvelesko pvelesko force-pushed the fix-kernel-arg-shuffle branch from 98e6508 to 83d9470 Compare April 8, 2026 14:36
The previous promoteKernelArgs implementation created a two-level
function pattern: a wrapper kernel (with i32 params) calling an
internal "unpromoted" function (with the original narrow types).
This caused subgroup shuffle operations (OpSubgroupShuffleINTEL) in
the called function to return wrong results on Intel GPUs (IGC bug).

Replace with in-place promotion: the kernel function is directly
modified to use i32 params, with trunc instructions inserted at
the entry block. This produces a flat kernel without function call
indirection, which works correctly with Intel's subgroup shuffles.

Fixes rocPRIM/hipCUB/rocThrust scan/sort/reduce failures caused by
lookback_scan_kernel's bool parameters triggering the IGC bug.
@pvelesko pvelesko force-pushed the fix-kernel-arg-shuffle branch from 83d9470 to f867de1 Compare April 8, 2026 19:11
@pvelesko
Copy link
Copy Markdown
Collaborator Author

/run-aurora-ci

@pvelesko pvelesko merged commit e4101ef into main Apr 14, 2026
36 of 46 checks passed
@pvelesko pvelesko deleted the fix-kernel-arg-shuffle branch April 14, 2026 12:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant