Skip to content

fix: remove static + __attribute__((used)) from device library to fix link error#1216

Merged
pvelesko merged 2 commits intomainfrom
2026-03-20-llvm-used-addrspace-mismatch
Apr 14, 2026
Merged

fix: remove static + __attribute__((used)) from device library to fix link error#1216
pvelesko merged 2 commits intomainfrom
2026-03-20-llvm-used-addrspace-mismatch

Conversation

@pvelesko
Copy link
Copy Markdown
Collaborator

@pvelesko pvelesko commented Mar 20, 2026

Summary

The device library (hipspv-spirv64.bc) is compiled as OpenCL C, producing
@llvm.used with ptr (addrspace 0). HIP kernel modules produce @llvm.used
with ptr addrspace(4). LLVM's linker refuses to merge these during
-mlink-builtin-bitcode and reports "Appending variables with different element types" for any HIP TU that also uses __attribute__((used))
notably rocThrust's runtime_static_assert.h.

Fix

Two coordinated changes:

  1. bitcode/_cl_print_str.cl, bitcode/texture.cl — drop
    static + __attribute__((used)) from the 10 affected functions. Without
    used, hipspv.bc no longer emits an @llvm.used global, so the
    addrspace collision with HIP TUs disappears.

  2. llvm_passes/HipPrintf.cppgetOrCreatePrintStringF() now defines
    the body of _cl_print_str inline in the kernel module via IRBuilder
    instead of inserting only an external declaration.

    Why (2) is required: the historical mechanism that pulled _cl_print_str
    into kernel modules was the @llvm.used global itself, not function-level
    liveness. Once @llvm.used is gone (from change 1), -mlink-builtin-bitcode
    (which uses LinkOnlyNeeded) no longer pulls _cl_print_str in, because
    HipPrintf only inserts the call to it after that step runs. Defining
    the body inside HipPrintf makes the kernel module self-contained for
    printf %s and avoids any runtime device-library link path — important
    on backends whose OpenCL driver rejects clLinkProgram on the equivalent
    SPIR-V (e.g. Mali-G52, observed CL_INVALID_OPERATION).

Impact

  • Fixes rocThrust generate_const_iterators build failure.
  • Fixes any HIP code that uses __attribute__((used)) __device__ variables.
  • No regression for printf %s: kernel modules now carry their own
    _cl_print_str body.

Test plan

  • Includes reproducer: TestFixLlvmUsedAddrspace.hip — FAIL before, PASS after.
  • Unit_printf_specifier and PrintfDynamic continue to pass on every
    backend in CI (Intel CPU/GPU, PoCL on macOS, Mali via Salami).
  • CI green across all unit-tests-llvm-{20,21,22}-{native,translator}-{debug,release}
    and unit-tests-llvm-18-release-salami.

@pvelesko pvelesko force-pushed the 2026-03-20-llvm-used-addrspace-mismatch branch 3 times, most recently from f6d8716 to 4d9a54b Compare April 8, 2026 19:09
…HipPrintf

The device library (hipspv-spirv64.bc) is compiled as OpenCL C, producing
@llvm.used entries with `ptr` (addrspace 0). HIP kernel modules produce
@llvm.used entries with `ptr addrspace(4)`. LLVM's linker refuses to merge
these during -mlink-builtin-bitcode and reports "Appending variables with
different element types" for any HIP TU that also uses __attribute__((used))
(notably rocThrust's runtime_static_assert.h).

Fix part 1 (bitcode): drop `static + __attribute__((used))` from
_cl_print_str.cl and texture.cl. Without `used`, hipspv.bc no longer
emits an @llvm.used global, so the addrspace collision goes away.

Fix part 2 (llvm_passes/HipPrintf.cpp): the historical pull-in for
_cl_print_str was the @llvm.used global itself — once it is gone,
-mlink-builtin-bitcode (LinkOnlyNeeded) no longer pulls _cl_print_str
into the kernel module, because HipPrintf inserts the call only AFTER
that step runs. Define the body of _cl_print_str inline in HipPrintf's
getOrCreatePrintStringF() via IRBuilder so the kernel module is
self-contained for printf %s. Equivalent C:

    void _cl_print_str(__generic const char *S) {
      if (S == 0) return;
      unsigned Pos = 0;
      char C;
      while ((C = S[Pos]) != 0) { printf("%c", C); ++Pos; }
    }

This avoids any runtime device-library link path for printf %s, which
is important on backends whose OpenCL driver rejects clLinkProgram on
the equivalent SPIR-V (e.g. Mali-G52).

Fixes rocThrust generate_const_iterators build failure and any HIP code
that uses `__attribute__((used)) __device__` variables. Adds reproducer
TestFixLlvmUsedAddrspace.hip.
@pvelesko pvelesko force-pushed the 2026-03-20-llvm-used-addrspace-mismatch branch from 4d9a54b to 6812405 Compare April 9, 2026 10:00
@pvelesko
Copy link
Copy Markdown
Collaborator Author

/run-aurora-ci

@pvelesko pvelesko merged commit 3c8d52d into main Apr 14, 2026
28 of 38 checks passed
@pvelesko pvelesko deleted the 2026-03-20-llvm-used-addrspace-mismatch branch April 14, 2026 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant