Skip to content

feat: add ROCm/HIP stub libraries for lazy loading (mirrors CUDA stubs)#1867

Open
LeiWang1999 wants to merge 2 commits intotile-ai:mainfrom
LeiWang1999:feat/rocm-stubs
Open

feat: add ROCm/HIP stub libraries for lazy loading (mirrors CUDA stubs)#1867
LeiWang1999 wants to merge 2 commits intotile-ai:mainfrom
LeiWang1999:feat/rocm-stubs

Conversation

@LeiWang1999
Copy link
Member

@LeiWang1999 LeiWang1999 commented Feb 22, 2026

Summary

Add HIP and HIPRTC stub libraries mirroring the existing CUDA stubs in src/target/stubs/. This enables:

  1. CPU-only imports: TileLang can be imported on machines without ROCm installed
  2. Version conflict avoidance: libamdhip64.so / libhiprtc.so are loaded dynamically at runtime via dlopen/dlsym, avoiding hard DT_NEEDED dependencies
  3. Portable wheels: A single ROCm-enabled wheel can work across different ROCm versions

New Files

File Description
src/target/stubs/hip.h Stub header with HIPDriverAPI singleton and X-macro function list
src/target/stubs/hip.cc Lazy-loading impl for libamdhip64.so + minimal HSA wrappers (hsa_init/hsa_shut_down)
src/target/stubs/hiprtc.cc Lazy-loading impl for libhiprtc.so
src/target/stubs/vendor/hip_runtime.h Minimal vendored HIP types for building without ROCm SDK

CMake Changes

  • Add TILELANG_USE_HIP_STUBS option (default ON, POSIX-only)
  • Build hip_stub and hiprtc_stub shared libraries when USE_ROCM=ON
  • Override ROCM_HIPHCC_LIBRARY to hip_stub so TVM links against our stub
  • Override ROCM_HSA_LIBRARY to avoid hard dependency on libhsa-runtime64
  • Link tilelang/tvm targets against stubs instead of system HIP libs

Design

Follows the same patterns as the existing CUDA stubs:

  • Lazy loading: Libraries loaded via dlopen() only on first API call
  • Global symbol reuse: Checks RTLD_DEFAULT / RTLD_NEXT first to reuse symbols already loaded by PyTorch or other frameworks
  • Vendored fallback header: Minimal vendor/hip_runtime.h allows compilation without ROCm SDK headers
  • X-macro pattern: Clean enumeration of all required HIP API functions

Tested On

  • AMD MI300X (gfx942) with ROCm 7.1 + PyTorch 2.9.1+rocm6.3

Summary by CodeRabbit

  • New Features

    • ROCm HIP and HIPRTC stub support with lazy runtime loading to avoid hard linkage to system ROCm libraries.
    • New build option TILELANG_USE_HIP_STUBS (POSIX-only); enabling it on Windows now emits a clear error.
    • Vendored minimal HIP header so stubs can be built without a full ROCm SDK.
  • Documentation

    • Docs restructured into separate CUDA and ROCm stub sections; updated behavior, versioning, and build-option details.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds ROCm/HIP lazy-loading stubs and build wiring: new CMake option TILELANG_USE_HIP_STUBS, hip_stub and hiprtc_stub shared targets, vendor minimal HIP headers, POSIX-only stub implementations for HIP and HIPRTC (with minimal HSA wrappers), and CMake linking/install/RPATH adjustments to use stubs when enabled.

Changes

Cohort / File(s) Summary
CMake Build Configuration
CMakeLists.txt
Adds TILELANG_USE_HIP_STUBS option with Windows guard; creates hip_stub and hiprtc_stub SHARED targets when enabled; forces ROCM_HIPHCC_LIBRARYhip_stub and ROCM_HSA_LIBRARY→NOTFOUND; links hip_stub into TILELANG_OUTPUT_TARGETS and appends stubs for install/RPATH handling.
Documentation
docs/runtime_internals/stubs.md
Restructures stub docs into CUDA and ROCm sections; renames CUDA stubs to *_stub; documents lazy-loading semantics, RTLD_DEFAULT/RTLD_NEXT reuse, HSA wrapper behavior, and new TILELANG_USE_HIP_STUBS option.
HIP Stub Header
src/target/stubs/hip.h
New header providing vendored hip_runtime.h fallback, visibility macros, tvm::tl::hip::HIPDriverAPI struct (lazy-loaded function pointers), X-macro of required HIP APIs, and exported extern "C" HIP wrapper declarations.
HIP Stub Implementation
src/target/stubs/hip.cc
POSIX-only implementation that lazily resolves libamdhip64.so symbols (prefers RTLD_DEFAULT/RTLD_NEXT), implements ~30 HIP runtime/module wrappers, and provides minimal lazy HSA init/shutdown wrappers with safe failure behavior.
HIPRTC Stub Implementation
src/target/stubs/hiprtc.cc
POSIX-only hiprtc_stub that lazily loads libhiprtc.so (tries common SONAMEs), exposes core HIPRTC APIs, and returns consistent fallback errors when unavailable.
Vendor Headers
src/target/stubs/vendor/hip_runtime.h
Adds minimal vendored HIP runtime header with opaque handle types, enums, partial hipDeviceProp_t, and ~30 function declarations so stubs build without a full ROCm SDK; guarded to prevent direct inclusion.

Sequence Diagram(s)

sequenceDiagram
    participant App
    participant hip_stub as "hip_stub (stub lib)"
    participant Loader as "dlopen/dlsym"
    participant libamd as "libamdhip64.so"
    participant HSA as "HSA runtime (optional)"

    App->>hip_stub: call hip API (e.g., hipMalloc)
    alt symbol already loaded in process
        hip_stub->>hip_stub: resolve via RTLD_DEFAULT/RTLD_NEXT
        hip_stub->>App: call resolved function and return
    else symbol not present
        hip_stub->>Loader: dlopen("libamdhip64.so")
        Loader-->>libamd: load library
        hip_stub->>Loader: dlsym("hipMalloc")
        Loader-->>hip_stub: function ptr
        alt HSA init required
            hip_stub->>HSA: lazy hsa_init wrapper
            HSA-->>hip_stub: status
        end
        hip_stub->>libamd: call hipMalloc(...)
        libamd-->>hip_stub: result
        hip_stub->>App: return result
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement, dependencies

Suggested reviewers

  • XuehaiPan
  • oraluben

Poem

🐰 I hopped through CMake and found a way,
To dlsym the hip and only wake at play,
Stubs curl up lazy until calls arrive,
No heavy links — the runtime can thrive,
A joyful hop — the build stays light! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.20% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main change: adding ROCm/HIP stub libraries for lazy loading, which mirrors the existing CUDA stub functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (4)
src/target/stubs/hip.cc (1)

173-180: MissingHsaError(): both #if branches are equivalent — same dead-conditional pattern as hiprtc.cc.

static_cast<hsa_status_t>(1) and plain 1 are identical since hsa_status_t is typedef int. Collapse to a single branch.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hip.cc` around lines 173 - 180, The two `#if` branches for
MissingHsaError() are redundant because both return the same value; remove the
conditional compilation and provide a single static hsa_status_t
MissingHsaError() implementation (e.g., return 1) replacing the `#if/`#else/#endif
block so there is only one definition; update any surrounding comments
accordingly and ensure the function signature remains static hsa_status_t
MissingHsaError().
src/target/stubs/hiprtc.cc (2)

95-104: GetSymbol template duplicated with hip.cc.

This helper is identical to the one in hip.cc (lines 67–76). Consider extracting it into a shared internal header (e.g., stubs/dlopen_util.h) to keep the two stub implementations in sync.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hiprtc.cc` around lines 95 - 104, The GetSymbol template in
hiprtc.cc is duplicated in hip.cc; extract this helper into a shared internal
header (e.g., create stubs/dlopen_util.h) containing the template T
GetSymbol(void* handle, const char* name) and include that header from both
hiprtc.cc and hip.cc, then remove the duplicate definitions from each .cc so
both files use the single shared implementation; ensure the header is internal
(no public API) and has include guards or pragma once.

159-165: MissingLibraryError(): both #if branches are identical — dead conditional.

Both paths return HIPRTC_ERROR_INTERNAL_ERROR, making the preprocessor guard pointless. If the intent was to use a more specific error code when real headers are available (e.g., HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE), this should be updated; otherwise, collapse to a single return.

Proposed fix
 hiprtcResult MissingLibraryError() {
-#if TILELANG_HAS_HIPRTC_HEADERS
-  return HIPRTC_ERROR_INTERNAL_ERROR;
-#else
   return HIPRTC_ERROR_INTERNAL_ERROR;
-#endif
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hiprtc.cc` around lines 159 - 165, The preprocessor guard in
MissingLibraryError() is dead because both branches return
HIPRTC_ERROR_INTERNAL_ERROR; either collapse the `#if/`#else/#endif and return
HIPRTC_ERROR_INTERNAL_ERROR directly from MissingLibraryError(), or if the
intent was to return a more specific error when HIPRTC headers are present,
change the TILELANG_HAS_HIPRTC_HEADERS branch to return the intended code (e.g.,
HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE) while keeping the else branch returning
HIPRTC_ERROR_INTERNAL_ERROR; update the function MissingLibraryError()
accordingly and remove the redundant conditional.
CMakeLists.txt (1)

268-277: Unused TILELANG_HIPRTC_STUB_EXPORTS compile definition for hiprtc_stub.

Line 270 defines TILELANG_HIPRTC_STUB_EXPORTS for the hiprtc_stub target, but hiprtc.cc (Line 62) hardcodes TILELANG_HIPRTC_STUB_API as __attribute__((visibility("default"))) without checking this macro. Either wire up the define in hiprtc.cc (as hip.h does for TILELANG_HIP_STUB_EXPORTS) or remove the unused definition.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CMakeLists.txt` around lines 268 - 277, The TILELANG_HIPRTC_STUB_EXPORTS
compile definition added to the hiprtc_stub target is unused because hiprtc.cc
currently hardcodes TILELANG_HIPRTC_STUB_API as
__attribute__((visibility("default"))) instead of honoring the export macro;
either remove the add of TILELANG_HIPRTC_STUB_EXPORTS from the CMake target or
update hiprtc.cc to mirror hip.h’s pattern (where TILELANG_HIP_STUB_EXPORTS
controls TILELANG_HIP_STUB_API) by making TILELANG_HIPRTC_STUB_API conditional
on TILELANG_HIPRTC_STUB_EXPORTS (and on Windows visibility if needed) so the
compile definition is actually respected by the symbol export logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/target/stubs/hip.cc`:
- Around line 78-99: TryLoadLibAmdHip64 (and similarly TryLoadLibHsaRuntime)
currently returns RTLD_DEFAULT/RTLD_NEXT which can be null on some platforms and
gets misinterpreted as "not found" by CreateHIPDriverAPI; change the logic so
that when a global/next symbol is detected (the dlsym checks that find
"hipGetErrorString"), the loader returns a non-null sentinel handle (for example
a distinct pointer value like reinterpret_cast<void*>(1)) instead of
RTLD_DEFAULT/RTLD_NEXT, and update CreateHIPDriverAPI to treat that sentinel as
a valid "use global namespace" result and perform symbol lookups with
RTLD_DEFAULT/RTLD_NEXT as appropriate; ensure the same sentinel and handling
change is applied to TryLoadLibHsaRuntime and places that check for handle ==
nullptr.
- Around line 108-117: GetSymbol currently consumes dlerror() so the subsequent
dlerror() in the LOOKUP_REQUIRED macro always yields "unknown"; modify GetSymbol
to also capture and return the dlerror() string (e.g., via a returned
std::optional<std::string> or an out parameter) when it fails to find a symbol,
then update LOOKUP_REQUIRED (used by TILELANG_LIBHIP_API_REQUIRED) to use that
returned error value when api.name##_ is nullptr so the thrown
std::runtime_error includes the real dlerror() message instead of "unknown".

In `@src/target/stubs/hiprtc.cc`:
- Around line 72-93: The CreateHIPRTCAPI path incorrectly treats
RTLD_DEFAULT/RTLD_NEXT (which can be NULL on glibc) as a missing handle; update
CreateHIPRTCAPI so it does not bail out when handle == nullptr and instead
always calls GetSymbol/GetProcAddress with the returned handle from
TryLoadLibHiprtc (so dlsym(NULL, name) can resolve global symbols), i.e., remove
the early "if (handle == nullptr) return empty API" check and allow GetSymbol to
perform dlsym against RTLD_DEFAULT/RTLD_NEXT/real dlopen handles; apply the same
change to CreateHIPDriverAPI for TryLoadLibAmdHip64 in src/target/stubs/hip.cc.

---

Nitpick comments:
In `@CMakeLists.txt`:
- Around line 268-277: The TILELANG_HIPRTC_STUB_EXPORTS compile definition added
to the hiprtc_stub target is unused because hiprtc.cc currently hardcodes
TILELANG_HIPRTC_STUB_API as __attribute__((visibility("default"))) instead of
honoring the export macro; either remove the add of TILELANG_HIPRTC_STUB_EXPORTS
from the CMake target or update hiprtc.cc to mirror hip.h’s pattern (where
TILELANG_HIP_STUB_EXPORTS controls TILELANG_HIP_STUB_API) by making
TILELANG_HIPRTC_STUB_API conditional on TILELANG_HIPRTC_STUB_EXPORTS (and on
Windows visibility if needed) so the compile definition is actually respected by
the symbol export logic.

In `@src/target/stubs/hip.cc`:
- Around line 173-180: The two `#if` branches for MissingHsaError() are redundant
because both return the same value; remove the conditional compilation and
provide a single static hsa_status_t MissingHsaError() implementation (e.g.,
return 1) replacing the `#if/`#else/#endif block so there is only one definition;
update any surrounding comments accordingly and ensure the function signature
remains static hsa_status_t MissingHsaError().

In `@src/target/stubs/hiprtc.cc`:
- Around line 95-104: The GetSymbol template in hiprtc.cc is duplicated in
hip.cc; extract this helper into a shared internal header (e.g., create
stubs/dlopen_util.h) containing the template T GetSymbol(void* handle, const
char* name) and include that header from both hiprtc.cc and hip.cc, then remove
the duplicate definitions from each .cc so both files use the single shared
implementation; ensure the header is internal (no public API) and has include
guards or pragma once.
- Around line 159-165: The preprocessor guard in MissingLibraryError() is dead
because both branches return HIPRTC_ERROR_INTERNAL_ERROR; either collapse the
`#if/`#else/#endif and return HIPRTC_ERROR_INTERNAL_ERROR directly from
MissingLibraryError(), or if the intent was to return a more specific error when
HIPRTC headers are present, change the TILELANG_HAS_HIPRTC_HEADERS branch to
return the intended code (e.g., HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE) while
keeping the else branch returning HIPRTC_ERROR_INTERNAL_ERROR; update the
function MissingLibraryError() accordingly and remove the redundant conditional.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
src/target/stubs/hiprtc.cc (1)

158-164: Dead conditional — both branches of #if TILELANG_HAS_HIPRTC_HEADERS are identical.

Both the #if and #else branches return HIPRTC_ERROR_INTERNAL_ERROR. The preprocessor conditional serves no purpose here.

Simplify
 hiprtcResult MissingLibraryError() {
-#if TILELANG_HAS_HIPRTC_HEADERS
-  return HIPRTC_ERROR_INTERNAL_ERROR;
-#else
   return HIPRTC_ERROR_INTERNAL_ERROR;
-#endif
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hiprtc.cc` around lines 158 - 164, The function
MissingLibraryError contains a redundant preprocessor conditional where both the
`#if` and `#else` branches return HIPRTC_ERROR_INTERNAL_ERROR; simplify by removing
the `#if/`#else/#endif and have MissingLibraryError() directly return
HIPRTC_ERROR_INTERNAL_ERROR (locate the function by name MissingLibraryError in
hiprtc.cc).
src/target/stubs/hip.cc (1)

172-179: Minor: MissingHsaError branches are functionally identical.

The #if TILELANG_HAS_HSA_HEADERS branch does static_cast<hsa_status_t>(1) and the #else branch returns 1. Since hsa_status_t is int in the fallback (line 43), these are identical.

Simplify
-#if TILELANG_HAS_HSA_HEADERS
-static hsa_status_t MissingHsaError() {
-  // Any non-success value makes TVM treat ROCm as not existing.
-  return static_cast<hsa_status_t>(1);
-}
-#else
-static hsa_status_t MissingHsaError() { return 1; }
-#endif
+static hsa_status_t MissingHsaError() {
+  // Any non-success value makes TVM treat ROCm as not existing.
+  return static_cast<hsa_status_t>(1);
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hip.cc` around lines 172 - 179, The two conditional branches
for MissingHsaError are identical; simplify by providing a single definition
that returns 1 (or static_cast<hsa_status_t>(1) if you prefer) unconditionally,
removing the `#if` TILELANG_HAS_HSA_HEADERS / `#else` / `#endif` wrapping so the
function MissingHsaError() is defined once with a consistent return value; keep
the function signature and name exactly as MissingHsaError to preserve callers.
src/target/stubs/vendor/hip_runtime.h (1)

12-18: Nit: #pragma once is redundant after the include-guard #error.

The #error directive on lines 14-15 already prevents this header from being included directly. The #pragma once on line 18 won't add meaningful protection since the file can only be reached through hip.h which defines/undefines the guard each time. This is harmless, just unnecessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/vendor/hip_runtime.h` around lines 12 - 18, The header
vendor/hip_runtime.h redundantly contains a `#pragma` once after the guarding
check using the macro _TILELANG_HIP_STUB_INCLUDE_GUARD; remove the unnecessary
`#pragma` once from vendor/hip_runtime.h so the inclusion policy relies solely on
the existing _TILELANG_HIP_STUB_INCLUDE_GUARD check (which is set/cleared by the
stub wrapper hip.h).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CMakeLists.txt`:
- Around line 236-291: target_include_directories is being called with
${ROCM_INCLUDE_DIRS} unconditionally for the hip_stub and hiprtc_stub targets
and appended unconditionally to TILE_LANG_INCLUDES, which can be empty/unset;
wrap the include-directory usage in presence checks (e.g. if(ROCM_INCLUDE_DIRS)
... endif()) so only call target_include_directories(hip_stub PRIVATE
${ROCM_INCLUDE_DIRS}) and target_include_directories(hiprtc_stub PRIVATE
${ROCM_INCLUDE_DIRS}) when ROCM_INCLUDE_DIRS is non-empty, and similarly only
append ROCM_INCLUDE_DIRS to TILE_LANG_INCLUDES when it is set.

In `@src/target/stubs/hip.cc`:
- Line 188: is_available() incorrectly treats a NULL handle from get_handle() as
"not available" even when get_handle() returned RTLD_DEFAULT (which is NULL on
glibc); change is_available() so it returns true when the HIP runtime was
successfully located via RTLD_DEFAULT. Concretely, either have get_handle() set
and expose an additional boolean (e.g. handle_is_rtld_default or
handle_resolved) when it resolves symbols via RTLD_DEFAULT, or make
is_available() check for a resolved-symbol presence (e.g. a known function
pointer populated by get_handle() like hipInit) instead of only get_handle() !=
nullptr; update is_available() to use that flag or symbol-check so get() no
longer throws when RTLD_DEFAULT was used.

---

Duplicate comments:
In `@src/target/stubs/hip.cc`:
- Around line 77-98: TryLoadLibAmdHip64 (and similarly TryLoadLibHsaRuntime)
returns RTLD_DEFAULT/RTLD_NEXT which can be NULL on glibc and is incorrectly
treated as failure by CreateHIPDriverAPI's nullptr check; change these helpers
to return a non-null sentinel value when they find the symbol in the global
namespace (e.g. a unique static void* sentinel) instead of returning
RTLD_DEFAULT/RTLD_NEXT, and update CreateHIPDriverAPI to treat that sentinel as
"library available in global namespace" (resolve symbols via dlsym(RTLD_DEFAULT,
...) when sentinel is seen); apply the same sentinel-based change to
TryLoadLibHsaRuntime and ensure comparisons against the address of
hipGetErrorString remain intact to detect already-loaded symbols, falling back
to dlopen(kLibHipPaths...) only when neither global symbol nor sentinel is set.
- Around line 107-116: GetSymbol currently clears and then consumes dlerror(),
so when LOOKUP_REQUIRED (macro) checks dlerror() it always sees nullptr; update
GetSymbol (or add a companion GetSymbolError helper) to capture and store the
dlerror() text before returning nullptr (e.g., store in a thread_local string or
return an error string alongside the symbol), then change LOOKUP_REQUIRED to use
that captured error (from the new GetSymbol error accessor) instead of calling
dlerror() directly; reference GetSymbol, LOOKUP_REQUIRED and
TILELANG_LIBHIP_API_REQUIRED in your changes.

In `@src/target/stubs/hiprtc.cc`:
- Around line 72-93: TryLoadLibHiprtc currently returns RTLD_DEFAULT (which is
NULL on glibc) and gets treated as "not found" by CreateHIPRTCAPI; change
TryLoadLibHiprtc to return a non-NULL sentinel (e.g., a unique static void*
value) when the global symbol lookup succeeded via dlsym(RTLD_DEFAULT) or
dlsym(RTLD_NEXT), and update CreateHIPRTCAPI (and any other checks around the
returned handle) to detect that sentinel and treat it as the global-scope result
(i.e., call GetSymbol/dlsym with nullptr or otherwise resolve symbols from
global scope) instead of treating it as a missing handle; refer to
TryLoadLibHiprtc, CreateHIPRTCAPI, GetSymbol and kLibHiprtcPaths when making the
change.

---

Nitpick comments:
In `@src/target/stubs/hip.cc`:
- Around line 172-179: The two conditional branches for MissingHsaError are
identical; simplify by providing a single definition that returns 1 (or
static_cast<hsa_status_t>(1) if you prefer) unconditionally, removing the `#if`
TILELANG_HAS_HSA_HEADERS / `#else` / `#endif` wrapping so the function
MissingHsaError() is defined once with a consistent return value; keep the
function signature and name exactly as MissingHsaError to preserve callers.

In `@src/target/stubs/hiprtc.cc`:
- Around line 158-164: The function MissingLibraryError contains a redundant
preprocessor conditional where both the `#if` and `#else` branches return
HIPRTC_ERROR_INTERNAL_ERROR; simplify by removing the `#if/`#else/#endif and have
MissingLibraryError() directly return HIPRTC_ERROR_INTERNAL_ERROR (locate the
function by name MissingLibraryError in hiprtc.cc).

In `@src/target/stubs/vendor/hip_runtime.h`:
- Around line 12-18: The header vendor/hip_runtime.h redundantly contains a
`#pragma` once after the guarding check using the macro
_TILELANG_HIP_STUB_INCLUDE_GUARD; remove the unnecessary `#pragma` once from
vendor/hip_runtime.h so the inclusion policy relies solely on the existing
_TILELANG_HIP_STUB_INCLUDE_GUARD check (which is set/cleared by the stub wrapper
hip.h).

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/target/stubs/hip.cc`:
- Around line 100-145: CreateHIPDriverAPI currently returns an all-null
HIPDriverAPI singleton if any LOOKUP fails, which makes
HIPDriverAPI::is_available()/get() incorrectly report success and later cause
null-function-pointer crashes; change the logic so that CreateHIPDriverAPI (or
the singleton initialization) detects partial loads and fails early: after
resolving symbols with the LOOKUP macro (or by checking the populated api),
verify that at least one representative function pointer (e.g. hipSetDevice_,
hipGetDeviceCount_, or hipGetLastError_) is non-null and if not return an
empty/invalid HIPDriverAPI, and also update HIPDriverAPI::is_available() or
get() to confirm that the handle is non-null AND that the representative
function pointer(s) are non-null before claiming availability.

---

Duplicate comments:
In `@CMakeLists.txt`:
- Around line 252-253: Wrap the target_include_directories(hip_stub PRIVATE
${ROCM_INCLUDE_DIRS}) call in an explicit CMake check so it only runs when
ROCM_INCLUDE_DIRS is set/non-empty (e.g., if(ROCM_INCLUDE_DIRS) or if(DEFINED
ROCM_INCLUDE_DIRS AND ROCM_INCLUDE_DIRS) ... endif()), leaving
target_compile_definitions(TILELANG_HIP_STUB_EXPORTS) unchanged; apply the same
guard to the other target_include_directories invocation elsewhere in the file
that uses ${ROCM_INCLUDE_DIRS} so you never pass an empty list into
target_include_directories.

In `@src/target/stubs/hip.cc`:
- Around line 214-224: The is_available() check using get_handle() fails when
the HIP runtime was resolved via RTLD_DEFAULT; modify HIPDriverAPI so
CreateHIPDriverAPI marks success by setting a new bool member (e.g.,
HIPDriverAPI::populated) when it successfully populates the API, have
is_available() return populated (not get_handle()!=nullptr), and update
HIPDriverAPI::get() to rely on that populated flag when deciding whether to
throw; locate CreateHIPDriverAPI, HIPDriverAPI::is_available, and
HIPDriverAPI::get to implement this change.
- Around line 77-105: TryLoadLibAmdHip64 (and likewise
TryLoadLibHsaRuntime/GetLibHsaHandle) must not return nullptr when a global
pseudo-handle (RTLD_DEFAULT/RTLD_NEXT) was found because glibc defines those as
((void*)0); instead return a non-null sentinel value and keep track of which
pseudo-handle was chosen. Update TryLoadLibAmdHip64/TryLoadLibHsaRuntime to
return a unique non-null sentinel for RTLD_DEFAULT and RTLD_NEXT cases, and
modify CreateHIPDriverAPI (and any code using HIPDriverAPI::get_handle) to
detect those sentinels and translate them back to the corresponding
pseudo-handle when calling dlsym/LOOKUP so the null-check (if (handle ==
nullptr)) no longer incorrectly treats a global-loaded library as absent. Ensure
the sentinel is documented and reused consistently between loader functions and
the dlsym callers.

Add HIP and HIPRTC stub libraries mirroring the existing CUDA stubs.
This enables:

1. Importing TileLang on CPU-only machines without ROCm installed
2. Avoiding ROCm version conflicts by loading libamdhip64.so/libhiprtc.so
   dynamically at runtime via dlopen/dlsym
3. Building portable wheels that work across ROCm versions

New files:
- src/target/stubs/hip.h: Stub header with HIPDriverAPI singleton
- src/target/stubs/hip.cc: Lazy-loading implementation for libamdhip64.so,
  includes minimal HSA wrappers (hsa_init/hsa_shut_down)
- src/target/stubs/hiprtc.cc: Lazy-loading implementation for libhiprtc.so
- src/target/stubs/vendor/hip_runtime.h: Minimal vendored HIP types for
  building without ROCm SDK

CMake changes:
- Add TILELANG_USE_HIP_STUBS option (default ON, POSIX-only)
- Build hip_stub and hiprtc_stub shared libraries
- Override ROCM_HIPHCC_LIBRARY to use hip_stub
- Link tilelang/tvm against stubs instead of system HIP libs

Documentation:
- Update docs/runtime_internals/stubs.md with ROCm stub details
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
src/target/stubs/hiprtc.cc (1)

158-164: Redundant #if/#else — both branches are identical.

Both the TILELANG_HAS_HIPRTC_HEADERS and non-headers paths return HIPRTC_ERROR_INTERNAL_ERROR. The conditional serves no purpose here.

Proposed fix
 hiprtcResult MissingLibraryError() {
-#if TILELANG_HAS_HIPRTC_HEADERS
-  return HIPRTC_ERROR_INTERNAL_ERROR;
-#else
   return HIPRTC_ERROR_INTERNAL_ERROR;
-#endif
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hiprtc.cc` around lines 158 - 164, The preprocessor
conditional in MissingLibraryError() is redundant because both branches return
HIPRTC_ERROR_INTERNAL_ERROR; remove the `#if/`#else/#endif surrounding
TILELANG_HAS_HIPRTC_HEADERS and simplify MissingLibraryError() to
unconditionally return HIPRTC_ERROR_INTERNAL_ERROR so the function just returns
that constant directly.
src/target/stubs/hip.cc (1)

198-205: Redundant #if/#else — both branches return the same value.

Both the TILELANG_HAS_HSA_HEADERS and non-headers paths return 1. Simplify to a single unconditional return.

Proposed fix
-#if TILELANG_HAS_HSA_HEADERS
-static hsa_status_t MissingHsaError() {
-  // Any non-success value makes TVM treat ROCm as not existing.
-  return static_cast<hsa_status_t>(1);
-}
-#else
-static hsa_status_t MissingHsaError() { return 1; }
-#endif
+static hsa_status_t MissingHsaError() {
+  // Any non-success value makes TVM treat ROCm as not existing.
+  return static_cast<hsa_status_t>(1);
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hip.cc` around lines 198 - 205, The two conditional branches
for MissingHsaError are redundant; replace the `#if/`#else/#endif with a single
unconditional function definition for static hsa_status_t MissingHsaError() that
returns the non-success value (use static_cast<hsa_status_t>(1) for type safety)
and remove the TILELANG_HAS_HSA_HEADERS conditional entirely so only one
implementation remains.
src/target/stubs/hip.h (1)

54-83: X-macro TILELANG_LIBHIP_API_REQUIRED is defined but not used for symbol resolution.

The X-macro enumerates all required HIP APIs, but hip.cc manually lists each LOOKUP(member, "symbol") call instead of expanding this macro. This means adding a new API requires updates in three places: the X-macro, the struct members, and the LOOKUP calls. Consider using the X-macro to drive symbol resolution (like the CUDA stubs do) to keep them in sync.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hip.h` around lines 54 - 83, The
TILELANG_LIBHIP_API_REQUIRED X-macro lists all HIP symbols but isn't used to
perform dynamic symbol lookup; instead hip.cc has manual LOOKUP(member,
"symbol") calls that diverge from the X-macro. Modify hip.cc to drive symbol
resolution from the X-macro by invoking TILELANG_LIBHIP_API_REQUIRED(LOOKUP) (or
creating a small adapter macro if names differ) so the same list expands to both
the struct members and the LOOKUP(...) calls; update any existing manual
LOOKUP(...) lines to be removed and replaced by the X-macro expansion, ensuring
the macro is visible where LOOKUP runs and that the struct definition and
symbol-resolving code use the same X-macro identifier
(TILELANG_LIBHIP_API_REQUIRED) so adding an API only requires one change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@CMakeLists.txt`:
- Around line 236-291: The calls to target_include_directories for the targets
hip_stub and hiprtc_stub assume ROCM_INCLUDE_DIRS is set; guard each
target_include_directories invocation so it only runs when ROCM_INCLUDE_DIRS is
non-empty (e.g., wrap the hip_stub and hiprtc_stub include-directory calls in an
if(ROCM_INCLUDE_DIRS) / endif block or similar check), leaving the rest of the
target configuration unchanged; this ensures safe behavior when find_rocm()
leaves ROCM_INCLUDE_DIRS unset.

In `@src/target/stubs/hip.cc`:
- Around line 77-105: TryLoadLibAmdHip64 (and TryLoadLibHsaRuntime) can return
RTLD_DEFAULT which equals NULL on glibc, so CreateHIPDriverAPI, CreateHSAAPI and
is_available() misinterpret a successful global resolution as "not found"; fix
by returning a small struct (e.g. {bool found; void* handle}) from
TryLoadLibAmdHip64/TryLoadLibHsaRuntime that sets found=true whenever a symbol
or RTLD_DEFAULT/RTLD_NEXT resolution succeeded (even if handle==nullptr), then
update HIPDriverAPI::get_handle()/HSA API equivalent (and
CreateHIPDriverAPI/CreateHSAAPI, is_available(), get()) to consume the struct
and use result.found for availability checks while still using result.handle
when actually calling dlopen/dlsym.

In `@src/target/stubs/hiprtc.cc`:
- Around line 122-127: CreateHIPRTCAPI() currently returns early when
GetLibHiprtcHandle() yields nullptr, which incorrectly treats RTLD_DEFAULT
((void*)0) as "not found" on glibc; remove the early return and allow GetSymbol
to be called with the handle (including NULL) so dlsym(NULL, name) can resolve
global symbols. Specifically, in CreateHIPRTCAPI() (and any logic using
GetLibHiprtcHandle()), drop the if (handle == nullptr) return api; check and
proceed to call GetSymbol(handle, "<symbol>") for each loader entry so
LOOKUP_REQUIRED/GetSymbol can correctly handle NULL handles and populate
HIPRTCAPI api{} as intended.

---

Nitpick comments:
In `@src/target/stubs/hip.cc`:
- Around line 198-205: The two conditional branches for MissingHsaError are
redundant; replace the `#if/`#else/#endif with a single unconditional function
definition for static hsa_status_t MissingHsaError() that returns the
non-success value (use static_cast<hsa_status_t>(1) for type safety) and remove
the TILELANG_HAS_HSA_HEADERS conditional entirely so only one implementation
remains.

In `@src/target/stubs/hip.h`:
- Around line 54-83: The TILELANG_LIBHIP_API_REQUIRED X-macro lists all HIP
symbols but isn't used to perform dynamic symbol lookup; instead hip.cc has
manual LOOKUP(member, "symbol") calls that diverge from the X-macro. Modify
hip.cc to drive symbol resolution from the X-macro by invoking
TILELANG_LIBHIP_API_REQUIRED(LOOKUP) (or creating a small adapter macro if names
differ) so the same list expands to both the struct members and the LOOKUP(...)
calls; update any existing manual LOOKUP(...) lines to be removed and replaced
by the X-macro expansion, ensuring the macro is visible where LOOKUP runs and
that the struct definition and symbol-resolving code use the same X-macro
identifier (TILELANG_LIBHIP_API_REQUIRED) so adding an API only requires one
change.

In `@src/target/stubs/hiprtc.cc`:
- Around line 158-164: The preprocessor conditional in MissingLibraryError() is
redundant because both branches return HIPRTC_ERROR_INTERNAL_ERROR; remove the
`#if/`#else/#endif surrounding TILELANG_HAS_HIPRTC_HEADERS and simplify
MissingLibraryError() to unconditionally return HIPRTC_ERROR_INTERNAL_ERROR so
the function just returns that constant directly.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
src/target/stubs/hip.cc (3)

77-98: ⚠️ Potential issue | 🔴 Critical

RTLD_DEFAULT is NULL on glibc — returning it causes downstream nullptr checks to treat HIP as absent.

When real HIP is loaded globally (e.g., PyTorch ROCm imported first), dlsym(RTLD_DEFAULT, …) succeeds and TryLoadLibAmdHip64() returns RTLD_DEFAULT (Line 82), which is ((void*)0) on glibc. This propagates through get_handle()is_available()get(), all of which treat nullptr as "not found," so every wrapper throws even though HIP is actually available.

The idiomatic fix is to use a non-null sentinel (e.g. reinterpret_cast<void*>(0x1)) for the "use global namespace" case and map it back to RTLD_DEFAULT/RTLD_NEXT in GetSymbol/CreateHIPDriverAPI. The same issue applies to TryLoadLibHsaRuntime (Line 155-173).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hip.cc` around lines 77 - 98, TryLoadLibAmdHip64 currently
returns RTLD_DEFAULT/RTLD_NEXT which are null on glibc and get treated as "not
found"; change TryLoadLibAmdHip64 to return non-null sentinels (e.g.
reinterpret_cast<void*>(0x1) for RTLD_DEFAULT and reinterpret_cast<void*>(0x2)
for RTLD_NEXT) when those dlsym checks succeed, and return actual dlopen handles
as before; then update GetSymbol and CreateHIPDriverAPI to detect these sentinel
values and map them back to RTLD_DEFAULT/RTLD_NEXT when calling dlsym or
constructing the driver API; apply the same sentinel-return and mapping changes
to TryLoadLibHsaRuntime so get_handle/is_available/get treat global namespace
finds as available rather than nullptr.

100-145: ⚠️ Potential issue | 🔴 Critical

Partial symbol resolution yields an all-null singleton while is_available() still returns true.

If dlopen succeeds but any LOOKUP call fails (e.g., version mismatch), the macro returns a fresh HIPDriverAPI{} at Line 110 with every pointer null. The static singleton (Line 217) captures this. However, is_available() (Line 214) only checks get_handle() != nullptr, so it returns true and get() hands back a pointer to the all-null struct. Every subsequent wrapper call then dereferences a null function pointer.

Consider either throwing from CreateHIPDriverAPI on partial failure, or having is_available() additionally verify a representative function pointer (e.g., hipGetDeviceCount_).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hip.cc` around lines 100 - 145, CreateHIPDriverAPI can
return a partially-populated HIPDriverAPI with all-null pointers when a LOOKUP
fails, but is_available() currently only checks get_handle() != nullptr so the
singleton appears available; update availability check instead of throwing:
modify is_available() to also verify a representative function pointer from the
singleton (for example ensure get()->hipGetDeviceCount_ != nullptr) so callers
won't receive an all-null API, and keep CreateHIPDriverAPI behavior unchanged;
reference functions/objects: CreateHIPDriverAPI, is_available(), get_handle(),
get(), and hipGetDeviceCount_.

216-225: ⚠️ Potential issue | 🟠 Major

get() can return a pointer to a partially/fully-null singleton without ever throwing.

If get_handle() returned a valid non-null handle from dlopen but CreateHIPDriverAPI() failed to resolve symbols (returning an all-null HIPDriverAPI{}), Line 218's is_available() check passes because get_handle() is non-null. The caller then gets a pointer to the all-null singleton and crashes on the first function-pointer call.

A lightweight guard would be to check a representative pointer:

Suggested improvement
 HIPDriverAPI *HIPDriverAPI::get() {
   static HIPDriverAPI singleton = CreateHIPDriverAPI();
-  if (!is_available()) {
+  if (!is_available() || singleton.hipGetDeviceCount_ == nullptr) {
     throw std::runtime_error(
         "HIP runtime library (libamdhip64.so) not found. "
         "Install ROCm (or import a ROCm-enabled framework like PyTorch) before "
         "using TileLang's ROCm backend.");
   }
   return &singleton;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hip.cc` around lines 216 - 225, The get() function can
return a singleton whose function pointers are all null even though
is_available() (which checks get_handle()) is true; update HIPDriverAPI::get to
validate the created singleton returned by CreateHIPDriverAPI() before returning
it by checking a representative function pointer (e.g., a core API pointer on
the HIPDriverAPI struct such as hipGetDevice or the first essential method) and
if that pointer is null throw a runtime_error with the existing ROCm-not-found
message; this ensures CreateHIPDriverAPI() failed symbol resolution is detected
and prevents returning a partially-null singleton even when get_handle() is
non-null.
🧹 Nitpick comments (2)
src/target/stubs/hip.cc (2)

198-205: Consider using a well-known HSA error code instead of a bare 1.

MissingHsaError() returns 1, which is not a defined hsa_status_t enumerator. While any non-zero value makes TVM skip ROCm, using HSA_STATUS_ERROR (0x1000) in the TILELANG_HAS_HSA_HEADERS branch would be more self-documenting and avoids relying on an out-of-range enum cast.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hip.cc` around lines 198 - 205, The stub MissingHsaError()
currently returns a bare 1 which is not a defined hsa_status_t enumerator;
change it to return a well-known HSA error value (HSA_STATUS_ERROR / 0x1000) so
the intent is clear. In the TILELANG_HAS_HSA_HEADERS branch return
HSA_STATUS_ERROR; in the no-headers branch return
static_cast<hsa_status_t>(0x1000) or define a local constexpr kHsaStatusError =
0x1000 and return that; update the function MissingHsaError() accordingly.

19-31: _GNU_SOURCE should be defined before all includes to guarantee RTLD_NEXT availability.

_GNU_SOURCE is defined at Line 28, after #include "hip.h" at Line 19. While hip.h is unlikely to pull in <dlfcn.h> transitively today, any future change to that header could silently break RTLD_NEXT resolution. Moving the define (or a #ifndef guard) before all #include directives is the robust fix.

Suggested reorder
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+
 `#include` "hip.h"
 
 `#if` defined(_WIN32) && !defined(__CYGWIN__)
 `#error` "hip_stub is currently POSIX-only ..."
 `#endif`
 
-#ifndef _GNU_SOURCE
-#define _GNU_SOURCE
-#endif
-
 `#include` <dlfcn.h>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/target/stubs/hip.cc` around lines 19 - 31, Move the feature-test macro
definition so _GNU_SOURCE is defined before any includes: add the
`#ifndef/_define` _GNU_SOURCE block at the top of this translation unit (before
the `#include` "hip.h") so RTLD_NEXT and other GNU extensions are guaranteed; keep
the existing guard logic (i.e., only define if not already defined) and retain
the subsequent inclusion of <dlfcn.h> and other headers unchanged so symbols
used by the hip stub (RTLD_NEXT, dlopen, etc.) resolve correctly.
ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6440e71 and ee485ea.

📒 Files selected for processing (1)
  • src/target/stubs/hip.cc
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/target/stubs/hip.cc`:
- Around line 240-398: The HIP stub definitions omit the TILELANG_HIP_STUB_API
visibility attribute (e.g., hipGetErrorName, hipGetErrorString, hipGetLastError,
hipSetDevice, hipMalloc, hipFree, hipMemcpy, hipModuleLaunchKernel, etc.),
causing inconsistent linkage across toolchains; fix this by adding the
TILELANG_HIP_STUB_API prefix to every HIP function definition in this file so
each definition matches its extern "C" declaration (apply to all functions
implemented via HIPDriverAPI::get()->...).

---

Duplicate comments:
In `@src/target/stubs/hip.cc`:
- Around line 77-98: TryLoadLibAmdHip64 currently returns RTLD_DEFAULT/RTLD_NEXT
which are null on glibc and get treated as "not found"; change
TryLoadLibAmdHip64 to return non-null sentinels (e.g.
reinterpret_cast<void*>(0x1) for RTLD_DEFAULT and reinterpret_cast<void*>(0x2)
for RTLD_NEXT) when those dlsym checks succeed, and return actual dlopen handles
as before; then update GetSymbol and CreateHIPDriverAPI to detect these sentinel
values and map them back to RTLD_DEFAULT/RTLD_NEXT when calling dlsym or
constructing the driver API; apply the same sentinel-return and mapping changes
to TryLoadLibHsaRuntime so get_handle/is_available/get treat global namespace
finds as available rather than nullptr.
- Around line 100-145: CreateHIPDriverAPI can return a partially-populated
HIPDriverAPI with all-null pointers when a LOOKUP fails, but is_available()
currently only checks get_handle() != nullptr so the singleton appears
available; update availability check instead of throwing: modify is_available()
to also verify a representative function pointer from the singleton (for example
ensure get()->hipGetDeviceCount_ != nullptr) so callers won't receive an
all-null API, and keep CreateHIPDriverAPI behavior unchanged; reference
functions/objects: CreateHIPDriverAPI, is_available(), get_handle(), get(), and
hipGetDeviceCount_.
- Around line 216-225: The get() function can return a singleton whose function
pointers are all null even though is_available() (which checks get_handle()) is
true; update HIPDriverAPI::get to validate the created singleton returned by
CreateHIPDriverAPI() before returning it by checking a representative function
pointer (e.g., a core API pointer on the HIPDriverAPI struct such as
hipGetDevice or the first essential method) and if that pointer is null throw a
runtime_error with the existing ROCm-not-found message; this ensures
CreateHIPDriverAPI() failed symbol resolution is detected and prevents returning
a partially-null singleton even when get_handle() is non-null.

---

Nitpick comments:
In `@src/target/stubs/hip.cc`:
- Around line 198-205: The stub MissingHsaError() currently returns a bare 1
which is not a defined hsa_status_t enumerator; change it to return a well-known
HSA error value (HSA_STATUS_ERROR / 0x1000) so the intent is clear. In the
TILELANG_HAS_HSA_HEADERS branch return HSA_STATUS_ERROR; in the no-headers
branch return static_cast<hsa_status_t>(0x1000) or define a local constexpr
kHsaStatusError = 0x1000 and return that; update the function MissingHsaError()
accordingly.
- Around line 19-31: Move the feature-test macro definition so _GNU_SOURCE is
defined before any includes: add the `#ifndef/_define` _GNU_SOURCE block at the
top of this translation unit (before the `#include` "hip.h") so RTLD_NEXT and
other GNU extensions are guaranteed; keep the existing guard logic (i.e., only
define if not already defined) and retain the subsequent inclusion of <dlfcn.h>
and other headers unchanged so symbols used by the hip stub (RTLD_NEXT, dlopen,
etc.) resolve correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant