feat: add ROCm/HIP stub libraries for lazy loading (mirrors CUDA stubs)#1867
feat: add ROCm/HIP stub libraries for lazy loading (mirrors CUDA stubs)#1867LeiWang1999 wants to merge 2 commits intotile-ai:mainfrom
Conversation
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds ROCm/HIP lazy-loading stubs and build wiring: new CMake option Changes
Sequence Diagram(s)sequenceDiagram
participant App
participant hip_stub as "hip_stub (stub lib)"
participant Loader as "dlopen/dlsym"
participant libamd as "libamdhip64.so"
participant HSA as "HSA runtime (optional)"
App->>hip_stub: call hip API (e.g., hipMalloc)
alt symbol already loaded in process
hip_stub->>hip_stub: resolve via RTLD_DEFAULT/RTLD_NEXT
hip_stub->>App: call resolved function and return
else symbol not present
hip_stub->>Loader: dlopen("libamdhip64.so")
Loader-->>libamd: load library
hip_stub->>Loader: dlsym("hipMalloc")
Loader-->>hip_stub: function ptr
alt HSA init required
hip_stub->>HSA: lazy hsa_init wrapper
HSA-->>hip_stub: status
end
hip_stub->>libamd: call hipMalloc(...)
libamd-->>hip_stub: result
hip_stub->>App: return result
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (4)
src/target/stubs/hip.cc (1)
173-180:MissingHsaError(): both#ifbranches are equivalent — same dead-conditional pattern ashiprtc.cc.
static_cast<hsa_status_t>(1)and plain1are identical sincehsa_status_tistypedef int. Collapse to a single branch.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hip.cc` around lines 173 - 180, The two `#if` branches for MissingHsaError() are redundant because both return the same value; remove the conditional compilation and provide a single static hsa_status_t MissingHsaError() implementation (e.g., return 1) replacing the `#if/`#else/#endif block so there is only one definition; update any surrounding comments accordingly and ensure the function signature remains static hsa_status_t MissingHsaError().src/target/stubs/hiprtc.cc (2)
95-104:GetSymboltemplate duplicated withhip.cc.This helper is identical to the one in
hip.cc(lines 67–76). Consider extracting it into a shared internal header (e.g.,stubs/dlopen_util.h) to keep the two stub implementations in sync.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hiprtc.cc` around lines 95 - 104, The GetSymbol template in hiprtc.cc is duplicated in hip.cc; extract this helper into a shared internal header (e.g., create stubs/dlopen_util.h) containing the template T GetSymbol(void* handle, const char* name) and include that header from both hiprtc.cc and hip.cc, then remove the duplicate definitions from each .cc so both files use the single shared implementation; ensure the header is internal (no public API) and has include guards or pragma once.
159-165:MissingLibraryError(): both#ifbranches are identical — dead conditional.Both paths return
HIPRTC_ERROR_INTERNAL_ERROR, making the preprocessor guard pointless. If the intent was to use a more specific error code when real headers are available (e.g.,HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE), this should be updated; otherwise, collapse to a single return.Proposed fix
hiprtcResult MissingLibraryError() { -#if TILELANG_HAS_HIPRTC_HEADERS - return HIPRTC_ERROR_INTERNAL_ERROR; -#else return HIPRTC_ERROR_INTERNAL_ERROR; -#endif }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hiprtc.cc` around lines 159 - 165, The preprocessor guard in MissingLibraryError() is dead because both branches return HIPRTC_ERROR_INTERNAL_ERROR; either collapse the `#if/`#else/#endif and return HIPRTC_ERROR_INTERNAL_ERROR directly from MissingLibraryError(), or if the intent was to return a more specific error when HIPRTC headers are present, change the TILELANG_HAS_HIPRTC_HEADERS branch to return the intended code (e.g., HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE) while keeping the else branch returning HIPRTC_ERROR_INTERNAL_ERROR; update the function MissingLibraryError() accordingly and remove the redundant conditional.CMakeLists.txt (1)
268-277: UnusedTILELANG_HIPRTC_STUB_EXPORTScompile definition forhiprtc_stub.Line 270 defines
TILELANG_HIPRTC_STUB_EXPORTSfor thehiprtc_stubtarget, buthiprtc.cc(Line 62) hardcodesTILELANG_HIPRTC_STUB_APIas__attribute__((visibility("default")))without checking this macro. Either wire up the define inhiprtc.cc(aship.hdoes forTILELANG_HIP_STUB_EXPORTS) or remove the unused definition.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CMakeLists.txt` around lines 268 - 277, The TILELANG_HIPRTC_STUB_EXPORTS compile definition added to the hiprtc_stub target is unused because hiprtc.cc currently hardcodes TILELANG_HIPRTC_STUB_API as __attribute__((visibility("default"))) instead of honoring the export macro; either remove the add of TILELANG_HIPRTC_STUB_EXPORTS from the CMake target or update hiprtc.cc to mirror hip.h’s pattern (where TILELANG_HIP_STUB_EXPORTS controls TILELANG_HIP_STUB_API) by making TILELANG_HIPRTC_STUB_API conditional on TILELANG_HIPRTC_STUB_EXPORTS (and on Windows visibility if needed) so the compile definition is actually respected by the symbol export logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/target/stubs/hip.cc`:
- Around line 78-99: TryLoadLibAmdHip64 (and similarly TryLoadLibHsaRuntime)
currently returns RTLD_DEFAULT/RTLD_NEXT which can be null on some platforms and
gets misinterpreted as "not found" by CreateHIPDriverAPI; change the logic so
that when a global/next symbol is detected (the dlsym checks that find
"hipGetErrorString"), the loader returns a non-null sentinel handle (for example
a distinct pointer value like reinterpret_cast<void*>(1)) instead of
RTLD_DEFAULT/RTLD_NEXT, and update CreateHIPDriverAPI to treat that sentinel as
a valid "use global namespace" result and perform symbol lookups with
RTLD_DEFAULT/RTLD_NEXT as appropriate; ensure the same sentinel and handling
change is applied to TryLoadLibHsaRuntime and places that check for handle ==
nullptr.
- Around line 108-117: GetSymbol currently consumes dlerror() so the subsequent
dlerror() in the LOOKUP_REQUIRED macro always yields "unknown"; modify GetSymbol
to also capture and return the dlerror() string (e.g., via a returned
std::optional<std::string> or an out parameter) when it fails to find a symbol,
then update LOOKUP_REQUIRED (used by TILELANG_LIBHIP_API_REQUIRED) to use that
returned error value when api.name##_ is nullptr so the thrown
std::runtime_error includes the real dlerror() message instead of "unknown".
In `@src/target/stubs/hiprtc.cc`:
- Around line 72-93: The CreateHIPRTCAPI path incorrectly treats
RTLD_DEFAULT/RTLD_NEXT (which can be NULL on glibc) as a missing handle; update
CreateHIPRTCAPI so it does not bail out when handle == nullptr and instead
always calls GetSymbol/GetProcAddress with the returned handle from
TryLoadLibHiprtc (so dlsym(NULL, name) can resolve global symbols), i.e., remove
the early "if (handle == nullptr) return empty API" check and allow GetSymbol to
perform dlsym against RTLD_DEFAULT/RTLD_NEXT/real dlopen handles; apply the same
change to CreateHIPDriverAPI for TryLoadLibAmdHip64 in src/target/stubs/hip.cc.
---
Nitpick comments:
In `@CMakeLists.txt`:
- Around line 268-277: The TILELANG_HIPRTC_STUB_EXPORTS compile definition added
to the hiprtc_stub target is unused because hiprtc.cc currently hardcodes
TILELANG_HIPRTC_STUB_API as __attribute__((visibility("default"))) instead of
honoring the export macro; either remove the add of TILELANG_HIPRTC_STUB_EXPORTS
from the CMake target or update hiprtc.cc to mirror hip.h’s pattern (where
TILELANG_HIP_STUB_EXPORTS controls TILELANG_HIP_STUB_API) by making
TILELANG_HIPRTC_STUB_API conditional on TILELANG_HIPRTC_STUB_EXPORTS (and on
Windows visibility if needed) so the compile definition is actually respected by
the symbol export logic.
In `@src/target/stubs/hip.cc`:
- Around line 173-180: The two `#if` branches for MissingHsaError() are redundant
because both return the same value; remove the conditional compilation and
provide a single static hsa_status_t MissingHsaError() implementation (e.g.,
return 1) replacing the `#if/`#else/#endif block so there is only one definition;
update any surrounding comments accordingly and ensure the function signature
remains static hsa_status_t MissingHsaError().
In `@src/target/stubs/hiprtc.cc`:
- Around line 95-104: The GetSymbol template in hiprtc.cc is duplicated in
hip.cc; extract this helper into a shared internal header (e.g., create
stubs/dlopen_util.h) containing the template T GetSymbol(void* handle, const
char* name) and include that header from both hiprtc.cc and hip.cc, then remove
the duplicate definitions from each .cc so both files use the single shared
implementation; ensure the header is internal (no public API) and has include
guards or pragma once.
- Around line 159-165: The preprocessor guard in MissingLibraryError() is dead
because both branches return HIPRTC_ERROR_INTERNAL_ERROR; either collapse the
`#if/`#else/#endif and return HIPRTC_ERROR_INTERNAL_ERROR directly from
MissingLibraryError(), or if the intent was to return a more specific error when
HIPRTC headers are present, change the TILELANG_HAS_HIPRTC_HEADERS branch to
return the intended code (e.g., HIPRTC_ERROR_BUILTIN_OPERATION_FAILURE) while
keeping the else branch returning HIPRTC_ERROR_INTERNAL_ERROR; update the
function MissingLibraryError() accordingly and remove the redundant conditional.
ec5f0bc to
83ad99d
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (3)
src/target/stubs/hiprtc.cc (1)
158-164: Dead conditional — both branches of#if TILELANG_HAS_HIPRTC_HEADERSare identical.Both the
#ifand#elsebranches returnHIPRTC_ERROR_INTERNAL_ERROR. The preprocessor conditional serves no purpose here.Simplify
hiprtcResult MissingLibraryError() { -#if TILELANG_HAS_HIPRTC_HEADERS - return HIPRTC_ERROR_INTERNAL_ERROR; -#else return HIPRTC_ERROR_INTERNAL_ERROR; -#endif }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hiprtc.cc` around lines 158 - 164, The function MissingLibraryError contains a redundant preprocessor conditional where both the `#if` and `#else` branches return HIPRTC_ERROR_INTERNAL_ERROR; simplify by removing the `#if/`#else/#endif and have MissingLibraryError() directly return HIPRTC_ERROR_INTERNAL_ERROR (locate the function by name MissingLibraryError in hiprtc.cc).src/target/stubs/hip.cc (1)
172-179: Minor:MissingHsaErrorbranches are functionally identical.The
#if TILELANG_HAS_HSA_HEADERSbranch doesstatic_cast<hsa_status_t>(1)and the#elsebranch returns1. Sincehsa_status_tisintin the fallback (line 43), these are identical.Simplify
-#if TILELANG_HAS_HSA_HEADERS -static hsa_status_t MissingHsaError() { - // Any non-success value makes TVM treat ROCm as not existing. - return static_cast<hsa_status_t>(1); -} -#else -static hsa_status_t MissingHsaError() { return 1; } -#endif +static hsa_status_t MissingHsaError() { + // Any non-success value makes TVM treat ROCm as not existing. + return static_cast<hsa_status_t>(1); +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hip.cc` around lines 172 - 179, The two conditional branches for MissingHsaError are identical; simplify by providing a single definition that returns 1 (or static_cast<hsa_status_t>(1) if you prefer) unconditionally, removing the `#if` TILELANG_HAS_HSA_HEADERS / `#else` / `#endif` wrapping so the function MissingHsaError() is defined once with a consistent return value; keep the function signature and name exactly as MissingHsaError to preserve callers.src/target/stubs/vendor/hip_runtime.h (1)
12-18: Nit:#pragma onceis redundant after the include-guard#error.The
#errordirective on lines 14-15 already prevents this header from being included directly. The#pragma onceon line 18 won't add meaningful protection since the file can only be reached throughhip.hwhich defines/undefines the guard each time. This is harmless, just unnecessary.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/vendor/hip_runtime.h` around lines 12 - 18, The header vendor/hip_runtime.h redundantly contains a `#pragma` once after the guarding check using the macro _TILELANG_HIP_STUB_INCLUDE_GUARD; remove the unnecessary `#pragma` once from vendor/hip_runtime.h so the inclusion policy relies solely on the existing _TILELANG_HIP_STUB_INCLUDE_GUARD check (which is set/cleared by the stub wrapper hip.h).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CMakeLists.txt`:
- Around line 236-291: target_include_directories is being called with
${ROCM_INCLUDE_DIRS} unconditionally for the hip_stub and hiprtc_stub targets
and appended unconditionally to TILE_LANG_INCLUDES, which can be empty/unset;
wrap the include-directory usage in presence checks (e.g. if(ROCM_INCLUDE_DIRS)
... endif()) so only call target_include_directories(hip_stub PRIVATE
${ROCM_INCLUDE_DIRS}) and target_include_directories(hiprtc_stub PRIVATE
${ROCM_INCLUDE_DIRS}) when ROCM_INCLUDE_DIRS is non-empty, and similarly only
append ROCM_INCLUDE_DIRS to TILE_LANG_INCLUDES when it is set.
In `@src/target/stubs/hip.cc`:
- Line 188: is_available() incorrectly treats a NULL handle from get_handle() as
"not available" even when get_handle() returned RTLD_DEFAULT (which is NULL on
glibc); change is_available() so it returns true when the HIP runtime was
successfully located via RTLD_DEFAULT. Concretely, either have get_handle() set
and expose an additional boolean (e.g. handle_is_rtld_default or
handle_resolved) when it resolves symbols via RTLD_DEFAULT, or make
is_available() check for a resolved-symbol presence (e.g. a known function
pointer populated by get_handle() like hipInit) instead of only get_handle() !=
nullptr; update is_available() to use that flag or symbol-check so get() no
longer throws when RTLD_DEFAULT was used.
---
Duplicate comments:
In `@src/target/stubs/hip.cc`:
- Around line 77-98: TryLoadLibAmdHip64 (and similarly TryLoadLibHsaRuntime)
returns RTLD_DEFAULT/RTLD_NEXT which can be NULL on glibc and is incorrectly
treated as failure by CreateHIPDriverAPI's nullptr check; change these helpers
to return a non-null sentinel value when they find the symbol in the global
namespace (e.g. a unique static void* sentinel) instead of returning
RTLD_DEFAULT/RTLD_NEXT, and update CreateHIPDriverAPI to treat that sentinel as
"library available in global namespace" (resolve symbols via dlsym(RTLD_DEFAULT,
...) when sentinel is seen); apply the same sentinel-based change to
TryLoadLibHsaRuntime and ensure comparisons against the address of
hipGetErrorString remain intact to detect already-loaded symbols, falling back
to dlopen(kLibHipPaths...) only when neither global symbol nor sentinel is set.
- Around line 107-116: GetSymbol currently clears and then consumes dlerror(),
so when LOOKUP_REQUIRED (macro) checks dlerror() it always sees nullptr; update
GetSymbol (or add a companion GetSymbolError helper) to capture and store the
dlerror() text before returning nullptr (e.g., store in a thread_local string or
return an error string alongside the symbol), then change LOOKUP_REQUIRED to use
that captured error (from the new GetSymbol error accessor) instead of calling
dlerror() directly; reference GetSymbol, LOOKUP_REQUIRED and
TILELANG_LIBHIP_API_REQUIRED in your changes.
In `@src/target/stubs/hiprtc.cc`:
- Around line 72-93: TryLoadLibHiprtc currently returns RTLD_DEFAULT (which is
NULL on glibc) and gets treated as "not found" by CreateHIPRTCAPI; change
TryLoadLibHiprtc to return a non-NULL sentinel (e.g., a unique static void*
value) when the global symbol lookup succeeded via dlsym(RTLD_DEFAULT) or
dlsym(RTLD_NEXT), and update CreateHIPRTCAPI (and any other checks around the
returned handle) to detect that sentinel and treat it as the global-scope result
(i.e., call GetSymbol/dlsym with nullptr or otherwise resolve symbols from
global scope) instead of treating it as a missing handle; refer to
TryLoadLibHiprtc, CreateHIPRTCAPI, GetSymbol and kLibHiprtcPaths when making the
change.
---
Nitpick comments:
In `@src/target/stubs/hip.cc`:
- Around line 172-179: The two conditional branches for MissingHsaError are
identical; simplify by providing a single definition that returns 1 (or
static_cast<hsa_status_t>(1) if you prefer) unconditionally, removing the `#if`
TILELANG_HAS_HSA_HEADERS / `#else` / `#endif` wrapping so the function
MissingHsaError() is defined once with a consistent return value; keep the
function signature and name exactly as MissingHsaError to preserve callers.
In `@src/target/stubs/hiprtc.cc`:
- Around line 158-164: The function MissingLibraryError contains a redundant
preprocessor conditional where both the `#if` and `#else` branches return
HIPRTC_ERROR_INTERNAL_ERROR; simplify by removing the `#if/`#else/#endif and have
MissingLibraryError() directly return HIPRTC_ERROR_INTERNAL_ERROR (locate the
function by name MissingLibraryError in hiprtc.cc).
In `@src/target/stubs/vendor/hip_runtime.h`:
- Around line 12-18: The header vendor/hip_runtime.h redundantly contains a
`#pragma` once after the guarding check using the macro
_TILELANG_HIP_STUB_INCLUDE_GUARD; remove the unnecessary `#pragma` once from
vendor/hip_runtime.h so the inclusion policy relies solely on the existing
_TILELANG_HIP_STUB_INCLUDE_GUARD check (which is set/cleared by the stub wrapper
hip.h).
83ad99d to
7717ee6
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/target/stubs/hip.cc`:
- Around line 100-145: CreateHIPDriverAPI currently returns an all-null
HIPDriverAPI singleton if any LOOKUP fails, which makes
HIPDriverAPI::is_available()/get() incorrectly report success and later cause
null-function-pointer crashes; change the logic so that CreateHIPDriverAPI (or
the singleton initialization) detects partial loads and fails early: after
resolving symbols with the LOOKUP macro (or by checking the populated api),
verify that at least one representative function pointer (e.g. hipSetDevice_,
hipGetDeviceCount_, or hipGetLastError_) is non-null and if not return an
empty/invalid HIPDriverAPI, and also update HIPDriverAPI::is_available() or
get() to confirm that the handle is non-null AND that the representative
function pointer(s) are non-null before claiming availability.
---
Duplicate comments:
In `@CMakeLists.txt`:
- Around line 252-253: Wrap the target_include_directories(hip_stub PRIVATE
${ROCM_INCLUDE_DIRS}) call in an explicit CMake check so it only runs when
ROCM_INCLUDE_DIRS is set/non-empty (e.g., if(ROCM_INCLUDE_DIRS) or if(DEFINED
ROCM_INCLUDE_DIRS AND ROCM_INCLUDE_DIRS) ... endif()), leaving
target_compile_definitions(TILELANG_HIP_STUB_EXPORTS) unchanged; apply the same
guard to the other target_include_directories invocation elsewhere in the file
that uses ${ROCM_INCLUDE_DIRS} so you never pass an empty list into
target_include_directories.
In `@src/target/stubs/hip.cc`:
- Around line 214-224: The is_available() check using get_handle() fails when
the HIP runtime was resolved via RTLD_DEFAULT; modify HIPDriverAPI so
CreateHIPDriverAPI marks success by setting a new bool member (e.g.,
HIPDriverAPI::populated) when it successfully populates the API, have
is_available() return populated (not get_handle()!=nullptr), and update
HIPDriverAPI::get() to rely on that populated flag when deciding whether to
throw; locate CreateHIPDriverAPI, HIPDriverAPI::is_available, and
HIPDriverAPI::get to implement this change.
- Around line 77-105: TryLoadLibAmdHip64 (and likewise
TryLoadLibHsaRuntime/GetLibHsaHandle) must not return nullptr when a global
pseudo-handle (RTLD_DEFAULT/RTLD_NEXT) was found because glibc defines those as
((void*)0); instead return a non-null sentinel value and keep track of which
pseudo-handle was chosen. Update TryLoadLibAmdHip64/TryLoadLibHsaRuntime to
return a unique non-null sentinel for RTLD_DEFAULT and RTLD_NEXT cases, and
modify CreateHIPDriverAPI (and any code using HIPDriverAPI::get_handle) to
detect those sentinels and translate them back to the corresponding
pseudo-handle when calling dlsym/LOOKUP so the null-check (if (handle ==
nullptr)) no longer incorrectly treats a global-loaded library as absent. Ensure
the sentinel is documented and reused consistently between loader functions and
the dlsym callers.
Add HIP and HIPRTC stub libraries mirroring the existing CUDA stubs. This enables: 1. Importing TileLang on CPU-only machines without ROCm installed 2. Avoiding ROCm version conflicts by loading libamdhip64.so/libhiprtc.so dynamically at runtime via dlopen/dlsym 3. Building portable wheels that work across ROCm versions New files: - src/target/stubs/hip.h: Stub header with HIPDriverAPI singleton - src/target/stubs/hip.cc: Lazy-loading implementation for libamdhip64.so, includes minimal HSA wrappers (hsa_init/hsa_shut_down) - src/target/stubs/hiprtc.cc: Lazy-loading implementation for libhiprtc.so - src/target/stubs/vendor/hip_runtime.h: Minimal vendored HIP types for building without ROCm SDK CMake changes: - Add TILELANG_USE_HIP_STUBS option (default ON, POSIX-only) - Build hip_stub and hiprtc_stub shared libraries - Override ROCM_HIPHCC_LIBRARY to use hip_stub - Link tilelang/tvm against stubs instead of system HIP libs Documentation: - Update docs/runtime_internals/stubs.md with ROCm stub details
7717ee6 to
6440e71
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (3)
src/target/stubs/hiprtc.cc (1)
158-164: Redundant#if/#else— both branches are identical.Both the
TILELANG_HAS_HIPRTC_HEADERSand non-headers paths returnHIPRTC_ERROR_INTERNAL_ERROR. The conditional serves no purpose here.Proposed fix
hiprtcResult MissingLibraryError() { -#if TILELANG_HAS_HIPRTC_HEADERS - return HIPRTC_ERROR_INTERNAL_ERROR; -#else return HIPRTC_ERROR_INTERNAL_ERROR; -#endif }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hiprtc.cc` around lines 158 - 164, The preprocessor conditional in MissingLibraryError() is redundant because both branches return HIPRTC_ERROR_INTERNAL_ERROR; remove the `#if/`#else/#endif surrounding TILELANG_HAS_HIPRTC_HEADERS and simplify MissingLibraryError() to unconditionally return HIPRTC_ERROR_INTERNAL_ERROR so the function just returns that constant directly.src/target/stubs/hip.cc (1)
198-205: Redundant#if/#else— both branches return the same value.Both the
TILELANG_HAS_HSA_HEADERSand non-headers paths return1. Simplify to a single unconditional return.Proposed fix
-#if TILELANG_HAS_HSA_HEADERS -static hsa_status_t MissingHsaError() { - // Any non-success value makes TVM treat ROCm as not existing. - return static_cast<hsa_status_t>(1); -} -#else -static hsa_status_t MissingHsaError() { return 1; } -#endif +static hsa_status_t MissingHsaError() { + // Any non-success value makes TVM treat ROCm as not existing. + return static_cast<hsa_status_t>(1); +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hip.cc` around lines 198 - 205, The two conditional branches for MissingHsaError are redundant; replace the `#if/`#else/#endif with a single unconditional function definition for static hsa_status_t MissingHsaError() that returns the non-success value (use static_cast<hsa_status_t>(1) for type safety) and remove the TILELANG_HAS_HSA_HEADERS conditional entirely so only one implementation remains.src/target/stubs/hip.h (1)
54-83: X-macroTILELANG_LIBHIP_API_REQUIREDis defined but not used for symbol resolution.The X-macro enumerates all required HIP APIs, but
hip.ccmanually lists eachLOOKUP(member, "symbol")call instead of expanding this macro. This means adding a new API requires updates in three places: the X-macro, the struct members, and the LOOKUP calls. Consider using the X-macro to drive symbol resolution (like the CUDA stubs do) to keep them in sync.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hip.h` around lines 54 - 83, The TILELANG_LIBHIP_API_REQUIRED X-macro lists all HIP symbols but isn't used to perform dynamic symbol lookup; instead hip.cc has manual LOOKUP(member, "symbol") calls that diverge from the X-macro. Modify hip.cc to drive symbol resolution from the X-macro by invoking TILELANG_LIBHIP_API_REQUIRED(LOOKUP) (or creating a small adapter macro if names differ) so the same list expands to both the struct members and the LOOKUP(...) calls; update any existing manual LOOKUP(...) lines to be removed and replaced by the X-macro expansion, ensuring the macro is visible where LOOKUP runs and that the struct definition and symbol-resolving code use the same X-macro identifier (TILELANG_LIBHIP_API_REQUIRED) so adding an API only requires one change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@CMakeLists.txt`:
- Around line 236-291: The calls to target_include_directories for the targets
hip_stub and hiprtc_stub assume ROCM_INCLUDE_DIRS is set; guard each
target_include_directories invocation so it only runs when ROCM_INCLUDE_DIRS is
non-empty (e.g., wrap the hip_stub and hiprtc_stub include-directory calls in an
if(ROCM_INCLUDE_DIRS) / endif block or similar check), leaving the rest of the
target configuration unchanged; this ensures safe behavior when find_rocm()
leaves ROCM_INCLUDE_DIRS unset.
In `@src/target/stubs/hip.cc`:
- Around line 77-105: TryLoadLibAmdHip64 (and TryLoadLibHsaRuntime) can return
RTLD_DEFAULT which equals NULL on glibc, so CreateHIPDriverAPI, CreateHSAAPI and
is_available() misinterpret a successful global resolution as "not found"; fix
by returning a small struct (e.g. {bool found; void* handle}) from
TryLoadLibAmdHip64/TryLoadLibHsaRuntime that sets found=true whenever a symbol
or RTLD_DEFAULT/RTLD_NEXT resolution succeeded (even if handle==nullptr), then
update HIPDriverAPI::get_handle()/HSA API equivalent (and
CreateHIPDriverAPI/CreateHSAAPI, is_available(), get()) to consume the struct
and use result.found for availability checks while still using result.handle
when actually calling dlopen/dlsym.
In `@src/target/stubs/hiprtc.cc`:
- Around line 122-127: CreateHIPRTCAPI() currently returns early when
GetLibHiprtcHandle() yields nullptr, which incorrectly treats RTLD_DEFAULT
((void*)0) as "not found" on glibc; remove the early return and allow GetSymbol
to be called with the handle (including NULL) so dlsym(NULL, name) can resolve
global symbols. Specifically, in CreateHIPRTCAPI() (and any logic using
GetLibHiprtcHandle()), drop the if (handle == nullptr) return api; check and
proceed to call GetSymbol(handle, "<symbol>") for each loader entry so
LOOKUP_REQUIRED/GetSymbol can correctly handle NULL handles and populate
HIPRTCAPI api{} as intended.
---
Nitpick comments:
In `@src/target/stubs/hip.cc`:
- Around line 198-205: The two conditional branches for MissingHsaError are
redundant; replace the `#if/`#else/#endif with a single unconditional function
definition for static hsa_status_t MissingHsaError() that returns the
non-success value (use static_cast<hsa_status_t>(1) for type safety) and remove
the TILELANG_HAS_HSA_HEADERS conditional entirely so only one implementation
remains.
In `@src/target/stubs/hip.h`:
- Around line 54-83: The TILELANG_LIBHIP_API_REQUIRED X-macro lists all HIP
symbols but isn't used to perform dynamic symbol lookup; instead hip.cc has
manual LOOKUP(member, "symbol") calls that diverge from the X-macro. Modify
hip.cc to drive symbol resolution from the X-macro by invoking
TILELANG_LIBHIP_API_REQUIRED(LOOKUP) (or creating a small adapter macro if names
differ) so the same list expands to both the struct members and the LOOKUP(...)
calls; update any existing manual LOOKUP(...) lines to be removed and replaced
by the X-macro expansion, ensuring the macro is visible where LOOKUP runs and
that the struct definition and symbol-resolving code use the same X-macro
identifier (TILELANG_LIBHIP_API_REQUIRED) so adding an API only requires one
change.
In `@src/target/stubs/hiprtc.cc`:
- Around line 158-164: The preprocessor conditional in MissingLibraryError() is
redundant because both branches return HIPRTC_ERROR_INTERNAL_ERROR; remove the
`#if/`#else/#endif surrounding TILELANG_HAS_HIPRTC_HEADERS and simplify
MissingLibraryError() to unconditionally return HIPRTC_ERROR_INTERNAL_ERROR so
the function just returns that constant directly.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/target/stubs/hip.cc (3)
77-98:⚠️ Potential issue | 🔴 Critical
RTLD_DEFAULTisNULLon glibc — returning it causes downstreamnullptrchecks to treat HIP as absent.When real HIP is loaded globally (e.g., PyTorch ROCm imported first),
dlsym(RTLD_DEFAULT, …)succeeds andTryLoadLibAmdHip64()returnsRTLD_DEFAULT(Line 82), which is((void*)0)on glibc. This propagates throughget_handle()→is_available()→get(), all of which treatnullptras "not found," so every wrapper throws even though HIP is actually available.The idiomatic fix is to use a non-null sentinel (e.g.
reinterpret_cast<void*>(0x1)) for the "use global namespace" case and map it back toRTLD_DEFAULT/RTLD_NEXTinGetSymbol/CreateHIPDriverAPI. The same issue applies toTryLoadLibHsaRuntime(Line 155-173).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hip.cc` around lines 77 - 98, TryLoadLibAmdHip64 currently returns RTLD_DEFAULT/RTLD_NEXT which are null on glibc and get treated as "not found"; change TryLoadLibAmdHip64 to return non-null sentinels (e.g. reinterpret_cast<void*>(0x1) for RTLD_DEFAULT and reinterpret_cast<void*>(0x2) for RTLD_NEXT) when those dlsym checks succeed, and return actual dlopen handles as before; then update GetSymbol and CreateHIPDriverAPI to detect these sentinel values and map them back to RTLD_DEFAULT/RTLD_NEXT when calling dlsym or constructing the driver API; apply the same sentinel-return and mapping changes to TryLoadLibHsaRuntime so get_handle/is_available/get treat global namespace finds as available rather than nullptr.
100-145:⚠️ Potential issue | 🔴 CriticalPartial symbol resolution yields an all-null singleton while
is_available()still returnstrue.If
dlopensucceeds but anyLOOKUPcall fails (e.g., version mismatch), the macro returns a freshHIPDriverAPI{}at Line 110 with every pointer null. The static singleton (Line 217) captures this. However,is_available()(Line 214) only checksget_handle() != nullptr, so it returnstrueandget()hands back a pointer to the all-null struct. Every subsequent wrapper call then dereferences a null function pointer.Consider either throwing from
CreateHIPDriverAPIon partial failure, or havingis_available()additionally verify a representative function pointer (e.g.,hipGetDeviceCount_).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hip.cc` around lines 100 - 145, CreateHIPDriverAPI can return a partially-populated HIPDriverAPI with all-null pointers when a LOOKUP fails, but is_available() currently only checks get_handle() != nullptr so the singleton appears available; update availability check instead of throwing: modify is_available() to also verify a representative function pointer from the singleton (for example ensure get()->hipGetDeviceCount_ != nullptr) so callers won't receive an all-null API, and keep CreateHIPDriverAPI behavior unchanged; reference functions/objects: CreateHIPDriverAPI, is_available(), get_handle(), get(), and hipGetDeviceCount_.
216-225:⚠️ Potential issue | 🟠 Major
get()can return a pointer to a partially/fully-null singleton without ever throwing.If
get_handle()returned a valid non-null handle fromdlopenbutCreateHIPDriverAPI()failed to resolve symbols (returning an all-nullHIPDriverAPI{}), Line 218'sis_available()check passes becauseget_handle()is non-null. The caller then gets a pointer to the all-null singleton and crashes on the first function-pointer call.A lightweight guard would be to check a representative pointer:
Suggested improvement
HIPDriverAPI *HIPDriverAPI::get() { static HIPDriverAPI singleton = CreateHIPDriverAPI(); - if (!is_available()) { + if (!is_available() || singleton.hipGetDeviceCount_ == nullptr) { throw std::runtime_error( "HIP runtime library (libamdhip64.so) not found. " "Install ROCm (or import a ROCm-enabled framework like PyTorch) before " "using TileLang's ROCm backend."); } return &singleton; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hip.cc` around lines 216 - 225, The get() function can return a singleton whose function pointers are all null even though is_available() (which checks get_handle()) is true; update HIPDriverAPI::get to validate the created singleton returned by CreateHIPDriverAPI() before returning it by checking a representative function pointer (e.g., a core API pointer on the HIPDriverAPI struct such as hipGetDevice or the first essential method) and if that pointer is null throw a runtime_error with the existing ROCm-not-found message; this ensures CreateHIPDriverAPI() failed symbol resolution is detected and prevents returning a partially-null singleton even when get_handle() is non-null.
🧹 Nitpick comments (2)
src/target/stubs/hip.cc (2)
198-205: Consider using a well-known HSA error code instead of a bare1.
MissingHsaError()returns1, which is not a definedhsa_status_tenumerator. While any non-zero value makes TVM skip ROCm, usingHSA_STATUS_ERROR(0x1000) in theTILELANG_HAS_HSA_HEADERSbranch would be more self-documenting and avoids relying on an out-of-range enum cast.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hip.cc` around lines 198 - 205, The stub MissingHsaError() currently returns a bare 1 which is not a defined hsa_status_t enumerator; change it to return a well-known HSA error value (HSA_STATUS_ERROR / 0x1000) so the intent is clear. In the TILELANG_HAS_HSA_HEADERS branch return HSA_STATUS_ERROR; in the no-headers branch return static_cast<hsa_status_t>(0x1000) or define a local constexpr kHsaStatusError = 0x1000 and return that; update the function MissingHsaError() accordingly.
19-31:_GNU_SOURCEshould be defined before all includes to guaranteeRTLD_NEXTavailability.
_GNU_SOURCEis defined at Line 28, after#include "hip.h"at Line 19. Whilehip.his unlikely to pull in<dlfcn.h>transitively today, any future change to that header could silently breakRTLD_NEXTresolution. Moving the define (or a#ifndefguard) before all#includedirectives is the robust fix.Suggested reorder
+#ifndef _GNU_SOURCE +#define _GNU_SOURCE +#endif + `#include` "hip.h" `#if` defined(_WIN32) && !defined(__CYGWIN__) `#error` "hip_stub is currently POSIX-only ..." `#endif` -#ifndef _GNU_SOURCE -#define _GNU_SOURCE -#endif - `#include` <dlfcn.h>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/target/stubs/hip.cc` around lines 19 - 31, Move the feature-test macro definition so _GNU_SOURCE is defined before any includes: add the `#ifndef/_define` _GNU_SOURCE block at the top of this translation unit (before the `#include` "hip.h") so RTLD_NEXT and other GNU extensions are guaranteed; keep the existing guard logic (i.e., only define if not already defined) and retain the subsequent inclusion of <dlfcn.h> and other headers unchanged so symbols used by the hip stub (RTLD_NEXT, dlopen, etc.) resolve correctly.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/target/stubs/hip.cc
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/target/stubs/hip.cc`:
- Around line 240-398: The HIP stub definitions omit the TILELANG_HIP_STUB_API
visibility attribute (e.g., hipGetErrorName, hipGetErrorString, hipGetLastError,
hipSetDevice, hipMalloc, hipFree, hipMemcpy, hipModuleLaunchKernel, etc.),
causing inconsistent linkage across toolchains; fix this by adding the
TILELANG_HIP_STUB_API prefix to every HIP function definition in this file so
each definition matches its extern "C" declaration (apply to all functions
implemented via HIPDriverAPI::get()->...).
---
Duplicate comments:
In `@src/target/stubs/hip.cc`:
- Around line 77-98: TryLoadLibAmdHip64 currently returns RTLD_DEFAULT/RTLD_NEXT
which are null on glibc and get treated as "not found"; change
TryLoadLibAmdHip64 to return non-null sentinels (e.g.
reinterpret_cast<void*>(0x1) for RTLD_DEFAULT and reinterpret_cast<void*>(0x2)
for RTLD_NEXT) when those dlsym checks succeed, and return actual dlopen handles
as before; then update GetSymbol and CreateHIPDriverAPI to detect these sentinel
values and map them back to RTLD_DEFAULT/RTLD_NEXT when calling dlsym or
constructing the driver API; apply the same sentinel-return and mapping changes
to TryLoadLibHsaRuntime so get_handle/is_available/get treat global namespace
finds as available rather than nullptr.
- Around line 100-145: CreateHIPDriverAPI can return a partially-populated
HIPDriverAPI with all-null pointers when a LOOKUP fails, but is_available()
currently only checks get_handle() != nullptr so the singleton appears
available; update availability check instead of throwing: modify is_available()
to also verify a representative function pointer from the singleton (for example
ensure get()->hipGetDeviceCount_ != nullptr) so callers won't receive an
all-null API, and keep CreateHIPDriverAPI behavior unchanged; reference
functions/objects: CreateHIPDriverAPI, is_available(), get_handle(), get(), and
hipGetDeviceCount_.
- Around line 216-225: The get() function can return a singleton whose function
pointers are all null even though is_available() (which checks get_handle()) is
true; update HIPDriverAPI::get to validate the created singleton returned by
CreateHIPDriverAPI() before returning it by checking a representative function
pointer (e.g., a core API pointer on the HIPDriverAPI struct such as
hipGetDevice or the first essential method) and if that pointer is null throw a
runtime_error with the existing ROCm-not-found message; this ensures
CreateHIPDriverAPI() failed symbol resolution is detected and prevents returning
a partially-null singleton even when get_handle() is non-null.
---
Nitpick comments:
In `@src/target/stubs/hip.cc`:
- Around line 198-205: The stub MissingHsaError() currently returns a bare 1
which is not a defined hsa_status_t enumerator; change it to return a well-known
HSA error value (HSA_STATUS_ERROR / 0x1000) so the intent is clear. In the
TILELANG_HAS_HSA_HEADERS branch return HSA_STATUS_ERROR; in the no-headers
branch return static_cast<hsa_status_t>(0x1000) or define a local constexpr
kHsaStatusError = 0x1000 and return that; update the function MissingHsaError()
accordingly.
- Around line 19-31: Move the feature-test macro definition so _GNU_SOURCE is
defined before any includes: add the `#ifndef/_define` _GNU_SOURCE block at the
top of this translation unit (before the `#include` "hip.h") so RTLD_NEXT and
other GNU extensions are guaranteed; keep the existing guard logic (i.e., only
define if not already defined) and retain the subsequent inclusion of <dlfcn.h>
and other headers unchanged so symbols used by the hip stub (RTLD_NEXT, dlopen,
etc.) resolve correctly.
Summary
Add HIP and HIPRTC stub libraries mirroring the existing CUDA stubs in src/target/stubs/. This enables:
New Files
CMake Changes
Design
Follows the same patterns as the existing CUDA stubs:
Tested On
Summary by CodeRabbit
New Features
Documentation