Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug and that I agree to the Code of Conduct
Type of Bug
Runtime Error
Component
cuda.bindings
Describe the bug
I am trying to compile an fp16 kernel with nvrtcCompileProgram, but compilation fails when it goes through the cuda-python wrapper.
The error log is the following:
RuntimeError: Cuda compile error: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5096): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5097): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5096): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5098): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5101): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5102): error: expected a ";"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5122): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5123): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5122): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5124): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5127): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5128): error: expected a ";"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5140): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5141): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5140): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5142): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5144): error: identifier "tr" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5148): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5149): error: expected a ";"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(264): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(265): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(264): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(266): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(269): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(271): warning #940-D: missing return statement at end of non-void function "__half2::operator=(const __half2 &&)"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(274): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(275): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(274): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(276): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(279): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(281): warning #940-D: missing return statement at end of non-void function "__half2::operator=(const __half2 &)"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(283): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(284): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(283): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(285): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(287): error: identifier "tr" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(291): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(293): warning #940-D: missing return statement at end of non-void function "__half2::operator=(const __half2_raw &)"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(296): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(297): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(296): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(300): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(303): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(305): warning #940-D: missing return statement at end of non-void function "__half2::operator __half2_raw"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: type name is not allowed
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "val" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: expression must be a modifiable lvalue
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: an asm operand must have scalar type
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "result" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "result" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "result" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(545): warning #12-D: parsing restarts here after previous syntax error
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(545): error: expected a ";"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(549): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(550): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(550): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(549): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(551): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(555): error: identifier "r" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(555): error: identifier "__internal_float2half" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(560): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(562): warning #940-D: missing return statement at end of non-void function "__float2half"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(566): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(567): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(567): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(566): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(568): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(572): error: identifier "r" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(572): error: identifier "__internal_float2half" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(577): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(579): warning #940-D: missing return statement at end of non-void function "__float2half_rn"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(583): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(584): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(584): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(583): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(585): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(589): error: identifier "r" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(589): error: identifier "__internal_float2half" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(591): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(593): warning #940-D: missing return statement at end of non-void function "__float2half_rz"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(597): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(598): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(598): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(597): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(599): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(603): error: identifier "r" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(603): error: identifier "__internal_float2half" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(608): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(610): warning #940-D: missing return statement at end of non-void function "__float2half_rd"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(614): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(615): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(615): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(614): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(616): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(620): error: identifier "r" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(620): error: identifier "__internal_float2half" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(625): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(627): warning #940-D: missing return statement at end of non-void function "__float2half_ru"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(631): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(632): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(631): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(635): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(637): error: expected an expression
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(639): warning #940-D: missing return statement at end of non-void function "__float2half2_rn"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(630): warning #177-D: variable "val" was declared but never referenced
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(644): error: identifier "NV_PROVIDES_SM_80" is undefined
Error limit reached.
100 errors detected in the compilation of "cuda_kernels\conv2d.cu".
Compilation terminated.
I also compiled the same program through the C++ NVRTC API, and it works there, so I believe the bug lies in the cuda-python wrapper.
#include <cuda.h>
#include <nvrtc.h>
#include <iostream>
#include <string>
#include <vector>
#define NVRTC_CHECK(x) \
do { \
nvrtcResult result = x; \
if (result != NVRTC_SUCCESS) { \
std::cerr << "NVRTC error: " << nvrtcGetErrorString(result) \
<< std::endl; \
} \
} while (0)
#define CUDA_CHECK(x) \
do { \
CUresult result = x; \
if (result != CUDA_SUCCESS) { \
const char* msg; \
cuGetErrorString(result, &msg); \
std::cerr << "CUDA error: " << msg << std::endl; \
} \
} while (0)
int main() {
const char* cuda_source = R"(
#include <cuda_fp16.h>
extern "C" __global__ void half_add_kernel(half* a, half* b, half* c, int n) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
if (idx < n) {
// Half-precision addition
c[idx] = a[idx] + b[idx];
}
}
)";
nvrtcProgram program;
NVRTC_CHECK(nvrtcCreateProgram(&program, cuda_source, "half_kernel.cu", 0,
nullptr, nullptr));
const char* opts[] = {
"--gpu-architecture=compute_89", "--fmad=false",
"-IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include"};
std::cout << "Compiling CUDA code with half precision support..."
<< std::endl;
NVRTC_CHECK(nvrtcCompileProgram(program, 3, opts));
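size_t log_size;
NVRTC_CHECK(nvrtcGetProgramLogSize(program, &log_size));
// Size the buffer from the reported log size instead of a fixed 1 KiB array.
std::vector<char> log(log_size);
NVRTC_CHECK(nvrtcGetProgramLog(program, log.data()));
std::cout << std::string(log.begin(), log.end()) << std::endl;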
size_t ptx_size;
NVRTC_CHECK(nvrtcGetPTXSize(program, &ptx_size));
std::vector<char> ptx(ptx_size);
NVRTC_CHECK(nvrtcGetPTX(program, ptx.data()));
NVRTC_CHECK(nvrtcDestroyProgram(&program));
CUdevice device;
CUcontext context;
CUmodule module;
CUfunction kernel;
CUDA_CHECK(cuInit(0));
CUDA_CHECK(cuDeviceGet(&device, 0));
CUDA_CHECK(cuCtxCreate(&context, 0, 0, device));
CUDA_CHECK(cuModuleLoadDataEx(&module, ptx.data(), 0, nullptr, nullptr));
CUDA_CHECK(cuModuleGetFunction(&kernel, module, "half_add_kernel"));
std::cout << "Compilation successful. PTX code generated." << std::endl;
CUDA_CHECK(cuModuleUnload(module));
CUDA_CHECK(cuCtxDestroy(context));
return 0;
}
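For comparison, here is a minimal Python sketch of the same compile step through cuda.bindings.nvrtc, using the same kernel and the same three options as the C++ program above. It is not the code from my repository: `build_nvrtc_options` and `compile_to_ptx` are hypothetical helper names introduced only for this illustration. It also prints `nvrtcVersion()` so the NVRTC library that Python loads can be compared against the v13.0 headers. That check is an assumption about where a difference might come from, not a confirmed cause.

```python
KERNEL_SOURCE = r"""
#include <cuda_fp16.h>
extern "C" __global__ void half_add_kernel(half* a, half* b, half* c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}
"""


def build_nvrtc_options(arch, include_dirs):
    """Hypothetical helper: build the NVRTC option list as bytes objects."""
    opts = [f"--gpu-architecture={arch}", "--fmad=false"]
    opts += [f"-I{d}" for d in include_dirs]
    return [o.encode() for o in opts]


def compile_to_ptx(source, name, options):
    """Hypothetical helper: compile with NVRTC and return (ok, log, ptx)."""
    # Imported lazily so build_nvrtc_options stays usable without cuda-python.
    from cuda.bindings import nvrtc

    success = nvrtc.nvrtcResult.NVRTC_SUCCESS

    # Report which NVRTC version the Python process actually loaded.
    err, major, minor = nvrtc.nvrtcVersion()
    print(f"NVRTC runtime version: {major}.{minor}")

    err, prog = nvrtc.nvrtcCreateProgram(source.encode(), name.encode(), 0, [], [])
    if err != success:
        raise RuntimeError(f"nvrtcCreateProgram failed: {err}")

    (compile_err,) = nvrtc.nvrtcCompileProgram(prog, len(options), options)

    # Always fetch the log, even on failure, since it carries the diagnostics.
    err, log_size = nvrtc.nvrtcGetProgramLogSize(prog)
    log = b" " * log_size
    (err,) = nvrtc.nvrtcGetProgramLog(prog, log)

    ptx = b""
    if compile_err == success:
        err, ptx_size = nvrtc.nvrtcGetPTXSize(prog)
        ptx = b" " * ptx_size
        (err,) = nvrtc.nvrtcGetPTX(prog, ptx)

    nvrtc.nvrtcDestroyProgram(prog)
    return compile_err == success, log.decode(errors="replace"), ptx


if __name__ == "__main__":
    options = build_nvrtc_options(
        "compute_89",
        ["C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include"],
    )
    print(options)
    try:
        ok, log, ptx = compile_to_ptx(KERNEL_SOURCE, "half_kernel.cu", options)
        print("compiled" if ok else "failed")
        print(log)
    except ImportError:
        print("cuda-python is not installed; skipping the compile step")
```

If the version printed here differs from the toolkit whose headers are on the include path, that mismatch would be worth ruling out before looking further into the wrapper.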
How to Reproduce
My code repo: https://github.com/tigert1998/mytorch/blob/main/cuda_utils.py#L120
The kernel is located at: https://github.com/tigert1998/mytorch/blob/main/cuda_kernels/conv2d.cu
Expected behavior
"cuda_fp16.h" should be correctly included with no errors.
Operating System
Windows 11
nvidia-smi output
Sun Aug 17 13:32:58 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.88 Driver Version: 580.88 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4060 ... WDDM | 00000000:01:00.0 On | N/A |
| N/A 55C P5 18W / 95W | 1393MiB / 8188MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|