[BUG]: cuda_fp16.h is not able to include from nvrtcCompileProgram #844

@tigert1998

Type of Bug

Runtime Error

Component

cuda.bindings

Describe the bug

I am trying to compile an FP16 kernel with nvrtcCompileProgram, but compilation fails when it goes through the cuda-python wrapper.
The error log is as follows:

RuntimeError: Cuda compile error: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5096): error: identifier "NV_IS_DEVICE" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5097): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5096): error: identifier "NV_IF_ELSE_TARGET" is undefined    

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5098): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5101): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5102): error: expected a ";"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5122): error: identifier "NV_IS_DEVICE" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5123): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5122): error: identifier "NV_IF_ELSE_TARGET" is undefined    

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5124): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5127): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5128): error: expected a ";"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5140): error: identifier "NV_IS_DEVICE" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5141): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5140): error: identifier "NV_IF_ELSE_TARGET" is undefined    

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5142): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5144): error: identifier "tr" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5148): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5149): error: expected a ";"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(264): error: identifier "NV_IS_DEVICE" is undefined        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(265): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(264): error: identifier "NV_IF_ELSE_TARGET" is undefined   

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(266): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(269): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(271): warning #940-D: missing return statement at end of non-void function "__half2::operator=(const __half2 &&)"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(274): error: identifier "NV_IS_DEVICE" is undefined        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(275): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(274): error: identifier "NV_IF_ELSE_TARGET" is undefined   

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(276): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(279): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(281): warning #940-D: missing return statement at end of non-void function "__half2::operator=(const __half2 &)"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(283): error: identifier "NV_IS_DEVICE" is undefined        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(284): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(283): error: identifier "NV_IF_ELSE_TARGET" is undefined   

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(285): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(287): error: identifier "tr" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(291): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(293): warning #940-D: missing return statement at end of non-void function "__half2::operator=(const __half2_raw &)"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(296): error: identifier "NV_IS_DEVICE" is undefined        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(297): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(296): error: identifier "NV_IF_ELSE_TARGET" is undefined   

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(300): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(303): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(305): warning #940-D: missing return statement at end of non-void function "__half2::operator __half2_raw"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "NV_IS_DEVICE" is undefined        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: type name is not allowed

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "NV_IF_ELSE_TARGET" is undefined   

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "val" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: expression must be a modifiable lvalue        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: an asm operand must have scalar type

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "result" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "result" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: identifier "result" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(408): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(545): warning #12-D: parsing restarts here after previous syntax error

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(545): error: expected a ";"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(549): error: identifier "NV_IS_DEVICE" is undefined        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(550): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(550): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(549): error: identifier "NV_IF_ELSE_TARGET" is undefined   

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(551): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(555): error: identifier "r" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(555): error: identifier "__internal_float2half" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(560): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(562): warning #940-D: missing return statement at end of non-void function "__float2half"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(566): error: identifier "NV_IS_DEVICE" is undefined        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(567): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(567): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(566): error: identifier "NV_IF_ELSE_TARGET" is undefined   

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(568): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(572): error: identifier "r" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(572): error: identifier "__internal_float2half" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(577): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(579): warning #940-D: missing return statement at end of non-void function "__float2half_rn"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(583): error: identifier "NV_IS_DEVICE" is undefined        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(584): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(584): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(583): error: identifier "NV_IF_ELSE_TARGET" is undefined   

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(585): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(589): error: identifier "r" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(589): error: identifier "__internal_float2half" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(591): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(593): warning #940-D: missing return statement at end of non-void function "__float2half_rz"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(597): error: identifier "NV_IS_DEVICE" is undefined        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(598): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(598): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(597): error: identifier "NV_IF_ELSE_TARGET" is undefined   

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(599): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(603): error: identifier "r" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(603): error: identifier "__internal_float2half" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(608): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(610): warning #940-D: missing return statement at end of non-void function "__float2half_rd"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(614): error: identifier "NV_IS_DEVICE" is undefined        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(615): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(615): error: expected a ")"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(614): error: identifier "NV_IF_ELSE_TARGET" is undefined   

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(616): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(620): error: identifier "r" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(620): error: identifier "__internal_float2half" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(625): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(627): warning #940-D: missing return statement at end of non-void function "__float2half_ru"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(631): error: identifier "NV_IS_DEVICE" is undefined        

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(632): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(631): error: identifier "NV_IF_ELSE_TARGET" is undefined

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(635): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(637): error: expected an expression

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(639): warning #940-D: missing return statement at end of non-void function "__float2half2_rn"

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(630): warning #177-D: variable "val" was declared but never referenced

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(644): error: identifier "NV_PROVIDES_SM_80" is undefined

Error limit reached.
100 errors detected in the compilation of "cuda_kernels\conv2d.cu".
Compilation terminated.
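
Most of the entries above are follow-on parse errors. To see which names are actually missing, the log can be reduced to its `identifier "..." is undefined` entries. The following is a minimal sketch in plain Python; the helper names `summarize_undefined` and `missing_macros` and the `NV_` prefix filter are illustrative choices, and the sample is a five-line excerpt of the log above. The `NV_`-prefixed names it reports (`NV_IS_DEVICE`, `NV_IF_ELSE_TARGET`, `NV_PROVIDES_SM_80`) are macros normally provided by the `<nv/target>` header, so the other undefined names (`tr`, `r`, `val`, `result`) are likely just fallout from those macros not expanding.

```python
import re
from collections import Counter

# Matches NVRTC diagnostics of the form: error: identifier "NAME" is undefined
_UNDEFINED = re.compile(r'error: identifier "([^"]+)" is undefined')

# Five lines copied from the log above (raw string, so backslashes stay literal).
SAMPLE_LOG = r'''
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5096): error: identifier "NV_IS_DEVICE" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5097): error: expected a ")"
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5096): error: identifier "NV_IF_ELSE_TARGET" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.h(5144): error: identifier "tr" is undefined
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v13.0\include\cuda_fp16.hpp(644): error: identifier "NV_PROVIDES_SM_80" is undefined
'''


def summarize_undefined(log: str) -> Counter:
    """Count how often each undefined identifier appears in an NVRTC log."""
    return Counter(_UNDEFINED.findall(log))


def missing_macros(log: str) -> list[str]:
    """Return the undefined identifiers that look like NV_* target macros."""
    # Assumption: names starting with "NV_" are the <nv/target> macros.
    return sorted(name for name in summarize_undefined(log) if name.startswith("NV_"))


print(missing_macros(SAMPLE_LOG))
```

Running the same helpers over the full log should narrow the failure to those three macro names, which points at how `<nv/target>` is (or is not) being picked up rather than at `cuda_fp16.h` itself.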

I also tried compiling the same program from C++, and it works, so I believe the bug lies in the cuda-python wrapper.

#include <cuda.h>
#include <nvrtc.h>

#include <iostream>
#include <string>
#include <vector>

#define NVRTC_CHECK(x)                                            \
  do {                                                            \
    nvrtcResult result = x;                                       \
    if (result != NVRTC_SUCCESS) {                                \
      std::cerr << "NVRTC error: " << nvrtcGetErrorString(result) \
                << std::endl;                                     \
    }                                                             \
  } while (0)

#define CUDA_CHECK(x)                                  \
  do {                                                 \
    CUresult result = x;                               \
    if (result != CUDA_SUCCESS) {                      \
      const char* msg;                                 \
      cuGetErrorString(result, &msg);                  \
      std::cerr << "CUDA error: " << msg << std::endl; \
    }                                                  \
  } while (0)

int main() {
  const char* cuda_source = R"(
        #include <cuda_fp16.h>
        
        extern "C" __global__ void half_add_kernel(half* a, half* b, half* c, int n) {
            int idx = blockIdx.x * blockDim.x + threadIdx.x;
            if (idx < n) {
                // Half-precision addition
                c[idx] = a[idx] + b[idx];
            }
        }
    )";

  nvrtcProgram program;
  NVRTC_CHECK(nvrtcCreateProgram(&program, cuda_source, "half_kernel.cu", 0,
                                 nullptr, nullptr));

  const char* opts[] = {
      "--gpu-architecture=compute_89", "--fmad=false",
      "-IC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v13.0/include"};

  std::cout << "Compiling CUDA code with half precision support..."
            << std::endl;
  NVRTC_CHECK(nvrtcCompileProgram(program, 3, opts));
  size_t log_size;
  NVRTC_CHECK(nvrtcGetProgramLogSize(program, &log_size));
  // Size the buffer from log_size; a fixed 1 KiB array overflows on long logs.
  std::vector<char> log(log_size + 1, '\0');
  NVRTC_CHECK(nvrtcGetProgramLog(program, log.data()));
  std::cout << log.data() << std::endl;

  size_t ptx_size;
  NVRTC_CHECK(nvrtcGetPTXSize(program, &ptx_size));
  std::vector<char> ptx(ptx_size);
  NVRTC_CHECK(nvrtcGetPTX(program, ptx.data()));

  NVRTC_CHECK(nvrtcDestroyProgram(&program));

  CUdevice device;
  CUcontext context;
  CUmodule module;
  CUfunction kernel;

  CUDA_CHECK(cuInit(0));
  CUDA_CHECK(cuDeviceGet(&device, 0));
  CUDA_CHECK(cuCtxCreate(&context, 0, 0, device));

  CUDA_CHECK(cuModuleLoadDataEx(&module, ptx.data(), 0, nullptr, nullptr));
  CUDA_CHECK(cuModuleGetFunction(&kernel, module, "half_add_kernel"));

  std::cout << "Compilation successful. PTX code generated." << std::endl;

  CUDA_CHECK(cuModuleUnload(module));
  CUDA_CHECK(cuCtxDestroy(context));

  return 0;
}

How to Reproduce

My code repo: https://github.com/tigert1998/mytorch/blob/main/cuda_utils.py#L120
The kernel is located at: https://github.com/tigert1998/mytorch/blob/main/cuda_kernels/conv2d.cu

Expected behavior

"cuda_fp16.h" should be correctly included with no errors.

Operating System

Windows 11

nvidia-smi output

Sun Aug 17 13:32:58 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.88                 Driver Version: 580.88         CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 ...   WDDM |   00000000:01:00.0  On |                  N/A |
| N/A   55C    P5             18W /   95W |    1393MiB /   8188MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
