RuntimeError: CUDA error: an illegal memory access was encountered #1816

@albertz

Description
RETURNN starting up, version 1.20260227.073346+git.9e93588a, date/time 2026-03-03-11-05-14 (UTC+0100), pid 79310, cwd /rwthfs/rz/cluster/hpcwork/p0023565/mwk22690/setups/2025-11-10-start/work/i6_core/returnn/training/ReturnnTrainingJob.OuK06vxsrJJq/work, Python /home/az668407/work/py-envs/py3.12-torch2.7/bin/python3
RETURNN command line options: ['/rwthfs/rz/cluster/home/mwk22690/setups/2025-11-10-start/work/i6_core/returnn/training/ReturnnTrainingJob.OuK06vxsrJJq/output/returnn.config']
Hostname: n23g0010.hpc.itc.rwth-aachen.de
...
PyTorch: 2.7.1+cu126 (e2d141dbde55c2a4370fac5165b0561b6af4798b) (<site-package> in /home/az668407/work/py-envs/py3.12-torch2.7/lib/python3.12/site-packages/torch)
CUDA_VISIBLE_DEVICES=0
MKL_EXAMPLES=/cvmfs/software.hpc.rwth.de/Linux/RH9/x86_64/intel/sapphirerapids/software/imkl/2024.2.0/mkl/2024.2/share/doc/mkl/examples
CUDA_PATH=/cvmfs/software.hpc.rwth.de/Linux/RH9/x86_64/intel/sapphirerapids/software/CUDA/12.6.3
CUDA_ROOT=/cvmfs/software.hpc.rwth.de/Linux/RH9/x86_64/intel/sapphirerapids/software/CUDA/12.6.3
OMP_NUM_THREADS=24
CUDA_HOME=/cvmfs/software.hpc.rwth.de/Linux/RH9/x86_64/intel/sapphirerapids/software/CUDA/12.6.3
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
MKL_NUM_THREADS=24
CUDA_VISIBLE_DEVICES is set to '0'.
Num NVML devices: 1
Available CUDA devices:
  1/1: cuda:0
       name: NVIDIA H100
       total_memory: 93.1GB
       free_memory: 92.6GB (99%)
       capability: 9.0
       device_index: 0
       uuid: c67fb24c-9d87-f003-79a7-5aed55c54f76
       nvml_device_index: 0
RETURNN global startup callback.
/w0/tmp disk usage: total 695.8GB, used 54.0GB, free 641.9GB
Total freed space: 0B

...
ep 18 train, step 52, no_collapse_ctc 1.580, no_collapse_ctc_err 0.363, ctc_4 1.580, ctc_err_4 0.363, ctc_10 1.339, ctc_err_10 0.287, ctc_16 1.176, ctc_err_16 0.252, ce 0.866, fer 0.160, num_seqs 28, max_size:time 266552, max_size:out-spatial 69, mem_usage:cuda 52.4GB, 0.305 sec/step, elapsed 0:00:28, exp. remaining 2:57:08, complete 0.27%
ep 18 train, step 53, txt_ctc_10 1.032, txt_ctc_err_10 0.157, txt_ctc_16 0.944, txt_ctc_err_16 0.149, txt_ce 0.916, txt_fer 0.135, grad_norm:p2 11.455, num_seqs 200, max_size:time 0, max_size:out-spatial 32, mem_usage:cuda 52.4GB, 0.280 sec/step, elapsed 0:00:29, exp. remaining 2:58:40, complete 0.27%
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
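The message above suggests rerunning with CUDA_LAUNCH_BLOCKING=1 to get a trustworthy stacktrace. A minimal sketch of setting it from Python (setting it in the shell before launching RETURNN works equally well):

```python
import os

# CUDA_LAUNCH_BLOCKING must be set before the first CUDA call (safest:
# before importing torch), otherwise it has no effect. With it set,
# kernel launches run synchronously, so the Python stacktrace points at
# the op that actually faulted rather than some later API call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```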

...
  File "/rwthfs/rz/cluster/home/mwk22690/setups/2025-11-10-start/tools/returnn/returnn/torch/frontend/_backend.py", line 755, in TorchBackend.ctc_loss
    line: loss_raw = torch.nn.functional.ctc_loss(
              log_probs=log_probs,
              targets=targets_raw,
              input_lengths=input_lengths,
              target_lengths=targets_lengths,
              blank=blank_index,
              zero_infinity=True,
              reduction="none",
          )
    locals:
      log_probs = <local> <torch.Tensor: repr-error RuntimeError: CUDA error: an illegal memory access was encountered
                          CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
                          For debugging consider passing CUDA_LAUNCH_BLOCKING=1
                          Compile with `TORCH_USE_CUDA_D...
      targets = <local> Tensor{'text', [B,T|'out-spatial'[B]], dtype='int32', sparse_dim=Dim{F'vocab'(10240)}}
      targets_raw = <local> <torch.Tensor: repr-error RuntimeError: CUDA error: an illegal memory access was encountered
                            CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
                            For debugging consider passing CUDA_LAUNCH_BLOCKING=1
                            Compile with `TORCH_USE_CUDA_D...
      input_lengths = <local> tensor[28] i32 x∈[228, 280] μ=254.179 σ=23.777
      targets_lengths = <local> tensor[28] i32 x∈[35, 66] μ=50.429 σ=9.183
      blank_index = <local> 10240
  File "/home/az668407/work/py-envs/py3.12-torch2.7/lib/python3.12/site-packages/torch/nn/functional.py", line 3079, in ctc_loss
    line: return torch.ctc_loss(
              log_probs,
              targets,
              input_lengths,
              target_lengths,
              blank,
              _Reduction.get_enum(reduction),
              zero_infinity,
          )
    locals:
      torch.ctc_loss = <global> <built-in method ctc_loss of type object at 0x14b283689fa0>
      log_probs = <local> <torch.Tensor: repr-error RuntimeError: CUDA error: an illegal memory access was encountered
                          CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
                          For debugging consider passing CUDA_LAUNCH_BLOCKING=1
                          Compile with `TORCH_USE_CUDA_D...
      targets = <local> <torch.Tensor: repr-error RuntimeError: CUDA error: an illegal memory access was encountered
                        CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
                        For debugging consider passing CUDA_LAUNCH_BLOCKING=1
                        Compile with `TORCH_USE_CUDA_D...
      input_lengths = <local> tensor[28] i32 x∈[228, 280] μ=254.179 σ=23.777
      target_lengths = <local> tensor[28] i32 x∈[35, 66] μ=50.429 σ=9.183
      blank = <local> 10240
      reduction = <local> 'none'
      zero_infinity = <local> True
RuntimeError: CUDA error: an illegal memory access was encountered
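The captured locals show blank_index=10240 with a vocab dim of 10240, i.e. the blank occupies one class slot past the vocab, so log_probs must have at least 10241 classes for the call to be valid. Out-of-range blank or target indices are a classic way for the CUDA CTC kernel to fault with an illegal memory access instead of raising a clean error (the CPU kernel typically reports such problems properly). A minimal, hypothetical sanity checker for these invariants (check_ctc_inputs is not part of RETURNN or PyTorch; the numbers below are taken from the locals in the traceback):

```python
def check_ctc_inputs(num_classes, blank, targets, input_lengths,
                     target_lengths, max_time):
    """Check the invariants torch.nn.functional.ctc_loss assumes.

    Returns a list of human-readable problem descriptions (empty if OK).
    Violating these on CUDA can surface as an illegal memory access
    rather than a clean Python exception.
    """
    problems = []
    if not (0 <= blank < num_classes):
        problems.append(f"blank={blank} outside [0, {num_classes})")
    bad = [t for t in targets if not (0 <= t < num_classes)]
    if bad:
        problems.append(f"target values outside [0, {num_classes}): {bad[:5]}")
    if any(l > max_time for l in input_lengths):
        problems.append("some input_lengths exceed the log_probs time dim")
    if any(l <= 0 for l in target_lengths):
        problems.append("non-positive target_lengths")
    return problems

# Values from the log: blank_index 10240, so the class dim of log_probs
# must be at least 10241; lengths are within the ranges shown above.
print(check_ctc_inputs(num_classes=10241, blank=10240,
                       targets=[3, 17, 10239], input_lengths=[228, 280],
                       target_lengths=[35, 66], max_time=280))

# If log_probs only had 10240 classes, the blank would be flagged:
print(check_ctc_inputs(num_classes=10240, blank=10240,
                       targets=[3], input_lengths=[10],
                       target_lengths=[1], max_time=10))
```

Running such a check on the failing batch (on CPU, or before the loss call) would narrow down whether the crash comes from malformed ctc_loss inputs or from earlier corruption on the device.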
