RETURNN starting up, version 1.20260227.073346+git.9e93588a, date/time 2026-03-03-11-05-14 (UTC+0100), pid 79310, cwd /rwthfs/rz/cluster/hpcwork/p0023565/mwk22690/setups/2025-11-10-start/work/i6_core/returnn/training/ReturnnTrainingJob.OuK06vxsrJJq/work, Python /home/az668407/work/py-envs/py3.12-torch2.7/bin/python3
RETURNN command line options: ['/rwthfs/rz/cluster/home/mwk22690/setups/2025-11-10-start/work/i6_core/returnn/training/ReturnnTrainingJob.OuK06vxsrJJq/output/returnn.config']
Hostname: n23g0010.hpc.itc.rwth-aachen.de
...
PyTorch: 2.7.1+cu126 (e2d141dbde55c2a4370fac5165b0561b6af4798b) (<site-package> in /home/az668407/work/py-envs/py3.12-torch2.7/lib/python3.12/site-packages/torch)
CUDA_VISIBLE_DEVICES=0
MKL_EXAMPLES=/cvmfs/software.hpc.rwth.de/Linux/RH9/x86_64/intel/sapphirerapids/software/imkl/2024.2.0/mkl/2024.2/share/doc/mkl/examples
CUDA_PATH=/cvmfs/software.hpc.rwth.de/Linux/RH9/x86_64/intel/sapphirerapids/software/CUDA/12.6.3
CUDA_ROOT=/cvmfs/software.hpc.rwth.de/Linux/RH9/x86_64/intel/sapphirerapids/software/CUDA/12.6.3
OMP_NUM_THREADS=24
CUDA_HOME=/cvmfs/software.hpc.rwth.de/Linux/RH9/x86_64/intel/sapphirerapids/software/CUDA/12.6.3
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
MKL_NUM_THREADS=24
CUDA_VISIBLE_DEVICES is set to '0'.
Num NVML devices: 1
Available CUDA devices:
1/1: cuda:0
name: NVIDIA H100
total_memory: 93.1GB
free_memory: 92.6GB (99%)
capability: 9.0
device_index: 0
uuid: c67fb24c-9d87-f003-79a7-5aed55c54f76
nvml_device_index: 0
RETURNN global startup callback.
/w0/tmp disk usage: total 695.8GB, used 54.0GB, free 641.9GB
Total freed space: 0B
...
ep 18 train, step 52, no_collapse_ctc 1.580, no_collapse_ctc_err 0.363, ctc_4 1.580, ctc_err_4 0.363, ctc_10 1.339, ctc_err_10 0.287, ctc_16 1.176, ctc_err_16 0.252, ce 0.866, fer 0.160, num_seqs 28, max_size:time 266552, max_size:out-spatial 69, mem_usage:cuda 52.4GB, 0.305 sec/step, elapsed 0:00:28, exp. remaining 2:57:08, complete 0.27%
ep 18 train, step 53, txt_ctc_10 1.032, txt_ctc_err_10 0.157, txt_ctc_16 0.944, txt_ctc_err_16 0.149, txt_ce 0.916, txt_fer 0.135, grad_norm:p2 11.455, num_seqs 200, max_size:time 0, max_size:out-spatial 32, mem_usage:cuda 52.4GB, 0.280 sec/step, elapsed 0:00:29, exp. remaining 2:58:40, complete 0.27%
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
...
File "/rwthfs/rz/cluster/home/mwk22690/setups/2025-11-10-start/tools/returnn/returnn/torch/frontend/_backend.py", line 755, in TorchBackend.ctc_loss
line: loss_raw = torch.nn.functional.ctc_loss(
log_probs=log_probs,
targets=targets_raw,
input_lengths=input_lengths,
target_lengths=targets_lengths,
blank=blank_index,
zero_infinity=True,
reduction="none",
)
locals:
log_probs = <local> <torch.Tensor: repr-error RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_D...
targets = <local> Tensor{'text', [B,T|'out-spatial'[B]], dtype='int32', sparse_dim=Dim{F'vocab'(10240)}}
targets_raw = <local> <torch.Tensor: repr-error RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_D...
input_lengths = <local> tensor[28] i32 x∈[228, 280] μ=254.179 σ=23.777
targets_lengths = <local> tensor[28] i32 x∈[35, 66] μ=50.429 σ=9.183
blank_index = <local> 10240
File "/home/az668407/work/py-envs/py3.12-torch2.7/lib/python3.12/site-packages/torch/nn/functional.py", line 3079, in ctc_loss
line: return torch.ctc_loss(
log_probs,
targets,
input_lengths,
target_lengths,
blank,
_Reduction.get_enum(reduction),
zero_infinity,
)
locals:
torch.ctc_loss = <global> <built-in method ctc_loss of type object at 0x14b283689fa0>
log_probs = <local> <torch.Tensor: repr-error RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_D...
targets = <local> <torch.Tensor: repr-error RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_D...
input_lengths = <local> tensor[28] i32 x∈[228, 280] μ=254.179 σ=23.777
target_lengths = <local> tensor[28] i32 x∈[35, 66] μ=50.429 σ=9.183
blank = <local> 10240
reduction = <local> 'none'
zero_infinity = <local> True
RuntimeError: CUDA error: an illegal memory access was encountered
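The failing call above can be narrowed down in isolation on CPU, where out-of-range indices in `ctc_loss` surface as immediate Python exceptions instead of asynchronous illegal-memory-access errors. This is a minimal sketch with hypothetical toy shapes (T=50, N=4, C=11 classes including blank), not the real tensors from the log; in the log the class count would be 10241 with `blank=10240` as the last class, and `blank` must always satisfy `0 <= blank < C`:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for the failing call, same argument layout as in the traceback.
T, N, C = 50, 4, 11  # frames, batch, classes incl. blank (hypothetical sizes)
log_probs = F.log_softmax(torch.randn(T, N, C), dim=-1)  # (T, N, C)
targets = torch.randint(0, C - 1, (N, 20))               # labels exclude blank
input_lengths = torch.full((N,), T, dtype=torch.int64)
target_lengths = torch.full((N,), 20, dtype=torch.int64)

loss = F.ctc_loss(
    log_probs,
    targets,
    input_lengths,
    target_lengths,
    blank=C - 1,        # blank must be a valid class index: 0 <= blank < C
    zero_infinity=True,
    reduction="none",   # per-sequence losses, as in the RETURNN call
)
print(loss.shape)  # torch.Size([4])
```

On CPU, a `blank` or target index >= C raises a clear `RuntimeError` at the call site; on GPU the same mistake can manifest only later as an illegal memory access, which is why `CUDA_LAUNCH_BLOCKING=1` (as the log suggests) is the other way to localize the faulting kernel.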