This repository was archived by the owner on Mar 1, 2025. It is now read-only.

RuntimeError: CUDA error: an illegal memory access was encountered #231

@eddiewrc

Hi, first of all, thanks for sharing this library with all of us!
Unfortunately, I am running into a few problems while trying to run it. I built the following network, which is supposed to take a sparse tensor of shape (8192, 16384) as input. Part of it is commented out because I was trying to locate the origin of the problem; the error apparently occurs with just the first Convolution module, so I have commented out the rest for now.

The error I get is pasted below. The GPU is a Quadro GV100, the system CUDA version is 11.4, and PyTorch is 1.11.0 (py3.9_cuda11.3_cudnn8.2.0_0).

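import torch as t  # alias inferred from the t.nn usage below
import sparseconvnet as scn  # alias inferred from the scn.* calls below

class HCSparseConvNet1(t.nn.Module):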
        def __init__(self, featSize, numOut, size, name = "NN"):
                super(HCSparseConvNet1, self).__init__()
                print(size)
                self.inputLayer = scn.InputLayer(2, size, 2)
                
                self.sparseModel = scn.Sequential(scn.Convolution(2,1,4,8,8, True))#, scn.Convolution(2,4,8,8,4, True), scn.LeakyReLU(), scn.Convolution(2,8,16,3,2,True), scn.LeakyReLU(), scn.Convolution(2,16,16, 3,2, True), scn.SparseToDense(2, 16))#, scn.MaxPooling(2,16,8), scn.Convolution(2, 10,10,64,32, False))
                self.out1 = t.nn.Sequential(t.nn.GroupNorm(1,16), t.nn.Tanh(), t.nn.Conv2d(16,8,3,2), t.nn.GroupNorm(1,8), t.nn.Tanh(), t.nn.Conv2d(8,4,3,1, padding=1), t.nn.GroupNorm(1,4), t.nn.Tanh())
                #self.spatial_size= self.sparseModel.input_spatial_size(size)
                self.final = t.nn.Sequential(t.nn.Linear(7812, 100), t.nn.LayerNorm(100), t.nn.Tanh(), t.nn.Linear(100, numOut))

        def forward(self, x, batchSize):
                #print(x[0].size(), x[1].size())
                x = self.inputLayer(x)
                x = self.sparseModel(x)
                print(x)
                #x = self.out1(x)
                #print(x.size())
                #x = self.final(x.view(batchSize, -1))
                return x
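
For anyone trying to reproduce this, one thing that may be worth ruling out before the forward pass is input coordinates that fall outside the spatial size given to `scn.InputLayer`. I have not confirmed that this is the cause here. The helper below is only a sketch: `out_of_range_rows` is a hypothetical name, and it assumes each coordinate row lists the spatial coordinates first, with any trailing entry (such as a batch index) ignored.

```python
def out_of_range_rows(coords, spatial_size):
    """Return the indices of rows whose spatial coordinates lie outside the grid.

    coords: iterable of integer rows. Only the first len(spatial_size) entries
            of each row are checked (assumption: anything after that, such as
            a batch index, is not a spatial coordinate).
    spatial_size: grid extent per dimension, e.g. (8192, 16384).
    """
    bad = []
    for i, row in enumerate(coords):
        # zip stops at the shorter sequence, so trailing entries are skipped.
        for c, size in zip(row, spatial_size):
            if not 0 <= c < size:
                bad.append(i)
                break
    return bad


# Made-up rows for illustration; the last entry of each row is a batch index.
example = [[0, 0, 0], [8192, 5, 0], [10, 16383, 1], [-1, 3, 1]]
bad_rows = out_of_range_rows(example, (8192, 16384))
```

With a real coordinate tensor this could be called as `out_of_range_rows(coord.tolist(), (8192, 16384))`, where `coord` is the tensor passed to `forward` in the traceback below. An empty result would only rule out this one possibility.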

The error:

Traceback (most recent call last):
  File "/home/eddiewrc/galiana2/galianaHCsparseConvNet.py", line 144, in <module>
    sys.exit(main(sys.argv))
  File "/home/eddiewrc/galiana2/galianaHCsparseConvNet.py", line 94, in main
    wrapper.fit(X, Y, device, epochs=50, batch_size = 11, LOG=False)
  File "/home/eddiewrc/galiana2/sources/HCModels.py", line 200, in fit
    yp = self.model.forward([coord, features], batchSize)
  File "/home/eddiewrc/galiana2/sources/HCModels.py", line 58, in forward
    print(x)
  File "/home/eddiewrc/SparseConvNet/sparseconvnet/sparseConvNetTensor.py", line 58, in __repr__
    'features=' + repr(self.features) + \
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor.py", line 305, in __repr__
    return torch._tensor_str._str(self)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 434, in _str
    return _str_intern(self)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 409, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 264, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 296, in get_summarized_data
    return torch.stack([get_summarized_data(x) for x in (start + end)])
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
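
The traceback ends inside `print(x)`, but as the message itself says, CUDA errors can be reported asynchronously, so the failing kernel may have been launched earlier (for example in the input layer or the convolution). A minimal sketch of how the suggestion in the message could be applied from Python follows. The `run_step` helper is a hypothetical name, not part of SparseConvNet or PyTorch, and the sketch assumes the environment variable is set before PyTorch initialises CUDA.

```python
import os

# Assumption: this runs before torch initialises CUDA, so kernel launches
# become synchronous and errors are raised at the call that caused them.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def run_step(label, fn, *args):
    """Run one stage, then synchronise so a pending CUDA error surfaces here.

    Hypothetical helper: `label` only serves to tag the re-raised error.
    """
    out = fn(*args)
    try:
        import torch  # imported lazily so the sketch also loads without torch
        if torch.cuda.is_available():
            torch.cuda.synchronize()
    except ImportError:
        pass
    except RuntimeError as err:
        raise RuntimeError(f"CUDA error surfaced after stage '{label}'") from err
    return out
```

Inside `forward`, the two stages could then be wrapped as `x = run_step("inputLayer", self.inputLayer, x)` and `x = run_step("sparseModel", self.sparseModel, x)`, which should show which of the two the error belongs to. The same effect can be had by launching the script with `CUDA_LAUNCH_BLOCKING=1` set in the shell.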
