This repository was archived by the owner on Mar 1, 2025. It is now read-only.
RuntimeError: CUDA error: an illegal memory access was encountered #231
Open
Description
Hi, first of all thanks for sharing this library with all of us!
Unfortunately, I am running into a few problems while trying to run it. In particular, I tried to build the following network, which is supposed to take as input a sparse tensor of shape (8192, 16384). Part of it is commented out because I was trying to locate the origin of the problem, and apparently it already happens with just the first Convolution module (so I commented out the rest for now).
The error I get is pasted below. The GPU is a Quadro GV100, the system CUDA version is 11.4, and PyTorch is 1.11.0 (py3.9_cuda11.3_cudnn8.2.0_0).
import torch as t            # alias inferred from the t.nn usage below
import sparseconvnet as scn  # alias inferred from the scn.* usage below

class HCSparseConvNet1(t.nn.Module):

    def __init__(self, featSize, numOut, size, name = "NN"):
        super(HCSparseConvNet1, self).__init__()
        print(size)
        self.inputLayer = scn.InputLayer(2, size, 2)
        self.sparseModel = scn.Sequential(scn.Convolution(2,1,4,8,8, True))#, scn.Convolution(2,4,8,8,4, True), scn.LeakyReLU(), scn.Convolution(2,8,16,3,2,True), scn.LeakyReLU(), scn.Convolution(2,16,16, 3,2, True), scn.SparseToDense(2, 16))#, scn.MaxPooling(2,16,8), scn.Convolution(2, 10,10,64,32, False))
        self.out1 = t.nn.Sequential(t.nn.GroupNorm(1,16), t.nn.Tanh(), t.nn.Conv2d(16,8,3,2), t.nn.GroupNorm(1,8), t.nn.Tanh(), t.nn.Conv2d(8,4,3,1, padding=1), t.nn.GroupNorm(1,4), t.nn.Tanh())
        #self.spatial_size= self.sparseModel.input_spatial_size(size)
        self.final = t.nn.Sequential(t.nn.Linear(7812, 100), t.nn.LayerNorm(100), t.nn.Tanh(), t.nn.Linear(100, numOut))

    def forward(self, x, batchSize):
        #print(x[0].size(), x[1].size())
        x = self.inputLayer(x)
        x = self.sparseModel(x)
        print(x)
        #x = self.out1(x)
        #print(x.size())
        #x = self.final(x.view(batchSize, -1))
        return x
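As a sanity check on the layer sizes, here is a small sketch that traces the spatial shape through the full (partly commented-out) stack. It assumes every convolution, sparse or dense, follows the usual undilated rule `(size + 2*padding - kernel) // stride + 1`; that this also holds for `scn.Convolution` is an assumption, although the result agrees with the `Linear(7812, 100)` layer above. The helper names `conv_out` and `trace_shapes` are made up for this illustration.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output length along one axis for an undilated convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(spatial, layers):
    """Return the spatial shape after each (kernel, stride, padding) layer."""
    shapes = [spatial]
    for kernel, stride, padding in layers:
        spatial = tuple(conv_out(s, kernel, stride, padding) for s in spatial)
        shapes.append(spatial)
    return shapes


# (kernel, stride, padding) for each convolution in the model above:
# four scn.Convolution layers, then the two t.nn.Conv2d layers in out1.
layers = [
    (8, 8, 0),  # scn.Convolution(2, 1, 4, 8, 8)
    (8, 4, 0),  # scn.Convolution(2, 4, 8, 8, 4)
    (3, 2, 0),  # scn.Convolution(2, 8, 16, 3, 2)
    (3, 2, 0),  # scn.Convolution(2, 16, 16, 3, 2)
    (3, 2, 0),  # t.nn.Conv2d(16, 8, 3, 2)
    (3, 1, 1),  # t.nn.Conv2d(8, 4, 3, 1, padding=1)
]

shapes = trace_shapes((8192, 16384), layers)
out_channels = 4  # channels produced by the last Conv2d
flat = out_channels * shapes[-1][0] * shapes[-1][1]

for shape in shapes:
    print(shape)
print("flattened features:", flat)
```

Under these assumptions the first convolution alone maps (8192, 16384) to (1024, 2048), and the full stack ends with 7812 flattened features, so the layer sizes themselves look consistent.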
The error:
Traceback (most recent call last):
  File "/home/eddiewrc/galiana2/galianaHCsparseConvNet.py", line 144, in <module>
    sys.exit(main(sys.argv))
  File "/home/eddiewrc/galiana2/galianaHCsparseConvNet.py", line 94, in main
    wrapper.fit(X, Y, device, epochs=50, batch_size = 11, LOG=False)
  File "/home/eddiewrc/galiana2/sources/HCModels.py", line 200, in fit
    yp = self.model.forward([coord, features], batchSize)
  File "/home/eddiewrc/galiana2/sources/HCModels.py", line 58, in forward
    print(x)
  File "/home/eddiewrc/SparseConvNet/sparseconvnet/sparseConvNetTensor.py", line 58, in __repr__
    'features=' + repr(self.features) + \
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor.py", line 305, in __repr__
    return torch._tensor_str._str(self)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 434, in _str
    return _str_intern(self)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 409, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 264, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/eddiewrc/miniconda3/lib/python3.9/site-packages/torch/_tensor_str.py", line 296, in get_summarized_data
    return torch.stack([get_summarized_data(x) for x in (start + end)])
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
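Since the message says CUDA errors may be reported asynchronously, the `print(x)` frame in the traceback is probably only where the failure surfaced, not where the faulty kernel ran. A minimal sketch of following the hint from inside the script is below; it assumes the variable is set before anything initializes CUDA, which is safest to do before importing torch.

```python
import os

# Make CUDA kernel launches synchronous so the Python traceback points at the
# call that actually failed. Set this before the first CUDA call; doing it
# before `import torch` is the safest option.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch               # import only after the variable is set
# import sparseconvnet as scn

print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

The same effect can be had from the shell by prefixing the command, e.g. `CUDA_LAUNCH_BLOCKING=1 python galianaHCsparseConvNet.py`. Expect the run to be slower while this is enabled.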