
Some tensors are on the CPU while others are on the GPU #109


Description

@DqqHns

Hello, I'd like to ask how to fix "some tensors are on the CPU while others are on the GPU". I followed the instructions exactly, but the steps aren't very detailed and I had to work some of the file-related steps out by trial and error; at this point I'm completely stuck and don't know what to change. I hope someone can help me out; I'm new to this area and don't understand it very well.
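In case it helps others who hit the same message: the error below means the model's weights and the input tensors are not all on one device. A minimal, generic sketch for checking and aligning them (the function and variable names here are illustrative, not from this repository):

```python
import torch

def report_devices(model: torch.nn.Module, batch: dict) -> None:
    # A healthy setup reports exactly one parameter device, and the batch tensors share it.
    print("parameter devices:", {p.device for p in model.parameters()})
    for key, value in batch.items():
        if torch.is_tensor(value):
            print(f"batch[{key!r}] device:", value.device)
        else:
            print(f"batch[{key!r}] is not a tensor")

def move_batch(batch: dict, device: torch.device) -> dict:
    # Move only the tensor entries; strings such as 'caption' and 'name' pass through unchanged.
    return {k: (v.to(device) if torch.is_tensor(v) else v) for k, v in batch.items()}
```

With the PyTorch Lightning 1.x version shown in the log below, the usual fix is not to move tensors by hand but to make sure the `Trainer` is actually given a GPU (e.g. `gpus=1` in the Trainer arguments), so that the model and every batch are transferred automatically.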

| Name | Type | Params

0 | model | DiffusionWrapper | 872 M
1 | first_stage_model | AutoencoderKL | 83.7 M
2 | cond_stage_model | BERTEmbedder | 581 M
3 | embedding_manager | EmbeddingManager | 356 M
4 | resize256 | Resize | 0

356 M Trainable params
1.5 B Non-trainable params
1.9 B Total params
7,577.387 Total estimated model params size (MB)
/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:617: UserWarning: Checkpoint directory logs/anomaly-checkpoints/checkpoints exists and is not empty.
rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")
Validation sanity check: 0it [00:00, ?it/s]
/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:440: UserWarning: Your `val_dataloader` has `shuffle=True`, it is strongly recommended that you turn this off for val/test/predict dataloaders.
rank_zero_warn(
/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/trainer/data_loading.py:110: UserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument (try 128 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Validation sanity check: 0%| | 0/1 [00:00<?, ?it/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:56: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 4. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
warning_cache.warn(
Validation batch keys: dict_keys(['image', 'mask', 'caption', 'name'])
Batch 'image' device: cpu
Batch 'mask' device: cpu
Batch 'caption' is not a tensor
Batch 'name' is not a tensor
Model device: cpu
Summoning checkpoint.

Traceback (most recent call last):
File "main.py", line 923, in
trainer.fit(model, data)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 735, in fit
self._call_and_handle_interrupt(
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 682, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1193, in _run
self._dispatch()
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1272, in _dispatch
self.training_type_plugin.start_training(self)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1282, in run_stage
return self._run_train()
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1304, in _run_train
self._run_sanity_check(self.lightning_module)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1368, in _run_sanity_check
self._evaluation_loop.run()
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 109, in advance
dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 123, in advance
output = self._evaluation_step(batch, batch_idx, dataloader_idx)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 215, in _evaluation_step
output = self.trainer.accelerator.validation_step(step_kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 236, in validation_step
return self.training_type_plugin.validation_step(*step_kwargs.values())
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 219, in validation_step
return self.model.validation_step(*args, **kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/root/autodl-tmp/anomalydiffusion-master/ldm/models/diffusion/ddpm.py", line 388, in validation_step
_, loss_dict_no_ema = self.shared_step(batch)
File "/root/autodl-tmp/anomalydiffusion-master/ldm/models/diffusion/ddpm.py", line 1047, in shared_step
loss = self(x, c,**total_dict)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in call_impl
return forward_call(*args, **kwargs)
File "/root/autodl-tmp/anomalydiffusion-master/ldm/models/diffusion/ddpm.py", line 1055, in forward
c, _ = self.get_learned_conditioning(c,x=mask_cond,name=name)
File "/root/autodl-tmp/anomalydiffusion-master/ldm/models/diffusion/ddpm.py", line 640, in get_learned_conditioning
c,position = self.cond_stage_model.encode(c, cond_img=x,embedding_manager=self.embedding_manager,name=name)
File "/root/autodl-tmp/anomalydiffusion-master/ldm/modules/encoders/modules.py", line 130, in encode
return self(text,**kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/autodl-tmp/anomalydiffusion-master/ldm/modules/encoders/modules.py", line 122, in forward
z,position = self.transformer(tokens, return_embeddings=True, **kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/autodl-tmp/anomalydiffusion-master/ldm/modules/x_transformer.py", line 612, in forward
embedded_x = self.token_emb(x)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 164, in forward
return F.embedding(
File "/root/miniconda3/envs/Anomalydiffusion/lib/python3.8/site-packages/torch/nn/functional.py", line 2267, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
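The last frame shows `F.embedding` receiving an index tensor and a weight tensor on different devices: the debug prints above report the batch and model on cpu, while the error mentions cuda:0, so something in the conditioning path (most likely the tokenized text) ends up on the GPU. A hedged, minimal guard, shown here as a standalone helper rather than as the repository's code, is to force the indices onto the weight's device before the lookup:

```python
import torch

def embed_on_weight_device(token_emb: torch.nn.Embedding, tokens: torch.Tensor) -> torch.Tensor:
    # nn.Embedding requires the index tensor to live on the same device as its weight.
    return token_emb(tokens.to(token_emb.weight.device))
```

The equivalent in-place change would be calling `self.token_emb(x.to(self.token_emb.weight.device))` in `ldm/modules/x_transformer.py`, or moving the tokens right after tokenization in `ldm/modules/encoders/modules.py`. Either way, making sure the whole model actually runs on one device (everything on cuda:0, or everything on cpu) is the cleaner fix than patching individual calls.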
