I have been trying to run a hyperparameter sweep with CPA but keep running into the same error as #36. I also get the message "NaN or Inf in input tensor", although I have tested extensively and am confident my initial dataset contains neither. When I looked at my training losses, however, both the reconstruction and adversarial losses were NaN, as follows:
INFO Generating sequential column names
INFO Generating sequential column names
Training: 0%| | 0/200 [00:00<?, ?it/s]
Epoch 1/200: 0%| | 0/200 [00:00<?, ?it/s]
Epoch 1/200: 0%| | 1/200 [23:33<78:08:05, 1413.50s/it]
Epoch 1/200: 0%| | 1/200 [23:33<78:08:05, 1413.50s/it, v_num=0, recon=nan, r2_mean=-70.7, adv_loss=nan, acc_pert=0.00859, acc_cellIDs=0.0819]
Epoch 2/200: 0%| | 1/200 [23:33<78:08:05, 1413.50s/it, v_num=0, recon=nan, r2_mean=-70.7, adv_loss=nan, acc_pert=0.00859, acc_cellIDs=0.0819]
Epoch 2/200: 1%| | 2/200 [47:19<78:07:55, 1420.58s/it, v_num=0, recon=nan, r2_mean=-70.7, adv_loss=nan, acc_pert=0.00859, acc_cellIDs=0.0819]
Epoch 2/200: 1%| | 2/200 [47:19<78:07:55, 1420.58s/it, v_num=0, recon=nan, r2_mean=-70.7, adv_loss=nan, acc_pert=0.00857, acc_cellIDs=0.0819]
Epoch 3/200: 1%| | 2/200 [47:19<78:07:55, 1420.58s/it, v_num=0, recon=nan, r2_mean=-70.7, adv_loss=nan, acc_pert=0.00857, acc_cellIDs=0.0819]
Epoch 3/200: 2%|▏ | 3/200 [1:10:30<77:00:15, 1407.19s/it, v_num=0, recon=nan, r2_mean=-70.7, adv_loss=nan, acc_pert=0.00857, acc_cellIDs=0.0819]
Epoch 3/200: 2%|▏ | 3/200 [1:10:30<77:00:15, 1407.19s/it, v_num=0, recon=nan, r2_mean=-70.7, adv_loss=nan, acc_pert=0.00855, acc_cellIDs=0.0819]
Epoch 4/200: 2%|▏ | 3/200 [1:10:30<77:00:15, 1407.19s/it, v_num=0, recon=nan, r2_mean=-70.7, adv_loss=nan, acc_pert=0.00855, acc_cellIDs=0.0819]
Epoch 4/200: 2%|▏ | 4/200 [1:33:38<76:11:55, 1399.57s/it, v_num=0, recon=nan, r2_mean=-70.7, adv_loss=nan, acc_pert=0.00855, acc_cellIDs=0.0819]
Epoch 4/200: 2%|▏ | 4/200 [1:33:38<76:11:55, 1399.57s/it, v_num=0, recon=nan, r2_mean=-70.7, adv_loss=nan, acc_pert=0.00849, acc_cellIDs=0.0819]
Epoch 5/200: 2%|▏ | 4/200 [1:33:38<76:11:55, 1399.57s/it, v_num=0, recon=nan, r2_mean=-70.7, adv_loss=nan, acc_pert=0.00849, acc_cellIDs=0.0819]
The sweep would fail at epoch 5 because that is what my `check_val_every_n_epoch` was set to. I'm guessing the message about NaNs/Infs is pointing at these losses, but maybe something else is going on? My `adv_loss` was set to `cce` and I was using your `tune_script.py`. Any idea why I'd fail to get reconstruction and adversarial losses? Happy to provide more information if it would be useful.
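For anyone trying to reproduce this, here is a minimal sketch of one way to check an input matrix for NaN/Inf values. It is not the exact test run for this report, and `count_non_finite` is a hypothetical helper, not part of CPA. It assumes the expression values can be converted to a dense NumPy array (for a SciPy sparse matrix, passing its `.data` array checks the stored values). It also counts negative entries, on the assumption that these may produce NaNs in count-based reconstruction losses.

```python
import numpy as np


def count_non_finite(values):
    """Count NaN, infinite, and negative entries in a numeric array-like."""
    arr = np.asarray(values, dtype=np.float64)
    return {
        "nan": int(np.isnan(arr).sum()),
        "inf": int(np.isinf(arr).sum()),       # counts both +Inf and -Inf
        "negative": int((arr < 0).sum()),      # NaN compares as False here
    }


if __name__ == "__main__":
    # Toy matrix standing in for the real expression data (illustrative only).
    toy = np.array([[0.0, 3.0, np.nan], [np.inf, -1.0, 2.0]])
    print(count_non_finite(toy))
```

If all three counts are zero on the real data, the NaNs are more likely introduced during training (for example by a particular hyperparameter combination in the sweep) than by the input itself.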