Hi,
I used the codes in this repo to finetune open llama model, to reduce the finetuning time, when I generate dataset, I only use one prompt for training, valadation and test set on Beauty. I use random indexing and use the original setting in your repo. And then when I evaluate, I found that the generated output sequences are full of unexpected chracters, like '(@*$^)(*Y(8'. And also, when I want to use the codes to finetune other llama series model, the generated sentences become to be full of '!' .
Can anyone give a hint about this? Is this the problem od tokenizer?
Thanks
Hi,
I used the codes in this repo to finetune open llama model, to reduce the finetuning time, when I generate dataset, I only use one prompt for training, valadation and test set on Beauty. I use random indexing and use the original setting in your repo. And then when I evaluate, I found that the generated output sequences are full of unexpected chracters, like '(@*$^)(*Y(8'. And also, when I want to use the codes to finetune other llama series model, the generated sentences become to be full of '!' .
Can anyone give a hint about this? Is this the problem od tokenizer?
Thanks