Unexpected generated sentences when using llama #36

@xiaxin1998

Hi,
I used the code in this repo to fine-tune an OpenLLaMA model. To reduce fine-tuning time, I used only one prompt when generating the training, validation, and test sets on Beauty. I used random indexing and otherwise kept the original settings from your repo. When I evaluated the model, the generated output sequences were full of unexpected characters, like '(@*$^)(*Y(8'. Also, when I use the code to fine-tune other LLaMA-series models, the generated sentences are full of '!'.
Can anyone give a hint about this? Is this a tokenizer problem?

Thanks
