Evaluation Problem #28

Hi, following your guidance, I trained my model on Qwen1.5-1.8B.
While running the evaluation, I noticed what appear to be problems with the SQA (ScienceQA) and MMBench evaluations: the scores are quite low, and the evaluation takes extremely long (12 hours or more). The problem seems to occur only when the log shows the warning "Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation."
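To confirm which runs actually emit that warning, one option is to capture each chunk's output in a log file and count the matches. This is a minimal sketch: the function `count_pad_warnings` and the per-chunk log filenames are hypothetical, and it assumes the warning text appears exactly as quoted above. transformers logs through Python's `logging`, which writes to stderr by default, so stderr has to be redirected into the log as well.

```bash
# Count lines in a captured log that contain the transformers pad-token warning.
# grep -c prints 0 and exits non-zero when nothing matches; "|| true" keeps
# that from aborting scripts that run with "set -e".
count_pad_warnings() {
    local log_file="$1"
    grep -c 'Setting `pad_token_id` to `eos_token_id`' "$log_file" || true
}
```

For example, end each background command in the loop with `> "sqa_${IDX}.log" 2>&1 &` instead of `&`, then run `count_pad_warnings sqa_0.log` after `wait`.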

Here is my evaluation script:

```bash
export IMP_SILIENT_OTHERS=true

gpu_list="${CUDA_VISIBLE_DEVICES:-0}"
IFS=',' read -ra GPULIST <<< "$gpu_list"

CHUNKS=${#GPULIST[@]}

SPLIT="llava_scienceqa"

MODEL_CKPT="imp-v1-2b-stage2-lora"
EVAL_CKPT="${MODEL_CKPT//\//_}_1"
MODEL_BASE=checkpoints/base/Qwen1.5-1.8B

for IDX in $(seq 0 $((CHUNKS-1))); do
    CUDA_VISIBLE_DEVICES=${GPULIST[$IDX]} python -m imp_llava.eval.model_vqa_science \
        --model-path ./checkpoints/$MODEL_CKPT \
        --model-base $MODEL_BASE  \
        --question-file ./eval_dataset/scienceqa/llava_test_CQM-A.json \
        --image-folder ./eval_dataset/scienceqa/images/test \
        --answers-file ./eval_dataset/scienceqa/answers/$SPLIT/$EVAL_CKPT/${CHUNKS}_${IDX}.jsonl \
        --num-chunks $CHUNKS \
        --chunk-idx $IDX \
        --temperature 0 \
        --conv-mode qwen2 &
done

wait


output_file=./eval_dataset/scienceqa/answers/$SPLIT/$EVAL_CKPT/merge.jsonl

# Clear out the output file if it exists.
> "$output_file"

# Loop through the indices and concatenate each file.
for IDX in $(seq 0 $((CHUNKS-1))); do
    cat ./eval_dataset/scienceqa/answers/$SPLIT/$EVAL_CKPT/${CHUNKS}_${IDX}.jsonl >> "$output_file"
done


python imp_llava/eval/eval_science_qa.py \
    --base-dir ./eval_dataset/scienceqa \
    --result-file $output_file \
    --output-file ./eval_dataset/scienceqa/answers/output.jsonl \
    --output-result ./eval_dataset/scienceqa/answers/result.json
```
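One way to rule out incomplete chunks (for example, a GPU process that crashed partway through) as a cause of low scores is to check every chunk file before concatenating. In the merge loop of the script, `cat` on a missing file only prints an error and the loop carries on, so a short merged file can go unnoticed. This is a minimal sketch: the function `merge_chunks` is hypothetical, and it assumes the `${CHUNKS}_${IDX}.jsonl` naming used in the script.

```bash
# Merge per-chunk answer files into one JSONL, failing loudly if any chunk
# is missing or empty, and reporting per-chunk answer counts on stderr.
merge_chunks() {
    local answers_dir="$1" chunks="$2" output_file="$3"
    local idx chunk_file n total=0
    : > "$output_file"    # create or truncate the merged file
    for idx in $(seq 0 $((chunks - 1))); do
        chunk_file="$answers_dir/${chunks}_${idx}.jsonl"
        if [[ ! -s "$chunk_file" ]]; then
            echo "missing or empty chunk: $chunk_file" >&2
            return 1
        fi
        n=$(wc -l < "$chunk_file")    # one JSON answer per line
        echo "chunk $idx: $n answers" >&2
        total=$((total + n))
        cat "$chunk_file" >> "$output_file"
    done
    echo "total: $total answers" >&2
}
```

Usage in place of the original merge loop: `merge_chunks "./eval_dataset/scienceqa/answers/$SPLIT/$EVAL_CKPT" "$CHUNKS" "$output_file" || exit 1`. The total can then be compared with the number of questions in `llava_test_CQM-A.json`.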

Apart from these two, the other benchmark evaluations seem normal. Have you encountered similar issues, and if so, how did you solve them?
