fix: repair pad_id fallback order in finetuning generators by peter941221 · Pull Request #86 · Cerebras/modelzoo

peter941221 · 2026-06-10T02:00:23Z

Bug

Both finetuning token generators compute eos_token before assigning self.pad_id.

In finetuning_token_generator.py, the eos_id is None fallback reads self.pad_id before self.pad_id = pad_id runs. The sibling implementation in finetuning_token_generator_mllama.py has the same ordering.

As a result, both constructors raise an AttributeError whenever eos_id is None, because self.pad_id does not exist yet at the point where the fallback reads it.

Fix

Assign self.pad_id before computing self.eos_token in both generators.

This is an order-only change. It keeps the existing fallback behavior and makes the eos_id is None path valid.
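For readers who want to see the ordering issue in isolation, here is a minimal sketch. It is not the modelzoo code: StubTokenizer, BuggyGenerator and FixedGenerator are hypothetical names, and the constructor signature and the convert_ids_to_tokens call are assumptions made for illustration only.

```python
class StubTokenizer:
    """Hypothetical stand-in tokenizer that maps an id to a readable token."""

    def convert_ids_to_tokens(self, token_id):
        return f"tok-{token_id}"


class BuggyGenerator:
    """Assumed shape of the constructor before the fix (illustrative only)."""

    def __init__(self, tokenizer, eos_id, pad_id):
        self.tokenizer = tokenizer
        self.eos_id = eos_id
        # The fallback branch reads self.pad_id, which is only assigned
        # on the last line, so eos_id=None raises AttributeError here.
        self.eos_token = self.tokenizer.convert_ids_to_tokens(
            self.pad_id if eos_id is None else eos_id
        )
        self.pad_id = pad_id


class FixedGenerator:
    """Same logic with the two assignments swapped (order-only change)."""

    def __init__(self, tokenizer, eos_id, pad_id):
        self.tokenizer = tokenizer
        self.eos_id = eos_id
        # Assign pad_id first so the fallback below can read it.
        self.pad_id = pad_id
        self.eos_token = self.tokenizer.convert_ids_to_tokens(
            self.pad_id if eos_id is None else eos_id
        )
```

In this sketch, passing an explicit eos_id works in both classes, since the fallback branch is never evaluated. Only the eos_id=None path differs, which matches the order-only nature of the change.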

Verification

I reproduced the failure in both generators with an isolated constructor stub that passes eos_id=None.

Before the change:
AttributeError: 'FinetuningTokenGenerator' object has no attribute 'pad_id'

After the change:
pad_id=99, eos_id=None, eos_token=tok-99

I also added a focused regression test at src/cerebras/modelzoo/data_preparation/data_preprocessing/test_finetuning_token_generator_init.py.
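The test file itself is not reproduced in this description. As a rough idea of what a regression test for this ordering could look like, here is a self-contained sketch using unittest. It does not import the real generators: _StubTokenizer, _TinyGenerator and InitOrderTest are hypothetical names, and the constructor shape is an assumption.

```python
import unittest


class _StubTokenizer:
    """Hypothetical tokenizer stub; returns a readable token for an id."""

    def convert_ids_to_tokens(self, token_id):
        return f"tok-{token_id}"


class _TinyGenerator:
    """Hypothetical stand-in that mirrors the repaired assignment order."""

    def __init__(self, tokenizer, eos_id, pad_id):
        self.tokenizer = tokenizer
        self.eos_id = eos_id
        self.pad_id = pad_id  # must be set before the fallback reads it
        self.eos_token = self.tokenizer.convert_ids_to_tokens(
            self.pad_id if eos_id is None else eos_id
        )


class InitOrderTest(unittest.TestCase):
    def test_eos_none_falls_back_to_pad_id(self):
        # The path that used to fail: no eos_id, so pad_id is the fallback.
        gen = _TinyGenerator(_StubTokenizer(), eos_id=None, pad_id=99)
        self.assertEqual(gen.pad_id, 99)
        self.assertIsNone(gen.eos_id)
        self.assertEqual(gen.eos_token, "tok-99")

    def test_explicit_eos_id_is_used(self):
        # An explicit eos_id should take precedence over the fallback.
        gen = _TinyGenerator(_StubTokenizer(), eos_id=2, pad_id=99)
        self.assertEqual(gen.eos_token, "tok-2")
```

A file like this can be run with python3 -m unittest. A real test would construct the actual generator classes instead of the stand-in.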

Verification commands:
PYTHONPATH=src python3 -m unittest src/cerebras/modelzoo/data_preparation/data_preprocessing/test_finetuning_token_generator_init.py
python3 -m py_compile src/cerebras/modelzoo/data_preparation/data_preprocessing/finetuning_token_generator.py src/cerebras/modelzoo/data_preparation/data_preprocessing/finetuning_token_generator_mllama.py

Fixes #64.

This applies the same constructor-order repair to the multimodal sibling generator.

Fix pad_id fallback order in finetuning generators

144cda6

peter941221 mentioned this pull request Jun 10, 2026

Bug: Attribute used before being set in FinetuningTokenGenerator #64

Open

Add regression test for finetuning token init

c0d7bef

peter941221 marked this pull request as ready for review June 10, 2026 02:45

peter941221 wants to merge 2 commits into Cerebras:main from peter941221:fix/finetuning-pad-id-order
