Conversation

@finbarrtimbers (Collaborator) commented Nov 18, 2025

There's not a ton of point to this on its own, right now. But by making this change, we can subsequently:

  1. Move dpo_tune_cache.py over to use this, which will (eventually) let it use the Olmo-core trainer.
  2. Give each LLMRayActor its own data_loader, so the actors can pull prompts independently and not have to coordinate with main. This will let us remove add_prompt_to_generator and have them operate fully asynchronously, which will make switching the entire grpo_fast.py script over to the data_loader/trainer pattern easier.

Runs:


Note

Adds an olmo_core-compatible HF dataset loader and rewires GRPO, queues, and vLLM utils to use prompt_id-based requests with checkpointable reshuffling.

  • Data Loading:
    • HFDataLoader: New olmo_core DataLoaderBase wrapper for HuggingFace Dataset with sharding, shuffling/automatic reshuffle, exclusion, checkpointable state, and prompt_id emission.
  • GRPO Pipeline:
    • Replace ShufflingIterator and PendingQueriesMap with HFDataLoader-driven sampling and replenishment.
    • Update batching (accumulate_inference_batches) to fetch examples from prompt_dataset by dataset_index and handle no-resample/exclusion via loader.
    • Change add_prompt_to_generator to enqueue single-example PromptRequest using dataset_index and prompt_id.
    • Save/restore data loader state in checkpoints; create eval data loader; optional automatic reshuffle.
  • Queue Types & IDs:
    • PromptRequest/GenerationResult: require dataset_index, add prompt_id, remove epoch_number/training_step fields.
    • Request IDs are now formatted as train_{prompt_id} or eval_{prompt_id}.
  • vLLM Utils:
    • Update request ID creation, metadata, and result processing to use prompt_id.
  • Tests:
    • Add test_data_loader.py; refactor GRPO and vLLM utils tests to new loader/queue semantics and prompt-based flow.
  • Dependencies:
    • Add ai2-olmo-core==2.3.0 to integrate with DataLoaderBase.

Written by Cursor Bugbot for commit 3107302. This will update automatically on new commits.
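As a rough illustration of the loader semantics summarized above (per-epoch shuffling, exclusion, prompt_id emission, checkpointable state), here is a minimal, framework-free sketch. CheckpointableLoader and all of its field names are hypothetical; this is not the PR's actual HFDataLoader nor olmo_core's DataLoaderBase API:

```python
import random


class CheckpointableLoader:
    """Illustrative stand-in for the PR's HFDataLoader: shuffles per epoch,
    supports excluding retired prompts, emits prompt_ids, and can checkpoint
    and restore its position."""

    def __init__(self, examples, seed=0):
        self.examples = examples  # list of dicts with a "dataset_index" key
        self.seed = seed
        self.epoch = 0
        self.batches_processed = 0
        self.excluded = set()  # dataset_indices retired from sampling
        self._order = self._shuffled_order()

    def _shuffled_order(self):
        # Deterministic per-epoch shuffle over non-excluded examples.
        order = [i for i in range(len(self.examples))
                 if self.examples[i]["dataset_index"] not in self.excluded]
        random.Random(self.seed + self.epoch).shuffle(order)
        return order

    def exclude_index(self, dataset_index):
        # Takes effect at the next reshuffle, so the prompt is never resampled.
        self.excluded.add(dataset_index)

    def __next__(self):
        if self.batches_processed >= len(self._order):
            # Automatic reshuffle: roll over into a new epoch.
            self.epoch += 1
            self.batches_processed = 0
            self._order = self._shuffled_order()
        example = self.examples[self._order[self.batches_processed]]
        self.batches_processed += 1
        return example | {"prompt_id": f"{self.epoch}_{example['dataset_index']}"}

    def state_dict(self):
        return {"epoch": self.epoch,
                "batches_processed": self.batches_processed,
                "excluded": sorted(self.excluded)}

    def load_state_dict(self, state):
        self.epoch = state["epoch"]
        self.batches_processed = state["batches_processed"]
        self.excluded = set(state["excluded"])
        self._order = self._shuffled_order()
```

The prompt_id here follows the {epoch}_{dataset_index} scheme visible in the PR's quoted code below, which makes request IDs unique across reshuffles of the same example.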

@finbarrtimbers finbarrtimbers marked this pull request as ready for review November 26, 2025 17:17

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 1678 to 1682

    percent_solved = np.mean(scores).item() / args.max_possible_score
    # Don't resample prompts that were solved at more than no_resampling_pass_rate
    if no_resampling_pass_rate is not None and percent_solved >= no_resampling_pass_rate:
        iter_dataloader.exclude_index(result.dataset_index)
        total_no_resampled += 1
        logging.debug(


P1: Enforce no_resampling_pass_rate when prompt is solved

The no_resampling_pass_rate flag is now effectively ignored. In accumulate_inference_batches, when a prompt's average score exceeds the threshold (lines 1678-1682) the code only increments total_no_resampled and logs, but no longer excludes that prompt from future sampling (the previous iter_dataloader.exclude_index(...) call was removed). With the flag set, solved prompts will keep being fed back through add_prompt_to_generator(next(data_loader), …) and resampled indefinitely instead of being retired, so training continues to spend compute on already-solved items.


@hamishivi (Collaborator) left a comment


Nice stuff! I like that this PR is actually reducing the code complexity a little. But a few questions before I'm happy with merging...


@property
def total_batches(self) -> int:
    """Return the total number of batches in an epoch."""

Is this docstring right? Isn't effective_size the number of samples in the dataset, from line 38?

@finbarrtimbers (Author):
yes, you're right. Fixed!

    percent_solved = np.mean(scores).item() / args.max_possible_score
    # Don't resample prompts that were solved at more than no_resampling_pass_rate
    if no_resampling_pass_rate is not None and percent_solved >= no_resampling_pass_rate:
        iter_dataloader.exclude_index(result.dataset_index)
was this change intentional?

@finbarrtimbers (Author):
Mistake! Fixed.

"""Return an iterable over all batches in the epoch."""
for i in range(self.batches_processed, self.effective_size):
example = self.dataset[i]
yield example | {"prompt_id": f"{self._epoch}_{example['dataset_index']}"}

Bug: batches_processed never incremented during iteration

The _iter_batches method yields items but never increments self.batches_processed. This breaks checkpointing because state_dict will always save batches_processed at its initial value (0 or whatever was restored). When resuming from a checkpoint, the data loader will restart from the beginning of the epoch instead of continuing from where it left off. The loop variable i should be used to update batches_processed after each yield, or batches_processed should be incremented in the __next__ method.


        self.reshuffle()
        self._current_iter = self._iter_batches()
        return next(self._current_iter)
    raise

Bug: Evaluation data loader exhausted after first use

When automatic_reshuffle=False (the default), after the first complete iteration through the dataset, _current_iter remains as an exhausted generator and is never reset. Subsequent calls to __next__ will immediately raise StopIteration without yielding any items. This breaks evaluation in grpo_fast.py where eval_data_loader is created without automatic_reshuffle=True and is iterated over multiple times during training. After the first evaluation completes, all subsequent evaluations will process zero examples because the iterator is exhausted.


@hamishivi (Collaborator) left a comment

I don't think the fix got pushed for two of the comments 😅 I think the new Cursor Bugbot comments are somewhat valid, and I left a comment on the automatic reshuffle!

"""Whether to filter out prompts with zero reward std (all samples have the same score)."""
no_resampling_pass_rate: float | None = None
"""If the response to a prompt is solved at a rate higher than this, do not resample this prompt again"""
automatic_reshuffle: bool = True

If this flag is false, then training just errors, right? Is there a reason to have it configurable?
