Skip to content

Custom dataloader#162

Draft
voegtlel wants to merge 39 commits intodevelopfrom
feature/custom_dataloader
Draft

Custom dataloader#162
voegtlel wants to merge 39 commits intodevelopfrom
feature/custom_dataloader

Conversation

@voegtlel
Copy link
Copy Markdown
Collaborator

Implementing a custom dataloader instead of torch.data.utils.DataLoader:

  • Better customizability required for python nogil
  • Fixing broken fork shutdown logic
  • Improving savability (instead of regular checkpoints with rewind, can use the actual checkpoint), calls to the worker without a crutch command thread

Comment thread src/megatron/energon/wrappers/group_batch_dataset.py Outdated
Copy link
Copy Markdown
Collaborator

@philipp-fischer philipp-fischer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Check if we can add a check to detect users utilizing global RNG in taskencoder.

Comment thread src/megatron/energon/task_encoder/base.py Outdated
def _pin_memory(self, sample: TSample) -> TSample:
return PinMemoryMixin.sample_pin_memory(sample, self._device)

def __call__(self, sample: Future[TSample]) -> Future[TSample]:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this method is never called, because instead _pin_memory is directly being accessed. What's the idea behind the __call__ pattern?

Copy link
Copy Markdown
Collaborator Author

@voegtlel voegtlel Sep 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See line 229 of dataloader.py:

self._pin_memory(
    CallableFuture(functools.partial(self.restore_sample, sample_key))
)

and line 317:

self._prefetching_samples[worker_idx].append(
    self._pin_memory(worker.prefetch_next())
)

Comment thread src/megatron/energon/loader.py Outdated
Comment thread src/megatron/energon/loader.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants