
Make DEVICE_BATCH_SIZE and TOTAL_BATCH_SIZE configurable via env vars #402

Open
abhayapattanaik wants to merge 1 commit into karpathy:master from abhayapattanaik:env-configurable-batch-sizes

Conversation

@abhayapattanaik

Summary

  • DEVICE_BATCH_SIZE and TOTAL_BATCH_SIZE in train.py can now be overridden via environment variables
  • Defaults are unchanged — existing behavior is identical
  • Two-line change, no new dependencies
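The override described above could look like the following sketch, assuming the values are plain module-level constants in train.py (the default numbers here are illustrative, not the actual values in the repository):

```python
import os

# Fall back to the hardcoded defaults when the env vars are unset, so
# existing behavior is unchanged. The numeric defaults below are
# placeholders; train.py's real H100 defaults may differ.
DEVICE_BATCH_SIZE = int(os.environ.get("DEVICE_BATCH_SIZE", 32))
TOTAL_BATCH_SIZE = int(os.environ.get("TOTAL_BATCH_SIZE", 524288))
```

Wrapping the lookup in `int(...)` keeps downstream arithmetic (gradient accumulation, token counting) working unchanged, since `os.environ` only returns strings.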

Motivation

These values are hardcoded for H100 80GB GPUs. Cloud GPUs with less VRAM (A10G 24GB, T4 16GB) hit out-of-memory errors with the defaults. Tools like autoresearch-anywhere that run autoresearch on various cloud GPUs currently have to sed-patch these values before training. Environment variables are cleaner and more composable:

DEVICE_BATCH_SIZE=32 TOTAL_BATCH_SIZE=131072 uv run train.py

Test plan

  • Verified on Mac (MPS) with DEVICE_BATCH_SIZE=8 — gradient accumulation steps changed from 2 to 4 as expected, training completed successfully
  • Verified default behavior unchanged (no env vars set — same output as before)
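The 2-to-4 change in accumulation steps follows from how the steps are typically derived from the two batch sizes. A single-GPU sketch (the formula and the sequence length here are assumptions; train.py's actual computation may also involve world size):

```python
# Gradient accumulation steps = total tokens per optimizer step divided
# by tokens processed per forward/backward pass. Numbers are illustrative.
def grad_accum_steps(total_batch_size, device_batch_size, seq_len=2048):
    tokens_per_pass = device_batch_size * seq_len
    assert total_batch_size % tokens_per_pass == 0, "sizes must divide evenly"
    return total_batch_size // tokens_per_pass

# Halving DEVICE_BATCH_SIZE doubles the accumulation steps, matching the
# 2 -> 4 change observed in the test plan.
print(grad_accum_steps(65536, 16))  # 2
print(grad_accum_steps(65536, 8))   # 4
```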

abhayapattanaik added a commit to abhayapattanaik/autoresearch-anycloud that referenced this pull request Mar 24, 2026
Link to karpathy/autoresearch#402 in README GPU compatibility note.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Agolid pushed a commit to Agolid/autoresearch that referenced this pull request Mar 27, 2026
…v vars

- Allow override via environment variables
- Defaults unchanged — existing behavior identical
- Useful for cloud GPUs with less VRAM (A10G, T4, etc.)
- Usage: DEVICE_BATCH_SIZE=32 TOTAL_BATCH_SIZE=131072 uv run train.py

Fixes karpathy#402
