Skip to content

tests: fix bootstrap mock + force-OOM in crash-recovery test#6

Draft
ykhrustalev wants to merge 1 commit into
mainfrom
claude/fix-bootstrap-and-oom-tests
Draft

tests: fix bootstrap mock + force-OOM in crash-recovery test#6
ykhrustalev wants to merge 1 commit into
mainfrom
claude/fix-bootstrap-and-oom-tests

Conversation

@ykhrustalev

Copy link
Copy Markdown
Member

The two bootstrap tests asserted on uv venv / python3 -m venv commands that bootstrap_remote never issued: the catch-all mock returned rc=0 for test -x .lqh-env/bin/python, so bootstrap hit the "venv already exists, reusing" idempotency branch and skipped venv creation entirely. Have the mock signal "venv missing" for that probe so the creation path runs and the assertion is exercised.

The OOM crash test seeded the dataset with trivially short content ("Convert file_X.mp4 to mp3"), so the SFT collator padded to the longest sample in the batch (~30 tokens), not max_seq_length=8192. The advertised bs=512 × seq=8192 config collapsed to bs=20 × seq=30 and ran to completion on any modern GPU. Seed the 20 samples with long content (~60K chars) so tokenization overshoots max_seq_length and truncation fills the seq dim — verified to raise torch.OutOfMemoryError on an A10 (24GB), exercising the crash-detection plumbing the test was meant to cover.

@ykhrustalev ykhrustalev marked this pull request as draft May 15, 2026 23:57
The two bootstrap tests asserted on ``uv venv`` / ``python3 -m venv``
commands that bootstrap_remote never issued: the catch-all mock returned
rc=0 for ``test -x .lqh-env/bin/python``, so bootstrap hit the
"venv already exists, reusing" idempotency branch and skipped venv
creation entirely. Have the mock signal "venv missing" for that probe
so the creation path runs and the assertion is exercised.

The crash-recovery test seeded the dataset with trivially short content
("Convert file_X.mp4 to mp3"), so the SFT collator padded to the longest
sample in the batch (~30 tokens), not max_seq_length=8192. The advertised
``bs=512 × seq=8192`` config collapsed to ``bs=20 × seq=30`` and ran to
completion on any modern GPU. Replace the OOM mechanism with a
deterministic crash trigger: point ``base_model`` at a nonexistent
HuggingFace identifier so the subprocess fails fast at model-load time.
GPU-size agnostic, faster (~9s vs ~15s), and tests the same plumbing —
subprocess exits non-zero, ``SubprocessManager.get_status`` transitions
to ``failed``, stderr captures the traceback. Renamed to
``test_subprocess_crash_detected`` to reflect the actual coverage; OOM-
specific verification (if needed) belongs in a separate small test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ykhrustalev ykhrustalev force-pushed the claude/fix-bootstrap-and-oom-tests branch from e53f917 to d37d396 Compare May 16, 2026 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant