tests: fix bootstrap mock + force-OOM in crash-recovery test#6
Draft
ykhrustalev wants to merge 1 commit into
Draft
tests: fix bootstrap mock + force-OOM in crash-recovery test#6ykhrustalev wants to merge 1 commit into
ykhrustalev wants to merge 1 commit into
Conversation
The two bootstrap tests asserted on ``uv venv`` / ``python3 -m venv``
commands that bootstrap_remote never issued: the catch-all mock returned
rc=0 for ``test -x .lqh-env/bin/python``, so bootstrap hit the
"venv already exists, reusing" idempotency branch and skipped venv
creation entirely. Have the mock signal "venv missing" for that probe
so the creation path runs and the assertion is exercised.
The crash-recovery test seeded the dataset with trivially short content
("Convert file_X.mp4 to mp3"), so the SFT collator padded to the longest
sample in the batch (~30 tokens), not max_seq_length=8192. The advertised
``bs=512 × seq=8192`` config collapsed to ``bs=20 × seq=30`` and ran to
completion on any modern GPU. Replace the OOM mechanism with a
deterministic crash trigger: point ``base_model`` at a nonexistent
HuggingFace identifier so the subprocess fails fast at model-load time.
GPU-size agnostic, faster (~9s vs ~15s), and tests the same plumbing —
subprocess exits non-zero, ``SubprocessManager.get_status`` transitions
to ``failed``, stderr captures the traceback. Renamed to
``test_subprocess_crash_detected`` to reflect the actual coverage; OOM-
specific verification (if needed) belongs in a separate small test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e53f917 to
d37d396
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The two bootstrap tests asserted on
uv venv/python3 -m venvcommands that bootstrap_remote never issued: the catch-all mock returned rc=0 fortest -x .lqh-env/bin/python, so bootstrap hit the "venv already exists, reusing" idempotency branch and skipped venv creation entirely. Have the mock signal "venv missing" for that probe so the creation path runs and the assertion is exercised.The OOM crash test seeded the dataset with trivially short content ("Convert file_X.mp4 to mp3"), so the SFT collator padded to the longest sample in the batch (~30 tokens), not max_seq_length=8192. The advertised
bs=512 × seq=8192config collapsed tobs=20 × seq=30and ran to completion on any modern GPU. Seed the 20 samples with long content (~60K chars) so tokenization overshoots max_seq_length and truncation fills the seq dim — verified to raise torch.OutOfMemoryError on an A10 (24GB), exercising the crash-detection plumbing the test was meant to cover.