Add non-record submission: 12L 24min Vocab1792 FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ MixedBits (#1495)
Community Review — Add non-record submission: 12L 24min Vocab1792 FlashMuon LinearScaleInit XSA5LastGated RReLU2 Int6AWQ MixedBits

Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache. Both train_gpt.py files in this PR (Apr 7 and Apr 9 entries by shram86) are clean.

- N-gram family bug check — CLEAR
- Pre-Quant TTT / val_tokens optimizer step check — CLEAR. All optimizer steps (…
- Scored-region SLOT check — CLEAR
- Architecture: …
- Files reviewed: Both

Verdict: LOOKS CLEAN. Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the audit — this looks like a clean pure-neural submission.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via an LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.
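For context on what the deterministic AST classifier mentioned above could look like, here is a minimal sketch. The `audit_source` helper and the banned-pattern list are illustrative assumptions, not the review's actual tooling or rule set:

```python
import ast

# Identifier substrings that would flag a submission as non-pure-neural.
# Illustrative only -- not the audit's real pattern list.
BANNED_SUBSTRINGS = ("ngram", "n_gram", "ttt", "slot_cache")

def audit_source(source: str) -> list[str]:
    """Walk the AST of a train script and collect suspicious identifiers."""
    flags = []
    for node in ast.walk(ast.parse(source)):
        # Name nodes carry .id, Attribute nodes carry .attr; others yield None.
        name = getattr(node, "id", None) or getattr(node, "attr", None)
        if isinstance(name, str) and any(b in name.lower() for b in BANNED_SUBSTRINGS):
            flags.append(name)
    return sorted(set(flags))

clean_src = "def train(model, opt):\n    opt.step()\n"
dirty_src = "cache = build_ngram_cache(tokens)\n"
print(audit_source(clean_src))  # -> []
print(audit_source(dirty_src))  # -> ['build_ngram_cache']
```

An AST-based check like this is robust to formatting and comments, which is why it makes a useful deterministic cross-check against an LLM reviewer.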
This PR adds a non-record submission under:
records/track_non_record_16mb/2026-04-09_12L_24min_Vocab1792_FlashMuon_LinearScaleInit_XSA5LastGated_RReLU2_Int6AWQMixedBits
Final result:
Submission summary:
Mixed-bit quantization details:
Notes:
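For readers unfamiliar with mixed-bit packing, here is a generic sketch of per-group symmetric integer quantization with a mixed bit-width assignment. All names and the 6/4-bit split are illustrative assumptions, not this submission's actual Int6AWQ MixedBits scheme:

```python
import numpy as np

def quantize_group(w: np.ndarray, bits: int) -> tuple[np.ndarray, float]:
    """Symmetric per-group quantization to ints in [-2^(b-1)+1, 2^(b-1)-1] plus a scale."""
    qmax = 2 ** (bits - 1) - 1
    absmax = float(np.max(np.abs(w)))
    scale = absmax / qmax if absmax > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_group(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)

# Mixed bits: hypothetically keep "salient" weight groups at 6 bits, the rest at 4.
q6, s6 = quantize_group(w[:128], bits=6)
q4, s4 = quantize_group(w[128:], bits=4)
err6 = float(np.abs(dequantize_group(q6, s6) - w[:128]).mean())
err4 = float(np.abs(dequantize_group(q4, s4) - w[128:]).mean())
# 6-bit groups typically reconstruct with lower error than 4-bit groups.
print(err6, err4)
```

Spending extra bits only on a salient subset of weight groups is the general idea behind AWQ-style mixed-bit schemes: average bits per weight stay low while reconstruction error on the important groups stays small.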