Non-record JEPA-style regression transformer submission: VRS (Void Rescue System)#1513
ikermoel wants to merge 1 commit into openai:main from
Conversation
**Community Review — Non-record JEPA-style regression transformer: VRS (Void Rescue System)**

Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache.

PR #1513 ("VRS_VoidRescueSystem_JEPARegression") implements a JEPA-style architecture (JEPAVRS) with a causal transformer "Navigator" (Model A) and a small MLP "Rescuer" (Model B). The submission is clean on all four compliance checks.

## N-gram / hash family bug

No hash tables, prime arrays, context hashing, or XOR-based key lookups of any kind. Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the audit — this looks like a clean pure-neural submission.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.
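The "deterministic AST classifier" mentioned in the audit is not published with this PR; as a rough illustration of what such a check might look for (the banned-construct list here is hypothetical, chosen from the patterns the review names: context hashing and XOR-based key lookups), a minimal sketch:

```python
import ast

# Illustrative deny-list only; the real classifier's rules are not in this PR.
BANNED_CALLS = {"hash"}

def audit_source(source: str) -> list[str]:
    """Flag constructs associated with n-gram/hash-cache families in a source string."""
    flags = []
    for node in ast.walk(ast.parse(source)):
        # direct calls to hash(...) on a context key
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in BANNED_CALLS):
            flags.append(f"call to {node.func.id} at line {node.lineno}")
        # XOR mixing, a common ingredient of hand-rolled key hashing
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitXor):
            flags.append(f"XOR at line {node.lineno}")
    return flags
```

A clean pure-neural script would come back with an empty flag list; something like `key = hash(ctx) ^ prime` would be flagged twice.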
Summary
This PR adds a non-record 16MB submission for VRS (Void Rescue System), a JEPA-style regression transformer with a small auxiliary rescue decoder.
The main contribution is not SOTA BPB but a new research direction for regression-based transformers.
Core Research Idea
VRS is built around a specific hypothesis about regression decoding.
The Navigator is trained to predict the next token as a continuous embedding vector using MSE, instead of predicting vocabulary logits directly. In this setting, the raw latent can already carry token information but still decode poorly under the shared embedding geometry. In the paper, these ambiguous regions are called voids.
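A void can be illustrated with a toy nearest-neighbor decode (the numbers below are invented for illustration, not taken from the submission): a latent can sit closer in MSE to the wrong row of the shared embedding table even though it was regressed toward the right one.

```python
import numpy as np

def nn_decode(pred: np.ndarray, emb: np.ndarray) -> int:
    """Decode a predicted latent to the nearest row of the shared embedding table (L2)."""
    dists = ((emb - pred) ** 2).sum(axis=1)
    return int(dists.argmin())

# Toy shared embedding table: 4 tokens, 3 dims (illustrative values).
emb = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.0, 1.0],
                [0.7, 0.7, 0.0]])

# A latent regressed toward token 1's row that still lands nearer token 3's row:
v_void = np.array([0.55, 0.75, 0.0])

# MSE against the intended target, as in the Navigator's training objective:
mse_to_target = float(((v_void - emb[1]) ** 2).mean())
```

Here `nn_decode(v_void, emb)` returns 3, not 1: the latent carries information about token 1 (its MSE to row 1 is small) but decodes incorrectly under the table's geometry, which is exactly the ambiguity the paper calls a void.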
VRS adds a small Rescuer module that maps:

    v_void -> v_rescued

The system is intentionally split into two roles: the Navigator (Model A) performs next-token regression, and the Rescuer (Model B) repairs void latents before decoding.
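The PR describes the Rescuer only as "a small MLP", so the shapes, activation, and residual connection below are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 3, 8  # latent dim and hidden width are hypothetical

W1, b1 = rng.normal(size=(d, h)), np.zeros(h)
W2, b2 = rng.normal(size=(h, d)), np.zeros(d)

def rescuer(v_void: np.ndarray) -> np.ndarray:
    """Small MLP mapping a void latent to a rescued latent.
    Residual form (v + MLP(v)) is an assumption, not stated in the PR."""
    hidden = np.maximum(v_void @ W1 + b1, 0.0)  # ReLU
    return v_void + hidden @ W2 + b2
```

The rescued latent would then be decoded with the same nearest-neighbor lookup against the shared embedding table.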
This submission is meant as a concrete JEPA / regression contribution under Parameter Golf constraints, not as a leaderboard-optimized architecture.
Why this may be interesting for Parameter Golf
Included Files
records/track_non_record_16mb/2026-04-09_VRS_VoidRescueSystem_JEPARegression/

Contents:

- README.md
- submission.json
- train_gpt.py
- train.log
- train_seed42.log
- train_seed1337.log
- results.tsv
- vrs-spec.txt

Metrics
Best included run:

- val_bpb = 1.8658
- parameters: 15,980,840

3-seed mean:

- val_bpb = 1.8667
- val_bpb_A = 1.9436
- nn_acc ≈ 0.5051

Regression-only baselines (separate runs, no VRS):

- val_bpb = 2.0941 to 2.1301

So the gain is not just an internal probe effect; the rescue module also improves over standalone regression training.
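For readers unfamiliar with the metric, val_bpb is bits per byte: mean cross-entropy (in nats per token) rescaled by the token-to-byte ratio of the validation set. A minimal conversion helper (the variable names are mine, not from the submission):

```python
import math

def nats_to_bpb(mean_nats_per_token: float,
                total_tokens: int,
                total_bytes: int) -> float:
    """Convert mean cross-entropy in nats/token to bits per byte of raw text."""
    return mean_nats_per_token * total_tokens / (math.log(2) * total_bytes)
```

Under this convention, a loss of ln(2) nats per token on a corpus with one token per byte is exactly 1.0 bpb.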
Links
Note on track choice
This is submitted as a non-record research contribution because the value is the architectural idea and the empirical evidence around regression decoding, not leaderboard SOTA.