Non-record JEPA-style regression transformer submission: VRS (Void Rescue System) by ikermoel · Pull Request #1513 · openai/parameter-golf

ikermoel · 2026-04-09T21:39:02Z

Summary

This PR adds a non-record 16MB submission for VRS (Void Rescue System), a JEPA-style regression transformer with a small auxiliary rescue decoder.

The main contribution is not SOTA BPB but a new research direction for regression-based transformers:

regression latents can contain useful token information before they are directly decodable, and a tiny jointly learned decoder can correct part of that geometric misalignment.

Core Research Idea

VRS is built around a specific hypothesis about regression decoding.

The Navigator is trained to predict the next token as a continuous embedding vector using MSE, instead of predicting vocabulary logits directly. In this setting, the raw latent can already carry token information but still decode poorly under the shared embedding geometry. In the paper, these ambiguous regions are called voids.

VRS adds a small Rescuer module that maps:

v_void -> v_rescued
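A toy sketch of what a "void" can look like. It assumes the latent is decoded by nearest neighbor (squared Euclidean distance) against the shared embedding table. The embedding values, the decoding rule, and the helper names (`nearest_token`, `nn_accuracy`) are hypothetical. `nn_accuracy` is only a guess at what the reported nn_acc metric measures.

```python
# Toy illustration of regression decoding (hypothetical values, not from VRS).
# ASSUMPTION: a latent decodes to the token whose embedding row is nearest in
# squared Euclidean distance; the actual VRS decoding rule may differ.


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_token(latent, emb_table):
    """Return the id of the embedding row closest to `latent`."""
    return min(range(len(emb_table)),
               key=lambda i: squared_distance(latent, emb_table[i]))


def nn_accuracy(latents, targets, emb_table):
    """Fraction of latents whose nearest embedding is the target token."""
    hits = sum(nearest_token(z, emb_table) == t for z, t in zip(latents, targets))
    return hits / len(targets)


# Three tokens in a 2-D embedding space.
emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
target = 0

# v_void is much closer to token 0 than to token 2 (it carries information
# about the target), yet it lies nearest token 1, so it decodes wrongly.
v_void = [0.4, 0.6]
# A corrected latent, as a Rescuer might produce, decodes to the target.
v_rescued = [0.8, 0.2]

print(nearest_token(v_void, emb))     # 1
print(nearest_token(v_rescued, emb))  # 0
print(nn_accuracy([v_void, v_rescued], [target, target], emb))  # 0.5
```

The rescued latent does not need to land exactly on the target embedding. It only needs to move back across the decision boundary of the nearest-neighbor decoder.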

The system is intentionally split into two roles:

Navigator: learns contextual geometry
Rescuer: learns lexical / embedding-space correction
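The two-role split can be sketched as two MSE losses toward the same target embedding. This matches the loss_A / loss_B description in the community review of train_gpt.py. The residual-MLP form of the Rescuer, its hidden size, the hand-picked weights, and the equal weighting of the two losses are assumptions for illustration, not the PR's code.

```python
# Sketch of the two-head VRS objective described in the review:
# loss_A = MSE(v_void, target_emb), loss_B = MSE(v_rescued, target_emb).
# ASSUMPTIONS: residual-MLP Rescuer, its sizes, and equal loss weighting.


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(u, v):
    """Elementwise vector sum."""
    return [a + b for a, b in zip(u, v)]


def relu(x):
    """Elementwise ReLU."""
    return [max(0.0, xi) for xi in x]


def rescuer(v_void, W1, b1, W2, b2):
    """Hypothetical residual MLP: v_void + W2 @ relu(W1 @ v_void + b1) + b2."""
    hidden = relu(add(matvec(W1, v_void), b1))
    delta = add(matvec(W2, hidden), b2)
    return add(v_void, delta)


def vrs_losses(v_void, target_emb, params):
    """Return (loss_A, loss_B, total) for one position.

    Per the review, target_emb is the detached token embedding
    (self.tok_emb(target_ids).detach()); plain Python has no autograd,
    so detaching has no analogue here.
    """
    v_rescued = rescuer(v_void, *params)
    loss_A = mse(v_void, target_emb)     # Navigator head
    loss_B = mse(v_rescued, target_emb)  # Rescuer head
    return loss_A, loss_B, loss_A + loss_B  # equal weights: assumption


# Hidden size 1; weights chosen by hand so the correction moves the
# latent from [0.4, 0.6] to approximately [0.8, 0.2].
params = ([[0.0, 1.0]], [0.0], [[0.5], [-0.5]], [0.1, -0.1])
loss_A, loss_B, total = vrs_losses([0.4, 0.6], [1.0, 0.0], params)
print(round(loss_A, 4), round(loss_B, 4), round(total, 4))  # 0.36 0.04 0.4
```

In this toy case the Navigator's latent already points roughly at the target, and the Rescuer only has to apply a small correction. loss_B falling below loss_A is the effect the Rescuer is trained for.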

This submission is meant as a concrete JEPA / regression contribution under Parameter Golf constraints, not as a leaderboard-optimized architecture.

Why this may be interesting for Parameter Golf

the challenge README explicitly asks to see JEPA submissions
the method stays under the 16MB artifact cap
it is stable across 3 separate 10-minute 8xH100 runs
it improves over the raw internal regression decode path
it also improves over standalone regression-only baselines trained separately under the same budget

Included Files

records/track_non_record_16mb/2026-04-09_VRS_VoidRescueSystem_JEPARegression/

Contents:

README.md
submission.json
train_gpt.py
train.log
train_seed42.log
train_seed1337.log
results.tsv
vrs-spec.txt

Metrics

Best included run:

val_bpb = 1.8658
total artifact bytes = 15,980,840

3-seed mean:

val_bpb = 1.8667
raw-path val_bpb_A = 1.9436
peak nn_acc ≈ 0.5051
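val_bpb is cross-entropy expressed in bits per byte of validation text. The sketch below shows one way to get there from a regression latent. It assumes logits are dot products between the latent and the tied token-embedding table, because the review only notes that logits and cross-entropy are computed in the eval branch. `dot_logits`, `cross_entropy_nats`, and `bits_per_byte` are hypothetical helpers, and `n_bytes` stands for the byte count of the scored text.

```python
import math

# ASSUMPTION: logits = <latent, embedding row> for each vocabulary entry;
# the PR does not state how VRS forms logits from the regression output.


def dot_logits(latent, emb_table):
    """One logit per vocabulary entry: dot product with each embedding row."""
    return [sum(a * b for a, b in zip(latent, row)) for row in emb_table]


def cross_entropy_nats(logits, target):
    """-log softmax(logits)[target], computed stably via log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def bits_per_byte(latents, targets, emb_table, n_bytes):
    """Total cross-entropy in bits divided by the byte count of the scored text."""
    total_nats = sum(cross_entropy_nats(dot_logits(z, emb_table), t)
                     for z, t in zip(latents, targets))
    return total_nats / (math.log(2) * n_bytes)


# Sanity check: a zero latent gives uniform logits over 4 tokens, i.e.
# log2(4) = 2 bits for one token; spread over 2 bytes that is 1 bit/byte.
emb4 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print(bits_per_byte([[0.0, 0.0]], [0], emb4, n_bytes=2))
```

Converting from nats to bits is a division by ln 2. Dividing by bytes rather than tokens makes scores comparable across tokenizers.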

Regression-only baselines (separate runs, no VRS):

val_bpb between 2.0941 and 2.1301

So the gain is not just an internal probe effect; the rescue module also improves over standalone regression training.
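The size of these gains can be checked with a few lines of arithmetic on the numbers above. The comparison sets the 3-seed VRS mean against a range of baseline runs whose count is not stated, so the baseline gaps are a range rather than a single figure.

```python
# Worked arithmetic on the val_bpb figures reported in this PR
# (lower is better). Numbers are copied from the Metrics section.
vrs_mean = 1.8667                          # 3-seed mean, VRS
raw_path = 1.9436                          # 3-seed mean, raw-path val_bpb_A
baseline_lo, baseline_hi = 2.0941, 2.1301  # regression-only baselines

gap_raw = raw_path - vrs_mean
gap_lo = baseline_lo - vrs_mean
gap_hi = baseline_hi - vrs_mean

print(f"vs raw path:  -{gap_raw:.4f} bpb ({gap_raw / raw_path:.2%})")
print(f"vs baselines: -{gap_lo:.4f} to -{gap_hi:.4f} bpb "
      f"({gap_lo / baseline_lo:.2%} to {gap_hi / baseline_hi:.2%})")
```

This works out to about 0.077 bpb (roughly 4%) below the raw path, and about 0.23 to 0.26 bpb (roughly 11% to 12%) below the regression-only baselines.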

Links

Paper / main repository: ikermoel/VRS-Void-Rescue-System
Zenodo record: 10.5281/zenodo.19477224

Note on track choice

This is submitted as a non-record research contribution because the value is the architectural idea and the empirical evidence around regression decoding, not leaderboard SOTA.

MatoTeziTanka · 2026-04-12T05:01:53Z

Community Review — Non-record JEPA-style regression transformer: VRS (Void Rescue System)

Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

PR #1513 ("VRS_VoidRescueSystem_JEPARegression") implements a JEPA-style architecture (JEPAVRS) with a causal transformer "Navigator" (Model A) and a small MLP "Rescuer" (Model B). The submission is clean on all four compliance checks.

N-gram / hash family bug

No hash tables, prime arrays, context hashing, or XOR-based key lookups of any kind. input_ids[..., 1:] ^ input_ids[..., :-1] is also absent. The submission does not touch n-gram machinery at all.

Pre-Quant TTT

No test-time training. val_tokens is consumed exclusively inside eval_val() (lines 193–246), which is wrapped entirely in torch.inference_mode() with model.eval(). No optimizer step, no backward pass, and no gradient computation touches val_tokens at any point.

Score-first-per-chunk TTT (PR #1413 pattern)

Not present, but this is expected for PURE_NEURAL_CLEAN; absence is correct.

Scored-region SLOT

Not present. The training loop (lines 832–904) uses train_loader (train split only) for gradient computation. Validation runs as a read-only diagnostic branch (if should_validate) that returns before any optimizer step. There is no masking or optimizing of scored regions.

Training objective

Lines 611–617: training loss is pure MSE in embedding space (loss_A = F.mse_loss(v_void, target_emb) and loss_B = F.mse_loss(v_rescued, target_emb)), with target_emb = self.tok_emb(target_ids).detach(). Cross-entropy and logits are only computed in the else (eval) branch (lines 619–631) under model.eval() / torch.inference_mode(). No CE is involved in gradient flow.

Summary

The model trains entirely on the train split using pure MSE regression toward detached token embeddings. Val tokens are only read under inference_mode for reporting BPB. No auxiliary lookup structures, no...

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the audit — this looks like a clean pure-neural submission.

Reviewed by @MatoTeziTanka — The Agora. Compliance audit via LLM agent (Sonnet) reviewing full train_gpt.py source, cross-checked against deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.

Add VRS non-record JEPA submission

7a5108e

ikermoel changed the title ~~Non-record JEPA submission: VRS (Void Rescue System)~~ Non-record regression submission: VRS (Void Rescue System) Apr 9, 2026

ikermoel changed the title ~~Non-record regression submission: VRS (Void Rescue System)~~ Non-record JEPA-style regression transformer submission: VRS (Void Rescue System) Apr 9, 2026

ikermoel wants to merge 1 commit into openai:main from ikermoel:codex/vrs-nonrecord-submission
