| Approach | R@20 | R@50 | mR@20 | Combined | Key Insight |
|---|---|---|---|---|---|
| Baseline | 0.124 | 0.175 | 0.017 | 0.123 | Predicts the most common predicate for every object pair |
| V1 Attention thresholding | 0.161 | 0.242 | 0.028 | 0.167 | Self-attention weights are higher for related pairs (2.36x ratio) |
| V2 Hidden concat + LogReg | 0.315 | 0.385 | 0.104 | 0.301 | Treats Q/K roles as subject/object; concatenating hidden states preserves information |
| V3 Multi-layer MLP | 0.431 | 0.575 | 0.154 | 0.433 | Hidden states from all decoder layers plus attention features (3680-dim input) |
| V4 Smoothing + Connectivity | 0.449 | 0.560 | 0.175 | 0.439 | Label smoothing weighted by detection quality, plus an auxiliary connectivity objective |
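
The table does not define the Combined column. However, every row is reproduced to three decimals by the weighting 0.4·R@20 + 0.4·R@50 + 0.2·mR@20. The sketch below assumes that weighting and checks it against the rows above. The weighting is inferred from the numbers; it is not stated anywhere in this summary.

```python
# Weights inferred from the results table; an assumption, not a documented spec.
WEIGHTS = {"R@20": 0.4, "R@50": 0.4, "mR@20": 0.2}


def combined_score(r20: float, r50: float, mr20: float) -> float:
    """Weighted combination of the three recall metrics (assumed weights)."""
    return WEIGHTS["R@20"] * r20 + WEIGHTS["R@50"] * r50 + WEIGHTS["mR@20"] * mr20


# (R@20, R@50, mR@20, reported Combined), copied from the table.
ROWS = {
    "Baseline": (0.124, 0.175, 0.017, 0.123),
    "V1": (0.161, 0.242, 0.028, 0.167),
    "V2": (0.315, 0.385, 0.104, 0.301),
    "V3": (0.431, 0.575, 0.154, 0.433),
    "V4": (0.449, 0.560, 0.175, 0.439),
}

if __name__ == "__main__":
    for name, (r20, r50, mr20, reported) in ROWS.items():
        print(f"{name}: computed {combined_score(r20, r50, mr20):.3f}, reported {reported:.3f}")
```

Under this weighting, mR@20 contributes only 20%. That explains why V4 edges out V3 on Combined (0.439 vs. 0.433) despite a lower R@50.
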
- 5000 images total (4000 train / 1000 test)
- 34 predicate classes (merged from 50); 24808 training relationships
- Raw images + JSON annotations (contestants run DETR themselves)
- Phase 1: Design — insight chain, metric, difficulty
- Phase 2: Dataset — 1734 images from Visual Genome
- Phase 2.5: Signal validation — 1.63x attention ratio confirmed
- Phase 3: Evaluation script — R@20/R@50/mR@20
- Phase 4: Baseline — 12.3% combined
- Phase 5: Reference solutions v1-v4 (16.7% → 30.1% → 43.3% → 43.9%)
- Phase 6: End-to-end verification — ALL PASSED
- Phase 7: Analysis + Kaggle packaging + notebooks
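
Phase 2.5 validates the signal through an attention ratio between related and unrelated object pairs. The sketch below shows one way such a ratio could be measured. It assumes each image supplies a DETR decoder query-to-query self-attention matrix, already averaged over heads, with queries aligned to annotated objects, plus the annotated (subject, object) index pairs. The `Sample` layout, the symmetrization, and the pooling across images are all assumptions. None of them are taken from `signal_validation.py`.

```python
from collections.abc import Iterable

import numpy as np

# One sample per image: (attn, pairs), where attn is an (N, N) query-to-query
# self-attention matrix averaged over heads, and pairs holds annotated
# (subject_idx, object_idx) relationships. This layout is an assumption.
Sample = tuple[np.ndarray, list[tuple[int, int]]]


def attention_ratio(samples: Iterable[Sample]) -> float:
    """Mean attention on related pairs divided by mean attention on unrelated pairs.

    Values are pooled across all images. The diagonal (a query attending to
    itself) is excluded, and attention is symmetrized so that the
    subject->object and object->subject directions count equally.
    """
    related_vals, unrelated_vals = [], []
    for attn, pairs in samples:
        n = attn.shape[0]
        sym = 0.5 * (attn + attn.T)
        related = np.zeros((n, n), dtype=bool)
        for s, o in pairs:
            related[s, o] = related[o, s] = True
        off_diag = ~np.eye(n, dtype=bool)
        related_vals.append(sym[related & off_diag])
        unrelated_vals.append(sym[~related & off_diag])
    rel = np.concatenate(related_vals)
    unrel = np.concatenate(unrelated_vals)
    return float(rel.mean() / unrel.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attn = rng.random((5, 5)) * 0.1
    attn[0, 1] += 0.5  # make one annotated pair stand out
    print(f"ratio: {attention_ratio([(attn, [(0, 1)])]):.2f}")
```

A ratio well above 1.0 means annotated pairs attend to each other more than random pairs do. This is the premise that V1's thresholding builds on.
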
| File | Description |
|---|---|
| data_generation.py | Full data pipeline (VG download → DETR extraction) |
| signal_validation.py | Diagnostic confirming attention encodes relations |
| evaluation.py | R@20/R@50/mR@20 scoring |
| baseline.py | Frequency baseline (12.3%) |
| solution_v1.py | Attention thresholding (16.7%) |
| solution_v2.py | Hidden concat + LogReg (30.1%) |
| solution_v3.py | Multi-layer MLP (43.3%) |
| solution_v4.py | Smoothing + connectivity (43.9%) |
| analysis.md | Full post-mortem |
| baseline_notebook.ipynb | Kaggle baseline notebook |
| reference_solution.ipynb | Reference solution notebook |
| kaggle/ | Full Kaggle package (149.4 MB) |
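
`evaluation.py` scores R@20, R@50 and mR@20. The simplified sketch below assumes each prediction is already a `(subject_idx, predicate_id, object_idx)` triplet over ground-truth-matched boxes. The real script presumably also has to match DETR boxes to ground truth (commonly via an IoU threshold), and this sketch skips that step. Two further aggregation choices are assumptions: R@K is averaged per image, and mR@K pools hits per predicate class across the test set before averaging over classes. The names `top_k`, `recall_at_k` and `mean_recall_at_k` are hypothetical.

```python
from collections import defaultdict

# Hypothetical data layout (assumption): triplets index ground-truth-matched boxes.
Triplet = tuple[int, int, int]  # (subject_idx, predicate_id, object_idx)
Image = tuple[set[Triplet], list[tuple[Triplet, float]]]  # (GT triplets, scored predictions)


def top_k(preds: list[tuple[Triplet, float]], k: int) -> set[Triplet]:
    """Return the k highest-scoring predicted triplets for one image."""
    ranked = sorted(preds, key=lambda p: p[1], reverse=True)
    return {triplet for triplet, _ in ranked[:k]}


def recall_at_k(images: list[Image], k: int) -> float:
    """R@K: per-image fraction of GT triplets found in the top-k, averaged over images."""
    recalls = []
    for gt, preds in images:
        if not gt:
            continue  # images without annotated relationships are skipped
        hits = top_k(preds, k)
        recalls.append(sum(t in hits for t in gt) / len(gt))
    return sum(recalls) / len(recalls) if recalls else 0.0


def mean_recall_at_k(images: list[Image], k: int) -> float:
    """mR@K: recall computed separately per predicate class, then averaged over classes."""
    hit: defaultdict[int, int] = defaultdict(int)
    total: defaultdict[int, int] = defaultdict(int)
    for gt, preds in images:
        hits = top_k(preds, k)
        for t in gt:
            total[t[1]] += 1
            hit[t[1]] += t in hits
    if not total:
        return 0.0
    return sum(hit[p] / total[p] for p in total) / len(total)


if __name__ == "__main__":
    images = [
        ({(0, 1, 1), (1, 2, 2)}, [((0, 1, 1), 0.9), ((1, 3, 2), 0.8), ((1, 2, 2), 0.1)]),
        ({(0, 1, 1)}, [((0, 1, 1), 0.5)]),
    ]
    print(recall_at_k(images, 2), mean_recall_at_k(images, 2))  # prints 0.75 0.5
```

Because mR@K weights every predicate class equally, a solution that only recovers frequent predicates scores well on R@K but poorly on mR@K. The baseline row shows exactly this pattern: R@20 is 0.124, while mR@20 is only 0.017.
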