Releases: dwgx/vrchat-il2cpp-re
v2.3 — Quality-Audited Final (90.7% precision-first)
v2.3 — Full quality audit applied, RVA output, unified docs
This release finalizes the June 5 build with a precision-first pass: a complete
122-batch quality audit removed low-confidence predictions, and all outputs now use
RVA offsets (Il2CppDumper/IDA/Ghidra-compatible).
Coverage (June 5 build, post audit)
| Metric | Value |
|---|---|
| Methods semantic | 478,923 / 528,135 (90.7%) |
| Hash fallback remaining | 49,212 (9.3%) |
| Raw-obfuscated methods | 0 |
| Obfuscated classes named | 7,813 / 11,503 (67.9%) |
| Fields typed | 2,712 / 2,870 |
| cross_version mappings | 39,623 |
Coverage dropped 94.1% → 90.7% by design — the audit removed ~13,777
low-confidence predictions (mostly `<>c` closures/lambdas and generic methods
where any name was a guess) and fixed 137. Quality over quantity.
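The percentages in the table can be re-derived from the raw counts. The snippet below is only a sanity check on the published figures, not part of the pipeline; the constant names and the `pct` helper are made up for illustration.

```python
# Figures copied from the v2.3 coverage table.
TOTAL_METHODS = 528_135
SEMANTIC_METHODS = 478_923
HASH_FALLBACK = 49_212


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Semantic and hash-fallback methods should partition the full method set.
assert SEMANTIC_METHODS + HASH_FALLBACK == TOTAL_METHODS

print(pct(SEMANTIC_METHODS, TOTAL_METHODS))  # 90.7
print(pct(HASH_FALLBACK, TOTAL_METHODS))     # 9.3
print(pct(7_813, 11_503))                    # 67.9
```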
What's new
- Full 122-batch quality audit applied via `tools/apply_audit_results.py`
- RVA output in `deobfuscated_dump.cs` and `output/src/` (+ field types/offsets)
- Single source of truth: `tools/compute_final_stats.py` → `output/coverage_stats.json`, auto-refreshed every pipeline run (Stage 3c)
- Docs unified to canonical numbers (README, WORKFLOW, dashboard, coverage report)
- WORKFLOW handoff section with "start here" steps, ROI-ranked next work, dead-ends
- Repo de-bloat: removed tracked data backups + IDA logs from version control
For users
- IDA: run `output/ida_apply_names.py` (auto-detects imagebase, 226,911 renames)
- Ghidra/other: use `output/name_mapping.json`
- `deobfuscated_dump.cs` methods show RVA; imagebase + RVA = address
- Metadata decryption: `tools/decrypt_metadata.py` (see README)
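For tools that do not rebase automatically, the `imagebase + RVA = address` rule can be applied by hand. A minimal sketch follows; the function names are hypothetical, and the imagebase and RVA values are illustrative only (read the real imagebase from your disassembler or from the loaded module).

```python
def rva_to_va(imagebase: int, rva: int) -> int:
    """Absolute address of a method once the module is loaded at imagebase."""
    return imagebase + rva


def va_to_rva(imagebase: int, va: int) -> int:
    """Inverse conversion: strip the load address to get back the RVA."""
    if va < imagebase:
        raise ValueError("address lies below the module's imagebase")
    return va - imagebase


# Illustrative values, not taken from any real build.
IMAGEBASE = 0x180000000
EXAMPLE_RVA = 0x1A2B3C0

print(hex(rva_to_va(IMAGEBASE, EXAMPLE_RVA)))  # 0x181a2b3c0
```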
Next highest-value work
Runtime Frida field extraction — only 2,870 of ~70K+ fields captured by the minidump.
See WORKFLOW.md §6 "For whoever takes over: start here".
v2.1 — 94.1% Coverage + New Build Support
VRChat IL2CPP Deobfuscation Pipeline v2.1
June 5 build analysis — 94.1% semantic method naming (up from 90.8% in v2.0).
Coverage
| Metric | Count |
|---|---|
| Classes | 88,400 extracted |
| Methods named | 496,886 / 528,135 (94.1%) |
| Hash remaining | 31,249 (5.9%) |
| cross_version entries | 53,292 |
| Semantic class names | 7,813 / 11,503 (67.9%) |
| Fields | 2,870 |
What's New Since v2.0
+3.3% coverage improvement (90.8% → 94.1%) via three new strategies:
- Sibling-context inference: 333 Codex batches predicting method names from class context (named siblings, fields, parent class) — +13,565 names after quality filtering
- RVA propagation v2 + cascade: relaxed per-class dedup + multi-pass cascade through shared function pointer groups — +15,527 total RVA-based names
- Neighbor-class batches: 6,242 single-class batches for remaining hash methods (in progress, 66% complete)
- Quality audit pipeline: automated review of all LLM predictions, flagging generic/wrong names for removal
New build support (June 6+):
- `reverse_struct_layout.py` now auto-detects the heap VA range and no longer crashes on new builds with different memory layouts. Falls back through: default range → auto-detect from string hits → full VA scan
- `extract_precise_dump.py` supports an `--offsets` flag to load struct offsets from `reverse_struct_layout.py` output
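The fallback order described above (default range, then string hits, then a full scan) follows a common "first strategy that yields a result wins" pattern. The sketch below shows that pattern only; it is not the code in `reverse_struct_layout.py`, and every name, the padding value, and the address bounds are assumptions chosen for illustration.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Range = Tuple[int, int]  # (start_va, end_va)


def range_from_hits(hits: List[int], pad: int = 0x1000_0000) -> Optional[Range]:
    """Derive a candidate heap range from addresses where known strings were found.

    The padding is a hypothetical safety margin around the observed hits.
    """
    if not hits:
        return None
    return (max(0, min(hits) - pad), max(hits) + pad)


def first_working_range(strategies: Iterable[Callable[[], Optional[Range]]]) -> Range:
    """Try each strategy in order and return the first range it produces."""
    for strategy in strategies:
        found = strategy()
        if found is not None:
            return found
    raise RuntimeError("no strategy produced a heap range")


# Illustrative run: the default range yields nothing, string hits succeed.
hits = [0x2_4000_1000, 0x2_4800_2000]
chosen = first_working_range([
    lambda: None,                          # stand-in for a failed default range
    lambda: range_from_hits(hits),         # stand-in for string-hit detection
    lambda: (0x1_0000, 0x7FFF_FFFF_0000),  # stand-in for a full VA scan
])
print(tuple(hex(v) for v in chosen))  # ('0x230001000', '0x258002000')
```

Ordering the strategies from cheapest to most expensive keeps the common case fast while still guaranteeing a result on unfamiliar memory layouts.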
Usage for New Builds
```
# Step 1: Discover struct offsets for your build
python tools/reverse_struct_layout.py --dump YOUR_DUMP.dmp --auto-heap
# Step 2: Extract classes using discovered offsets
python tools/extract_precise_dump.py YOUR_DUMP.dmp --auto-heap \
    --offsets output/struct_layout_report.json
# Step 3: Run full pipeline
python tools/run_full_pipeline.py
```
Tools Added
- `tools/rva_propagate_v2.py`: relaxed RVA filter (1 per class for common names)
- `tools/rva_cascade.py`: cascade names through shared-RVA hash groups
- `tools/sibling_context_batches.py`: build LLM batches from class context
- `tools/merge_sibling_preds.py`: quality-filtered merge with blacklists
- `tools/build_audit_batches.py`: build quality audit batches for review
- `tools/codex_worker.py`: supports mega/sibling/neighbor/audit modes
v2.0 — June 5 Build: 90.8% Semantic Coverage
VRChat IL2CPP Deobfuscation Pipeline v2.0
June 5 build analysis with 90.8% semantic method naming coverage.
Key Numbers
- 88,400 classes extracted from IL2CPP memory dump
- 528,135 methods analyzed, 479,421 semantically named (90.8%)
- 48,714 remaining hash methods (m_XXX fallback names)
- 38,386 cross-version method name entries
- 7,813 semantic class names (67.9% of obfuscated classes)
What's New Since v1.0
- Beebyte struct re-discovery: June 5 build has shuffled Il2CppClass offsets — fully reverse-engineered
- 6-stage pipeline runs in ~25 seconds: vocab merge → deobfuscate (11 phases) → xref → source tree → IDA scripts → field types
- RVA-based name propagation: zero-hallucination naming — if hash method shares function pointer with named method, assign the name (+15,500 methods)
- Codex batch processing: 262 mega-batches with IDA Hex-Rays pseudocode (+1,600 validated predictions)
- Sibling-context inference: LLM naming from class context (named methods, fields, parent class)
- IDA integration: auto-generated `ida_apply_names.py` script for renaming in IDA Pro
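The RVA-based propagation rule in the list above can be sketched in a few lines. This is a simplified illustration, not the logic of `rva_propagate_names.py`: the record layout is hypothetical, hash names are assumed to follow the `m_XXX` pattern, and groups with conflicting donor names are skipped as a conservative choice of this sketch.

```python
from collections import defaultdict
from typing import Dict, List


def is_hash_name(name: str) -> bool:
    # Assumption: fallback names follow the m_XXX pattern.
    return name.startswith("m_")


def propagate_by_rva(methods: List[Dict]) -> int:
    """Copy a semantic name onto hash-named methods that share its RVA.

    Each method is a dict with 'name' and 'rva' keys (hypothetical layout).
    Returns the number of methods renamed in place.
    """
    by_rva = defaultdict(list)
    for method in methods:
        by_rva[method["rva"]].append(method)

    renamed = 0
    for group in by_rva.values():
        donors = {m["name"] for m in group if not is_hash_name(m["name"])}
        if len(donors) != 1:
            continue  # no donor, or conflicting donors: leave the group alone
        (donor,) = donors
        for method in group:
            if is_hash_name(method["name"]):
                method["name"] = donor
                renamed += 1
    return renamed


sample = [
    {"name": "get_Count", "rva": 0x1000},
    {"name": "m_A1B2C3", "rva": 0x1000},  # shares a body with get_Count
    {"name": "m_D4E5F6", "rva": 0x2000},  # no named method at this RVA
]
print(propagate_by_rva(sample))  # 1
print(sample[1]["name"])         # get_Count
```

No model is involved at any point: a name is only ever copied from a method that already has one, which is why the approach cannot invent a name.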
Pipeline Tools
- `run_full_pipeline.py`: full 6-stage deobfuscation
- `rva_propagate_names.py` / `rva_propagate_v2.py`: RVA-based name propagation
- `rva_cascade.py`: cascade names through shared-RVA groups
- `codex_worker.py`: parallel Codex CLI worker for batch naming
- `extract_precise_dump.py`: IL2CPP class extraction from memory dumps
- `reverse_struct_layout.py`: Beebyte struct offset discovery
v1.0 — IL2CPP Memory-Dump Toolkit
IL2CPP Memory-Dump Toolkit — v1.0
A general-purpose offline analysis pipeline for IL2CPP-compiled Unity application memory dumps.
Highlights
- BSOD-based kernel-dump path: `kerneldump_to_minidump.py` walks Volatility 3 to extract a single process out of a Windows complete memory dump (MEMORY.DMP) into a self-contained MDMP minidump. Useful where user-mode dump APIs are blocked.
- `extract_precise_dump.py --auto-heap`: scans every user-mode memory range in the MDMP for the `Il2CppClass` self-reference signature, recovering classes regardless of where IL2CPP allocated them.
- `extract_field_types_from_dump.py`: walks each class's FieldInfo table, resolving the recursive `Il2CppType` chain (primitives, CLASS, VALUETYPE, SZARRAY, GENERICINST, PTR, BYREF, VAR, MVAR, ARRAY) into legible `(name, type, offset)` rows.
- `cross_version_class_map.py`: bridges two dumps taken across a build update by hashing the first 128 bytes of every method body. Falls back to a `(namespace, classname, field_count, method_name_signature)` fingerprint for compiler-generated closures whose bodies are too short to hash reliably. Lets a previously-built vocabulary survive a Beebyte obf-string re-seed.
- `merge_field_types.py`: VA-first matching (works on same-dump pairs) with full-name fallback (works across re-seeds).
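The body-hash bridging used for cross-version mapping can be illustrated as follows. This is a sketch under assumptions, not the code in `cross_version_class_map.py`: SHA-256 is an arbitrary choice (the notes do not name a hash), the minimum-length threshold is hypothetical, the fingerprint fallback is omitted, and only hashes that are unique on both sides are matched.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Optional

PREFIX_LEN = 128   # bytes of each method body that get hashed
MIN_BODY_LEN = 16  # hypothetical cutoff below which a body is "too short"


def body_key(body: bytes) -> Optional[str]:
    """Hash the first PREFIX_LEN bytes, or return None for very short bodies."""
    if len(body) < MIN_BODY_LEN:
        return None
    return hashlib.sha256(body[:PREFIX_LEN]).hexdigest()


def index_by_key(bodies: Dict[str, bytes]) -> Dict[str, list]:
    """Group method names by the hash of their body prefix."""
    index = defaultdict(list)
    for name, body in bodies.items():
        key = body_key(body)
        if key is not None:
            index[key].append(name)
    return index


def match_methods(old: Dict[str, bytes], new: Dict[str, bytes]) -> Dict[str, str]:
    """Map new-build names to old-build names when the hash is unique on both sides."""
    old_index, new_index = index_by_key(old), index_by_key(new)
    mapping = {}
    for key, new_names in new_index.items():
        old_names = old_index.get(key, [])
        if len(new_names) == 1 and len(old_names) == 1:
            mapping[new_names[0]] = old_names[0]
    return mapping


old_build = {"Player.Update": bytes(range(200)), "Player.Tick": b"\x90" * 64}
new_build = {"m_ABC": bytes(range(200)) + b"tail", "m_DEF": b"\xcc" * 64}
print(match_methods(old_build, new_build))  # {'m_ABC': 'Player.Update'}
```

Requiring uniqueness on both sides trades recall for precision: small identical stubs (empty getters, thunks) hash to the same key and are left for the fingerprint fallback rather than matched arbitrarily.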
Reference coverage
Achieved on one representative dump after migrating an earlier-baseline vocabulary forward via cross-version class mapping:
| Item | Count | Coverage |
|---|---|---|
| Classes | 104,797 | obf 0% · semantic 89.96% |
| Methods | 544,667 | semantic 99.99% |
| Fields | 235,947 | renamed 87.57% |
| Field types | 214,888 | 91.07% typed |
Asset
`il2cpp-toolkit-v1.0-source.zip`: full source tree (3,040 files, 27 MB) including all Python tools, the deobfuscated dump, the merged vocabulary, the cross-version class map, and the C# source-tree-style output.
Scope and DMCA
This repository ships no third-party game executables, libraries, or assets. Generated artifacts contain class/method/field names and layout offsets observed in memory by the repository owner running their own legitimately-acquired copy on their own hardware.
For takedown requests see NOTICE.md. 7-day response window.
Requirements
```
pip install volatility3 frida psutil
```
Tested on Windows 11 (build 26200), Python 3.12, Volatility 3 2.27, Frida 17.7.