Releases: dwgx/vrchat-il2cpp-re

v2.3 — Quality-Audited Final (90.7% precision-first)

07 Jun 09:08

v2.3 — Full quality audit applied, RVA output, unified docs

This release finalizes the June 5 build with a precision-first pass: a complete
122-batch quality audit removed low-confidence predictions, and all outputs now use
RVA offsets (Il2CppDumper/IDA/Ghidra-compatible).

Coverage (June 5 build, post audit)

| Metric | Value |
|---|---|
| Methods semantic | 478,923 / 528,135 (90.7%) |
| Hash fallback remaining | 49,212 (9.3%) |
| Raw-obfuscated methods | 0 |
| Obfuscated classes named | 7,813 / 11,503 (67.9%) |
| Fields typed | 2,712 / 2,870 |
| cross_version mappings | 39,623 |

Coverage dropped 94.1% → 90.7% by design — the audit removed ~13,777
low-confidence predictions (mostly <>c closures/lambdas and generic methods
where any name was a guess) and fixed 137. Quality over quantity.
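
The percentages in the table follow directly from the counts. A minimal sketch of that arithmetic, using only figures quoted above; the helper name `pct` is illustrative and is not taken from the repository's tools:

```python
def pct(part: int, total: int) -> float:
    """Return part/total as a percentage, rounded to one decimal place."""
    return round(100.0 * part / total, 1)

# Counts copied from the coverage table for the June 5 build.
TOTAL_METHODS = 528_135
SEMANTIC = 478_923
HASH_FALLBACK = 49_212

# Semantic and hash-fallback methods partition the full method set.
assert SEMANTIC + HASH_FALLBACK == TOTAL_METHODS

print(pct(SEMANTIC, TOTAL_METHODS))       # semantic method coverage
print(pct(HASH_FALLBACK, TOTAL_METHODS))  # hash-fallback share
print(pct(7_813, 11_503))                 # obfuscated classes named
```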

What's new

  • Full 122-batch quality audit applied via tools/apply_audit_results.py
  • RVA output in deobfuscated_dump.cs and output/src/ (+ field types/offsets)
  • Single source of truth: tools/compute_final_stats.py → output/coverage_stats.json,
    auto-refreshed every pipeline run (Stage 3c)
  • Docs unified to canonical numbers (README, WORKFLOW, dashboard, coverage report)
  • WORKFLOW handoff section with "start here" steps, ROI-ranked next work, dead-ends
  • Repo de-bloat: removed tracked data backups + IDA logs from version control

For users

  • IDA: run output/ida_apply_names.py (auto-detects imagebase, 226,911 renames)
  • Ghidra/other: use output/name_mapping.json
  • deobfuscated_dump.cs methods show RVA; imagebase + RVA = address
  • Metadata decryption: tools/decrypt_metadata.py (see README)
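
The "imagebase + RVA = address" rule can be sketched as below. The base and RVA values are made-up examples, and the helper names are hypothetical rather than part of the repository; with ASLR the real image base has to be read from the debugger or disassembler for the loaded module.

```python
def rva_to_va(image_base: int, rva: int) -> int:
    """Convert a relative virtual address into an absolute address."""
    if rva < 0:
        raise ValueError("RVA must be non-negative")
    return image_base + rva

def va_to_rva(image_base: int, va: int) -> int:
    """Inverse conversion: absolute address back to an RVA."""
    if va < image_base:
        raise ValueError("address lies below the image base")
    return va - image_base

# Example values only (assumptions, not taken from any real dump).
image_base = 0x180000000
rva = 0x1A2B3C0

va = rva_to_va(image_base, rva)
print(hex(va))                         # 0x181a2b3c0
print(hex(va_to_rva(image_base, va)))  # 0x1a2b3c0
```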

Next highest-value work

Runtime Frida field extraction — only 2,870 of ~70K+ fields captured by the minidump.
See WORKFLOW.md §6 "接手者:从这里开始" ("For those taking over: start here").

v2.1 — 94.1% Coverage + New Build Support

07 Jun 03:37

VRChat IL2CPP Deobfuscation Pipeline v2.1

June 5 build analysis — 94.1% semantic method naming (up from 90.8% in v2.0).

Coverage

| Metric | Count |
|---|---|
| Classes | 88,400 extracted |
| Methods named | 496,886 / 528,135 (94.1%) |
| Hash remaining | 31,249 (5.9%) |
| cross_version entries | 53,292 |
| Semantic class names | 7,813 / 11,503 (67.9%) |
| Fields | 2,870 |

What's New Since v2.0

Coverage improved by 3.3 percentage points (90.8% → 94.1%) via three new naming strategies, backed by a quality audit pipeline:

  • Sibling-context inference: 333 Codex batches predicting method names from class context (named siblings, fields, parent class) — +13,565 names after quality filtering
  • RVA propagation v2 + cascade: relaxed per-class dedup + multi-pass cascade through shared function pointer groups — +15,527 total RVA-based names
  • Neighbor-class batches: 6,242 single-class batches for remaining hash methods (in progress, 66% complete)
  • Quality audit pipeline: automated review of all LLM predictions, flagging generic/wrong names for removal

New build support (June 6+):

  • reverse_struct_layout.py now auto-detects heap VA range — no longer crashes on new builds with different memory layouts. Falls back through: default range → auto-detect from string hits → full VA scan
  • extract_precise_dump.py supports --offsets flag to load struct offsets from reverse_struct_layout.py output

Usage for New Builds

# Step 1: Discover struct offsets for your build
python tools/reverse_struct_layout.py --dump YOUR_DUMP.dmp --auto-heap

# Step 2: Extract classes using discovered offsets
python tools/extract_precise_dump.py YOUR_DUMP.dmp --auto-heap \
    --offsets output/struct_layout_report.json

# Step 3: Run full pipeline
python tools/run_full_pipeline.py

Tools Added

  • tools/rva_propagate_v2.py — relaxed RVA filter (1 per class for common names)
  • tools/rva_cascade.py — cascade names through shared-RVA hash groups
  • tools/sibling_context_batches.py — build LLM batches from class context
  • tools/merge_sibling_preds.py — quality-filtered merge with blacklists
  • tools/build_audit_batches.py — build quality audit batches for review
  • tools/codex_worker.py — supports mega/sibling/neighbor/audit modes

v2.0 — June 5 Build: 90.8% Semantic Coverage

06 Jun 12:35

VRChat IL2CPP Deobfuscation Pipeline v2.0

June 5 build analysis with 90.8% semantic method naming coverage.

Key Numbers

  • 88,400 classes extracted from IL2CPP memory dump
  • 528,135 methods analyzed, 479,421 semantically named (90.8%)
  • 48,714 remaining hash methods (m_XXX fallback names)
  • 38,386 cross-version method name entries
  • 7,813 semantic class names (67.9% of obfuscated classes)

What's New Since v1.0

  • Beebyte struct re-discovery: June 5 build has shuffled Il2CppClass offsets — fully reverse-engineered
  • 6-stage pipeline runs in ~25 seconds: vocab merge → deobfuscate (11 phases) → xref → source tree → IDA scripts → field types
  • RVA-based name propagation: zero-hallucination naming. If a hash-named method shares a function pointer with an already-named method, it is assigned that name (+15,500 methods)
  • Codex batch processing: 262 mega-batches with IDA Hex-Rays pseudocode (+1,600 validated predictions)
  • Sibling-context inference: LLM naming from class context (named methods, fields, parent class)
  • IDA integration: auto-generated ida_apply_names.py script for renaming in IDA Pro
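
The RVA-based propagation described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code in rva_propagate_names.py: the record layout (`name`, `rva`), the `m_` prefix test for hash names, and the rule of propagating only when a function pointer has exactly one known name are all choices made for this example.

```python
from collections import defaultdict

def is_hash_name(name: str) -> bool:
    # Assumption: hash fallback names look like "m_XXX".
    return name.startswith("m_")

def propagate_names_by_rva(methods):
    """Copy a semantic name onto hash-named methods sharing the same RVA.

    `methods` is a list of dicts with keys "name" and "rva".
    A name is propagated only when the RVA has exactly one semantic
    name, so ambiguous groups are left untouched.
    """
    names_by_rva = defaultdict(set)
    for m in methods:
        if not is_hash_name(m["name"]):
            names_by_rva[m["rva"]].add(m["name"])

    result = []
    for m in methods:
        candidates = names_by_rva.get(m["rva"], set())
        if is_hash_name(m["name"]) and len(candidates) == 1:
            result.append({**m, "name": next(iter(candidates))})
        else:
            result.append(dict(m))
    return result

methods = [
    {"name": "Dispose", "rva": 0x1000},
    {"name": "m_A1B2", "rva": 0x1000},  # shares a pointer with Dispose
    {"name": "m_C3D4", "rva": 0x2000},  # no named sibling, stays a hash
]
for m in propagate_names_by_rva(methods):
    print(hex(m["rva"]), m["name"])
```

Skipping ambiguous groups trades coverage for precision, in the same spirit as the audit pass; the repository's own filters may differ.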

Pipeline Tools

  • run_full_pipeline.py — full 6-stage deobfuscation
  • rva_propagate_names.py / rva_propagate_v2.py — RVA-based name propagation
  • rva_cascade.py — cascade names through shared-RVA groups
  • codex_worker.py — parallel Codex CLI worker for batch naming
  • extract_precise_dump.py — IL2CPP class extraction from memory dumps
  • reverse_struct_layout.py — Beebyte struct offset discovery

v1.0 — IL2CPP Memory-Dump Toolkit

25 Apr 17:26

IL2CPP Memory-Dump Toolkit — v1.0

A general-purpose offline analysis pipeline for IL2CPP-compiled Unity application memory dumps.

Highlights

  • BSOD-based kernel-dump path: kerneldump_to_minidump.py uses Volatility 3 to extract a single process out of a Windows complete memory dump (MEMORY.DMP) into a self-contained MDMP minidump. Useful where user-mode dump APIs are blocked.
  • extract_precise_dump.py --auto-heap — scans every user-mode memory range in the MDMP for the Il2CppClass self-reference signature, recovering classes regardless of where IL2CPP allocated them.
  • extract_field_types_from_dump.py — walks each class's FieldInfo table, resolving the recursive Il2CppType chain (primitives, CLASS, VALUETYPE, SZARRAY, GENERICINST, PTR, BYREF, VAR, MVAR, ARRAY) into legible (name, type, offset) rows.
  • cross_version_class_map.py — bridges two dumps taken across a build update by hashing the first 128 bytes of every method body. Falls back to a (namespace, classname, field_count, method_name_signature) fingerprint for compiler-generated closures whose bodies are too short to hash reliably. Lets a previously-built vocabulary survive a Beebyte obf-string re-seed.
  • merge_field_types.py — VA-first matching (works on same-dump pairs) with full-name fallback (works across re-seeds).
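
The body-hash matching used by cross_version_class_map.py can be illustrated roughly as below. Only the 128-byte prefix comes from the description above; the hash function (SHA-256), the minimum body length, the input layout (name mapped to raw bytes), and the function names are assumptions for this sketch. The fingerprint fallback for short closures is not shown.

```python
import hashlib

PREFIX_LEN = 128   # from the description above
MIN_BODY_LEN = 16  # assumption: shorter bodies are too generic to hash

def body_hash(code: bytes):
    """Hash the first PREFIX_LEN bytes of a method body, or None if too short."""
    if len(code) < MIN_BODY_LEN:
        return None
    return hashlib.sha256(code[:PREFIX_LEN]).hexdigest()

def match_methods(old: dict, new: dict) -> dict:
    """Map new-build method names to old-build names via unique body hashes."""
    old_index = {}
    ambiguous = set()
    for name, code in old.items():
        h = body_hash(code)
        if h is None:
            continue
        if h in old_index:
            ambiguous.add(h)  # two old methods collide: do not trust this hash
        old_index[h] = name

    mapping = {}
    for name, code in new.items():
        h = body_hash(code)
        if h is not None and h in old_index and h not in ambiguous:
            mapping[name] = old_index[h]
    return mapping

old = {"Player.Update": bytes(range(200)), "Tiny.Get": b"\xc3"}
new = {"m_9F31": bytes(range(200)), "m_0001": b"\xc3"}
print(match_methods(old, new))  # {'m_9F31': 'Player.Update'}
```

Hashing raw bytes like this only matches bodies that are byte-identical across builds; code containing relocated addresses or changed relative offsets would likely need normalising first, which this sketch does not attempt.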

Reference coverage

Achieved on one representative dump after migrating an earlier-baseline vocabulary forward via cross-version class mapping:

| Item | Count | Coverage |
|---|---|---|
| Classes | 104,797 | obf 0% · semantic 89.96% |
| Methods | 544,667 | semantic 99.99% |
| Fields | 235,947 | renamed 87.57% |
| Field types | 214,888 | 91.07% typed |

Asset

  • il2cpp-toolkit-v1.0-source.zip — full source tree (3,040 files, 27 MB) including all Python tools, the deobfuscated dump, the merged vocabulary, the cross-version class map, and the C# source-tree-style output.

Scope and DMCA

This repository ships no third-party game executables, libraries, or assets. Generated artifacts contain class/method/field names and layout offsets observed in memory by the repository owner running their own legitimately-acquired copy on their own hardware.

For takedown requests see NOTICE.md. 7-day response window.

Requirements

```
pip install volatility3 frida psutil
```

Tested on Windows 11 (build 26200), Python 3.12, Volatility 3 2.27, Frida 17.7.