Investigate sourcing in-game text from pret decompilations (Gen 1–5) as authoritative supplement #905

@codemonkey85

Motivation

The current tools/generate-descriptions.cs pipeline pulls from PokeAPI (primary) with Pokémon Showdown as a fallback for newer-gen mechanics and pokemondb.net scrapes as a last-resort cache. This works well — ~99.9% accurate — but PokeAPI's CSVs occasionally carry typos, mis-attributed flavor text, and outright malformed short_effect rows. PR #904 surfaced three such bugs in ability-info.json (Cotton Down, Effect Spore, Queenly Majesty) and was resolved by extending description-overrides.json to support ability-level corrections and per-version-group flavor removal.

Override-driven patching is fine for a steady drip of small bugs, but a more authoritative path exists for older games.

Proposal

The pret organization maintains community decompilations of every Gen 1–5 game:

  • Gen 1: pret/pokered, pret/pokeyellow
  • Gen 2: pret/pokegold, pret/pokecrystal (pret/pokegold-spaceworld is the 1997 Space World demo build, so it is not a source for retail text)
  • Gen 3: pret/pokeruby, pret/pokeemerald, pret/pokefirered
  • Gen 4: pret/pokediamond, pret/pokeplatinum, pret/pokeheartgold
  • Gen 5: pret/pokeblack, pret/pokeblack2

These are reverse-engineered source code, not ROM rips. They hold in-game text as plain ASM/C string tables that are already canonical, version-tagged, and (relatively) legally cleaner to read from than direct ROM extraction. Each repo has a fairly predictable text-table layout (e.g. src/data/text/abilities.h in pokeemerald, data/moves/descriptions.asm in pokecrystal; exact paths need to be confirmed against each repo as part of the investigation).
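To get a feel for how little parsing such a table might need, here is a minimal Python sketch (not the C# pipeline itself). It assumes a pokecrystal-style layout in which a label line is followed by `db` / `next` / `line` string lines and `@` terminates the string. That layout, the `parse_descriptions` name, and the sample text are assumptions for illustration and must be checked against the actual repo files.

```python
import re

# A top-level label such as "ExampleDescription:" (label naming is an assumption).
LABEL_RE = re.compile(r"^(\w+):")
# A text line such as `db "..."` or `next "..."`; the macro names are assumed.
TEXT_RE = re.compile(r'^\s*(db|next|line)\s+"([^"]*)"')


def parse_descriptions(asm_source: str) -> dict[str, str]:
    """Return {label: text} from an assumed pokecrystal-style text table."""
    result: dict[str, str] = {}
    label = None
    parts: list[str] = []
    for raw in asm_source.splitlines():
        m = LABEL_RE.match(raw)
        if m:
            # A new label starts a new entry.
            label, parts = m.group(1), []
            continue
        m = TEXT_RE.match(raw)
        if m and label is not None:
            text = m.group(2)
            finished = text.endswith("@")  # "@" is assumed to end the string
            parts.append(text.rstrip("@"))
            if finished:
                # Keep in-game line breaks; reflowing is left to the caller.
                result[label] = "\n".join(parts)
                label = None
    return result


# Illustrative input only, not copied from any pret repo.
sample = 'ExampleDescription:\n\tdb   "First line of"\n\tnext "sample text.@"\n'
print(parse_descriptions(sample))
```

The sketch keeps the in-game line breaks rather than joining lines, because a hyphen at a line end may be either a soft wrap or part of the word, and that decision is better made in one place downstream.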

Investigate adding a pret-source supplement to the description pipeline, scoped to Gen 1–5 only:

  1. Add --pret <root> flag to tools/generate-descriptions.cs accepting a parent directory that contains the cloned pret repos (or per-repo flags).
  2. For each Gen 1–5 version-group, read the canonical text tables and produce a {abilityId|moveId|itemId: {versionGroupId: text}} map.
  3. Merge into the existing output as an authoritative override for those specific version groups, taking precedence over PokeAPI flavor text but not over manual description-overrides.json entries.
  4. Document the cloned-repo conventions (probably ~/Code/pret/<repo> on macOS, C:\Code\pret\<repo> on Windows; align with the conventions in AGENTS.md).
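The precedence described in step 3 (manual overrides over pret text over PokeAPI flavor text, per version group) could look roughly like the following. This is a Python sketch of the merge rule only, not the C# pipeline; `merge_layers`, the `TextMap` shape, and the sample IDs are hypothetical names chosen to mirror the map from step 2.

```python
# Shape from step 2: {entity_id: {version_group_id: text}}
TextMap = dict[str, dict[str, str]]


def merge_layers(pokeapi: TextMap, pret: TextMap, manual: TextMap) -> TextMap:
    """Layer sources so that manual overrides > pret > PokeAPI, per version group."""
    merged: TextMap = {}
    # Later layers win, so the order encodes the precedence.
    for layer in (pokeapi, pret, manual):
        for entity_id, by_group in layer.items():
            merged.setdefault(entity_id, {}).update(by_group)
    return merged


# Hypothetical IDs and placeholder text, for illustration only.
pokeapi = {"effect-spore": {"ruby-sapphire": "api text", "sword-shield": "api text"}}
pret = {"effect-spore": {"ruby-sapphire": "pret text"}}
manual = {"effect-spore": {"sword-shield": "manual text"}}
print(merge_layers(pokeapi, pret, manual))
```

Because the merge is keyed by version group, pret text only displaces PokeAPI text for the Gen 1–5 groups it actually covers; everything else passes through untouched.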

Gen 6+ is out of scope — no full decompilations exist yet; SwSh/SV would still rely on PokeAPI/Showdown.

Open questions

  • Legal posture. Even text extracted from pret is still Nintendo's IP. PokeAPI/Showdown carry that risk under community norms and explicit licensing. Pulling pret text into our public repo is a tier closer to obvious infringement. Worth a careful read of pret's license stance and any prior art (other Pokémon tooling that pulls from pret) before committing.
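  • Coverage. Verify pret repos actually hold the strings we'd want for each of: ability short effects, ability flavor (per-game), move flavor (per-game), item flavor (per-game). Some categories may not be there or may be split across many files. Two known gaps up front: abilities only exist from Gen 3 onward, so the Gen 1–2 repos can contribute move and item text at most, and PokeAPI's short_effect is editorial summary text rather than in-game text, so pret is unlikely to supply a replacement for it.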
  • Encoding. Gen 1–2 use custom character tables; Gen 3 uses a different custom encoding; Gen 4/5 text is packed in NDS NARC archives, but pret ships it already decoded. Each gen needs its own reader (though pret usually provides converted plaintext for review).
  • Diff vs PokeAPI. Before committing to integration, do a small audit: clone one pret repo (say pokeemerald) and diff its ability descriptions against PokeAPI's Gen-3 entries. If divergence is < 5%, the maintenance cost likely isn't worth it; if it's higher and consistently in pret's favor, the case is stronger.
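The audit in the last bullet comes down to a normalized comparison over the entries both sources share. A minimal Python sketch, assuming both sides have already been loaded into `{ability_id: text}` dictionaries (the loading step, the `divergence` name, and the sample values are hypothetical):

```python
def normalize(text: str) -> str:
    """Collapse whitespace and line breaks so layout differences are not counted."""
    return " ".join(text.split())


def divergence(pret: dict[str, str], pokeapi: dict[str, str]) -> tuple[float, list[str]]:
    """Return (fraction of shared entries that differ, sorted list of differing keys)."""
    shared = sorted(pret.keys() & pokeapi.keys())
    differing = [k for k in shared if normalize(pret[k]) != normalize(pokeapi[k])]
    rate = len(differing) / len(shared) if shared else 0.0
    return rate, differing


# Placeholder data, not real game or PokeAPI text.
pret_text = {"a": "Line one\nline two.", "b": "Same text."}
api_text = {"a": "Line one line two.", "b": "Different text."}
rate, keys = divergence(pret_text, api_text)
print(f"{rate:.1%} of shared entries differ: {keys}")
```

The rate can then be compared against the 5% threshold above, and the list of differing keys reviewed by hand to judge which source is right in each case.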

Out of scope

  • ROM ripping / extraction directly from ROM images (.gb / .gbc / .gba / .nds files, or Switch dumps).
  • Gen 6+ (no decomps).
  • Replacing PokeAPI/Showdown — this is a supplement, not a substitute.

Background

  • PR #904 (chore: sync-repos manifest + regenerate descriptions JSON): added the override mechanism to patch upstream data bugs (Cotton Down, Effect Spore, Queenly Majesty). See the discussion there for the specific kind of bugs pret could prevent.
  • tools/generate-descriptions.cs — current pipeline.
  • tools/data/description-overrides.json — current corrections layer.
