fix: reject names that cannot be represented as alphanumeric ASCII by Sivva2 · Pull Request #22 · tictactrip/gp-uid

Sivva2 · 2026-05-04T13:48:46Z

Closes #2.

Context

Issue #2 lists generated ids containing characters that should not appear in a gpuid: Cyrillic, Greek (c|XXκαβάλα__@sx32g), Arabic, German eszett (g|XXweißenfe@u30e29), curly apostrophes (c|FRlessabd’@gbq8r), en dashes (g|ITadb6–pxs@srbj4j), emojis (g|XX🚌______@u2dhf7), and a handful of malformed inputs where backslash escape sequences leaked through.

The sanitize() pipeline already maps a wide range of Latin-extended characters to ASCII via replaceChar(), but anything outside that table flows through unchanged and ends up in the final id.

Change

A single regex check at the end of sanitize():

if (!/^[a-z0-9 ]+$/.test(sanitized)) {
  throw new Error(
    `Cannot generate gpuid: name "${str}" contains characters that cannot be represented as alphanumeric ASCII (got "${sanitized}" after sanitization).`,
  );
}

The check runs after replaceChar() and stop-word removal, so anything recoverable as Latin (é, ñ, ø, …) still goes through. Only characters that the existing pipeline cannot normalize trigger the error, and the message reports both the original input and the post-sanitization form to make upstream debugging easy.
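To illustrate the ordering, here is a minimal sketch of a sanitize-like pipeline with the check placed last. The replacement table, the stop-word list, and the name sanitizeSketch are assumptions made up for this example; the real replaceChar() table and stop-word handling in gp-uid are larger and may behave differently.

```typescript
// Hypothetical stand-in for the replaceChar() table (assumption: the real one is much larger).
const REPLACEMENTS = new Map<string, string>([
  ["é", "e"],
  ["ñ", "n"],
  ["ø", "o"],
  ["ü", "u"],
]);

// Hypothetical stop-word list, for illustration only.
const STOP_WORDS = new Set<string>(["de", "la", "le"]);

function sanitizeSketch(str: string): string {
  // 1. Lowercase, then map known Latin-extended characters to ASCII.
  const replaced = Array.from(str.toLowerCase())
    .map((ch) => REPLACEMENTS.get(ch) ?? ch)
    .join("");

  // 2. Drop stop words and collapse runs of whitespace.
  const sanitized = replaced
    .split(/\s+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word))
    .join(" ");

  // 3. Final guard, same regex as in this PR: whatever the earlier
  //    steps could not normalize is rejected instead of leaking through.
  if (!/^[a-z0-9 ]+$/.test(sanitized)) {
    throw new Error(
      `Cannot generate gpuid: name "${str}" contains characters that cannot be represented as alphanumeric ASCII (got "${sanitized}" after sanitization).`,
    );
  }
  return sanitized;
}
```

Because the guard runs last, an input such as "Café de la Paix" is normalized before it is checked and passes, while "Weißenfels" still contains ß at the end of the pipeline and throws.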

Test changes

The existing "should return array of gpuid" test contained a B\u00fcsum, … entry whose expected id was c|DEb\u00fcs@u1w7c. In other words, the test was pinning a malformed input/output pair: the source uses the literal characters \, u, 0, 0, f, c, not a ü. I removed that entry from the array test and added a dedicated test asserting that this input now throws.
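The difference between the two forms is easy to miss when reading source. A small sketch, independent of gp-uid, of how a real escape sequence differs from the six literal characters:

```typescript
// A real escape sequence: the compiler turns \u00fc into the single character ü.
const proper = "B\u00fcsum";

// A malformed input: the backslash itself is escaped, so the string holds
// the six literal characters \ u 0 0 f c instead of ü.
const malformed = "B\\u00fcsum";

console.log(proper.length); // 5
console.log(malformed.length); // 10
console.log(proper.charCodeAt(1) === 0xfc); // true (ü is U+00FC)
console.log(malformed.charAt(1) === "\\"); // true (a literal backslash)

// The regex from this PR, applied directly to the lowercased malformed
// form, rejects it because a backslash is not in [a-z0-9 ].
console.log(/^[a-z0-9 ]+$/.test(malformed.toLowerCase())); // false
```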

A new "Non-alphanumeric inputs" describe block adds 11 tests: one per category from the issue, plus two regressions.

curly apostrophe (’)
en dash (–)
German eszett (ß)
Cyrillic (Брод)
Greek (Καβάλα)
Arabic (العربية)
emoji (🚌)
name reducing to nothing after sanitization (!!!)
malformed backslash escapes (B\u00fcsum, …)
regression: ASCII apostrophe (Pont-de-l'Arche) still works
regression: digits (73 Rue Victor Hugo, …) still work
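The rejected categories can be checked against the PR's regex in isolation. The sketch below is not the project's test file: it applies only the final regex to lowercased samples and skips replaceChar() and stop-word removal, so it says nothing about how the real pipeline treats ASCII apostrophes or hyphens. The names ALNUM_ASCII, rejectedSamples, and passesCheck are hypothetical.

```typescript
// The final check from this PR, applied on its own.
const ALNUM_ASCII = /^[a-z0-9 ]+$/;

// Samples taken from the categories listed above.
const rejectedSamples: Array<[string, string]> = [
  ["curly apostrophe", "\u2019"],
  ["dash U+2013", "\u2013"],
  ["German eszett", "\u00df"],
  ["Cyrillic", "Брод"],
  ["Greek", "Καβάλα"],
  ["Arabic", "العربية"],
  ["emoji", "🚌"],
  ["punctuation only", "!!!"],
  ["empty string", ""], // the + quantifier requires at least one character
  ["literal backslash escape", "B\\u00fcsum"],
];

function passesCheck(name: string): boolean {
  return ALNUM_ASCII.test(name.toLowerCase());
}

for (const [label, sample] of rejectedSamples) {
  console.log(`${label}: ${passesCheck(sample) ? "accepted" : "rejected"}`);
}

// Digits and spaces are inside the allowed set, so this sample passes.
console.log(passesCheck("73 Rue Victor Hugo")); // true
```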

yarn test reports 15/15 passing with 100% coverage, yarn lint is clean, yarn build is clean.

Note on backwards compatibility

This is a behavioural breaking change: inputs that previously produced malformed ids now throw. The issue is filed under the v2.0.0 milestone, but the package is currently at v2.1.1, so I'll let maintainers decide whether to ship this in a v3 or as part of v2.x — happy to adjust whichever way.

fix: reject names that cannot be represented as alphanumeric ASCII

a340d13

Sivva2 wants to merge 1 commit into tictactrip:master from Sivva2:fix/reject-non-alphanum-ids