fix: reject names that cannot be represented as alphanumeric ASCII#22
Open
Sivva2 wants to merge 1 commit into
Open
fix: reject names that cannot be represented as alphanumeric ASCII#22Sivva2 wants to merge 1 commit into
Sivva2 wants to merge 1 commit into
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2.
Context
Issue #2 lists generated ids containing characters that should not appear in a gpuid: Cyrillic (
c|XXκαβάλα__@sx32g), Greek, Arabic, German eszett (g|XXweißenfe@u30e29), curly apostrophes (c|FRlessabd’@gbq8r), em dashes (g|ITadb6–pxs@srbj4j), emojis (g|XX🚌______@u2dhf7), and a handful of malformed inputs where backslash escape sequences leaked through.The
sanitize()pipeline already maps a wide range of Latin-extended characters to ASCII viareplaceChar(), but anything outside that table flows through unchanged and ends up in the final id.Change
A single regex check at the end of
sanitize():The check runs after
replaceChar()and stop-word removal, so anything recoverable as Latin (é, ñ, ø, …) still goes through. Only characters that the existing pipeline cannot normalize trigger the error, and the message reports both the original input and the post-sanitization form to make upstream debugging easy.Test changes
The existing
should return array of gpuidtest contained aB\u00fcsum, …entry whose expected id wasc|DEb\u00fcs@u1w7c— i.e. the test was pinning a malformed input/output pair (the source uses the literal characters\,u,0,0,f,c, not aü). I removed that entry from the array test and added a dedicated test asserting that this input now throws.A new
Non-alphanumeric inputsdescribe block adds 11 tests covering each category from the issue:’)–)ß)Брод)Καβάλα)العربية)🚌)!!!)B\u00fcsum, …)Pont-de-l'Arche) still works73 Rue Victor Hugo, …) still workyarn testreports 15/15 passing with 100% coverage,yarn lintis clean,yarn buildis clean.Note on backwards compatibility
This is a behavioural breaking change: inputs that previously produced malformed ids now throw. The issue is filed under the v2.0.0 milestone, but the package is currently at v2.1.1, so I'll let maintainers decide whether to ship this in a v3 or as part of v2.x — happy to adjust whichever way.