Add unicode support #13
Adds UAX #44 identifier checking and NFC quick check support, along with a few helpers such as `isAscii` and `unescapeUnicode`.
Pull Request Overview
This PR introduces comprehensive Unicode support to the qtil library, providing CodeQL predicates for Unicode property checking, UAX #44 identifier validation, and NFC normalization checking. The implementation includes raw Unicode data generation, string utilities for Unicode escape handling, and efficient APIs for common Unicode operations.
- Adds extensible predicates for Unicode properties (enumeration, boolean, and numeric)
- Implements UAX #44 identifier validation and NFC normalization quick checking
- Provides utilities for Unicode escape sequences and ASCII validation
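For readers unfamiliar with the concepts named above, here is a minimal Python sketch of the same ideas using only the standard library. The function names `is_ascii`, `is_nfc`, and `unescape_unicode` are hypothetical analogues chosen for illustration; they are not the qtil predicates, and the escape syntax handled (`\uXXXX` and `\UXXXXXXXX`) is an assumption about what an unescape helper might accept.

```python
import re
import unicodedata

# Hypothetical analogues of the helpers described in this PR; not the qtil API.

def is_ascii(text):
    """True if every code point in text is below U+0080."""
    return text.isascii()

def is_nfc(text):
    """True if text is already in Normalization Form C (Python 3.8+)."""
    return unicodedata.is_normalized("NFC", text)

# Assumed escape syntax: \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})")

def unescape_unicode(text):
    """Replace each recognized escape sequence with the code point it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), text)
```

Note that `chr` raises `ValueError` for values above U+10FFFF, so an eight-digit escape outside the Unicode range would fail in this sketch rather than pass through silently.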
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/qtil/strings/Unicode.qll | Core Unicode module with extensible predicates and helper functions |
| src/qtil/Qtil.qll | Imports the new Unicode module |
| src/qlpack.yml | Adds data extension for generated Unicode data |
| scripts/generate_unicode.py | Python script to generate Unicode property data from Unicode standard files |
| test/qtil/strings/UnicodeTest.ql | Comprehensive test suite for Unicode functionality |
| test/qtil/strings/UnicodeTest.expected | Test expectations file |
    if '..' not in code_point_hex_pair:
        code_point_start = code_point_end = int(code_point_hex_pair, 16)
    else:
        # handle ranges like '00A0..00A7'
        code_point_hex_start, code_point_hex_end = code_point_hex_pair.split('..')
        code_point_start, code_point_end = int(code_point_hex_start, 16), int(code_point_hex_end, 16)
Copilot AI, Aug 24, 2025
The comment and code that handle ranges are duplicated on lines 128-130 and 157-159. Consider extracting this logic into a helper function to reduce duplication.
Suggested change:

    - if '..' not in code_point_hex_pair:
    -     code_point_start = code_point_end = int(code_point_hex_pair, 16)
    - else:
    -     # handle ranges like '00A0..00A7'
    -     code_point_hex_start, code_point_hex_end = code_point_hex_pair.split('..')
    -     code_point_start, code_point_end = int(code_point_hex_start, 16), int(code_point_hex_end, 16)
    + code_point_start, code_point_end = parse_code_point_range(code_point_hex_pair)
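The review suggestion calls a helper named `parse_code_point_range` but does not show its body. One possible shape for it, as a sketch only: the name comes from the suggestion, while the implementation and the whitespace stripping are assumptions rather than the code that landed in `scripts/generate_unicode.py`.

```python
def parse_code_point_range(code_point_hex_pair):
    """Parse '00A0' or '00A0..00A7' into an inclusive (start, end) pair of ints.

    Sketch only: the helper name is taken from the review suggestion,
    the body is an assumed implementation.
    """
    # partition() yields an empty separator when '..' is absent,
    # which covers the single code point case without a branch on 'in'.
    start_hex, separator, end_hex = code_point_hex_pair.strip().partition('..')
    start = int(start_hex, 16)
    end = int(end_hex, 16) if separator else start
    return start, end
```

Both call sites could then unpack the result directly, for example `code_point_start, code_point_end = parse_code_point_range('00A0..00A7')`.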
The CodeQL Coding Standards project is implementing MISRA rules that refer to Unicode Standard concepts such as UAX #44 compliant identifiers and NFC normalization checks.
These concepts are specific to neither MISRA nor C, and thus deserve a home in qtil.
This pull request introduces Unicode support, including the helpers `isAscii` and `unescapeUnicode`, and imports the new module in `Qtil.qll`.
These features are pretty advanced; I'm not sure they're worth adding to the README.md.