Add unicode support #13

MichaelRFairhurst · 2025-08-23T15:58:58Z

CodeQL Coding Standards is implementing MISRA rules that refer to Unicode Standard concepts, such as UAX #44 compliant identifiers and NFC normalization checks.

These concepts are specific to neither MISRA nor C, and thus deserve a home in qtil.

This pull request introduces:

- extensible predicates that encode raw data from the Unicode Standard
- a script to generate the extensible predicate YAML from local downloads of the Unicode property databases
- a few general ASCII/Unicode helpers such as `isAscii` and `unescapeUnicode`
- a few specific APIs to make UAX #44 validity and NFC quick checking more user-friendly (and efficient)
- an import of these Unicode features from `Qtil.qll`

These features are pretty advanced, so I'm not sure they're worth adding to the README.md.

Adds UAX #44 identifier checking, and NFC quick check support, along with a few helpers like `isAscii` and `unescapeUnicode`.
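
For readers unfamiliar with these helpers, here is a rough Python analogue of what ASCII checking, Unicode unescaping, and NFC checking mean. It is only a sketch: the function names `is_ascii`, `unescape_unicode`, and `is_nfc` are hypothetical, the escape syntax handled (`\uXXXX` and `\UXXXXXXXX`) is an assumption, and none of this is the qtil API. Note also that Python's `unicodedata.is_normalized` gives a definitive answer, whereas a quick check based on the `NFC_Quick_Check` property can also answer "Maybe".

```python
import re
import unicodedata

# Assumed escape syntax: \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def is_ascii(s: str) -> bool:
    """Return True if every code point in s is below U+0080."""
    return all(ord(ch) < 0x80 for ch in s)


def unescape_unicode(s: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the characters they name."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), s)


def is_nfc(s: str) -> bool:
    """Return True if s is already in Normalization Form C (Python 3.8+)."""
    return unicodedata.is_normalized("NFC", s)
```

For example, `unescape_unicode(r"caf\u00e9")` yields a string ending in the precomposed character U+00E9, which is in NFC, while the decomposed spelling `"e\u0301"` is not.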

Copilot

Pull Request Overview

This PR introduces comprehensive Unicode support to the qtil library, providing CodeQL predicates for Unicode property checking, UAX #44 identifier validation, and NFC normalization checking. The implementation includes raw Unicode data generation, string utilities for Unicode escape handling, and efficient APIs for common Unicode operations.

- Adds extensible predicates for Unicode properties (enumeration, boolean, and numeric)
- Implements UAX #44 identifier validation and NFC normalization quick checking
- Provides utilities for Unicode escape sequences and ASCII validation

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/qtil/strings/Unicode.qll | Core Unicode module with extensible predicates and helper functions |
| src/qtil/Qtil.qll | Imports the new Unicode module |
| src/qlpack.yml | Adds data extension for generated Unicode data |
| scripts/generate_unicode.py | Python script to generate Unicode property data from Unicode standard files |
| test/qtil/strings/UnicodeTest.ql | Comprehensive test suite for Unicode functionality |
| test/qtil/strings/UnicodeTest.expected | Test expectations file |

Copilot · 2025-08-24T06:16:58Z

scripts/generate_unicode.py

+        if '..' not in code_point_hex_pair:
+            code_point_start = code_point_end = int(code_point_hex_pair, 16)
+        else:
+        # handle ranges like '00A0..00A7'
+            code_point_hex_start, code_point_hex_end = code_point_hex_pair.split('..')
+            code_point_start, code_point_end = int(code_point_hex_start, 16), int(code_point_hex_end, 16)


The comment and code that handle ranges are duplicated on lines 128-130 and 157-159. Consider extracting this logic into a helper function to reduce duplication.

Suggested change

-        if '..' not in code_point_hex_pair:
-            code_point_start = code_point_end = int(code_point_hex_pair, 16)
-        else:
-        # handle ranges like '00A0..00A7'
-            code_point_hex_start, code_point_hex_end = code_point_hex_pair.split('..')
-            code_point_start, code_point_end = int(code_point_hex_start, 16), int(code_point_hex_end, 16)
+        code_point_start, code_point_end = parse_code_point_range(code_point_hex_pair)
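
The suggestion names `parse_code_point_range` without defining it. One way the helper could look is sketched below, assuming it simply wraps the branch shown in the diff. The second function, `parse_property_line`, is a hypothetical caller added for illustration only; it assumes the usual Unicode data file layout of `range ; property # comment` and is not taken from `scripts/generate_unicode.py`.

```python
from typing import Optional, Tuple


def parse_code_point_range(code_point_hex_pair: str) -> Tuple[int, int]:
    """Parse '00A0' or '00A0..00A7' into an inclusive (start, end) pair."""
    if '..' not in code_point_hex_pair:
        # A single code point is a range of length one.
        code_point = int(code_point_hex_pair, 16)
        return code_point, code_point
    # handle ranges like '00A0..00A7'
    hex_start, hex_end = code_point_hex_pair.split('..')
    return int(hex_start, 16), int(hex_end, 16)


def parse_property_line(line: str) -> Optional[Tuple[int, int, str]]:
    """Hypothetical caller: parse one 'range ; property # comment' line."""
    data = line.split('#', 1)[0].strip()  # drop the trailing comment
    if not data:
        return None  # blank or comment-only line
    fields = [field.strip() for field in data.split(';')]
    start, end = parse_code_point_range(fields[0])
    return start, end, fields[1]
```

With a helper like this, both call sites flagged in the review reduce to the single line in the suggestion, and the range-handling comment lives in one place.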

MichaelRFairhurst force-pushed the michaelrfairhurst/add-unicode branch from c4aff6f to 091cee3 Compare August 23, 2025 16:02

MichaelRFairhurst mentioned this pull request Aug 23, 2025

Implement naming package, new IdentifierIntroduction.qll, unicode funcs. github/codeql-coding-standards#950

Open

30 tasks

MichaelRFairhurst force-pushed the michaelrfairhurst/add-unicode branch from 091cee3 to d02924f Compare August 24, 2025 05:54

Implement general unicode property support, and some specific features.

2e0ffdb

Adds UAX #44 identifier checking, and NFC quick check support, along with a few helpers like `isAscii` and `unescapeUnicode`.

MichaelRFairhurst force-pushed the michaelrfairhurst/add-unicode branch from d02924f to 2e0ffdb Compare August 24, 2025 06:05

MichaelRFairhurst marked this pull request as ready for review August 24, 2025 06:15

MichaelRFairhurst requested review from Copilot and jeongsoolee09 August 24, 2025 06:15

Copilot AI reviewed Aug 24, 2025

Merge branch 'main' into michaelrfairhurst/add-unicode

a6e2ebc
