Fix default wstring length #862

TonyWelte · 2025-04-27T11:11:48Z

This PR fixes strings containing unicode characters not being escaped properly and their size being inaccurate. For instance on WString it results in the following change

38c38
< static char test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2[] = "Hell\\xc3\\xb6 w\\xc3\\xb6rld!";
---
> static char test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2[] = "Hell\xc3\xb6 w\xc3\xb6rld!";
40c40
< static char test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3[] = "\\xe3\\x83\\x8f\\xe3\\x83\\xad\\xe3\\x83\\xbc\\xe3\\x83\\xaf\\xe3\\x83\\xbc\\xe3\\x83\\xab\\xe3\\x83\\x89";
---
> static char test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3[] = "\xe3\x83\x8f\xe3\x83\xad\xe3\x83\xbc\xe3\x83\xaf\xe3\x83\xbc\xe3\x83\xab\xe3\x83\x89";
74c74
<     {test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2, 12, 12},
---
>     {test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default2, 14, 14},
84c84
<     {test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3, 7, 7},
---
>     {test_msgs__msg__WStrings__DEFAULT_VALUE__wstring_value_default3, 21, 21},
140,141c140,141
<   "wstring wstring_value_default2 \"Hell\\xc3\\xb6 w\\xc3\\xb6rld!\"\n"
<   "wstring wstring_value_default3 \"\\xe3\\x83\\x8f\\xe3\\x83\\xad\\xe3\\x83\\xbc\\xe3\\x83\\xaf\\xe3\\x83\\xbc\\xe3\\x83\\xab\\xe3\\x83\\x89\"\n"
---
>   "wstring wstring_value_default2 \"Hell\xc3\xb6 w\xc3\xb6rld!\"\n"
>   "wstring wstring_value_default3 \"\xe3\x83\x8f\xe3\x83\xad\xe3\x83\xbc\xe3\x83\xaf\xe3\x83\xbc\xe3\x83\xab\xe3\x83\x89\"\n"
161c161
<     {toplevel_type_raw_source, 399, 399},
---
>     {toplevel_type_raw_source, 415, 415},

It also fixes " and ' being over escaped (and some other escaped characters). For instance on https://github.com/Tonywelte/test_interface_files/blob/rolling/msg/Strings.msg (see ros2/test_interface_files#25)

46c46
< static char test_msgs__msg__Strings__DEFAULT_VALUE__string_value_default6[] = "Hello\\\\world!";
---
> static char test_msgs__msg__Strings__DEFAULT_VALUE__string_value_default6[] = "Hello\\world!";
48c48
< static char test_msgs__msg__Strings__DEFAULT_VALUE__string_value_default7[] = "Hello\\tworld!";
---
> static char test_msgs__msg__Strings__DEFAULT_VALUE__string_value_default7[] = "Hello  world!";
61c61
< static char test_msgs__msg__Strings__DEFAULT_VALUE__bounded_string_value_default6[] = "Hello\\\\world!";
---
> static char test_msgs__msg__Strings__DEFAULT_VALUE__bounded_string_value_default6[] = "Hello\\world!";
63c63
< static char test_msgs__msg__Strings__DEFAULT_VALUE__bounded_string_value_default7[] = "Hello\\tworld!";
---
> static char test_msgs__msg__Strings__DEFAULT_VALUE__bounded_string_value_default7[] = "Hello  world!";
250,255c250,255
<   "string string_value_default2 \"Hello\\'world!\"\n"
<   "string string_value_default3 \\'Hello\"world!\\'\n"
<   "string string_value_default4 'Hello\\\\'world!'\n"
<   "string string_value_default5 \"Hello\\\\\"world!\"\n"
<   "string string_value_default6 \"Hello\\\\\\\\world!\"\n"
<   "string string_value_default7 \"Hello\\\\tworld!\"\n"
---
>   "string string_value_default2 \"Hello'world!\"\n"
>   "string string_value_default3 'Hello\"world!'\n"
>   "string string_value_default4 'Hello\\'world!'\n"
>   "string string_value_default5 \"Hello\\\"world!\"\n"
>   "string string_value_default6 \"Hello\\\\world!\"\n"
>   "string string_value_default7 \"Hello\\tworld!\"\n"
260,265c260,265
<   "string<=22 bounded_string_value_default2 \"Hello\\'world!\"\n"
<   "string<=22 bounded_string_value_default3 \\'Hello\"world!\\'\n"
<   "string<=22 bounded_string_value_default4 'Hello\\\\'world!'\n"
<   "string<=22 bounded_string_value_default5 \"Hello\\\\\"world!\"\n"
<   "string<=22 bounded_string_value_default6 \"Hello\\\\\\\\world!\"\n"
<   "string<=22 bounded_string_value_default7 \"Hello\\\\tworld!\"\n"
---
>   "string<=22 bounded_string_value_default2 \"Hello'world!\"\n"
>   "string<=22 bounded_string_value_default3 'Hello\"world!'\n"
>   "string<=22 bounded_string_value_default4 'Hello\\'world!'\n"
>   "string<=22 bounded_string_value_default5 \"Hello\\\"world!\"\n"
>   "string<=22 bounded_string_value_default6 \"Hello\\\\world!\"\n"
>   "string<=22 bounded_string_value_default7 \"Hello\\tworld!\"\n"

Additional notes

While working on this fix I noticed that rosidl_runtime_c__String states that the capacity should include the null byte but the size shouldn't. None of the string defined by rosidl generator do this. I haven't fixed it in this PR since I'm not sure what needs to be changed. It should be as simple as adding +1 in static_seq but for toplevel_type_raw_source it should be -1. I'm not sure what's the difference.
A think it should be clarified what escaped characters are supported in default strings. \' and \" are mentioned in https://design.ros2.org/articles/legacy_interface_definition.html but other (\\ and \t) worked in rosidl_generator_cpp but rosidl_generator_c didn't escape them properly. \n doesn't work at all.

Fixes #855

Signed-off-by: Anthony Welte <[email protected]>

rosidl_generator_c/rosidl_generator_c/__init__.py

fujitatomoya

this looks good to me, just a minor nitpick.

i would like to have an another review for this PR before merge.

fujitatomoya · 2025-05-06T15:19:05Z

Pulls: #862
Gist: https://gist.githubusercontent.com/fujitatomoya/88476ec104d8b6970d44034c3f4a6fac/raw/354a5a9e3a0ddbe768c322cf3f1ae478f06cbdf5/ros2.repos
BUILD args: --packages-above-and-dependencies rosidl_generator_c
TEST args: --packages-above rosidl_generator_c
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/15920

Linux
Linux-aarch64
Linux-rhel
Windows

fujitatomoya · 2025-05-20T04:18:09Z

Pulls: #862
Gist: https://gist.githubusercontent.com/fujitatomoya/9157bd03771275331596df97b01c4e91/raw/354a5a9e3a0ddbe768c322cf3f1ae478f06cbdf5/ros2.repos
BUILD args: --packages-above-and-dependencies rosidl_generator_c
TEST args: --packages-above rosidl_generator_c
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/16050

Linux
Linux-aarch64
Linux-rhel
Windows

Signed-off-by: Anthony Welte <[email protected]>

TonyWelte · 2025-05-21T19:40:30Z

Looks like the RHEL build uses an older Python version (before 3.12) which doesn't like strings within fstrings. I've pushed the following modification:

   """Statically define a runtime Sequence or String type."""
   if values:
     if isinstance(values, str):
-      return f'{{{varname}, {len(values.encode('utf-8'))}, {len(values.encode('utf-8'))}}}'
+      utf8_values = values.encode('utf-8')
+      return f'{{{varname}, {len(utf8_values)}, {len(utf8_values)}}}'
     else:
       return f'{{{varname}, {len(values)}, {len(values)}}}'
   return '{NULL, 0, 0}'

fujitatomoya · 2025-05-21T21:03:17Z

Pulls: #862
Gist: https://gist.githubusercontent.com/fujitatomoya/56af7eafdd1a192e3fc92c458054b5f5/raw/354a5a9e3a0ddbe768c322cf3f1ae478f06cbdf5/ros2.repos
BUILD args: --packages-above-and-dependencies rosidl_generator_c
TEST args: --packages-above rosidl_generator_c
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/16062

Linux
Linux-aarch64
Linux-rhel
Windows

TonyWelte · 2025-09-10T09:28:27Z

After using this fix with our interfaces we noticed another bug that could cause issues once this is merged.

If a Unicode character is directly followed by 0-9 or a-f, we get a warning warning: hex escape sequence out of range. It compiles but our tests fail because of an address sanitizer error global-buffer-overflow.

The solution would be to split the string after a Unicode sequence by adding "" (maybe only if the next character is 0-9 or a-f ?). Turning

char test[] = "Temperature \xc2\xb0C";

into

char test[] = "Temperature \xc2\xb0""C";

see https://godbolt.org/z/5rPKh4qa9

I can attempt a fix in this MR or make a new one. What do you think @fujitatomoya ?

fujitatomoya · 2025-09-11T02:49:15Z

@TonyWelte to be honest, i am not sure about that.

@sloretz @cottsay @ahcorde do you have opinion about this?

TonyWelte force-pushed the bugfix/unicode-string-length branch from 2360fe6 to 6eba9f8 Compare April 27, 2025 11:12

TonyWelte force-pushed the bugfix/unicode-string-length branch from 6eba9f8 to 8c50d41 Compare May 4, 2025 14:10

TonyWelte marked this pull request as ready for review May 4, 2025 14:12

TonyWelte added a commit to TonyWelte/rosidlcpp that referenced this pull request May 4, 2025

Fix escaped strings to match changes from ros2/rosidl#862

4fb71f6

Signed-off-by: Anthony Welte <[email protected]>

fujitatomoya reviewed May 6, 2025

View reviewed changes

rosidl_generator_c/rosidl_generator_c/__init__.py Outdated Show resolved Hide resolved

fujitatomoya approved these changes May 6, 2025

View reviewed changes

TonyWelte force-pushed the bugfix/unicode-string-length branch from 8c50d41 to 0c29c98 Compare May 6, 2025 18:33

alsora assigned fujitatomoya May 16, 2025

fujitatomoya approved these changes May 20, 2025

View reviewed changes

Fix default wstring length

52876f4

Signed-off-by: Anthony Welte <[email protected]>

TonyWelte force-pushed the bugfix/unicode-string-length branch from 0c29c98 to 52876f4 Compare May 21, 2025 19:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix default wstring length #862

Fix default wstring length #862

Uh oh!

TonyWelte commented Apr 27, 2025 •

edited

Loading

Uh oh!

Uh oh!

fujitatomoya left a comment

Uh oh!

fujitatomoya commented May 6, 2025

Uh oh!

fujitatomoya commented May 20, 2025

Uh oh!

TonyWelte commented May 21, 2025 •

edited

Loading

Uh oh!

fujitatomoya commented May 21, 2025

Uh oh!

TonyWelte commented Sep 10, 2025

Uh oh!

fujitatomoya commented Sep 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix default wstring length #862

Are you sure you want to change the base?

Fix default wstring length #862

Uh oh!

Conversation

TonyWelte commented Apr 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Additional notes

Uh oh!

Uh oh!

fujitatomoya left a comment

Choose a reason for hiding this comment

Uh oh!

fujitatomoya commented May 6, 2025

Uh oh!

fujitatomoya commented May 20, 2025

Uh oh!

TonyWelte commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fujitatomoya commented May 21, 2025

Uh oh!

TonyWelte commented Sep 10, 2025

Uh oh!

fujitatomoya commented Sep 11, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

TonyWelte commented Apr 27, 2025 •

edited

Loading

TonyWelte commented May 21, 2025 •

edited

Loading