Skip to content

Conversation

@MihaZupan
Copy link
Member

@MihaZupan MihaZupan commented Nov 28, 2025

Review without whitespace changes

When we rebuild the _string for inputs with non-ASCII chars, we do so in two places:

  • PrivateParseMinimal recreates the string up to the path (scheme, user info, host, port)
  • ParseRemaining recreates the path, query and fragment

ParseRemaining is currently written along the lines of

int idx;
int origIdx;
int length;

if (HasUnicode)
{
    _string += Normalize(_originalString, origIdx, ...);
    GetLengthWithoutTrailingSpaces(_string, ref length, idx);
}

CheckCanonical(_string, idx, ...);

if (HasUnicode)
{
    _string += Normalize(_originalString, origIdx, ...);
    GetLengthWithoutTrailingSpaces(_string, ref length, idx);
}

CheckCanonical(_string, idx, ...);

if (HasUnicode)
{
    _string += Normalize(_originalString, origIdx, ...);
    GetLengthWithoutTrailingSpaces(_string, ref length, idx);
}

CheckCanonical(_string, idx, ...);

This makes it much harder to reason about because we're juggling two offsets into two different strings, while also changing one of them.

This PR splits the rebuilding logic into a separate helper we call first that normalizes the path/query/fragment.

if (HasUnicode)
{
    Rebuild();
}

CheckCanonical(_string, idx, ...);
CheckCanonical(_string, idx, ...);
CheckCanonical(_string, idx, ...);

void Rebuild()
{
    var builder = ...;

    Normalize(ref builder, span, ...);
    Normalize(ref builder, span, ...);
    Normalize(ref builder, span, ...);

    _string = builder.ToString();
}

Since all the transformations now happen in a single place, we can also easily avoid the ~5 temporary string allocations.
We're still allocating the host separately, but that can be improved later.

Benchmark
[MemoryDiagnoser(false)]
public class UriInit
{
    private string _uriString;

    [Params("Simple", "LongPath", "ShortSections")]
    public string Input;

    [GlobalSetup]
    public void Setup() => _uriString = Input switch
    {
        "Simple" => "https://mihubot.xyz/api/RuntimeUtils/CoreRoot/All?arch=x64&os=linux&type=release#hello",
        "LongPath" => $"https://contoso.com/{new string('a', 400)}?b#c",
        "ShortSections" => $"https://a@host/b?c#d"
    } + "🍉";

    [Benchmark]
    public Uri Ctor() => new Uri(_uriString);
}
Method Toolchain Input Mean Error Ratio Allocated Alloc Ratio
Ctor main LongPath 666.8 ns 15.24 ns 1.00 3752 B 1.00
Ctor pr LongPath 552.8 ns 3.87 ns 0.83 1128 B 0.30
Ctor main ShortSections 158.4 ns 1.52 ns 1.00 608 B 1.00
Ctor pr ShortSections 122.4 ns 0.69 ns 0.77 360 B 0.59
Ctor main Simple 225.1 ns 1.82 ns 1.00 968 B 1.00
Ctor pr Simple 195.3 ns 3.08 ns 0.87 448 B 0.46

@MihaZupan MihaZupan added this to the 11.0.0 milestone Nov 28, 2025
@MihaZupan MihaZupan self-assigned this Nov 28, 2025
Copilot AI review requested due to automatic review settings November 28, 2025 13:37
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Copilot finished reviewing on behalf of MihaZupan November 28, 2025 13:40
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the URI normalization logic to improve code maintainability and reduce memory allocations for URIs containing non-ASCII characters. The key change splits the string rebuilding logic out of ParseRemaining into a dedicated helper method ParseRemaining_RecreateNormalizedString, eliminating approximately 5 temporary string allocations per URI construction.

Key changes:

  • Refactored ParseRemaining to separate Unicode normalization logic into ParseRemaining_RecreateNormalizedString
  • Changed IriHelper.EscapeUnescapeIri from returning a string to accepting a ref ValueStringBuilder parameter, enabling allocation reuse
  • Simplified UriSyntax helper methods by inlining IsFullMatch

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/libraries/System.Private.Uri/tests/UnitTests/IriEscapeUnescapeTest.cs Updated test helper to use new EscapeUnescapeIri API with ValueStringBuilder
src/libraries/System.Private.Uri/src/System/UriSyntax.cs Simplified flag checking methods by inlining logic
src/libraries/System.Private.Uri/src/System/UriExt.cs Removed obsolete EscapeUnescapeIri wrapper method
src/libraries/System.Private.Uri/src/System/Uri.cs Split normalization logic into separate helper; updated to use new IriHelper API; minor BOM addition
src/libraries/System.Private.Uri/src/System/IriHelper.cs Changed API from returning string to void with ref ValueStringBuilder parameter; removed unnecessary parentheses

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants