Contributor

Copilot AI commented Jan 25, 2026

Description

Adds span-based APIs to IdnMapping for zero-allocation IDN encoding/decoding:

namespace System.Globalization
{
    public sealed class IdnMapping
    {
        public bool TryGetAscii(ReadOnlySpan<char> unicode, Span<char> destination, out int charsWritten);
        public bool TryGetUnicode(ReadOnlySpan<char> ascii, Span<char> destination, out int charsWritten);
    }
}

Behavior: Throws on invalid input (consistent with the existing APIs). Returns false only when the destination buffer is too small.

Changes

  • IdnMapping.cs: Added public TryGetAscii/TryGetUnicode methods with validation and platform dispatch logic inlined directly. Refactored existing GetAscii/GetUnicode methods to use spans throughout, removing all unsafe code.
  • IdnMapping.Icu.cs: Added IcuTryGetAsciiCore/IcuTryGetUnicodeCore, which write directly to the destination buffer using span-based interop.
  • IdnMapping.Nls.cs: Added NlsTryGetAsciiCore/NlsTryGetUnicodeCore for Windows using span-based interop.
  • Interop.Idna.cs: Updated to use span-based method signatures and removed unused pointer-based overloads.
  • CharUnicodeInfo.cs: Added GetBidiCategory(ReadOnlySpan<char>, int) overload for invariant punycode encoding; existing string overloads now call span-based implementations.
  • System.Runtime.cs: Exposed new APIs in reference assembly.
  • DomainNameHelper.cs: Updated TryGetUnicodeEquivalent to use new span-based APIs, avoiding intermediate string allocations when processing IDN labels.
  • Tests: Added tests for both methods covering success, buffer-too-small, invalid input cases, and UseStd3AsciiRules/AllowUnassigned flag combinations.

Usage

var idn = new IdnMapping();
Span<char> buffer = stackalloc char[256];

if (idn.TryGetAscii("例え.jp", buffer, out int written))
{
    ReadOnlySpan<char> ascii = buffer.Slice(0, written); // "xn--r8jz45g.jp"
}
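
Because false is reserved for "destination too small" and invalid input still throws, a caller that cannot size the buffer up front can retry with a larger one. The following is a minimal sketch of that pattern, assuming a runtime that already contains the APIs from this PR. `IdnRetrySketch`, `ToAscii`, the 64-char stack buffer, and the 4096-char cap are illustrative choices and not part of the change.

```csharp
using System;
using System.Buffers;
using System.Globalization;

static class IdnRetrySketch
{
    // Hypothetical helper (not in the PR): encodes `unicode`, growing the buffer on demand.
    // Invalid input is not handled here; TryGetAscii is expected to throw for it.
    public static string ToAscii(IdnMapping idn, ReadOnlySpan<char> unicode)
    {
        // Fast path: a small stack buffer covers typical host names.
        Span<char> stackBuffer = stackalloc char[64];
        if (idn.TryGetAscii(unicode, stackBuffer, out int written))
        {
            return stackBuffer.Slice(0, written).ToString();
        }

        // Slow path: false means "too small", so retry with pooled buffers of doubling size.
        // The upper bound is an assumption made only to keep the loop finite.
        for (int size = 128; size <= 4096; size *= 2)
        {
            char[] rented = ArrayPool<char>.Shared.Rent(size);
            try
            {
                if (idn.TryGetAscii(unicode, rented, out written))
                {
                    return new string(rented, 0, written);
                }
            }
            finally
            {
                ArrayPool<char>.Shared.Return(rented);
            }
        }

        throw new InvalidOperationException("Encoded name did not fit in the largest buffer tried.");
    }
}
```

The sketch returns a string only to stay short; a caller that wants to stay allocation-free would consume the written slice directly instead.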

Internal Consumers

The new APIs are used internally in System.Private.Uri to avoid allocations in DomainNameHelper.TryGetUnicodeEquivalent, which processes IDN labels using ValueStringBuilder.

Original prompt

This section details the original issue you should resolve

<issue_title>Proposal: Add IdnMapping Span-based APIs</issue_title>
<issue_description>The current IdnMapping API accepts/returns strings and throws on invalid input. I propose a set of Span-based APIs to avoid allocations.

namespace System.Globalization
{
    public sealed class IdnMapping
    {
        // Existing API
        public string GetAscii(string unicode);
        public string GetAscii(string unicode, int index);
        public string GetAscii(string unicode, int index, int count);

        public string GetUnicode(string ascii);
        public string GetUnicode(string ascii, int index);
        public string GetUnicode(string ascii, int index, int count);

        // Proposed API
        public string GetAscii(ReadOnlySpan<char> unicode);
        public bool TryGetAscii(ReadOnlySpan<char> unicode, Span<char> destination, out int charsWritten);

        public string GetUnicode(ReadOnlySpan<char> ascii);
        public bool TryGetUnicode(ReadOnlySpan<char> ascii, Span<char> destination, out int charsWritten);
    }
}

Both Get* and TryGet* methods would throw on invalid input.
TryGet* would return false on insufficient space in the destination span.

This new API would simplify call sites and remove allocations throughout code dealing with internationalized domain names, like Uri and Markdig.

cc: @tarekgh </issue_description>

<agent_instructions>Implement the approved TryGetXx methods. They must be as efficient as possible and avoid allocation. Reuse/augment existing code paths as much as possible rather than adding new ones. Testing should also leverage existing test code as much as possible.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

@tarekgh In general, the proposal looks reasonable.

Why do we need these APIs:

    public string GetAscii(ReadOnlySpan<char> unicode);
    public string GetUnicode(ReadOnlySpan<char> ascii);

I don't think these are useful if we are going to allocate a string anyway. And the other proposed APIs can be used at that time. What do you think about that?

@MihaZupan Those aren't needed in my use-cases, as `Try*` would always be used.

I'll remove them as they could easily be exposed later if a use-case presents itself.</comment_new>
<comment_new>@stephentoub
I don't understand these APIs:

public bool TryGetAscii(ReadOnlySpan<char> unicode, out string ascii);
public bool TryGetUnicode(ReadOnlySpan<char> ascii, out string unicode);

What does the Try mean here? If it's to avoid throwing in the case where the data is somehow invalid, that's a different meaning than the other Try overloads would have, which would be based solely on whether the destination is large enough. The Boolean returned from a Try is supposed to convey only one thing, and in such span-based Try methods, it's always used to connote whether the destination was large enough to store the transformed data. I don't think we want two overloads of the same method having a different meaning for the Try.</comment_new>
<comment_new>@MihaZupan
The idea was for both the out string and Span destination to return false on invalid input. The span one would also return false on insufficient space.

I agree the Span overload would be confusing to use since you couldn't differentiate between invalid input/insufficient space, without ensuring you supply a worst-case sized buffer.

Do we have a pattern of Try* methods ever throwing? If so, we could have all overloads throw on invalid input, where the Try only returns false on insufficient space.

// Existing
public string GetAscii(string unicode);

// New
public string GetAscii(ReadOnlySpan<char> unicode);
bool TryGetAscii(ReadOnlySpan<char> unicode, Span<char> destination, out int charsWritten);

Alternatively, we would need an OperationStatus-style return?

@stephentoub > Do we have a pattern of Try* methods ever throwing?

Yes, Try methods can still throw.

> Alternatively, we would need an OperationStatus-style return?

Why not just:

string GetAscii(ReadOnlySpan<char> unicode);
bool TryGetAscii(ReadOnlySpan<char> unicode, Span<char> destination, out int charsWritten);

?

If the exception for invalid input really is unexceptional, though, with the exception happening so frequently as to be a performance problem in real situations, then yeah, OperationStatus is what you'd want.

Can you share examples where the exception is a meaningful problem?</comment_new>
<comment_new>@MihaZupan

string GetAscii(ReadOnlySpan<char> unicode);

Yes, that is the shape we'...


Copilot AI and others added 2 commits January 25, 2026 14:18
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add Span-based APIs to IdnMapping" to "Add IdnMapping Span-based APIs (TryGetAscii/TryGetUnicode)" Jan 25, 2026
Copilot AI requested a review from stephentoub January 25, 2026 14:26
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
Copilot AI requested a review from jkotas January 25, 2026 19:45
…callers

Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
Member

tarekgh commented Jan 25, 2026

@copilot Can this method private static int GetCodePointFromString(string s, int index) now just call the newly introduced method that uses Span?


Refers to: src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs:464 in 5e3facf.

Contributor Author

Copilot AI commented Jan 25, 2026

@copilot Can this method private static int GetCodePointFromString(string s, int index) now just call the newly introduced method that uses Span?
...

Unfortunately StringBuilder doesn't have a simple AsSpan() method to convert to ReadOnlySpan<char> for the whole content. The GetChunks() API exists but would add complexity for extracting just 1-2 characters. Both methods compute a code point from the character at index (considering surrogates), then call the shared GetBidiCategoryNoBoundsChecks. The code duplication is minimal (the surrogate handling logic) and avoids adding allocation or complexity to call the span version.
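
For the string case, the forwarding asked about above could look roughly like the sketch below. It is a hedged illustration only: `CodePointSketch` and `GetCodePoint` are hypothetical names, and the actual CoreLib helper may treat unpaired surrogates differently.

```csharp
using System;

static class CodePointSketch
{
    // Hypothetical span-based helper: returns the code point starting at `index`,
    // combining a high/low surrogate pair when one is present.
    public static int GetCodePoint(ReadOnlySpan<char> s, int index)
    {
        char c = s[index];
        if (char.IsHighSurrogate(c) && index + 1 < s.Length && char.IsLowSurrogate(s[index + 1]))
        {
            return char.ConvertToUtf32(c, s[index + 1]);
        }

        // BMP character or unpaired surrogate: return the UTF-16 code unit as-is.
        return c;
    }

    // The string overload forwards to the span version, so the surrogate logic exists once.
    public static int GetCodePoint(string s, int index) => GetCodePoint(s.AsSpan(), index);
}
```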

Copilot AI requested a review from tarekgh January 25, 2026 22:15
stephentoub marked this pull request as ready for review January 26, 2026 00:19
Copilot AI review requested due to automatic review settings January 26, 2026 00:19
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
Copilot AI requested a review from jkotas January 26, 2026 00:22
Contributor

Copilot AI left a comment

Pull request overview

Adds allocation-free-ish span-based IdnMapping conversion APIs (TryGetAscii / TryGetUnicode) and wires them through the ICU and Windows NLS implementations, along with tests and updated reference assemblies.

Changes:

  • Introduced IdnMapping.TryGetAscii(ReadOnlySpan<char>, Span<char>, out int) and TryGetUnicode(ReadOnlySpan<char>, Span<char>, out int) public APIs.
  • Updated ICU + NLS implementations and interop layers to support span-based conversions without unsafe code.
  • Added unit tests for the new APIs and updated System.Runtime ref to expose them.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.cs | Adds new public span-based APIs and updates invariant helpers / shared logic. |
| src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.Nls.cs | Switches NLS path to span-based interop and adds Try* core implementations. |
| src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.Icu.cs | Switches ICU path to span-based interop and adds Try* core implementations. |
| src/libraries/Common/src/Interop/Windows/Normaliz/Interop.Idna.cs | Adds span-based IdnToAscii/IdnToUnicode P/Invoke overloads. |
| src/libraries/Common/src/Interop/Interop.Idna.cs | Adds span-based ToAscii/ToUnicode P/Invoke overloads for ICU globalization native. |
| src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs | Routes string overload through a new span-based bidi category helper. |
| src/libraries/System.Runtime/ref/System.Runtime.cs | Exposes the new public APIs in the reference assembly. |
| src/libraries/System.Runtime/tests/System.Globalization.Extensions.Tests/IdnMapping/IdnMappingGetAsciiTests.cs | Adds tests for TryGetAscii. |
| src/libraries/System.Runtime/tests/System.Globalization.Extensions.Tests/IdnMapping/IdnMappingGetUnicodeTests.cs | Adds tests for TryGetUnicode. |

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Member

tarekgh left a comment

The latest changes LGTM!

@stephentoub
Member

@copilot where in the repo can these new methods be used? Look for all uses of the existing IdnMapping methods and see whether any would be straightforward to convert in order to avoid allocations. Possibly somewhere in Uri or HttpClient?

Contributor Author

Copilot AI commented Jan 26, 2026

@copilot where in the repo can these new methods be used? Look for all uses of the existing IdnMapping methods and see whether any would be straightforward to convert in order to avoid allocations. Po...

I found these usages of IdnMapping.GetAscii/GetUnicode:

Most promising for conversion (uses ValueStringBuilder):

  • System.Private.Uri/DomainNameHelper.cs:TryGetUnicodeEquivalent - This method already uses ValueStringBuilder and could benefit from TryGetAscii/TryGetUnicode to avoid intermediate string allocations when processing labels (lines 246-264).

Other usages (straightforward but limited benefit):

  • DomainNameHelper.cs:UnicodeEquivalent (line 205) - returns string, moderate benefit
  • TargetHostNameHelper.cs:NormalizeHostName (line 30) - returns string, limited benefit
  • CertificateValidation.Unix.cs (line 50) - needs string for interop
  • CertificateValidation.OSX.cs (line 42) - needs string for interop
  • MailAddress.cs and SmtpClient.cs - returns string, limited benefit
  • SubjectAlternativeNameBuilder.cs - needs string for ASN.1 struct
  • TlsFrameHelper.cs:GetServerName (line 779) - uses GetUnicode, returns string

The TryGetUnicodeEquivalent method in DomainNameHelper.cs is the best candidate since it uses ValueStringBuilder and processes labels in a loop - it could use TryGetAscii and TryGetUnicode directly into the buffer to avoid string allocations.

@copilot where in the repo can these new methods be used? Look for all uses of the existing IdnMapping methods and see whether any would be straightforward to convert in order to avoid allocations. Po...

I searched for all uses of IdnMapping.GetAscii/GetUnicode in the repo. Here are the main candidates for conversion:

Best candidates:

  1. System.Private.Uri/DomainNameHelper.cs - TryGetUnicodeEquivalent (lines 246, 248, 264) - This already uses ValueStringBuilder and processes labels in a loop. The span-based TryGetAscii/TryGetUnicode could avoid intermediate string allocations. Most promising.

  2. System.Net.Security/TargetHostNameHelper.cs (line 30) - GetAscii is called but the result is returned as a string anyway, so no benefit unless the caller chain is also converted.

Lower priority:

  3. System.Net.Mail/SmtpClient.cs and MailAddress.cs - Both assign GetAscii result to string variables, minimal benefit.

  4. CertificateValidation.*.cs - Results stored in string matchName for comparison, would need broader refactoring.

  5. TlsFrameHelper.cs (line 779) - GetUnicode is called but returns string, needs caller chain changes.

DomainNameHelper.TryGetUnicodeEquivalent is the most straightforward opportunity - it builds results using ValueStringBuilder and could benefit from writing directly to spans. Should I implement that optimization?

@stephentoub
Member

Should I implement that optimization?

@copilot yes. Also address your pending feedback

… allocations

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Contributor Author

Copilot AI commented Jan 26, 2026

Should I implement that optimization?

@copilot yes. Also address your pending feedback

Done in 5d34496. Optimized DomainNameHelper.TryGetUnicodeEquivalent to use the new span-based APIs:

  • For non-ASCII labels: uses TryGetAscii to write into a stackalloc'd buffer, then TryGetUnicode via the helper
  • For ACE labels (xn--): uses TryGetUnicode directly to the ValueStringBuilder
  • Added TryAppendIdnUnicode helper that writes directly to the ValueStringBuilder, falling back to string-based API only for edge cases exceeding the initial 64-char buffer
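
The "try a fixed buffer, fall back to the string API" shape described in the last bullet can be sketched roughly as follows. This is an approximation, not the code from 5d34496: it assumes a runtime that includes TryGetUnicode from this PR, uses StringBuilder as a stand-in for the internal ValueStringBuilder, and `AppendIdnSketch` / `AppendIdnUnicode` are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Text;

static class AppendIdnSketch
{
    // Hypothetical approximation of the helper: decodes one ACE label and appends
    // the Unicode form to `destination`. Invalid input still throws, as with GetUnicode.
    public static void AppendIdnUnicode(IdnMapping idn, ReadOnlySpan<char> asciiLabel, StringBuilder destination)
    {
        // 64 chars matches the initial buffer size mentioned above.
        Span<char> buffer = stackalloc char[64];

        if (idn.TryGetUnicode(asciiLabel, buffer, out int written))
        {
            // Common case: no intermediate string is created.
            ReadOnlySpan<char> decoded = buffer.Slice(0, written);
            destination.Append(decoded);
        }
        else
        {
            // Edge case: the result did not fit, so pay for the string-based API once.
            destination.Append(idn.GetUnicode(asciiLabel.ToString()));
        }
    }
}
```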

@stephentoub
Member

@copilot, you didn't address the feedback from copilot. Please do so and leave comments on each indicating how you addressed it.

…3AsciiRules/AllowUnassigned flags

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Contributor Author

Copilot AI commented Jan 26, 2026

@copilot, you didn't address the feedback from copilot. Please do so and leave comments on each indicating how you addressed it.

Addressed all the copilot-pull-request-reviewer feedback in b9f72ff:

  1. Duplicate substring allocation (IdnMapping.cs:644-648) - Already fixed in 1488db9, the code stores asciiSlice and reuses it for both PunycodeDecode and the round-trip comparison.

  2. XML summary for TryGetAscii (IdnMapping.cs:97-103) - Already fixed in 1488db9, the doc now correctly says "Encodes a Unicode domain name to its ASCII (Punycode) equivalent."

  3. Wrong parameter name in GetUnicodeInvariant (IdnMapping.cs:647-651) - Already fixed in 1488db9, the parameter is named ascii and the exception uses nameof(ascii).

  4. Test coverage with UseStd3AsciiRules/AllowUnassigned flags - Added in b9f72ff:

    • TryGetAscii_WithFlags - tests with both flags enabled
    • TryGetAscii_Invalid - tests invalid inputs with both UseStd3AsciiRules = false and UseStd3AsciiRules = true
    • TryGetUnicode_WithFlags - tests with both flags enabled
    • TryGetUnicode_Invalid - tests invalid inputs with both flag combinations

Development

Successfully merging this pull request may close these issues.

Proposal: Add IdnMapping Span-based APIs

5 participants