-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Add IdnMapping Span-based APIs (TryGetAscii/TryGetUnicode) #123593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.Icu.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.cs
Outdated
Show resolved
Hide resolved
…callers Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
|
@copilot Can this method Refers to: src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs:464 in 5e3facf. [](commit_id = 5e3facf, deletion_comment = False) |
Unfortunately StringBuilder doesn't have a simple |
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.cs
Show resolved
Hide resolved
Co-authored-by: jkotas <6668460+jkotas@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds allocation-free-ish span-based IdnMapping conversion APIs (TryGetAscii / TryGetUnicode) and wires them through the ICU and Windows NLS implementations, along with tests and updated reference assemblies.
Changes:
- Introduced
IdnMapping.TryGetAscii(ReadOnlySpan<char>, Span<char>, out int)andTryGetUnicode(ReadOnlySpan<char>, Span<char>, out int)public APIs. - Updated ICU + NLS implementations and interop layers to support span-based conversions without unsafe code.
- Added unit tests for the new APIs and updated
System.Runtimeref to expose them.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.cs | Adds new public span-based APIs and updates invariant helpers / shared logic. |
| src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.Nls.cs | Switches NLS path to span-based interop and adds Try* core implementations. |
| src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.Icu.cs | Switches ICU path to span-based interop and adds Try* core implementations. |
| src/libraries/Common/src/Interop/Windows/Normaliz/Interop.Idna.cs | Adds span-based IdnToAscii/IdnToUnicode P/Invoke overloads. |
| src/libraries/Common/src/Interop/Interop.Idna.cs | Adds span-based ToAscii/ToUnicode P/Invoke overloads for ICU globalization native. |
| src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs | Routes string overload through a new span-based bidi category helper. |
| src/libraries/System.Runtime/ref/System.Runtime.cs | Exposes the new public APIs in the reference assembly. |
| src/libraries/System.Runtime/tests/System.Globalization.Extensions.Tests/IdnMapping/IdnMappingGetAsciiTests.cs | Adds tests for TryGetAscii. |
| src/libraries/System.Runtime/tests/System.Globalization.Extensions.Tests/IdnMapping/IdnMappingGetUnicodeTests.cs | Adds tests for TryGetUnicode. |
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.cs
Show resolved
Hide resolved
...em.Runtime/tests/System.Globalization.Extensions.Tests/IdnMapping/IdnMappingGetAsciiTests.cs
Show resolved
Hide resolved
....Runtime/tests/System.Globalization.Extensions.Tests/IdnMapping/IdnMappingGetUnicodeTests.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CharUnicodeInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.Icu.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.Icu.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.Icu.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.Nls.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/IdnMapping.Nls.cs
Show resolved
Hide resolved
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
tarekgh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest changes LGTM!
|
@copilot where in the repo can these new methods be used? Look for all uses of the existing IdnMapping methods and see whether any would be straightforward to convert in order to avoid allocations. Possibly somewhere in Uri or HttpClient? |
I found these usages of Most promising for conversion (uses
Other usages (straightforward but limited benefit):
The
I searched for all uses of Best candidates:
Lower priority:
|
@copilot yes. Also address your pending feedback |
… allocations Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Done in 5d34496. Optimized
|
|
@copilot, you didn't address the feedback from copilot. Please do so and leave comments on each indicating how you addressed it. |
…3AsciiRules/AllowUnassigned flags Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Addressed all the copilot-pull-request-reviewer feedback in b9f72ff:
|
Description
Adds span-based APIs to
IdnMappingfor zero-allocation IDN encoding/decoding:Behavior: Throws on invalid input (consistent with existing APIs). Returns
falseonly when destination buffer is too small.Changes
TryGetAscii/TryGetUnicodemethods with validation and platform dispatch logic inlined directly. Refactored existingGetAscii/GetUnicodemethods to use spans throughout, removing all unsafe code.IcuTryGetAsciiCore/IcuTryGetUnicodeCorewriting directly to destination buffer using span-based interopNlsTryGetAsciiCore/NlsTryGetUnicodeCorefor Windows using span-based interopGetBidiCategory(ReadOnlySpan<char>, int)overload for invariant punycode encoding; existing string overloads now call span-based implementationsTryGetUnicodeEquivalentto use new span-based APIs, avoiding intermediate string allocations when processing IDN labelsUseStd3AsciiRules/AllowUnassignedflag combinationsUsage
Internal Consumers
The new APIs are used internally in
System.Private.Urito avoid allocations inDomainNameHelper.TryGetUnicodeEquivalent, which processes IDN labels usingValueStringBuilder.Original prompt
This section details on the original issue you should resolve
<issue_title>Proposal: Add IdnMapping Span-based APIs</issue_title>
<issue_description>The current
IdnMappingAPI accepts/returns strings and throws on invalid input. I propose a set of Span-based APIs to avoid allocations.Both
GetandTryGet*methods would throw on invalid input.TryGet*would return false on insufficient space in the destination span.This new API would simplify call sites and remove allocations throughout code dealing with internationalized domain names, like Uri and Markdig.
cc: @tarekgh </issue_description>
<agent_instructions>Implement the approved TryGetXx methods. They must be as efficient as possible and avoid allocation. Reuse/augment existing code paths as much as possible rather than adding new ones. Testing should also leverage existing test code as much as possible.</agent_instructions>
Comments on the Issue (you are @copilot in this section)
@tarekgh In general, the proposal looks reasonable.Why we need the APIs:
I don't think these are useful if we are going to allocate a string anyway. And the other proposed APIs can be used at that time. What do you think about that?
@MihaZupan Those aren't needed in my use-cases, as `Try*` would always be used.I'll remove them as they could easily be exposed later if a use-case presents itself.</comment_new>
<comment_new>@stephentoub
I don't understand these APIs:
What does the
Trymean here? If it's to avoid throwing in the case where the data is somehow invalid, that's a different meaning than the otherTryoverloads would have, which would be based solely on whether the destination is large enough. The Boolean returned from aTryis supposed to convey only one thing, and in such span-based Try methods, it's always used to connote whether the destination was large enough to store the transformed data. I don't think we want two overloads of the same method having a different meaning for theTry.</comment_new><comment_new>@MihaZupan
The idea was for both the
out stringandSpan destinationto return false on invalid input. The span one would also return false on insufficient space.I agree the Span overload would be confusing to use since you couldn't differentiate between invalid input/insufficient space, without ensuring you supply a worst-case sized buffer.
Do we have a pattern of
Try*methods ever throwing? If so, we could have all overloads throw on invalid input, where theTryonly returns false on insufficient space.Alternatively, we would need an
@stephentoub > Do we have a pattern of Try* methods ever throwing?OperationStatus-style return?Yes, Try methods can still throw.
Why not just:
?
If the exception for invalid input really is unexceptional, though, with the exception happening so frequently as to be a performance problem in real situations, then yeah, OperationStatus is what you'd want.
Can you share examples where the exception is a meaningful problem?</comment_new>
<comment_new>@MihaZupan
Yes, that is the shape we'...
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.