
Reduce backtracking for greedy loops followed by subsumed literals #125636

Merged
stephentoub merged 4 commits into dotnet:main from stephentoub:lessbacktracking
Mar 18, 2026


stephentoub (Member) commented Mar 16, 2026

When a greedy character loop (like \w+, \d+, [a-z]+) is followed by a literal that's part of the loop's character class, backtracking normally requires repeated LastIndexOf calls to find viable positions. However, if whatever comes after that literal is disjoint from the loop's class, then only the very last position consumed by the loop can possibly succeed: at every earlier position, the character following the literal is one the loop consumed, and therefore a member of the loop's class, where the disjoint construct needs something else.

For example, in \b\w+n\b, the \w+ loop is followed by n (which is in \w), and n is followed by \b. Since the loop only matches word characters, any position in the middle of the loop's consumed range would have a word character after the n, and the \b boundary wouldn't be satisfied there. Only the very last consumed position can work, so backtracking can skip directly to it rather than searching backward one position at a time.
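The reasoning above can be checked by brute force. The following is a minimal sketch, not code from the PR: `IsWordChar`, `IsBoundary`, and `ViablePositions` are hypothetical helpers, and `\w` is approximated as ASCII letters, digits, and underscore. It enumerates every position inside the range consumed by `\w+` where `n\b` could match and shows that only the last consumed position qualifies.

```csharp
using System;
using System.Collections.Generic;

// "cannon " : \w+ greedily consumes indices [0, 6), i.e. "cannon".
string sample = "cannon ";
int loopStart = 0, loopEnd = 6;

List<int> viable = ViablePositions(sample, loopStart, loopEnd);
Console.WriteLine(string.Join(",", viable)); // prints "5"

// Hypothetical ASCII-only approximation of \w (the real class is Unicode-aware).
static bool IsWordChar(char c) => char.IsAsciiLetterOrDigit(c) || c == '_';

// Hypothetical \b check: true when exactly one side of index i is a word character.
static bool IsBoundary(string text, int i)
{
    bool before = i > 0 && IsWordChar(text[i - 1]);
    bool after = i < text.Length && IsWordChar(text[i]);
    return before != after;
}

// Every position p the loop could give back such that 'n' matches at p and \b holds at p + 1.
// The loop must keep at least one character, so p ranges over (start, end).
static List<int> ViablePositions(string text, int start, int end)
{
    var result = new List<int>();
    for (int p = end - 1; p > start; p--)
    {
        if (text[p] == 'n' && IsBoundary(text, p + 1))
        {
            result.Add(p);
        }
    }
    return result;
}
```

The earlier `n` characters at indices 2 and 3 are rejected because a word character follows them, so a backward search through them would be wasted work.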

- Add Multi literal support to CanReduceLoopBacktrackingToSinglePosition,
  treating a multi-char literal as two singles: first char must be in the
  loop's set, second must not, enabling single-position backtracking.
- Extract FindNextNodeInSequence from CanBeMadeAtomic for reuse.
- Replace IsKnownWordClassSubset with IsSubsetOf(set, WordClass) and
  delete the now-redundant method.
- Add test coverage for both One and Multi literal cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
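The Multi rule in the commit message ("first char must be in the loop's set, second must not") could be sketched as follows. This is an illustration only: `CanUseSinglePosition` is a hypothetical stand-in for the real analysis in CanReduceLoopBacktrackingToSinglePosition, and the loop's class is modeled as a plain predicate rather than a RegexCharClass string.

```csharp
using System;

// ASCII-only approximation of \d; the real class also covers other Unicode digits.
Func<char, bool> digit = c => c is >= '0' and <= '9';

bool hexPrefix = CanUseSinglePosition(digit, "0x");    // '0' in set, 'x' not
bool longer = CanUseSinglePosition(digit, "0abc");     // only the first two chars matter
bool bothInSet = CanUseSinglePosition(digit, "00");    // second char is also in the set
bool firstOutside = CanUseSinglePosition(digit, "x0"); // first char is not subsumed

Console.WriteLine($"{hexPrefix} {longer} {bothInSet} {firstOutside}");
// prints "True True False False"

// Hypothetical helper: treats the first two characters of a multi-char literal
// as a subsumed single followed by a disjoint single.
static bool CanUseSinglePosition(Func<char, bool> inLoopClass, string literal) =>
    literal.Length >= 2 && inLoopClass(literal[0]) && !inLoopClass(literal[1]);
```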
Copilot AI review requested due to automatic review settings March 16, 2026 21:24
stephentoub changed the title from "Improve regex loop backtracking optimizations" to "Reduce backtracking for greedy loops followed by subsumed literals" Mar 16, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR enhances System.Text.RegularExpressions loop backtracking optimizations, expanding the single-position backtrack short-circuit to cover additional literal forms and refactoring supporting analysis helpers.

Changes:

  • Extend loop backtracking reduction to support Multi literals (in addition to One/Set) and update compiler/generator emitters accordingly.
  • Extract tree-walk logic into FindNextNodeInSequence for reuse.
  • Replace IsKnownWordClassSubset with a more general RegexCharClass.IsSubsetOf helper and remove the redundant method.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs` | Adds functional test cases covering scenarios intended to exercise the updated optimization paths (including Multi literals). |
| `src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs` | Adds CanReduceLoopBacktrackingToSinglePosition, refactors tree-walk logic into FindNextNodeInSequence, and updates word-class subset checks. |
| `src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs` | Updates emitted IL for greedy single-char loops to use the new single-position check (including Multi) instead of repeated LastIndexOf when applicable. |
| `src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs` | Introduces IsSubsetOf(subset, superset) and removes IsKnownWordClassSubset. |
| `src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs` | Mirrors the compiler changes in the source generator emitter, including Multi handling for the single-position check. |


stephentoub (Member, Author) commented

@MihuBot regexdiff

stephentoub (Member, Author) commented

@MihuBot benchmark Regex


MihuBot commented Mar 16, 2026

69 out of 18857 patterns have generated source code changes.

Examples of GeneratedRegex source diffs
"([A-Z]+)([A-Z][a-z])" (1481 uses)
[GeneratedRegex("([A-Z]+)([A-Z][a-z])")]
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos >= charloop_ending_pos ||
-       (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOfAnyInRange('A', 'Z')) < 0)
+   if (charloop_starting_pos >= charloop_ending_pos)
+   {
+       UncaptureUntil(0);
+       return false; // The input didn't match.
+   }
+   charloop_ending_pos--;
+   if (!char.IsAsciiLetterUpper(inputSpan[charloop_ending_pos]))
  {
      UncaptureUntil(0);
      return false; // The input didn't match.
  }
-   charloop_ending_pos += charloop_starting_pos;
  pos = charloop_ending_pos;
+   charloop_ending_pos = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd:
"([0-9\\w\\+]+\\.)|([0-9\\w\\+]+\\+)([\\(\\)]*)" (582 uses)
[GeneratedRegex("([0-9\\w\\+]+\\.)|([0-9\\w\\+]+\\+)([\\(\\)]*)")]
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos >= charloop_ending_pos ||
-       (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOf('+')) < 0)
+   if (charloop_starting_pos >= charloop_ending_pos)
+   {
+       UncaptureUntil(0);
+       return false; // The input didn't match.
+   }
+   charloop_ending_pos--;
+   if (inputSpan[charloop_ending_pos] != '+')
  {
      UncaptureUntil(0);
      return false; // The input didn't match.
  }
-   charloop_ending_pos += charloop_starting_pos;
  pos = charloop_ending_pos;
+   charloop_ending_pos = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd:
"^[a-z0-9][a-z0-9.-]+[a-z0-9]$" (452 uses)
[GeneratedRegex("^[a-z0-9][a-z0-9.-]+[a-z0-9]$")]
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos >= charloop_ending_pos ||
-       (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOfAny(Utilities.s_asciiLettersLowerAndDigits)) < 0)
+   if (charloop_starting_pos >= charloop_ending_pos)
+   {
+       return false; // The input didn't match.
+   }
+   charloop_ending_pos--;
+   if (!(((uint)((ch = inputSpan[charloop_ending_pos]) - '0') <= (uint)('9' - '0')) | ((uint)(ch - 'a') <= (uint)('z' - 'a'))))
  {
      return false; // The input didn't match.
  }
-   charloop_ending_pos += charloop_starting_pos;
  pos = charloop_ending_pos;
+   charloop_ending_pos = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd:
"((([^\"]*\\\\\\\")*)|([^\"]*))[^\"]*(\\\"|$)" (193 uses)
[GeneratedRegex("((([^\"]*\\\\\\\")*)|([^\"]*))[^\"]*(\\\"|$)")]
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos >= charloop_ending_pos ||
-       (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, Math.Min(inputSpan.Length, charloop_ending_pos + 1) - charloop_starting_pos).LastIndexOf("\\\"")) < 0)
+   if (charloop_starting_pos >= charloop_ending_pos)
+   {
+       goto LoopIterationNoMatch;
+   }
+   charloop_ending_pos--;
+   if (inputSpan[charloop_ending_pos] != '\\')
  {
      goto LoopIterationNoMatch;
  }
-   charloop_ending_pos += charloop_starting_pos;
  pos = charloop_ending_pos;
+   charloop_ending_pos = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd:
"([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+) \\s+ [+- ..." (169 uses)
[GeneratedRegex("([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+) \\s+ [+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) (?:\\s+[+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+))+")]
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos2 >= charloop_ending_pos2 ||
-       (charloop_ending_pos2 = inputSpan.Slice(charloop_starting_pos2, charloop_ending_pos2 - charloop_starting_pos2).LastIndexOf(' ')) < 0)
+   if (charloop_starting_pos2 >= charloop_ending_pos2)
+   {
+       goto AlternationBacktrack;
+   }
+   charloop_ending_pos2--;
+   if (inputSpan[charloop_ending_pos2] != ' ')
  {
      goto AlternationBacktrack;
  }
-   charloop_ending_pos2 += charloop_starting_pos2;
  pos = charloop_ending_pos2;
+   charloop_ending_pos2 = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd2:
"\\s+ (?:z|m|zm|Z|M|ZM) \\s* \\(" (169 uses)
[GeneratedRegex("\\s+ (?:z|m|zm|Z|M|ZM) \\s* \\(")]
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos >= charloop_ending_pos ||
-       (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOf(' ')) < 0)
+   if (charloop_starting_pos >= charloop_ending_pos)
+   {
+       return false; // The input didn't match.
+   }
+   charloop_ending_pos--;
+   if (inputSpan[charloop_ending_pos] != ' ')
  {
      return false; // The input didn't match.
  }
-   charloop_ending_pos += charloop_starting_pos;
  pos = charloop_ending_pos;
+   charloop_ending_pos = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd:
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos1 >= charloop_ending_pos1 ||
-       (charloop_ending_pos1 = inputSpan.Slice(charloop_starting_pos1, Math.Min(inputSpan.Length, charloop_ending_pos1 + 1) - charloop_starting_pos1).LastIndexOf(" (")) < 0)
+   if (charloop_starting_pos1 >= charloop_ending_pos1)
+   {
+       goto AlternationBacktrack;
+   }
+   charloop_ending_pos1--;
+   if (inputSpan[charloop_ending_pos1] != ' ')
  {
      goto AlternationBacktrack;
  }
-   charloop_ending_pos1 += charloop_starting_pos1;
  pos = charloop_ending_pos1;
+   charloop_ending_pos1 = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd1:
"^ \\s* ([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) ..." (169 uses)
[GeneratedRegex("^ \\s* ([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) \\s* ([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) \\s* ([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) \\s* ([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) \\s* $")]
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos >= charloop_ending_pos ||
-       (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOf(' ')) < 0)
+   if (charloop_starting_pos >= charloop_ending_pos)
+   {
+       UncaptureUntil(0);
+       return false; // The input didn't match.
+   }
+   charloop_ending_pos--;
+   if (inputSpan[charloop_ending_pos] != ' ')
  {
      UncaptureUntil(0);
      return false; // The input didn't match.
  }
-   charloop_ending_pos += charloop_starting_pos;
  pos = charloop_ending_pos;
+   charloop_ending_pos = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd:
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos3 >= charloop_ending_pos3 ||
-       (charloop_ending_pos3 = inputSpan.Slice(charloop_starting_pos3, charloop_ending_pos3 - charloop_starting_pos3).LastIndexOf(' ')) < 0)
+   if (charloop_starting_pos3 >= charloop_ending_pos3)
+   {
+       goto CaptureBacktrack;
+   }
+   charloop_ending_pos3--;
+   if (inputSpan[charloop_ending_pos3] != ' ')
  {
      goto CaptureBacktrack;
  }
-   charloop_ending_pos3 += charloop_starting_pos3;
  pos = charloop_ending_pos3;
+   charloop_ending_pos3 = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd3:
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos6 >= charloop_ending_pos6 ||
-       (charloop_ending_pos6 = inputSpan.Slice(charloop_starting_pos6, charloop_ending_pos6 - charloop_starting_pos6).LastIndexOf(' ')) < 0)
+   if (charloop_starting_pos6 >= charloop_ending_pos6)
+   {
+       goto CaptureBacktrack1;
+   }
+   charloop_ending_pos6--;
+   if (inputSpan[charloop_ending_pos6] != ' ')
  {
      goto CaptureBacktrack1;
  }
-   charloop_ending_pos6 += charloop_starting_pos6;
  pos = charloop_ending_pos6;
+   charloop_ending_pos6 = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd6:
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos9 >= charloop_ending_pos9 ||
-       (charloop_ending_pos9 = inputSpan.Slice(charloop_starting_pos9, charloop_ending_pos9 - charloop_starting_pos9).LastIndexOf(' ')) < 0)
+   if (charloop_starting_pos9 >= charloop_ending_pos9)
+   {
+       goto CaptureBacktrack2;
+   }
+   charloop_ending_pos9--;
+   if (inputSpan[charloop_ending_pos9] != ' ')
  {
      goto CaptureBacktrack2;
  }
-   charloop_ending_pos9 += charloop_starting_pos9;
  pos = charloop_ending_pos9;
+   charloop_ending_pos9 = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd9:
"(?<ServiceName>^[^ ]+): " (164 uses)
[GeneratedRegex("(?<ServiceName>^[^ ]+): ")]
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos >= charloop_ending_pos ||
-       (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, Math.Min(inputSpan.Length, charloop_ending_pos + 1) - charloop_starting_pos).LastIndexOf(": ")) < 0)
+   if (charloop_starting_pos >= charloop_ending_pos)
+   {
+       UncaptureUntil(0);
+       return false; // The input didn't match.
+   }
+   charloop_ending_pos--;
+   if (inputSpan[charloop_ending_pos] != ':')
  {
      UncaptureUntil(0);
      return false; // The input didn't match.
  }
-   charloop_ending_pos += charloop_starting_pos;
  pos = charloop_ending_pos;
+   charloop_ending_pos = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd:
"Bind<(?<contractTypes>[\\w.,\\s<>]+)>\\(\\)( ..." (125 uses)
[GeneratedRegex("Bind<(?<contractTypes>[\\w.,\\s<>]+)>\\(\\)(?<config>.*)\\.To<\\s*(?<instanceType>[\\w.<>]+)\\s*>\\(\\)", RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.CultureInvariant)]
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos >= charloop_ending_pos ||
-       (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, Math.Min(inputSpan.Length, charloop_ending_pos + 2) - charloop_starting_pos).LastIndexOf(">()")) < 0)
+   if (charloop_starting_pos >= charloop_ending_pos)
+   {
+       UncaptureUntil(0);
+       return false; // The input didn't match.
+   }
+   charloop_ending_pos--;
+   if (inputSpan[charloop_ending_pos] != '>')
  {
      UncaptureUntil(0);
      return false; // The input didn't match.
  }
-   charloop_ending_pos += charloop_starting_pos;
  pos = charloop_ending_pos;
+   charloop_ending_pos = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd:
"(?<link>[a-z]*?:\\/\\/(?<domain>(?:[a-z0-9]\ ..." (124 uses)
[GeneratedRegex("(?<link>[a-z]*?:\\/\\/(?<domain>(?:[a-z0-9]\\.|[a-z0-9][a-z0-9-]*[a-z0-9]\\.)*[a-z0-9-]*[a-z0-9](?::\\d+)?)(?<path>(?:(?:\\/+(?:[a-z0-9$_\\.\\+!\\*\\',;:\\(\\)@&~=-]|%[0-9a-f]{2})*)*(?:\\?(?:[a-z0-9$_\\+!\\*\\',;:\\(\\)@&=\\/~-]|%[0-9a-f]{2})*)?)?(?:#(?:[a-z0-9$_\\+!\\*\\',;:\\(\\)@&=\\/~-]|%[0-9a-f]{2})*)?)?)", RegexOptions.IgnoreCase)]
      base.CheckTimeout();
  }
  
-   if (charloop_starting_pos >= charloop_ending_pos ||
-       (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOfAny(Utilities.s_asciiLettersAndDigitsAndKelvinSign)) < 0)
+   if (charloop_starting_pos >= charloop_ending_pos)
+   {
+       goto LoopIterationNoMatch;
+   }
+   charloop_ending_pos--;
+   if (!((ch = inputSpan[charloop_ending_pos]) < 128 ? char.IsAsciiLetterOrDigit(ch) : RegexRunner.CharInClass((char)ch, "\0\b\00:A[a{KÅ")))
  {
      goto LoopIterationNoMatch;
  }
-   charloop_ending_pos += charloop_starting_pos;
  pos = charloop_ending_pos;
+   charloop_ending_pos = 0;
  slice = inputSpan.Slice(pos);
  
  CharLoopEnd:

For more diff examples, see https://gist.github.com/MihuBot/df11270327a688e54dcc88562eb55b44
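To see why the replaced code and the new code agree in diffs like the `" ("` one above, here is a minimal sketch under stated assumptions. `OldStep`, `NewStep`, and `LiteralAt` are hypothetical names that mimic the shape of the generated code; they are not the generator's output. The loop is assumed to be `\s*` followed by the literal `" ("`, and `[start, end)` is assumed to be the range the loop consumed.

```csharp
using System;

// Loop range [1, 4) is the whitespace run after 'z' in both inputs.
int oldA = OldStep("z   (", 1, 4);
int newA = NewStep("z   (", 1, 4);
int oldB = OldStep("z \t\t(", 1, 4);
int newB = NewStep("z \t\t(", 1, 4);

Console.WriteLine($"{oldA} {newA} {oldB} {newB}"); // prints "3 3 -1 -1"

// Old shape: search backward through the consumed range (plus one extra char)
// for the whole literal.
static int OldStep(ReadOnlySpan<char> text, int start, int end)
{
    if (start >= end) return -1;
    int i = text.Slice(start, Math.Min(text.Length, end + 1) - start).LastIndexOf(" (".AsSpan());
    return i < 0 ? -1 : i + start;
}

// New shape: test only the last consumed character, then let the normal
// match of the literal (modeled by LiteralAt) confirm the rest.
static int NewStep(ReadOnlySpan<char> text, int start, int end)
{
    if (start >= end) return -1;
    end--;
    return text[end] == ' ' && LiteralAt(text, end) ? end : -1;
}

static bool LiteralAt(ReadOnlySpan<char> text, int pos) =>
    text.Slice(pos).StartsWith(" (".AsSpan());
```

Because every consumed character is whitespace and `(` is not, `" ("` can only begin at the last consumed index, so the backward search never has more than one candidate to find.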

JIT assembly changes
Total bytes of base: 55567181
Total bytes of diff: 55569340
Total bytes of delta: 2159 (0.00 % of base)
Total relative delta: 0.31
    diff is a regression.
    relative diff is a regression.

For a list of JIT diff regressions, see Regressions.md
For a list of JIT diff improvements, see Improvements.md

Sample source code for further analysis
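// Namespaces not covered by the SDK's default implicit usings.
using System.IO.Compression;
using System.Text.Json;
using System.Text.RegularExpressions;

const string JsonPath = "RegexResults-1819.json";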
if (!File.Exists(JsonPath))
{
    await using var archiveStream = await new HttpClient().GetStreamAsync("https://mihubot.xyz/r/FJV1maWA");
    using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read);
    archive.Entries.First(e => e.Name == "Results.json").ExtractToFile(JsonPath);
}

using FileStream jsonFileStream = File.OpenRead(JsonPath);
RegexEntry[] entries = JsonSerializer.Deserialize<RegexEntry[]>(jsonFileStream, new JsonSerializerOptions { IncludeFields = true })!;
Console.WriteLine($"Working with {entries.Length} patterns");



record KnownPattern(string Pattern, RegexOptions Options, int Count);

sealed class RegexEntry
{
    public required KnownPattern Regex { get; set; }
    public required string MainSource { get; set; }
    public required string PrSource { get; set; }
    public string? FullDiff { get; set; }
    public string? ShortDiff { get; set; }
    public (string Name, string Values)[]? SearchValuesOfChar { get; set; }
    public (string[] Values, StringComparison ComparisonType)[]? SearchValuesOfString { get; set; }
}

- Update XML doc and inline comment in CanReduceLoopBacktrackingToSinglePosition
  to describe the Multi literal case (first char subsumed, second char disjoint)
  instead of claiming only One/Set are handled.
- Update IsSubsetOf comments to use parameter names (subset/superset) instead
  of set1/set2.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
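For readers unfamiliar with what a conservative subset check means here, the following is a rough sketch and not the RegexCharClass implementation, which works on the library's internal set-string encoding. The model is an assumption: each class is an array of inclusive character ranges, and the local `IsSubsetOf` reuses the subset/superset parameter names only for readability. "Conservative" means it may answer false for a true subset, but never true for a non-subset.

```csharp
using System;

// ASCII-only approximation of the word class, as inclusive ranges.
(char Lo, char Hi)[] wordClass = { ('0', '9'), ('A', 'Z'), ('_', '_'), ('a', 'z') };
(char Lo, char Hi)[] lower = { ('a', 'z') };
(char Lo, char Hi)[] lowerAndDot = { ('.', '.'), ('a', 'z') };

bool lowerInWord = IsSubsetOf(lower, wordClass);
bool dotInWord = IsSubsetOf(lowerAndDot, wordClass);
Console.WriteLine($"{lowerInWord} {dotInWord}"); // prints "True False"

// Hypothetical range-based check: every subset range must fit inside a single
// superset range. A subset range straddling two adjacent superset ranges is
// reported as "not a subset", which errs on the safe side.
static bool IsSubsetOf((char Lo, char Hi)[] subset, (char Lo, char Hi)[] superset)
{
    foreach (var r in subset)
    {
        bool covered = false;
        foreach (var s in superset)
        {
            if (s.Lo <= r.Lo && r.Hi <= s.Hi)
            {
                covered = true;
                break;
            }
        }
        if (!covered) return false;
    }
    return true;
}
```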

MihuBot commented Mar 16, 2026

@stephentoub stephentoub marked this pull request as ready for review March 17, 2026 00:44
Copilot AI (Contributor) left a comment

Pull request overview

This PR extends the regex compiler/tree optimization that reduces backtracking work for greedy single-character loops when they’re followed by a subsumed literal and then a disjoint constraint (e.g., boundary/anchor/disjoint set), enabling the engine to skip repeated backward searches and instead check only the final viable position.

Changes:

  • Extend the “single viable backtrack position” optimization to handle Multi (string) literals in addition to One/Set.
  • Refactor tree-walking logic into a shared FindNextNodeInSequence helper for reuse across optimizations.
  • Replace the specialized word-subset helper with a general RegexCharClass.IsSubsetOf and update call sites; add functional tests covering these scenarios.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs` | Adds functional cases that exercise patterns where the new single-position backtracking reduction should/shouldn’t apply (including Multi). |
| `src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs` | Introduces CanReduceLoopBacktrackingToSinglePosition, extracts FindNextNodeInSequence, and updates atomicity checks to use IsSubsetOf. |
| `src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs` | Emits specialized backtracking code for qualifying loops to avoid repeated LastIndexOf calls (checks only the last viable position). |
| `src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs` | Adds conservative IsSubsetOf(subset, superset) helper and removes the redundant IsKnownWordClassSubset. |
| `src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs` | Mirrors the compiler-side optimization in the source generator emitter, keeping generated and runtime-compiled regex code in sync. |


Add tests for:
- Set literal arm (e.g. \d+[0-9]\s, \d+[0-9][a-z])
- Nothing after literal / optimization should not fire (\w+n, \d+5)
- Notoneloop path ([^x]+a\b)
- Minimum 0 / star (\w*n\b)
- Longer Multi literal (\d+0abc)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
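The behavior those tests pin down can be reproduced with the public Regex API. This is a sketch with inputs chosen for illustration; they are not the test data from Regex.Match.Tests.cs. The first two cases are the interesting ones: with nothing after the literal, the match ends before the last consumed character, so a single-position check would be wrong there.

```csharp
using System;
using System.Text.RegularExpressions;

(string Pattern, string Input, string Expected)[] cases =
{
    (@"\w+n", "banana", "banan"),        // nothing after literal: must search backward
    (@"\d+5", "1523", "15"),             // same, match ends well before the loop's end
    (@"\w*n\b", "n", "n"),               // minimum 0: the loop gives back everything
    (@"[^x]+a\b", "banana!", "banana"),  // notone loop
    (@"\d+0abc", "100abc", "100abc"),    // longer Multi literal
    (@"\d+[0-9]\s", "123 ", "123 "),     // Set literal
};

int failures = 0;
foreach (var (pattern, input, expected) in cases)
{
    string actual = Regex.Match(input, pattern).Value;
    if (actual != expected) failures++;
    Console.WriteLine($"{pattern} on \"{input}\" -> \"{actual}\"");
}
Console.WriteLine($"failures: {failures}"); // prints "failures: 0"
```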
danmoseley (Member) commented

Added a few more tests.

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a regex compilation optimization to reduce backtracking for greedy single-character loops when they’re followed by a literal that’s contained in the loop’s character class, and the pattern immediately after that literal is disjoint from the loop’s class (e.g., \b\w+n\b). In those cases, only the last possible backtrack position can succeed, so the compiler/source-generator can avoid repeated LastIndexOf searches.

Changes:

  • Add RegexNode.CanReduceLoopBacktrackingToSinglePosition plus a shared FindNextNodeInSequence helper to analyze when the single-position backtrack is valid.
  • Update both the IL compiler and source generator emitters to use a direct “check last position” path instead of LastIndexOf when the optimization applies.
  • Add functional test cases covering boundary, anchors, multi-literals, and grouping wrappers; replace IsKnownWordClassSubset usage with RegexCharClass.IsSubsetOf.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs` | Adds functional cases intended to exercise the optimization scenarios and edge cases. |
| `src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs` | Adds the new applicability analysis helper and refactors “next node” traversal into a reusable method; updates word-subset checks to IsSubsetOf. |
| `src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs` | Emits the single-position backtrack fast-path in the IL compiler to avoid repeated LastIndexOf. |
| `src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs` | Introduces conservative IsSubsetOf helper and removes IsKnownWordClassSubset. |
| `src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs` | Mirrors the IL compiler fast-path in the source generator output. |

…ents

Add concrete regex examples (e.g. \w+a\s, \d+[0-9]\s, \d+0x) to the
CanReduceLoopBacktrackingToSinglePosition XML doc and inline case
comments per reviewer request. Simplify the emitter/compiler comments
to reference the method rather than repeating the full rationale.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
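One way to sanity-check the three example patterns named in this commit is to compare the interpreter against the IL compiler, since the interpreter does not go through the changed emitter code. The sketch below assumes nothing beyond the public Regex API; the inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

(string Pattern, string Input)[] cases =
{
    (@"\w+a\s", "banana split"),
    (@"\d+[0-9]\s", "12345 "),
    (@"\d+0x", "1200x"),
};

bool allAgree = true;
foreach (var (pattern, input) in cases)
{
    Match interpreted = new Regex(pattern, RegexOptions.None).Match(input);
    Match compiled = new Regex(pattern, RegexOptions.Compiled).Match(input);

    // Both engines should report the same success flag and the same match span.
    bool same = interpreted.Success == compiled.Success
        && interpreted.Index == compiled.Index
        && interpreted.Length == compiled.Length;
    allAgree &= same;
    Console.WriteLine($"{pattern}: \"{interpreted.Value}\" same={same}");
}
```

A disagreement between the two engines on any input would point at the emitted fast path rather than at the pattern itself.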
stephentoub (Member, Author) commented

/ba-g deadletter

@stephentoub stephentoub enabled auto-merge (squash) March 18, 2026 21:54
@stephentoub stephentoub merged commit c05c6a4 into dotnet:main Mar 18, 2026
83 of 90 checks passed
@stephentoub stephentoub deleted the lessbacktracking branch March 18, 2026 21:54

4 participants