Reduce backtracking for greedy loops followed by subsumed literals #125636
stephentoub merged 4 commits into dotnet:main
- Add Multi literal support to CanReduceLoopBacktrackingToSinglePosition, treating a multi-char literal as two singles: first char must be in the loop's set, second must not, enabling single-position backtracking.
- Extract FindNextNodeInSequence from CanBeMadeAtomic for reuse.
- Replace IsKnownWordClassSubset with IsSubsetOf(set, WordClass) and delete the now-redundant method.
- Add test coverage for both One and Multi literal cases.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
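The Multi rule in the first bullet can be sketched as a small predicate. This is only an illustration of the stated condition (first char in the loop's set, second char not in it). The helper name `MultiAllowsSinglePosition` and the use of `char.IsDigit` as a stand-in for the `\d` set are assumptions made for this sketch, not the code in RegexNode.cs.

```csharp
using System;

// Hypothetical helper: restates the commit's Multi condition, nothing more.
// inLoopSet stands in for "is this char matched by the loop's character class?".
static bool MultiAllowsSinglePosition(Func<char, bool> inLoopSet, string literal) =>
    literal.Length >= 2 &&
    inLoopSet(literal[0]) &&   // first char is subsumed by the loop's set
    !inLoopSet(literal[1]);    // second char is disjoint from the loop's set

// \d+0x : '0' is a digit and 'x' is not, so only the last loop position can work.
Console.WriteLine(MultiAllowsSinglePosition(char.IsDigit, "0x")); // True

// \d+00 : the second char is also a digit, so earlier positions stay viable.
Console.WriteLine(MultiAllowsSinglePosition(char.IsDigit, "00")); // False
```

The reasoning is that any earlier backtrack position would be followed by another loop-class character, which can never equal the literal's second char.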
Pull request overview
This PR enhances System.Text.RegularExpressions loop backtracking optimizations, expanding the single-position backtrack short-circuit to cover additional literal forms and refactoring supporting analysis helpers.
Changes:
- Extend loop backtracking reduction to support `Multi` literals (in addition to `One`/`Set`) and update compiler/generator emitters accordingly.
- Extract tree-walk logic into `FindNextNodeInSequence` for reuse.
- Replace `IsKnownWordClassSubset` with a more general `RegexCharClass.IsSubsetOf` helper and remove the redundant method.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs | Adds functional test cases covering scenarios intended to exercise the updated optimization paths (including Multi literals). |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs | Adds CanReduceLoopBacktrackingToSinglePosition, refactors tree-walk logic into FindNextNodeInSequence, and updates word-class subset checks. |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs | Updates emitted IL for greedy single-char loops to use the new single-position check (including Multi) instead of repeated LastIndexOf when applicable. |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs | Introduces IsSubsetOf(subset, superset) and removes IsKnownWordClassSubset. |
| src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs | Mirrors the compiler changes in the source generator emitter, including Multi handling for the single-position check. |
@MihuBot regexdiff

@MihuBot benchmark Regex
69 out of 18857 patterns have generated source code changes.

Examples of GeneratedRegex source diffs

"([A-Z]+)([A-Z][a-z])" (1481 uses)
[GeneratedRegex("([A-Z]+)([A-Z][a-z])")]
base.CheckTimeout();
}
- if (charloop_starting_pos >= charloop_ending_pos ||
- (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOfAnyInRange('A', 'Z')) < 0)
+ if (charloop_starting_pos >= charloop_ending_pos)
+ {
+ UncaptureUntil(0);
+ return false; // The input didn't match.
+ }
+ charloop_ending_pos--;
+ if (!char.IsAsciiLetterUpper(inputSpan[charloop_ending_pos]))
{
UncaptureUntil(0);
return false; // The input didn't match.
}
- charloop_ending_pos += charloop_starting_pos;
pos = charloop_ending_pos;
+ charloop_ending_pos = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd:

"([0-9\\w\\+]+\\.)|([0-9\\w\\+]+\\+)([\\(\\)]*)" (582 uses)
[GeneratedRegex("([0-9\\w\\+]+\\.)|([0-9\\w\\+]+\\+)([\\(\\)]*)")]
base.CheckTimeout();
}
- if (charloop_starting_pos >= charloop_ending_pos ||
- (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOf('+')) < 0)
+ if (charloop_starting_pos >= charloop_ending_pos)
+ {
+ UncaptureUntil(0);
+ return false; // The input didn't match.
+ }
+ charloop_ending_pos--;
+ if (inputSpan[charloop_ending_pos] != '+')
{
UncaptureUntil(0);
return false; // The input didn't match.
}
- charloop_ending_pos += charloop_starting_pos;
pos = charloop_ending_pos;
+ charloop_ending_pos = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd:

"^[a-z0-9][a-z0-9.-]+[a-z0-9]$" (452 uses)
[GeneratedRegex("^[a-z0-9][a-z0-9.-]+[a-z0-9]$")]
base.CheckTimeout();
}
- if (charloop_starting_pos >= charloop_ending_pos ||
- (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOfAny(Utilities.s_asciiLettersLowerAndDigits)) < 0)
+ if (charloop_starting_pos >= charloop_ending_pos)
+ {
+ return false; // The input didn't match.
+ }
+ charloop_ending_pos--;
+ if (!(((uint)((ch = inputSpan[charloop_ending_pos]) - '0') <= (uint)('9' - '0')) | ((uint)(ch - 'a') <= (uint)('z' - 'a'))))
{
return false; // The input didn't match.
}
- charloop_ending_pos += charloop_starting_pos;
pos = charloop_ending_pos;
+ charloop_ending_pos = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd:

"((([^\"]*\\\\\\\")*)|([^\"]*))[^\"]*(\\\"|$)" (193 uses)
[GeneratedRegex("((([^\"]*\\\\\\\")*)|([^\"]*))[^\"]*(\\\"|$)")]
base.CheckTimeout();
}
- if (charloop_starting_pos >= charloop_ending_pos ||
- (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, Math.Min(inputSpan.Length, charloop_ending_pos + 1) - charloop_starting_pos).LastIndexOf("\\\"")) < 0)
+ if (charloop_starting_pos >= charloop_ending_pos)
+ {
+ goto LoopIterationNoMatch;
+ }
+ charloop_ending_pos--;
+ if (inputSpan[charloop_ending_pos] != '\\')
{
goto LoopIterationNoMatch;
}
- charloop_ending_pos += charloop_starting_pos;
pos = charloop_ending_pos;
+ charloop_ending_pos = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd:

"([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+) \\s+ [+- ..." (169 uses)
[GeneratedRegex("([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+) \\s+ [+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) (?:\\s+[+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+))+")]
base.CheckTimeout();
}
- if (charloop_starting_pos2 >= charloop_ending_pos2 ||
- (charloop_ending_pos2 = inputSpan.Slice(charloop_starting_pos2, charloop_ending_pos2 - charloop_starting_pos2).LastIndexOf(' ')) < 0)
+ if (charloop_starting_pos2 >= charloop_ending_pos2)
+ {
+ goto AlternationBacktrack;
+ }
+ charloop_ending_pos2--;
+ if (inputSpan[charloop_ending_pos2] != ' ')
{
goto AlternationBacktrack;
}
- charloop_ending_pos2 += charloop_starting_pos2;
pos = charloop_ending_pos2;
+ charloop_ending_pos2 = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd2:

"\\s+ (?:z|m|zm|Z|M|ZM) \\s* \\(" (169 uses)
[GeneratedRegex("\\s+ (?:z|m|zm|Z|M|ZM) \\s* \\(")]
base.CheckTimeout();
}
- if (charloop_starting_pos >= charloop_ending_pos ||
- (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOf(' ')) < 0)
+ if (charloop_starting_pos >= charloop_ending_pos)
+ {
+ return false; // The input didn't match.
+ }
+ charloop_ending_pos--;
+ if (inputSpan[charloop_ending_pos] != ' ')
{
return false; // The input didn't match.
}
- charloop_ending_pos += charloop_starting_pos;
pos = charloop_ending_pos;
+ charloop_ending_pos = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd:
base.CheckTimeout();
}
- if (charloop_starting_pos1 >= charloop_ending_pos1 ||
- (charloop_ending_pos1 = inputSpan.Slice(charloop_starting_pos1, Math.Min(inputSpan.Length, charloop_ending_pos1 + 1) - charloop_starting_pos1).LastIndexOf(" (")) < 0)
+ if (charloop_starting_pos1 >= charloop_ending_pos1)
+ {
+ goto AlternationBacktrack;
+ }
+ charloop_ending_pos1--;
+ if (inputSpan[charloop_ending_pos1] != ' ')
{
goto AlternationBacktrack;
}
- charloop_ending_pos1 += charloop_starting_pos1;
pos = charloop_ending_pos1;
+ charloop_ending_pos1 = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd1:

"^ \\s* ([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) ..." (169 uses)
[GeneratedRegex("^ \\s* ([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) \\s* ([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) \\s* ([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) \\s* ([+-]?(?:\\d+\\.?\\d*|\\d*\\.?\\d+)) \\s* $")]
base.CheckTimeout();
}
- if (charloop_starting_pos >= charloop_ending_pos ||
- (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOf(' ')) < 0)
+ if (charloop_starting_pos >= charloop_ending_pos)
+ {
+ UncaptureUntil(0);
+ return false; // The input didn't match.
+ }
+ charloop_ending_pos--;
+ if (inputSpan[charloop_ending_pos] != ' ')
{
UncaptureUntil(0);
return false; // The input didn't match.
}
- charloop_ending_pos += charloop_starting_pos;
pos = charloop_ending_pos;
+ charloop_ending_pos = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd:
base.CheckTimeout();
}
- if (charloop_starting_pos3 >= charloop_ending_pos3 ||
- (charloop_ending_pos3 = inputSpan.Slice(charloop_starting_pos3, charloop_ending_pos3 - charloop_starting_pos3).LastIndexOf(' ')) < 0)
+ if (charloop_starting_pos3 >= charloop_ending_pos3)
+ {
+ goto CaptureBacktrack;
+ }
+ charloop_ending_pos3--;
+ if (inputSpan[charloop_ending_pos3] != ' ')
{
goto CaptureBacktrack;
}
- charloop_ending_pos3 += charloop_starting_pos3;
pos = charloop_ending_pos3;
+ charloop_ending_pos3 = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd3:
base.CheckTimeout();
}
- if (charloop_starting_pos6 >= charloop_ending_pos6 ||
- (charloop_ending_pos6 = inputSpan.Slice(charloop_starting_pos6, charloop_ending_pos6 - charloop_starting_pos6).LastIndexOf(' ')) < 0)
+ if (charloop_starting_pos6 >= charloop_ending_pos6)
+ {
+ goto CaptureBacktrack1;
+ }
+ charloop_ending_pos6--;
+ if (inputSpan[charloop_ending_pos6] != ' ')
{
goto CaptureBacktrack1;
}
- charloop_ending_pos6 += charloop_starting_pos6;
pos = charloop_ending_pos6;
+ charloop_ending_pos6 = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd6:
base.CheckTimeout();
}
- if (charloop_starting_pos9 >= charloop_ending_pos9 ||
- (charloop_ending_pos9 = inputSpan.Slice(charloop_starting_pos9, charloop_ending_pos9 - charloop_starting_pos9).LastIndexOf(' ')) < 0)
+ if (charloop_starting_pos9 >= charloop_ending_pos9)
+ {
+ goto CaptureBacktrack2;
+ }
+ charloop_ending_pos9--;
+ if (inputSpan[charloop_ending_pos9] != ' ')
{
goto CaptureBacktrack2;
}
- charloop_ending_pos9 += charloop_starting_pos9;
pos = charloop_ending_pos9;
+ charloop_ending_pos9 = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd9:

"(?<ServiceName>^[^ ]+): " (164 uses)
[GeneratedRegex("(?<ServiceName>^[^ ]+): ")]
base.CheckTimeout();
}
- if (charloop_starting_pos >= charloop_ending_pos ||
- (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, Math.Min(inputSpan.Length, charloop_ending_pos + 1) - charloop_starting_pos).LastIndexOf(": ")) < 0)
+ if (charloop_starting_pos >= charloop_ending_pos)
+ {
+ UncaptureUntil(0);
+ return false; // The input didn't match.
+ }
+ charloop_ending_pos--;
+ if (inputSpan[charloop_ending_pos] != ':')
{
UncaptureUntil(0);
return false; // The input didn't match.
}
- charloop_ending_pos += charloop_starting_pos;
pos = charloop_ending_pos;
+ charloop_ending_pos = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd:

"Bind<(?<contractTypes>[\\w.,\\s<>]+)>\\(\\)( ..." (125 uses)
[GeneratedRegex("Bind<(?<contractTypes>[\\w.,\\s<>]+)>\\(\\)(?<config>.*)\\.To<\\s*(?<instanceType>[\\w.<>]+)\\s*>\\(\\)", RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.CultureInvariant)]
base.CheckTimeout();
}
- if (charloop_starting_pos >= charloop_ending_pos ||
- (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, Math.Min(inputSpan.Length, charloop_ending_pos + 2) - charloop_starting_pos).LastIndexOf(">()")) < 0)
+ if (charloop_starting_pos >= charloop_ending_pos)
+ {
+ UncaptureUntil(0);
+ return false; // The input didn't match.
+ }
+ charloop_ending_pos--;
+ if (inputSpan[charloop_ending_pos] != '>')
{
UncaptureUntil(0);
return false; // The input didn't match.
}
- charloop_ending_pos += charloop_starting_pos;
pos = charloop_ending_pos;
+ charloop_ending_pos = 0;
slice = inputSpan.Slice(pos);
CharLoopEnd:

"(?<link>[a-z]*?:\\/\\/(?<domain>(?:[a-z0-9]\ ..." (124 uses)
[GeneratedRegex("(?<link>[a-z]*?:\\/\\/(?<domain>(?:[a-z0-9]\\.|[a-z0-9][a-z0-9-]*[a-z0-9]\\.)*[a-z0-9-]*[a-z0-9](?::\\d+)?)(?<path>(?:(?:\\/+(?:[a-z0-9$_\\.\\+!\\*\\',;:\\(\\)@&~=-]|%[0-9a-f]{2})*)*(?:\\?(?:[a-z0-9$_\\+!\\*\\',;:\\(\\)@&=\\/~-]|%[0-9a-f]{2})*)?)?(?:#(?:[a-z0-9$_\\+!\\*\\',;:\\(\\)@&=\\/~-]|%[0-9a-f]{2})*)?)?)", RegexOptions.IgnoreCase)]
base.CheckTimeout();
}
- if (charloop_starting_pos >= charloop_ending_pos ||
- (charloop_ending_pos = inputSpan.Slice(charloop_starting_pos, charloop_ending_pos - charloop_starting_pos).LastIndexOfAny(Utilities.s_asciiLettersAndDigitsAndKelvinSign)) < 0)
+ if (charloop_starting_pos >= charloop_ending_pos)
+ {
+ goto LoopIterationNoMatch;
+ }
+ charloop_ending_pos--;
+ if (!((ch = inputSpan[charloop_ending_pos]) < 128 ? char.IsAsciiLetterOrDigit(ch) : RegexRunner.CharInClass((char)ch, "\0\b\00:A[a{KÅ")))
{
goto LoopIterationNoMatch;
}
- charloop_ending_pos += charloop_starting_pos;
pos = charloop_ending_pos;
+ charloop_ending_pos = 0;
slice = inputSpan.Slice(pos);
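CharLoopEnd:

For more diff examples, see https://gist.github.com/MihuBot/df11270327a688e54dcc88562eb55b44

JIT assembly changes
For a list of JIT diff regressions, see Regressions.md

Sample source code for further analysis
// Assumes implicit usings (System, System.IO, System.Linq, System.Net.Http) are enabled.
using System.IO.Compression;
using System.Text.Json;
using System.Text.RegularExpressions;

const string JsonPath = "RegexResults-1819.json";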
if (!File.Exists(JsonPath))
{
await using var archiveStream = await new HttpClient().GetStreamAsync("https://mihubot.xyz/r/FJV1maWA");
using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read);
archive.Entries.First(e => e.Name == "Results.json").ExtractToFile(JsonPath);
}
using FileStream jsonFileStream = File.OpenRead(JsonPath);
RegexEntry[] entries = JsonSerializer.Deserialize<RegexEntry[]>(jsonFileStream, new JsonSerializerOptions { IncludeFields = true })!;
Console.WriteLine($"Working with {entries.Length} patterns");
record KnownPattern(string Pattern, RegexOptions Options, int Count);
sealed class RegexEntry
{
public required KnownPattern Regex { get; set; }
public required string MainSource { get; set; }
public required string PrSource { get; set; }
public string? FullDiff { get; set; }
public string? ShortDiff { get; set; }
public (string Name, string Values)[]? SearchValuesOfChar { get; set; }
public (string[] Values, StringComparison ComparisonType)[]? SearchValuesOfString { get; set; }
}

- Update XML doc and inline comment in CanReduceLoopBacktrackingToSinglePosition to describe the Multi literal case (first char subsumed, second char disjoint) instead of claiming only One/Set are handled.
- Update IsSubsetOf comments to use parameter names (subset/superset) instead of set1/set2.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

See benchmark results at https://gist.github.com/MihuBot/e04f5fe4c614aed7f26c3dbf6ea17478
Pull request overview
This PR extends the regex compiler/tree optimization that reduces backtracking work for greedy single-character loops when they’re followed by a subsumed literal and then a disjoint constraint (e.g., boundary/anchor/disjoint set), enabling the engine to skip repeated backward searches and instead check only the final viable position.
Changes:
- Extend the “single viable backtrack position” optimization to handle `Multi` (string) literals in addition to `One`/`Set`.
- Refactor tree-walking logic into a shared `FindNextNodeInSequence` helper for reuse across optimizations.
- Replace the specialized word-subset helper with a general `RegexCharClass.IsSubsetOf` and update call sites; add functional tests covering these scenarios.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs | Adds functional cases that exercise patterns where the new single-position backtracking reduction should/shouldn’t apply (including Multi). |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs | Introduces CanReduceLoopBacktrackingToSinglePosition, extracts FindNextNodeInSequence, and updates atomicity checks to use IsSubsetOf. |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs | Emits specialized backtracking code for qualifying loops to avoid repeated LastIndexOf calls (checks only the last viable position). |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs | Adds conservative IsSubsetOf(subset, superset) helper and removes the redundant IsKnownWordClassSubset. |
| src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs | Mirrors the compiler-side optimization in the source generator emitter, keeping generated and runtime-compiled regex code in sync. |
Add tests for:
- Set literal arm (e.g. \d+[0-9]\s, \d+[0-9][a-z])
- Nothing after literal / optimization should not fire (\w+n, \d+5)
- Notoneloop path ([^x]+a\b)
- Minimum 0 / star (\w*n\b)
- Longer Multi literal (\d+0abc)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Added a few more tests.
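As a rough spot check, a few of the patterns named in that commit can be run through the public `Regex` API. The inputs here are invented for illustration and are not taken from Regex.Match.Tests.cs. `FirstMatch` is a hypothetical helper, and `RegexOptions.Compiled` is passed on the assumption that it routes matching through the IL compiler path the PR touches. The optimization should not change any result, only how backtracking finds it.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for this sketch: returns the text of the first match.
static string FirstMatch(string input, string pattern) =>
    Regex.Match(input, pattern, RegexOptions.Compiled).Value;

// Set literal arm: \d+ must give back one digit so [0-9] can match it.
Console.WriteLine(FirstMatch("2024 x", @"\d+[0-9]\s"));  // "2024 "

// Nothing after the literal: ordinary backtracking to the last 'n'.
Console.WriteLine(FirstMatch("banana", @"\w+n"));        // "banan"

// Minimum 0 / star: the loop gives back the trailing 'n'.
Console.WriteLine(FirstMatch("can", @"\w*n\b"));         // "can"

// Longer Multi literal: the loop gives back the last '0'.
Console.WriteLine(FirstMatch("100abc", @"\d+0abc"));     // "100abc"
```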
Pull request overview
This PR adds a regex compilation optimization to reduce backtracking for greedy single-character loops when they’re followed by a literal that’s contained in the loop’s character class, and the pattern immediately after that literal is disjoint from the loop’s class (e.g., \b\w+n\b). In those cases, only the last possible backtrack position can succeed, so the compiler/source-generator can avoid repeated LastIndexOf searches.
Changes:
- Add `RegexNode.CanReduceLoopBacktrackingToSinglePosition` plus a shared `FindNextNodeInSequence` helper to analyze when the single-position backtrack is valid.
- Update both the IL compiler and source generator emitters to use a direct “check last position” path instead of `LastIndexOf` when the optimization applies.
- Add functional test cases covering boundary, anchors, multi-literals, and grouping wrappers; replace `IsKnownWordClassSubset` usage with `RegexCharClass.IsSubsetOf`.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs | Adds functional cases intended to exercise the optimization scenarios and edge cases. |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs | Adds the new applicability analysis helper and refactors “next node” traversal into a reusable method; updates word-subset checks to IsSubsetOf. |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs | Emits the single-position backtrack fast-path in the IL compiler to avoid repeated LastIndexOf. |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs | Introduces conservative IsSubsetOf helper and removes IsKnownWordClassSubset. |
| src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs | Mirrors the IL compiler fast-path in the source generator output. |
…ents Add concrete regex examples (e.g. \w+a\s, \d+[0-9]\s, \d+0x) to the CanReduceLoopBacktrackingToSinglePosition XML doc and inline case comments per reviewer request. Simplify the emitter/compiler comments to reference the method rather than repeating the full rationale. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/ba-g deadletter
When a greedy character loop (like `\w+`, `\d+`, `[a-z]+`) is followed by a literal that's part of the loop's character class, backtracking normally requires repeated `LastIndexOf` calls to find viable positions. However, if whatever comes after that literal is disjoint from the loop's class, then only the very last position consumed by the loop can possibly succeed: every earlier position would have a loop-class character where the disjoint subsequent needs something else.

For example, in `\b\w+n\b`, the `\w+` loop is followed by `n` (which is in `\w`), and `n` is followed by `\b`. Since the loop only matches word characters, any position in the middle of the loop's consumed range would have a word character after the `n`, and the `\b` boundary wouldn't be satisfied there. Only the very last consumed position can work, so backtracking can skip directly to it rather than searching backward one position at a time.
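The `\b\w+n\b` argument can be modelled in a few lines. This is a simplified sketch, not the emitted code. `ScanBackward`, `CheckLastOnly`, `IsWordChar` and `IsBoundary` are hypothetical names, `IsWordChar` only approximates .NET's `\w`, and `start` is assumed to be the lowest position the loop may give back to (after its minimum of one iteration).

```csharp
using System;

// Approximation of \w for this sketch (the real class covers more categories).
static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '_';

// \b at index i: exactly one side of the position is a word character.
static bool IsBoundary(ReadOnlySpan<char> s, int i) =>
    (i > 0 && IsWordChar(s[i - 1])) != (i < s.Length && IsWordChar(s[i]));

// Earlier strategy: repeatedly search [start, end) backward for 'n',
// testing \b after each candidate. Returns the index of 'n', or -1.
static int ScanBackward(ReadOnlySpan<char> input, int start, int end)
{
    while (start < end)
    {
        int i = input.Slice(start, end - start).LastIndexOf('n');
        if (i < 0) return -1;
        end = start + i;
        if (IsBoundary(input, end + 1)) return end;
    }
    return -1;
}

// Single-position strategy: only the last consumed char can be the 'n'.
static int CheckLastOnly(ReadOnlySpan<char> input, int start, int end)
{
    if (start >= end) return -1;
    end--;
    return input[end] == 'n' && IsBoundary(input, end + 1) ? end : -1;
}

string text = "banana nan";
// \w+ consumed "banana" = [0, 6); no 'n' is followed by a boundary.
Console.WriteLine(ScanBackward(text, 1, 6));   // -1
Console.WriteLine(CheckLastOnly(text, 1, 6));  // -1
// \w+ consumed "nan" = [7, 10); the final 'n' at index 9 works.
Console.WriteLine(ScanBackward(text, 8, 10));  // 9
Console.WriteLine(CheckLastOnly(text, 8, 10)); // 9
```

Both functions agree because every index inside the consumed range is followed by another word character, so the backward scan can only ever succeed at `end - 1`.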