Commit c05c6a4
Reduce backtracking for greedy loops followed by subsumed literals (#125636)
When a greedy character loop (like `\w+`, `\d+`, `[a-z]+`) is followed
by a literal that's part of the loop's character class, backtracking
normally requires repeated `LastIndexOf` calls to find viable positions.
However, if whatever comes *after* that literal is disjoint from the
loop's class, then only the very last position consumed by the loop can
possibly succeed: at every earlier position the literal would be followed
by a loop-class character, which the disjoint subsequent construct cannot
match.
For example, in `\b\w+n\b`, the `\w+` loop is followed by `n` (which is
in `\w`), and `n` is followed by `\b`. Since the loop only matches word
characters, any position in the middle of the loop's consumed range
would have a word character after the `n`, and the `\b` boundary
wouldn't be satisfied there. Only the very last consumed position can
work, so backtracking can skip directly to it rather than searching
backward one position at a time.
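
The reasoning above can be checked with a small model. The following is a minimal Python sketch, not the .NET implementation: the helper names (`match_naive`, `match_skip_to_last`, and so on) are hypothetical, `is_word` only approximates `\w`, and `str.rfind` stands in for `LastIndexOf`. It matches `\w+n\b` from a given start position in two ways and counts how many candidate positions for `n` each strategy examines.

```python
def is_word(ch: str) -> bool:
    # Rough stand-in for \w (assumption: good enough for ASCII test inputs).
    return ch.isalnum() or ch == "_"


def boundary_at(s: str, i: int) -> bool:
    # \b holds when exactly one side of position i is a word character.
    before = i > 0 and is_word(s[i - 1])
    after = i < len(s) and is_word(s[i])
    return before != after


def greedy_word_run(s: str, start: int) -> int:
    # Greedy \w+ : consume word characters, return the end of the run.
    end = start
    while end < len(s) and is_word(s[end]):
        end += 1
    return end


def match_naive(s: str, start: int):
    # Backtrack through every 'n' the loop consumed, right to left.
    # Returns (match_end or -1, candidate positions examined).
    end = greedy_word_run(s, start)
    attempts = 0
    pos = s.rfind("n", start + 1, end)  # the loop must keep at least one char
    while pos != -1:
        attempts += 1
        if boundary_at(s, pos + 1):
            return pos + 1, attempts
        pos = s.rfind("n", start + 1, pos)
    return -1, attempts


def match_skip_to_last(s: str, start: int):
    # Only the last consumed position can be followed by \b, so check it alone.
    end = greedy_word_run(s, start)
    last = end - 1
    if last >= start + 1 and s[last] == "n":
        return end, 1
    return -1, 1


if __name__ == "__main__":
    for text in ["nation", "nnnnnnnnx", "an bn", "n"]:
        print(text, match_naive(text, 0), match_skip_to_last(text, 0))
```

For an input such as `nnnnnnnnx`, the naive strategy examines seven `n` positions before failing, while the skip-to-last strategy examines one; both agree on whether a match exists.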
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Dan Moseley <danmose@microsoft.com>
1 parent 4687f9a commit c05c6a4
5 files changed
Lines changed: 381 additions & 129 deletions
File tree
- src/libraries/System.Text.RegularExpressions
- gen
- src/System/Text/RegularExpressions
- tests/FunctionalTests
Lines changed: 49 additions & 9 deletions
@@ -3482,19 +3482,59 @@
Lines changed: 66 additions & 49 deletions
@@ -879,6 +879,72 @@
@@ -1127,55 +1193,6 @@
Lines changed: 92 additions & 33 deletions
@@ -3655,44 +3655,103 @@
@@ -3701,11 +3760,11 @@