Fix SORT_REGULAR with EG flag for transitive comparison#20315
Fix SORT_REGULAR with EG flag for transitive comparison#20315jmarble wants to merge 12 commits intophp:masterfrom
Conversation
6e73f5a to
7f9704c
Compare
Fixes phpGH-20262: SORT_REGULAR now uses transitive comparison for consistent sorting of mixed numeric/non-numeric strings and types. Added EG(transitive_compare_mode) flag with save/restore pattern to enforce ordering: numeric-types < numeric-strings < non-numeric. This fixes sort() and array_unique() inconsistencies with mixed types, nested arrays, and objects. Comparison operators unchanged.
7f9704c to
438b127
Compare
Empty strings must sort before numbers to match PHP 8+ semantics where '' < 5 is true. Updated compare_long_to_string(), compare_double_to_string(), and zendi_smart_strcmp() to handle empty string as a special case in transitive mode.
Without initialization, transitive_compare_mode had garbage values on Windows ZTS, causing the flag to leak into normal comparisons. Initialize to false in init_executor().
withinboredom
left a comment
There was a problem hiding this comment.
Nice! Some suggested additional tests that could help:
$a = ["", "0", "00", "A"];
$b = ["A", "00", "0", ""];
sort($a, SORT_REGULAR);
sort($b, SORT_REGULAR);
var_dump($a === $b); // expect true
// and expected order: ["", "0", "00", "A"]?
$a = [" 5", "+5", "-0", "0", "A"];
sort($a, SORT_REGULAR);
// expect numeric ones before "A", retaining numeric ordering: ["-0","0","+5"," 5","A"] and stable or no (+5 vs [space]5 insertion order)?
$a = ["5e2", "500", "NAN", "INF", "-INF"];
sort($a, SORT_REGULAR);
// So: ["500","5e2", "-INF","INF","NAN"]?
$a = ["0x10", "16", "0b10000"];
sort($a, SORT_REGULAR);
// expect: ["16", "0b10000", "0x10"]?
$a = [10, "3A", 5, "10", ""];
sort($a, SORT_REGULAR);
// expect: ["", 5, 10, "10", "3A"]?
$a = ["9223372036854775807", "9223372036854775808", 9223372036854775807];
sort($a, SORT_REGULAR);
// ensure consistency across LONG_MAX vs double promotionsSuggested-by: Rob Landers <[email protected]>
- Add gh20262.phpt: Bug reproduction test (replaces sort_regular_transitive*.phpt) - Add sort_variation_numeric_strings.phpt: Edge case tests for numeric strings Suggested-by: Rob Landers <[email protected]>
On x32, 9223372036854775807 becomes float instead of int.
|
Hi @jmarble. Sorry for the delayed response. One immediate concern is that this introduces an obvious inconsistency for key and value sorting. These two should remain consistent. $units = ['5', '3A'];
sort($units);
var_dump($units);
// array(2) {
// [0]=>
// string(1) "5"
// [1]=>
// string(2) "3A"
// }
$units = ['5' => 1, '3A' => 1];
ksort($units);
var_dump($units);
// array(2) {
// ["3A"]=>
// int(1)
// [5]=>
// int(1)
// }There's an obvious performance impact, given how common comparisons are (+0.4% instruction count as suggested by the Symfony Demo benchmark). Whether there's a true real-world impact remains to be seen. This is related to GH-9882. Enums are also plagued by them not being properly handled in I also wonder how this relates to @Girgias' efforts to make comparisons in PHP transitive. Maybe she can comment on that. |
|
Hi @iluuu1994 thank you for taking the time to review this so carefully. Admittedly, I got a bit too focused on resolving the issue I had run into. I'll push a commit here in a moment to resolve the discrepancy.
The CI benchmark was actually only +0.04%, so only 0.04ms on a hypothetical 100ms request scenario. I think this is acceptable for correctness. We could always try compensating with a performance improvement somewhere. I saw your email thread to internals regarding |
- Apply transitive comparison mode to ksort/krsort for consistency with sort(). - Reset transitive_compare_mode in _zend_bailout() to prevent mode leakage. - Add tests for ksort consistency and bailout scenarios.
@jmarble You're right, sorry. I mistyped. It's not completely insignificant (performance improvements are very hard to come by nowadays), but unlikely to be a blocker if we otherwise decide this is the right behavior. I do think it may be useful to consider consolidating comparison as a whole. Gina has done some work on that, as mentioned. Maybe we can solve this in a more consistent manner. If not, I don't object to your solution. The array_unique case specifically may be better served with a case that doesn't rely on sorting. Some people have suggested introducing a proper |
|
@iluuu1994 looks like we got a pretty favorable run on this last benchmark? A swing from +0.04% to -0.19% and faster across the board? Could be noise, or just more proof of negligible impact on performance.
I agree. Admittedly, I took the path of least resistance. I felt there was enough precedent for an EG flag.
Yes, a proper |
This test covers reentrancy cases with usort, uksort, sort, array_unique, and ksort.
This commit replaces manual comparisons with ZEND_THREEWAY_COMPARE to improve consistency. Additionally, it adds type-specific fast paths for integer, string, and double comparisons in array sorting.
Fixes CI error: "label followed by a declaration is a C23 extension"
|
For context, I had started thinking about what comparison semantics PHP should have and some of it is written down in one of my draft RFC: https://github.com/Girgias/php-rfcs/blob/master/comparison-equality-semantics.md However, not that comparison operators cannot be transitive, even if you just include numeric values due to In general I believe that getting rid of "ordering" for strings with the comparison operators is a better approach and have people use However, this doesn't really address |
|
@Girgias thank you for sharing your RFC draft on equality semantics -- you make a compelling case! To address your question: creating a standalone Ultimately, I felt this path of least resistance was acceptable given the amount of refactoring that would be necessary to implement a proper alternative. Perhaps this solution might at least inspire someone who might be willing to make such a refactor? In my recent commit, I found some opportunities to improve comparison performance. I noticed In real-world scenarios, I would expect no meaningful performance impact. However, the correctness bug this resolves is a demonstrable win. |
Refactors php_array_data_compare_unstable_i() to dereference before type checking and simplify control flow with if/else-if chain. This eliminates redundant mode toggling and improves branch prediction. Pass transitive_mode parameter to compare_long_to_string() and compare_double_to_string() to reduce repeated EG() calls.
|
@Girgias Thanks again for the earlier feedback -- it pushed me to try the approach you suggested. I’ve opened a new PR that leaves |
Problem
array_unique()withSORT_REGULARwas missing duplicates, andsort()was producing inconsistent results depending on input order when comparing mixed types (numeric strings, non-numeric strings, integers, doubles).The root cause: SORT_REGULAR was using PHP's comparison operators (which are non-transitive by design) directly in sorting algorithms that require transitive comparisons to function correctly.
When comparing these types, the comparison created cycles:
"5" < "10"(numeric: 5 < 10)"10" < "3A"(lexicographic: '1' < '3')"5" > "3A"(lexicographic: '5' > '3') Creates a cycleThis violated the fundamental requirement of sorting algorithms: the comparison function must be transitive. This affected not only scalar strings but also nested arrays and objects with string properties.
Solution
Added
EG(transitive_compare_mode)flag tozend_executor_globalsthat enforces transitive comparison during sorting operations, with consistent ordering that matches PHP 8+ semantics: empty-strings < numeric-types < numeric-strings < non-numeric.Modified three comparison functions to check this flag:
zendi_smart_strcmp()- handles string-to-string comparisonscompare_long_to_string()- handles integer-to-string comparisonscompare_double_to_string()- handles double-to-string comparisonsThe flag is properly initialized in
init_executor()and uses save/restore pattern inphp_array_data_compare_unstable_i()to handle reentrancy correctly.Important: This fix does not change the behavior of comparison operators like
<=>, maintaining backward compatibility. The fix only affects sorting and array operations with SORT_REGULAR.Tests
Added comprehensive tests:
gh20262.phpt- Bug reproduction test covering scalars, objects, and nested arraysgh20262_bailout.phpt- Tests bailout handling to prevent mode leakagegh20262_reentrancy.phpt- Tests reentrancy handling in comparisonssort/sort_variation_numeric_strings.phpt- Edge case tests for numeric string handlingsort/ksort_variation_numeric_strings.phpt- Tests transitive mode in key sortingAll existing array sorting tests pass without modification!
Fixes #20262