Fix #20262: SORT_REGULAR transitivity violation with mixed numeric/non-numeric strings#20305
Fix #20262: SORT_REGULAR transitivity violation with mixed numeric/non-numeric strings#20305jmarble wants to merge 1 commit intophp:masterfrom
Conversation
Fixed transitivity violation in SORT_REGULAR comparison that caused sort() and array_unique() to produce incorrect results with mixed numeric and non-numeric strings. The fix adds special handling in php_array_data_compare_unstable_i() to consistently order non-numeric strings before numeric strings when one is numeric and one is not. This ensures transitivity for sorting operations. Importantly, this fix does NOT change the behavior of comparison operators like <=>, maintaining backward compatibility. The fix only affects sorting and array_unique() operations with SORT_REGULAR, similar to how enum comparison is handled specially for sorting.
| @@ -0,0 +1,99 @@ | |||
| --TEST-- | |||
| GH-20262: array_unique() with SORT_REGULAR fails to identify duplicates with mixed numeric/alphanumeric strings | |||
There was a problem hiding this comment.
Might be good idea to verify this PR using naive impl. using 2 loops and == operator and random data.
There was a problem hiding this comment.
Here's a simple test with a small amount of random data, looped 1k times:
https://gist.github.com/jmarble/c7debbadbca5f2dc21a160f0a48a65af
|
I tired to take this strategy one step further, to apply a similar fix for arrays and objects -- I was able to fix the transitivity bug with arrays but I could not find a solution for objects. We could add a context parameter to |
|
Closing in favor of #20315 which provides a complete solution that fixes transitivity for all mixed types (scalars, nested arrays, and objects). |
Problem
array_unique()withSORT_REGULARwas missing duplicates, andsort()was incorrectly ordering equal values due to a transitivity violation in string comparison.When comparing numeric strings (like
'5','10') with non-numeric strings (like'3A'), the comparison function fell back to lexicographic comparison, which violated transitivity:'5' < '10'(numeric: 5 < 10)'10' < '3A'(lexicographic: '1' < '3')'5' > '3A'(lexicographic: '5' > '3') Violates transitivityThis caused equal values to not be grouped together in sorted arrays.
Solution
Fixed
php_array_data_compare_unstable_i()inext/standard/array.cto consistently order non-numeric strings before numeric strings when one is numeric and one is not, ensuring transitivity for all SORT_REGULAR operations.Important: This fix does not change the behavior of comparison operators like
<=>, maintaining backward compatibility. The fix only affects sorting and array operations with SORT_REGULAR, similar to how enum comparison is handled specially for sorting. This fix also does not correct the underlying issue as it affects object and nested arrays.Test Updates Required
The following tests need their expected output updated to reflect the new transitive sort order:
ext/standard/tests/array/sort/array_multisort_basic1.phptext/standard/tests/array/sort/array_multisort_variation5.phptext/standard/tests/array/sort/array_multisort_variation6.phptTests marked as "(OK to fail as result is unpredictable)" (arsort_variation11, asort_variation11, sort_variation11, rsort_variation11) should maybe still be updated for consistency.