perf: Use batched row conversion for array_has_any, array_has_all #20588
neilconway wants to merge 5 commits into apache:main
Benchmarks: It's a significant win for short arrays, and a small win for large arrays. For large arrays, the N*M comparison cost probably dominates. We should be able to do something smarter with hashing; I'll look at that shortly, but in a separate PR.
🤖: Benchmark completed
This PR doesn't touch the
I've been wondering that myself. That was run on my personal machine, and since the benchmarks are largely single-threaded I wouldn't expect much variability in the results, but the numbers do seem off. I'll try another run later; otherwise I can see about running them on the EC2 instance I have.
🤖: Benchmark completed
The above was run on my EC2 dev instance with nothing else running. It's as stable a benchmark as I can realistically get.
Interesting. The regressions are mostly in the 500 element benchmarks. I didn't see similar behavior on my local dev box (M4 Max). I'll do a run on a cloud dev box and see if I can repro the results. |
Here's what I get on a Hetzner cloud box (cax31): So we do indeed see some regressions for large arrays. I'm not entirely sure why that would be. I suppose for 10k rows * 500 elements we end up pushing a lot more data out of L1/L2, whereas the previous approach uses a smaller working set. I'm surprised that the effect is that pronounced, though. Let me try doing the row conversion in smaller batches and see if that helps.
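The working-set guess above can be checked with back-of-envelope arithmetic. The numbers below are illustrative assumptions (10k rows, 500 elements per row, ~8 bytes per converted i64 value), not measurements from the PR:

```rust
fn main() {
    // Hypothetical sizes, chosen to match the benchmark shape discussed above.
    let rows: u64 = 10_000;
    let elems_per_row: u64 = 500;
    let bytes_per_elem: u64 = 8; // rough cost of one converted i64 value

    let working_set = rows * elems_per_row * bytes_per_elem;
    let mib = working_set / (1024 * 1024);
    println!("full-batch converted rows: ~{} MiB", mib);

    // ~38 MiB is far larger than a typical 1-2 MiB per-core L2, so converting
    // the whole batch up front evicts hot data; converting row-by-row (or in
    // small chunks) keeps the working set cache-resident.
    assert!(mib > 2);
}
```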
That is interesting. I ran the benchmark on an m7i.4xlarge, which is an Intel Sapphire Rapids machine. I'm curious why we're seeing it on cloud machines but not on your M4 Mac; IIRC, L2 cache is shared on the M4 Max, which might be a reason for the difference. I guess running benchmarks on cloud hardware is worth it, considering that most deployments of DataFusion will run on server-grade hardware rather than consumer hardware, even though the consumer hardware may be faster in some cases.
Alright, I implemented a variant where we do row conversion in chunks of 256 rows. Here are the results on the Hetzner box: Happily, this seems to address the regressions we saw on large arrays with the initial approach. Less happily, 256-row chunking performs slightly worse than full-batch row conversion on my M4 Max machine, although interestingly the regressions are only for the i64 benchmarks: the string benchmarks were much closer and basically in the noise. Avoiding the regressions on large arrays seems worth the small performance hit on M4 machines, but it's probably worth exploring a bigger chunk size and seeing if that helps at all.
Here are the results on the Hetzner machine with 512-row chunks: I'm inclined to go with 512-row chunking: it seems to reduce cache pressure sufficiently, while doing half as many row-conversion calls as 256-row chunking. I've updated the PR with that approach.
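The chunking pattern described above can be sketched as follows. This is a hedged simplification: `convert_chunk` is a stand-in for the actual arrow row conversion in the PR, and the per-row predicate is a placeholder; only the fixed-size chunking structure is the point.

```rust
/// Chunk size chosen in the PR discussion: large enough to amortize
/// per-conversion-call overhead, small enough that one chunk's converted
/// rows stay cache-resident.
const CHUNK_SIZE: usize = 512;

/// Stand-in for row conversion (the real code would call arrow's
/// RowConverter once per chunk rather than once per row).
fn convert_chunk(chunk: &[i64]) -> Vec<i64> {
    chunk.to_vec()
}

/// Process a whole batch in 512-row chunks, one conversion call per chunk.
fn process_batch(rows: &[i64]) -> Vec<bool> {
    let mut out = Vec::with_capacity(rows.len());
    for chunk in rows.chunks(CHUNK_SIZE) {
        let converted = convert_chunk(chunk);
        // Placeholder per-row predicate, indexing into the converted chunk.
        out.extend(converted.iter().map(|v| v % 2 == 0));
    }
    out
}

fn main() {
    let input: Vec<i64> = (0..10_000).collect();
    let result = process_batch(&input);
    assert_eq!(result.len(), 10_000);
    assert!(result[0] && !result[1]);
    println!("processed {} rows", result.len());
}
```

A 10,000-row batch becomes 20 conversion calls instead of one huge one (or 10,000 tiny ones), which is the trade-off the comment describes.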
Which issue does this PR close?
`array_has_any`, `array_has_all` #20587

Rationale for this change
`array_has_any` and `array_has_all` called `RowConverter::convert_columns` twice for every input row. `convert_columns` has a lot of per-call overhead: allocating a new `Rows` buffer, doing various schema checking, and so on. It is considerably more efficient to use the `RowConverter` twice up front and convert all of the haystack and needle inputs in bulk. We can then implement the `has_any`/`has_all` predicate comparison by indexing into the converted rows.

`array_has_any`/`array_has_all` had a special case for strings, but it had an analogous problem: it iterated over rows, materialized each row's inner list, and then called `string_array_to_vec` twice per row. That does a lot of per-row work; it is significantly faster to call `string_array_to_vec` on all input rows at once, and then index into the results to implement the per-row comparisons.

What changes are included in this PR?
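The bulk-convert-then-index idea can be sketched in a few lines. This is an illustrative simplification, not DataFusion's code: `encode` stands in for a single `RowConverter::convert_columns` call over the whole input (here it just clones strings), and the comparable encoded values stand in for arrow `Row`s.

```rust
/// Stand-in for one bulk RowConverter::convert_columns call: convert
/// every value in a single pass, rather than once per input row.
fn encode(vals: &[&str]) -> Vec<String> {
    vals.iter().map(|v| v.to_string()).collect()
}

/// has_any: does the haystack contain at least one needle?
/// Implemented purely by indexing into already-converted values.
fn has_any(haystack: &[String], needles: &[String]) -> bool {
    needles.iter().any(|n| haystack.contains(n))
}

/// has_all: does the haystack contain every needle?
fn has_all(haystack: &[String], needles: &[String]) -> bool {
    needles.iter().all(|n| haystack.contains(n))
}

fn main() {
    // Convert both inputs up front (two bulk calls total), then run the
    // N*M comparisons against the converted values.
    let haystack = encode(&["a", "b", "c"]);
    let needles = encode(&["c", "d"]);
    assert!(has_any(&haystack, &needles));
    assert!(!has_all(&haystack, &needles));
    println!("ok");
}
```

The per-call overhead (buffer allocation, schema checks) is paid twice per batch instead of twice per row, which is the win the description claims; the N*M comparison loop itself is unchanged.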
Are these changes tested?
Yes.
Are there any user-facing changes?
No.