perf: Use batched row conversion for array_has_any, array_has_all#20588

Open
neilconway wants to merge 5 commits into apache:main from
neilconway:neilc/optimize-array-has-any-all-rowconvert

@neilconway
Contributor

@neilconway neilconway commented Feb 27, 2026

Which issue does this PR close?

Rationale for this change

array_has_any and array_has_all called RowConverter::convert_columns twice for every input row. convert_columns has a lot of per-call overhead: allocating a new Rows buffer, doing various schema checking, and so on.

It is considerably more efficient to use RowConverter twice up front and convert all of the haystack and needle inputs in bulk. We can then implement the has_any / has_all predicate comparison by indexing into the converted rows.

array_has_any / array_has_all had a special-case for strings, but it had an analogous problem: it iterated over rows, materialized each row's inner list, and then called string_array_to_vec twice per row. That does a lot of per-row work; it is significantly faster to call string_array_to_vec on all input rows at once, and then index into the results to implement the per-row comparisons.
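
To illustrate the batching idea described above, here is a minimal std-only sketch. It is not the code in this PR: `ListColumn`, `encode`, and `has_any_batched` are hypothetical names, `encode` is only a stand-in for row conversion (it does not use arrow's `RowConverter`), nulls are ignored, and both columns are assumed to have the same number of rows.

```rust
/// Toy list column: row `i` owns `values[offsets[i]..offsets[i + 1]]`.
/// Assumes `offsets` is non-empty and monotonically increasing.
pub struct ListColumn {
    pub values: Vec<i64>,
    pub offsets: Vec<usize>,
}

/// Hypothetical stand-in for row conversion: encode every child value once.
/// Flipping the sign bit and using big-endian bytes keeps byte order
/// consistent with numeric order.
fn encode(values: &[i64]) -> Vec<[u8; 8]> {
    values
        .iter()
        .map(|v| ((*v as u64) ^ (1u64 << 63)).to_be_bytes())
        .collect()
}

/// For each row: does the haystack list contain any element of the needle list?
pub fn has_any_batched(haystack: &ListColumn, needle: &ListColumn) -> Vec<bool> {
    // Two conversions in total, independent of the number of rows.
    let h_rows = encode(&haystack.values);
    let n_rows = encode(&needle.values);

    let num_rows = haystack.offsets.len() - 1;
    (0..num_rows)
        .map(|i| {
            // Per-row work is just slicing into the already-converted buffers.
            let h = &h_rows[haystack.offsets[i]..haystack.offsets[i + 1]];
            let n = &n_rows[needle.offsets[i]..needle.offsets[i + 1]];
            n.iter().any(|x| h.contains(x))
        })
        .collect()
}
```

The `has_all` variant would differ only in using `all` instead of `any` for the per-row predicate.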

What changes are included in this PR?

  • Implement optimization
  • Improve test coverage for sliced arrays; not strictly related to this PR but more coverage for this codepath made me feel more comfortable

Are these changes tested?

Yes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the functions (Changes to functions implementation) label Feb 27, 2026
@neilconway
Contributor Author

neilconway commented Feb 27, 2026

Benchmarks:

   RowConverter path (i64)

  ┌─────────────────────────────┬──────┬──────────┬──────────┬────────┐
  │          Benchmark          │ Size │   Main   │  Branch  │ Change │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all/all_found     │ 10   │ 3.03 ms  │ 0.65 ms  │ -78%   │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all/not_all_found │ 10   │ 2.74 ms  │ 0.47 ms  │ -83%   │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any/some_match    │ 10   │ 2.83 ms  │ 0.57 ms  │ -80%   │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any/no_match      │ 10   │ 3.35 ms  │ 1.04 ms  │ -69%   │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all/all_found     │ 100  │ 7.47 ms  │ 4.86 ms  │ -35%   │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all/not_all_found │ 100  │ 6.78 ms  │ 4.07 ms  │ -40%   │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any/some_match    │ 100  │ 7.11 ms  │ 4.60 ms  │ -35%   │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any/no_match      │ 100  │ 11.95 ms │ 9.47 ms  │ -21%   │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all/all_found     │ 500  │ 33.35 ms │ 32.10 ms │ -4%    │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all/not_all_found │ 500  │ 29.57 ms │ 28.59 ms │ -3%    │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any/some_match    │ 500  │ 30.80 ms │ 29.64 ms │ -4%    │
  ├─────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any/no_match      │ 500  │ 59.78 ms │ 59.89 ms │ ~0%    │
  └─────────────────────────────┴──────┴──────────┴──────────┴────────┘

  String path

  ┌─────────────────────────────────────┬──────┬──────────┬──────────┬────────┐
  │              Benchmark              │ Size │   Main   │  Branch  │ Change │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all_strings/all_found     │ 10   │ 1.92 ms  │ 1.17 ms  │ -39%   │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all_strings/not_all_found │ 10   │ 1.41 ms  │ 0.71 ms  │ -50%   │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any_strings/some_match    │ 10   │ 1.74 ms  │ 1.04 ms  │ -40%   │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any_strings/no_match      │ 10   │ 1.96 ms  │ 1.28 ms  │ -35%   │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all_strings/all_found     │ 100  │ 6.35 ms  │ 5.50 ms  │ -13%   │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all_strings/not_all_found │ 100  │ 5.55 ms  │ 4.73 ms  │ -15%   │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any_strings/some_match    │ 100  │ 5.75 ms  │ 4.97 ms  │ -14%   │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any_strings/no_match      │ 100  │ 9.65 ms  │ 8.79 ms  │ -9%    │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all_strings/all_found     │ 500  │ 28.05 ms │ 26.96 ms │ -4%    │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_all_strings/not_all_found │ 500  │ 30.38 ms │ 29.47 ms │ -3%    │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any_strings/some_match    │ 500  │ 25.01 ms │ 24.04 ms │ -4%    │
  ├─────────────────────────────────────┼──────┼──────────┼──────────┼────────┤
  │ array_has_any_strings/no_match      │ 500  │ 58.31 ms │ 57.32 ms │ -2%    │
  └─────────────────────────────────────┴──────┴──────────┴──────────┴────────┘

It's a significant win for short arrays and a small win for large arrays. For large arrays, the N*M comparison cost probably dominates. We should be able to do something smarter with hashing; I'll look at that shortly, but in a separate PR.
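
For reference, a minimal sketch of what a hash-based comparison could look like next to the nested-loop one. This is only one possible shape of the idea, not what this PR implements; the function names are hypothetical, and it works on plain `i64` slices for a single row rather than on converted rows.

```rust
use std::collections::HashSet;

/// Nested-loop check: up to N * M comparisons for a haystack of N
/// elements and a needle of M elements.
pub fn has_any_nested(haystack: &[i64], needle: &[i64]) -> bool {
    needle.iter().any(|n| haystack.contains(n))
}

/// Hash-based check: build a set from the haystack once, then probe it
/// with each needle element. Expected cost is O(N + M).
pub fn has_any_hashed(haystack: &[i64], needle: &[i64]) -> bool {
    let set: HashSet<i64> = haystack.iter().copied().collect();
    needle.iter().any(|n| set.contains(n))
}

/// Same idea for `has_all`: every needle element must be in the set.
pub fn has_all_hashed(haystack: &[i64], needle: &[i64]) -> bool {
    let set: HashSet<i64> = haystack.iter().copied().collect();
    needle.iter().all(|n| set.contains(n))
}
```

Building the set has its own cost, so for very short arrays the nested loop may well remain faster; where the crossover lies would need measuring.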

@Omega359
Contributor

Omega359 commented Mar 2, 2026

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux fedora 6.18.13-200.fc43.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Feb 19 19:54:01 UTC 2026 x86_64 GNU/Linux
Comparing neilc/optimize-array-has-any-all-rowconvert (396bec0) to a257c29 diff
BENCH_NAME=array_has
BENCH_COMMAND=cargo bench --bench array_has
BENCH_FILTER=
BENCH_BRANCH_NAME=neilc_optimize-array-has-any-all-rowconvert
Results will be posted here when complete

@Omega359
Contributor

Omega359 commented Mar 2, 2026

🤖: Benchmark completed

group                                          main                                   neilc_optimize-array-has-any-all-rowconvert
-----                                          ----                                   -------------------------------------------
array_has_all/all_found_small_needle/10        4.32      2.5±0.01ms        ? ?/sec    1.00    587.3±1.28µs        ? ?/sec
array_has_all/all_found_small_needle/100       1.34      6.0±0.13ms        ? ?/sec    1.00      4.5±0.15ms        ? ?/sec
array_has_all/all_found_small_needle/500       1.00     21.0±0.11ms        ? ?/sec    1.70     35.7±0.83ms        ? ?/sec
array_has_all/not_all_found/10                 6.57      2.4±0.02ms        ? ?/sec    1.00    365.1±5.59µs        ? ?/sec
array_has_all/not_all_found/100                1.40      5.4±0.04ms        ? ?/sec    1.00      3.9±0.06ms        ? ?/sec
array_has_all/not_all_found/500                1.00     19.1±0.17ms        ? ?/sec    1.68     32.1±1.57ms        ? ?/sec
array_has_all_strings/all_found/10             2.30      2.2±0.05ms        ? ?/sec    1.00   940.4±16.17µs        ? ?/sec
array_has_all_strings/all_found/100            1.19      7.1±0.22ms        ? ?/sec    1.00      5.9±0.55ms        ? ?/sec
array_has_all_strings/all_found/500            1.00     30.1±0.76ms        ? ?/sec    1.33     40.1±0.38ms        ? ?/sec
array_has_all_strings/not_all_found/10         2.97  1809.3±19.42µs        ? ?/sec    1.00   608.9±22.77µs        ? ?/sec
array_has_all_strings/not_all_found/100        1.31      6.4±0.13ms        ? ?/sec    1.00      4.9±0.14ms        ? ?/sec
array_has_all_strings/not_all_found/500        1.00     33.5±0.44ms        ? ?/sec    1.27     42.5±0.42ms        ? ?/sec
array_has_any/no_match/10                      4.21      2.7±0.01ms        ? ?/sec    1.00    631.2±7.30µs        ? ?/sec
array_has_any/no_match/100                     1.21      7.9±0.06ms        ? ?/sec    1.00      6.5±0.51ms        ? ?/sec
array_has_any/no_match/500                     1.00     31.1±0.24ms        ? ?/sec    1.44     44.7±0.70ms        ? ?/sec
array_has_any/scalar_no_match/10               1.00    637.4±5.38µs        ? ?/sec    1.02    650.4±8.17µs        ? ?/sec
array_has_any/scalar_no_match/100              1.00      7.2±0.27ms        ? ?/sec    1.82     13.1±0.46ms        ? ?/sec
array_has_any/scalar_no_match/500              1.00     48.5±2.17ms        ? ?/sec    1.61     78.0±0.37ms        ? ?/sec
array_has_any/scalar_some_match/10             1.00    448.9±2.41µs        ? ?/sec    1.03    462.1±5.77µs        ? ?/sec
array_has_any/scalar_some_match/100            1.00      4.7±0.58ms        ? ?/sec    1.28      6.0±0.10ms        ? ?/sec
array_has_any/scalar_some_match/500            1.00     38.4±3.46ms        ? ?/sec    1.16     44.7±2.10ms        ? ?/sec
array_has_any/some_match/10                    4.70      2.4±0.01ms        ? ?/sec    1.00    518.2±6.22µs        ? ?/sec
array_has_any/some_match/100                   1.24      5.5±0.03ms        ? ?/sec    1.00      4.4±0.12ms        ? ?/sec
array_has_any/some_match/500                   1.00     19.1±0.43ms        ? ?/sec    1.78     34.1±0.83ms        ? ?/sec
array_has_any_scalar/i64_no_match/1            1.02    131.9±2.03µs        ? ?/sec    1.00    129.3±4.23µs        ? ?/sec
array_has_any_scalar/i64_no_match/10           1.06    160.4±4.41µs        ? ?/sec    1.00    151.0±4.30µs        ? ?/sec
array_has_any_scalar/i64_no_match/100          1.01    220.6±7.16µs        ? ?/sec    1.00    218.6±8.77µs        ? ?/sec
array_has_any_scalar/i64_no_match/1000         1.05    194.8±5.54µs        ? ?/sec    1.00    185.7±7.38µs        ? ?/sec
array_has_any_scalar/string_no_match/1         1.03    114.1±1.98µs        ? ?/sec    1.00    110.5±1.68µs        ? ?/sec
array_has_any_scalar/string_no_match/10        1.04    163.0±5.42µs        ? ?/sec    1.00    156.6±4.13µs        ? ?/sec
array_has_any_scalar/string_no_match/100       1.03   217.8±10.17µs        ? ?/sec    1.00    212.1±7.81µs        ? ?/sec
array_has_any_scalar/string_no_match/1000      1.05    172.9±5.74µs        ? ?/sec    1.00    164.5±5.54µs        ? ?/sec
array_has_any_strings/no_match/10              2.07      2.2±0.11ms        ? ?/sec    1.00  1041.5±36.23µs        ? ?/sec
array_has_any_strings/no_match/100             1.19      9.7±0.39ms        ? ?/sec    1.00      8.2±0.02ms        ? ?/sec
array_has_any_strings/no_match/500             1.00     56.9±0.81ms        ? ?/sec    1.19     68.0±0.72ms        ? ?/sec
array_has_any_strings/scalar_no_match/10       1.03    307.3±2.17µs        ? ?/sec    1.00    297.8±9.04µs        ? ?/sec
array_has_any_strings/scalar_no_match/100      1.01      2.8±0.02ms        ? ?/sec    1.00      2.8±0.10ms        ? ?/sec
array_has_any_strings/scalar_no_match/500      1.00     33.3±0.71ms        ? ?/sec    1.04     34.5±1.05ms        ? ?/sec
array_has_any_strings/scalar_some_match/10     1.06    339.4±7.53µs        ? ?/sec    1.00   320.8±17.19µs        ? ?/sec
array_has_any_strings/scalar_some_match/100    1.03  1542.3±45.67µs        ? ?/sec    1.00  1494.7±259.82µs        ? ?/sec
array_has_any_strings/scalar_some_match/500    1.01      6.4±0.52ms        ? ?/sec    1.00      6.4±0.46ms        ? ?/sec
array_has_any_strings/some_match/10            2.67      2.2±0.18ms        ? ?/sec    1.00   825.7±21.50µs        ? ?/sec
array_has_any_strings/some_match/100           1.15      6.4±0.25ms        ? ?/sec    1.00      5.6±0.18ms        ? ?/sec
array_has_any_strings/some_match/500           1.00     26.1±1.76ms        ? ?/sec    1.38     36.0±0.71ms        ? ?/sec
array_has_i64/found/10                         1.00     53.2±0.33µs        ? ?/sec    1.13     59.9±0.57µs        ? ?/sec
array_has_i64/found/100                        1.00    310.1±1.57µs        ? ?/sec    1.26    391.7±4.26µs        ? ?/sec
array_has_i64/found/500                        1.00  1577.5±79.52µs        ? ?/sec    1.22  1924.5±89.09µs        ? ?/sec
array_has_i64/not_found/10                     1.00     43.1±0.29µs        ? ?/sec    1.19     51.1±0.28µs        ? ?/sec
array_has_i64/not_found/100                    1.00    283.7±3.81µs        ? ?/sec    1.27    361.6±1.07µs        ? ?/sec
array_has_i64/not_found/500                    1.00  1516.6±74.76µs        ? ?/sec    1.21  1828.0±85.93µs        ? ?/sec
array_has_strings/found/10                     1.09    355.8±6.40µs        ? ?/sec    1.00   327.6±11.75µs        ? ?/sec
array_has_strings/found/100                    1.12  1452.2±60.23µs        ? ?/sec    1.00  1302.1±16.93µs        ? ?/sec
array_has_strings/found/500                    1.51      7.8±0.73ms        ? ?/sec    1.00      5.2±0.38ms        ? ?/sec
array_has_strings/not_found/10                 1.25     59.7±0.77µs        ? ?/sec    1.00     47.7±0.08µs        ? ?/sec
array_has_strings/not_found/100                1.04      2.6±0.31ms        ? ?/sec    1.00      2.4±0.11ms        ? ?/sec
array_has_strings/not_found/500                1.13     11.7±0.19ms        ? ?/sec    1.00     10.3±0.12ms        ? ?/sec

@neilconway
Contributor Author

neilconway commented Mar 2, 2026

This PR doesn't touch the array_has codepath at all, and yet the benchmark run claims fairly wide swings (51%, 27%, 25%, etc.) on some of those benchmarks... I wonder if the benchmark setup needs a longer warm up or perhaps more runs to get more reliable results.
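
As a quick check on where such percentages come from: in the comparison output above, the ratio column is each side's time relative to the faster side, so a swing is that ratio minus one. A small sketch of the arithmetic (the function name is hypothetical; the sample timings are copied from the table above):

```rust
/// Percentage by which `slower` exceeds `faster` (both in the same unit).
pub fn relative_slowdown_pct(slower: f64, faster: f64) -> f64 {
    (slower / faster - 1.0) * 100.0
}
```

For example, `relative_slowdown_pct(361.6, 283.7)` (the array_has_i64/not_found/100 row) rounds to 27, and `relative_slowdown_pct(59.7, 47.7)` (array_has_strings/not_found/10) rounds to 25. The displayed timings are rounded, so recomputing from them can differ slightly from the printed ratio, as with 7.8 ms vs. 5.2 ms against a printed 1.51.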

@Omega359
Contributor

Omega359 commented Mar 2, 2026

> This PR doesn't touch the array_has codepath at all, and yet the benchmark run claims fairly wide swings (51%, 27%, 25%, etc.) on some of those benchmarks... I wonder if the benchmark setup needs a longer warm up or perhaps more runs to get more reliable results.

I've been wondering that myself. That was run on my personal machine, and since the benchmarks are largely single-threaded I wouldn't expect much variability in the results, but the numbers do seem off. I'll try another run later; otherwise I can see about running them on an EC2 instance I have.

Member

@martin-g martin-g left a comment

Could you please add a test with a sliced FixedSizeList?
It is the only List impl that has special logic for offsets(), and a test would make us sleep better!
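
To show why slicing matters for this case, here is a toy model of a fixed-size list. It is a sketch under stated assumptions, not arrow's `FixedSizeListArray`: the struct and its methods are hypothetical, and the only thing it demonstrates is that rows carry no explicit offsets, so a slice has to fold its starting row into the computed element range.

```rust
/// Toy fixed-size list column (hypothetical; not arrow's implementation).
/// There is no offsets buffer: row boundaries are computed from `size`.
pub struct FixedSizeList<'a> {
    values: &'a [i64], // child values shared with the unsliced column
    size: usize,       // elements per row
    offset: usize,     // index of the first row visible in this slice
    len: usize,        // number of rows visible in this slice
}

impl<'a> FixedSizeList<'a> {
    pub fn new(values: &'a [i64], size: usize) -> Self {
        assert!(size > 0 && values.len() % size == 0);
        Self { values, size, offset: 0, len: values.len() / size }
    }

    /// Zero-copy slice: only `offset` and `len` change.
    pub fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len);
        Self { values: self.values, size: self.size, offset: self.offset + offset, len }
    }

    /// Elements of row `i`; forgetting `self.offset` here would silently
    /// read rows from the start of the unsliced column.
    pub fn row(&self, i: usize) -> &'a [i64] {
        assert!(i < self.len);
        let start = (self.offset + i) * self.size;
        &self.values[start..start + self.size]
    }

    pub fn len(&self) -> usize {
        self.len
    }
}
```

With child values `[1, 2, 3, 4, 5, 6, 7, 8]` and `size = 2`, `slice(1, 2)` exposes the rows `[3, 4]` and `[5, 6]`; a test on a sliced input is what catches code that assumes row 0 starts at element 0.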

@Omega359
Contributor

Omega359 commented Mar 3, 2026

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux ip-10-150-132-212 6.17.0-1007-aws #7-Ubuntu SMP Thu Jan 22 19:26:05 UTC 2026 x86_64 GNU/Linux
Comparing neilc/optimize-array-has-any-all-rowconvert (396bec0) to a257c29 diff
BENCH_NAME=array_has
BENCH_COMMAND=cargo bench --bench array_has
BENCH_FILTER=
BENCH_BRANCH_NAME=neilc_optimize-array-has-any-all-rowconvert
Results will be posted here when complete

@Omega359
Contributor

Omega359 commented Mar 3, 2026

🤖: Benchmark completed

group                                          main                                   neilc_optimize-array-has-any-all-rowconvert
-----                                          ----                                   -------------------------------------------
array_has_all/all_found_small_needle/10        3.92      3.5±0.01ms        ? ?/sec    1.00    895.8±7.32µs        ? ?/sec
array_has_all/all_found_small_needle/100       1.33      8.0±0.01ms        ? ?/sec    1.00      6.0±0.07ms        ? ?/sec
array_has_all/all_found_small_needle/500       1.00     30.2±0.14ms        ? ?/sec    1.79     54.0±0.56ms        ? ?/sec
array_has_all/not_all_found/10                 5.10      3.3±0.01ms        ? ?/sec    1.00    642.3±3.51µs        ? ?/sec
array_has_all/not_all_found/100                1.35      7.3±0.01ms        ? ?/sec    1.00      5.4±0.04ms        ? ?/sec
array_has_all/not_all_found/500                1.00     27.8±0.41ms        ? ?/sec    1.83     51.0±0.72ms        ? ?/sec
array_has_all_strings/all_found/10             2.53      3.4±0.01ms        ? ?/sec    1.00   1362.7±6.88µs        ? ?/sec
array_has_all_strings/all_found/100            1.28      9.7±0.05ms        ? ?/sec    1.00      7.5±0.03ms        ? ?/sec
array_has_all_strings/all_found/500            1.00     40.9±0.21ms        ? ?/sec    1.49     60.9±0.48ms        ? ?/sec
array_has_all_strings/not_all_found/10         2.69      2.4±0.01ms        ? ?/sec    1.00   905.8±19.14µs        ? ?/sec
array_has_all_strings/not_all_found/100        1.46      9.8±0.04ms        ? ?/sec    1.00      6.7±0.04ms        ? ?/sec
array_has_all_strings/not_all_found/500        1.00     44.8±0.17ms        ? ?/sec    1.45     65.2±0.67ms        ? ?/sec
array_has_any/no_match/10                      3.02      3.6±0.01ms        ? ?/sec    1.00  1183.3±13.12µs        ? ?/sec
array_has_any/no_match/100                     1.07     11.7±0.03ms        ? ?/sec    1.00     10.9±0.56ms        ? ?/sec
array_has_any/no_match/500                     1.00     49.6±0.56ms        ? ?/sec    1.54     76.2±1.60ms        ? ?/sec
array_has_any/scalar_no_match/10               1.02   1188.5±8.11µs        ? ?/sec    1.00  1162.1±13.34µs        ? ?/sec
array_has_any/scalar_no_match/100              1.00     11.3±0.01ms        ? ?/sec    1.00     11.3±0.03ms        ? ?/sec
array_has_any/scalar_no_match/500              1.02     83.9±0.69ms        ? ?/sec    1.00     82.3±1.02ms        ? ?/sec
array_has_any/scalar_some_match/10             1.00    682.4±3.92µs        ? ?/sec    1.02   693.0±18.16µs        ? ?/sec
array_has_any/scalar_some_match/100            1.00      5.8±0.02ms        ? ?/sec    1.00      5.8±0.04ms        ? ?/sec
array_has_any/scalar_some_match/500            1.05     56.8±0.72ms        ? ?/sec    1.00     54.2±1.36ms        ? ?/sec
array_has_any/some_match/10                    4.28      3.3±0.02ms        ? ?/sec    1.00    778.3±6.39µs        ? ?/sec
array_has_any/some_match/100                   1.33      7.7±0.04ms        ? ?/sec    1.00      5.8±0.08ms        ? ?/sec
array_has_any/some_match/500                   1.00     29.8±0.59ms        ? ?/sec    1.78     52.9±0.53ms        ? ?/sec
array_has_any_scalar/i64_no_match/1            1.08    214.1±1.34µs        ? ?/sec    1.00    197.7±1.41µs        ? ?/sec
array_has_any_scalar/i64_no_match/10           1.00    234.9±9.23µs        ? ?/sec    1.00    234.3±7.88µs        ? ?/sec
array_has_any_scalar/i64_no_match/100          1.02   335.4±17.80µs        ? ?/sec    1.00   328.9±17.53µs        ? ?/sec
array_has_any_scalar/i64_no_match/1000         1.00    280.1±7.92µs        ? ?/sec    1.02    285.5±9.44µs        ? ?/sec
array_has_any_scalar/string_no_match/1         1.02    162.2±1.45µs        ? ?/sec    1.00    159.1±0.93µs        ? ?/sec
array_has_any_scalar/string_no_match/10        1.02    233.0±6.74µs        ? ?/sec    1.00    227.7±6.98µs        ? ?/sec
array_has_any_scalar/string_no_match/100       1.00   298.5±11.34µs        ? ?/sec    1.00   299.7±13.40µs        ? ?/sec
array_has_any_scalar/string_no_match/1000      1.01    245.5±6.70µs        ? ?/sec    1.00    243.0±6.24µs        ? ?/sec
array_has_any_strings/no_match/10              2.09      3.5±0.02ms        ? ?/sec    1.00   1655.1±8.80µs        ? ?/sec
array_has_any_strings/no_match/100             1.18     13.8±0.05ms        ? ?/sec    1.00     11.6±0.06ms        ? ?/sec
array_has_any_strings/no_match/500             1.00     78.9±0.19ms        ? ?/sec    1.26     99.3±0.52ms        ? ?/sec
array_has_any_strings/scalar_no_match/10       1.00    559.1±2.95µs        ? ?/sec    1.06    590.7±6.64µs        ? ?/sec
array_has_any_strings/scalar_no_match/100      1.00      4.7±0.02ms        ? ?/sec    1.02      4.7±0.03ms        ? ?/sec
array_has_any_strings/scalar_no_match/500      1.00     48.3±0.09ms        ? ?/sec    1.14     54.9±0.17ms        ? ?/sec
array_has_any_strings/scalar_some_match/10     1.00    450.1±4.62µs        ? ?/sec    1.09    492.8±2.35µs        ? ?/sec
array_has_any_strings/scalar_some_match/100    1.00      2.3±0.01ms        ? ?/sec    1.05      2.4±0.01ms        ? ?/sec
array_has_any_strings/scalar_some_match/500    1.00      8.9±0.04ms        ? ?/sec    1.02      9.0±0.02ms        ? ?/sec
array_has_any_strings/some_match/10            2.50      3.0±0.01ms        ? ?/sec    1.00   1213.5±8.73µs        ? ?/sec
array_has_any_strings/some_match/100           1.28      8.6±0.10ms        ? ?/sec    1.00      6.8±0.03ms        ? ?/sec
array_has_any_strings/some_match/500           1.00     35.1±0.11ms        ? ?/sec    1.59     56.0±0.83ms        ? ?/sec
array_has_i64/found/10                         1.00    131.2±1.08µs        ? ?/sec    1.01    132.1±2.58µs        ? ?/sec
array_has_i64/found/100                        1.00    615.6±1.03µs        ? ?/sec    1.02    629.0±6.16µs        ? ?/sec
array_has_i64/found/500                        1.00      2.7±0.01ms        ? ?/sec    1.02      2.7±0.01ms        ? ?/sec
array_has_i64/not_found/10                     1.01     75.0±0.65µs        ? ?/sec    1.00     74.4±2.37µs        ? ?/sec
array_has_i64/not_found/100                    1.00    534.6±0.90µs        ? ?/sec    1.01    541.5±3.25µs        ? ?/sec
array_has_i64/not_found/500                    1.04      2.7±0.23ms        ? ?/sec    1.00      2.6±0.01ms        ? ?/sec
array_has_strings/found/10                     1.01    477.7±8.76µs        ? ?/sec    1.00    473.2±3.84µs        ? ?/sec
array_has_strings/found/100                    1.00  1822.6±12.53µs        ? ?/sec    1.00   1821.8±8.67µs        ? ?/sec
array_has_strings/found/500                    1.00      5.6±0.04ms        ? ?/sec    1.00      5.5±0.04ms        ? ?/sec
array_has_strings/not_found/10                 1.02     85.4±2.33µs        ? ?/sec    1.00     83.6±1.93µs        ? ?/sec
array_has_strings/not_found/100                1.01      3.5±0.01ms        ? ?/sec    1.00      3.5±0.01ms        ? ?/sec
array_has_strings/not_found/500                1.01     11.3±0.07ms        ? ?/sec    1.00     11.2±0.06ms        ? ?/sec

@Omega359
Contributor

Omega359 commented Mar 3, 2026

The above was run on my ec2 dev instance with nothing else running. It's as stable a benchmark as I can realistically get.

@neilconway
Contributor Author

> The above was run on my ec2 dev instance with nothing else running. It's as stable a benchmark as I can realistically get.

Interesting. The regressions are mostly in the 500-element benchmarks. I didn't see similar behavior on my local dev box (M4 Max). I'll do a run on a cloud dev box and see if I can repro the results.

@neilconway
Contributor Author

Here's what I get on a Hetzner cloud box (cax31):

  group                                          base                                   target
  -----                                          ----                                   ------
  array_has_all/all_found_small_needle/10        4.70      6.6±0.23ms        ? ?/sec    1.00  1407.2±12.51µs        ? ?/sec
  array_has_all/all_found_small_needle/100       1.46     15.6±0.13ms        ? ?/sec    1.00     10.7±0.08ms        ? ?/sec
  array_has_all/all_found_small_needle/500       1.00     55.9±2.89ms        ? ?/sec    1.57     87.6±1.55ms        ? ?/sec
  array_has_all/not_all_found/10                 5.65      6.3±0.13ms        ? ?/sec    1.00  1108.0±12.92µs        ? ?/sec
  array_has_all/not_all_found/100                1.57     14.2±0.15ms        ? ?/sec    1.00      9.0±0.08ms        ? ?/sec
  array_has_all/not_all_found/500                1.00     49.6±0.43ms        ? ?/sec    1.63     80.7±1.71ms        ? ?/sec
  array_has_all_strings/all_found/10             2.21      4.6±0.04ms        ? ?/sec    1.00      2.1±0.04ms        ? ?/sec
  array_has_all_strings/all_found/100            1.29     15.4±0.12ms        ? ?/sec    1.00     11.9±0.15ms        ? ?/sec
  array_has_all_strings/all_found/500            1.00     59.2±0.94ms        ? ?/sec    1.56     92.6±1.24ms        ? ?/sec
  array_has_all_strings/not_all_found/10         2.82      3.9±0.03ms        ? ?/sec    1.00  1386.6±17.25µs        ? ?/sec
  array_has_all_strings/not_all_found/100        1.34     13.7±0.11ms        ? ?/sec    1.00     10.2±0.15ms        ? ?/sec
  array_has_all_strings/not_all_found/500        1.00     70.5±0.74ms        ? ?/sec    1.46    102.8±1.37ms        ? ?/sec
  array_has_any/no_match/10                      3.17      7.3±0.04ms        ? ?/sec    1.00      2.3±0.03ms        ? ?/sec
  array_has_any/no_match/100                     1.19     23.2±0.36ms        ? ?/sec    1.00     19.5±0.25ms        ? ?/sec
  array_has_any/no_match/500                     1.00     93.4±0.86ms        ? ?/sec    1.41    131.6±1.76ms        ? ?/sec
  array_has_any/scalar_no_match/10               1.00      2.2±0.02ms        ? ?/sec    1.01      2.2±0.02ms        ? ?/sec
  array_has_any/scalar_no_match/100              1.00     20.9±0.33ms        ? ?/sec    1.01     21.1±0.17ms        ? ?/sec
  array_has_any/scalar_no_match/500              1.00    138.1±1.94ms        ? ?/sec    1.03    142.3±1.72ms        ? ?/sec
  array_has_any/scalar_some_match/10             1.00  1070.3±16.72µs        ? ?/sec    1.00  1069.4±13.97µs        ? ?/sec
  array_has_any/scalar_some_match/100            1.00     11.1±0.11ms        ? ?/sec    1.03     11.4±0.18ms        ? ?/sec
  array_has_any/scalar_some_match/500            1.00     85.8±1.13ms        ? ?/sec    1.02     87.5±1.15ms        ? ?/sec
  array_has_any/some_match/10                    4.94      6.4±0.11ms        ? ?/sec    1.00  1298.0±19.57µs        ? ?/sec
  array_has_any/some_match/100                   1.35     14.8±0.11ms        ? ?/sec    1.00     10.9±0.11ms        ? ?/sec
  array_has_any/some_match/500                   1.00     51.2±0.61ms        ? ?/sec    1.74     89.1±1.53ms        ? ?/sec
  array_has_any_scalar/i64_no_match/1            1.00    375.8±4.58µs        ? ?/sec    1.02    383.9±5.82µs        ? ?/sec
  array_has_any_scalar/i64_no_match/10           1.07   486.9±59.32µs        ? ?/sec    1.00   453.7±12.16µs        ? ?/sec
  array_has_any_scalar/i64_no_match/100          1.00   639.7±22.48µs        ? ?/sec    1.00   637.8±26.00µs        ? ?/sec
  array_has_any_scalar/i64_no_match/1000         1.01   556.6±21.28µs        ? ?/sec    1.00   549.5±13.52µs        ? ?/sec
  array_has_any_scalar/string_no_match/1         1.00    251.5±2.22µs        ? ?/sec    1.03    258.6±2.65µs        ? ?/sec
  array_has_any_scalar/string_no_match/10        1.03   437.6±10.87µs        ? ?/sec    1.00    424.6±7.96µs        ? ?/sec
  array_has_any_scalar/string_no_match/100       1.00   552.0±15.64µs        ? ?/sec    1.02   564.1±23.50µs        ? ?/sec
  array_has_any_scalar/string_no_match/1000      1.00   465.9±16.90µs        ? ?/sec    1.01   469.5±10.49µs        ? ?/sec
  array_has_any_strings/no_match/10              2.09      5.0±0.04ms        ? ?/sec    1.00      2.4±0.03ms        ? ?/sec
  array_has_any_strings/no_match/100             1.22     21.5±0.13ms        ? ?/sec    1.00     17.7±0.24ms        ? ?/sec
  array_has_any_strings/no_match/500             1.00    131.0±0.73ms        ? ?/sec    1.22    159.6±2.74ms        ? ?/sec
  array_has_any_strings/scalar_no_match/10       1.00    876.5±5.30µs        ? ?/sec    1.06   924.9±16.90µs        ? ?/sec
  array_has_any_strings/scalar_no_match/100      1.00      7.5±0.07ms        ? ?/sec    1.06      8.0±0.11ms        ? ?/sec
  array_has_any_strings/scalar_no_match/500      1.00     86.4±0.53ms        ? ?/sec    1.02     88.5±1.01ms        ? ?/sec
  array_has_any_strings/scalar_some_match/10     1.00    761.8±6.51µs        ? ?/sec    1.02    774.4±7.06µs        ? ?/sec
  array_has_any_strings/scalar_some_match/100    1.00      5.1±0.14ms        ? ?/sec    1.07      5.5±0.34ms        ? ?/sec
  array_has_any_strings/scalar_some_match/500    1.00     17.4±0.15ms        ? ?/sec    1.05     18.3±0.23ms        ? ?/sec
  array_has_any_strings/some_match/10            2.43      4.3±0.03ms        ? ?/sec    1.00  1763.5±21.29µs        ? ?/sec
  array_has_any_strings/some_match/100           1.30     14.1±0.15ms        ? ?/sec    1.00     10.8±0.19ms        ? ?/sec
  array_has_any_strings/some_match/500           1.00     53.2±0.65ms        ? ?/sec    1.61     85.6±1.75ms        ? ?/sec
  array_has_i64/found/10                         1.00    149.7±6.18µs        ? ?/sec    1.03    154.2±6.32µs        ? ?/sec
  array_has_i64/found/100                        1.00  613.2±101.28µs        ? ?/sec    1.04   639.6±77.43µs        ? ?/sec
  array_has_i64/found/500                        1.00      4.4±0.12ms        ? ?/sec    1.04      4.6±0.11ms        ? ?/sec
  array_has_i64/not_found/10                     1.04     71.7±1.02µs        ? ?/sec    1.00     68.7±1.93µs        ? ?/sec
  array_has_i64/not_found/100                    1.00   426.7±17.72µs        ? ?/sec    1.03   440.9±24.04µs        ? ?/sec
  array_has_i64/not_found/500                    1.00      4.4±0.11ms        ? ?/sec    1.02      4.5±0.15ms        ? ?/sec
  array_has_strings/found/10                     1.00    685.4±6.84µs        ? ?/sec    1.00    688.6±6.93µs        ? ?/sec
  array_has_strings/found/100                    1.00      2.6±0.06ms        ? ?/sec    1.03      2.7±0.04ms        ? ?/sec
  array_has_strings/found/500                    1.00     15.1±0.18ms        ? ?/sec    1.03     15.6±0.43ms        ? ?/sec
  array_has_strings/not_found/10                 1.01    152.6±0.82µs        ? ?/sec    1.00    150.7±2.05µs        ? ?/sec
  array_has_strings/not_found/100                1.00      5.8±0.04ms        ? ?/sec    1.02      5.9±0.12ms        ? ?/sec
  array_has_strings/not_found/500                1.03     16.7±0.26ms        ? ?/sec    1.00     16.2±0.19ms        ? ?/sec

So we do indeed see some regressions for large arrays. I'm not entirely sure why that would be ... I suppose for 10k rows * 500 elements we end up pushing a lot more stuff out of L1/L2, whereas the previous approach uses a smaller working set. I'm surprised that the effect is that pronounced, though.

Let me try doing the row conversion in smaller batches and see if that helps.
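
A rough sketch of the chunked variant, to make the working-set argument concrete. For raw i64 values alone, 10,000 rows × 500 elements × 8 bytes is 40,000,000 bytes (40 MB) per input column, while a 256-row chunk is 256 × 500 × 8 = 1,024,000 bytes (about 1 MB); the real row format adds some overhead on top. The code below is std-only and hypothetical: `ListColumn`, `encode`, and `has_all_chunked` are illustrative names, `encode` stands in for row conversion, nulls are ignored, and both columns are assumed to have the same number of rows.

```rust
/// Toy list column: row `i` owns `values[offsets[i]..offsets[i + 1]]`.
pub struct ListColumn {
    pub values: Vec<i64>,
    pub offsets: Vec<usize>,
}

/// Hypothetical stand-in for row conversion.
fn encode(values: &[i64]) -> Vec<[u8; 8]> {
    values
        .iter()
        .map(|v| ((*v as u64) ^ (1u64 << 63)).to_be_bytes())
        .collect()
}

/// For each row: does the haystack list contain every element of the needle list?
/// Conversion happens once per chunk of `chunk_rows` rows, so the converted
/// buffers stay small instead of covering the whole batch.
pub fn has_all_chunked(
    haystack: &ListColumn,
    needle: &ListColumn,
    chunk_rows: usize,
) -> Vec<bool> {
    assert!(chunk_rows > 0);
    let num_rows = haystack.offsets.len() - 1;
    let mut out = Vec::with_capacity(num_rows);

    let mut start = 0;
    while start < num_rows {
        let end = (start + chunk_rows).min(num_rows);

        // Convert only the child values that belong to rows start..end.
        let h_base = haystack.offsets[start];
        let n_base = needle.offsets[start];
        let h_rows = encode(&haystack.values[h_base..haystack.offsets[end]]);
        let n_rows = encode(&needle.values[n_base..needle.offsets[end]]);

        for i in start..end {
            // Offsets are absolute, so rebase them onto the chunk's buffers.
            let h_start = haystack.offsets[i] - h_base;
            let h_end = haystack.offsets[i + 1] - h_base;
            let n_start = needle.offsets[i] - n_base;
            let n_end = needle.offsets[i + 1] - n_base;
            let h = &h_rows[h_start..h_end];
            let n = &n_rows[n_start..n_end];
            out.push(n.iter().all(|x| h.contains(x)));
        }
        start = end;
    }
    out
}
```

The result should not depend on `chunk_rows`; only the size of the temporary converted buffers does.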

@Omega359
Contributor

Omega359 commented Mar 3, 2026

That is interesting. I ran the benchmark on an m7i.4xlarge, which is an Intel Sapphire Rapids machine. I'm curious why we're seeing it on cloud machines but not on your M4 Mac. L2 cache is shared iirc on M4 Max, which might be a reason for the difference.

I guess running benchmarks on cloud hardware is worth it, considering that most deployments of DF will run on server-grade hardware rather than consumer hardware, even though consumer hardware may be faster in some cases.

@neilconway
Contributor Author

Alright, I implemented a variant where we do row conversion in chunks of 256 rows. Here are the results on the Hetzner box:

  group                                          base                                   target
  -----                                          ----                                   ------
  array_has_all/all_found_small_needle/10        4.81      6.8±0.04ms        ? ?/sec    1.00  1422.9±33.96µs        ? ?/sec
  array_has_all/all_found_small_needle/100       1.62     16.6±0.04ms        ? ?/sec    1.00     10.2±0.03ms        ? ?/sec
  array_has_all/all_found_small_needle/500       1.19     59.4±0.09ms        ? ?/sec    1.00     49.8±0.12ms        ? ?/sec
  array_has_all/not_all_found/10                 5.85      6.5±0.03ms        ? ?/sec    1.00   1115.8±9.24µs        ? ?/sec
  array_has_all/not_all_found/100                1.71     15.0±0.05ms        ? ?/sec    1.00      8.8±0.03ms        ? ?/sec
  array_has_all/not_all_found/500                1.22     52.5±0.11ms        ? ?/sec    1.00     43.0±0.09ms        ? ?/sec
  array_has_all_strings/all_found/10             2.71      5.3±0.03ms        ? ?/sec    1.00   1948.9±7.79µs        ? ?/sec
  array_has_all_strings/all_found/100            1.43     15.8±0.04ms        ? ?/sec    1.00     11.1±0.04ms        ? ?/sec
  array_has_all_strings/all_found/500            1.18     61.0±0.14ms        ? ?/sec    1.00     51.6±0.62ms        ? ?/sec
  array_has_all_strings/not_all_found/10         3.05      4.1±0.02ms        ? ?/sec    1.00  1338.3±65.23µs        ? ?/sec
  array_has_all_strings/not_all_found/100        1.48     14.2±0.08ms        ? ?/sec    1.00      9.6±0.05ms        ? ?/sec
  array_has_all_strings/not_all_found/500        1.23     75.4±0.17ms        ? ?/sec    1.00     61.2±0.19ms        ? ?/sec
  array_has_any/no_match/10                      3.46      7.8±0.05ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
  array_has_any/no_match/100                     1.35     25.3±0.11ms        ? ?/sec    1.00     18.7±0.03ms        ? ?/sec
  array_has_any/no_match/500                     1.14    105.4±0.13ms        ? ?/sec    1.00     92.8±2.97ms        ? ?/sec
  array_has_any/scalar_no_match/10               1.11      2.4±0.01ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
  array_has_any/scalar_no_match/100              1.10     22.9±0.06ms        ? ?/sec    1.00     20.8±0.06ms        ? ?/sec
  array_has_any/scalar_no_match/500              1.06    148.5±0.64ms        ? ?/sec    1.00    140.2±1.91ms        ? ?/sec
  array_has_any/scalar_some_match/10             1.07   1133.4±3.89µs        ? ?/sec    1.00   1061.6±4.64µs        ? ?/sec
  array_has_any/scalar_some_match/100            1.04     11.6±0.16ms        ? ?/sec    1.00     11.2±0.08ms        ? ?/sec
  array_has_any/scalar_some_match/500            1.05     90.9±0.71ms        ? ?/sec    1.00     87.0±0.88ms        ? ?/sec
  array_has_any/some_match/10                    5.26      6.6±0.05ms        ? ?/sec    1.00   1264.5±3.59µs        ? ?/sec
  array_has_any/some_match/100                   1.60     15.7±0.08ms        ? ?/sec    1.00      9.8±0.03ms        ? ?/sec
  array_has_any/some_match/500                   1.17     55.9±0.20ms        ? ?/sec    1.00     47.8±0.33ms        ? ?/sec
  array_has_any_scalar/i64_no_match/1            1.06    396.6±2.17µs        ? ?/sec    1.00    372.8±3.30µs        ? ?/sec
  array_has_any_scalar/i64_no_match/10           1.01    449.7±8.66µs        ? ?/sec    1.00   446.0±10.76µs        ? ?/sec
  array_has_any_scalar/i64_no_match/100          1.02   639.2±20.48µs        ? ?/sec    1.00   628.6±17.24µs        ? ?/sec
  array_has_any_scalar/i64_no_match/1000         1.00   545.1±10.73µs        ? ?/sec    1.00   544.1±13.21µs        ? ?/sec
  array_has_any_scalar/string_no_match/1         1.00    250.5±2.16µs        ? ?/sec    1.03    257.9±8.09µs        ? ?/sec
  array_has_any_scalar/string_no_match/10        1.00    418.3±6.45µs        ? ?/sec    1.00    419.4±6.58µs        ? ?/sec
  array_has_any_scalar/string_no_match/100       1.00   544.9±22.43µs        ? ?/sec    1.01   550.0±24.24µs        ? ?/sec
  array_has_any_scalar/string_no_match/1000      1.00    457.7±8.87µs        ? ?/sec    1.00    459.1±6.78µs        ? ?/sec
  array_has_any_strings/no_match/10              2.12      5.2±0.02ms        ? ?/sec    1.00      2.4±0.01ms        ? ?/sec
  array_has_any_strings/no_match/100             1.21     22.5±0.07ms        ? ?/sec    1.00     18.6±0.20ms        ? ?/sec
  array_has_any_strings/no_match/500             1.11    141.5±0.18ms        ? ?/sec    1.00    127.2±0.39ms        ? ?/sec
  array_has_any_strings/scalar_no_match/10       1.00    861.4±1.90µs        ? ?/sec    1.06    909.8±1.83µs        ? ?/sec
  array_has_any_strings/scalar_no_match/100      1.00      7.4±0.06ms        ? ?/sec    1.08      8.0±0.14ms        ? ?/sec
  array_has_any_strings/scalar_no_match/500      1.02     93.9±0.13ms        ? ?/sec    1.00     91.7±0.23ms        ? ?/sec
  array_has_any_strings/scalar_some_match/10     1.05    827.3±3.93µs        ? ?/sec    1.00    788.8±3.78µs        ? ?/sec
  array_has_any_strings/scalar_some_match/100    1.01      5.2±0.17ms        ? ?/sec    1.00      5.1±0.14ms        ? ?/sec
  array_has_any_strings/scalar_some_match/500    1.00     17.7±0.11ms        ? ?/sec    1.04     18.5±0.15ms        ? ?/sec
  array_has_any_strings/some_match/10            2.56      4.5±0.01ms        ? ?/sec    1.00   1758.6±7.71µs        ? ?/sec
  array_has_any_strings/some_match/100           1.36     14.4±0.07ms        ? ?/sec    1.00     10.6±0.06ms        ? ?/sec
  array_has_any_strings/some_match/500           1.10     54.9±1.41ms        ? ?/sec    1.00     50.1±0.20ms        ? ?/sec
  array_has_i64/found/10                         1.00    144.9±4.94µs        ? ?/sec    1.02    147.7±4.93µs        ? ?/sec
  array_has_i64/found/100                        1.00   570.5±31.30µs        ? ?/sec    1.06   605.6±35.62µs        ? ?/sec
  array_has_i64/found/500                        1.00      4.4±0.15ms        ? ?/sec    1.02      4.5±0.12ms        ? ?/sec
  array_has_i64/not_found/10                     1.03     68.8±0.44µs        ? ?/sec    1.00     67.0±1.26µs        ? ?/sec
  array_has_i64/not_found/100                    1.02   471.6±27.43µs        ? ?/sec    1.00   462.7±22.65µs        ? ?/sec
  array_has_i64/not_found/500                    1.00      4.5±0.11ms        ? ?/sec    1.00      4.5±0.11ms        ? ?/sec
  array_has_strings/found/10                     1.10    744.8±5.29µs        ? ?/sec    1.00    679.9±5.94µs        ? ?/sec
  array_has_strings/found/100                    1.00      2.7±0.03ms        ? ?/sec    1.00      2.7±0.04ms        ? ?/sec
  array_has_strings/found/500                    1.00     15.6±0.21ms        ? ?/sec    1.05     16.3±0.35ms        ? ?/sec
  array_has_strings/not_found/10                 1.02    150.5±0.36µs        ? ?/sec    1.00    147.0±1.14µs        ? ?/sec
  array_has_strings/not_found/100                1.11      6.5±0.04ms        ? ?/sec    1.00      5.9±0.08ms        ? ?/sec
  array_has_strings/not_found/500                1.03     16.5±0.04ms        ? ?/sec    1.00     16.0±0.07ms        ? ?/sec

Happily, this seems to address the regressions we saw on large arrays with the initial approach. Less happily, 256-row chunking performs slightly less well than full-batch row conversion on my M4 Max machine, although interestingly the regressions are only for the i64 benchmarks:

  array_has_all (general/i64):

  ┌───────────────────┬────────────────────────────────┐
  │     Benchmark     │ change (chunked vs full-batch) │
  ├───────────────────┼────────────────────────────────┤
  │ all_found/10      │ +9.6% slower                   │
  ├───────────────────┼────────────────────────────────┤
  │ not_all_found/10  │ +9.0% slower                   │
  ├───────────────────┼────────────────────────────────┤
  │ all_found/100     │ +9.2% slower                   │
  ├───────────────────┼────────────────────────────────┤
  │ not_all_found/100 │ +10.0% slower                  │
  ├───────────────────┼────────────────────────────────┤
  │ all_found/500     │ +5.9% slower                   │
  ├───────────────────┼────────────────────────────────┤
  │ not_all_found/500 │ +5.5% slower                   │
  └───────────────────┴────────────────────────────────┘

  array_has_any (general/i64):

  ┌────────────────┬────────────────────────────────┐
  │   Benchmark    │ change (chunked vs full-batch) │
  ├────────────────┼────────────────────────────────┤
  │ some_match/10  │ +4.4% slower                   │
  ├────────────────┼────────────────────────────────┤
  │ no_match/10    │ +3.4% slower                   │
  ├────────────────┼────────────────────────────────┤
  │ some_match/100 │ +4.4% slower                   │
  ├────────────────┼────────────────────────────────┤
  │ no_match/100   │ +4.0% slower                   │
  ├────────────────┼────────────────────────────────┤
  │ some_match/500 │ +2.8% slower                   │
  ├────────────────┼────────────────────────────────┤
  │ no_match/500   │ +2.4% slower                   │
  └────────────────┴────────────────────────────────┘

The string benchmarks were much closer and basically in the noise.

Avoiding the regressions on large arrays seems worth the small performance hit on M4 machines, but it's probably worth exploring a bigger chunk size and seeing if that helps at all.
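
For readers following along, the chunked variant described above can be sketched roughly as follows. This is a simplified, standard-library-only model and not the PR's code: `ListColumn`, `encode`, and `has_any_chunked` are hypothetical names, `encode` stands in for a bulk `RowConverter` call by mapping each `i64` to 8 order-preserving bytes, and the per-row predicate uses a linear scan for brevity.

```rust
/// Flattened list column (assumed layout): `offsets[i]..offsets[i + 1]`
/// indexes the elements of row `i` in `values`.
struct ListColumn {
    values: Vec<i64>,
    offsets: Vec<usize>,
}

impl ListColumn {
    fn len(&self) -> usize {
        self.offsets.len().saturating_sub(1)
    }
}

/// Stand-in for one bulk row conversion: big-endian bytes with the sign bit
/// flipped, so byte order matches numeric order.
fn encode(values: &[i64]) -> Vec<[u8; 8]> {
    values
        .iter()
        .map(|v| ((*v as u64) ^ (1 << 63)).to_be_bytes())
        .collect()
}

/// For each row, true if any needle element appears in the haystack row.
/// Conversion happens once per chunk of `chunk_rows` rows, not once per row.
fn has_any_chunked(haystack: &ListColumn, needle: &ListColumn, chunk_rows: usize) -> Vec<bool> {
    assert!(chunk_rows > 0, "chunk size must be positive");
    assert_eq!(haystack.len(), needle.len());
    let num_rows = haystack.len();
    let mut out = Vec::with_capacity(num_rows);
    let mut start = 0;
    while start < num_rows {
        let end = (start + chunk_rows).min(num_rows);
        // Convert only the elements belonging to rows `start..end`.
        let h_base = haystack.offsets[start];
        let n_base = needle.offsets[start];
        let h_rows = encode(&haystack.values[h_base..haystack.offsets[end]]);
        let n_rows = encode(&needle.values[n_base..needle.offsets[end]]);
        for row in start..end {
            // Index into the converted chunk, shifting offsets by the chunk base.
            let h = &h_rows[haystack.offsets[row] - h_base..haystack.offsets[row + 1] - h_base];
            let n = &n_rows[needle.offsets[row] - n_base..needle.offsets[row + 1] - n_base];
            out.push(n.iter().any(|x| h.contains(x)));
        }
        start = end;
    }
    out
}
```

In this model the result is independent of `chunk_rows`; the chunk size only changes how much converted data is live at once, which is the knob being tuned in the benchmarks above.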

@neilconway
Contributor Author

Here are the results on the Hetzner machine with 512-row chunks:

  group                                          base                                   target
  -----                                          ----                                   ------
  array_has_all/all_found_small_needle/10        4.73      6.5±0.03ms        ? ?/sec    1.00   1377.3±7.28µs        ? ?/sec
  array_has_all/all_found_small_needle/100       1.50     15.5±0.05ms        ? ?/sec    1.00     10.3±0.03ms        ? ?/sec
  array_has_all/all_found_small_needle/500       1.05     54.7±0.15ms        ? ?/sec    1.00     52.2±1.56ms        ? ?/sec
  array_has_all/not_all_found/10                 5.84      6.3±0.07ms        ? ?/sec    1.00   1087.9±4.54µs        ? ?/sec
  array_has_all/not_all_found/100                1.60     14.3±0.33ms        ? ?/sec    1.00      9.0±0.08ms        ? ?/sec
  array_has_all/not_all_found/500                1.10     49.0±0.13ms        ? ?/sec    1.00     44.4±0.50ms        ? ?/sec
  array_has_all_strings/all_found/10             2.73      5.4±0.02ms        ? ?/sec    1.00  1958.8±19.92µs        ? ?/sec
  array_has_all_strings/all_found/100            1.36     15.1±0.06ms        ? ?/sec    1.00     11.1±0.08ms        ? ?/sec
  array_has_all_strings/all_found/500            1.13     60.6±1.65ms        ? ?/sec    1.00     53.8±1.26ms        ? ?/sec
  array_has_all_strings/not_all_found/10         3.03      4.0±0.04ms        ? ?/sec    1.00   1305.8±9.69µs        ? ?/sec
  array_has_all_strings/not_all_found/100        1.42     13.6±0.08ms        ? ?/sec    1.00      9.5±0.05ms        ? ?/sec
  array_has_all_strings/not_all_found/500        1.14     69.7±0.27ms        ? ?/sec    1.00     61.1±0.32ms        ? ?/sec
  array_has_any/no_match/10                      3.23      7.3±0.04ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
  array_has_any/no_match/100                     1.22     22.9±0.10ms        ? ?/sec    1.00     18.8±0.05ms        ? ?/sec
  array_has_any/no_match/500                     1.00     92.3±0.24ms        ? ?/sec    1.01     93.2±0.39ms        ? ?/sec
  array_has_any/scalar_no_match/10               1.00      2.2±0.02ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
  array_has_any/scalar_no_match/100              1.00     20.8±0.17ms        ? ?/sec    1.00     20.9±0.11ms        ? ?/sec
  array_has_any/scalar_no_match/500              1.00    136.6±1.66ms        ? ?/sec    1.02    140.0±1.22ms        ? ?/sec
  array_has_any/scalar_some_match/10             1.00  1069.6±17.71µs        ? ?/sec    1.01   1075.2±5.81µs        ? ?/sec
  array_has_any/scalar_some_match/100            1.00     11.0±0.08ms        ? ?/sec    1.01     11.1±0.08ms        ? ?/sec
  array_has_any/scalar_some_match/500            1.00     84.8±0.51ms        ? ?/sec    1.01     85.7±0.71ms        ? ?/sec
  array_has_any/some_match/10                    5.06      6.4±0.04ms        ? ?/sec    1.00   1257.1±4.21µs        ? ?/sec
  array_has_any/some_match/100                   1.46     14.6±0.07ms        ? ?/sec    1.00     10.0±0.19ms        ? ?/sec
  array_has_any/some_match/500                   1.02     51.1±0.15ms        ? ?/sec    1.00     50.0±0.33ms        ? ?/sec
  array_has_any_scalar/i64_no_match/1            1.00    375.2±4.65µs        ? ?/sec    1.02   382.6±30.10µs        ? ?/sec
  array_has_any_scalar/i64_no_match/10           1.00   451.1±11.52µs        ? ?/sec    1.03   464.7±10.41µs        ? ?/sec
  array_has_any_scalar/i64_no_match/100          1.01   638.5±27.58µs        ? ?/sec    1.00   633.0±19.30µs        ? ?/sec
  array_has_any_scalar/i64_no_match/1000         1.00   543.6±11.89µs        ? ?/sec    1.00   544.2±13.11µs        ? ?/sec
  array_has_any_scalar/string_no_match/1         1.00    249.8±1.86µs        ? ?/sec    1.03    258.4±3.13µs        ? ?/sec
  array_has_any_scalar/string_no_match/10        1.00    419.9±8.88µs        ? ?/sec    1.04   438.5±10.85µs        ? ?/sec
  array_has_any_scalar/string_no_match/100       1.00   550.3±23.91µs        ? ?/sec    1.01   556.2±18.31µs        ? ?/sec
  array_has_any_scalar/string_no_match/1000      1.00    461.9±8.79µs        ? ?/sec    1.01    465.6±7.16µs        ? ?/sec
  array_has_any_strings/no_match/10              2.04      5.0±0.03ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
  array_has_any_strings/no_match/100             1.16     21.6±0.14ms        ? ?/sec    1.00     18.7±0.09ms        ? ?/sec
  array_has_any_strings/no_match/500             1.01    129.2±0.40ms        ? ?/sec    1.00    127.5±0.42ms        ? ?/sec
  array_has_any_strings/scalar_no_match/10       1.00    867.0±2.78µs        ? ?/sec    1.07    926.9±9.70µs        ? ?/sec
  array_has_any_strings/scalar_no_match/100      1.00      7.4±0.02ms        ? ?/sec    1.08      8.0±0.03ms        ? ?/sec
  array_has_any_strings/scalar_no_match/500      1.00     85.6±0.35ms        ? ?/sec    1.07     92.0±0.37ms        ? ?/sec
  array_has_any_strings/scalar_some_match/10     1.00   764.9±12.75µs        ? ?/sec    1.04   797.9±10.23µs        ? ?/sec
  array_has_any_strings/scalar_some_match/100    1.00      5.1±0.07ms        ? ?/sec    1.06      5.4±0.05ms        ? ?/sec
  array_has_any_strings/scalar_some_match/500    1.00     17.3±0.10ms        ? ?/sec    1.07     18.6±0.12ms        ? ?/sec
  array_has_any_strings/some_match/10            2.37      4.3±0.01ms        ? ?/sec    1.00  1810.8±172.72µs        ? ?/sec
  array_has_any_strings/some_match/100           1.32     14.2±0.30ms        ? ?/sec    1.00     10.7±0.26ms        ? ?/sec
  array_has_any_strings/some_match/500           1.04     52.4±0.16ms        ? ?/sec    1.00     50.6±0.26ms        ? ?/sec
  array_has_i64/found/10                         1.00    144.5±4.87µs        ? ?/sec    1.03    148.4±4.69µs        ? ?/sec
  array_has_i64/found/100                        1.00   629.1±59.20µs        ? ?/sec    1.03   645.1±46.47µs        ? ?/sec
  array_has_i64/found/500                        1.00      4.4±0.07ms        ? ?/sec    1.04      4.6±0.22ms        ? ?/sec
  array_has_i64/not_found/10                     1.04     69.4±0.56µs        ? ?/sec    1.00     66.6±1.26µs        ? ?/sec
  array_has_i64/not_found/100                    1.00   492.8±37.60µs        ? ?/sec    1.00   491.9±38.98µs        ? ?/sec
  array_has_i64/not_found/500                    1.00      4.3±0.09ms        ? ?/sec    1.00      4.3±0.10ms        ? ?/sec
  array_has_strings/found/10                     1.00    676.5±6.13µs        ? ?/sec    1.01    686.6±7.26µs        ? ?/sec
  array_has_strings/found/100                    1.00      2.7±0.05ms        ? ?/sec    1.02      2.7±0.02ms        ? ?/sec
  array_has_strings/found/500                    1.01     15.6±0.18ms        ? ?/sec    1.00     15.5±0.22ms        ? ?/sec
  array_has_strings/not_found/10                 1.02    152.4±1.15µs        ? ?/sec    1.00    149.3±1.57µs        ? ?/sec
  array_has_strings/not_found/100                1.00      5.7±0.02ms        ? ?/sec    1.01      5.8±0.01ms        ? ?/sec
  array_has_strings/not_found/500                1.00     16.2±0.04ms        ? ?/sec    1.01     16.4±0.55ms        ? ?/sec

I'm inclined to go with 512-row chunking: it seems to reduce cache pressure sufficiently, while doing half as many row-conversion calls as 256-row chunking. I've updated the PR with that approach.
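
As a quick check of the "half as many calls" point, here is a small sketch of the arithmetic. The helper `conversion_calls` is hypothetical, and the 10k-row batch size is taken from the earlier comment in this thread; it counts conversion calls per input side.

```rust
/// Number of bulk conversion calls needed to cover `rows` rows when
/// converting `chunk_size` rows at a time (the last chunk may be partial).
fn conversion_calls(rows: usize, chunk_size: usize) -> usize {
    assert!(chunk_size > 0, "chunk size must be positive");
    rows.div_ceil(chunk_size)
}
```

For a 10,000-row batch this gives 40 calls at 256 rows per chunk and 20 calls at 512 rows per chunk, compared with one call per row in the original per-row approach.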
