perf: Optimize array_agg() using GroupsAccumulator #20504

martin-g merged 6 commits into apache:main
Benchmarks:

```
d 2.444444444444 0.541519476308
e 3 0.505440263521
```
> # FIXME: add bool_and(v3) column when issue fixed
Not related to this PR -- this test was redundant with another query in the same file. Happy to split into another PR if preferred.
```rust
    opt_filter: Option<&BooleanArray>,
    total_num_groups: usize,
) -> Result<()> {
    assert_eq!(values.len(), 1, "single argument to update_batch");
```
```diff
-    assert_eq!(values.len(), 1, "single argument to update_batch");
+    assert_eq_or_internal_err!(values.len(), 1, "single argument to update_batch");
```
I was using `assert_eq!` because that is what a bunch of the other aggregate functions use. I don't have a strong preference, but we should probably be consistent.
Let's see what others think.
My opinion is that a library should try to avoid panicking as hard as possible.
But this could be improved in a follow-up if others agree!
```rust
    _opt_filter: Option<&BooleanArray>,
    total_num_groups: usize,
) -> Result<()> {
    assert_eq!(values.len(), 1, "one argument to merge_batch");
```
```diff
-    assert_eq!(values.len(), 1, "one argument to merge_batch");
+    assert_eq_or_internal_err!(values.len(), 1, "one argument to merge_batch");
```
```rust
    values: &[ArrayRef],
    opt_filter: Option<&BooleanArray>,
) -> Result<Vec<ArrayRef>> {
    assert_eq!(values.len(), 1, "one argument to convert_to_state");
```
```diff
-    assert_eq!(values.len(), 1, "one argument to convert_to_state");
+    assert_eq_or_internal_err!(values.len(), 1, "one argument to convert_to_state");
```
Also related -- cc @duongcongtoai, perhaps you have some time to help review this PR?
```rust
    !args.is_distinct && args.order_bys.is_empty()
}

fn create_groups_accumulator(
```
I was trying to compare this implementation with my own and found a 2x difference (mine is faster 😆). I tried adding some profiling to check if there is room for improvement, but most of the runtime goes to the optimizer parts instead of the actual micro function. Maybe we should refactor the benchmark / add a micro benchmark there.
> I was trying to compare this implementation with #17915 and found a 2x difference (mine is faster 😆)
Interesting -- which workload did you use for this?
I was referring to `array_agg_query_group_by_many_groups`.
Comparing this implementation with #17915 on the end-to-end …
9bc1f4e to c5238ac
I think my initial approach of organizing the aggregate state by group wasn't ideal -- when there are many groups, this leads to a lot of small allocations and a more random memory access pattern.

I pushed a new version of this PR that uses a per-batch organization instead: that is, for each batch we keep a reference to the batch contents, plus a Vec of `(group_idx, row_idx)` pairs.

Updated benchmark numbers:

So this approach is >2x faster than my initial approach for the many-groups case, and somewhat faster for the mid-groups case as well. It is slightly slower for the few-groups case, but intuitively I'd guess that is tolerable: the regression is relatively modest compared to the speedups for other cases. But LMK if folks disagree about that.
Running the repro from #17446, main vs. this PR: (Directory names are a bit confusing: the …)
Thank you @neilconway, @duongcongtoai and @alamb!

This is so great -- thank you all
Which issue does this PR close?
- `array_agg` is very slow #20465.
- Slow aggregate query with `array_agg`, Polars is 4 times faster for equal query #17446.

Rationale for this change
This PR optimizes the performance of `array_agg()` by adding support for the `GroupsAccumulator` API.

The design tries to minimize the amount of per-batch work done in `update_batch()`: we store a reference to the batch, and a `(group_idx, row_idx)` pair for each row. In `evaluate()`, we assemble all the requested output with a single `interleave` call. This turns out to be significantly faster, because we copy much less data and assembling the results can be vectorized more effectively. For example, on a benchmark with 5000 groups and 5000 int64 values per group, this approach is roughly 190x faster than the previous approach.
Releasing memory after a partial emit is a little more involved than the previous approach, but with some determination it is still possible.
What changes are included in this PR?
- Implement the `GroupsAccumulator` API for `array_agg()`
- Add a benchmark for `array_agg` of a named struct over a dict, following the workload in Slow aggregate query with `array_agg`, Polars is 4 times faster for equal query #17446

Are these changes tested?
Yes, and benchmarked.
Are there any user-facing changes?
No.
AI usage
Iterated with the help of multiple AI tools; I've reviewed and understand the resulting code.