GH-39565: [C++] Do not concatenate ChunkedArray when running take function #39566

Closed
amol- wants to merge 28 commits into apache:main from amol-:chunked-take

@amol-
Member

amol- commented Jan 11, 2024

Rationale for this change

We can avoid the unnecessary work and memory consumption of concatenating chunks when running take. Instead, we can run take directly on the chunks; the only cost is remapping the indices, which are usually far fewer than the elements of the array we are taking from.

Are these changes tested?

Two existing tests already verify take on ChunkedArray and cover the corner cases well. The only tweak they needed is that take now returns a ChunkedArray made of multiple chunks instead of a single one.

@github-actions

⚠️ GitHub issue #39565 has been automatically assigned in GitHub to PR creator.

Comment thread cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc Outdated
Comment thread cpp/src/arrow/compute/kernels/vector_selection_take_internal.cc Outdated
@github-actions (bot) added the "awaiting committer review" label and removed the "awaiting review" label on Jan 11, 2024
Comment thread cpp/src/arrow/chunk_resolver.h Outdated
@pitrou
Member

pitrou commented Feb 28, 2024

@felipecrv @amol- Should this PR be kept open now that #40206 was merged?

@amol-
Member Author

amol- commented Feb 28, 2024

> @felipecrv @amol- Should this PR be kept open now that #40206 was merged?

I think so. This PR focuses on optimizing TakeCA, while the one that was merged focused on TakeCC.

@felipecrv
Contributor

Before my PR: TakeCC made num_chunks Concatenate(chunks) calls.
After my PR: TakeCC makes 1 Concatenate(chunks) call.

Next step (and goal of amol's PR/issue pair): 0 concatenations.
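To make the counts above concrete, here is a simplified model of the "before" and "after" behaviour. It is not Arrow's implementation: it assumes TakeCC means chunked values taken with chunked indices, and that num_chunks refers to the number of index chunks. `ConcatCounter`, `TakeFlat`, `TakeCCConcatPerIndexChunk`, and `TakeCCConcatOnce` are hypothetical names, and `Concatenate` here is a stand-in that only flattens vectors and counts its calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a chunked array of int64 values or indices.
using Chunks = std::vector<std::vector<int64_t>>;

// Counts how often the values are flattened into one buffer.
struct ConcatCounter {
  int calls = 0;
};

// Stand-in for Concatenate(chunks): copies every chunk into one vector.
std::vector<int64_t> Concatenate(const Chunks& chunks, ConcatCounter& counter) {
  ++counter.calls;
  std::vector<int64_t> flat;
  for (const auto& chunk : chunks) {
    flat.insert(flat.end(), chunk.begin(), chunk.end());
  }
  return flat;
}

// Plain take on a contiguous buffer.
std::vector<int64_t> TakeFlat(const std::vector<int64_t>& flat,
                              const std::vector<int64_t>& indices) {
  std::vector<int64_t> out;
  out.reserve(indices.size());
  for (int64_t i : indices) {
    assert(i >= 0 && static_cast<std::size_t>(i) < flat.size());
    out.push_back(flat[static_cast<std::size_t>(i)]);
  }
  return out;
}

// "Before": the values are concatenated again for every chunk of indices.
Chunks TakeCCConcatPerIndexChunk(const Chunks& values, const Chunks& indices,
                                 ConcatCounter& counter) {
  Chunks out;
  for (const auto& index_chunk : indices) {
    out.push_back(TakeFlat(Concatenate(values, counter), index_chunk));
  }
  return out;
}

// "After": the values are concatenated once and reused for every index chunk.
Chunks TakeCCConcatOnce(const Chunks& values, const Chunks& indices,
                        ConcatCounter& counter) {
  const std::vector<int64_t> flat = Concatenate(values, counter);
  Chunks out;
  for (const auto& index_chunk : indices) {
    out.push_back(TakeFlat(flat, index_chunk));
  }
  return out;
}
```

Both functions return the same chunks; only the number of full copies of the values differs. The zero-concatenation goal would remove the remaining copy as well, by resolving each index to its chunk instead of flattening.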

@felipecrv
Contributor

I opened #41700, which can handle all the fixed-width types without concatenation.

@amol- amol- closed this Dec 1, 2024
Development

Successfully merging this pull request may close these issues.

Do not concatenate ChunkedArray when running Take kernel

4 participants