
feat(spark): add arrays_overlap with Spark three-valued null semantics #20781

Draft
n0r0shi wants to merge 1 commit into apache:main from n0r0shi:spark-arrays-overlap

@n0r0shi

@n0r0shi n0r0shi commented Mar 7, 2026

Which issue does this PR close?

Closes #15914 (partial: adds one more Spark-compatible function)

Related: apache/datafusion-comet#3645

Rationale

Spark's arrays_overlap uses three-valued null logic, which differs from DataFusion's built-in array_has_any:

| Input | Spark arrays_overlap | DataFusion array_has_any |
|---|---|---|
| [1, 2], [2, 3] | true | true |
| [1, 2], [3, 4] | false | false |
| [1, NULL], [3] | null | false |
| [1, 2], [3, NULL] | null | false |
| [1, NULL], [1, 3] | true | true |

In Spark, when there's no definite overlap but either array contains a null element, the result is null.
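The rule can be modeled one row at a time. The following is a minimal Rust sketch, assuming integer elements and using `Option` for SQL NULL at both the array level and the element level; `spark_arrays_overlap` is a hypothetical name, not the PR's code. It also encodes one detail from Spark's `arrays_overlap` documentation that the table does not cover: if either array is empty, the result is `false` even when the other array contains a null.

```rust
/// Hypothetical model of Spark's `arrays_overlap` for a single row (not the PR's code).
/// Outer `None` = null array; inner `None` = null element; returned `None` = SQL NULL.
fn spark_arrays_overlap(
    left: Option<&[Option<i32>]>,
    right: Option<&[Option<i32>]>,
) -> Option<bool> {
    // A null array on either side makes the whole result null.
    let (left, right) = (left?, right?);

    // Definite overlap: a non-null element of `left` equals a non-null element of `right`.
    let definite = left
        .iter()
        .flatten()
        .any(|l| right.iter().flatten().any(|r| r == l));
    if definite {
        return Some(true);
    }

    // No definite match. The answer is unknown (null) only when both arrays are
    // non-empty and at least one of them holds a null element.
    let has_null = left.iter().any(Option::is_none) || right.iter().any(Option::is_none);
    if has_null && !left.is_empty() && !right.is_empty() {
        None
    } else {
        Some(false)
    }
}

fn main() {
    // Inputs from the table above; an inner `None` is a null element.
    let rows = vec![
        (vec![Some(1), Some(2)], vec![Some(2), Some(3)]),
        (vec![Some(1), Some(2)], vec![Some(3), Some(4)]),
        (vec![Some(1), None], vec![Some(3)]),
        (vec![Some(1), Some(2)], vec![Some(3), None]),
        (vec![Some(1), None], vec![Some(1), Some(3)]),
    ];
    for (l, r) in &rows {
        let result = spark_arrays_overlap(Some(l.as_slice()), Some(r.as_slice()));
        println!("{:?} vs {:?} -> {:?}", l, r, result);
    }
}
```

A definite match is checked first so that it wins over any nulls, which is why the last row yields `true` rather than null.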

What changes are included in this PR?

Adds SparkArraysOverlap to the datafusion-spark crate, following the same pattern as SparkArrayContains: delegate to DataFusion's array_has_any, then set the result to null for rows where it is false and either input array contains a null element.
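The delegate-then-patch step can be sketched in plain Rust without Arrow. This is a minimal model under stated assumptions, not the PR's code: a list column is a `Vec<Option<Vec<Option<i32>>>>`, `has_any_row` is a hypothetical stand-in for `array_has_any` that is assumed to return null for a null list, and `patch_to_spark` is a hypothetical name for the patch step. It also requires both arrays to be non-empty before returning null, as Spark's `arrays_overlap` documentation specifies.

```rust
/// One row of a list column: `None` is a null list, `Some(v)` holds the elements.
type ListRow = Option<Vec<Option<i32>>>;

/// Hypothetical stand-in for DataFusion's `array_has_any` on one row.
/// Null elements are skipped, so it answers `false` instead of "unknown".
/// Assumption: a null list on either side yields a null result.
fn has_any_row(left: &ListRow, right: &ListRow) -> Option<bool> {
    let (l, r) = (left.as_ref()?, right.as_ref()?);
    Some(l.iter().flatten().any(|x| r.iter().flatten().any(|y| y == x)))
}

/// Hypothetical patch step: rewrite `false` to null where Spark's
/// three-valued logic leaves the answer unknown.
fn patch_to_spark(has_any: &[Option<bool>], left: &[ListRow], right: &[ListRow]) -> Vec<Option<bool>> {
    has_any
        .iter()
        .zip(left)
        .zip(right)
        .map(|((&res, l), r)| match (res, l, r) {
            (Some(false), Some(lv), Some(rv)) => {
                let contains_null =
                    lv.iter().any(Option::is_none) || rv.iter().any(Option::is_none);
                if contains_null && !lv.is_empty() && !rv.is_empty() {
                    None // no definite match, but a null element could have matched
                } else {
                    Some(false)
                }
            }
            // `true`, null results, and rows with a null list pass through unchanged.
            (other, _, _) => other,
        })
        .collect()
}

fn main() {
    let left: Vec<ListRow> = vec![
        Some(vec![Some(1), Some(2)]),
        Some(vec![Some(1), None]),
        Some(vec![Some(1), None]),
        None,
    ];
    let right: Vec<ListRow> = vec![
        Some(vec![Some(3), Some(4)]),
        Some(vec![Some(3)]),
        Some(vec![Some(1), Some(3)]),
        Some(vec![Some(1)]),
    ];
    // Step 1: the DataFusion-style result, which never reports "unknown".
    let has_any: Vec<Option<bool>> = left.iter().zip(&right).map(|(l, r)| has_any_row(l, r)).collect();
    // Step 2: patch it into Spark semantics.
    let spark = patch_to_spark(&has_any, &left, &right);
    println!("{:?}", spark); // prints [Some(false), None, Some(true), None]
}
```

Only `false` rows are candidates for patching: a `true` from `array_has_any` is already a definite match, and a null result stays null, so the patch never has to re-examine those rows.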

Are these changes tested?

Unit tests covering:

  • Definite overlap → true
  • No overlap, no nulls → false
  • No overlap, null in left → null
  • No overlap, null in right → null
  • Overlap with nulls present → true (definite match trumps null)
  • Null list (the input array itself is null) → null
  • Multi-row mixed cases

DataFusion's built-in `array_has_any` returns `false` when arrays have
no definite overlap but contain null elements. Spark's `arrays_overlap`
returns `null` in this case (three-valued logic: overlap is unknown
because nulls could match any value).

This wraps `array_has_any` and sets the result to `null` for rows where it
is `false` and either array contains nulls, following the same pattern as
SparkArrayContains.
@github-actions github-actions Bot added the spark label Mar 7, 2026
@comphead
Contributor

comphead commented Mar 7, 2026

it might be similar to #20611

@github-actions

github-actions Bot commented May 7, 2026

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions Bot added the Stale label (PR has not had any activity for some time) May 7, 2026

Labels

spark, Stale (PR has not had any activity for some time)


Development

Successfully merging this pull request may close these issues.

[EPIC] Complete datafusion-spark Spark Compatible Functions

2 participants