concat_ws does unnecessary type casts #20434

@neilconway

Describe the bug

  1. concat_ws returns Utf8 regardless of the input types it is called with. If it is called with LargeUtf8 inputs, the result can overflow the 32-bit offsets that Utf8 uses. In general, functions like this should operate on all three string representations (Utf8, LargeUtf8, Utf8View) unless there is a compelling reason not to.
  2. simplify_concat_ws coerces literals to Utf8. Again, we should generally preserve the original string type.
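
The type-preserving behavior asked for above could look roughly like the sketch below. It is only an illustration: StringType is a hypothetical stand-in for the three Arrow string types, concat_ws_return_type is a made-up helper rather than DataFusion's actual API, and the precedence (Utf8View, then LargeUtf8, then Utf8) is an assumption, not something this issue specifies.

```rust
// Hypothetical stand-in for the three Arrow string representations.
// DataFusion itself works with arrow's DataType; this enum only keeps
// the sketch self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StringType {
    Utf8,      // 32-bit offsets
    LargeUtf8, // 64-bit offsets
    Utf8View,  // view-based layout
}

/// Hypothetical rule: derive the result type from the argument types
/// instead of always returning Utf8. The precedence used here is an
/// assumption made for illustration.
fn concat_ws_return_type(arg_types: &[StringType]) -> StringType {
    if arg_types.contains(&StringType::Utf8View) {
        StringType::Utf8View
    } else if arg_types.contains(&StringType::LargeUtf8) {
        StringType::LargeUtf8
    } else {
        StringType::Utf8
    }
}

fn main() {
    use StringType::*;
    // A Utf8 separator literal with two Utf8View columns, as in the
    // reproduction: the result stays Utf8View, so no CAST is needed
    // when comparing it against a Utf8View column.
    let result = concat_ws_return_type(&[Utf8, Utf8View, Utf8View]);
    println!("{:?}", result);
}
```

With a rule of this shape, a comparison such as concat_ws(',', a, b) = a would see matching types on both sides, so the planner would have no reason to insert a cast.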

To Reproduce

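Note the CAST(... AS Utf8View) that appears in the concat_ws plans but not in the corresponding concat plans: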

DataFusion CLI v52.1.0
>       CREATE TABLE test_views AS
      SELECT arrow_cast('hello', 'Utf8View') AS a, arrow_cast('world', 'Utf8View') AS b;
0 row(s) fetched.
Elapsed 0.027 seconds.

> EXPLAIN SELECT * FROM test_views WHERE concat(a, b) = a;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │         FilterExec        │ |
|               | │    --------------------   │ |
|               | │         predicate:        │ |
|               | │      concat(a, b) = a     │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │         bytes: 272        │ |
|               | │       format: memory      │ |
|               | │          rows: 1          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.
Elapsed 0.010 seconds.

> EXPLAIN SELECT * FROM test_views WHERE concat_ws(',', a, b) = a;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │         FilterExec        │ |
|               | │    --------------------   │ |
|               | │         predicate:        │ |
|               | │ CAST(concat_ws(,, a, b) AS│ |
|               | │        Utf8View) = a      │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │         bytes: 272        │ |
|               | │       format: memory      │ |
|               | │          rows: 1          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.
Elapsed 0.007 seconds.

> explain SELECT concat(a, concat(a, b)) FROM test_views;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │       ProjectionExec      │ |
|               | │    --------------------   │ |
|               | │ concat(test_views.a,concat│ |
|               | │  (test_views.a,test_views │ |
|               | │           .b)):           │ |
|               | │  concat(a, concat(a, b))  │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │         bytes: 272        │ |
|               | │       format: memory      │ |
|               | │          rows: 1          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.
Elapsed 0.005 seconds.

> explain SELECT concat(a, concat_ws(',', a, b)) FROM test_views;
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │       ProjectionExec      │ |
|               | │    --------------------   │ |
|               | │    concat(test_views.a    │ |
|               | │     ,concat_ws(Utf8(",    │ |
|               | │      "),test_views.a      │ |
|               | │      ,test_views.b)):     │ |
|               | │ concat(a, CAST(concat_ws(,│ |
|               | │   , a, b) AS Utf8View))   │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │         bytes: 272        │ |
|               | │       format: memory      │ |
|               | │          rows: 1          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+
1 row(s) fetched.
Elapsed 0.006 seconds.

Expected behavior

No response

Additional context

No response

Labels

bug (Something isn't working)