Skip to content

Enable filter pushdown when using In_list on parquet#2282

Merged
alamb merged 3 commits intoapache:masterfrom
Ted-Jiang:fix_pushdown
Apr 22, 2022
Merged

Enable filter pushdown when using In_list on parquet#2282
alamb merged 3 commits intoapache:masterfrom
Ted-Jiang:fix_pushdown

Conversation

@Ted-Jiang
Copy link
Member

Which issue does this PR close?

Closes #2281 .

Rationale for this change

Before:

explain select count(*) from test where o_orderkey in(2785313, 2, 3);
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                               |
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: #COUNT(UInt8(1))                                                                                                                                                                                       |
|               |   Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]                                                                                                                                                                |
|               |     Filter: #test.o_orderkey IN ([Int64(2785313), Int64(2), Int64(3)])                                                                                                                                             |
|               |       TableScan: test projection=Some([0]), partial_filters=[#test.o_orderkey IN ([Int64(2785313), Int64(2), Int64(3)])]                                                                                           |
| physical_plan | ProjectionExec: expr=[COUNT(UInt8(1))@0 as COUNT(UInt8(1))]                                                                                                                                                        |
|               |   HashAggregateExec: mode=Final, gby=[], aggr=[COUNT(UInt8(1))]                                                                                                                                                    |
|               |     CoalescePartitionsExec                                                                                                                                                                                         |
|               |       HashAggregateExec: mode=Partial, gby=[], aggr=[COUNT(UInt8(1))]                                                                                                                                              |
|               |         CoalesceBatchesExec: target_batch_size=4096                                                                                                                                                                |
|               |           FilterExec: o_orderkey@0 IN ([Literal { value: Int64(2785313) }, Literal { value: Int64(2) }, Literal { value: Int64(3) }])                                                                              |
|               |             RepartitionExec: partitioning=RoundRobinBatch(16)                                                                                                                                                      |
|               |               ParquetExec: limit=None, partitions=[/Users/yangjiang/test-data/tpch-1g-oneFile/orders/part-00000-0e58b960-37a0-44c9-8561-53f0c32cf038-c000.snappy.parquet], predicate=true, projection=[o_orderkey] |
|               |                                                                                                                                                                                                                    |
+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.007 seconds.

Now

explain select count(*) from test where o_orderkey in(2785313, 2, 3);
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                                                                                                                                                                                                                                                                                                            |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: #COUNT(UInt8(1))                                                                                                                                                                                                                                                                                                                                                    |
|               |   Aggregate: groupBy=[[]], aggr=[[COUNT(UInt8(1))]]                                                                                                                                                                                                                                                                                                                             |
|               |     Filter: #test.o_orderkey IN ([Int64(2785313), Int64(2), Int64(3)])                                                                                                                                                                                                                                                                                                          |
|               |       TableScan: test projection=Some([0]), partial_filters=[#test.o_orderkey IN ([Int64(2785313), Int64(2), Int64(3)])]                                                                                                                                                                                                                                                        |
| physical_plan | ProjectionExec: expr=[COUNT(UInt8(1))@0 as COUNT(UInt8(1))]                                                                                                                                                                                                                                                                                                                     |
|               |   HashAggregateExec: mode=Final, gby=[], aggr=[COUNT(UInt8(1))]                                                                                                                                                                                                                                                                                                                 |
|               |     CoalescePartitionsExec                                                                                                                                                                                                                                                                                                                                                      |
|               |       HashAggregateExec: mode=Partial, gby=[], aggr=[COUNT(UInt8(1))]                                                                                                                                                                                                                                                                                                           |
|               |         CoalesceBatchesExec: target_batch_size=4096                                                                                                                                                                                                                                                                                                                             |
|               |           FilterExec: o_orderkey@0 IN ([Literal { value: Int64(2785313) }, Literal { value: Int64(2) }, Literal { value: Int64(3) }])                                                                                                                                                                                                                                           |
|               |             RepartitionExec: partitioning=RoundRobinBatch(16)                                                                                                                                                                                                                                                                                                                   |
|               |               ParquetExec: limit=None, partitions=[/Users/yangjiang/test-data/tpch-1g-oneFile/orders/part-00000-0e58b960-37a0-44c9-8561-53f0c32cf038-c000.snappy.parquet], predicate=o_orderkey_min@0 <= 2785313 AND 2785313 <= o_orderkey_max@1 OR o_orderkey_min@0 <= 2 AND 2 <= o_orderkey_max@1 OR o_orderkey_min@0 <= 3 AND 3 <= o_orderkey_max@1, projection=[o_orderkey] |
|               |                                                                                                                                                                                                                                                                                                                                                                                 |
+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.012 seconds.

when optimize in pruning.rs, try change in_list to the combination of or for reusing code.

What changes are included in this PR?

Are there any user-facing changes?

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support filter pushdown when using In_list on parquet

3 participants