Extend dynamic filter to joins that preserve probe side ON#20447
Extend dynamic filter to joins that preserve probe side ON#20447adriangb merged 1 commit intoapache:mainfrom
Conversation
|
|
||
| fn allow_join_dynamic_filter_pushdown(&self, config: &ConfigOptions) -> bool { | ||
| if self.join_type != JoinType::Inner | ||
| if !matches!(self.join_type, JoinType::Inner | JoinType::Left | JoinType::LeftSemi) |
There was a problem hiding this comment.
Can JoinType::Right and JoinType::RightSemi also be applied based on the same rationale?
There was a problem hiding this comment.
I don't think so. The build side is always the left side and the probe side is always the right side. The filters are therefore always pushed to the right side. We need to retain all rows from the right side for a Right join, so there's nothing we can push down.
A right join query can still take advantage of this if the the optimizer decides to swap the sides to build the right side to the join, in that case it would become a left join here.
There was a problem hiding this comment.
Actually I think it might be safe to add RightSemi and LeftAnti to this list.
The dynamic filter only removes probe rows that can't match any build row, and neither RightSemi (which outputs only matched probe rows) nor LeftAnti (which outputs only unmatched build rows) includes unmatched probe rows in its output. So I think it would be safe to add those here as well. Right and RightAnti would not be.
7cf1bc6 to
4e40891
Compare
0dac298 to
1ea0252
Compare
|
Thanks @helgikrs, all tests have passed. I'll try to review it carefully tomorrow. |
nuno-faria
left a comment
There was a problem hiding this comment.
LGTM, I couldn't find an edge case where it fails. I think other join types can also be added, but this can be done in a future PR.
| if !matches!( | ||
| self.join_type, | ||
| JoinType::Inner | JoinType::Left | JoinType::LeftSemi | ||
| ) || !config.optimizer.enable_join_dynamic_filter_pushdown |
There was a problem hiding this comment.
I think it would also work with LeftAnti, but it doesn't need to be included in this PR.
Also, it would be interesting to explore RightSemi as well, since some queries will opt to use it if the statistics are not accurate. For example:
-- t1 >> t2
-- uses LeftSemi: dynamic filter added
select *
from t1
where t1.k in (select k from t2)
and t1.v2 in (123);
-- uses RightSemi: no dynamic filter
select *
from t1
where t1.k in (select k from t2)
and t1.v2 in (123, 456);Once again, both cases can be considered in a future PR.
|
@adriangb please take a look as well if you have the chance. |
adriangb
left a comment
There was a problem hiding this comment.
Looks good to me!
I wonder - should we use on_lr_is_preserved to determine what is safe or not?
d411d39 to
dde27fc
Compare
@helgikrs I don't think it's required for this PR, we can merge it as is but curious to get your thoughts about |
I considered that, and I think that would definitely make sense, it captures the safety semantics we'd want. The practical consideration is just that it's currently is |
That seems like a nice cleanup to me. |
dde27fc to
e9c406c
Compare
The dynamic filter from HashJoinExec was previously gated to Inner joins only. PR apache#20192 refactored the join filter pushdown infrastructure, which makes extending self-generated filters to join types that preserve probe side on as defined by by `on_lr_is_preserved` function. This PR promotes that function to the `JoinType` and uses it to determine what self-generated dynamic join filter to push down. This enables dynamic filter for hash joins to Left, LeftSemi, RightSemi, LeftAnti and LeftMark.
e9c406c to
c680f0d
Compare
|
@adriangb moved |
|
Thank you @helgikrs! @nuno-faria how do you feel about testing? On the one hand we are now using a well tested logic for when it is safe to push down join clause filters. On the other hand I am hesitant to merge this without tests. I personally feel it would be nice to have join fuzz tests that verify that with / without dynamic filters results are the same. But that would be a big change that delays this PR, I don't want to block it. One thing we could do is wait until the 53 release branch is cut (next week or so I think) and merge this after, so we at least have a month to fix any bugs. But if you are confident this is correct then I'm also happy to merge it as is. Wdyt? |
100% agreed.
Yeah it might be better to wait just to be safe. I am curious about what we would be missing out on DF53. @alamb is it possible to run the tpc-ds queries against this PR? |
|
run benchmark tpcds |
|
run benchmark tpcds |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤖 |
|
🤖: Benchmark completed Details
|
The v53 branch has been cut, so I think we can go ahead and merge this and target v54. @helgikrs would you be able to help us build some join fuzz tests in the meantime? |
The dynamic filter from HashJoinExec was previously gated to Inner joins only.
PR #20192 refactored the join filter pushdown infrastructure, which makes extending self-generated filters to join types that preserve probe side on as defined by by
on_lr_is_preservedfunction trivial.This PR promotes that function to the
JoinTypeand uses it to determine what self-generated dynamic join filter to push down.This enables dynamic filter for hash joins to Left, LeftSemi, RightSemi, LeftAnti and LeftMark.
Which issue does this PR close?
This PR makes progress on #16973
Rationale for this change
The self-generated dynamic filter in HashJoinExec filters the probe side using build-side values. For Left and LeftSemi joins, the right-hand probe side has the same filtering semantics as Inner. Relaxing the gate using
on_lr_is_preserved.Are these changes tested?
Yes.
Are there any user-facing changes?
No.