Merge hash table implementations and remove leftover utilities#7366
ozankabak merged 12 commits into apache:main from synnada-ai:upstream/prunable-hash-join
I think if this is the last usage, we should remove it from Cargo.toml as well.
I wonder whether you're seeing performance improvements in symmetric / streaming hash join with this PR?
I created a small benchmark for streaming using TPC-H data. The first query is:
SELECT
  o_orderkey
FROM
  orders,
  lineitem
WHERE
  o_orderdate = l_shipdate
  AND l_orderkey >= o_orderkey - 10
  AND l_orderkey < o_orderkey + 10
  AND l_returnflag = 'R'
and the second one is:
SELECT
  o_orderkey
FROM
  orders,
  lineitem
WHERE
  o_orderstatus = l_linestatus
  AND l_orderkey >= o_orderkey - 10
  AND l_orderkey < o_orderkey + 10
  AND l_returnflag = 'R'
LIMIT 10000;
The second query involves key pairs with low cardinality. While
Co-authored-by: Daniël Heres <danielheres@gmail.com>
ozankabak left a comment
This is good to go from my perspective. What do you think, @Dandandan?
I will go ahead and merge this PR after CI passes. We will file a follow-on PR in case any other suggestions come in post merge.
Thanks @metesynnada @ozankabak 🙏
Looks great, thank you |
Which issue does this PR close?
Continue on #6679.
Rationale for this change
The current implementation of the JoinHashMap and SymmetricJoinHashMap types could benefit from being more generic and flexible. Specifically, it would be advantageous to support different list data structures for chaining and to handle resizing in a more idiomatic and efficient manner. This PR introduces the JoinHashMapType trait and implements it for both JoinHashMap and PruningJoinHashMap, which allows for more code reuse and a clearer separation of concerns.
This PR also removes several unused hash join utilities. Also, we can introduce a vectorized implementation of SymmetricHashJoin that includes hash collision checks.
What changes are included in this PR?
- Added the JoinHashMapType trait with methods for handling the mutable map and mutable list, as well as a method as_any_mut for dynamic downcasting.
- Implemented the JoinHashMapType trait for both JoinHashMap and PruningJoinHashMap.
- Updated the update_hash function to use the JoinHashMapType trait and only resize the list in the case of PruningJoinHashMap.
- Updated the build_equal_condition_join_indices function to use the JoinHashMapType trait and introduced an offset parameter.
I have removed smallvec completely from the code, but I am unsure whether or not to remove it from Cargo.toml.
Are these changes tested?
Yes, the changes are covered by the existing tests. No new tests were required as the new implementation preserves the existing functionality. All tests passed successfully after the changes were applied.
Are there any user-facing changes?
No, the changes made in this PR are internal and do not affect the public API or the functionality of the crate.
cc @Dandandan
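For readers who want a picture of the design described under "What changes are included in this PR?", here is a minimal, self-contained sketch. It is not the code in this PR: it swaps in std's HashMap for the real hash table, the method names other than as_any_mut (extend_zero, get_mut, get_map, get_list) and the matching_rows helper are assumptions made for illustration, and pruning itself is left out. It only shows how one trait with an associated list type could let a single update_hash serve both a fixed-size and a growable chain list.

```rust
use std::any::Any;
use std::collections::{HashMap, VecDeque};
use std::ops::IndexMut;

/// Illustrative stand-in for the trait described in the PR (names assumed).
trait JoinHashMapType {
    /// List type used to chain rows that share a hash value.
    type NextType: IndexMut<usize, Output = u64>;
    /// Grow the chain list by `len` zeroed slots, if this map type needs it.
    fn extend_zero(&mut self, len: usize);
    /// Mutable access to the hash -> chain-head map and the chain list.
    fn get_mut(&mut self) -> (&mut HashMap<u64, u64>, &mut Self::NextType);
    fn get_map(&self) -> &HashMap<u64, u64>;
    fn get_list(&self) -> &Self::NextType;
    /// Dynamic downcasting hook.
    fn as_any_mut(&mut self) -> &mut dyn Any;
}

/// Build side of known size: the chain list is allocated once.
struct JoinHashMap { map: HashMap<u64, u64>, next: Vec<u64> }

impl JoinHashMap {
    fn with_capacity(rows: usize) -> Self {
        Self { map: HashMap::new(), next: vec![0; rows] }
    }
}

impl JoinHashMapType for JoinHashMap {
    type NextType = Vec<u64>;
    fn extend_zero(&mut self, _len: usize) {} // pre-sized, nothing to do
    fn get_mut(&mut self) -> (&mut HashMap<u64, u64>, &mut Vec<u64>) {
        (&mut self.map, &mut self.next)
    }
    fn get_map(&self) -> &HashMap<u64, u64> { &self.map }
    fn get_list(&self) -> &Vec<u64> { &self.next }
    fn as_any_mut(&mut self) -> &mut dyn Any { self }
}

/// Streaming side: the chain list grows with every incoming batch.
#[derive(Default)]
struct PruningJoinHashMap { map: HashMap<u64, u64>, next: VecDeque<u64> }

impl JoinHashMapType for PruningJoinHashMap {
    type NextType = VecDeque<u64>;
    fn extend_zero(&mut self, len: usize) {
        self.next.resize(self.next.len() + len, 0);
    }
    fn get_mut(&mut self) -> (&mut HashMap<u64, u64>, &mut VecDeque<u64>) {
        (&mut self.map, &mut self.next)
    }
    fn get_map(&self) -> &HashMap<u64, u64> { &self.map }
    fn get_list(&self) -> &VecDeque<u64> { &self.next }
    fn as_any_mut(&mut self) -> &mut dyn Any { self }
}

/// Insert a batch of row hashes; `offset` is the number of rows already stored.
/// Entries are 1-based row indices, and 0 marks the end of a chain.
fn update_hash<T: JoinHashMapType>(table: &mut T, hashes: &[u64], offset: usize) {
    table.extend_zero(hashes.len());
    let (map, next) = table.get_mut();
    for (row, &hash) in hashes.iter().enumerate() {
        let idx = (row + offset + 1) as u64;
        if let Some(prev) = map.insert(hash, idx) {
            next[row + offset] = prev; // link to the previous row with this hash
        }
    }
}

/// Walk the chain for `hash`, newest row first. A real join would also
/// compare the key values here to rule out hash collisions.
fn matching_rows<T: JoinHashMapType>(table: &T, hash: u64) -> Vec<usize> {
    let mut rows = Vec::new();
    let mut idx = table.get_map().get(&hash).copied().unwrap_or(0);
    while idx != 0 {
        rows.push((idx - 1) as usize);
        idx = table.get_list()[(idx - 1) as usize];
    }
    rows
}

fn main() {
    let mut fixed = JoinHashMap::with_capacity(3);
    update_hash(&mut fixed, &[10, 20, 10], 0);
    println!("{:?}", matching_rows(&fixed, 10));

    let mut streaming = PruningJoinHashMap::default();
    update_hash(&mut streaming, &[10, 20, 10], 0);
    update_hash(&mut streaming, &[10], 3); // second batch, offset = rows so far
    println!("{:?}", matching_rows(&streaming, 10));
}
```

In this sketch the associated NextType is what lets update_hash stay generic: the fixed-size map treats extend_zero as a no-op, while the growable map resizes its VecDeque per batch, which mirrors the "only resize the list in the case of PruningJoinHashMap" behaviour described above.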