Replace dict with thin wrapper around hashtable#3366
Replace dict with thin wrapper around hashtable#3366zuiderkwast merged 8 commits intovalkey-io:unstablefrom
Conversation
Stop overriding libvalkey's dict with valkey's. Remove the DICT_INCLUDE_DIR mechanism from libvalkey's build system since it is no longer needed. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Data hashtables (keys, sets, zsets, hashes) now use a configurable seed separate from the global hashtable seed. This allows the hash-seed config to control SCAN iteration order without affecting internal hashtables (commands, ACL, modules, etc.) that are populated before config loading. The configurable seed defaults to the random seed and is overridden after config loading if hash-seed is set. Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #3366 +/- ##
============================================
- Coverage 76.82% 76.52% -0.30%
============================================
Files 159 157 -2
Lines 79704 79025 -679
============================================
- Hits 61232 60474 -758
- Misses 18472 18551 +79
🚀 New features to boost your workflow:
|
Replace the dict.c implementation with a header-only wrapper (dict.h) around the hashtable API. The dict types, iterators and API functions are now typedefs, macros and inline functions that delegate to hashtable. This unifies the hashtable implementations in the project and removes duplicated logic. Changes to dict: - Remove dict.c; dict.h is now the entire implementation - dict, dictType and dictIterator are direct aliases for the hashtable counterparts. - dictEntry is a struct allocated by dict wrapper functions to hold key and value. It doesn't have a next pointer anymore. - Fix key duplication for dictTypes that had keyDup callback by calling sdsdup() at call sites in functions.c - Remove unused functions, macros, includes and casts - Move some dict defrag logic to defrag.c - Remove obsolete dict unit tests (covered by test_hashtable.cpp) Changes to hashtable: - Change hashtable keyCompare convention to match dict: non-zero means keys are equal, so existing dict compare functions can be reused - Add const to hashtableMemUsage parameter Changes to server implementation: - Deduplicate common dict/hashtable callbacks in server.c Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
rainsupreme
left a comment
There was a problem hiding this comment.
The code looks good to me! I'd like to see the benchmark results, otherwise looks good to go! 😁
There was a problem hiding this comment.
Awesome work! My only concern was always allocating new entry before checking if key exists in dictReplace() and freeing new entry if key already exists. But this function is not that frequently used so impact is minimal.
BTW for module dict API should we mention somewhere that now its another implementation?
Exactly, that was my conclusion too, so it's fine.
It's a rax! Crazy... but it's not affected by this PR. |
sarthakaggarwal97
left a comment
There was a problem hiding this comment.
Minor comments! Looks pretty goood!
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
|
Benchmark ran on this commit: Benchmark Comparison: unstable vs 8ca73b3 (averaged) - rps metricsRun Summary:
Statistical Notes:
Note: Values with (n=X, σ=Y, CV=Z%, CI99%=±W%, PI99%=±V%) indicate averages from X runs with standard deviation Y, coefficient of variation Z%, 99% confidence interval margin of error ±W% of the mean, and 99% prediction interval margin of error ±V% of the mean. CI bounds [A, B] and PI bounds [C, D] show the actual interval ranges. Configuration:
Configuration:
|
enjoy-binbin
left a comment
There was a problem hiding this comment.
LGTM, thanks! I am guessing #3360 will have some conflicts.
madolson
left a comment
There was a problem hiding this comment.
Looks OK to me, just some minor nits.
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Replace the dict.c implementation with a header-only wrapper (dict.h) around the hashtable API. The dict types, iterators and API functions are now typedefs, macros and inline functions that delegate to hashtable. This unifies the hashtable implementations in the project and removes duplicated logic. Changes to dict: - Remove dict.c; dict.h is now the entire implementation - dict, dictType and dictIterator are direct aliases for the hashtable counterparts. - dictEntry is a struct allocated by dict wrapper functions to hold key and value. It doesn't have a next pointer anymore. - Fix key duplication for dictTypes that had keyDup callback by calling sdsdup() at call sites in functions.c - Remove unused functions, macros, includes and casts - Move some dict defrag logic to defrag.c - Remove obsolete dict unit tests (covered by test_hashtable.cpp) Changes to hashtable: - Change hashtable keyCompare convention to match dict: non-zero means keys are equal, so existing dict compare functions can be reused - Add const to hashtableMemUsage parameter Changes to server implementation: - Deduplicate common dict/hashtable callbacks in server.c - Change configured hash-seed to only apply to data hashtables. In particular, it must not modify the hash seed for dicts already initialized during startup for reading configs and similar. Changes to libvalkey: - Let libvalkey use its own dict implementation. --------- Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
|
I didn't notice this PR before it was merged. But there is a concern about using inline functions in the The problem is that an inline function in the .h file MUST be inlined by the compiler. It will be instantiated separately in each compilation unit. This code bloat can lead to an increase in L1 cache usage, and an overall decrease in performance vs. functions which exist in a single compilation unit (in a .c file). Now that we have LTO, there is limited benefit to using inline functions in the .h file.
|
| /* Compare function, returns 0 if the keys are equal. Defaults to just | ||
| * comparing the pointers for equality. */ |
There was a problem hiding this comment.
This comment is now incorrect, right? It looks like the compare function is supposed to return 1 if the keys are equal.
| uint64_t dictCStrHash(const void *key); | ||
| uint64_t dictCStrCaseHash(const void *key); |
There was a problem hiding this comment.
I'd like to see all of the standard dict/hash helper functions moved to a separate compilation unit. Maybe hashtable_helpers.h(c)? We have a lot of fairly standard helpers (destructors, comparators, hashers) that could be extracted from server.c/h (or wherever they might be located).
Note: Anything weird or unique should remain local to the file that declares it. I'm only suggesting this for fairly basic reusable utility functions.
There was a problem hiding this comment.
Sure, go ahead!
I avoided moving too much code to keep the diff reasonably small.
| .entryGetKey = watchedKeyGetKey, | ||
| .hashFunction = dictEncObjHash, | ||
| .keyCompare = hashtableEncObjKeyCompare, | ||
| .keyCompare = dictEncObjKeyCompare, |
There was a problem hiding this comment.
It's a bit confusing now that we have hashtable callbacks being provided as dict callback functions.
If we can relocate all of these utility helpers, it might be worth having equivalent function with both names just to eliminate the confusion. Maybe something like:
int hashtableCStrKeyCompare(const void *key1, const void *key2) {
return strcmp(key1, key2) == 0;
}
int dictCStrKeyCompare(const void *key1, const void *key2) {
return hashtableCStrKeyCompare(key1, key2);
}There was a problem hiding this comment.
Now, dicts are hashtables and dictType is a synonym of hashtableType, so I think it's fine that there's only one copy of these. We could unify on one set of names though.
Another approach is that we continue moving towards hashtable terminology. We can replace all dictType occurrences with hashtableType and unifying on the hashtable-prefixed functions, since it's the actual implementation. The only thing that is really dict specific is dictEntry.
JimB123
left a comment
There was a problem hiding this comment.
Some after-the-fact comments.
Yes, the main reason for deleting the .c file is to avoid symbol collisions with libvalkey's copy of dict.c, which is the old dict implementation. If it wasn't for this reason, I would have kept these functions in dict.c.
Yes, but this is only with Still, I think we should create a macro like |
This isn't a question of dead code. Dead code never gets loaded into L1 cache. Consider 3 files which include
In The problem occurs when execution cycles back and forth between If, instead, the code was contained in the |
|
OK, now I get what you mean. I did think about this when deciding to go this way. The largest function is dictReplace, and I checked where it's called and concluded that it's very rarly used: only when updating cluster configs and by valkey-cli and valkey-benchmark for handling cluster nodes and slots. Second largest is dictAddRaw. It's only used for clients blocked on keys. Only in blocked.c. Additionally it's used by dictAddOrFind which is used for replica keys with expire and cluster nodes black list. Both pretty rare operations. Third largest is dictAdd. But it's really not large. These functions are larger than the others just because they provide a different API than the hashtable. We've already converted the most critical ones to hashtable: the keyspace, hashes, sets, sorted sets, command lookup table, etc. We could phase out dictReplace and do these things differently at the call site and then delete dictReplace. We could put these few nontrivial functions in a C file, but with a name that doesn't clash with the names used in libvalkey's dict, i.e. we can rename them with macros. But I'm not really convinced that it's necessary. |
|
Right. So back to my original statement:
In the old days, this technique ( We need to be cautious because this change may represent tech debt. You surveyed the current usage and concluded that there's not currently an issue. This may not be true in the future. BUT MY BIGGER CONCERN is that I want to ensure that other people don't copy this technique to other places. (Where it may represent a larger concern.) We need to question this technique if/when we see it. |
Replace the dict.c implementation with a header-only wrapper (dict.h) around the hashtable API. The dict types, iterators and API functions are now typedefs, macros and inline functions that delegate to hashtable. This unifies the hashtable implementations in the project and removes duplicated logic.
Changes to dict:
counterparts.
and value. It doesn't have a next pointer anymore.
calling sdsdup() at call sites in functions.c
Changes to hashtable:
keys are equal, so existing dict compare functions can be reused
Changes to server implementation:
particular, it must not modify the hash seed for dicts already
initialized during startup for reading configs and similar.
Changes to libvalkey: