Add @functools.lru_cache decorator for get_binding_version() #512
Merged
>>> 35.381859/0.004149
8527.804049168473
$ git stash
$ python test_slowness.py 100000
driver.cuDriverGetVersion() 12060
cuda_utils.get_binding_version() (12, 8)
driver.cuDriverGetVersion()
0.023946 seconds for 100000 iterations
0.24 µs per call
cuda_utils.get_binding_version()
35.381859 seconds for 100000 iterations
353.82 µs per call
$ git stash pop
$ python test_slowness.py 100000
driver.cuDriverGetVersion() 12060
cuda_utils.get_binding_version() (12, 8)
driver.cuDriverGetVersion()
0.022644 seconds for 100000 iterations
0.23 µs per call
cuda_utils.get_binding_version()
0.004149 seconds for 100000 iterations
0.04 µs per call
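The `test_slowness.py` script itself is not shown in this conversation. As a rough sketch only, a timing harness along the following lines would produce output in the same format as the runs above; the helper names `time_calls` and `format_report` are hypothetical and not taken from the PR.

```python
import time


def time_calls(fn, iterations):
    """Call fn() `iterations` times and return the total elapsed seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return time.perf_counter() - start


def format_report(label, elapsed, iterations):
    """Format a result in the same three-line layout as the output above."""
    per_call_us = elapsed / iterations * 1e6  # seconds -> microseconds per call
    return (
        f"{label}\n"
        f"{elapsed:.6f} seconds for {iterations} iterations\n"
        f"{per_call_us:.2f} µs per call"
    )


if __name__ == "__main__":
    n = 100000
    # Placeholder workload; the real script timed driver.cuDriverGetVersion()
    # and cuda_utils.get_binding_version().
    print(format_report("noop()", time_calls(lambda: None, n), n))
```

Timing a large fixed number of calls and dividing by the count keeps the per-call figure meaningful even when a single call takes well under a microsecond.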
leofang
approved these changes
Mar 12, 2025
Member
Thanks, Ralf! How did you notice the slowness?
Contributor
Author
When I was working on this: [...] Originally I had the [...] Today I was hoping that fixing that very obvious problem first would help with #439 as well. But no, that's something different, and not nearly as extreme (5x vs 8500x).
This one-line change results in an 8k+ fold speedup of get_binding_version().
In retrospect, I should have just looked at the get_bindings() implementation immediately.
The way I actually found this (perf version 6.8.12):
I gave the top of the perf report and the get_bindings() implementation to ChatGPT:
That made it immediately obvious that importlib.metadata.version("cuda-bindings") is the bottleneck, mainly because it involves regex calls, but also because it triggers filesystem calls.
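For illustration, here is a minimal sketch of the caching pattern this PR describes. It is not the project's actual code: `get_binding_version_sketch` and `parse_major_minor` are hypothetical names, and the sketch assumes the installed version string starts with numeric major and minor components (for example "12.8.0").

```python
import functools
import importlib.metadata


def parse_major_minor(version_string):
    """Turn a version string such as "12.8.0" into a tuple such as (12, 8)."""
    major, minor = version_string.split(".")[:2]
    return int(major), int(minor)


@functools.lru_cache(maxsize=None)
def get_binding_version_sketch(dist_name="cuda-bindings"):
    # Hypothetical stand-in for the real function.
    # importlib.metadata.version() looks up installed distribution metadata,
    # so only the first successful call per dist_name pays that cost;
    # later calls return the cached tuple.
    return parse_major_minor(importlib.metadata.version(dist_name))
```

Caching is reasonable here under the assumption that the installed package version does not change during the lifetime of the process. Note that `lru_cache` does not cache exceptions, so if the distribution is not installed, each call raises `importlib.metadata.PackageNotFoundError` again.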