gh-132519: fix excessive mem usage in QSBR with large blocks #132520
Closed
tom-pytel wants to merge 2 commits into python:main from
Conversation
tom-pytel (Author, Contributor)
Ping @colesbury, @kumaraditya303. Is there a better place for the
colesbury (Contributor) reviewed on Apr 14, 2025 and left a comment:
I don't think we should do this. You risk accidentally introducing quadratic behavior.
We will likely tweak the heuristics in the future for when _PyMem_ProcessDelayed() is called, but that should be based on data for real applications.
tom-pytel (Author)
Memory usage numbers (proposed fix explained below):
Test script:
Delayed memory free checks (and subsequent frees, if applicable) currently only occur in one of two situations:

_PyMem_FreeDelayed(), when the number of pending delayed-free memory blocks reaches exactly 254. It then waits another 254 frees even if it could not free any pending blocks this time, which is a lot for big buffers.

This works great for many small objects, but with larger buffers these can accumulate quickly, so more frequent checks should be done.
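To make that trigger concrete, here is a toy model of the behaviour described above. It is not the actual CPython implementation; the names (free_delayed_sketch, process_delayed_sketch, DELAYED_BLOCK_LIMIT), the list layout, and the reset logic are simplified assumptions. The point it illustrates is that processing is driven purely by the count of queued blocks, so a few very large buffers can sit in the queue for a long time before the next attempt to free them.

```c
#include <stdlib.h>

#define DELAYED_BLOCK_LIMIT 254   /* the threshold mentioned above */

struct delayed_block {
    void *ptr;
    struct delayed_block *next;
};

static struct delayed_block *pending = NULL;  /* queued, not yet freed */
static int pending_count = 0;

/* Stand-in for processing the queue: free whichever queued blocks are no
 * longer visible to any reader (the actual QSBR visibility check is elided). */
void
process_delayed_sketch(void)
{
    /* ...walk `pending`, free blocks whose readers have all passed a
     * quiescent state, keep the rest queued... */
}

/* Stand-in for _PyMem_FreeDelayed(): queue the block and only attempt
 * processing once DELAYED_BLOCK_LIMIT blocks have accumulated, regardless
 * of how large the queued blocks are. */
void
free_delayed_sketch(void *ptr)
{
    struct delayed_block *b = malloc(sizeof(*b));
    if (b == NULL) {
        free(ptr);               /* toy fallback, not what CPython does */
        return;
    }
    b->ptr = ptr;
    b->next = pending;
    pending = b;

    if (++pending_count >= DELAYED_BLOCK_LIMIT) {
        process_delayed_sketch();
        pending_count = 0;       /* another 254 frees before the next attempt,
                                    even if nothing could be freed this time */
    }
}
```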
I tried a few things, but _PyMem_ProcessDelayed() added to _Py_HandlePending() seems to work well and be safe, and a QSBR_QUIESCENT_STATE has just been reported at that point, so there is a fresh chance to actually free. It seems to happen often enough that memory usage is kept down, and if there is nothing to free then _PyMem_ProcessDelayed() is super cheap.
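A rough sketch of where the extra attempt goes, continuing the toy model above. handle_pending_sketch() stands in for _Py_HandlePending(), and the surrounding details are assumptions rather than the PR's actual diff.

```c
/* Continuing the toy model above.  The idea: _Py_HandlePending() runs right
 * after the thread has reported QSBR_QUIESCENT_STATE, so it is a natural,
 * cheap place to also try flushing the delayed-free queue. */
void process_delayed_sketch(void);   /* defined in the sketch above */

void
handle_pending_sketch(void)
{
    /* ...existing handling of signals, pending calls, GC requests... */

    /* A quiescent state was just reported for this thread, so queued blocks
     * whose readers have all moved past that point can actually be freed
     * now; if the queue is empty this call costs almost nothing. */
    process_delayed_sketch();
}
```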
Another option would be to track the amount of pending memory to be freed and increase the frequency of free attempts if that number gets too large, but to start with, this small change seems to solve the problem well enough. Could also schedule GC if pending frees get too high, but that seems like a roundabout way to arrive at _PyMem_ProcessDelayedNoDealloc().
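That byte-tracking alternative could look roughly like the following, again continuing the toy model. It is purely hypothetical, since this variant is described but not implemented in the PR; the size parameter and the 64 MiB cap are made up for illustration.

```c
/* Hypothetical variant: also track the total number of pending bytes and
 * process early once it crosses a cap, so a few huge buffers do not have
 * to wait for 254 queued blocks. */
#include <stddef.h>

#define DELAYED_BLOCK_LIMIT 254
#define DELAYED_BYTES_LIMIT ((size_t)64 * 1024 * 1024)   /* made-up cap */

void process_delayed_sketch(void);   /* defined in the first sketch */

static size_t pending_bytes = 0;
static int sized_pending_count = 0;

void
free_delayed_sized_sketch(void *ptr, size_t size)
{
    (void)ptr;                       /* ...queue `ptr` exactly as before... */
    pending_bytes += size;

    if (++sized_pending_count >= DELAYED_BLOCK_LIMIT
        || pending_bytes >= DELAYED_BYTES_LIMIT) {
        process_delayed_sketch();
        sized_pending_count = 0;
        pending_bytes = 0;           /* toy reset; real code would subtract
                                        only what was actually freed */
    }
}
```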
Performance, as checked by the full pyperformance suite, is unchanged with the fix (literally 0.17% better avg, so noise).