hsakmt: Fix shmem leak in reserved_aperture_release #7621
Open
williampalacek wants to merge 4 commits into
Add munmap() before mmap() to release kernel shmem refcount for MAP_SHARED allocations. Without this, reserved_aperture_release() leaves shmem pages pinned, causing OOM during repeated allocations.

Fixes: ROCM-23563
Signed-off-by: William Palacek <William.Palacek@amd.com>
Pull request overview
This PR fixes a shmem leak in reserved_aperture_release() within libhsakmt by explicitly munmap()-ing CPU mappings before re-reserving the VA range with a PROT_NONE anonymous mmap(MAP_FIXED). This ensures MAP_SHARED-backed shmem refcounts are decremented, preventing unbounded shmem growth and OOM in stress tests.
Changes:
- Add munmap(address, size) before the PROT_NONE remap to correctly drop shmem references for MAP_SHARED allocations.
- Simplify the ENOMEM retry logic in the remap path.
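The release pattern described in the overview can be sketched in isolation as below. This is a minimal sketch, not the actual fmm.c code: the helper name release_and_reserve() is hypothetical, and because the PR does not show the exact mmap() flags used by reserved_aperture_release(), MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE is an assumption. It drops the old mapping with munmap() first, then re-reserves the same VA range as PROT_NONE with a single MAP_FIXED mmap(), logging a warning instead of retrying on failure.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper illustrating the pattern: release the CPU mapping
 * with munmap(), then re-reserve the same VA range as inaccessible.
 * The mmap() flags below are an assumption, not copied from fmm.c.
 * Returns 0 on success, -1 if either step fails.
 */
int release_and_reserve(void *address, size_t size)
{
	/* Unmapping drops this mapping's reference to any shared backing pages. */
	if (munmap(address, size) != 0) {
		fprintf(stderr, "warning: munmap(%p, %zu) failed: %s\n",
			address, size, strerror(errno));
		return -1;
	}

	/* Re-reserve the range so no unrelated allocation lands in it. */
	void *ret = mmap(address, size, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
			 -1, 0);
	if (ret == MAP_FAILED) {
		/* Single attempt: an identical immediate retry would fail the same way. */
		fprintf(stderr, "warning: could not re-reserve %p (%zu bytes): %s\n",
			address, size, strerror(errno));
		return -1;
	}
	return 0;
}
```

As a usage example, a caller could create a MAP_SHARED | MAP_ANONYMOUS buffer, write to it, pass it to release_and_reserve(), and finally munmap() the PROT_NONE reservation. One trade-off of this ordering is that the range is briefly unreserved between the two calls, so in a multithreaded process another mmap() could in principle claim it; an in-place MAP_FIXED replacement does not have that window.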
The retry logic was dead weight: retrying the identical mmap() call immediately after failure won't change the outcome. Simplified to a single warning log if the VA range can't be re-reserved.

Signed-off-by: William Palacek <William.Palacek@amd.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Motivation
KFDMemoryTest.BigSysBufferStressTest fails with OOM kills on MI210. Shmem grows unbounded across allocation iterations
because MAP_SHARED memory is not properly released.
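To observe this growth outside of kfdtest, one option is to sample the Shmem field of /proc/meminfo between allocation iterations. The helper below is a hypothetical sketch, not part of libhsakmt or kfdtest; read_shmem_kb() is an illustrative name.

```c
#include <stdio.h>

/*
 * Hypothetical helper: return the "Shmem:" value from /proc/meminfo
 * in kB, or -1 if the file cannot be read or the field is missing.
 */
long read_shmem_kb(void)
{
	FILE *f = fopen("/proc/meminfo", "r");
	if (!f)
		return -1;

	char line[256];
	long kb = -1;
	while (fgets(line, sizeof(line), f)) {
		/*
		 * Matches lines of the form "Shmem:     123456 kB". Lines such as
		 * "ShmemHugePages:" fail at the ':' and leave kb untouched.
		 */
		if (sscanf(line, "Shmem: %ld kB", &kb) == 1)
			break;
	}
	fclose(f);
	return kb;
}
```

Printing read_shmem_kb() once per iteration of a stress loop should give a roughly flat value when freed buffers are actually released, and a steadily rising one while the leak is present.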
Technical Details
reserved_aperture_release() in projects/rocr-runtime/libhsakmt/src/fmm.c remaps freed memory as PROT_NONE using mmap(..., MAP_FIXED). For MAP_SHARED allocations, this does not decrement the kernel shmem refcount; only munmap() does. Added munmap() before the mmap() call. Also simplified the ENOMEM retry path, since the memory is already unmapped.
JIRA ID
Resolves ROCM-23563
Test Plan
Run kfdtest --gtest_filter='*BigSysBufferStressTest*' on MI210 and monitor shmem via /proc/meminfo.
Test Result
Submission Checklist