Reset DecodingPress state when exiting context manager#192
Merged
maxjeblick merged 1 commit intoMar 12, 2026
Merged
Conversation
DecodingPress.layer_step_counts and hidden_states_buffer were not being reset between uses when the press was used as a context manager. This caused stale state to carry over between different questions in the evaluation framework (e.g. aime25 with 30 questions), leading to inconsistent compression behaviour. Override __call__ to call self.reset() on exit, mirroring the pattern already used by PrefillDecodingPress. Fixes NVIDIA#191 Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
Collaborator
|
HI @cluster2600 , thanks a lot for opening this PR! I think it would have been nice to sync with @Yeuvoir beforehand, as he opened the issue and also expressed interest in opening a PR. |
Contributor
Author
|
Hi @maxjeblick, thanks for the feedback — apologies for not coordinating with @Yeuvoir beforehand, that was an oversight on my part. @Yeuvoir, happy to defer to you if you'd prefer to submit your own PR, or if you'd like to collaborate on this one I'm open to that as well. Let me know what works best for you both and I'll adapt accordinly. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
DecodingPress.layer_step_countsandhidden_states_bufferwere not being reset between successive uses of teh context manager. When the evaluation framework runsDecodingPressacross multiple questions (e.g. aime25 with 30 questions), stale state from a previous question's decoding carried over to the next, causing inconsistent compression behaviour depending on input order.__call__onDecodingPressto callself.reset()when the context manager exits. This mirrors the pattern already recognised inPrefillDecodingPress.__call__, which resets its innerdecoding_pressin itsfinallyblock.Test plan
test_decoding_compression.pytests should continue to pass (they already reuse the same press object across calls)test_decoding_press_equivalencestill produces identical results for standalone vs combined pressDecodingPresson a multi-question benchmark (e.g. aime25) and confirm scores are now deterministic regardless of question orderFixes #191