Reset DecodingPress state when exiting context manager #192

Merged
maxjeblick merged 1 commit into NVIDIA:main from cluster2600:fix/reset-decoding-press-state-between-questions on Mar 12, 2026
@cluster2600

Contributor

Summary

  • DecodingPress.layer_step_counts and hidden_states_buffer were not being reset between successive uses of the context manager. When the evaluation framework runs DecodingPress across multiple questions (e.g. aime25 with 30 questions), stale state from one question's decoding carried over to the next, so compression behaviour varied with input order.
  • Overrides __call__ on DecodingPress to call self.reset() when the context manager exits. This mirrors the pattern already used by PrefillDecodingPress.__call__, which resets its inner decoding_press in its finally block.
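
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of a reset-on-exit context manager. It is not the code from this PR: ToyBasePress, ToyDecodingPress, and step() are hypothetical stand-ins, and only the attribute names layer_step_counts and hidden_states_buffer and the reset() call come from the description above.

```python
from collections import defaultdict
from contextlib import contextmanager


class ToyBasePress:
    """Hypothetical stand-in for a press base class (not the kvpress code)."""

    @contextmanager
    def __call__(self, model):
        # A real press would register forward hooks on `model` here and
        # remove them again once the with-block finishes.
        yield


class ToyDecodingPress(ToyBasePress):
    """Hypothetical press that accumulates per-layer state while decoding."""

    def __init__(self):
        self.layer_step_counts = defaultdict(int)
        self.hidden_states_buffer = defaultdict(list)

    def reset(self):
        # Drop all state accumulated during the previous generation.
        self.layer_step_counts = defaultdict(int)
        self.hidden_states_buffer = defaultdict(list)

    def step(self, layer_idx, hidden_state):
        # Hypothetical helper that simulates one decoding step for a layer.
        self.layer_step_counts[layer_idx] += 1
        self.hidden_states_buffer[layer_idx].append(hidden_state)

    @contextmanager
    def __call__(self, model):
        try:
            with super().__call__(model):
                yield
        finally:
            # Runs on normal exit and on exceptions, so the next use of
            # the same press object always starts from clean state.
            self.reset()
```

Putting the reset in a finally block means the state is cleared even if generation raises partway through a question, which is presumably why the existing PrefillDecodingPress code takes the same approach.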

Test plan

  • Existing test_decoding_compression.py tests should continue to pass (they already reuse the same press object across calls)
  • Verify that test_decoding_press_equivalence still produces identical results for standalone vs combined press
  • Run an evaluation with DecodingPress on a multi-question benchmark (e.g. aime25) and confirm scores are now deterministic regardless of question order
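
To illustrate what the order-dependence check in the last item looks for, here is a toy reproduction. Everything in it is made up for illustration (ToyPress, the interval of 4, the question lengths); it only models a step counter that triggers compression every few decoding steps and does not call kvpress.

```python
from contextlib import contextmanager


class ToyPress:
    """Hypothetical press that 'compresses' every `interval` decoding steps."""

    def __init__(self, interval, reset_on_exit):
        self.interval = interval
        self.reset_on_exit = reset_on_exit
        self.step_count = 0

    @contextmanager
    def __call__(self):
        try:
            yield
        finally:
            if self.reset_on_exit:
                self.step_count = 0

    def decode(self, num_tokens):
        """Return the token positions at which compression fires."""
        fired = []
        for position in range(num_tokens):
            self.step_count += 1
            if self.step_count % self.interval == 0:
                fired.append(position)
        return fired


def run(press, questions, order):
    """Decode each question in the given order, reusing one press object."""
    results = {}
    for name in order:
        with press():
            results[name] = press.decode(questions[name])
    return results


# Number of decoded tokens per question (made-up values).
questions = {"q1": 5, "q2": 7}

stale_fwd = run(ToyPress(4, reset_on_exit=False), questions, ["q1", "q2"])
stale_rev = run(ToyPress(4, reset_on_exit=False), questions, ["q2", "q1"])
fixed_fwd = run(ToyPress(4, reset_on_exit=True), questions, ["q1", "q2"])
fixed_rev = run(ToyPress(4, reset_on_exit=True), questions, ["q2", "q1"])

print(stale_fwd == stale_rev)  # False: results depend on question order
print(fixed_fwd == fixed_rev)  # True: each question starts from step 0
```

Without the reset, the counter left over from the first question shifts where compression fires in the second, so swapping the order changes the per-question results. With the reset, each question sees the same schedule regardless of what ran before it.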

Fixes #191

DecodingPress.layer_step_counts and hidden_states_buffer were not
being reset between uses when the press was used as a context manager.
This caused stale state to carry over between different questions
in the evaluation framework (e.g. aime25 with 30 questions),
leading to inconsistent compression behaviour.

Override __call__ to call self.reset() on exit, mirroring the
pattern already used by PrefillDecodingPress.

Fixes NVIDIA#191

Signed-off-by: Maxime Grenu <maxime.grenu@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Mar 10, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@maxjeblick

Collaborator

Hi @cluster2600, thanks a lot for opening this PR!
We will look into the issue and decide whether to fix the bug at the press level (in which case this PR probably looks fine) or to choose another modification.

I think it would have been nice to sync with @Yeuvoir beforehand, as he opened the issue and also expressed interest in opening a PR.

@cluster2600

Contributor Author

Hi @maxjeblick, thanks for the feedback, and apologies for not coordinating with @Yeuvoir beforehand; that was an oversight on my part. @Yeuvoir, happy to defer to you if you'd prefer to submit your own PR, or to collaborate on this one if you'd like. Let me know what works best for you both and I'll adapt accordingly.

@maxjeblick maxjeblick left a comment

Collaborator

Thanks a lot, LGTM!

@maxjeblick maxjeblick merged commit 04abb96 into NVIDIA:main Mar 12, 2026
2 checks passed
@cluster2600 cluster2600 deleted the fix/reset-decoding-press-state-between-questions branch March 12, 2026 15:46
Development

Successfully merging this pull request may close these issues.

A logical issue in the test framework

2 participants