A logical issue in the test framework: when testing decoding_press, self.layer_step_counts, which is used to determine whether compression should be triggered, is not reset when switching between different test questions(in the same test like aime25, which has 30 questions). I think this could cause inconsistent results depending on the order of the input cases. If you agree, I’d be happy to submit a PR, even though it would just be a simple fix.
A logical issue in the test framework: when testing
decoding_press,self.layer_step_counts, which is used to determine whether compression should be triggered, is not reset when switching between different test questions(in the same test like aime25, which has 30 questions). I think this could cause inconsistent results depending on the order of the input cases. If you agree, I’d be happy to submit a PR, even though it would just be a simple fix.