GH-118036: Fix a bug with CALL_STAT_INC#117933
Conversation
We were under-counting calls in `_PyEvalFramePushAndInit` because the `CALL_STAT_INC` macro was redefined to a no-op for the Tier 2 interpreter. The fix is a little convoluted. This ought to result in ~37% more "Frames pushed" reported under "Call stats". The new count is the correct one (I presume).
There was a problem hiding this comment.
I understand how this is broken and why this change fixes it, so I'm marking as approved, but I agree it's "weird" / more convoluted than it needs to be. Moving the order of functions in ceval.c would result in less weirdness (probably just moving _PyEval_EvalFrameDefault to the bottom since that's where all the macro updates happen) but that would create a lot of churn.
Alternatively, what if you rename REAL_CALL_STAT_INC to CALL_STAT_INC_ALWAYS and then just use CALL_STAT_INC_ALWAYS directly from _PyEvalFramePushAndInit (which I think is the only call site impacted by this change). That would get rid of the weird dance of #undef and restoring just CALL_STAT_INC (but admittedly that would just replace it with another subtlety in _PyEvalFramePushAndInit).
Let's try a different fix instead.
This should fix the issue with CALL_STAT_INC in a cleaner way (even if the diff is much larger).
|
Here's another version, where I moved the interpreter loop (and a small assortment of related stuff) to the end of the file. I ran Pystats on a single loop of the Richards benchmark, and found that most stats are approximately the same (not completely) except that "Frames pushed" is about 10% larger, which indicates that the fix works. (The diff is much more annoying to review, but I promise I just moved the stuff from the original lines 606 - 1120 to the bottom of the file, except I had to move |
|
Benchmark says speed and memory unchanged. |
|
I think the correct fix for In executor_cases.c.h |
Okay, I'll try that next. |
I'm going to try yet another approach.
|
The proof will be in the pudding. I'll fire off two benchmark runs, with pystats, one plain, one with Tier 2. (The JIT pystats ought to be similar but I don't want to wait.) |
|
Benchmark using Tier 1 only shows 36.8% more frames pushed, which is very close to what I measured with the first version of this PR, so I think that suggests this fixes that issue. Everything else I looked at is basically unchanged, suggesting I'm not breaking anything. Still waiting for the Tier 2 benchmark, will update when I see those numbers. |
|
Benchmark with Tier 2 poses a bit of a mystery, at least the pystats diff. Go to Call stats and open the details box.
@markshannon, @mdboom, @brandtbucher -- could there be some kind of double counting going on? Or is this an expected result? The only change from main is now that we don't undefine |
|
I think the change is correct, even though the stats are probably still wrong. |
We were under-counting calls in
_PyEvalFramePushAndInitbecause theCALL_STAT_INCmacro was redefined to a no-op for the Tier 2 interpreter. The fix is a little convoluted (I had wanted to move the code around, but that would require moving something else around, and in the end I figured it was easier to tweak the macros @markshannon might disagree though?). This ought to result in ~37% more "Frames pushed" reported under "Call stats". The new count is the correct one (I presume).@mdboom can you review? This is one commit from my experiment about removing Tier 2 entirely (gh-117908).
To see the effect, look at these pystat diffs.