GH-133231: Changes to executor management to support proposed sys._jit module#133287
GH-133231: Changes to executor management to support proposed sys._jit module#133287markshannon merged 8 commits intopython:mainfrom
sys._jit module#133287Conversation
… Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.
|
The msvc jit failure might be fixed by #132746 |
Wow I did not expect that to fix it. |
brandtbucher
left a comment
There was a problem hiding this comment.
I think the overall approach is neat, but I think there are a couple of potential issues (and possible cleanups) here.
Long story short, anything that uses the entry_frame for this purpose is going to run into issues with the tail-calling interpreter (whether it's your approach, or mine). It's a hard problem.
| typedef struct { | ||
| _PyInterpreterFrame frame; | ||
| _PyStackRef stack[1]; | ||
| } _PyEntryFrame; |
There was a problem hiding this comment.
Is this needed? We only need one slot in localsplus, which _PyInterpreterFrame already has.
There was a problem hiding this comment.
We now need two slots. One on the stack for the return value, as before, and one local variable to hold the current executor.
There was a problem hiding this comment.
Ah, I see. Forgot about the return value.
Maybe add a comment that the one stack slot is for the return value, and the one local is for the executor.
|
When you're done making the requested changes, leave the comment: |
|
Idead: if we have a struct Something like this: diff --git a/Python/ceval_macros.h b/Python/ceval_macros.h
index e1d2673848c..c77c2cafee7 100644
--- a/Python/ceval_macros.h
+++ b/Python/ceval_macros.h
@@ -360,10 +360,13 @@ do { \
OPT_STAT_INC(traces_executed); \
_PyExecutorObject *_executor = (EXECUTOR); \
jit_func jitted = _executor->jit_code; \
+ node prev = tstate->current_executor; \
+ tstate->current_executor = (node){_executor, &prev}; \
/* Keep the shim frame alive via the executor: */ \
Py_INCREF(_executor); \
next_instr = jitted(frame, stack_pointer, tstate); \
Py_DECREF(_executor); \
+ tstate->current_executor = prev; \
Py_CLEAR(tstate->previous_executor); \
frame = tstate->current_frame; \
stack_pointer = _PyFrame_GetStackPointer(frame); \
@@ -379,6 +382,9 @@ do { \
OPT_STAT_INC(traces_executed); \
next_uop = (EXECUTOR)->trace; \
assert(next_uop->opcode == _START_EXECUTOR); \
+ /* prev is declared local to the tier two interpreter: */ \
+ prev = tstate->current_executor; \
+ tstate->current_executor = (node){_executor, &prev}; \
goto enter_tier_two; \
} while (0)
#endif
@@ -386,6 +392,7 @@ do { \
#define GOTO_TIER_ONE(TARGET) \
do \
{ \
+ tstate->current_executor = prev; \
next_instr = (TARGET); \
OPT_HIST(trace_uop_execution_counter, trace_run_length_hist); \
_PyFrame_SetStackPointer(frame, stack_pointer); \No entry frame needed. |
|
I don't see how that would work, as macros don't have scope. But its moot anyway as we can use the entry frame. The entry frame exists in the tail-calling interpreter, it is the |
|
I have made the requested changes; please review again |
|
Thanks for making the requested changes! @brandtbucher: please review the changes made to this pull request. |
I mean, it would just be a local declared in the But as you say...
Ah, this is what I missed. Makes sense: we're always able to set it when entering because we have |
brandtbucher
left a comment
There was a problem hiding this comment.
Looks good, thanks. I'm just going to kick off some JIT benchmarks to be safe.
| { \ | ||
| tstate->current_executor = NULL; \ | ||
| next_instr = (TARGET); \ | ||
| assert(tstate->current_executor == NULL); \ |
There was a problem hiding this comment.
Minor: this seems redundant, given that we're setting it one line above.
| #undef GOTO_TIER_ONE | ||
| #define GOTO_TIER_ONE(TARGET) \ | ||
| do { \ | ||
| tstate->current_executor = NULL; \ |
There was a problem hiding this comment.
You can either do this here, or in the second half of the _Py_JIT implementation of GOTO_TIER_TWO in ceval_macros.h, after the JIT code returns. No need to do it in both places.
| @@ -5316,7 +5322,6 @@ dummy_func( | |||
| } | |||
|
|
|||
| tier2 op(_START_EXECUTOR, (executor/4 --)) { | |||
There was a problem hiding this comment.
Side note, this instruction now exists solely to set the current_executor in the tier two interpreter. We should try to remove it as a follow-up (doesn't need to be in this PR).
There was a problem hiding this comment.
We can replace all uses of current_executor with an operand, which will remove the current_executor local variable from the tier 2 interpreter and the _JIT_EXECUTOR patch from the JIT template.
We can also shrink _CHECK_VALIDITY a bit by replacing current_executor->vm_data.valid with *validity_ptr:
tier2 op(_CHECK_VALIDITY, (validity_ptr/4 --)) {
DEOPT_IF(*validity_ptr == 0);
}For another PR.
* origin/main: (111 commits) pythongh-91048: Add filename and line number to external inspection routines (pythonGH-133385) pythongh-131178: Add tests for `ast` command-line interface (python#133329) Regenerate pcbuild.sln in Visual Studio 2022 (python#133394) pythongh-133042: disable HACL* HMAC on Emscripten (python#133064) pythongh-133351: Fix remote PDB's multi-line block tab completion (python#133387) pythongh-109700: Improve stress tests for interpreter creation (pythonGH-109946) pythongh-81793: Skip tests for os.link() to symlink on Android (pythonGH-133388) pythongh-126835: Rename `ast_opt.c` to `ast_preprocess.c` and related stuff after moving const folding to the peephole optimizier (python#131830) pythongh-91048: Relax test_async_global_awaited_by to fix flakyness (python#133368) pythongh-132457: make staticmethod and classmethod generic (python#132460) pythongh-132805: annotationlib: Fix handling of non-constant values in FORWARDREF (python#132812) pythongh-132426: Add get_annotate_from_class_namespace replacing get_annotate_function (python#132490) pythongh-81793: Always call linkat() from os.link(), if available (pythonGH-132517) pythongh-122559: Synchronize C and Python implementation of the io module about pickling (pythonGH-122628) pythongh-69605: Add PyREPL import autocomplete feature to 'What's New' (python#133358) bpo-44172: Keep reference to original window in curses subwindow objects (pythonGH-26226) pythonGH-133231: Changes to executor management to support proposed `sys._jit` module (pythonGH-133287) pythongh-133363: Fix Cmd completion for lines beginning with `! ` (python#133364) pythongh-132983: Introduce `_zstd` bindings module (pythonGH-133027) pythonGH-91048: Add utils for printing the call stack for asyncio tasks (python#133284) ...
…sys._jit` module (pythonGH-133287) * Track the current executor, not the previous one, on the thread-state. * Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.
This PR:
Should simplify the implementation of
sys._jit.is_enabled()as it is simplytstate->current_executor != NULLand doesn't require additional state to be tracked.sys._jit#133231