gh-105699: Use a _Py_hashtable_t for the PyModuleDef Cache#106974
gh-105699: Use a _Py_hashtable_t for the PyModuleDef Cache#106974ericsnowcurrently merged 14 commits intopython:mainfrom
Conversation
|
@Yhg1s, any objections to this change in 3.12 (pending a review here)? I suspect the current use of dict will eventually cause a crash somewhere and more-so as folks use subinterpreters more. I'm asking here because I'd prefer 3.12 to match 3.13. |
Yhg1s
left a comment
There was a problem hiding this comment.
Given that this fixes a fundamental issue, I'm okay with backporting it to 3.12rc1, if it can get into a decent shape in time.
Python/import.c
Outdated
|
|
||
| PyObject *extensions = EXTENSIONS.dict; | ||
| if (extensions == NULL) { | ||
| key = hashtable_key_from_2_strings(filename, name, ':'); |
There was a problem hiding this comment.
: is valid in filenames (on most filesystems). Why not /, which isn't? Also, since it's used in two places and not easy to miss/misinterpret, it should probably be a #define/static constant.
There was a problem hiding this comment.
An module name (identifier) will never have ":" or "/" in it. I figured the separator didn't matter as long as it wasn't a valid character in identifiers. I thought "/" might be a little confusing when debugging since it will likely be in the filename, so I picked the next character that made sense to me in this context, ":".
If you think there's an advantage to "/" (or any other character), or a disadvantage to ":" or you feel strongly about it, I'd be glad to change the separator. It isn't critical to me.
Python/import.c
Outdated
| goto finally; | ||
| } | ||
| res = 0; | ||
| /* This decref would be problematic if the module def were |
There was a problem hiding this comment.
That brings up a good point: shouldn't the PyModuleDefs placed in the shared cache be made immortal? Otherwise how can they sensibly be shared at all?
There was a problem hiding this comment.
That makes sense. We don't use module defs like the usual objects and generally assume they are statically defined, but it would be good to play it safe.
Co-authored-by: T. Wouters <thomas@python.org>
|
Thanks @ericsnowcurrently for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12. |
|
Sorry, @ericsnowcurrently, I could not cleanly backport this to |
…hongh-106974) This fixes a crasher due to a race condition, triggered infrequently when two isolated (own GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed? We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a low-level hashtable, using the "raw" allocator to avoid tying it to the main interpreter. We also remove the accommodations for "detached" thread states, which were a dubious idea to start with. (cherry picked from commit 8ba4df9)
|
GH-107412 is a backport of this pull request to the 3.12 branch. |
|
Thanks for the reviews, @Yhg1s! |
…-106974) (gh-107412) gh-105699: Use a _Py_hashtable_t for the PyModuleDef Cache (gh-106974) This fixes a crasher due to a race condition, triggered infrequently when two isolated (own GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed? We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a low-level hashtable, using the "raw" allocator to avoid tying it to the main interpreter. We also remove the accommodations for "detached" thread states, which were a dubious idea to start with. (cherry picked from commit 8ba4df9)
This fixes a crasher due to a race condition, triggered infrequently when two isolated (own GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed?
We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a low-level hashtable, using the "raw" allocator to avoid tying it to the main interpreter.
We also remove the accommodations for "detached" thread states, which were a dubious idea to start with.
(This approach does not have the same potential performance penalty as the linked list we used in gh-106899.)