gh-119109: improve functools.partial vectorcall with keywords#124584
gh-119109: improve functools.partial vectorcall with keywords#124584kumaraditya303 merged 37 commits intopython:mainfrom
functools.partial vectorcall with keywords#124584Conversation
|
Perhaps @vstinner has the time and interest in looking at this. |
|
I think it is a good compromise between simplicity and performance now. One micro-optimization that I couldn't figure out how to do simply is pre-storing Not sure how much sense it makes yet, but I posted faster-cpython/ideas#699 in relation to this. Ready for review now. |
|
Was wandering if it might be worth factoring out macros for private use. |
|
No refleaks: ❯ ./python.exe -m test -R 3:3 test_functools
Using random seed: 458260461
0:00:00 load avg: 1.72 Run 1 test sequentially in a single process
0:00:00 load avg: 1.72 [1/1] test_functools
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XX. ...
0:00:02 load avg: 1.66 [1/1] test_functools passed
== Tests result: SUCCESS ==
1 test OK.
Total duration: 2.1 sec
Total tests: run=321 skipped=1
Total test files: run=1/1
Result: SUCCESS |
I get the same on And also the same if I remove all tests except Maybe not related to this PR? EDIT: I assumed this shows that there are refleaks? Or am I failing to interpret something? Never used |
The output I posted means that there are no leaks, -R is used to check for reference leaks and 3:3 indicates the numbers of runs to checks for leaks. |
|
I think the resizing logic is unnecessary and adds extra complexity, would you simplify it to what Serhiy suggested or alternatively remove it altogether @dg-pb? |
I don't think it was simplification. He did propose an alternative and I incorporated it into the logic. Why I kept the old one # only: (noveralloc > init_stack_size / 2)
p = partial(a=1)
f(a=2)
# initial stack size: 3
# used stack size: 1Thus, would be kicking in too often. 10% is I think a good number, given we are talking about the pool of 0.01% of cases. I feel that what is currently there provides necessary protection at minimal cost and is unlikely to cause any issues. But if you have better rationale in mind, I am open to further amendments.
Things like recursion of "Anything that can go wrong will go wrong." Thus, given enough time it is very likely that more than 1 instance of issues with this will happen. Given it is pretty much for free, and the block can be easily removed if there is a need for it in the future, I am in favour of being "better safe than sorry" here. |
I don't have better resizing strategy in mind atm so I'll not block this PR on it now, I'll merge this now thanks. |
…python#124584) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…python#124584) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…python#124584) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
|
Is a 3.14 backport possible for this PR, given it fixes a free threading race also? |
This is performance improvement so it cannot be backported, I haven't seen any real crashes on this so adding supression should be fine for 3.14. |
…python#124584) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…python#124584) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
(Potentially closes #128050)
This IMO is the best approach to resolve fallback "issue". It:
a) Eliminates the need for the fallback or any need to switch between implementation after initial construction
b) Delivers performance benefits for vectorcall when
partialhas keywordsBenchmark:
No penalty for calls without
pto_kwds.Non negligible speed improvement for calls with
pto_kwds: 27 - 55%