gh-119109: improve `functools.partial` vectorcall with keywords by dg-pb · Pull Request #124584 · python/cpython

dg-pb · 2024-09-26T08:40:10Z

(Potentially closes #128050)

IMO this is the best approach to resolving the fallback "issue". It:
a) Eliminates the fallback, along with any need to switch between implementations after initial construction
b) Delivers performance benefits for vectorcall when the partial has keywords

Benchmark:

# BENCH 2 ARGS
# ------------
S="
from functools import partial
f=lambda a, b: a - b
p1 = partial(f)
p2 = partial(f, b=2)
l = lambda a: f(a, b=2)
"

$PYCMD -c "${S}print(p1(1, 2))"       # -1     | -1     |
$PYCMD -c "${S}print(p2(1))"          # -1     | -1     |
                                      # BEFORE | AFTER  | %CHN | LAMBDA LB
$PYCMD -m timeit -s "$S" 'p1(1, 2)'   #  87 ns |  85 ns |      |
$PYCMD -m timeit -s "$S" 'p1(1, b=2)' # 100 ns |  96 ns |      |
$PYCMD -m timeit -s "$S" 'p2(1)'      # 240 ns | 135 ns | -45% |  94 ns
$PYCMD -m timeit -s "$S" 'p2(a=1)'    # 350 ns | 160 ns | -55% | 110 ns


# BENCH 10 ARGS
# -------------
S="
from functools import partial
func = lambda a, b, c, d, e, f, g, h, i, j: (a + b + c + d + e + f + g + h + i + j)
p = partial(func, f=5, g=6, h=7, i=8, j=9)
l = lambda a, b, c, d, e, f=5, g=6, h=7, i=8, j=9: func(a, b, c, d, e, f=f, g=g, h=h, i=i, j=j)
"

C0="${S}print(p(0, 1, 2, 3, 4))"
C1='p(0, 1, 2, 3, 4)'
C2='p(a=0, b=1, c=2, d=3, e=4)'                             # disjoint kw and pto_kw
C3='p(a=0, b=1, c=2, d=3, e=4, f=5, g=6)'                   # kw partially overlaps pto_kw
C4='p(a=0, b=1, c=2, d=3, e=4, f=5, g=6, h=7, i=8, j=9)'    # kw overrides pto_kw


$PYCMD -c "$C0"                 #  45     | 45     |
                                #  BEFORE | AFTER  | %CHN | LAMBDA LB
$PYCMD -m timeit -s "$S" "$C1"  #  440 ns | 320 ns | -28% | 240 ns
$PYCMD -m timeit -s "$S" "$C2"  #  890 ns | 440 ns | -50% | 260 ns
$PYCMD -m timeit -s "$S" "$C3"  # 1000 ns | 600 ns | -40% | 270 ns
$PYCMD -m timeit -s "$S" "$C4"  # 1250 ns | 700 ns | -44% | 300 ns

# FUNCTION CALL - 210 ns
$PYCMD -m timeit -s "$S" 'func(a=0, b=1, c=2, d=3, e=4, f=5, g=6, h=7, i=8, j=9)'

No penalty for calls without pto_kwds.
Non-negligible speed improvement for calls with pto_kwds: 27 - 55%
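
For readers who want to repeat the 2-argument comparison without a shell, here is a minimal sketch using the standard `timeit` module. The helper name `best_ns` and the `number`/`repeat` values are assumptions made for illustration; absolute timings depend on the machine and the interpreter build, so no expected numbers are given.

```python
import timeit
from functools import partial


def f(a, b):
    return a - b


p2 = partial(f, b=2)        # partial that stores a keyword (pto_kw)
lam = lambda a: f(a, b=2)   # hand-written equivalent ("LAMBDA LB" column)


def best_ns(stmt, number=100_000, repeat=5):
    """Best-of-`repeat` time per call, in nanoseconds (hypothetical helper)."""
    timer = timeit.Timer(stmt, globals={"p2": p2, "lam": lam})
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9


results = {stmt: best_ns(stmt) for stmt in ("p2(1)", "p2(a=1)", "lam(1)")}
for stmt, ns in results.items():
    print(f"{stmt:10s} {ns:8.1f} ns")
```

Running this on an interpreter built before and after the change should show the same kind of gap as the `p2(1)` and `p2(a=1)` rows above.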

Issue: functools.partial does not re-set vector call. #119109

Modules/_functoolsmodule.c

rhettinger · 2024-09-26T17:38:49Z

Perhaps @vstinner has the time and interest in looking at this.

dg-pb · 2024-09-29T07:26:51Z

I think it is a good compromise between simplicity and performance now.

One micro-optimization that I couldn't figure out how to do simply is pre-storing the kwnames tuple so that it doesn't need to be created on every call. It would save another ~50 ns.

Not sure how much sense it makes yet, but I posted faster-cpython/ideas#699 in relation to this.

Ready for review now.
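
To make the kwnames remark concrete, here is a rough Python model of the argument layout a vectorcall receives: positional values first, then keyword values, with a separate tuple of keyword names. It is only a sketch of the idea, not the C code in this PR; the function names `merged_vectorcall_layout` and `call_with_layout` are hypothetical.

```python
from functools import partial


def merged_vectorcall_layout(pto_args, pto_kw, args, kwargs):
    """Model the flat argument array and kwnames tuple for one partial call."""
    merged = {**pto_kw, **kwargs}            # call-time keywords override stored ones
    stack = (*pto_args, *args, *merged.values())
    kwnames = tuple(merged)                  # rebuilt on every call in this model
    return stack, kwnames


def call_with_layout(func, stack, kwnames):
    """Call `func` from the flat layout, the way a vectorcall callee would read it."""
    npos = len(stack) - len(kwnames)
    return func(*stack[:npos], **dict(zip(kwnames, stack[npos:])))


def f(a, b):
    return a - b


stack, kwnames = merged_vectorcall_layout((), {"b": 2}, (1,), {})
print(stack, kwnames)
print(call_with_layout(f, stack, kwnames) == partial(f, b=2)(1))
```

In this model, a call that passes no keywords of its own always produces `kwnames == tuple(pto_kw)`, which is the case where a pre-stored tuple could be reused instead of being rebuilt.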

dg-pb · 2024-10-17T01:52:13Z

Was wondering whether it might be worth factoring out the macros for private use.

serhiy-storchaka

LGTM. 👍

kumaraditya303 · 2025-07-07T16:50:44Z

No refleaks:

❯ ./python.exe -m test -R 3:3 test_functools
Using random seed: 458260461
0:00:00 load avg: 1.72 Run 1 test sequentially in a single process
0:00:00 load avg: 1.72 [1/1] test_functools
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XX. ...
0:00:02 load avg: 1.66 [1/1] test_functools passed

== Tests result: SUCCESS ==

1 test OK.

Total duration: 2.1 sec
Total tests: run=321 skipped=1
Total test files: run=1/1
Result: SUCCESS

dg-pb · 2025-07-07T17:16:17Z

> No refleaks:

I get the same on main.

And also the same if I remove all tests except test_functools:TestImportTime.

Maybe not related to this PR?

EDIT: I assumed this shows that there are refleaks? Or am I failing to interpret something? Never used -R - a bit more explicit comment would be much appreciated.

kumaraditya303 · 2025-07-08T16:13:09Z

> I assumed this shows that there are refleaks? Or am I failing to interpret something? Never used -R - a bit more explicit comment would be much appreciated.

The output I posted means that there are no leaks. -R is used to check for reference leaks, and 3:3 sets the number of runs used for the check: 3 warm-up runs followed by 3 tracked runs.

See https://devguide.python.org/testing/run-write-tests/
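
As a reading aid for the `-R 3:3` output above, here is a small sketch that interprets the marker line under the `123:456` header. It assumes the layout shown in that output, where the first three marks belong to the warm-up runs and only the marks after them count; the function name `tracked_runs_leak_free` is hypothetical and is not part of the test runner.

```python
def tracked_runs_leak_free(leak_line: str, nwarmup: int) -> bool:
    """Return True if every tracked (non-warm-up) run is marked '.'."""
    marks = leak_line.replace(" ", "")   # drop the separator between the two groups
    tracked = marks[nwarmup:]            # warm-up marks are ignored
    return bool(tracked) and all(mark == "." for mark in tracked)


print(tracked_runs_leak_free("XX. ...", nwarmup=3))
```

For the line `XX. ...` this prints `True`: the two `X` marks fall in the warm-up runs, and the three tracked runs all show `.`.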

kumaraditya303 · 2025-07-08T17:59:35Z

I think the resizing logic is unnecessary and adds extra complexity. Would you simplify it to what Serhiy suggested, or alternatively remove it altogether, @dg-pb?

dg-pb · 2025-07-08T18:20:57Z

> simplify it to what Serhiy suggested

I don't think it was a simplification. He proposed an alternative, and I incorporated it into the logic.

I kept the old condition, noveralloc > 6, because without it:

# only: (noveralloc > init_stack_size / 2)
p = partial(f, a=1)
p(a=2)
# initial stack size: 3
# used stack size: 1

the resize would kick in too often.
And I took 10 instead of 2 as the divisor, because 2 wouldn't capture cases such as:
initial stack size = 1000
used stack size = 499

I think 10% is a good number, given that we are talking about a pool of 0.01% of cases.

I feel that what is currently there provides the necessary protection at minimal cost and is unlikely to cause any issues.

But if you have better rationale in mind, I am open to further amendments.

> alternatively remove it altogether

Recursion of partial objects, or partial calls with a very high number of keyword arguments, are not common and not the best idea for production code. But given the description of the partial object, they are perfectly valid use cases that can be useful for prototyping or testing something quickly.

"Anything that can go wrong will go wrong."
0.1% doesn't seem like much, but out of 10M it is 10K. Out of those 10K who decide to use more than 7 keywords, say 0.1% will see noticeable memory consumption by the partial object; that is 10 people.

Thus, given enough time, it is very likely that more than 1 instance of issues with this will happen.

Given that it is pretty much free, and that the block can easily be removed in the future if there is a need, I am in favour of being "better safe than sorry" here.
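
The resize rule discussed in this comment can be sketched in Python as follows. This is reconstructed from the comment alone, not copied from `Modules/_functoolsmodule.c`: the name `should_shrink`, the parameters `min_overalloc` and `divisor`, and the definition of `noveralloc` as initial size minus used size are all assumptions. The `(1000, 600)` and `(1000, 950)` inputs are illustrative numbers chosen here, not taken from the thread.

```python
def should_shrink(init_stack_size, used_stack_size, min_overalloc=6, divisor=10):
    """Shrink only if the over-allocation exceeds both an absolute and a relative bound."""
    noveralloc = init_stack_size - used_stack_size   # assumed definition
    return noveralloc > min_overalloc and noveralloc > init_stack_size / divisor


# Small stacks: a ratio-only rule with divisor 2 would trigger, the combined rule does not.
print(should_shrink(3, 1, min_overalloc=0, divisor=2))       # ratio-only variant
print(should_shrink(3, 1))                                   # combined rule

# Large stacks: divisor 10 catches an over-allocation that divisor 2 would miss.
print(should_shrink(1000, 600, min_overalloc=0, divisor=2))  # ratio-only variant
print(should_shrink(1000, 600))                              # combined rule
```

The absolute bound keeps tiny stacks from being resized on every call, while the relative bound with divisor 10 still reclaims memory when a large stack is mostly unused.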

kumaraditya303 · 2025-07-08T19:01:48Z

> But if you have better rationale in mind, I am open to further amendments.

I don't have a better resizing strategy in mind at the moment, so I won't block this PR on it. I'll merge this now, thanks.

hawkinsp · 2025-07-29T19:52:27Z

Is a 3.14 backport possible for this PR, given it fixes a free threading race also?

kumaraditya303 · 2025-07-30T13:02:07Z

> Is a 3.14 backport possible for this PR, given it fixes a free threading race also?

This is a performance improvement, so it cannot be backported. I haven't seen any real crashes from this, so adding a suppression should be fine for 3.14.

initial implementation

69ba0e9

dg-pb requested a review from rhettinger as a code owner September 26, 2024 08:40

bedevere-app bot added the awaiting review label Sep 26, 2024

bedevere-app bot mentioned this pull request Sep 26, 2024

functools.partial does not re-set vector call. #119109

Closed

rruuaanng reviewed Sep 26, 2024

dg-pb added 3 commits September 26, 2024 15:29

V2

9a21b55

small fixes

f23021c

V3

d840ad7

rhettinger requested review from vstinner and removed request for rhettinger September 26, 2024 17:33

dg-pb marked this pull request as draft September 27, 2024 06:31

bedevere-app bot removed the awaiting review label Sep 27, 2024

V4

2dd7568

dg-pb marked this pull request as ready for review September 27, 2024 14:54

bedevere-app bot added the awaiting review label Sep 27, 2024

dg-pb added 2 commits September 27, 2024 18:04

fix compiler warnings

862097f

V5 stable

64c889b

add commented fix if merging after pythongh-124652

a7142d5

rruuaanng reviewed Oct 1, 2024

dg-pb and others added 5 commits October 4, 2024 16:36

error check

ba36d01

fix error check

acba269

minor macro edit

898a104

merge to main

10b9f3b

📜🤖 Added by blurb_it.

3647c25

dg-pb mentioned this pull request Oct 17, 2024

functools.partial placeholders #119127

Closed

small edits

f9e3fd4

dg-pb mentioned this pull request Dec 20, 2024

Race between partial_vectorcall_fallback and _PyVectorcall_FunctionInline under free-threading #128050

Closed

Merge branch 'main' into pythongh-119109-partial_vectorcall_kw

a09ce19

serhiy-storchaka reviewed Jul 7, 2025

dg-pb added 3 commits July 7, 2025 18:26

few more brushes based on ss review

192d261

comment

1f21e74

assertion fixes

3070c67

serhiy-storchaka approved these changes Jul 7, 2025

bedevere-app bot added awaiting merge and removed awaiting review labels Jul 7, 2025

stack resize rule

2a56327

comment edit

e83099b

serhiy-storchaka reviewed Jul 8, 2025

potential overflow fix

482336d

kumaraditya303 approved these changes Jul 8, 2025

colesbury mentioned this pull request Jul 8, 2025

gh-128050: add free-threading suppression for partial_vectorcall_fallback #136418

Closed

kumaraditya303 merged commit f9932f5 into python:main Jul 9, 2025
40 checks passed

bedevere-app bot removed the awaiting merge label Jul 9, 2025

dg-pb deleted the gh-119109-partial_vectorcall_kw branch July 9, 2025 13:36

picnixz pushed a commit to picnixz/cpython that referenced this pull request Jul 13, 2025

pythongh-119109: improve functools.partial vectorcall with keywords (…

04f7781

…python#124584) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>

9 participants
