gh-133968: Add PyUnicodeWriter_WriteASCII() function (#133973)
vstinner merged 8 commits into python:main
Conversation
Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII().
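For illustration, here is a minimal sketch (not code from this PR) of what such a call-site change looks like; build_null_string() is a hypothetical helper, and the sketch assumes a CPython build that provides the public PyUnicodeWriter API plus the new function:

#include <Python.h>

/* Hypothetical helper, for illustration only: build the string "null"
   with the writer API. */
static PyObject *
build_null_string(void)
{
    PyUnicodeWriter *writer = PyUnicodeWriter_Create(0);
    if (writer == NULL) {
        return NULL;
    }
    /* Before this change, the literal went through the UTF-8 path:
           PyUnicodeWriter_WriteUTF8(writer, "null", 4);
       With the new function, the caller guarantees the bytes are ASCII,
       so the non-ASCII scan can be skipped. */
    if (PyUnicodeWriter_WriteASCII(writer, "null", 4) < 0) {
        PyUnicodeWriter_Discard(writer);
        return NULL;
    }
    return PyUnicodeWriter_Finish(writer);
}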
JSON benchmark: #133832 (comment)
Benchmark hidden because not significant (11): encode 100 floats, encode ascii string len=100, encode Unicode string len=100, encode 1000 integers, encode 1000 floats, encode 1000 "ascii" strings, encode ascii string len=1000, encode escaped string len=896, encode 10000 integers, encode 10000 floats, encode 10000 "ascii" strings.
The speedup of up to 1.20x when encoding booleans is interesting given that these strings are very short: "true" (4 characters) and "false" (5 characters).
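For context, the boolean case boils down to call sites like the following sketch (illustrative only, not the actual Modules/_json.c code); write_bool() is a hypothetical helper:

#include <Python.h>

/* Hypothetical helper: emit a JSON boolean literal with the new call. */
static int
write_bool(PyUnicodeWriter *writer, int value)
{
    if (value) {
        return PyUnicodeWriter_WriteASCII(writer, "true", 4);
    }
    return PyUnicodeWriter_WriteASCII(writer, "false", 5);
}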
@serhiy-storchaka: What do you think of this function?
Well, we had … But … We can add private …
Co-authored-by: Peter Bierma <[email protected]>
I don't think that it can become as fast or faster than a function which takes an ASCII string as argument. If we know that the input string is ASCII, there is no need to scan the string for non-ASCII characters, and we can take the fast path. You're right that the UTF-8 decoder is already highly optimized.
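Roughly speaking (a sketch of the idea, not CPython's actual implementation), WriteUTF8() must first establish that the input is pure ASCII before it can take the copy fast path, while WriteASCII() can trust the caller and skip this pass entirely; input_is_ascii() is a made-up name:

#include <Python.h>

/* Sketch of the validation pass that WriteUTF8() needs and WriteASCII()
   can skip (illustrative only). */
static int
input_is_ascii(const char *str, Py_ssize_t size)
{
    for (Py_ssize_t i = 0; i < size; i++) {
        if ((unsigned char)str[i] >= 0x80) {
            return 0;   /* must go through the full UTF-8 decoder */
        }
    }
    return 1;           /* a plain copy into the writer buffer is enough */
}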
In short: it's hard to beat …
Yes, although it was close, at least for moderately large strings. Could it be optimized even more? I don't know. But the decision about …
I created the capi-workgroup/decisions#65 issue.
Benchmark: On long strings (10,000 bytes), PyUnicodeWriter_WriteASCII() is up to 2x faster (1.36 us => 690 ns) than PyUnicodeWriter_WriteUTF8().

from _testcapi import PyUnicodeWriter
import pyperf
range_100 = range(100)

def bench_write_utf8(text, size):
    writer = PyUnicodeWriter(0)
    for _ in range_100:
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)
        writer.write_utf8(text, size)

def bench_write_ascii(text, size):
    writer = PyUnicodeWriter(0)
    for _ in range_100:
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)
        writer.write_ascii(text, size)

runner = pyperf.Runner()
sizes = (10, 100, 1_000, 10_000)

for size in sizes:
    text = b'x' * size
    runner.bench_func(f'write_utf8 size={size:,}', bench_write_utf8, text, size,
                      inner_loops=1_000)

for size in sizes:
    text = b'x' * size
    runner.bench_func(f'write_ascii size={size:,}', bench_write_ascii, text, size,
                      inner_loops=1_000)
Do we know where the bottleneck is for long strings?
WriteUTF8() has to check for non-ASCII characters: this check has a cost. That's the bottleneck.
Maybe, I don't know if it would be faster.
I tried but failed to modify the code to copy while reading (checking if the string is encoded to ASCII). The code is quite complicated. |
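The attempted optimization would, in rough terms, fuse the two passes into one: copy bytes while checking them, and fall back to the UTF-8 decoder as soon as a non-ASCII byte shows up. A minimal sketch of that idea (not the actual patch; copy_while_ascii() is a made-up name):

#include <Python.h>

/* Sketch: copy and validate in a single pass. Returns how many leading
   ASCII bytes were copied; the caller handles the remainder with the
   regular UTF-8 decoder (illustrative only). */
static Py_ssize_t
copy_while_ascii(char *dest, const char *src, Py_ssize_t size)
{
    Py_ssize_t i = 0;
    while (i < size && (unsigned char)src[i] < 0x80) {
        dest[i] = src[i];
        i++;
    }
    return i;
}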
Co-authored-by: Bénédikt Tran <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
picnixz left a comment:
I'm happy to have this function public. I always preferred using the faster versions of the writer API when I hardcoded strings, but they were private.
ZeroIntensity left a comment:
Sorry for the late review, LGTM as well.
The C API Working Group voted in favor of adding the function.
…3973) Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII(). Unrelated change to please the linter: remove an unused import in test_ctypes. Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> (cherry picked from commit f49a07b)
GH-134974 is a backport of this pull request to the 3.14 branch.
…3973) Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII(). Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> (cherry picked from commit f49a07b)
…#134974) gh-133968: Add PyUnicodeWriter_WriteASCII() function (#133973) Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII(). (cherry picked from commit f49a07b) Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]>
…3973) Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII(). Unrelated change to please the linter: remove an unused import in test_ctypes. Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]>
Replace most PyUnicodeWriter_WriteUTF8() calls with PyUnicodeWriter_WriteASCII().
📚 Documentation preview 📚: https://cpython-previews--133973.org.readthedocs.build/