Conversation

nineteendo

pyperformance (with --enable-optimizations and --with-lto)

main.json
=========

Performance version: 1.11.0
Python version: 3.15.0a0 (64-bit) revision c600310663
Report on macOS-13.7.6-x86_64-i386-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2025-06-12 08:26:22.632424
End date: 2025-06-12 08:26:59.100296

feature.json
============

Performance version: 1.11.0
Python version: 3.15.0a0 (64-bit) revision 660d962602
Report on macOS-13.7.6-x86_64-i386-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2025-06-12 08:27:40.576627
End date: 2025-06-12 08:28:11.517308

### json_dumps ###
Mean +- std dev: 12.2 ms +- 0.2 ms -> 10.0 ms +- 0.2 ms: 1.22x faster
Significant (t=88.87)

jsonyx-performance-tests (with --enable-optimizations and --with-lto)

| encode | main | feature | difference |
| --- | --- | --- | --- |
| Dict with 65,536 booleans | 8735.25 μs | 5793.46 μs | 1.50x faster |
| List of 65,536 empty strings | 3424.57 μs | 1654.34 μs | 2.07x faster |
| List of 65,536 ASCII strings | 12975.45 μs | 5896.28 μs | 2.20x faster |
| List of 65,536 strings | 85195.07 μs | 85930.24 μs | 1.01x slower |

@methane

https://gist.github.com/methane/e080ec9783db2a313f40a2b9e1837e72

| Benchmark | main | #133186 | #133239 |
| --- | --- | --- | --- |
| json_dumps: List of 256 booleans | 16.6 us | not significant | 17.2 us: 1.03x slower |
| json_dumps: List of 256 ASCII strings | 67.9 us | 34.7 us: 1.96x faster | 46.5 us: 1.46x faster |
| json_dumps: List of 256 dicts with 1 int | 122 us | 101 us: 1.21x faster | 112 us: 1.09x faster |
| json_dumps: Medium complex object | 205 us | 173 us: 1.18x faster | 189 us: 1.09x faster |
| json_dumps: List of 256 strings | 330 us | 302 us: 1.09x faster | 298 us: 1.11x faster |
| json_dumps: Complex object | 2.57 ms | 1.96 ms: 1.31x faster | not significant |
| json_dumps: Dict with 256 lists of 256 dicts with 1 int | 30.5 ms | 26.5 ms: 1.15x faster | 29.4 ms: 1.04x faster |
| json_dumps(ensure_ascii=False): List of 256 booleans | 16.6 us | not significant | 17.2 us: 1.03x slower |
| json_dumps(ensure_ascii=False): List of 256 ASCII strings | 68.1 us | 34.6 us: 1.96x faster | 46.5 us: 1.46x faster |
| json_dumps(ensure_ascii=False): List of 256 dicts with 1 int | 122 us | 101 us: 1.21x faster | 112 us: 1.09x faster |
| json_dumps(ensure_ascii=False): Medium complex object | 205 us | 172 us: 1.19x faster | 188 us: 1.09x faster |
| json_dumps(ensure_ascii=False): List of 256 strings | 329 us | 303 us: 1.09x faster | 298 us: 1.11x faster |
| json_dumps(ensure_ascii=False): Complex object | 2.56 ms | 1.95 ms: 1.31x faster | not significant |
| json_dumps(ensure_ascii=False): Dict with 256 lists of 256 dicts with 1 int | 30.6 ms | 26.5 ms: 1.15x faster | 29.4 ms: 1.04x faster |
| json_loads: List of 256 floats | 91.4 us | 88.3 us: 1.03x faster | not significant |
| json_loads: List of 256 strings | 848 us | 816 us: 1.04x faster | not significant |
| Geometric mean | (ref) | 1.13x faster | 1.05x faster |

Benchmark hidden because not significant (10): json_dumps: List of 256 floats, json_dumps(ensure_ascii=False): List of 256 floats, json_loads: List of 256 booleans, json_loads: List of 256 ASCII strings, json_loads: List of 256 dicts with 1 int, json_loads: Medium complex object, json_loads: Complex object, json_loads: Dict with 256 lists of 256 dicts with 1 int, json_loads: List of 256 strings (ensure_ascii=False), json_loads: Complex object (ensure_ascii=False)

@nineteendo

@mdboom do you have the results of the Faster CPython infrastructure?

@mdboom

> @mdboom do you have the results of the Faster CPython infrastructure?

Sorry, forgot to come back to them.

They are here: https://github.com/faster-cpython/benchmarking-public/blob/main/results/bm-20250501-3.14.0a7%2B-930e938/bm-20250501-linux-x86_64-nineteendo-speedup_json_encode-3.14.0a7%2B-930e938-vs-base.svg

Confirmed 14% faster on json_dumps benchmark. In the noise for the others (as one would expect).

@nineteendo marked this pull request as ready for review May 13, 2025 14:06
@nineteendo

This comment was marked as resolved.


Some very high-level comments. I haven't dived too deep into the actual implementation yet.

@ZeroIntensity added the `performance` (Performance or resource usage) label Jun 10, 2025
@ZeroIntensity

It would also be good to make an issue explaining the rationale and whatnot, and a blurb entry containing the performance increase.

@methane

Before merging this, we need to decide whether or not to use the private _PyUnicodeWriter APIs.
We shouldn't decide how to optimize further before that.


This adds quite a bit of code. Couldn't it be shared between py_encode_basestring and write_escaped_unicode?
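One way to picture the sharing being asked for, as a hedged sketch in plain C (the helper name and signature are hypothetical, not the actual _json.c code): a single function that emits the JSON escape sequence for one character, which both encoding paths could call.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical shared helper: append the JSON escape sequence for one
 * character `c` to `out`, returning the number of bytes written.
 * Characters that need no escaping are copied through unchanged.
 * This sketch only handles ASCII input; it is an illustration of the
 * idea, not the actual implementation in _json.c. */
static int
escape_char(unsigned int c, char *out)
{
    switch (c) {
    case '"':  memcpy(out, "\\\"", 2); return 2;
    case '\\': memcpy(out, "\\\\", 2); return 2;
    case '\b': memcpy(out, "\\b", 2);  return 2;
    case '\f': memcpy(out, "\\f", 2);  return 2;
    case '\n': memcpy(out, "\\n", 2);  return 2;
    case '\r': memcpy(out, "\\r", 2);  return 2;
    case '\t': memcpy(out, "\\t", 2);  return 2;
    default:
        if (c < 0x20) {                  /* other control characters */
            sprintf(out, "\\u%04x", c);  /* out must hold >= 7 bytes */
            return 6;
        }
        out[0] = (char)c;                /* printable ASCII: copy through */
        return 1;
    }
}
```

Both py_encode_basestring and write_escaped_unicode could then loop over their input and delegate the per-character escaping to this one routine, so the escape table lives in exactly one place.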

@nineteendo changed the title from "json: Fast path for string encoding" to "gh-135336: Add fast path to json string encoding" Jun 10, 2025
@nineteendo

I've created an issue and re-used shared code, but https://blurb-it.herokuapp.com is down

@vstinner

I ran my benchmark #133832 (comment) on this PR. I rebased the PR on the main branch.

Encoding a list of ASCII strings is up to 1.7x faster, it's impressive!

Sadly, encoding a long ASCII string is always slower (between 1.05x and 1.09x slower).

| Benchmark | main | pr133239 |
| --- | --- | --- |
| encode 100 booleans | 4.38 us | 3.97 us: 1.10x faster |
| encode 100 integers | 7.97 us | 6.76 us: 1.18x faster |
| encode 100 floats | 12.7 us | 11.1 us: 1.14x faster |
| encode 100 "ascii" strings | 8.75 us | 5.63 us: 1.55x faster |
| encode ascii string len=100 | 540 ns | 577 ns: 1.07x slower |
| encode escaped string len=128 | 754 ns | 615 ns: 1.23x faster |
| encode Unicode string len=100 | 645 ns | 595 ns: 1.08x faster |
| encode 1000 booleans | 18.0 us | 19.7 us: 1.09x slower |
| encode 1000 "ascii" strings | 59.0 us | 34.1 us: 1.73x faster |
| encode ascii string len=1000 | 2.09 us | 1.91 us: 1.09x faster |
| encode escaped string len=896 | 2.33 us | 2.15 us: 1.08x faster |
| encode Unicode string len=1000 | 2.81 us | 2.90 us: 1.03x slower |
| encode 10000 booleans | 158 us | 169 us: 1.07x slower |
| encode 10000 integers | 501 us | 442 us: 1.13x faster |
| encode 10000 floats | 1.04 ms | 888 us: 1.18x faster |
| encode 10000 "ascii" strings | 596 us | 348 us: 1.71x faster |
| encode ascii string len=10000 | 16.9 us | 17.8 us: 1.05x slower |
| encode escaped string len=9984 | 20.2 us | 19.6 us: 1.03x faster |
| encode Unicode string len=10000 | 27.3 us | 24.1 us: 1.13x faster |
| Geometric mean | (ref) | 1.13x faster |

Benchmark hidden because not significant (2): encode 1000 integers, encode 1000 floats

UPDATE: I had to re-run the benchmark since my first attempt was on debug builds :-(

@vstinner

> Before merging this, we need to decide whether or not to use the private _PyUnicodeWriter APIs.

Whenever possible, I would prefer to use the public PyUnicodeWriter API. In issue gh-133968, I optimized PyUnicodeWriter to make the public API faster and so more interesting.

@vstinner

> I've created an issue and re-used shared code, but https://blurb-it.herokuapp.com/ is down

You can install the blurb tool (pip install blurb) and run it locally in a terminal to add a NEWS entry.

@serhiy-storchaka

This is not what I had in mind, although it does speed up a common case.

Currently, encoding is two-pass. First we calculate the size of the encoded string, then create the Unicode object of that size and fill it char by char. This PR uses the first step to determine whether we can get rid of the intermediate Unicode object (if there are no characters that need escaping). This helps for booleans, numbers, and many simple strings. But we can get rid of the intermediate Unicode object in all cases -- just reserve space in PyUnicodeWriter and write the encoded string directly there. For performance, we should not use a high-level API like PyUnicodeWriter_WriteChar(), but write directly into the buffer.

@nineteendo

> we can get rid of the intermediate Unicode object in all cases -- just reserve space in PyUnicodeWriter and write the encoded string directly there. For performance, we should not use high-level API like PyUnicodeWriter_WriteChar(), but write directly in the buffer.

This is not exposed through the public API. You could maybe try to use PyUnicodeWriter_WriteUCS4(), but I doubt that's much faster.

@nineteendo

Not sure why, but calling PyUnicode_GET_LENGTH(), PyUnicode_DATA() and PyUnicode_KIND() multiple times is inefficient.
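The usual remedy is to call each accessor once and keep the result in a local before the loop. Illustrated here with a stand-in struct, since PyUnicode_GET_LENGTH() and friends require Python.h; the struct and accessor names are hypothetical:

```c
#include <stddef.h>

/* Stand-in for a unicode object; in CPython the accessors would be
 * PyUnicode_GET_LENGTH(), PyUnicode_KIND() and PyUnicode_DATA(). */
struct ustr {
    int kind;
    size_t length;
    const char *data;
};

static size_t ustr_length(const struct ustr *s) { return s->length; }
static const char *ustr_data(const struct ustr *s) { return s->data; }

/* Slow shape: re-invokes the accessors on every iteration. */
static size_t
count_quotes_slow(const struct ustr *s)
{
    size_t n = 0;
    for (size_t i = 0; i < ustr_length(s); i++)
        if (ustr_data(s)[i] == '"')
            n++;
    return n;
}

/* Hoisted shape: fetch length and data once, before the loop. */
static size_t
count_quotes_fast(const struct ustr *s)
{
    size_t len = ustr_length(s);
    const char *data = ustr_data(s);
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '"')
            n++;
    return n;
}
```

The two functions are equivalent; the second simply stops the compiler from having to re-load through the object on every iteration when it cannot prove the calls are invariant.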
