Conversation

nineteendo

pyperformance (with --enable-optimizations and --with-lto)

main.json
=========

Performance version: 1.11.0
Python version: 3.15.0a0 (64-bit) revision c600310663
Report on macOS-13.7.6-x86_64-i386-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2025-06-12 08:26:22.632424
End date: 2025-06-12 08:26:59.100296

feature.json
============

Performance version: 1.11.0
Python version: 3.15.0a0 (64-bit) revision 660d962602
Report on macOS-13.7.6-x86_64-i386-64bit-Mach-O
Number of logical CPUs: 8
Start date: 2025-06-12 08:27:40.576627
End date: 2025-06-12 08:28:11.517308

### json_dumps ###
Mean +- std dev: 12.2 ms +- 0.2 ms -> 10.0 ms +- 0.2 ms: 1.22x faster
Significant (t=88.87)

jsonyx-performance-tests (with --enable-optimizations and --with-lto)

| encode | main | feature | difference |
| --- | --- | --- | --- |
| Dict with 65,536 booleans | 8735.25 μs | 5793.46 μs | 1.50x faster |
| List of 65,536 empty strings | 3424.57 μs | 1654.34 μs | 2.07x faster |
| List of 65,536 ASCII strings | 12975.45 μs | 5896.28 μs | 2.20x faster |
| List of 65,536 strings | 85195.07 μs | 85930.24 μs | 1.01x slower |

@methane

https://gist.github.com/methane/e080ec9783db2a313f40a2b9e1837e72

| Benchmark | main | #133186 | #133239 |
| --- | --- | --- | --- |
| json_dumps: List of 256 booleans | 16.6 us | not significant | 17.2 us: 1.03x slower |
| json_dumps: List of 256 ASCII strings | 67.9 us | 34.7 us: 1.96x faster | 46.5 us: 1.46x faster |
| json_dumps: List of 256 dicts with 1 int | 122 us | 101 us: 1.21x faster | 112 us: 1.09x faster |
| json_dumps: Medium complex object | 205 us | 173 us: 1.18x faster | 189 us: 1.09x faster |
| json_dumps: List of 256 strings | 330 us | 302 us: 1.09x faster | 298 us: 1.11x faster |
| json_dumps: Complex object | 2.57 ms | 1.96 ms: 1.31x faster | not significant |
| json_dumps: Dict with 256 lists of 256 dicts with 1 int | 30.5 ms | 26.5 ms: 1.15x faster | 29.4 ms: 1.04x faster |
| json_dumps(ensure_ascii=False): List of 256 booleans | 16.6 us | not significant | 17.2 us: 1.03x slower |
| json_dumps(ensure_ascii=False): List of 256 ASCII strings | 68.1 us | 34.6 us: 1.96x faster | 46.5 us: 1.46x faster |
| json_dumps(ensure_ascii=False): List of 256 dicts with 1 int | 122 us | 101 us: 1.21x faster | 112 us: 1.09x faster |
| json_dumps(ensure_ascii=False): Medium complex object | 205 us | 172 us: 1.19x faster | 188 us: 1.09x faster |
| json_dumps(ensure_ascii=False): List of 256 strings | 329 us | 303 us: 1.09x faster | 298 us: 1.11x faster |
| json_dumps(ensure_ascii=False): Complex object | 2.56 ms | 1.95 ms: 1.31x faster | not significant |
| json_dumps(ensure_ascii=False): Dict with 256 lists of 256 dicts with 1 int | 30.6 ms | 26.5 ms: 1.15x faster | 29.4 ms: 1.04x faster |
| json_loads: List of 256 floats | 91.4 us | 88.3 us: 1.03x faster | not significant |
| json_loads: List of 256 strings | 848 us | 816 us: 1.04x faster | not significant |
| Geometric mean | (ref) | 1.13x faster | 1.05x faster |

Benchmark hidden because not significant (10): json_dumps: List of 256 floats, json_dumps(ensure_ascii=False): List of 256 floats, json_loads: List of 256 booleans, json_loads: List of 256 ASCII strings, json_loads: List of 256 dicts with 1 int, json_loads: Medium complex object, json_loads: Complex object, json_loads: Dict with 256 lists of 256 dicts with 1 int, json_loads: List of 256 strings (ensure_ascii=False), json_loads: Complex object (ensure_ascii=False)

@nineteendo

@mdboom do you have the results of the Faster CPython infrastructure?

@mdboom

> @mdboom do you have the results of the Faster CPython infrastructure?

Sorry, forgot to come back to them.

They are here: https://github.com/faster-cpython/benchmarking-public/blob/main/results/bm-20250501-3.14.0a7%2B-930e938/bm-20250501-linux-x86_64-nineteendo-speedup_json_encode-3.14.0a7%2B-930e938-vs-base.svg

Confirmed 14% faster on json_dumps benchmark. In the noise for the others (as one would expect).

@nineteendo marked this pull request as ready for review May 13, 2025 14:06
@nineteendo

This comment was marked as resolved.


Some very high-level comments. I haven't dived too deep into the actual implementation yet.

@ZeroIntensity added the `performance` (Performance or resource usage) label Jun 10, 2025
@ZeroIntensity

It would also be good to make an issue explaining the rationale and whatnot, and a blurb entry containing the performance increase.

@methane

Before merging this, we need to decide whether or not to use the private _PyUnicodeWriter APIs.
We shouldn't decide how to optimize further before that.


This adds quite a bit of code. Couldn't it be shared between py_encode_basestring and write_escaped_unicode?
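One way to picture the sharing being asked for, as a hedged sketch in plain C (the helper name and signature are hypothetical, not the actual _json.c code): a single function that emits the JSON escape sequence for one character, which both encoding paths could call.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical shared helper: append the JSON escape sequence for one
 * character `c` to `out`, returning the number of bytes written.
 * Characters that need no escaping are copied through unchanged.
 * This sketch only handles ASCII input; it is an illustration of the
 * idea, not the actual implementation in _json.c. */
static int
escape_char(unsigned int c, char *out)
{
    switch (c) {
    case '"':  memcpy(out, "\\\"", 2); return 2;
    case '\\': memcpy(out, "\\\\", 2); return 2;
    case '\b': memcpy(out, "\\b", 2);  return 2;
    case '\f': memcpy(out, "\\f", 2);  return 2;
    case '\n': memcpy(out, "\\n", 2);  return 2;
    case '\r': memcpy(out, "\\r", 2);  return 2;
    case '\t': memcpy(out, "\\t", 2);  return 2;
    default:
        if (c < 0x20) {                  /* other control characters */
            sprintf(out, "\\u%04x", c);  /* out must hold >= 7 bytes */
            return 6;
        }
        out[0] = (char)c;                /* printable ASCII: copy through */
        return 1;
    }
}
```

Both py_encode_basestring and write_escaped_unicode could then loop over their input and delegate the per-character escaping to this one routine, so the escape table lives in exactly one place.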

@nineteendo changed the title from "json: Fast path for string encoding" to "gh-135336: Add fast path to json string encoding" Jun 10, 2025
@nineteendo

I've created an issue and re-used shared code, but https://blurb-it.herokuapp.com is down

@vstinner

I ran my benchmark #133832 (comment) on this PR. I rebased the PR on the main branch.

Encoding a list of ASCII strings is up to 1.7x faster, it's impressive!

Sadly, encoding a long ASCII string is always slower (between 1.05x and 1.09x slower).

| Benchmark | main | pr133239 |
| --- | --- | --- |
| encode 100 booleans | 4.38 us | 3.97 us: 1.10x faster |
| encode 100 integers | 7.97 us | 6.76 us: 1.18x faster |
| encode 100 floats | 12.7 us | 11.1 us: 1.14x faster |
| encode 100 "ascii" strings | 8.75 us | 5.63 us: 1.55x faster |
| encode ascii string len=100 | 540 ns | 577 ns: 1.07x slower |
| encode escaped string len=128 | 754 ns | 615 ns: 1.23x faster |
| encode Unicode string len=100 | 645 ns | 595 ns: 1.08x faster |
| encode 1000 booleans | 18.0 us | 19.7 us: 1.09x slower |
| encode 1000 "ascii" strings | 59.0 us | 34.1 us: 1.73x faster |
| encode ascii string len=1000 | 2.09 us | 1.91 us: 1.09x faster |
| encode escaped string len=896 | 2.33 us | 2.15 us: 1.08x faster |
| encode Unicode string len=1000 | 2.81 us | 2.90 us: 1.03x slower |
| encode 10000 booleans | 158 us | 169 us: 1.07x slower |
| encode 10000 integers | 501 us | 442 us: 1.13x faster |
| encode 10000 floats | 1.04 ms | 888 us: 1.18x faster |
| encode 10000 "ascii" strings | 596 us | 348 us: 1.71x faster |
| encode ascii string len=10000 | 16.9 us | 17.8 us: 1.05x slower |
| encode escaped string len=9984 | 20.2 us | 19.6 us: 1.03x faster |
| encode Unicode string len=10000 | 27.3 us | 24.1 us: 1.13x faster |
| Geometric mean | (ref) | 1.13x faster |

Benchmark hidden because not significant (2): encode 1000 integers, encode 1000 floats

UPDATE: I had to re-run the benchmark since my first attempt was on debug builds :-(

@vstinner

> Before merging this, we need to decide whether or not to use the private _PyUnicodeWriter APIs.

Whenever possible, I would prefer to use the public PyUnicodeWriter API. In issue gh-133968, I optimized PyUnicodeWriter to make the public API faster and so more interesting.

@vstinner

> I've created an issue and re-used shared code, but https://blurb-it.herokuapp.com/ is down

You can install the blurb tool (pip install blurb) and run it locally in a terminal to add a NEWS entry.

@serhiy-storchaka

This is not what I had in mind, although it does speed up a common case.

Currently, encoding is two-pass. First we calculate the size of the encoded string, then create the Unicode object of that size and fill it char by char. This PR uses the first step to determine whether we can get rid of the intermediate Unicode object (if there are no characters that need escaping). This helps for booleans, numbers, and many simple strings. But we can get rid of the intermediate Unicode object in all cases -- just reserve space in PyUnicodeWriter and write the encoded string directly there. For performance, we should not use a high-level API like PyUnicodeWriter_WriteChar(), but write directly into the buffer.

@nineteendo

> we can get rid of the intermediate Unicode object in all cases -- just reserve space in PyUnicodeWriter and write the encoded string directly there. For performance, we should not use high-level API like PyUnicodeWriter_WriteChar(), but write directly in the buffer.

This is not exposed through the public API. You could maybe try to use PyUnicodeWriter_WriteUCS4(), but I doubt that's much faster.

@nineteendo

Not sure why, but calling PyUnicode_GET_LENGTH(), PyUnicode_DATA() and PyUnicode_KIND() multiple times is inefficient.
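The usual remedy is to call each accessor once and keep the result in a local before the loop. Illustrated here with a stand-in struct, since PyUnicode_GET_LENGTH() and friends require Python.h; the struct and accessor names are hypothetical:

```c
#include <stddef.h>

/* Stand-in for a unicode object; in CPython the accessors would be
 * PyUnicode_GET_LENGTH(), PyUnicode_KIND() and PyUnicode_DATA(). */
struct ustr {
    int kind;
    size_t length;
    const char *data;
};

static size_t ustr_length(const struct ustr *s) { return s->length; }
static const char *ustr_data(const struct ustr *s) { return s->data; }

/* Slow shape: re-invokes the accessors on every iteration. */
static size_t
count_quotes_slow(const struct ustr *s)
{
    size_t n = 0;
    for (size_t i = 0; i < ustr_length(s); i++)
        if (ustr_data(s)[i] == '"')
            n++;
    return n;
}

/* Hoisted shape: fetch length and data once, before the loop. */
static size_t
count_quotes_fast(const struct ustr *s)
{
    size_t len = ustr_length(s);
    const char *data = ustr_data(s);
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (data[i] == '"')
            n++;
    return n;
}
```

The two functions are equivalent; the second simply stops the compiler from having to re-load through the object on every iteration when it cannot prove the calls are invariant.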
