Each instruction is two codewords, and consists of "opcode, oparg, 0, 0" by iritkatriel · Pull Request #100106 · python/cpython

iritkatriel · 2022-12-08T10:45:49Z

This emits "opcode, oparg, 0, 0" for each instruction.
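
To make the layout concrete, here is a minimal sketch of what "opcode, oparg, 0, 0" looks like as a byte stream, assuming 16-bit codewords so that each instruction occupies four bytes. The helper names `encode` and `decode` and the opcode numbers are hypothetical placeholders for illustration, not code from this PR.

```python
def encode(instructions):
    """Pack (opcode, oparg) pairs as 'opcode, oparg, 0, 0' per instruction."""
    out = bytearray()
    for opcode, oparg in instructions:
        if not (0 <= opcode <= 255 and 0 <= oparg <= 255):
            raise ValueError("opcode and oparg must each fit in one byte")
        # First codeword: opcode and oparg. Second codeword: two zero bytes.
        out += bytes((opcode, oparg, 0, 0))
    return bytes(out)


def decode(code):
    """Recover (opcode, oparg) pairs, skipping the zero-filled second codeword."""
    if len(code) % 4 != 0:
        raise ValueError("length must be a multiple of 4 bytes")
    return [(code[i], code[i + 1]) for i in range(0, len(code), 4)]


# Placeholder opcode numbers, chosen only for illustration.
packed = encode([(100, 1), (83, 0)])
print(list(packed))    # [100, 1, 0, 0, 83, 0, 0, 0]
print(decode(packed))  # [(100, 1), (83, 0)]
```

Under this assumption the bytecode for a given function is twice as long as with one codeword per instruction, which is what the benchmark below is measuring the cost of.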

I am still debugging some test failures related to line numbers and tracing, but this works well enough to benchmark with pyperformance:

+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| Benchmark               | /home/benchmarking/BENCH/REQUESTS/req-compile-bench-1670439089-iritkatriel-linux/pyperformance-results.json.gz | /home/benchmarking/BENCH/REQUESTS/req-compile-bench-1670428040-iritkatriel-linux/pyperformance-results.json.gz |
+=========================+================================================================================================================+================================================================================================================+
| 2to3                    | 247 ms                                                                                                         | 255 ms: 1.03x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_generators        | 356 ms                                                                                                         | 360 ms: 1.01x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_none         | 533 ms                                                                                                         | 541 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_cpu_io_mixed | 741 ms                                                                                                         | 762 ms: 1.03x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_io           | 1.33 sec                                                                                                       | 1.34 sec: 1.01x slower                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| async_tree_memoization  | 636 ms                                                                                                         | 677 ms: 1.06x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| chameleon               | 6.57 ms                                                                                                        | 6.30 ms: 1.04x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| chaos                   | 67.3 ms                                                                                                        | 69.4 ms: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| bench_thread_pool       | 769 us                                                                                                         | 785 us: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| coroutines              | 25.2 ms                                                                                                        | 25.9 ms: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| crypto_pyaes            | 77.0 ms                                                                                                        | 74.9 ms: 1.03x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deepcopy                | 329 us                                                                                                         | 335 us: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deepcopy_reduce         | 2.86 us                                                                                                        | 2.95 us: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deepcopy_memo           | 34.3 us                                                                                                        | 34.9 us: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| deltablue               | 3.24 ms                                                                                                        | 3.44 ms: 1.06x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| django_template         | 32.7 ms                                                                                                        | 33.3 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| docutils                | 2.49 sec                                                                                                       | 2.52 sec: 1.01x slower                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| dulwich_log             | 61.0 ms                                                                                                        | 61.9 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| fannkuch                | 380 ms                                                                                                         | 387 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| float                   | 72.8 ms                                                                                                        | 76.6 ms: 1.05x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| genshi_text             | 20.6 ms                                                                                                        | 20.7 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| genshi_xml              | 47.9 ms                                                                                                        | 47.4 ms: 1.01x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| go                      | 137 ms                                                                                                         | 143 ms: 1.05x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| hexiom                  | 6.11 ms                                                                                                        | 6.35 ms: 1.04x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| html5lib                | 59.0 ms                                                                                                        | 62.1 ms: 1.05x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| json_dumps              | 9.29 ms                                                                                                        | 9.34 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| logging_format          | 6.27 us                                                                                                        | 6.43 us: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| logging_silent          | 91.6 ns                                                                                                        | 94.8 ns: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| logging_simple          | 5.71 us                                                                                                        | 5.81 us: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| mako                    | 9.73 ms                                                                                                        | 9.62 ms: 1.01x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| mdp                     | 2.51 sec                                                                                                       | 2.59 sec: 1.03x slower                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| nbody                   | 94.3 ms                                                                                                        | 90.2 ms: 1.05x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| nqueens                 | 83.3 ms                                                                                                        | 81.1 ms: 1.03x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle                  | 10.1 us                                                                                                        | 10.2 us: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle_dict             | 30.9 us                                                                                                        | 31.1 us: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle_list             | 4.16 us                                                                                                        | 4.06 us: 1.02x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pickle_pure_python      | 280 us                                                                                                         | 290 us: 1.04x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pycparser               | 1.13 sec                                                                                                       | 1.12 sec: 1.02x faster                                                                                         |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| pyflate                 | 405 ms                                                                                                         | 425 ms: 1.05x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| python_startup          | 8.56 ms                                                                                                        | 8.59 ms: 1.00x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| python_startup_no_site  | 6.28 ms                                                                                                        | 6.31 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| raytrace                | 278 ms                                                                                                         | 284 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_compile           | 130 ms                                                                                                         | 133 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_dna               | 206 ms                                                                                                         | 202 ms: 1.02x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_effbot            | 3.76 ms                                                                                                        | 3.62 ms: 1.04x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| regex_v8                | 22.2 ms                                                                                                        | 21.9 ms: 1.02x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| richards                | 42.3 ms                                                                                                        | 43.4 ms: 1.03x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_fft             | 315 ms                                                                                                         | 310 ms: 1.02x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_lu              | 106 ms                                                                                                         | 109 ms: 1.03x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_monte_carlo     | 68.3 ms                                                                                                        | 69.2 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_sor             | 105 ms                                                                                                         | 119 ms: 1.13x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| scimark_sparse_mat_mult | 4.24 ms                                                                                                        | 3.99 ms: 1.06x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| spectral_norm           | 99.4 ms                                                                                                        | 95.8 ms: 1.04x faster                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_parse           | 1.34 ms                                                                                                        | 1.36 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_transpile       | 1.63 ms                                                                                                        | 1.65 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_optimize        | 50.9 ms                                                                                                        | 51.3 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlglot_normalize       | 105 ms                                                                                                         | 106 ms: 1.01x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sqlite_synth            | 2.59 us                                                                                                        | 2.64 us: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_expand            | 454 ms                                                                                                         | 463 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_integrate         | 20.4 ms                                                                                                        | 20.9 ms: 1.02x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_sum               | 163 ms                                                                                                         | 165 ms: 1.01x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| sympy_str               | 281 ms                                                                                                         | 287 ms: 1.02x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| telco                   | 6.32 ms                                                                                                        | 6.58 ms: 1.04x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| thrift                  | 763 us                                                                                                         | 750 us: 1.02x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| unpack_sequence         | 42.1 ns                                                                                                        | 43.8 ns: 1.04x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| unpickle_list           | 4.93 us                                                                                                        | 4.98 us: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| unpickle_pure_python    | 202 us                                                                                                         | 214 us: 1.06x slower                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| xml_etree_iterparse     | 106 ms                                                                                                         | 103 ms: 1.03x faster                                                                                           |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| xml_etree_generate      | 76.7 ms                                                                                                        | 77.2 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| xml_etree_process       | 53.1 ms                                                                                                        | 53.8 ms: 1.01x slower                                                                                          |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+
| Geometric mean          | (ref)                                                                                                          | 1.01x slower                                                                                                   |
+-------------------------+----------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------+

Benchmark hidden because not significant (13): bench_mp_pool, coverage, generators, json, json_loads, meteor_contest, mypy, pathlib, pidigits, pprint_safe_repr, pprint_pformat, unpickle, xml_etree_parse
Ignored benchmarks (3) of /home/benchmarking/BENCH/REQUESTS/req-compile-bench-1670428040-iritkatriel-linux/pyperformance-results.json.gz: aiohttp, gunicorn, tornado_http

netlify · 2022-12-08T10:45:54Z

✅ Deploy Preview for python-cpython-preview canceled.

Name	Link
🔨 Latest commit	`414665b`
🔍 Latest deploy log	https://app.netlify.com/sites/python-cpython-preview/deploys/63970f4f73026d0008111626

gvanrossum

Cool work. So the doubling of the instruction size only costs us 1%. That means if we can realize the removal of LOAD/STORE_FAST and LOAD_CONST we should be able to gain quite a bit.

Do you envision we could do a gradual transition to the register world, where some instructions use registers and others still use the stack?

iritkatriel · 2022-12-08T21:39:06Z

> Do you envision we could do a gradual transition to the register world, where some instructions use registers and others still use the stack?

I think so. A register can be an index into the stack, and some opcodes can just push and pop as before. This makes the transition incremental.
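To make the idea concrete, here is a minimal toy sketch in Python, not CPython code. It assumes a single value array in which a "register" is just an index, while stack-style opcodes keep pushing and popping through a stack pointer. The opcode names (`PUSH_CONST`, `NEGATE_STACK`, `NEGATE_REG`) and the `run` helper are hypothetical, invented only for this illustration.

```python
# Toy model only: opcode names and layout are invented for illustration
# and do not correspond to real CPython opcodes.
PUSH_CONST, NEGATE_STACK, NEGATE_REG = range(3)


def run(code, consts, size=8):
    """Execute (opcode, oparg1, oparg2) triples over one shared value array.

    Stack-style opcodes use the stack pointer `sp`; register-style opcodes
    address slots of the same array directly by index.
    """
    values = [None] * size
    sp = 0  # index of the next free stack slot
    for opcode, a, b in code:
        if opcode == PUSH_CONST:
            # Stack style: push consts[a].
            values[sp] = consts[a]
            sp += 1
        elif opcode == NEGATE_STACK:
            # Stack style: negate the top of the stack in place.
            values[sp - 1] = -values[sp - 1]
        elif opcode == NEGATE_REG:
            # Register style: read slot a, write the result to slot b.
            values[b] = -values[a]
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return values, sp


program = [
    (PUSH_CONST, 0, 0),    # slot 0 = 5
    (NEGATE_STACK, 0, 0),  # slot 0 = -5
    (NEGATE_REG, 0, 1),    # slot 1 = -(slot 0) = 5
]
values, sp = run(program, consts=[5])
```

Because both styles share one array, opcodes could in principle be converted one at a time, which is the incremental path described above.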

gvanrossum · 2022-12-08T21:49:26Z

> I think so. A register can be an index into the stack, and some opcodes can just push and pop as before. This makes the transition incremental.

Sounds good. Maybe we should add that to faster-cpython/ideas#485 (or one of the other issues about registers?)

Python/ceval.c

gvanrossum

Time to start making one simple instruction use an extra oparg? Without even optimizing LOAD/STORE -- we could just tackle UNARY_NEGATIVE and give it a second oparg that designates the destination, and make the compiler write the bytecode like that.

Lib/dis.py

gvanrossum · 2022-12-12T19:31:13Z

Include/opcode.h

 #define NB_INPLACE_XOR                          25

+/* number of codewords for opcode+oparg(s) */
+#define OPSIZE 2


I guess for now we're not contemplating the size depending on the opcode. Probably just as well.

iritkatriel · 2022-12-12

Yeah, it won’t be hard to change this macro if we decide to do that.
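As a rough illustration of what a fixed `OPSIZE` means for anything that walks the bytecode, here is a small Python sketch. It assumes a codeword is two bytes (one for the opcode, one for the oparg), so that each instruction is laid out as "opcode, oparg, 0, 0" as in the PR title. The `encode` and `decode` helpers and the `CODEWORD_BYTES` constant are hypothetical and are not part of `dis` or any CPython API.

```python
OPSIZE = 2          # codewords per instruction, mirroring the macro in the diff
CODEWORD_BYTES = 2  # assumption: one codeword holds two 8-bit fields


def encode(instructions):
    """Pack (opcode, oparg) pairs as 'opcode, oparg, 0, 0' byte groups."""
    out = bytearray()
    for opcode, oparg in instructions:
        out += bytes([opcode, oparg])
        # The remaining codewords are zero padding for now.
        out += bytes(CODEWORD_BYTES * (OPSIZE - 1))
    return bytes(out)


def decode(raw):
    """Recover (opcode, oparg) pairs by stepping one full instruction at a time."""
    step = OPSIZE * CODEWORD_BYTES
    if len(raw) % step:
        raise ValueError("bytecode length is not a multiple of the instruction size")
    return [(raw[i], raw[i + 1]) for i in range(0, len(raw), step)]


# Opcode numbers here are arbitrary placeholders.
raw = encode([(100, 1), (83, 0)])
pairs = decode(raw)
```

Keeping the stride in one named constant is what makes a later change cheap: a decoder written this way only needs `OPSIZE` updated, rather than every place that hard-codes the instruction width.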

iritkatriel · 2022-12-15T19:40:24Z

I made a new PR with this stuff on today's version of main: #100276.

Each instruction is two codewords, and consists of "opcode, oparg, 0, 0"

144c64d

iritkatriel requested a review from markshannon as a code owner December 8, 2022 10:45

bedevere-bot added the awaiting core review label Dec 8, 2022

iritkatriel marked this pull request as draft December 8, 2022 10:45

iritkatriel requested a review from gvanrossum December 8, 2022 10:47

fewer tests are failing

ac05ae0

iritkatriel force-pushed the 3_arg branch from 29626ea to ac05ae0 December 8, 2022 18:28

fix test_compile

b3d2a0c

gvanrossum reviewed Dec 8, 2022

iritkatriel mentioned this pull request Dec 8, 2022

register-based interpreter faster-cpython/ideas#485

Open

fix the trace

e5ebc54

iritkatriel commented Dec 9, 2022

Python/ceval.c (outdated)

iritkatriel added 2 commits December 10, 2022 16:50

fix test_peepholer

f6d7070

skip (for now) two tests that fail in mark_stacks

2c68a21

gvanrossum reviewed Dec 10, 2022

Lib/dis.py

iritkatriel added 2 commits December 10, 2022 19:07

fix test_dis

f9e5993

add OPSIZE to make the code a bit more self-documenting

414665b

gvanrossum reviewed Dec 12, 2022

iritkatriel closed this Dec 15, 2022

iritkatriel wants to merge 8 commits into python:main from iritkatriel:3_arg

3 participants
