gh-96735: Fix undefined behaviour in struct unpacking functions #96739
mdickinson merged 10 commits into python:main
It's very tempting to convert all of these functions to use fixed-width integer types (…).
kumaraditya303 left a comment:
LGTM. I tested this, and it does indeed fix the UB.
Thanks for fixing the other ones too; I just noticed that those were also affected.
I agree; it would be better, but it's best left for 3.12 only.
🤖 New build scheduled with the buildbot fleet by @mdickinson for commit 1294440 🤖
If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
@kumaraditya303 I've updated this PR to be more efficient, while still avoiding undefined behaviour and implementation-defined behaviour (specifically, the conversion from an unsigned type to the corresponding signed type when the value being converted is not representable in the signed type; cf. C99 §6.3.1.3p3). The sign extension is now branch-free (and should compile to a no-op when the C type being used has exactly the correct width), and the conversion from unsigned to signed should always compile to a no-op on any semi-reasonable compiler. Godbolt example for the case where sign extension is needed: https://godbolt.org/z/e8roKrn6r
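As a standalone illustration of the two techniques in the comment above, here is a minimal sketch, assuming `unsigned long`/`long` have no padding bits and two's-complement ranges (`LONG_MIN == -LONG_MAX - 1`). The helper names `sign_extend` and `to_signed` are hypothetical, not CPython functions. Everything before the final conversion is unsigned arithmetic, so it wraps instead of overflowing, and there are no branches in the sign extension.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: sign-extend the low `bits` bits of x into a full
   unsigned long. Unsigned arithmetic wraps modulo 2^N, so there is no UB.
   Requires 1 <= bits <= width of unsigned long, and no bits of x set
   above position bits - 1. */
static unsigned long
sign_extend(unsigned long x, unsigned int bits)
{
    assert(bits >= 1 && bits <= sizeof(unsigned long) * CHAR_BIT);
    unsigned long m = 1UL << (bits - 1);   /* sign bit of the narrow value */
    return (x ^ m) - m;                    /* returns x unchanged if bits is the full width */
}

/* Hypothetical helper: convert to long without the implementation-defined
   out-of-range conversion of C99 6.3.1.3p3. If x > LONG_MAX, then ~x fits
   in long, and -1 - (long)~x equals x - 2^N, the two's-complement value. */
static long
to_signed(unsigned long x)
{
    return x > (unsigned long)LONG_MAX ? -1 - (long)(~x) : (long)x;
}
```

For example, `to_signed(sign_extend(0x8000UL, 16))` yields `-32768`. When `bits` equals the width of `unsigned long`, `(x ^ m) - m` is the identity, which is why a compiler can drop that step entirely; likewise, on a two's-complement target both arms of `to_signed` produce the same bit pattern.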
We are currently compiling all non-pydebug builds with … I don't think we will turn off … I'd say: just do the Right Thing for 3.12 in … See also #96821 and #96678 (comment).

As a minimal change for 3.11, I would suggest enabling …

That being said, I'm not opposed to this PR. It's a great start to getting everything safe for …
Agreed; I think I'll merge this for 3.12 only.
Some not-very-rigorous timings on macOS/Intel, non-optimised non-debug build. (This PR is not primarily about performance, but it would be unfortunate if it caused a significant performance regression.)

On this branch: …

On the main branch at commit 6281aff: …
This PR fixes undefined behaviour in the struct module unpacking support functions `bu_longlong`, `lu_longlong`, `bu_int` and `lu_int`; thanks to @kumaraditya303 for finding these.

The fix is to accumulate the bytes in an unsigned integer type instead of a signed integer type, then to convert to the appropriate signed type. In cases where the width matches, that conversion will typically be compiled away to a no-op.
(Evidence from Godbolt: https://godbolt.org/z/5zvxodj64 .)
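A minimal sketch of the accumulate-unsigned-then-convert fix for the 8-byte big-endian case. `unpack_be_8` is a hypothetical stand-in, not CPython's actual `bu_longlong`, and the sketch assumes `unsigned long long` is exactly 64 bits (checked at compile time):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#if ULLONG_MAX != 0xFFFFFFFFFFFFFFFFULL
#error "this sketch assumes unsigned long long is exactly 64 bits"
#endif

/* Hypothetical stand-in for a big-endian 8-byte unpacker.
   The problematic pattern accumulates into a signed type:
       long long x = 0;  ...  x = (x << 8) | *bytes++;
   Once the shifted value no longer fits in long long (or x is negative),
   the left shift is undefined behaviour (C99 6.5.7p4). Shifting an
   unsigned accumulator is always well defined. */
static long long
unpack_be_8(const unsigned char *p)
{
    unsigned long long x = 0;
    for (size_t i = 0; i < 8; i++) {
        x = (x << 8) | p[i];
    }
    /* All 64 bits are filled, so no sign extension is needed; only the
       unsigned-to-signed conversion remains, written so that it avoids
       implementation-defined behaviour. */
    return x > (unsigned long long)LLONG_MAX ? -1 - (long long)(~x)
                                             : (long long)x;
}
```

With the bytes `80 00 00 00 00 00 00 00`, this returns `LLONG_MIN`; with eight `FF` bytes, it returns `-1`.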
To make the conversions efficient, I've specialised the relevant functions for their output size: for `bu_longlong` and `lu_longlong`, this only entails checking that the output size is indeed `8`. But `bu_int` and `lu_int` were used for format sizes `2` and `4`, so I've split each of them into two separate functions.

No tests, because all of the affected cases are already exercised by the test suite.
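The size specialisation of `bu_int`/`lu_int` can be sketched for the little-endian case as follows. `unpack_le_2` and `unpack_le_4` are hypothetical names, not the functions added in this PR. Because each function knows its width at compile time, the sign-extension mask is a constant; when `unsigned long` is exactly 32 bits, the 4-byte sign extension is the identity and can be compiled away.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical size-specialised little-endian readers (names invented for
   illustration). All accumulation and sign extension is done in unsigned
   long, then converted to long without implementation-defined behaviour. */
static long
unpack_le_2(const unsigned char *p)
{
    unsigned long x = (unsigned long)p[0] | ((unsigned long)p[1] << 8);
    x = (x ^ 0x8000UL) - 0x8000UL;          /* sign-extend from 16 bits */
    return x > (unsigned long)LONG_MAX ? -1 - (long)(~x) : (long)x;
}

static long
unpack_le_4(const unsigned char *p)
{
    unsigned long x = 0;
    for (int i = 3; i >= 0; i--) {          /* most significant byte is last */
        x = (x << 8) | p[i];
    }
    x = (x ^ 0x80000000UL) - 0x80000000UL;  /* sign-extend from 32 bits */
    return x > (unsigned long)LONG_MAX ? -1 - (long)(~x) : (long)x;
}
```

The design trade-off is a little code duplication in exchange for removing the runtime size dispatch and giving the compiler constant masks to fold.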
`struct.unpack` #96735