gh-52551: Fix encoding issues in strftime()#125193
gh-52551: Fix encoding issues in strftime()#125193serhiy-storchaka merged 5 commits intopython:mainfrom
Conversation
Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also pythongh-78662 and pythongh-124531.
|
@vstinner, could you please review this PR? |
Modules/_datetimemodule.c
Outdated
| newformat, timetuple, NULL); | ||
| Py_DECREF(newformat); | ||
| } | ||
| if (0) { |
There was a problem hiding this comment.
This if (0) is weird. Why not adding a label after _PyUnicodeWriter_Dealloc() and using goto?
There was a problem hiding this comment.
This would require changing more code. Renaming label Done to Error (there are many goto Done in not changed code) and adding new Done label. The if (0) idiom is weird, but used in many places.
Modules/_datetimemodule.c
Outdated
| } | ||
| newformat = _PyUnicodeWriter_Finish(&writer); | ||
| } | ||
| if (newformat != NULL) { |
There was a problem hiding this comment.
I would prefer to exit early: if (newformat == NULL) goto ....
By the way, only the _PyUnicodeWriter_Finish() code path can fail, no?
There was a problem hiding this comment.
This would require adding more labels. The flow would look more complicated.
Modules/timemodule.c
Outdated
| * will be ahead of time... | ||
| */ | ||
| while (1) { | ||
| outbuf = (time_char *)PyMem_Realloc(outbuf, bufsize*sizeof(time_char)); |
There was a problem hiding this comment.
You may also add an explicit integer overflow check here.
Modules/timemodule.c
Outdated
| Py_DECREF(unicode); | ||
| } | ||
|
|
||
| j = i; |
There was a problem hiding this comment.
You should declare the j variable here, and I suggest to rename it to start.
| } | ||
| format[fmtlen++] = (char)c; | ||
| } | ||
| if (fmtlen) { |
There was a problem hiding this comment.
This function is quite big. You might move the code handling format in a sub-function (the whole if block).
There was a problem hiding this comment.
This may be inconvenient, because this code changes several outer variables.: outbuf and bufsize.
|
I addressed all comments. Perhaps it is now more difficult to review. |
vstinner
left a comment
There was a problem hiding this comment.
LGTM. I just left minor coding style suggestions that you can ignore.
Thanks for the exhaustive added tests, it's very helpful.
Modules/timemodule.c
Outdated
| *outbuf = (time_char *)PyMem_Realloc(*outbuf, | ||
| *bufsize*sizeof(time_char)); |
There was a problem hiding this comment.
| *outbuf = (time_char *)PyMem_Realloc(*outbuf, | |
| *bufsize*sizeof(time_char)); | |
| *outbuf = (time_char *)PyMem_Realloc(*outbuf, | |
| *bufsize*sizeof(time_char)); |
Modules/timemodule.c
Outdated
|
|
||
| static PyObject * | ||
| time_strftime(PyObject *module, PyObject *args) | ||
| strftime1(time_char **outbuf, size_t *bufsize, |
There was a problem hiding this comment.
nitpick: you might rename the function to time_strftime1() to avoid any conflict with a potential strftime1() name from the C library.
|
Thank you for your review @vstinner. |
|
Thanks @serhiy-storchaka for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13. |
|
Sorry, @serhiy-storchaka, I could not cleanly backport this to |
|
Sorry, @serhiy-storchaka, I could not cleanly backport this to |
…5193) Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also pythongh-78662 and pythongh-124531. (cherry picked from commit ad3eac1) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also pythongh-78662 and pythongh-124531. (cherry picked from commit ad3eac1)
|
GH-125657 is a backport of this pull request to the 3.13 branch. |
…5657) Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also gh-78662 and gh-124531. (cherry picked from commit ad3eac1)
…5193) (pythonGH-125657) Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also pythongh-78662 and pythongh-124531. (cherry picked from commit 08ccbb9) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> (cherry picked from commit ad3eac1)
…5657) (GH-125661) Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also gh-78662 and gh-124531. (cherry picked from commit 08ccbb9) (cherry picked from commit ad3eac1) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
|
Congrats! That's a nice change. Using Unicode instead of bytes for the internal buffer is a good move. |
Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time. * Characters not encodable in the current locale are now acceptable in the format string. * Surrogate pairs and sequence of surrogatescape-encoded bytes are no longer recombinated. * Embedded null character no longer terminates the format string. This fixes also pythongh-78662 and pythongh-124531.
Fix time.strftime(), the strftime() method and formatting of the datetime classes datetime, date and time.
Closes #52551, #78662 and #124531.
datetime.strftimestrings can be terminated by "\x00" literals #124531