Merged
Changes from 24 commits
51 commits
a3c014b
Raise a warning when encoding is omitted
methane Apr 7, 2020
050bd1b
add test
methane Apr 12, 2020
939f4a0
wrap encoding=None with text_encoding.
methane Apr 12, 2020
3c99777
Add io.LOCALE_ENCODING = "locale"
methane Jan 29, 2021
4016278
Add EncodingWarning.
methane Jan 29, 2021
c5c556c
Add sys.warn_default_encoding
methane Jan 29, 2021
d9a08c2
shorten option names
methane Jan 30, 2021
772648e
EncodingWarning extends Warning
methane Jan 30, 2021
1a8e305
make clinic
methane Jan 30, 2021
20966cd
fix test
methane Jan 30, 2021
2b80f42
remove wrong test case
methane Jan 30, 2021
760308c
fix exception_hierarchy.txt
methane Jan 30, 2021
a95dff2
Make sys.flags.encoding_warning int
methane Jan 31, 2021
31fb411
Fix text_embed.
methane Jan 31, 2021
096a0a3
Fix test_pickle
methane Jan 31, 2021
99fc938
configparser: use io.text_encoding()
methane Feb 13, 2021
6fdbcbc
Rename option names
methane Feb 22, 2021
3f362bc
Merge remote-tracking branch 'upstream/master' into open-encoding
methane Mar 16, 2021
674feff
Update docs
methane Mar 16, 2021
d9d850f
Add NEWS entry
methane Mar 16, 2021
16463ea
Add document for text_encoding and encoding="locale".
methane Mar 17, 2021
412d633
Suppress EncodingWarning from site.py
methane Mar 17, 2021
ee883d1
Remove io.LOCALE_ENCODING
methane Mar 18, 2021
6a15e2a
text_encoding() first argument is mandatory.
methane Mar 18, 2021
5d474b4
Apply suggestions from code review
methane Mar 18, 2021
c17016f
Simplify _PyPreCmdline and PyConfig
methane Mar 18, 2021
03f971c
Update EncodingWarning doc
methane Mar 18, 2021
9d26b7a
Update document
methane Mar 19, 2021
60e74cf
tweak warning message
methane Mar 19, 2021
a505b5f
Use stacklevel=2 for text_encoding() default
methane Mar 19, 2021
cbe22e2
fixup
methane Mar 19, 2021
a9f9f04
tweak for readability
methane Mar 19, 2021
3bea88f
make clinic
methane Mar 19, 2021
d260a4c
fix doc build error
methane Mar 19, 2021
049a269
tweak warning message
methane Mar 19, 2021
018ba64
fixup
methane Mar 19, 2021
3a9623e
Fix subprocess
methane Mar 23, 2021
737059e
Update Doc/library/io.rst
methane Mar 23, 2021
6a62211
Update Doc/library/io.rst
methane Mar 23, 2021
54c7dc6
Update Doc/library/io.rst
methane Mar 23, 2021
5b2830b
Update Doc/library/io.rst
methane Mar 23, 2021
14f2a6e
Apply suggestions from code review
methane Mar 23, 2021
06e2a32
Move EncodingWarnings
methane Mar 23, 2021
27d49d2
fix comment
methane Mar 23, 2021
80f4644
fix text_encoding() docstring
methane Mar 23, 2021
6ad0e7f
update what's new
methane Mar 23, 2021
73b27f1
fix doc build
methane Mar 23, 2021
c149d65
Update Doc/library/io.rst
methane Mar 24, 2021
4eb7655
Apply suggestions from code review
methane Mar 24, 2021
e3bce76
Apply suggestions from code review
methane Mar 24, 2021
c089fd7
Update Doc/library/io.rst
methane Mar 24, 2021
7 changes: 7 additions & 0 deletions Doc/c-api/init_config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -583,6 +583,13 @@ PyConfig

Default: ``0``.

.. c:member:: int warn_default_encoding

If equal to 1, emit an ``EncodingWarning`` when ``TextIOWrapper``
uses its default encoding. See :pep:`597` for details.

.. versionadded:: 3.10

.. c:member:: wchar_t* check_hash_pycs_mode

Control the validation behavior of hash-based ``.pyc`` files:
Expand Down
8 changes: 8 additions & 0 deletions Doc/library/exceptions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -688,6 +688,14 @@ The following exceptions are used as warning categories; see the
Base class for warnings generated by user code.


.. exception:: EncodingWarning

Base class for warnings about encodings when those warnings are intended for
other Python developers.

.. versionadded:: 3.10


.. exception:: DeprecationWarning

Base class for warnings about deprecated features when those warnings are
Expand Down
34 changes: 34 additions & 0 deletions Doc/library/io.rst
Original file line number Diff line number Diff line change
Expand Up @@ -143,6 +143,32 @@ High-level Module Interface
.. versionadded:: 3.8


.. function:: text_encoding(encoding, stacklevel=1)

This is a helper function for callables that use :func:`open` or
:class:`TextIOWrapper` and have an ``encoding=None`` parameter.

This function returns *encoding* if it is not ``None``, and ``"locale"`` if
*encoding* is ``None``.

This function emits an :class:`EncodingWarning` if *encoding* is ``None`` and
``sys.flags.warn_default_encoding`` is true. *stacklevel* specifies where
the warning is emitted. For example::

   def read_text(path, encoding=None):
       encoding = io.text_encoding(encoding)  # stacklevel=1
       with open(path, encoding=encoding) as f:
           return f.read()

In this example, an :class:`EncodingWarning` is emitted for the caller of
``read_text()``. If *stacklevel* is greater than 1, more stack frames are
skipped.

See :envvar:`PYTHONWARNDEFAULTENCODING` and :pep:`597` for more information.

.. versionadded:: 3.10
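
The effect of *stacklevel* is easiest to see when the helper is called from a private function rather than directly from the public API. The sketch below is a pure-Python stand-in, not the code in this PR: `DemoEncodingWarning`, `demo_text_encoding`, `_resolve`, and `load_settings` are hypothetical names, and the stand-in always warns instead of consulting `sys.flags.warn_default_encoding`, so it should run on any Python 3.

```python
import warnings


class DemoEncodingWarning(Warning):
    """Hypothetical stand-in for EncodingWarning."""


def demo_text_encoding(encoding, stacklevel=1):
    """Stand-in for io.text_encoding() that always warns when encoding is None."""
    if encoding is None:
        # "+ 2" skips this function and its direct caller, mirroring the
        # offset used by the _pyio implementation in this PR.
        warnings.warn("'encoding' option is not specified.",
                      DemoEncodingWarning, stacklevel + 2)
        encoding = "locale"
    return encoding


def _resolve(encoding):
    # Private helper: one extra frame sits between the user's code and the
    # call below, so ask for one more frame to be skipped.
    return demo_text_encoding(encoding, stacklevel=2)


def load_settings(encoding=None):
    # Public entry point; the warning should point at whoever calls this.
    return _resolve(encoding)
```

With warnings enabled, calling `load_settings()` reports the line that contains the call, not a line inside `_resolve()`. Passing `stacklevel=1` from `_resolve()` would instead blame `load_settings()` itself.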


.. exception:: BlockingIOError

This is a compatibility alias for the builtin :exc:`BlockingIOError`
Expand Down Expand Up @@ -880,6 +906,11 @@ Text I/O
encoded with. It defaults to
:func:`locale.getpreferredencoding(False) <locale.getpreferredencoding>`.

If ``sys.flags.warn_default_encoding`` is true and the default encoding
is used, an :class:`EncodingWarning` is emitted. You can suppress
the warning by passing ``encoding="locale"``.
See :envvar:`PYTHONWARNDEFAULTENCODING` and :pep:`597` for more information.

*errors* is an optional string that specifies how encoding and decoding
errors are to be handled. Pass ``'strict'`` to raise a :exc:`ValueError`
exception if there is an encoding error (the default of ``None`` has the same
Expand Down Expand Up @@ -930,6 +961,9 @@ Text I/O
locale encoding using :func:`locale.setlocale`, use the current locale
encoding instead of the user preferred encoding.

.. versionchanged:: 3.10
The *encoding* parameter now supports the ``"locale"`` dummy encoding name.

:class:`TextIOWrapper` provides these data attributes and methods in
addition to those from :class:`TextIOBase` and :class:`IOBase`:

Expand Down
28 changes: 28 additions & 0 deletions Doc/using/cmdline.rst
Original file line number Diff line number Diff line change
Expand Up @@ -453,6 +453,9 @@ Miscellaneous options
* ``-X pycache_prefix=PATH`` enables writing ``.pyc`` files to a parallel
tree rooted at the given directory instead of to the code tree. See also
:envvar:`PYTHONPYCACHEPREFIX`.
* ``-X warn_default_encoding`` issues an :class:`EncodingWarning` when
the ``encoding`` option is omitted and the default encoding is locale-specific.
See also :envvar:`PYTHONWARNDEFAULTENCODING`.

It also allows passing arbitrary values and retrieving them through the
:data:`sys._xoptions` dictionary.
Expand Down Expand Up @@ -482,6 +485,9 @@ Miscellaneous options

The ``-X showalloccount`` option has been removed.

.. versionadded:: 3.10
The ``-X warn_default_encoding`` option.

.. deprecated-removed:: 3.9 3.10
The ``-X oldparser`` option.

Expand Down Expand Up @@ -907,6 +913,28 @@ conflict.

.. versionadded:: 3.7

.. envvar:: PYTHONWARNDEFAULTENCODING

If this environment variable is set to a non-empty string, issue an
:class:`EncodingWarning` when an ``encoding`` option is omitted and
the default encoding is locale-specific.

This option can be used to find bugs caused by not passing the
``encoding="utf8"`` option. For example::

   # This code may raise UnicodeDecodeError on Windows.
   # Use encoding="utf8" or "b" mode instead.
   with open(path) as f:
       data = json.load(f)

Since Python 3.10, the ``encoding="locale"`` option can be used to specify the
locale-specific encoding explicitly. Python won't issue an
:class:`EncodingWarning` for it.

See :pep:`597` for details.

.. versionadded:: 3.10
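
As a rough illustration of the fix this option points towards, the sketch below writes and reads a JSON file with an explicit UTF-8 encoding, so the result does not depend on the platform's locale. `save_json`, `load_json`, and `roundtrip` are hypothetical helper names chosen for this example; they are not part of the PR.

```python
import json
import os
import tempfile


def save_json(path, data):
    # Explicit encoding: the bytes on disk are UTF-8 on every platform.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)


def load_json(path):
    # Reading with the same explicit encoding avoids a locale-dependent decode.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def roundtrip(data):
    # Write to a temporary file and read it back.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.json")
        save_json(path, data)
        return load_json(path)
```

Because neither helper omits `encoding`, code written this way should produce no warning when run with `-X warn_default_encoding` or with `PYTHONWARNDEFAULTENCODING` set.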


Debug-mode variables
~~~~~~~~~~~~~~~~~~~~
Expand Down
1 change: 1 addition & 0 deletions Include/cpython/initconfig.h
Original file line number Diff line number Diff line change
Expand Up @@ -153,6 +153,7 @@ typedef struct PyConfig {
PyWideStringList warnoptions;
int site_import;
int bytes_warning;
int warn_default_encoding;
int inspect;
int interactive;
int optimization_level;
Expand Down
4 changes: 3 additions & 1 deletion Include/internal/pycore_initconfig.h
Original file line number Diff line number Diff line change
Expand Up @@ -102,13 +102,15 @@ typedef struct {
int isolated; /* -I option */
int use_environment; /* -E option */
int dev_mode; /* -X dev and PYTHONDEVMODE */
int warn_default_encoding; /* -X warn_default_encoding and PYTHONWARNDEFAULTENCODING */
} _PyPreCmdline;

#define _PyPreCmdline_INIT \
(_PyPreCmdline){ \
.use_environment = -1, \
.isolated = -1, \
.dev_mode = -1}
.dev_mode = -1, \
.warn_default_encoding = -1}
/* Note: _PyPreCmdline_INIT sets other fields to 0/NULL */

extern void _PyPreCmdline_Clear(_PyPreCmdline *cmdline);
Expand Down
1 change: 1 addition & 0 deletions Include/pyerrors.h
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,7 @@ PyAPI_DATA(PyObject *) PyExc_FutureWarning;
PyAPI_DATA(PyObject *) PyExc_ImportWarning;
PyAPI_DATA(PyObject *) PyExc_UnicodeWarning;
PyAPI_DATA(PyObject *) PyExc_BytesWarning;
PyAPI_DATA(PyObject *) PyExc_EncodingWarning;
PyAPI_DATA(PyObject *) PyExc_ResourceWarning;


Expand Down
46 changes: 36 additions & 10 deletions Lib/_pyio.py
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,28 @@
_CHECK_ERRORS = _IOBASE_EMITS_UNRAISABLE


def text_encoding(encoding, stacklevel=1):
    """
    Helper function to choose the text encoding.

    When encoding is not None, just return it.
    Otherwise, return the default text encoding (i.e. "locale").

    This function emits EncodingWarning if *encoding* is None and
    sys.flags.warn_default_encoding is true.

    This function can be used in APIs having encoding=None option.
    But please consider encoding="utf-8" for new APIs.
    """
    if encoding is None:
        if sys.flags.warn_default_encoding:
            import warnings
            warnings.warn("'encoding' option is not specified.",
                          EncodingWarning, stacklevel + 2)
        encoding = "locale"
    return encoding


def open(file, mode="r", buffering=-1, encoding=None, errors=None,
newline=None, closefd=True, opener=None):

Expand Down Expand Up @@ -248,6 +270,7 @@ def open(file, mode="r", buffering=-1, encoding=None, errors=None,
result = buffer
if binary:
return result
encoding = text_encoding(encoding)
text = TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
result = text
text.mode = mode
Expand Down Expand Up @@ -2004,19 +2027,22 @@ class TextIOWrapper(TextIOBase):
def __init__(self, buffer, encoding=None, errors=None, newline=None,
line_buffering=False, write_through=False):
self._check_newline(newline)
if encoding is None:
encoding = text_encoding(encoding)

if encoding == "locale":
try:
encoding = os.device_encoding(buffer.fileno())
encoding = os.device_encoding(buffer.fileno()) or "locale"
except (AttributeError, UnsupportedOperation):
pass
if encoding is None:
try:
import locale
except ImportError:
# Importing locale may fail if Python is being built
encoding = "ascii"
else:
encoding = locale.getpreferredencoding(False)

if encoding == "locale":
try:
import locale
except ImportError:
# Importing locale may fail if Python is being built
encoding = "utf-8"
Review comment (Member):

I saw what you did there! :-D Mention it in the final commit message (I didn't read your 24 commit messages, GitHub UI isn't convenient for that :-( ).

else:
encoding = locale.getpreferredencoding(False)

if not isinstance(encoding, str):
raise ValueError("invalid encoding: %r" % encoding)
Expand Down
1 change: 1 addition & 0 deletions Lib/bz2.py
Original file line number Diff line number Diff line change
Expand Up @@ -311,6 +311,7 @@ def open(filename, mode="rb", compresslevel=9,
binary_file = BZ2File(filename, bz_mode, compresslevel=compresslevel)

if "t" in mode:
encoding = io.text_encoding(encoding)
return io.TextIOWrapper(binary_file, encoding, errors, newline)
else:
return binary_file
Expand Down
1 change: 1 addition & 0 deletions Lib/configparser.py
Original file line number Diff line number Diff line change
Expand Up @@ -690,6 +690,7 @@ def read(self, filenames, encoding=None):
"""
if isinstance(filenames, (str, bytes, os.PathLike)):
filenames = [filenames]
encoding = io.text_encoding(encoding)
read_ok = []
for filename in filenames:
try:
Expand Down
1 change: 1 addition & 0 deletions Lib/gzip.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ def open(filename, mode="rb", compresslevel=_COMPRESS_LEVEL_BEST,
raise TypeError("filename must be a str or bytes object, or a file")

if "t" in mode:
encoding = io.text_encoding(encoding)
return io.TextIOWrapper(binary_file, encoding, errors, newline)
else:
return binary_file
Expand Down
2 changes: 1 addition & 1 deletion Lib/io.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@
from _io import (DEFAULT_BUFFER_SIZE, BlockingIOError, UnsupportedOperation,
open, open_code, FileIO, BytesIO, StringIO, BufferedReader,
BufferedWriter, BufferedRWPair, BufferedRandom,
IncrementalNewlineDecoder, TextIOWrapper)
IncrementalNewlineDecoder, text_encoding, TextIOWrapper)

OpenWrapper = _io.open # for compatibility with _pyio

Expand Down
1 change: 1 addition & 0 deletions Lib/lzma.py
Original file line number Diff line number Diff line change
Expand Up @@ -302,6 +302,7 @@ def open(filename, mode="rb", *,
preset=preset, filters=filters)

if "t" in mode:
encoding = io.text_encoding(encoding)
return io.TextIOWrapper(binary_file, encoding, errors, newline)
else:
return binary_file
Expand Down
4 changes: 4 additions & 0 deletions Lib/pathlib.py
Original file line number Diff line number Diff line change
Expand Up @@ -1241,6 +1241,8 @@ def open(self, mode='r', buffering=-1, encoding=None,
Open the file pointed by this path and return a file object, as
the built-in open() function does.
"""
if "b" not in mode:
encoding = io.text_encoding(encoding)
return io.open(self, mode, buffering, encoding, errors, newline,
opener=self._opener)

Expand All @@ -1255,6 +1257,7 @@ def read_text(self, encoding=None, errors=None):
"""
Open the file in text mode, read it, and close the file.
"""
encoding = io.text_encoding(encoding)
with self.open(mode='r', encoding=encoding, errors=errors) as f:
return f.read()

Expand All @@ -1274,6 +1277,7 @@ def write_text(self, data, encoding=None, errors=None, newline=None):
if not isinstance(data, str):
raise TypeError('data must be str, not %s' %
data.__class__.__name__)
encoding = io.text_encoding(encoding)
with self.open(mode='w', encoding=encoding, errors=errors, newline=newline) as f:
return f.write(data)
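
The `pathlib` and `tempfile` hunks share one pattern: the encoding is resolved only when the mode is text. A third-party wrapper could follow the same idea, as in the sketch below. `resolve_encoding` is a hypothetical name, and the `getattr` fallback for interpreters that lack `io.text_encoding()` is an assumption added for portability, not something this PR does.

```python
import io


def resolve_encoding(mode, encoding=None):
    """Return the encoding to pass on, touching it only for text mode."""
    if "b" in mode:
        # Binary mode has no text encoding; pass the value through unchanged.
        return encoding
    helper = getattr(io, "text_encoding", None)
    if helper is None:
        # Assumption: on interpreters without io.text_encoding(), keep None
        # so that open() falls back to its own default.
        return encoding
    return helper(encoding)
```

A wrapper would then call something like `open(path, mode, encoding=resolve_encoding(mode, encoding))`, leaving binary-mode calls untouched.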

Expand Down
4 changes: 3 additions & 1 deletion Lib/site.py
Original file line number Diff line number Diff line change
Expand Up @@ -170,7 +170,9 @@ def addpackage(sitedir, name, known_paths):
fullname = os.path.join(sitedir, name)
_trace(f"Processing .pth file: {fullname!r}")
try:
f = io.TextIOWrapper(io.open_code(fullname))
# locale encoding is not ideal especially on Windows. But we have used
# it for a long time. setuptools uses the locale encoding too.
f = io.TextIOWrapper(io.open_code(fullname), encoding="locale")
except OSError:
return
with f:
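
The `site.py` change opts in to the locale encoding explicitly instead of relying on the default. A minimal sketch of the same choice in user code follows; `wrap_with_locale_encoding` is a hypothetical name, and the pre-3.10 branch is an assumption added so the example also runs on interpreters that do not know the `"locale"` name.

```python
import io
import locale
import sys


def wrap_with_locale_encoding(raw):
    """Wrap a binary stream using the locale encoding, stated explicitly."""
    if sys.version_info >= (3, 10):
        # Explicit opt-in: no EncodingWarning is issued for "locale".
        return io.TextIOWrapper(raw, encoding="locale")
    # Assumption: before 3.10 the "locale" name is not understood, so look
    # the encoding up by hand; this matches the documented default.
    return io.TextIOWrapper(raw, encoding=locale.getpreferredencoding(False))
```

Spelling the choice out this way documents that the locale dependence is intentional, which is the point of the comment added in `site.py`.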
Expand Down
7 changes: 7 additions & 0 deletions Lib/tempfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -543,6 +543,9 @@ def NamedTemporaryFile(mode='w+b', buffering=-1, encoding=None,
if _os.name == 'nt' and delete:
flags |= _os.O_TEMPORARY

if "b" not in mode:
encoding = _io.text_encoding(encoding)

(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
try:
file = _io.open(fd, mode, buffering=buffering,
Expand Down Expand Up @@ -583,6 +586,9 @@ def TemporaryFile(mode='w+b', buffering=-1, encoding=None,
"""
global _O_TMPFILE_WORKS

if "b" not in mode:
encoding = _io.text_encoding(encoding)

prefix, suffix, dir, output_type = _sanitize_params(prefix, suffix, dir)

flags = _bin_openflags
Expand Down Expand Up @@ -638,6 +644,7 @@ def __init__(self, max_size=0, mode='w+b', buffering=-1,
if 'b' in mode:
self._file = _io.BytesIO()
else:
encoding = _io.text_encoding(encoding)
self._file = _io.TextIOWrapper(_io.BytesIO(),
encoding=encoding, errors=errors,
newline=newline)
Expand Down
1 change: 1 addition & 0 deletions Lib/test/exception_hierarchy.txt
Original file line number Diff line number Diff line change
Expand Up @@ -61,4 +61,5 @@ BaseException
+-- ImportWarning
+-- UnicodeWarning
+-- BytesWarning
+-- EncodingWarning
+-- ResourceWarning
1 change: 1 addition & 0 deletions Lib/test/test_embed.py
Original file line number Diff line number Diff line change
Expand Up @@ -389,6 +389,7 @@ class InitConfigTests(EmbeddingTestsMixin, unittest.TestCase):

'site_import': 1,
'bytes_warning': 0,
'warn_default_encoding': 0,
'inspect': 0,
'interactive': 0,
'optimization_level': 0,
Expand Down
23 changes: 23 additions & 0 deletions Lib/test/test_io.py
Original file line number Diff line number Diff line change
Expand Up @@ -4249,6 +4249,29 @@ def test_check_encoding_errors(self):
proc = assert_python_failure('-X', 'dev', '-c', code)
self.assertEqual(proc.rc, 10, proc)

    def test_check_encoding_warning(self):
        # PEP 597: Raise warning when encoding is not specified
        # and the -X warn_default_encoding option is enabled.
        mod = self.io.__name__
        filename = __file__
        code = textwrap.dedent(f'''\
            import sys
            from {mod} import open, TextIOWrapper
            import pathlib

            with open({filename!r}) as f:           # line 5
                pass

            pathlib.Path({filename!r}).read_text()  # line 8
        ''')
        proc = assert_python_ok('-X', 'warn_default_encoding', '-c', code)
        warnings = proc.err.splitlines()
        self.assertEqual(len(warnings), 2)
        self.assertTrue(
            warnings[0].startswith(b"<string>:5: EncodingWarning: "))
        self.assertTrue(
            warnings[1].startswith(b"<string>:8: EncodingWarning: "))


class CMiscIOTest(MiscIOTest):
io = io
Expand Down
3 changes: 2 additions & 1 deletion Lib/test/test_pickle.py
Original file line number Diff line number Diff line change
Expand Up @@ -483,7 +483,8 @@ def test_exceptions(self):
if exc in (BlockingIOError,
ResourceWarning,
StopAsyncIteration,
RecursionError):
RecursionError,
EncodingWarning):
continue
if exc is not OSError and issubclass(exc, OSError):
self.assertEqual(reverse_mapping('builtins', name),
Expand Down