Add support for byte and unicode Literal strings #6087
This pull request adds support for byte and unicode Literal strings.
I left in some comments explaining some nuances of the implementation;
here are a few additional meta-notes:
1. I reworded several of the comments suggesting that the way we
represent bytes as a string is a "hack" or that we should eventually
switch to representing bytes as literally bytes.
I started with that approach but ultimately rejected it: I ended
up having to constantly serialize/deserialize between bytes and
strings, which I felt complicated the code.
As a result, I decided that the solution we had previously is in
fact, from a high-level perspective, the best possible approach.
(The actual code for translating the output of `typed_ast` into a
human-readable string *is* admittedly a bit hacky though.)
In any case, the phrase "how mypy currently parses the contents of bytes
literals" is severely out-of-date anyways. That comment was added
about 3 years ago, when we were adding the fast parser for the first
time and running it concurrently with the actual parser.
2. I removed the `is_stub` field from `fastparse2.ASTConverter`: it turned
out we were just never using that field.
3. One complication I ran into was figuring out how to handle forward
references to literal strings. For example, suppose we have the type
`List["Literal['foo']"]`. Do we treat this as being equivalent to
`List[Literal[u'foo']]` or `List[Literal[b'foo']]`?
If this is a Python 3 file or a Python 2 file with
`unicode_literals`, we'd want to pick the former. If this is a
standard Python 2 file, we'd want to pick the latter.
In order to make this happen, I decided to use a heuristic where the
type of the "outer" string decides the type of the "inner" string.
For example:
- In Python 3, `"Literal['foo']"` is a unicode string. So,
the inner `Literal['foo']` will be treated as the same as
`Literal[u'foo']`.
- The same thing happens when using Python 2 with
`unicode_literals`.
- In Python 3, it is illegal to use a byte string as a forward
reference. So, types like `List[b"Literal['foo']"]` are already
illegal.
- In standard Python 2, `"Literal['foo']"` is a byte string. So the
inner `Literal['foo']` will be treated as the same as
`Literal[b'foo']`.
4. I will add tests validating that all of this stuff works as expected
with incremental and fine-grained mode in a separate diff --
probably after fixing and landing python#6075,
which I intend to use as a baseline foundation.
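
To make the heuristic from note 3 concrete, here is a minimal sketch of the decision rule. It is not the code in this PR: the function names `outer_is_unicode` and `inner_literal_kind` are hypothetical, and the handling of explicitly prefixed strings (an explicit `b` or `u` prefix always wins) is an assumption made for illustration.

```python
def outer_is_unicode(python_major, has_unicode_literals_import, outer_prefix=""):
    """Decide whether the quoted forward reference itself is a unicode string.

    All names here are hypothetical; this only mirrors the cases listed above.
    """
    prefix = outer_prefix.lower()
    if "b" in prefix:
        if python_major >= 3:
            # Byte strings are not legal forward references in Python 3.
            raise ValueError("byte string used as a forward reference")
        return False
    if "u" in prefix:
        return True
    # Unprefixed: unicode in Python 3, or in Python 2 with unicode_literals.
    return python_major >= 3 or has_unicode_literals_import


def inner_literal_kind(inner_prefix, outer_unicode):
    """Decide how a string inside the forward reference is interpreted."""
    prefix = inner_prefix.lower()
    if "b" in prefix:
        return "bytes"      # assumption: an explicit prefix always wins
    if "u" in prefix:
        return "unicode"
    # Unprefixed inner string: inherit the kind of the outer string.
    return "unicode" if outer_unicode else "bytes"


# "Literal['foo']" in a Python 3 file
print(inner_literal_kind("", outer_is_unicode(3, False)))
# "Literal['foo']" in a standard Python 2 file
print(inner_literal_kind("", outer_is_unicode(2, False)))
```

Under these assumptions the first call yields the unicode interpretation and the second the bytes interpretation, matching the cases in the list above.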
If 'unicode_literals' is true, we assume that `Foo["blah"]` is equivalent
to `Foo[u"blah"]` (for both Python 2 and 3). Otherwise, we assume it's
equivalent to `Foo[b"blah"]`.
This comment is misleading. IIUC, on Python 3 `Literal["blah"]` is equivalent to `Literal[u"blah"]`, but from the last sentence it looks like it is `Literal[b"blah"]`.
My confusion probably stems from `unicode_literals` being used for both the future import and the argument name. Maybe make them different?
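
One possible way to separate the two meanings, sketched with hypothetical names (neither `has_unicode_literals_import` nor `assume_str_is_unicode` is confirmed by this thread as the name used in the PR): keep one name for "the file contains the future import" and derive a second, differently named flag that says how unprefixed strings should be read.

```python
def assume_str_is_unicode(python_major, has_unicode_literals_import):
    """Derive the 'unprefixed strings are unicode' flag.

    Hypothetical helper: the flag is true for any Python 3 file, and for
    Python 2 files that use `from __future__ import unicode_literals`.
    """
    return python_major >= 3 or has_unicode_literals_import


def equivalent_form(python_major, has_unicode_literals_import):
    """Return the explicit spelling that Foo["blah"] is treated as."""
    prefix = "u" if assume_str_is_unicode(
        python_major, has_unicode_literals_import) else "b"
    return 'Foo[{}"blah"]'.format(prefix)


print(equivalent_form(3, False))
print(equivalent_form(2, True))
print(equivalent_form(2, False))
```

With the derived flag, the docstring could state the Python 3 case without implying that a missing future import means bytes.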
mypy/nodes.py

    """String literal"""

    value = ''
    from_python_2 = False

    else:
        return NotImplemented

    def value_repr(self) -> str:
…dard before the Python 3.4 release
It fouls up the tests under Python 3.4, and is immaterial to the substance of the test anyways.
@ilevkivskyi -- alas, I don't think this PR helps resolve any of those issues. The only real non-literal-types related change this PR makes is modifying the way mypy handles parsing "string inside strings" -- when parsing types like …

That said, I think we might be able to close #2536, though it seems like it's mostly a duplicate of #5098 at this point. (And also, the code snippet Jukka included at the end doesn't work because it's using function annotations inside of Python 2 code, not because of anything to do with strings or …)

The suggestion in #3619 may eventually be superseded by python/typing#208, not sure.