Expand std::string_view support to str, bytes, memoryview #3521

Merged
jagerman merged 7 commits into pybind:master from jagerman:more-string-view on Dec 3, 2021
Conversation

@jagerman

@jagerman jagerman commented Dec 1, 2021

Member

Description

  1. Allows constructing a str or bytes implicitly from a string_view; this is essentially a small shortcut allowing a caller to write py::bytes{sv} rather than py::bytes{sv.data(), sv.size()}.

    For py::str this also allows std::u8string_view, but not for py::bytes because that didn't seem entirely appropriate to me.

  2. Allows implicit conversion to std::string_view from py::bytes—this plugs a current hole where there's no simple way to get such a view of the bytes without copying it (or resorting to Python API calls).

    (This is not done for str because when the str contains Unicode we have to allocate a temporary, so there may be no string data that we can view without owning it.)

  3. Allows memoryview::from_memory to accept a string_view. As with the other from_memory calls, it's entirely your responsibility to keep it alive.
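The first two items can be illustrated outside of pybind11 with a minimal standalone sketch. `Blob`, `count_nul`, `view_aliases_storage`, and `blob_demo` are hypothetical names made up for this example; `Blob` is only a plain C++17 stand-in for an owning byte container and does not reflect how `py::bytes` is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical owning byte container; a simplified stand-in, not py::bytes itself.
class Blob {
public:
    // Implicit on purpose: callers can write Blob{sv} instead of Blob{sv.data(), sv.size()}.
    Blob(std::string_view sv) : storage_(sv) {}
    Blob(const char *p, std::size_t n) : storage_(p, n) {}

    // Implicit conversion to a non-owning view of the stored bytes; nothing is copied.
    operator std::string_view() const noexcept { return storage_; }

    const char *data() const noexcept { return storage_.data(); }

private:
    std::string storage_;
};

// Any function that takes a view can accept a Blob directly.
inline std::size_t count_nul(std::string_view sv) {
    std::size_t n = 0;
    for (char c : sv) {
        if (c == '\0') {
            ++n;
        }
    }
    return n;
}

// True when the view points at the Blob's own storage, i.e. no copy was made.
inline bool view_aliases_storage(const Blob &b) {
    std::string_view v = b; // only valid while `b` is alive
    return v.data() == b.data();
}

inline void blob_demo() {
    Blob b{std::string_view("a\0b\0", 4)}; // embedded NUL bytes are preserved
    assert(count_nul(b) == 2);
    assert(view_aliases_storage(b));
}
```

The same caveat as in item 3 applies to any such view: it is non-owning, so the object it refers to has to outlive it.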

This also required moving the string_view availability detection into detail/common.h because this PR needs it in pytypes.h, which is higher up the include chain than cast.h, where it is currently detected.
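For readers unfamiliar with this kind of detection, here is a rough sketch of how a header can decide whether `std::string_view` is usable. The macro names `DEMO_CPP_STD` and `DEMO_HAS_STRING_VIEW` and the helpers below are assumptions made for this example; they are not claimed to match the macros or logic in detail/common.h.

```cpp
// Hypothetical detection macros for illustration only.
// MSVC reports the language level through _MSVC_LANG unless /Zc:__cplusplus is set.
#if defined(_MSVC_LANG)
#    define DEMO_CPP_STD _MSVC_LANG
#else
#    define DEMO_CPP_STD __cplusplus
#endif

#if defined(__has_include)
#    if DEMO_CPP_STD >= 201703L && __has_include(<string_view>)
#        define DEMO_HAS_STRING_VIEW 1
#    endif
#endif

#include <cstddef>

#ifdef DEMO_HAS_STRING_VIEW
#    include <string_view>

// Overloads that need std::string_view are only declared when it was detected.
constexpr std::size_t view_length(std::string_view sv) { return sv.size(); }
#endif

// Lets other code query the result of the detection without touching the macro.
constexpr bool demo_has_string_view() {
#ifdef DEMO_HAS_STRING_VIEW
    return true;
#else
    return false;
#endif
}
```

Placing the detection in the lowest common header means every later header sees the same answer, which is the motivation given above for moving it.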

Suggested changelog entry:

* Make str/bytes/memoryview more interoperable with ``std::string_view``.

Comment thread include/pybind11/pytypes.h Outdated

@rwgk rwgk left a comment

Collaborator

Nice!

Comment thread include/pybind11/pytypes.h
@rwgk

rwgk commented Dec 1, 2021

Collaborator

The 2.7 ubuntu failure probably just needs a leading u (u"i'm a string", a trick we use in many places).

I've never seen the ITERATOR_DEBUG_LEVEL failure before.

Comment thread include/pybind11/pytypes.h
Comment thread tests/test_builtin_casters.py Outdated
Comment thread include/pybind11/detail/common.h Outdated
Comment thread include/pybind11/pytypes.h
Comment thread include/pybind11/pytypes.h
@jagerman jagerman force-pushed the more-string-view branch 2 times, most recently from 0bf48b2 to 0c7d446, December 2, 2021 13:35
@jagerman

jagerman commented Dec 2, 2021

Member Author

Squashed it.

Comment thread include/pybind11/detail/common.h Outdated
@Skylion007 Skylion007 requested a review from henryiii December 2, 2021 15:28
@Skylion007

Collaborator

@rwgk Does this pass the Google Test suite?

@jagerman

jagerman commented Dec 2, 2021

Member Author

I don't understand why the 3.6 - windows-latest build here started failing. Is that a known flaky build or something?

@rwgk

rwgk commented Dec 2, 2021

Collaborator

> I don't understand why the 3.6 - windows-latest build here started failing. Is that a known flaky build or something?

Yes, that's our most common known flake, safe to ignore. It's a lot better than it used to be before PR #2995, but it's still not stable.

There are other common flakes: 1. print from destructor; 2. various transient issues installing or downloading dependencies.

@rwgk

rwgk commented Dec 2, 2021

Collaborator

> @rwgk Does this pass the Google Test suite?

I'll initiate this now (will take several hours).

@henryiii henryiii left a comment

Collaborator

Happy when Google's happy. :)

@rwgk

rwgk commented Dec 2, 2021

Collaborator

> Happy when Google's happy. :)

I missed the 12:00 global testing opportunity, but got this PR into the 16:00 batch. Results are expected about 4-5 hours later.

I did comprehensive manual testing before sending this PR for global testing, including testing with ASAN. No issues.

@rwgk

rwgk commented Dec 3, 2021

Collaborator

This PR didn't make it through the basic "smoke check" :-(
Below is one breakage. I still have to try to understand it myself.
IIRC we recently had some trouble with tensorflow::tstring already, I forget what exactly ...

EDIT: What I had in mind was a PyCLIF issue, not pybind11: google/clif@ef5fa11
(Not sure what we can learn from it. Just mentioning for completeness.)

EDIT: This is the tensorflow::tstring code that causes the issue: https://github.com/tensorflow/tensorflow/blob/289850292c2b9761055415b91c2a6c0f924780f0/tensorflow/core/platform/tstring.h#L142

Error message:

third_party/tensorflow/python/lib/io/file_io_wrapper.cc:316:21: error: ambiguous conversion for functional-style cast from 'tensorflow::tstring' to 'py::bytes'
             return py::bytes(result);
                    ^~~~~~~~~~~~~~~~
./third_party/pybind11/include/pybind11/detail/../pytypes.h:1174:5: note: candidate constructor
    bytes(const std::string &s) : bytes(s.data(), s.size()) { }
    ^
./third_party/pybind11/include/pybind11/detail/../pytypes.h:1189:5: note: candidate constructor
    bytes(std::string_view s) : bytes(s.data(), s.size()) { }
    ^
1 error generated.

While compiling:

      .def("read",
           [](BufferedInputStream* self, int64_t bytes_to_read) {
             py::gil_scoped_release release;
             tensorflow::tstring result;
             const auto status = self->ReadNBytes(bytes_to_read, &result);
             if (!status.ok() && !tensorflow::errors::IsOutOfRange(status)) {
               result.clear();
               tensorflow::MaybeRaiseRegisteredFromStatusWithGIL(status);
             }
             py::gil_scoped_acquire acquire;
             return py::bytes(result);
           })
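The ambiguity can be modelled without TensorFlow or pybind11. In the sketch below, `DualString` and `NaiveBytes` are hypothetical stand-ins: the first plays the role of a string type that converts implicitly to both `std::string` and `std::string_view`, the second mirrors only the two candidate constructors quoted in the error message. Because the failing construction would not compile, the sketch checks it through `std::is_constructible` instead.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical type with both implicit conversions.
struct DualString {
    std::string value;
    operator std::string() const { return value; }
    operator std::string_view() const noexcept { return value; }
};

// Hypothetical stand-in with the two overloads named in the compiler notes.
struct NaiveBytes {
    NaiveBytes(const char *p, std::size_t n) : data(p, n) {}
    NaiveBytes(const std::string &s) : NaiveBytes(s.data(), s.size()) {}
    NaiveBytes(std::string_view s) : NaiveBytes(s.data(), s.size()) {}
    std::string data;
};

// Exact argument types pick one constructor each, so these are fine.
static_assert(std::is_constructible<NaiveBytes, std::string>::value,
              "std::string selects the const std::string& overload");
static_assert(std::is_constructible<NaiveBytes, std::string_view>::value,
              "std::string_view selects the std::string_view overload");

// Both overloads need one user-defined conversion from DualString, and neither
// is better than the other, so overload resolution fails.
static_assert(!std::is_constructible<NaiveBytes, DualString>::value,
              "construction from DualString is ambiguous");
```

Neither conversion sequence outranks the other because each goes through a different user-defined conversion operator, which is why the compiler reports both constructors as candidates.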

jagerman and others added 3 commits December 2, 2021 17:19
1. Allows constructing a str or bytes implicitly from a string_view;
   this is essentially a small shortcut allowing a caller to write
   `py::bytes{sv}` rather than `py::bytes{sv.data(), sv.size()}`.

2. Allows implicit conversion *to* string_view from py::bytes -- this
   saves a fair bit more as currently there is no simple way to get such
   a view of the bytes without copying it (or resorting to Python API
   calls).

   (This is not done for `str` because when the str contains unicode we
   have to allocate to a temporary and so there might not be some string
   data we can properly view without owning.)

3. Allows `memoryview::from_memory` to accept a string_view.  As with
   the other from_memory calls, it's entirely your responsibility to
   keep it alive.

This also required moving the string_view availability detection into
detail/common.h because this PR needs it in pytypes.h, which is higher
up the include chain than cast.h where it was being detected currently.
This change is known to fix the `tensorflow::tstring` issue reported under pybind#3521 (comment)

TODO: Minimal reproducer for the `tensorflow::tstring` issue.
@rwgk rwgk force-pushed the more-string-view branch from c207063 to add6628, December 3, 2021 01:19
@rwgk

rwgk commented Dec 3, 2021

Collaborator

Trying a fix. I don't know if this is the best approach, but at least it fixes the failures in the wild (at least some; I still have to try the global testing again).

The force push was needed because I also rebased on master.

rwgk added 3 commits December 2, 2021 17:28
Error without the enable_if trick:

```
/usr/local/google/home/rwgk/forked/pybind11/tests/test_builtin_casters.cpp:169:16: error: ambiguous conversion for functional-style cast from 'TypeWithBothOperatorStringAndStringView' to 'py::bytes'
        return py::bytes(TypeWithBothOperatorStringAndStringView());
               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/detail/../pytypes.h:1174:5: note: candidate constructor
    bytes(const std::string &s) : bytes(s.data(), s.size()) { }
    ^
/usr/local/google/home/rwgk/forked/pybind11/include/pybind11/detail/../pytypes.h:1191:5: note: candidate constructor
    bytes(std::string_view s) : bytes(s.data(), s.size()) { }
    ^
```
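One way an `enable_if` constraint can remove this kind of ambiguity is sketched below in standalone C++17. `DualString` and `FixedBytes` are hypothetical names for this example, and the exact form of the constraint is an assumption; the code in pytypes.h may differ. The idea is to turn the `std::string_view` constructor into a template that only participates when the argument type is exactly `std::string_view`, so types that merely convert to it fall through to the `std::string` overload.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical type with both implicit conversions, as in the failing test case.
struct DualString {
    std::string value;
    operator std::string() const { return value; }
    operator std::string_view() const noexcept { return value; }
};

// Hypothetical stand-in for a bytes-like wrapper with the constrained overload.
struct FixedBytes {
    FixedBytes(const char *p, std::size_t n) : data(p, n) {}
    FixedBytes(const std::string &s) : FixedBytes(s.data(), s.size()) {}

    // T is deduced from the argument, so this overload exists only when the
    // caller passes a std::string_view itself, not something convertible to it.
    template <typename T,
              typename std::enable_if<std::is_same<T, std::string_view>::value, int>::type = 0>
    FixedBytes(T s) : FixedBytes(s.data(), s.size()) {}

    std::string data;
};

// DualString now has a single viable path: operator std::string.
static_assert(std::is_constructible<FixedBytes, DualString>::value,
              "no longer ambiguous");
// Real string_view arguments still reach the constrained constructor.
static_assert(std::is_constructible<FixedBytes, std::string_view>::value,
              "string_view is still accepted");
static_assert(std::is_constructible<FixedBytes, std::string>::value,
              "std::string is still accepted");
```

The trade-off of this approach is that a type convertible to both goes through `std::string`, which may cost an extra allocation compared with taking the view.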
@rwgk

rwgk commented Dec 3, 2021

Collaborator

I think this will work now, pending full global testing; just initiated. Results when I wake up (hopefully).

I already verified that all failing targets from the previous sample run pass with the latest version of this PR.

@jagerman

jagerman commented Dec 3, 2021

Member Author

> I think this will work now, pending full global testing; just initiated. Results when I wake up (hopefully).

str() will have the same issue; I've pushed an update to apply the workaround there, too.

@jagerman

jagerman commented Dec 3, 2021

Member Author

I'd kind of prefer that, in the case of ambiguous conversions, we preferred string_view over string (because it likely avoids an extra allocation), but I don't really see a nice way to do that without introducing potential breakage for existing code.

@rwgk

rwgk commented Dec 3, 2021

Collaborator

> I'd kind of prefer that, in the case of ambiguous conversions, we preferred string_view over string (because it likely avoids an extra allocation), but I don't really see a nice way to do that without introducing potential breakage for existing code.

Initially I tried #ifdefing out the std::string constructor, which also fixed the tensorflow::tstring issue in the wild, but broke existing unit tests. With a pushing-and-shoving mentality I tried to rescue it, but quickly got into a mess and gave up.

@rwgk

rwgk commented Dec 3, 2021

Collaborator

The one CI failure is a flake (Run jwlawson/actions-setup-cmake@v1.11 Error: connect ETIMEDOUT 140.82.114.6:443).

Google-global testing was successful (it did not run into the str issue fixed here by 5709ccf in the meantime).

Definitely good to merge, Jason!

@jagerman jagerman merged commit b4939fc into pybind:master Dec 3, 2021
@github-actions github-actions Bot added the "needs changelog" label Dec 3, 2021
@henryiii henryiii removed the "needs changelog" label Dec 21, 2021

4 participants