beautifulsoup4 4.13 introduces a breaking change that affects the text processing module at /src/commoncode/text.py, see #4129
Starting with 4.13, as_unicode(s) returns bytes instead of str, which in turn breaks is_markup(location)/is_markup_text(text) in scancode.
From the Changelog:
- UnicodeDammit.markup is now always a bytestring representing the
original markup (sans BOM), and UnicodeDammit.unicode_markup is
always the converted Unicode equivalent of the original
markup. Previously, UnicodeDammit.markup was treated inconsistently
and would often end up containing Unicode. UnicodeDammit.markup was
not a documented attribute, but if you were using it, you probably
want to switch to using .unicode_markup instead.
If UnicodeDammit(s).unicode_markup is used in as_unicode(s) instead of UnicodeDammit(s).markup, a Unicode string (str) is returned again.
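A minimal sketch of what such a change could look like, assuming as_unicode(s) only needs to turn bytes into str. This is not the actual commoncode implementation: the helper text_from_dammit and its fallback_encoding parameter are hypothetical names introduced for illustration, and the fallback to .markup is an assumption meant to keep older beautifulsoup4 releases working.

```python
def text_from_dammit(dammit, fallback_encoding="utf-8"):
    """Return a str from a UnicodeDammit-like object (hypothetical helper).

    Prefers the documented .unicode_markup attribute and only falls back
    to .markup, which is always bytes from beautifulsoup4 4.13 on.
    """
    text = getattr(dammit, "unicode_markup", None)
    if isinstance(text, str):
        return text
    # .unicode_markup can be None when no conversion was possible.
    raw = dammit.markup
    if isinstance(raw, bytes):
        # Assumption: decoding leniently is acceptable as a last resort.
        return raw.decode(fallback_encoding, errors="replace")
    return raw


def as_unicode(s):
    """Sketch of a version-independent as_unicode; not the commoncode code."""
    if isinstance(s, str):
        return s
    # Imported here so the helper above stays usable without bs4 installed.
    from bs4 import UnicodeDammit  # third-party: beautifulsoup4

    return text_from_dammit(UnicodeDammit(s))
```

With this shape, callers such as is_markup_text(text) would receive a str regardless of which side of the 4.13 change is installed.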
Originally posted by @watschi in #4129