HTML5 Named character references missing? #502

@jtconsol

Hi guys, first off let me thank you for all the work, especially on the new release - Splendid! :)

Coincidentally, I was revisiting the XSS filter in our application, which makes use of

ESAPI.encoder().canonicalize(...)

Then I stumbled upon the following XSS attack vector:

<a href="j&Tab;a&Tab;v&Tab;asc&NewLine;ri&Tab;pt&colon;&lpar;a&Tab;l&Tab;e&Tab;r&Tab;t&Tab;(document.domain)&rpar;">X</a>

I didn't even know about &Tab; or &NewLine;. So I checked the HTML5 spec (here and here - also, here's a more visually pleasing overview), and they agree that these are valid named character references.

In contrast, HTMLEntityCodec.decode... currently delivers:

<a href="j&Tab;a&Tab;v&Tab;asc≠wLine;ri&Tab;pt&colon;&lpar;a&Tab;l&Tab;e&Tab;r&Tab;t&Tab;(document.domain)&rpar;">X</a>

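This seems wrong: &NewLine; appears to have been matched as &ne and turned into ≠, leaving wLine; behind, while every other reference is left encoded. Also, shouldn't all named character references be unescaped?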
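
To illustrate the behavior I would expect, here is a minimal, self-contained sketch. It is not ESAPI's implementation: the class name NamedRefSketch, its decode method, and the tiny lookup table are all hypothetical, and the table covers only the references used in the vector above. The one design point it shows is that a name is looked up only as a whole, up to and including its terminating semicolon, so &NewLine; can never be consumed as a shorter reference such as &ne.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, NOT ESAPI's HTMLEntityCodec: decodes a small,
// hand-picked subset of HTML5 named character references.
final class NamedRefSketch {

    // Assumption: only the references needed for the example are listed.
    // Names are case-sensitive and stored with the trailing semicolon.
    private static final Map<String, String> REFS = new HashMap<>();
    static {
        REFS.put("Tab;", "\t");
        REFS.put("NewLine;", "\n");
        REFS.put("colon;", ":");
        REFS.put("lpar;", "(");
        REFS.put("rpar;", ")");
        REFS.put("ne;", "\u2260");
        REFS.put("amp;", "&");
    }

    static String decode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '&') {
                int semi = input.indexOf(';', i + 1);
                if (semi != -1) {
                    // Look up the complete name, semicolon included, so a
                    // longer name is never cut short by a shorter prefix.
                    String value = REFS.get(input.substring(i + 1, semi + 1));
                    if (value != null) {
                        out.append(value);
                        i = semi + 1;
                        continue;
                    }
                }
            }
            // Not a known reference: copy the character through unchanged.
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

With this sketch, decoding the href value from the vector above yields the letters of the scheme separated by real tab and newline characters, followed by a literal colon and parentheses. A real codec would need the full HTML5 table and would also have to handle the legacy references that are valid without a semicolon, which this sketch deliberately ignores.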
