chore(proxy): strict media-type match for form bodies#27939
Conversation
``_read_request_body`` and ``get_request_body`` routed on
``"form" in content_type`` / ``"multipart/form-data" in content_type``,
which match any header containing the literal — ``application/form-json``,
``multiform/anything``, ``application/json; xform=1``. Starlette's
``request.form()`` returns an empty ``FormData`` for any non-canonical
type without consuming the body, so the auth-time pre-read saw ``{}``
and skipped the banned-param check while the handler's later
``request.body()`` saw the original JSON payload.
Parse the media type per RFC 7231 (substring before ``;``, trimmed,
lowercased) and accept only ``application/x-www-form-urlencoded`` and
``multipart/form-data``. Replace both substring sites with the shared
``_is_form_content_type`` helper.
Tests pin: case/whitespace/charset variants of the two real types
match; ``application/form-json`` and similar substring-match traps
fall through to the JSON parse path; real form POSTs continue to
route through ``request.form()``.
Greptile SummaryThis PR fixes a content-type bypass in the proxy's request-body parsing that allowed an attacker to use a non-canonical
Confidence Score: 5/5The change is a targeted, well-tested tightening of content-type matching in the auth-time body pre-read; no regressions were identified. Both changed sites are covered by new parametrized tests that include the specific bypass vectors described in the PR. The helpers are small, pure functions with no external side effects, and the behaviour change is strictly narrowing. The new 400 path on malformed form data is also tested. No unguarded code paths were found. No files require special attention.
|
| Filename | Overview |
|---|---|
| litellm/proxy/common_utils/http_parsing_utils.py | Adds _normalize_media_type, _is_form_content_type, and _is_json_content_type helpers; replaces unsafe substring matching with strict RFC 7231 media-type checks; surfaces malformed form payloads as HTTP 400 instead of silently returning {}. |
| tests/test_litellm/proxy/common_utils/test_http_parsing_utils.py | Adds parametrized tests for form-type matching, non-canonical content-type fall-through, form-parse failure → 400, and get_request_body routing; all tests use mocks with no real network calls. |
Reviews (3): Last reviewed commit: "chore(proxy): drop redundant _is_json_co..." | Re-trigger Greptile
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Mirror ``_is_form_content_type`` for the JSON branch of ``get_request_body`` so both classifications share the same media-type normalisation (strip params, trim, lowercase) and any future change to the parsing rules has one place to update. Adds tests for ``_is_json_content_type`` and for ``get_request_body`` covering the canonical JSON / form / unsupported / non-POST paths.
Starlette's ``request.form()`` raises ``MultiPartException`` /
``ValueError`` / ``AssertionError`` on malformed multipart input
(missing boundary, malformed chunk encoding, etc.). The outer
``except Exception: return {}`` swallowed every form-parse failure
and cached an empty parsed body — auth-time pre-reads saw ``{}`` and
skipped every banned-param check while a later raw-body re-read in
the handler still saw the original payload. Same TOCTOU shape as the
substring-match bypass: the auth gate and the handler don't agree on
what the body is.
Wrap ``request.form()`` in a narrow ``try`` that converts any parse
failure to a 400 ``ProxyException``. The outer broad ``except`` is
retained for unrelated unexpected errors but no longer covers
form-parse-side bypass shapes.
Adds a regression test parametrised over the exception classes
Starlette can raise from ``request.form()``.
|
🤖 litellm-agent: This PR is currently BLOCKED from merge. Score: 3/5 ❌ Why blocked:
Details: Score docked for: 1 PR-related CI failure (Size gate: tests (+173) exceed code (+50) by more than 3× — over-specified or feature too thin. Add the Fix the issues above and push an update — the bot will re-review automatically.
|
``_is_json_content_type`` is a 3-line wrapper around the shared ``_normalize_media_type`` helper. Positive coverage lives in ``TestGetRequestBody.test_json_with_charset_param_parses_as_json``; negative coverage is covered transitively by ``TestIsFormContentType``'s non-form parametrize matrix (anything that isn't a form type falls through to the JSON branch).
* chore(proxy): strict media-type match for form bodies (#27939) * chore(proxy): strict media-type match for form bodies ``_read_request_body`` and ``get_request_body`` routed on ``"form" in content_type`` / ``"multipart/form-data" in content_type``, which match any header containing the literal — ``application/form-json``, ``multiform/anything``, ``application/json; xform=1``. Starlette's ``request.form()`` returns an empty ``FormData`` for any non-canonical type without consuming the body, so the auth-time pre-read saw ``{}`` and skipped the banned-param check while the handler's later ``request.body()`` saw the original JSON payload. Parse the media type per RFC 7231 (substring before ``;``, trimmed, lowercased) and accept only ``application/x-www-form-urlencoded`` and ``multipart/form-data``. Replace both substring sites with the shared ``_is_form_content_type`` helper. Tests pin: case/whitespace/charset variants of the two real types match; ``application/form-json`` and similar substring-match traps fall through to the JSON parse path; real form POSTs continue to route through ``request.form()``. * chore(proxy): extract _is_json_content_type symmetric helper Mirror ``_is_form_content_type`` for the JSON branch of ``get_request_body`` so both classifications share the same media-type normalisation (strip params, trim, lowercase) and any future change to the parsing rules has one place to update. Adds tests for ``_is_json_content_type`` and for ``get_request_body`` covering the canonical JSON / form / unsupported / non-POST paths. * chore(proxy): surface form-parse failures instead of caching empty body Starlette's ``request.form()`` raises ``MultiPartException`` / ``ValueError`` / ``AssertionError`` on malformed multipart input (missing boundary, malformed chunk encoding, etc.). The outer ``except Exception: return {}`` swallowed every form-parse failure and cached an empty parsed body — auth-time pre-reads saw ``{}`` and skipped every banned-param check while a later raw-body re-read in the handler still saw the original payload. Same TOCTOU shape as the substring-match bypass: the auth gate and the handler don't agree on what the body is. Wrap ``request.form()`` in a narrow ``try`` that converts any parse failure to a 400 ``ProxyException``. The outer broad ``except`` is retained for unrelated unexpected errors but no longer covers form-parse-side bypass shapes. Adds a regression test parametrised over the exception classes Starlette can raise from ``request.form()``. * chore(proxy): drop redundant _is_json_content_type test class ``_is_json_content_type`` is a 3-line wrapper around the shared ``_normalize_media_type`` helper. Positive coverage lives in ``TestGetRequestBody.test_json_with_charset_param_parses_as_json``; negative coverage is covered transitively by ``TestIsFormContentType``'s non-form parametrize matrix (anything that isn't a form type falls through to the JSON branch). * chore(proxy): carry ASGI path into WebSocket auth synthetic Request (#27940) ``user_api_key_auth_websocket`` built a synthetic ``Request`` with a two-key scope (``type`` + ``headers``) and set ``request._url = websocket.url``. ``get_request_route`` reads ``scope.get("path", ...)`` and falls back to ``request.url.path`` only when ``path`` is absent. For the WebSocket flow that fallback fires and resolves to the Host-header-derived value (Starlette reconstructs ``websocket.url`` from the Host header), so a malformed Host collapses the resolved route and lets the auth gate compare against the wrong value. Carry the ASGI scope's ``path``, ``root_path``, and ``app_root_path`` into the synthetic scope so the lookup never reaches the fallback on the legitimate path. Regression test pins that the request handed to ``user_api_key_auth`` has ``scope["path"]`` equal to the ASGI scope's path. --------- Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
* chore(proxy): strict media-type match for form bodies (BerriAI#27939) * chore(proxy): strict media-type match for form bodies ``_read_request_body`` and ``get_request_body`` routed on ``"form" in content_type`` / ``"multipart/form-data" in content_type``, which match any header containing the literal — ``application/form-json``, ``multiform/anything``, ``application/json; xform=1``. Starlette's ``request.form()`` returns an empty ``FormData`` for any non-canonical type without consuming the body, so the auth-time pre-read saw ``{}`` and skipped the banned-param check while the handler's later ``request.body()`` saw the original JSON payload. Parse the media type per RFC 7231 (substring before ``;``, trimmed, lowercased) and accept only ``application/x-www-form-urlencoded`` and ``multipart/form-data``. Replace both substring sites with the shared ``_is_form_content_type`` helper. Tests pin: case/whitespace/charset variants of the two real types match; ``application/form-json`` and similar substring-match traps fall through to the JSON parse path; real form POSTs continue to route through ``request.form()``. * chore(proxy): extract _is_json_content_type symmetric helper Mirror ``_is_form_content_type`` for the JSON branch of ``get_request_body`` so both classifications share the same media-type normalisation (strip params, trim, lowercase) and any future change to the parsing rules has one place to update. Adds tests for ``_is_json_content_type`` and for ``get_request_body`` covering the canonical JSON / form / unsupported / non-POST paths. * chore(proxy): surface form-parse failures instead of caching empty body Starlette's ``request.form()`` raises ``MultiPartException`` / ``ValueError`` / ``AssertionError`` on malformed multipart input (missing boundary, malformed chunk encoding, etc.). The outer ``except Exception: return {}`` swallowed every form-parse failure and cached an empty parsed body — auth-time pre-reads saw ``{}`` and skipped every banned-param check while a later raw-body re-read in the handler still saw the original payload. Same TOCTOU shape as the substring-match bypass: the auth gate and the handler don't agree on what the body is. Wrap ``request.form()`` in a narrow ``try`` that converts any parse failure to a 400 ``ProxyException``. The outer broad ``except`` is retained for unrelated unexpected errors but no longer covers form-parse-side bypass shapes. Adds a regression test parametrised over the exception classes Starlette can raise from ``request.form()``. * chore(proxy): drop redundant _is_json_content_type test class ``_is_json_content_type`` is a 3-line wrapper around the shared ``_normalize_media_type`` helper. Positive coverage lives in ``TestGetRequestBody.test_json_with_charset_param_parses_as_json``; negative coverage is covered transitively by ``TestIsFormContentType``'s non-form parametrize matrix (anything that isn't a form type falls through to the JSON branch). * chore(proxy): carry ASGI path into WebSocket auth synthetic Request (BerriAI#27940) ``user_api_key_auth_websocket`` built a synthetic ``Request`` with a two-key scope (``type`` + ``headers``) and set ``request._url = websocket.url``. ``get_request_route`` reads ``scope.get("path", ...)`` and falls back to ``request.url.path`` only when ``path`` is absent. For the WebSocket flow that fallback fires and resolves to the Host-header-derived value (Starlette reconstructs ``websocket.url`` from the Host header), so a malformed Host collapses the resolved route and lets the auth gate compare against the wrong value. Carry the ASGI scope's ``path``, ``root_path``, and ``app_root_path`` into the synthetic scope so the lookup never reaches the fallback on the legitimate path. Regression test pins that the request handed to ``user_api_key_auth`` has ``scope["path"]`` equal to the ASGI scope's path. --------- Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
Summary
`_read_request_body` and `get_request_body` routed on `"form" in content_type` /
`"multipart/form-data" in content_type`, which match any header containing the
literal — `application/form-json`, `multiform/anything`, `application/json; xform=1`.
Starlette's `request.form()` returns an empty `FormData` for any non-canonical
type without consuming the body, so the auth-time pre-read saw `{}` and skipped
the banned-param check while the handler's later `request.body()` saw the
original JSON payload.
Parse the media type per RFC 7231 (substring before `;`, trimmed, lowercased)
and accept only `application/x-www-form-urlencoded` and `multipart/form-data`.
Replace both substring sites with the shared `_is_form_content_type` helper.
Test plan
Type
🐛 Bug Fix