chore(proxy): strict media-type match for form bodies #27939

Merged
yuneng-berri merged 4 commits into BerriAI:litellm_yj_may18 from stuxf:chore/strict-content-type-parse
May 18, 2026

Conversation

@stuxf

@stuxf stuxf commented May 14, 2026

Collaborator

Summary

`_read_request_body` and `get_request_body` routed on `"form" in content_type` /
`"multipart/form-data" in content_type`, which match any header containing the
literal — `application/form-json`, `multiform/anything`, `application/json; xform=1`.
Starlette's `request.form()` returns an empty `FormData` for any non-canonical
type without consuming the body, so the auth-time pre-read saw `{}` and skipped
the banned-param check while the handler's later `request.body()` saw the
original JSON payload.

Parse the media type per RFC 7231 (substring before `;`, trimmed, lowercased)
and accept only `application/x-www-form-urlencoded` and `multipart/form-data`.
Replace both substring sites with the shared `_is_form_content_type` helper.

Test plan

  • `uv run pytest tests/test_litellm/proxy/common_utils/test_http_parsing_utils.py -v` — 49 pass (added parametrized matrix for form-type matching + non-canonical content-type fall-through)
  • `uv run pytest tests/proxy_unit_tests/test_multipart_bypass_repro.py -v` — 3 pass (existing canary)
  • `uv run black .`

Type

🐛 Bug Fix

Tests pin: case/whitespace/charset variants of the two real types
match; ``application/form-json`` and similar substring-match traps
fall through to the JSON parse path; real form POSTs continue to
route through ``request.form()``.
@greptile-apps

greptile-apps Bot commented May 14, 2026

Contributor

Greptile Summary

This PR fixes a content-type bypass in the proxy's request-body parsing. A non-canonical Content-Type (e.g., application/form-json) made the auth-time pre-read see an empty body while the handler's later read saw the original JSON payload, which defeated the banned-param checks.

  • Introduces _normalize_media_type (RFC 7231 parse: strip params, trim, lowercase) and two strict helpers _is_form_content_type / _is_json_content_type that replace unsafe substring matching in _read_request_body and get_request_body.
  • Adds explicit 400 error surfacing when request.form() throws (malformed multipart, missing boundary, etc.), preventing the silent-empty-body cache inconsistency.
  • Ships a comprehensive parametrized test suite (form-type matching, non-canonical fall-through, form-parse failure → 400, get_request_body routing); all tests are mock-only.

Confidence Score: 5/5

The change is a targeted, well-tested tightening of content-type matching in the auth-time body pre-read; no regressions were identified.

Both changed sites are covered by new parametrized tests that include the specific bypass vectors described in the PR. The helpers are small, pure functions with no external side effects, and the behaviour change is strictly narrowing. The new 400 path on malformed form data is also tested. No unguarded code paths were found.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/common_utils/http_parsing_utils.py | Adds _normalize_media_type, _is_form_content_type, and _is_json_content_type helpers; replaces unsafe substring matching with strict RFC 7231 media-type checks; surfaces malformed form payloads as HTTP 400 instead of silently returning {}. |
| tests/test_litellm/proxy/common_utils/test_http_parsing_utils.py | Adds parametrized tests for form-type matching, non-canonical content-type fall-through, form-parse failure → 400, and get_request_body routing; all tests use mocks with no real network calls. |

Reviews (3): Last reviewed commit: "chore(proxy): drop redundant _is_json_co..."

Comment thread litellm/proxy/common_utils/http_parsing_utils.py Outdated
@codecov

codecov Bot commented May 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

stuxf added 2 commits May 14, 2026 14:52
chore(proxy): extract _is_json_content_type symmetric helper

Mirror ``_is_form_content_type`` for the JSON branch of
``get_request_body`` so both classifications share the same media-type
normalisation (strip params, trim, lowercase) and any future change
to the parsing rules has one place to update.

Adds tests for ``_is_json_content_type`` and for ``get_request_body``
covering the canonical JSON / form / unsupported / non-POST paths.

chore(proxy): surface form-parse failures instead of caching empty body

Starlette's ``request.form()`` raises ``MultiPartException`` /
``ValueError`` / ``AssertionError`` on malformed multipart input
(missing boundary, malformed chunk encoding, etc.). The outer
``except Exception: return {}`` swallowed every form-parse failure
and cached an empty parsed body — auth-time pre-reads saw ``{}`` and
skipped every banned-param check while a later raw-body re-read in
the handler still saw the original payload. Same TOCTOU shape as the
substring-match bypass: the auth gate and the handler don't agree on
what the body is.

Wrap ``request.form()`` in a narrow ``try`` that converts any parse
failure to a 400 ``ProxyException``. The outer broad ``except`` is
retained for unrelated unexpected errors but no longer covers
form-parse-side bypass shapes.

Adds a regression test parametrised over the exception classes
Starlette can raise from ``request.form()``.
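
The narrow ``try`` described above might look roughly like the sketch below. Everything here is a stand-in so the example runs without Starlette or litellm: this ``ProxyException`` is a simplified placeholder whose constructor is an assumption, ``FakeRequest`` is a hypothetical test double, and ``read_form_body`` is an illustrative name rather than a function from the PR.

```python
import asyncio


class ProxyException(Exception):
    """Simplified placeholder; the proxy's real exception takes different arguments."""

    def __init__(self, message: str, code: int) -> None:
        super().__init__(message)
        self.code = code


class FakeRequest:
    """Hypothetical test double for a request whose form() fails to parse."""

    def __init__(self, error: Exception) -> None:
        self._error = error

    async def form(self):
        raise self._error


async def read_form_body(request) -> dict:
    # Narrow try: only the form parse is wrapped, so a parse failure becomes
    # a 400 instead of being swallowed into an empty cached body.
    try:
        form = await request.form()
    except Exception as exc:
        raise ProxyException(f"Invalid form body: {exc}", code=400) from exc
    return dict(form)


async def demo(error: Exception) -> int:
    try:
        await read_form_body(FakeRequest(error))
    except ProxyException as exc:
        return exc.code
    return 0


print(asyncio.run(demo(ValueError("missing boundary"))))  # prints 400
```

The point of the design is that the auth gate never gets to cache ``{}`` for a body it could not parse; the request is rejected before any handler re-reads the raw payload.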
@stuxf

stuxf commented May 14, 2026

Collaborator Author

@greptileai

@oss-pr-review-agent-shin

Contributor

🤖 litellm-agent: This PR is currently BLOCKED from merge.

Score: 3/5

Why blocked:

  • 1 PR-related CI failure (Size gate: tests (+173) exceed code (+50) by more than 3× — over-specified or feature too thin. Add the oversized-ok label if intentional.) (pr_related_failures, -2 pts)

Details: Score docked for: 1 PR-related CI failure (Size gate: tests (+173) exceed code (+50) by more than 3× — over-specified or feature too thin. Add the oversized-ok label if intentional.).

Fix the issues above and push an update — the bot will re-review automatically.

Note: This bot is still in beta and might not always work as expected. Please share any feedback via Slack.

chore(proxy): drop redundant _is_json_content_type test class

``_is_json_content_type`` is a 3-line wrapper around the shared
``_normalize_media_type`` helper. Positive coverage lives in
``TestGetRequestBody.test_json_with_charset_param_parses_as_json``;
negative coverage comes transitively from
``TestIsFormContentType``'s non-form parametrize matrix (anything that
isn't a form type falls through to the JSON branch).
@stuxf

stuxf commented May 14, 2026

Collaborator Author

@greptileai

@yuneng-berri yuneng-berri changed the base branch from litellm_internal_staging to litellm_yj_may18 May 18, 2026 20:42
@yuneng-berri yuneng-berri merged commit e35affd into BerriAI:litellm_yj_may18 May 18, 2026
42 checks passed
yuneng-berri added a commit that referenced this pull request May 21, 2026
* chore(proxy): strict media-type match for form bodies (#27939)

* chore(proxy): strict media-type match for form bodies

``_read_request_body`` and ``get_request_body`` routed on
``"form" in content_type`` / ``"multipart/form-data" in content_type``,
which match any header containing the literal — ``application/form-json``,
``multiform/anything``, ``application/json; xform=1``. Starlette's
``request.form()`` returns an empty ``FormData`` for any non-canonical
type without consuming the body, so the auth-time pre-read saw ``{}``
and skipped the banned-param check while the handler's later
``request.body()`` saw the original JSON payload.

Parse the media type per RFC 7231 (substring before ``;``, trimmed,
lowercased) and accept only ``application/x-www-form-urlencoded`` and
``multipart/form-data``. Replace both substring sites with the shared
``_is_form_content_type`` helper.

Tests pin: case/whitespace/charset variants of the two real types
match; ``application/form-json`` and similar substring-match traps
fall through to the JSON parse path; real form POSTs continue to
route through ``request.form()``.

* chore(proxy): extract _is_json_content_type symmetric helper

Mirror ``_is_form_content_type`` for the JSON branch of
``get_request_body`` so both classifications share the same media-type
normalisation (strip params, trim, lowercase) and any future change
to the parsing rules has one place to update.

Adds tests for ``_is_json_content_type`` and for ``get_request_body``
covering the canonical JSON / form / unsupported / non-POST paths.

* chore(proxy): surface form-parse failures instead of caching empty body

Starlette's ``request.form()`` raises ``MultiPartException`` /
``ValueError`` / ``AssertionError`` on malformed multipart input
(missing boundary, malformed chunk encoding, etc.). The outer
``except Exception: return {}`` swallowed every form-parse failure
and cached an empty parsed body — auth-time pre-reads saw ``{}`` and
skipped every banned-param check while a later raw-body re-read in
the handler still saw the original payload. Same TOCTOU shape as the
substring-match bypass: the auth gate and the handler don't agree on
what the body is.

Wrap ``request.form()`` in a narrow ``try`` that converts any parse
failure to a 400 ``ProxyException``. The outer broad ``except`` is
retained for unrelated unexpected errors but no longer covers
form-parse-side bypass shapes.

Adds a regression test parametrised over the exception classes
Starlette can raise from ``request.form()``.

* chore(proxy): drop redundant _is_json_content_type test class

``_is_json_content_type`` is a 3-line wrapper around the shared
``_normalize_media_type`` helper. Positive coverage lives in
``TestGetRequestBody.test_json_with_charset_param_parses_as_json``;
negative coverage comes transitively from
``TestIsFormContentType``'s non-form parametrize matrix (anything that
isn't a form type falls through to the JSON branch).

* chore(proxy): carry ASGI path into WebSocket auth synthetic Request (#27940)

``user_api_key_auth_websocket`` built a synthetic ``Request`` with a
two-key scope (``type`` + ``headers``) and set ``request._url =
websocket.url``. ``get_request_route`` reads ``scope.get("path", ...)``
and falls back to ``request.url.path`` only when ``path`` is absent.
For the WebSocket flow that fallback fires and resolves to the
Host-header-derived value (Starlette reconstructs ``websocket.url``
from the Host header), so a malformed Host collapses the resolved
route and lets the auth gate compare against the wrong value.

Carry the ASGI scope's ``path``, ``root_path``, and ``app_root_path``
into the synthetic scope so the lookup never reaches the fallback on
the legitimate path.

Regression test pins that the request handed to ``user_api_key_auth``
has ``scope["path"]`` equal to the ASGI scope's path.

---------

Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
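
The WebSocket change in the commit message above (#27940) can be illustrated with plain dictionaries. This is a sketch under stated assumptions: ``build_legacy_scope``, ``build_synthetic_scope`` and ``resolve_route`` are hypothetical names, the ``"http"`` scope type and the ``"/"`` fallback value are guesses, and ``resolve_route`` only mimics the ``scope.get("path", ...)`` lookup attributed to ``get_request_route``.

```python
from typing import Any, Dict

# Keys the commit message says are carried over from the ASGI scope.
CARRIED_KEYS = ("path", "root_path", "app_root_path")


def build_legacy_scope(ws_scope: Dict[str, Any]) -> Dict[str, Any]:
    """Pre-fix shape: a two-key scope with only type and headers."""
    return {"type": "http", "headers": ws_scope.get("headers", [])}


def build_synthetic_scope(ws_scope: Dict[str, Any]) -> Dict[str, Any]:
    """Post-fix shape: also carry the path-related keys when present."""
    scope = build_legacy_scope(ws_scope)
    for key in CARRIED_KEYS:
        if key in ws_scope:
            scope[key] = ws_scope[key]
    return scope


def resolve_route(scope: Dict[str, Any], url_path: str) -> str:
    """Use scope['path'] when present; otherwise fall back to the URL-derived path."""
    return scope.get("path", url_path)


ws_scope = {"type": "websocket", "path": "/v1/realtime", "root_path": "", "headers": []}
host_derived_path = "/"  # hypothetical value produced from a malformed Host header

print(resolve_route(build_legacy_scope(ws_scope), host_derived_path))     # prints /
print(resolve_route(build_synthetic_scope(ws_scope), host_derived_path))  # prints /v1/realtime
```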
lorenzbaraldi pushed a commit to lorenzbaraldi/litellm that referenced this pull request May 21, 2026
2 participants