chore(proxy): strict media-type match for form bodies by stuxf · Pull Request #27939 · BerriAI/litellm

stuxf · 2026-05-14T14:34:02Z

Summary

`_read_request_body` and `get_request_body` routed on `"form" in content_type` /
`"multipart/form-data" in content_type`, which match any header containing the
literal — `application/form-json`, `multiform/anything`, `application/json; xform=1`.
Starlette's `request.form()` returns an empty `FormData` for any non-canonical
type without consuming the body, so the auth-time pre-read saw `{}` and skipped
the banned-param check while the handler's later `request.body()` saw the
original JSON payload.
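
The substring behaviour described above can be reproduced in a few lines. This is an illustrative sketch only: `loose_is_form` is a hypothetical name that combines the two quoted checks into one function, and it is not the code that was in `http_parsing_utils.py`.

```python
# Illustrative sketch: the two substring checks quoted in the summary,
# combined into one hypothetical function (not the original source code).
def loose_is_form(content_type: str) -> bool:
    return "form" in content_type or "multipart/form-data" in content_type


# The three trap headers listed in the summary.
trap_headers = [
    "application/form-json",
    "multiform/anything",
    "application/json; xform=1",
]

for header in trap_headers:
    # None of these is a real form media type, yet each contains "form".
    print(f"{header!r} routed as form: {loose_is_form(header)}")
```

Each of the three headers is routed as a form body even though none of them is `application/x-www-form-urlencoded` or `multipart/form-data`.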

Parse the media type per RFC 7231 (substring before `;`, trimmed, lowercased)
and accept only `application/x-www-form-urlencoded` and `multipart/form-data`.
Replace both substring sites with the shared `_is_form_content_type` helper.

Test plan

`uv run pytest tests/test_litellm/proxy/common_utils/test_http_parsing_utils.py -v` — 49 pass (added parametrized matrix for form-type matching + non-canonical content-type fall-through)
`uv run pytest tests/proxy_unit_tests/test_multipart_bypass_repro.py -v` — 3 pass (existing canary)
`uv run black .`

Type

🐛 Bug Fix

Tests pin: case/whitespace/charset variants of the two real types match; ``application/form-json`` and similar substring-match traps fall through to the JSON parse path; real form POSTs continue to route through ``request.form()``.

greptile-apps · 2026-05-14T14:36:11Z

Greptile Summary

This PR fixes a content-type bypass in the proxy's request-body parsing that allowed an attacker to use a non-canonical Content-Type (e.g., application/form-json) to make the auth-time pre-read see an empty body while the handler's later read saw the original JSON payload, defeating banned-param checks.

* Introduces `_normalize_media_type` (RFC 7231 parse: strip params, trim, lowercase) and two strict helpers, `_is_form_content_type` and `_is_json_content_type`, that replace unsafe substring matching in `_read_request_body` and `get_request_body`.
* Adds explicit 400 error surfacing when `request.form()` throws (malformed multipart, missing boundary, etc.), preventing the silent-empty-body cache inconsistency.
* Ships a comprehensive parametrized test suite (form-type matching, non-canonical fall-through, form-parse failure → 400, `get_request_body` routing); all tests are mock-only.

Confidence Score: 5/5

The change is a targeted, well-tested tightening of content-type matching in the auth-time body pre-read; no regressions were identified.

Both changed sites are covered by new parametrized tests that include the specific bypass vectors described in the PR. The helpers are small, pure functions with no external side effects, and the behaviour change is strictly narrowing. The new 400 path on malformed form data is also tested. No unguarded code paths were found.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/proxy/common_utils/http_parsing_utils.py | Adds `_normalize_media_type`, `_is_form_content_type`, and `_is_json_content_type` helpers; replaces unsafe substring matching with strict RFC 7231 media-type checks; surfaces malformed form payloads as HTTP 400 instead of silently returning `{}`. |
| tests/test_litellm/proxy/common_utils/test_http_parsing_utils.py | Adds parametrized tests for form-type matching, non-canonical content-type fall-through, form-parse failure → 400, and `get_request_body` routing; all tests use mocks with no real network calls. |

Reviews (3): Last reviewed commit: "chore(proxy): drop redundant _is_json_co..."

codecov · 2026-05-14T14:37:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Mirror ``_is_form_content_type`` for the JSON branch of ``get_request_body`` so both classifications share the same media-type normalisation (strip params, trim, lowercase) and any future change to the parsing rules has one place to update. Adds tests for ``_is_json_content_type`` and for ``get_request_body`` covering the canonical JSON / form / unsupported / non-POST paths.
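
One way to picture the shared normalisation is a single classifier that both branches consult. The sketch below is an assumption for illustration: `classify_body` is a hypothetical name, and treating only `application/json` as the JSON type is a guess, not something this PR states.

```python
from typing import Optional


def _normalize_media_type(content_type: Optional[str]) -> str:
    """Strip parameters, trim whitespace, lowercase (assumed behaviour)."""
    return (content_type or "").split(";", 1)[0].strip().lower()


def classify_body(content_type: Optional[str]) -> str:
    """Hypothetical router returning 'json', 'form', or 'unsupported'."""
    media_type = _normalize_media_type(content_type)
    if media_type == "application/json":  # assumed JSON media type
        return "json"
    if media_type in ("application/x-www-form-urlencoded", "multipart/form-data"):
        return "form"
    return "unsupported"


print(classify_body("Application/JSON; charset=utf-8"))
print(classify_body("multipart/form-data; boundary=x"))
print(classify_body("application/form-json"))
```

Since both branches read the same normalized string, a change to the parsing rule (for example, different whitespace handling) would only need to be made in `_normalize_media_type`.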

Starlette's ``request.form()`` raises ``MultiPartException`` / ``ValueError`` / ``AssertionError`` on malformed multipart input (missing boundary, malformed chunk encoding, etc.). The outer ``except Exception: return {}`` swallowed every form-parse failure and cached an empty parsed body — auth-time pre-reads saw ``{}`` and skipped every banned-param check while a later raw-body re-read in the handler still saw the original payload. Same TOCTOU shape as the substring-match bypass: the auth gate and the handler don't agree on what the body is. Wrap ``request.form()`` in a narrow ``try`` that converts any parse failure to a 400 ``ProxyException``. The outer broad ``except`` is retained for unrelated unexpected errors but no longer covers form-parse-side bypass shapes. Adds a regression test parametrised over the exception classes Starlette can raise from ``request.form()``.

stuxf · 2026-05-14T20:09:03Z

@greptileai

oss-pr-review-agent-shin · 2026-05-14T20:11:27Z

🤖 litellm-agent: This PR is currently BLOCKED from merge.

Score: 3/5 ❌

Why blocked:

1 PR-related CI failure (Size gate: tests (+173) exceed code (+50) by more than 3× — over-specified or feature too thin. Add the oversized-ok label if intentional.) (pr_related_failures, -2 pts)

Details: Score docked for: 1 PR-related CI failure (Size gate: tests (+173) exceed code (+50) by more than 3× — over-specified or feature too thin. Add the oversized-ok label if intentional.).

Fix the issues above and push an update — the bot will re-review automatically.

Note: This bot is still in beta and might not always work as expected. Please share any feedback via Slack.
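
The size gate quoted in the bot comment above reduces to a ratio check. The sketch below assumes the rule is "added test lines greater than 3 times added code lines"; the function name and the exact comparison are assumptions, since the bot's implementation is not shown here.

```python
def exceeds_size_gate(test_lines: int, code_lines: int, max_ratio: float = 3.0) -> bool:
    """Return True when added test lines exceed max_ratio times added code lines."""
    return test_lines > max_ratio * code_lines


# Figures from the bot comment: tests (+173), code (+50).
print(173 / 50)
print(exceeds_size_gate(173, 50))
```

With these figures the ratio is about 3.46, which is above the assumed threshold of 3.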

``_is_json_content_type`` is a 3-line wrapper around the shared ``_normalize_media_type`` helper. Positive coverage lives in ``TestGetRequestBody.test_json_with_charset_param_parses_as_json``; negative coverage comes transitively from ``TestIsFormContentType``'s non-form parametrize matrix (anything that isn't a form type falls through to the JSON branch).

stuxf · 2026-05-14T20:25:09Z

@greptileai

* chore(proxy): strict media-type match for form bodies (#27939)

* chore(proxy): strict media-type match for form bodies

``_read_request_body`` and ``get_request_body`` routed on ``"form" in content_type`` / ``"multipart/form-data" in content_type``, which match any header containing the literal: ``application/form-json``, ``multiform/anything``, ``application/json; xform=1``. Starlette's ``request.form()`` returns an empty ``FormData`` for any non-canonical type without consuming the body, so the auth-time pre-read saw ``{}`` and skipped the banned-param check while the handler's later ``request.body()`` saw the original JSON payload.

Parse the media type per RFC 7231 (substring before ``;``, trimmed, lowercased) and accept only ``application/x-www-form-urlencoded`` and ``multipart/form-data``. Replace both substring sites with the shared ``_is_form_content_type`` helper.

Tests pin: case/whitespace/charset variants of the two real types match; ``application/form-json`` and similar substring-match traps fall through to the JSON parse path; real form POSTs continue to route through ``request.form()``.

* chore(proxy): extract _is_json_content_type symmetric helper

Mirror ``_is_form_content_type`` for the JSON branch of ``get_request_body`` so both classifications share the same media-type normalisation (strip params, trim, lowercase) and any future change to the parsing rules has one place to update. Adds tests for ``_is_json_content_type`` and for ``get_request_body`` covering the canonical JSON / form / unsupported / non-POST paths.

* chore(proxy): surface form-parse failures instead of caching empty body

Starlette's ``request.form()`` raises ``MultiPartException`` / ``ValueError`` / ``AssertionError`` on malformed multipart input (missing boundary, malformed chunk encoding, etc.). The outer ``except Exception: return {}`` swallowed every form-parse failure and cached an empty parsed body; auth-time pre-reads saw ``{}`` and skipped every banned-param check while a later raw-body re-read in the handler still saw the original payload. Same TOCTOU shape as the substring-match bypass: the auth gate and the handler don't agree on what the body is.

Wrap ``request.form()`` in a narrow ``try`` that converts any parse failure to a 400 ``ProxyException``. The outer broad ``except`` is retained for unrelated unexpected errors but no longer covers form-parse-side bypass shapes. Adds a regression test parametrised over the exception classes Starlette can raise from ``request.form()``.

* chore(proxy): drop redundant _is_json_content_type test class

``_is_json_content_type`` is a 3-line wrapper around the shared ``_normalize_media_type`` helper. Positive coverage lives in ``TestGetRequestBody.test_json_with_charset_param_parses_as_json``; negative coverage comes transitively from ``TestIsFormContentType``'s non-form parametrize matrix (anything that isn't a form type falls through to the JSON branch).

* chore(proxy): carry ASGI path into WebSocket auth synthetic Request (#27940)

``user_api_key_auth_websocket`` built a synthetic ``Request`` with a two-key scope (``type`` + ``headers``) and set ``request._url = websocket.url``. ``get_request_route`` reads ``scope.get("path", ...)`` and falls back to ``request.url.path`` only when ``path`` is absent. For the WebSocket flow that fallback fires and resolves to the Host-header-derived value (Starlette reconstructs ``websocket.url`` from the Host header), so a malformed Host collapses the resolved route and lets the auth gate compare against the wrong value.

Carry the ASGI scope's ``path``, ``root_path``, and ``app_root_path`` into the synthetic scope so the lookup never reaches the fallback on the legitimate path. Regression test pins that the request handed to ``user_api_key_auth`` has ``scope["path"]`` equal to the ASGI scope's path.

---------

Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
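
The scope change described in the #27940 commit message can be sketched with plain dictionaries. Everything below is illustrative: `build_auth_scope` and `resolve_route` are hypothetical names that mimic the described behaviour, and the fallback value `"/"` is a made-up example of a route collapsed by a malformed Host header.

```python
from typing import Any, Dict


def build_auth_scope(ws_scope: Dict[str, Any]) -> Dict[str, Any]:
    """Build a synthetic HTTP scope that carries the ASGI path keys over."""
    scope: Dict[str, Any] = {"type": "http", "headers": ws_scope.get("headers", [])}
    for key in ("path", "root_path", "app_root_path"):
        if key in ws_scope:
            scope[key] = ws_scope[key]
    return scope


def resolve_route(scope: Dict[str, Any], url_path_fallback: str) -> str:
    """Mimic the described lookup: prefer scope['path'], else the URL path."""
    return scope.get("path", url_path_fallback)


ws_scope = {
    "type": "websocket",
    "path": "/v1/realtime",
    "root_path": "",
    "headers": [(b"host", b"bad host")],
}

# Old shape: only type + headers, so the lookup reaches the fallback.
two_key_scope = {"type": "http", "headers": ws_scope["headers"]}
print(resolve_route(two_key_scope, "/"))

# New shape: path is present, so the fallback is never consulted.
print(resolve_route(build_auth_scope(ws_scope), "/"))
```

With the path copied into the synthetic scope, the route used for the auth comparison comes from the ASGI server rather than from a client-controlled header.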

greptile-apps Bot reviewed May 14, 2026

Comment thread litellm/proxy/common_utils/http_parsing_utils.py Outdated

stuxf added 2 commits May 14, 2026 14:52

yuneng-berri changed the base branch from litellm_internal_staging to litellm_yj_may18 May 18, 2026 20:42

yuneng-berri merged commit e35affd into BerriAI:litellm_yj_may18 May 18, 2026
42 checks passed

yuneng-berri merged 4 commits into BerriAI:litellm_yj_may18 from stuxf:chore/strict-content-type-parse


2 participants
