feat(mcp): core sampling and elicitation flow by yugborana · Pull Request #27109 · BerriAI/litellm

yugborana · 2026-05-04T08:39:49Z

Live Test -

https://github.com/user-attachments/assets/dc7e5e98-8345-4a8a-bdaa-e0e37092f807

Relevant issues

Fixes #23761

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

[ yes] I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
[ yes] My PR passes all unit tests on make test-unit
[yes ] My PR's scope is as isolated as possible, it only solves 1 specific problem
[ yes] I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Screenshots / Proof of Fix

Type

🆕 New Feature

Changes

This PR fixes several bugs to make the MCP tool system more reliable and secure. I fixed an issue where the user's API key was ignored during extra AI requests, ensuring that usage costs are now correctly billed to the right person. I also fixed a problem where the server couldn't ask the user follow-up questions by making sure the system remembers exactly which user is currently active. To prevent common crashes with apps like Claude Desktop, I improved how the connection handles messages and made the system work better on Windows by ignoring folder capitalization. One remaining limitation is that if multiple people use the same shared tool at the exact same time, the system might struggle to tell them apart for billing because of how the underlying connection works, but this setup works perfectly for standard individual use.

greptile-apps · 2026-05-04T08:43:45Z

Greptile Summary

This PR implements core MCP sampling and elicitation flows, allowing upstream MCP servers to request LLM inference and user input through LiteLLM's proxy infrastructure, with proper auth propagation, budget enforcement, and model-permission checks.

sampling_handler.py (new): Full MCP sampling/createMessage implementation — model resolution from ModelPreferences hints/priorities, message/tool format conversion, per-key/team/user model-access checks, budget enforcement via common_checks, and litellm.acompletion dispatch.
elicitation_handler.py (new): Handles elicitation/create by relaying to the downstream client when it declares elicitation capabilities, or declining gracefully otherwise.
client.py / mcp_server_manager.py: Wires sampling and elicitation callbacks into MCPClient and MCPServerManager; fixes stdio subprocess env to use an allowlist so subprocesses inherit PATH, HOME, etc., and sensitive LiteLLM keys are not leaked.
server.py: Replaces plain session-auth dict storage with a WeakKeyDictionary keyed on the live session object; adds JSON-RPC response detection to skip the session lock and prevent deadlock during sampling/elicitation callbacks.

Confidence Score: 4/5

The PR is broadly safe to merge; the new sampling/elicitation paths are well-guarded by opt-in flags and the session-auth storage regressions from prior reviews have been addressed.

The previous id(session) WeakKeyDictionary key bug is fixed (session object used directly). The _session_obj_auth_storage unbounded-growth concern is resolved with WeakKeyDictionary. The MCP_SAMPLING_AVAILABLE gate now correctly prevents callback installation when the mcp package is absent. Two minor findings remain: _build_sampling_request is constructed twice per sampling call (redundant but not wrong), and the heuristic session-lock skip on truncated bodies has a theoretical false-positive window that is very unlikely to trigger in practice.

sampling_handler.py warrants a close read of the budget and model-access check paths to ensure they stay in sync with the main /chat/completions auth flow as that evolves.

Important Files Changed

Filename	Overview
litellm/experimental_mcp_client/client.py	Adds sampling/elicitation/logging callbacks to MCPClient; introduces _get_safe_stdio_env() allowlist to avoid leaking LiteLLM API keys to stdio subprocesses.
litellm/proxy/_experimental/mcp_server/sampling_handler.py	New 1,251-line module implementing MCP sampling. _build_sampling_request() is constructed twice per call (once in _run_budget_checks, once inline), which is redundant but harmless.
litellm/proxy/_experimental/mcp_server/elicitation_handler.py	New module handling MCP elicitation/create: relays to downstream client when available or declines gracefully; capability checks for form/URL mode are correct.
litellm/proxy/_experimental/mcp_server/mcp_server_manager.py	Adds sampling/elicitation callback factories gated on availability flags; fixes stdio env from empty-dict to None so _get_safe_stdio_env supplies a proper allowlist; adds allow_sampling/allow_elicitation config fields.
litellm/proxy/_experimental/mcp_server/server.py	Adds WeakKeyDictionary session-auth storage and JSON-RPC response detection to prevent deadlock during sampling/elicitation callbacks. Heuristic string-match fallback for truncated bodies is documented as the safer tradeoff.
litellm/types/mcp_server/mcp_server_manager.py	Adds allow_sampling: bool = False and allow_elicitation: bool = False fields to MCPServer with correct opt-in defaults.

_{Reviews (15): Last reviewed commit: "feat(mcp): core sampling and elicitation..." | Re-trigger Greptile}

codecov · 2026-05-04T08:45:35Z

Codecov Report

❌ Patch coverage is 50.52770% with 375 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...proxy/_experimental/mcp_server/sampling_handler.py	62.06%	176 Missing ⚠️
litellm/proxy/_experimental/mcp_server/server.py	32.24%	124 Missing ⚠️
...xy/_experimental/mcp_server/elicitation_handler.py	13.72%	44 Missing ⚠️
...oxy/_experimental/mcp_server/mcp_server_manager.py	41.17%	20 Missing ⚠️
litellm/experimental_mcp_client/client.py	54.16%	11 Missing ⚠️

📢 Thoughts on this report? Let us know!

yugborana · 2026-05-04T09:02:01Z

@greptileai solved all the issues

yugborana · 2026-05-04T09:13:54Z

@greptileai solved the issues

yugborana · 2026-05-04T09:30:29Z

@greptileai solved the issues

veria-ai · 2026-05-04T09:40:26Z

PR overview

All previously flagged issues have been addressed. No open security concerns remain on this pull request.

Security review

No open security issues remain on this pull request.

Fixed/addressed: 12 · PR risk: 0/10

yugborana · 2026-05-04T09:51:54Z

@greptileai fixed non commented issues too

yugborana · 2026-05-04T10:20:53Z

@greptileai solved the issues

yugborana · 2026-05-09T12:48:53Z

@krrish-berri-2

yugborana · 2026-05-09T18:39:27Z

@greptileai tried to fix some issues which were previously there

yugborana · 2026-06-01T04:16:28Z

@greptileai solved all the issues. give the confidence score

greptile-apps · 2026-06-01T04:18:27Z

Confidence Score: 3/5

Good progress — most of the original review issues have been resolved. Here's the current state:

Resolved ✅

SSRF protection in OAuth discovery (P0)
id(session) WeakKeyDictionary key bug
validate_no_callback_env_reference type guard
Auth bypass on pass-through endpoints
Server-side secret exfiltration via callback vars
MCP_SAMPLING_AVAILABLE flag guard

Still open ❌

1 unresolved review thread — the model permission bypass thread from veria-ai (added May 9) remains open in GitHub. The _check_model_access function is now present in sampling_handler.py and looks correct, but the thread hasn't been formally resolved — please mark it closed so reviewers know it's been addressed.
Merge conflicts — the PR is in dirty state and cannot be merged as-is.
Very low test coverage — Codecov reports sampling_handler.py at 7.72% (215 lines missing), elicitation_handler.py at 13.72% (44 lines missing). The core sampling path is largely untested.
Dummy request issue still present — _build_sampling_request hardcodes server: ("127.0.0.1", 0) and only partially reconstructs headers. Header-dependent guardrails, IP-based routing, and trace-ID correlation will be silently skipped for all sampling sub-calls. This was called out in the previous summary and hasn't changed.
active_mcp_session_var not reset — the context var is set in tool handlers without saving and restoring a reset token, which can leak context across concurrent requests.

To reach 4/5: resolve the merge conflicts, close the open review thread, and add meaningful tests for the handle_sampling_create_message and elicitation paths. The dummy request concern should at least be explicitly documented as a known limitation if not fixed.

CLAassistant · 2026-06-02T09:06:44Z

All committers have signed the CLA.

yugborana · 2026-06-02T10:15:44Z

@greptileai solved all issues, give score

yugborana · 2026-06-02T10:25:57Z

@greptileai no unresolved thread present, dummy request issue is solved, active_mcp_session_var issue solved too,tests are covered in PR #26748

yugborana · 2026-06-02T15:37:38Z

@greptileai solved all 3 quality issues give

yugborana · 2026-06-02T16:00:38Z

@greptileai solved the dead code and test issues too

yugborana · 2026-06-02T16:12:10Z

@greptileai solved the sampling_handler issues

greptile-apps · 2026-06-02T16:13:44Z

The sampling_handler.py looks significantly improved. The key security fixes are all in place:

OAuth denial (handle_sampling_create_message line 676): user_api_key_auth is None now returns ErrorData before doing anything.
Model permission checks (_check_model_access): Now runs can_key_call_model, can_team_access_model, _check_team_member_model_access, can_user_call_model, and can_project_access_model — covering the team/project restriction gaps flagged earlier.
Budget enforcement (_run_budget_checks): Delegates to common_checks, the same path /chat/completions uses.
MCP_SAMPLING_AVAILABLE guard: handle_sampling_create_message returns ErrorData immediately when False.

One thing still worth double-checking: the x-forwarded-for forwarding in _build_sampling_request. The original suggestion was to drop it from raw_headers entirely (using the already-resolved client_ip instead), but the current code still passes caller-supplied x-forwarded-for through:

# raw_headers loop — x-forwarded-for is NOT excluded
if _key == "content-type" or _key in _HOP_BY_HOP:
    continue
_scope_headers.append(...)

# client_ip is only injected if x-forwarded-for is absent
if client_ip and not any(h[0] == b"x-forwarded-for" for h in _scope_headers):
    _scope_headers.append((b"x-forwarded-for", client_ip.encode("utf-8")))

If a client sends a spoofed x-forwarded-for in raw_headers, it wins over the resolved client_ip. The fix would be:

            if _key in {"content-type", "x-forwarded-for"} or _key in _HOP_BY_HOP:
                continue

That said, this is a medium-severity concern (IP-based routing bypass, not an auth bypass), and that thread is marked outdated. Everything else looks solid.

yugborana · 2026-06-02T16:20:38Z

@greptileai solved spoofable error

yugborana · 2026-06-02T16:55:36Z

@greptileai solver server.py issues like json rpc one

yugborana · 2026-06-02T17:25:52Z

@greptileai solved bedget check error propagation and json rpc empty body issue. give final confidence score

greptile-apps · 2026-06-02T17:30:00Z

Final Confidence Score: 4/5

All 14 review threads are now resolved, including the critical security findings. Here's the updated assessment:

Resolved since last review:

Budget check error propagation fixed — sampling requests now correctly surface budget exhaustion errors to callers
JSON-RPC empty body handling fixed — prevents Claude Desktop / stdio client crashes on response posts
Session-keying bug (id(session) → session object on WeakKeyDictionary) — resolved
Model permission bypass — resolved
Sampling budget check bypass (common_checks path) — resolved
Virtual-key route restriction bypass — resolved
OAuth passthrough without LiteLLM key — resolved
Team/project model restriction enforcement for sampling — resolved

Remaining known limitation (acknowledged in PR):

Concurrent multi-user billing disambiguation is imperfect due to stdio session identity constraints — accepted limitation, not a regression

What keeps it at 4/5 rather than 5/5:

The PR is large (+3382/-444, 47 files) and several high-severity security threads were resolved without visible inline code diffs — a final pass confirming the actual code changes for each security fix (especially the common_checks bypass and route restriction gate) is warranted before merge
CI run links are not filled in the checklist

The core feature is solid and the fixes address real bugs. Ready for maintainer review with CI confirmation.

Sameerlite · 2026-06-03T11:09:47Z

@yugborana I see too many files changed which are not related to the feat:

Is this intentional? Can you fix and once all files changed look good. Please retag me for review

yugborana · 2026-06-03T11:23:38Z

@Sameerlite most of the files have changes for mypy because type casting was not there. I have commented the core files, so those are the main ones

Sameerlite · 2026-06-03T12:10:22Z

Just take the latest pull from internal staging branch and only have files that actually have the code you changed for this feat

- Add sampling_handler.py: full MCP sampling/createMessage flow with model selection (hint-based + priority-based), auth enforcement, budget checks, route restriction gates, and tag policy pre-auth - Add elicitation_handler.py: MCP elicitation/create relay with downstream client capability detection - Wire sampling/elicitation callbacks in mcp_server_manager.py gated behind allow_sampling/allow_elicitation config flags - Add allow_sampling/allow_elicitation fields to MCPServer type - Fix session lock deadlock: skip lock for JSON-RPC response POSTs (elicitation/sampling replies) with truncated-body heuristic - Extend client.py with sampling_callback and elicitation_callback - Security: RouteChecks gate, tag-budget bypass fix, x-forwarded-for spoofing fix, Latin-1 header encoding guard - Add 4 new test modules (model access, priority selection, request builder, tool conversion) + update existing MCP tests

yugborana · 2026-06-03T13:06:59Z

@Sameerlite yes cleaned the noise and now core files are there but lint failing due to mypy issues

yugborana · 2026-06-03T13:09:12Z

@greptileai review the files except tests from this PR

Without this, an upstream MCP server with allow_sampling enabled could send prompts that bypass every guardrail (content filtering, PII redaction, prompt-injection detection) configured on /chat/completions. - Call proxy_logging_obj.pre_call_hook(call_type='acompletion') before llm_router.acompletion so guardrails fire for sampling sub-calls - Add HTTPException to the re-raise list so guardrail rejections propagate correctly instead of being swallowed as generic errors

yugborana · 2026-06-04T05:59:09Z

@Sameerlite

Sameerlite

LGTM

Sameerlite · 2026-06-04T11:40:07Z

@yugborana if you can create a PR in litellm-docs showing how to use this etc? Thanks

yugborana · 2026-06-04T11:52:36Z

@Sameerlite Thanks for the fast review. I will surely try to make a PR in litellm-docs.

greptile-apps Bot reviewed May 4, 2026

View reviewed changes

Comment thread litellm/proxy/_experimental/mcp_server/mcp_server_manager.py

greptile-apps Bot reviewed May 4, 2026

View reviewed changes

Comment thread litellm/proxy/_experimental/mcp_server/server.py Outdated

greptile-apps Bot reviewed May 4, 2026

View reviewed changes

Comment thread litellm/proxy/_types.py Outdated

veria-ai Bot reviewed May 4, 2026

View reviewed changes

Comment thread litellm/proxy/_types.py Outdated

Comment thread litellm/proxy/_types.py Outdated

Comment thread litellm/proxy/_experimental/mcp_server/mcp_server_manager.py Outdated

greptile-apps Bot reviewed May 4, 2026

View reviewed changes

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py

yugborana changed the title ~~feat(mcp): split core mcp files into separate PR~~ feat(mcp): core sampling and elicitation flow May 9, 2026

veria-ai Bot reviewed May 9, 2026

View reviewed changes

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py

abdulmunimjemal mentioned this pull request May 17, 2026

fix(mcp): log + respond to unsupported server→client requests instead of dropping silently #28121

Closed

3 tasks

veria-ai Bot reviewed Jun 1, 2026

View reviewed changes

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py

yugborana force-pushed the fix/mcp-core-files branch from afc71e6 to bd3417d Compare June 2, 2026 09:06

yugborana requested a review from a team June 2, 2026 09:06

yugborana force-pushed the fix/mcp-core-files branch 4 times, most recently from 0dc6cdc to 34d699e Compare June 2, 2026 10:15

veria-ai Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py

veria-ai Bot reviewed Jun 2, 2026

View reviewed changes

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py

yugborana force-pushed the fix/mcp-core-files branch 2 times, most recently from 017ca34 to bc104ff Compare June 2, 2026 20:20

Sameerlite requested changes Jun 3, 2026

View reviewed changes

Comment thread litellm/proxy/batches_endpoints/endpoints.py Outdated

yugborana force-pushed the fix/mcp-core-files branch from bc305c0 to d1d651b Compare June 3, 2026 13:01

yugborana force-pushed the fix/mcp-core-files branch from d1d651b to 695cde6 Compare June 3, 2026 13:05

yugborana requested a review from Sameerlite June 3, 2026 13:05

veria-ai Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py

Sameerlite changed the base branch from litellm_internal_staging to litellm_oss_staging_040626 June 4, 2026 11:38

Sameerlite approved these changes Jun 4, 2026

View reviewed changes

Sameerlite merged commit ea1fc54 into BerriAI:litellm_oss_staging_040626 Jun 4, 2026
44 of 46 checks passed

Uh oh!

Conversation

yugborana commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Relevant issues

Pre-Submission checklist

Delays in PR merge?

CI (LiteLLM team)

Screenshots / Proof of Fix

Type

Changes

Uh oh!

greptile-apps Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Uh oh!

Uh oh!

codecov Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

yugborana commented May 4, 2026

Uh oh!

Uh oh!

yugborana commented May 4, 2026

Uh oh!

Uh oh!

yugborana commented May 4, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

veria-ai Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR overview

Security review

Uh oh!

yugborana commented May 4, 2026

Uh oh!

Uh oh!

yugborana commented May 4, 2026

Uh oh!

yugborana commented May 9, 2026

Uh oh!

yugborana commented May 9, 2026

Uh oh!

Uh oh!

yugborana commented Jun 1, 2026

Uh oh!

greptile-apps Bot commented Jun 1, 2026

Uh oh!

Uh oh!

CLAassistant commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yugborana commented Jun 2, 2026

Uh oh!

yugborana commented Jun 2, 2026

Uh oh!

yugborana commented Jun 2, 2026

Uh oh!

yugborana commented Jun 2, 2026

Uh oh!

yugborana commented Jun 2, 2026

Uh oh!

greptile-apps Bot commented Jun 2, 2026

Uh oh!

yugborana commented Jun 2, 2026

Uh oh!

yugborana commented Jun 2, 2026

Uh oh!

Uh oh!

yugborana commented Jun 2, 2026

Uh oh!

greptile-apps Bot commented Jun 2, 2026

Uh oh!

Uh oh!

Sameerlite commented Jun 3, 2026

yugborana commented May 4, 2026 •

edited

Loading

greptile-apps Bot commented May 4, 2026 •

edited

Loading

codecov Bot commented May 4, 2026 •

edited

Loading

veria-ai Bot commented May 4, 2026 •

edited

Loading

CLAassistant commented Jun 2, 2026 •

edited

Loading

yugborana commented Jun 3, 2026 •

edited

Loading

yugborana commented Jun 4, 2026 •

edited

Loading