Skip to content

feat(mcp): core sampling and elicitation flow#27109

Merged
Sameerlite merged 2 commits into
BerriAI:litellm_oss_staging_040626from
yugborana:fix/mcp-core-files
Jun 4, 2026
Merged

feat(mcp): core sampling and elicitation flow#27109
Sameerlite merged 2 commits into
BerriAI:litellm_oss_staging_040626from
yugborana:fix/mcp-core-files

Conversation

@yugborana

@yugborana yugborana commented May 4, 2026

Copy link
Copy Markdown
Contributor

Live Test -

https://github.com/user-attachments/assets/dc7e5e98-8345-4a8a-bdaa-e0e37092f807
image
image
image

Relevant issues

Fixes #23761

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • [ yes] I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • [ yes] My PR passes all unit tests on make test-unit
  • [yes ] My PR's scope is as isolated as possible, it only solves 1 specific problem
  • [ yes] I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

Type

🆕 New Feature

Changes

This PR fixes several bugs to make the MCP tool system more reliable and secure. I fixed an issue where the user's API key was ignored during extra AI requests, ensuring that usage costs are now correctly billed to the right person. I also fixed a problem where the server couldn't ask the user follow-up questions by making sure the system remembers exactly which user is currently active. To prevent common crashes with apps like Claude Desktop, I improved how the connection handles messages and made the system work better on Windows by ignoring folder capitalization. One remaining limitation is that if multiple people use the same shared tool at the exact same time, the system might struggle to tell them apart for billing because of how the underlying connection works, but this setup works perfectly for standard individual use.

@greptile-apps

greptile-apps Bot commented May 4, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR implements core MCP sampling and elicitation flows, allowing upstream MCP servers to request LLM inference and user input through LiteLLM's proxy infrastructure, with proper auth propagation, budget enforcement, and model-permission checks.

  • sampling_handler.py (new): Full MCP sampling/createMessage implementation — model resolution from ModelPreferences hints/priorities, message/tool format conversion, per-key/team/user model-access checks, budget enforcement via common_checks, and litellm.acompletion dispatch.
  • elicitation_handler.py (new): Handles elicitation/create by relaying to the downstream client when it declares elicitation capabilities, or declining gracefully otherwise.
  • client.py / mcp_server_manager.py: Wires sampling and elicitation callbacks into MCPClient and MCPServerManager; fixes stdio subprocess env to use an allowlist so subprocesses inherit PATH, HOME, etc., and sensitive LiteLLM keys are not leaked.
  • server.py: Replaces plain session-auth dict storage with a WeakKeyDictionary keyed on the live session object; adds JSON-RPC response detection to skip the session lock and prevent deadlock during sampling/elicitation callbacks.

Confidence Score: 4/5

The PR is broadly safe to merge; the new sampling/elicitation paths are well-guarded by opt-in flags and the session-auth storage regressions from prior reviews have been addressed.

The previous id(session) WeakKeyDictionary key bug is fixed (session object used directly). The _session_obj_auth_storage unbounded-growth concern is resolved with WeakKeyDictionary. The MCP_SAMPLING_AVAILABLE gate now correctly prevents callback installation when the mcp package is absent. Two minor findings remain: _build_sampling_request is constructed twice per sampling call (redundant but not wrong), and the heuristic session-lock skip on truncated bodies has a theoretical false-positive window that is very unlikely to trigger in practice.

sampling_handler.py warrants a close read of the budget and model-access check paths to ensure they stay in sync with the main /chat/completions auth flow as that evolves.

Important Files Changed

Filename Overview
litellm/experimental_mcp_client/client.py Adds sampling/elicitation/logging callbacks to MCPClient; introduces _get_safe_stdio_env() allowlist to avoid leaking LiteLLM API keys to stdio subprocesses.
litellm/proxy/_experimental/mcp_server/sampling_handler.py New 1,251-line module implementing MCP sampling. _build_sampling_request() is constructed twice per call (once in _run_budget_checks, once inline), which is redundant but harmless.
litellm/proxy/_experimental/mcp_server/elicitation_handler.py New module handling MCP elicitation/create: relays to downstream client when available or declines gracefully; capability checks for form/URL mode are correct.
litellm/proxy/_experimental/mcp_server/mcp_server_manager.py Adds sampling/elicitation callback factories gated on availability flags; fixes stdio env from empty-dict to None so _get_safe_stdio_env supplies a proper allowlist; adds allow_sampling/allow_elicitation config fields.
litellm/proxy/_experimental/mcp_server/server.py Adds WeakKeyDictionary session-auth storage and JSON-RPC response detection to prevent deadlock during sampling/elicitation callbacks. Heuristic string-match fallback for truncated bodies is documented as the safer tradeoff.
litellm/types/mcp_server/mcp_server_manager.py Adds allow_sampling: bool = False and allow_elicitation: bool = False fields to MCPServer with correct opt-in defaults.

Reviews (15): Last reviewed commit: "feat(mcp): core sampling and elicitation..." | Re-trigger Greptile

Comment thread litellm/proxy/_experimental/mcp_server/mcp_server_manager.py
@codecov

codecov Bot commented May 4, 2026

Copy link
Copy Markdown

@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solved all the issues

Comment thread litellm/proxy/_experimental/mcp_server/server.py Outdated
@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solved the issues

Comment thread litellm/proxy/_types.py Outdated
@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solved the issues

Comment thread litellm/proxy/_types.py Outdated
Comment thread litellm/proxy/_types.py Outdated
Comment thread litellm/proxy/_experimental/mcp_server/mcp_server_manager.py Outdated
@veria-ai

veria-ai Bot commented May 4, 2026

Copy link
Copy Markdown
Contributor

PR overview

All previously flagged issues have been addressed. No open security concerns remain on this pull request.

Security review

No open security issues remain on this pull request.

Fixed/addressed: 12 · PR risk: 0/10

@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai fixed non commented issues too

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py
@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solved the issues

@yugborana

Copy link
Copy Markdown
Contributor Author

@krrish-berri-2

@yugborana yugborana changed the title feat(mcp): split core mcp files into separate PR feat(mcp): core sampling and elicitation flow May 9, 2026
@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai tried to fix some issues which were previously there

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py
@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solved all the issues. give the confidence score

@greptile-apps

greptile-apps Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Confidence Score: 3/5

Good progress — most of the original review issues have been resolved. Here's the current state:

Resolved ✅

  • SSRF protection in OAuth discovery (P0)
  • id(session) WeakKeyDictionary key bug
  • validate_no_callback_env_reference type guard
  • Auth bypass on pass-through endpoints
  • Server-side secret exfiltration via callback vars
  • MCP_SAMPLING_AVAILABLE flag guard

Still open ❌

  1. 1 unresolved review thread — the model permission bypass thread from veria-ai (added May 9) remains open in GitHub. The _check_model_access function is now present in sampling_handler.py and looks correct, but the thread hasn't been formally resolved — please mark it closed so reviewers know it's been addressed.

  2. Merge conflicts — the PR is in dirty state and cannot be merged as-is.

  3. Very low test coverage — Codecov reports sampling_handler.py at 7.72% (215 lines missing), elicitation_handler.py at 13.72% (44 lines missing). The core sampling path is largely untested.

  4. Dummy request issue still present_build_sampling_request hardcodes server: ("127.0.0.1", 0) and only partially reconstructs headers. Header-dependent guardrails, IP-based routing, and trace-ID correlation will be silently skipped for all sampling sub-calls. This was called out in the previous summary and hasn't changed.

  5. active_mcp_session_var not reset — the context var is set in tool handlers without saving and restoring a reset token, which can leak context across concurrent requests.

To reach 4/5: resolve the merge conflicts, close the open review thread, and add meaningful tests for the handle_sampling_create_message and elicitation paths. The dummy request concern should at least be explicitly documented as a known limitation if not fixed.

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py
@yugborana yugborana force-pushed the fix/mcp-core-files branch from afc71e6 to bd3417d Compare June 2, 2026 09:06
@yugborana yugborana requested a review from a team June 2, 2026 09:06
@CLAassistant

CLAassistant commented Jun 2, 2026

Copy link
Copy Markdown

CLA assistant check
All committers have signed the CLA.

@yugborana yugborana force-pushed the fix/mcp-core-files branch 4 times, most recently from 0dc6cdc to 34d699e Compare June 2, 2026 10:15
@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solved all issues, give score

@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai no unresolved thread present, dummy request issue is solved, active_mcp_session_var issue solved too,tests are covered in PR #26748

@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solved all 3 quality issues give

@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solved the dead code and test issues too

@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solved the sampling_handler issues

@greptile-apps

greptile-apps Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

The sampling_handler.py looks significantly improved. The key security fixes are all in place:

  • OAuth denial (handle_sampling_create_message line 676): user_api_key_auth is None now returns ErrorData before doing anything.
  • Model permission checks (_check_model_access): Now runs can_key_call_model, can_team_access_model, _check_team_member_model_access, can_user_call_model, and can_project_access_model — covering the team/project restriction gaps flagged earlier.
  • Budget enforcement (_run_budget_checks): Delegates to common_checks, the same path /chat/completions uses.
  • MCP_SAMPLING_AVAILABLE guard: handle_sampling_create_message returns ErrorData immediately when False.

One thing still worth double-checking: the x-forwarded-for forwarding in _build_sampling_request. The original suggestion was to drop it from raw_headers entirely (using the already-resolved client_ip instead), but the current code still passes caller-supplied x-forwarded-for through:

# raw_headers loop — x-forwarded-for is NOT excluded
if _key == "content-type" or _key in _HOP_BY_HOP:
    continue
_scope_headers.append(...)

# client_ip is only injected if x-forwarded-for is absent
if client_ip and not any(h[0] == b"x-forwarded-for" for h in _scope_headers):
    _scope_headers.append((b"x-forwarded-for", client_ip.encode("utf-8")))

If a client sends a spoofed x-forwarded-for in raw_headers, it wins over the resolved client_ip. The fix would be:

            if _key in {"content-type", "x-forwarded-for"} or _key in _HOP_BY_HOP:
                continue

That said, this is a medium-severity concern (IP-based routing bypass, not an auth bypass), and that thread is marked outdated. Everything else looks solid.

@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solved spoofable error

@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solver server.py issues like json rpc one

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py
@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai solved bedget check error propagation and json rpc empty body issue. give final confidence score

@greptile-apps

greptile-apps Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Final Confidence Score: 4/5

All 14 review threads are now resolved, including the critical security findings. Here's the updated assessment:

Resolved since last review:

  • Budget check error propagation fixed — sampling requests now correctly surface budget exhaustion errors to callers
  • JSON-RPC empty body handling fixed — prevents Claude Desktop / stdio client crashes on response posts
  • Session-keying bug (id(session) → session object on WeakKeyDictionary) — resolved
  • Model permission bypass — resolved
  • Sampling budget check bypass (common_checks path) — resolved
  • Virtual-key route restriction bypass — resolved
  • OAuth passthrough without LiteLLM key — resolved
  • Team/project model restriction enforcement for sampling — resolved

Remaining known limitation (acknowledged in PR):

  • Concurrent multi-user billing disambiguation is imperfect due to stdio session identity constraints — accepted limitation, not a regression

What keeps it at 4/5 rather than 5/5:

  • The PR is large (+3382/-444, 47 files) and several high-severity security threads were resolved without visible inline code diffs — a final pass confirming the actual code changes for each security fix (especially the common_checks bypass and route restriction gate) is warranted before merge
  • CI run links are not filled in the checklist

The core feature is solid and the fixes address real bugs. Ready for maintainer review with CI confirmation.

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py
@yugborana yugborana force-pushed the fix/mcp-core-files branch 2 times, most recently from 017ca34 to bc104ff Compare June 2, 2026 20:20
@Sameerlite

Copy link
Copy Markdown
Collaborator

@yugborana I see too many files changed which are not related to the feat:
image

Is this intentional? Can you fix and once all files changed look good. Please retag me for review

Comment thread litellm/proxy/batches_endpoints/endpoints.py Outdated
@yugborana

Copy link
Copy Markdown
Contributor Author

@Sameerlite most of the files have changes for mypy because type casting was not there. I have commented the core files, so those are the main ones

@Sameerlite

Copy link
Copy Markdown
Collaborator

Just take the latest pull from internal staging branch and only have files that actually have the code you changed for this feat

@yugborana yugborana force-pushed the fix/mcp-core-files branch from bc305c0 to d1d651b Compare June 3, 2026 13:01
- Add sampling_handler.py: full MCP sampling/createMessage flow with
  model selection (hint-based + priority-based), auth enforcement,
  budget checks, route restriction gates, and tag policy pre-auth
- Add elicitation_handler.py: MCP elicitation/create relay with
  downstream client capability detection
- Wire sampling/elicitation callbacks in mcp_server_manager.py
  gated behind allow_sampling/allow_elicitation config flags
- Add allow_sampling/allow_elicitation fields to MCPServer type
- Fix session lock deadlock: skip lock for JSON-RPC response POSTs
  (elicitation/sampling replies) with truncated-body heuristic
- Extend client.py with sampling_callback and elicitation_callback
- Security: RouteChecks gate, tag-budget bypass fix, x-forwarded-for
  spoofing fix, Latin-1 header encoding guard
- Add 4 new test modules (model access, priority selection, request
  builder, tool conversion) + update existing MCP tests
@yugborana yugborana force-pushed the fix/mcp-core-files branch from d1d651b to 695cde6 Compare June 3, 2026 13:05
@yugborana yugborana requested a review from Sameerlite June 3, 2026 13:05
@yugborana

yugborana commented Jun 3, 2026

Copy link
Copy Markdown
Contributor Author

@Sameerlite yes cleaned the noise and now core files are there but lint failing due to mypy issues

@yugborana

Copy link
Copy Markdown
Contributor Author

@greptileai review the files except tests from this PR

Comment thread litellm/proxy/_experimental/mcp_server/sampling_handler.py
Without this, an upstream MCP server with allow_sampling enabled could
send prompts that bypass every guardrail (content filtering, PII
redaction, prompt-injection detection) configured on /chat/completions.

- Call proxy_logging_obj.pre_call_hook(call_type='acompletion') before
  llm_router.acompletion so guardrails fire for sampling sub-calls
- Add HTTPException to the re-raise list so guardrail rejections
  propagate correctly instead of being swallowed as generic errors
@yugborana

Copy link
Copy Markdown
Contributor Author

@Sameerlite

@Sameerlite Sameerlite changed the base branch from litellm_internal_staging to litellm_oss_staging_040626 June 4, 2026 11:38

@Sameerlite Sameerlite left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Sameerlite Sameerlite merged commit ea1fc54 into BerriAI:litellm_oss_staging_040626 Jun 4, 2026
44 of 46 checks passed
@Sameerlite

Copy link
Copy Markdown
Collaborator

@yugborana if you can create a PR in litellm-docs showing how to use this etc? Thanks

@yugborana

yugborana commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

@Sameerlite Thanks for the fast review. I will surely try to make a PR in litellm-docs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Support MCP Elicitation and Sampling in the MCP Gateway

4 participants