feat(mcp): core sampling and elicitation flow#27109
Conversation
Greptile SummaryThis PR implements core MCP sampling and elicitation flows, allowing upstream MCP servers to request LLM inference and user input through LiteLLM's proxy infrastructure, with proper auth propagation, budget enforcement, and model-permission checks.
Confidence Score: 4/5The PR is broadly safe to merge; the new sampling/elicitation paths are well-guarded by opt-in flags and the session-auth storage regressions from prior reviews have been addressed. The previous id(session) WeakKeyDictionary key bug is fixed (session object used directly). The _session_obj_auth_storage unbounded-growth concern is resolved with WeakKeyDictionary. The MCP_SAMPLING_AVAILABLE gate now correctly prevents callback installation when the mcp package is absent. Two minor findings remain: _build_sampling_request is constructed twice per sampling call (redundant but not wrong), and the heuristic session-lock skip on truncated bodies has a theoretical false-positive window that is very unlikely to trigger in practice. sampling_handler.py warrants a close read of the budget and model-access check paths to ensure they stay in sync with the main /chat/completions auth flow as that evolves.
|
| Filename | Overview |
|---|---|
| litellm/experimental_mcp_client/client.py | Adds sampling/elicitation/logging callbacks to MCPClient; introduces _get_safe_stdio_env() allowlist to avoid leaking LiteLLM API keys to stdio subprocesses. |
| litellm/proxy/_experimental/mcp_server/sampling_handler.py | New 1,251-line module implementing MCP sampling. _build_sampling_request() is constructed twice per call (once in _run_budget_checks, once inline), which is redundant but harmless. |
| litellm/proxy/_experimental/mcp_server/elicitation_handler.py | New module handling MCP elicitation/create: relays to downstream client when available or declines gracefully; capability checks for form/URL mode are correct. |
| litellm/proxy/_experimental/mcp_server/mcp_server_manager.py | Adds sampling/elicitation callback factories gated on availability flags; fixes stdio env from empty-dict to None so _get_safe_stdio_env supplies a proper allowlist; adds allow_sampling/allow_elicitation config fields. |
| litellm/proxy/_experimental/mcp_server/server.py | Adds WeakKeyDictionary session-auth storage and JSON-RPC response detection to prevent deadlock during sampling/elicitation callbacks. Heuristic string-match fallback for truncated bodies is documented as the safer tradeoff. |
| litellm/types/mcp_server/mcp_server_manager.py | Adds allow_sampling: bool = False and allow_elicitation: bool = False fields to MCPServer with correct opt-in defaults. |
Reviews (15): Last reviewed commit: "feat(mcp): core sampling and elicitation..." | Re-trigger Greptile
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
|
@greptileai solved all the issues |
|
@greptileai solved the issues |
|
@greptileai solved the issues |
PR overviewAll previously flagged issues have been addressed. No open security concerns remain on this pull request. Security reviewNo open security issues remain on this pull request. Fixed/addressed: 12 · PR risk: 0/10 |
|
@greptileai fixed non commented issues too |
|
@greptileai solved the issues |
|
@greptileai tried to fix some issues which were previously there |
|
@greptileai solved all the issues. give the confidence score |
|
Confidence Score: 3/5 Good progress — most of the original review issues have been resolved. Here's the current state: Resolved ✅
Still open ❌
To reach 4/5: resolve the merge conflicts, close the open review thread, and add meaningful tests for the |
afc71e6 to
bd3417d
Compare
0dc6cdc to
34d699e
Compare
|
@greptileai solved all issues, give score |
|
@greptileai no unresolved thread present, dummy request issue is solved, active_mcp_session_var issue solved too,tests are covered in PR #26748 |
|
@greptileai solved all 3 quality issues give |
|
@greptileai solved the dead code and test issues too |
|
@greptileai solved the sampling_handler issues |
|
The sampling_handler.py looks significantly improved. The key security fixes are all in place:
One thing still worth double-checking: the # raw_headers loop — x-forwarded-for is NOT excluded
if _key == "content-type" or _key in _HOP_BY_HOP:
continue
_scope_headers.append(...)
# client_ip is only injected if x-forwarded-for is absent
if client_ip and not any(h[0] == b"x-forwarded-for" for h in _scope_headers):
_scope_headers.append((b"x-forwarded-for", client_ip.encode("utf-8")))If a client sends a spoofed That said, this is a medium-severity concern (IP-based routing bypass, not an auth bypass), and that thread is marked outdated. Everything else looks solid. |
|
@greptileai solved spoofable error |
|
@greptileai solver server.py issues like json rpc one |
|
@greptileai solved bedget check error propagation and json rpc empty body issue. give final confidence score |
|
Final Confidence Score: 4/5 All 14 review threads are now resolved, including the critical security findings. Here's the updated assessment: Resolved since last review:
Remaining known limitation (acknowledged in PR):
What keeps it at 4/5 rather than 5/5:
The core feature is solid and the fixes address real bugs. Ready for maintainer review with CI confirmation. |
017ca34 to
bc104ff
Compare
|
@yugborana I see too many files changed which are not related to the feat: Is this intentional? Can you fix and once all files changed look good. Please retag me for review |
|
@Sameerlite most of the files have changes for mypy because type casting was not there. I have commented the core files, so those are the main ones |
|
Just take the latest pull from internal staging branch and only have files that actually have the code you changed for this feat |
bc305c0 to
d1d651b
Compare
- Add sampling_handler.py: full MCP sampling/createMessage flow with model selection (hint-based + priority-based), auth enforcement, budget checks, route restriction gates, and tag policy pre-auth - Add elicitation_handler.py: MCP elicitation/create relay with downstream client capability detection - Wire sampling/elicitation callbacks in mcp_server_manager.py gated behind allow_sampling/allow_elicitation config flags - Add allow_sampling/allow_elicitation fields to MCPServer type - Fix session lock deadlock: skip lock for JSON-RPC response POSTs (elicitation/sampling replies) with truncated-body heuristic - Extend client.py with sampling_callback and elicitation_callback - Security: RouteChecks gate, tag-budget bypass fix, x-forwarded-for spoofing fix, Latin-1 header encoding guard - Add 4 new test modules (model access, priority selection, request builder, tool conversion) + update existing MCP tests
d1d651b to
695cde6
Compare
|
@Sameerlite yes cleaned the noise and now core files are there but lint failing due to mypy issues |
|
@greptileai review the files except tests from this PR |
Without this, an upstream MCP server with allow_sampling enabled could send prompts that bypass every guardrail (content filtering, PII redaction, prompt-injection detection) configured on /chat/completions. - Call proxy_logging_obj.pre_call_hook(call_type='acompletion') before llm_router.acompletion so guardrails fire for sampling sub-calls - Add HTTPException to the re-raise list so guardrail rejections propagate correctly instead of being swallowed as generic errors
ea1fc54
into
BerriAI:litellm_oss_staging_040626
|
@yugborana if you can create a PR in litellm-docs showing how to use this etc? Thanks |
|
@Sameerlite Thanks for the fast review. I will surely try to make a PR in litellm-docs. |

Live Test -
https://github.com/user-attachments/assets/dc7e5e98-8345-4a8a-bdaa-e0e37092f807



Relevant issues
Fixes #23761
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
tests/test_litellm/directory, Adding at least 1 test is a hard requirement - see detailsmake test-unit@greptileaiand received a Confidence Score of at least 4/5 before requesting a maintainer reviewDelays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Screenshots / Proof of Fix
Type
🆕 New Feature
Changes
This PR fixes several bugs to make the MCP tool system more reliable and secure. I fixed an issue where the user's API key was ignored during extra AI requests, ensuring that usage costs are now correctly billed to the right person. I also fixed a problem where the server couldn't ask the user follow-up questions by making sure the system remembers exactly which user is currently active. To prevent common crashes with apps like Claude Desktop, I improved how the connection handles messages and made the system work better on Windows by ignoring folder capitalization. One remaining limitation is that if multiple people use the same shared tool at the exact same time, the system might struggle to tell them apart for billing because of how the underlying connection works, but this setup works perfectly for standard individual use.