merge main#28837
Caller-supplied tags (`x-litellm-tags` header, body `tags`, `metadata.tags`) were silently dropped unless the key/team had `metadata.allow_client_tags: true` set. Restore the documented behavior: tags from the request always flow into `metadata.tags` and union with any admin-configured static tags from key/team/project metadata. Removes the `allow_client_tags` opt-in flag from the pre-call pipeline. The flag was only ever read here; it has no schema or endpoint footprint, so leftover values in existing key metadata are inert. Test cleanup mirrors the simplification: drop the three tests that verified the strip-when-not-opted-in path, drop the `allow_client_tags` fixture lines from the merge/union tests.
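The restored merge behavior can be pictured with a minimal sketch. The helper name `merge_request_tags` and the first-seen ordering are assumptions made for illustration only; they are not taken from the proxy source.

```python
from typing import Iterable, List, Optional


def merge_request_tags(
    caller_tags: Optional[Iterable[str]],
    admin_tags: Optional[Iterable[str]],
) -> List[str]:
    """Union caller-supplied and admin-configured tags (hypothetical helper).

    Order is first-seen, caller tags first; that ordering is an assumption.
    """
    merged: List[str] = []
    seen = set()
    for tag in list(caller_tags or []) + list(admin_tags or []):
        # Ignore non-string entries and anything already collected.
        if isinstance(tag, str) and tag not in seen:
            seen.add(tag)
            merged.append(tag)
    return merged


request_metadata = {"tags": ["team-a", "experiment-42"]}  # sent by the caller
key_metadata = {"tags": ["prod", "team-a"]}               # static, set by an admin
request_metadata["tags"] = merge_request_tags(
    request_metadata.get("tags"), key_metadata.get("tags")
)
```

With this shape, no opt-in flag is consulted anywhere: caller tags are always kept and admin tags are only ever added on top.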
The tag-strip block was removed in the parent commit but two surrounding comments still referenced "tags without opt-in" and "runs AFTER the strip". Update them to describe the remaining user_api_key_* and _pipeline_managed_guardrails strip that the snapshot/merge ordering actually protects against.
DALL-E 2 and DALL-E 3 were removed from the OpenAI API on 2026-05-12, causing e2e image-generation tests to fail with "model does not exist". Swap all live-API DALL-E references in proxy-backed tests to gpt-image-1 and update the dall-e-2 alias in proxy_server_config.yaml to point at openai/gpt-image-1 (preserves any historical dall-e-2 callers).
… gpt-image-1
Second wave of failures from the 2026-05-12 DALL-E shutdown:
- tests/image_gen_tests/test_image_edits.py::TestOpenAIImageEditDallE2 and tests/image_gen_tests/test_image_generation.py::TestOpenAIDalle3 are explicitly named for the deprecated models and can't pass; remove. gpt-image-1 coverage already exists in sibling classes.
- tests/local_testing/test_router.py image gen tests use dall-e-3 only as a routing example; swap to gpt-image-1.
- tests/local_testing/test_custom_callback_input.py image_generation success/failure paths swapped to gpt-image-1.
fix(tests): swap dall-e to gpt-image-1 after openai deprecation
fix(proxy): always merge caller-supplied tags into request metadata
…#27762)
* chore: reject bare str at file-input sinks to prevent local-file read (#27667)
Squash-merged by litellm-agent from stuxf's PR.
* fix: use os.PathLike in ocr sink and check truthy reasoningSummary for bridge
- ocr/main.py: widen Path check to os.PathLike for consistency with other sinks
- main.py: bridge condition checks truthiness of reasoning_summary, not just None
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix: remove unused pathlib.Path import in ocr/main.py
---------
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* feat(ui): add Vertex AI Search as vector store provider
Adds a "Vertex AI Search" entry to the provider dropdown (custom_llm_provider=vertex_ai/search_api) with fields for project, location (global/us/eu select), and optional collection ID. Extends VectorStoreFieldConfig with `options` so select fields can be data-driven instead of falling through to the embedding-model list.
* fix(ui): clarify vertex_collection_id placeholder copy
Placeholder previously displayed "default_collection" (the literal fallback value), which invited users to type it instead of leaving the field blank. Switch to an example placeholder and tighten the tooltip.
* fix(proxy): resolve cache handling issues in _lookup_deprecated_key
- Updated the in-memory cache for deprecated key lookups to store a 3-tuple (active_token_id, cache_expires_at_ts, revoke_at_ts) instead of a 2-tuple, ensuring proper unpacking and backward compatibility.
- Removed duplicate cache reads and added logic to handle legacy cache entries gracefully.
- Enhanced unit tests to cover scenarios for cache hits, DB misses, and respect for revoke_at timestamps, ensuring robust handling of the grace-period key-rotation feature.
* refactor(proxy): streamline cache handling in _lookup_deprecated_key
- Simplified the cache retrieval logic by directly unpacking the 3-tuple cache entries, removing the need for backward compatibility checks for 2-tuple entries.
- Updated unit tests to ensure that pre-warmed 3-tuple cache entries are served correctly without unnecessary database lookups.
* chore(ci): add new unit test for deprecated key grace period
- Included `test_deprecated_key_grace_period.py` in the CI workflow to enhance coverage for deprecated key handling scenarios.
* fix(proxy): remove unnecessary check for revoke_at in _lookup_deprecated_key
- Eliminated the redundant check for None on revoke_at, streamlining the logic for handling deprecated keys in the cache. This change enhances the efficiency of the key lookup process.
* test(proxy): add end-to-end tests for deprecated key lookup behavior
- Introduced a new test class `TestDeprecatedKeyLookupDbE2E` to validate the behavior of deprecated key lookups against a real Prisma-backed database.
- The test ensures that old key hashes resolve correctly and that repeated lookups utilize the in-memory cache without errors.
- Cleaned up the `_lookup_deprecated_key` function by removing an unnecessary check for `revoke_at`, enhancing the efficiency of the key lookup process.
…ypass
A non-admin caller could rebind their own key's ``user_id`` via ``/key/regenerate``. ``_execute_virtual_key_regeneration`` had org/team guards but no ``user_id`` guard, and ``prepare_key_update_data`` did not strip the field, so it survived ``model_dump(exclude_unset=True)`` into the Prisma update. On the next request, ``_return_user_api_key_auth_obj`` resolved the rebound ``user_id`` against ``litellm_usertable`` and returned ``PROXY_ADMIN`` whenever the target row's ``user_role`` was admin (e.g. the default ``user_id="default_user_id"`` created on first password-UI login).
``/key/update`` had the equivalent guard inline at ``_validate_update_key_data``; extract it to a shared helper ``_validate_caller_can_change_key_ownership`` and call from both ``/key/update`` and ``_execute_virtual_key_regeneration``. Future regenerate-style endpoints inherit the guard for free.
Also tighten the premium gate that allowed the master-key rotation branch to skip the enterprise check. The previous predicate was ``data.new_master_key is not None``, a field-presence test, not an identity check. Any non-premium caller could send any value in that field and the premium check would no-op. Verify the caller actually holds the master key via ``_is_master_key`` before allowing the non-premium path.
Tests:
- ``test_regenerate_user_id_rebind_guard``: parametrized table over cross-user rebind (blocked), empty-string removal (blocked), and same-user no-op rebind (allowed).
- ``test_regenerate_premium_gate_requires_actual_master_key`` / ``test_regenerate_premium_gate_allows_actual_master_key_holder``: ensure the premium check requires the caller actually present the master key, and that legitimate master-key rotation still works.
…eaks
Convert the per-test VCR verdict line from a single 'NOOP / HIT / MISS /
PARTIAL' tag into a classified outcome that distinguishes the cases that
silently bill the live API on every CI run from the ones that don't:
  HIT                  pure replay
  PARTIAL              mixed replay + new recordings
  MISS:RECORDED        new cassette saved to Redis (cached next run)
  MISS:OVERFLOW        cassette > MAX_EPISODES_PER_CASSETTE; persister
                       refused to save; re-bills every run
  MISS:NOT_PERSISTED   test failed; save_cassette skipped; re-bills
  NOOP                 VCR-marked but no HTTP traffic (mocked elsewhere)
  UNMARKED:LIVE_CALL   test bypassed VCR AND opened a TCP connection
                       to a known LLM provider host -> wasted spend
  UNMARKED:NO_TRAFFIC  test bypassed VCR but didn't call out
The UNMARKED:LIVE_CALL signal is what converts 'this test probably hits
live' into 'this test connected to api.openai.com'. We install a
socket.connect / socket.create_connection wrapper for the duration of
each non-VCR-marked test and record any outbound TCP to a known LLM
provider hostname. The probe sits below the httpx layer so vcrpy and
respx (which both patch above the socket) are unaffected.
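A rough sketch of such a socket-level probe follows. It wraps only `socket.create_connection`; the host list, `is_known_llm_host`, and `live_call_probe` are illustrative assumptions, not the names used in the conftest, and a fuller probe would also need to cover `socket.socket.connect`.

```python
import socket
from contextlib import contextmanager
from typing import Iterator, List

# Assumed, illustrative host list; the real probe's list is not shown here.
KNOWN_LLM_HOSTS = ("api.openai.com", "api.anthropic.com")


def is_known_llm_host(host: str) -> bool:
    """True if host equals, or is a subdomain of, a known provider host."""
    host = host.lower().rstrip(".")
    return any(host == h or host.endswith("." + h) for h in KNOWN_LLM_HOSTS)


@contextmanager
def live_call_probe() -> Iterator[List[str]]:
    """Record provider hostnames passed to socket.create_connection."""
    seen: List[str] = []
    original = socket.create_connection

    def wrapper(address, *args, **kwargs):
        host = address[0]
        if isinstance(host, str) and is_known_llm_host(host):
            seen.append(host)
        # Always delegate, so the probe observes traffic without blocking it.
        return original(address, *args, **kwargs)

    socket.create_connection = wrapper
    try:
        yield seen
    finally:
        # Restore the real function even if the test body raised.
        socket.create_connection = original
```

Wrapping a single test body in `with live_call_probe() as seen:` leaves `seen` non-empty only if that test opened a connection to one of the listed hosts.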
Replace the file-level _RESPX_CONFLICTING_FILES blacklists in the
llm_translation and local_testing conftests with per-item respx
detection in apply_vcr_auto_marker_to_items. A test now skips VCR when
it actually carries @pytest.mark.respx or has respx_mock in its fixture
chain - not just because some other test in the same file imports
MockRouter. Items skipped by skip_files are split into respx_conflict
(real conflict, the module wires up respx) vs file_opt_out (dead skip-
list entry whose module never touches respx) so the session summary
makes pruning obvious.
Stabilize the AWS SigV4 fingerprint: the Authorization header on
Bedrock requests rotates its Credential date and Signature on every
call, which previously pushed every Bedrock test past the 50-episode
overflow threshold. Extract the access-key id only
('aws-sigv4:AKIA...') so two requests with the same identity match.
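As a hedged illustration of the fingerprint idea, the sketch below pulls only the access-key id out of a SigV4 `Authorization` header. The function name and regular expression are assumptions; only the `aws-sigv4:` prefix comes from the description above.

```python
import re
from typing import Optional

# SigV4 headers carry "Credential=<access-key-id>/<date>/<region>/<service>/aws4_request".
_SIGV4_CREDENTIAL = re.compile(r"Credential=([^/,\s]+)/")


def sigv4_fingerprint(authorization: str) -> Optional[str]:
    """Return a stable identity for a SigV4 header, or None if it is not SigV4."""
    if not authorization.startswith("AWS4-HMAC-SHA256"):
        return None
    match = _SIGV4_CREDENTIAL.search(authorization)
    if match is None:
        return None
    # Drop the rotating date scope and signature; keep only the key id.
    return "aws-sigv4:" + match.group(1)


header_a = (
    "AWS4-HMAC-SHA256 Credential=AKIAIOSFODNN7EXAMPLE/20260512/us-east-1/"
    "bedrock/aws4_request, SignedHeaders=host;x-amz-date, Signature=aaaa1111"
)
header_b = (
    "AWS4-HMAC-SHA256 Credential=AKIAIOSFODNN7EXAMPLE/20260513/us-east-1/"
    "bedrock/aws4_request, SignedHeaders=host;x-amz-date, Signature=bbbb2222"
)
```

Both sample headers differ in date and signature but map to the same fingerprint, which is what lets a recorded episode match on replay.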
Always emit verdict logging when VCR is active (set
LITELLM_VCR_VERBOSE=0 to opt back into the legacy quiet mode). Add a
session-end classification summary that lists overflow tests, unmarked
live-call tests, and the skip-reason breakdown.
Wire the live-call probe + summary hook into every test directory that
already uses the Redis-backed VCR cache (audio_tests, guardrails_tests,
image_gen_tests, litellm_utils_tests, llm_responses_api_testing,
llm_translation, local_testing, logging_callback_tests, ocr_tests,
pass_through_unit_tests, router_unit_tests, search_tests,
unified_google_tests).
Add tests/llm_translation/test_vcr_classification.py covering the
verdict classifier, skip-reason tagging, AWS SigV4 fingerprint stability,
live-host classification, and session summary rendering.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
These seven test files were on _RESPX_CONFLICTING_FILES, which made the auto-marker skip them entirely. Inspecting the source shows the only respx artifact is a top-level 'from respx import MockRouter' that no test ever uses - no @pytest.mark.respx, no respx_mock fixture, no respx.mock context manager. The import is dead code left over from a previous mocking pattern. Now that apply_vcr_auto_marker_to_items detects respx per-item via the marker / fixture chain (b637d9f), the file-level skip is no longer needed for these files - they were the reason the OpenAI tests (test_o3_reasoning_effort, test_streaming_response[o1/o3-mini], TestOpenAIO1::test_streaming, TestOpenAIChatCompletion::test_web_search, TestOpenAIO3::test_web_search, etc.) ran live every CI build despite the cassette cache being healthy. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
…en module-level file handles
Module-level

    TEST_IMAGES = [
        open(os.path.join(pwd, 'ishaan_github.png'), 'rb'),
        open(os.path.join(pwd, 'litellm_site.png'), 'rb'),
    ]
    SINGLE_TEST_IMAGE = open(...)

opens the file once at import. After the first multipart upload, the
file pointer is at EOF, so every subsequent test in the same xdist
worker sends an empty multipart body. That non-determinism (a) blows
the recorded cassette past MAX_EPISODES_PER_CASSETTE (50) so
_RedisPersister.save_cassette refuses to save it, and (b) re-bills the
live image edit endpoint on every CI run.
Recent CI runs confirm the leak: tests/image_gen_tests/test_image_edits.py
shows six tests parking at 51-52 cassette entries
(TestOpenAIImageEditGPTImage1::test_openai_image_edit_litellm_sdk[False],
TestOpenAIImageEditDallE2::..., test_openai_image_edit_with_bytesio,
test_openai_image_edit_litellm_router, test_multiple_vs_single_image_edit[False],
test_multiple_image_edit_with_different_formats).
Replace the module-level file handles with _make_test_images() /
_make_single_test_image() factories that return fresh _RewindableImage
(BytesIO subclass) objects whose pointer always starts at 0. The image
bytes are read once at import into module-level constants
(_ISHAAN_GITHUB_BYTES, _LITELLM_SITE_BYTES), so disk I/O cost is
unchanged.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Previous wording read "User=<new_owner> is not allowed to update the key to belong to user=<current_owner>" — easy to misread as "caller wants to keep the key on its current owner". Reframe as "Non-admin caller is not allowed to rebind the key from user=<existing> to user=<incoming>" so the direction of the failed operation is unambiguous. Same shape preserved (HTTPException 403); only the ``detail`` string changes. Regression test substring updated.
The suffix '.bedrock-runtime.amazonaws.com' never matched real Bedrock
endpoints, which use the format 'bedrock-runtime[-fips].{region}.amazonaws.com'
(region between 'bedrock-runtime' and 'amazonaws.com'). Add an explicit
host check for that pattern so Bedrock live calls are visible to the
probe, and update the unit test accordingly. Also drop the unused
'_LIVE_CALL_PROBE_INSTALLED' module variable.
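One way to express the corrected host check is a regular expression with the region in the middle. The pattern and function name below are an illustrative assumption rather than the probe's actual implementation.

```python
import re

# bedrock-runtime[-fips].{region}.amazonaws.com, region between the two parts.
_BEDROCK_RUNTIME_HOST = re.compile(
    r"^bedrock-runtime(-fips)?\.[a-z0-9-]+\.amazonaws\.com$"
)


def is_bedrock_runtime_host(host: str) -> bool:
    """True for regional Bedrock runtime endpoints (hypothetical helper)."""
    return _BEDROCK_RUNTIME_HOST.match(host.lower().rstrip(".")) is not None


# The old suffix test can never match a real regional endpoint.
old_suffix_matches = "bedrock-runtime.us-east-1.amazonaws.com".endswith(
    ".bedrock-runtime.amazonaws.com"
)
```

The last two lines show why the previous check was dead: the region sits between `bedrock-runtime` and `amazonaws.com`, so the suffix is never present.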
…name cascade tests Removes the allow_client_tags metadata check from apply_client_tag_policy_pre_auth so x-litellm-tags headers are always merged into request metadata, matching the post-auth behavior in add_litellm_data_to_request. Updates pre-call tests accordingly and adds a new test suite covering cascading credential renames into model rows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
``model_dump(exclude_unset=True)`` in ``prepare_key_update_data`` includes any field the caller explicitly set, even when the value is ``None``. The previous guard short-circuited on ``getattr(data, 'user_id', None) is None``, which conflated "field omitted" (safe) with "field explicitly set to null" (writes NULL to the token row, detaching the key from its user and bypassing user-row role checks). Switch the omitted-vs-set distinction to ``data.model_fields_set``; treat explicit-null and explicit-empty-string identically as a removal attempt, both 403-rejected for non-admin callers. Parametrized regression adds ``explicit_null_blocked`` alongside the existing ``rebind_blocked`` / ``empty_blocked`` / ``same_user_id_allowed`` cases.
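The omitted-versus-explicit distinction can be sketched without Pydantic. The `KeyUpdate` class below is a stdlib-only analog whose `fields_set` attribute plays the role of `model_fields_set`; the class, the function name, and the boolean return are all assumptions for illustration (the real guard raises a 403).

```python
from typing import Any, Optional, Set


class KeyUpdate:
    """Tiny stand-in for a request model that remembers which fields were sent."""

    def __init__(self, **fields: Any) -> None:
        self.fields_set: Set[str] = set(fields)  # analog of model_fields_set
        self.user_id: Optional[str] = fields.get("user_id")


def ownership_change_allowed(
    data: KeyUpdate, current_user_id: str, is_admin: bool
) -> bool:
    """Hypothetical guard: may this caller apply the user_id in the payload?"""
    if is_admin or "user_id" not in data.fields_set:
        return True  # admin, or the field was omitted entirely
    # Explicit null and empty string are both removal attempts; any other
    # value is only acceptable when it matches the current owner.
    return data.user_id == current_user_id


omitted = KeyUpdate()
explicit_null = KeyUpdate(user_id=None)
```

Checking `getattr(data, "user_id", None) is None` would treat `omitted` and `explicit_null` the same way, which is the conflation described above; checking the set of supplied fields keeps them apart.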
…ebind-guard chore(proxy): close /key/regenerate ownership-rebind + premium-gate bypass
… upload
The _RewindableImage(BytesIO) wrapper auto-rewound on every read after
EOF, which made the OpenAI SDK's multipart upload writer read the same
bytes forever instead of seeing EOF. Workers OOM'd / SIGKILL'd:

    [gw0] node down: Not properly terminated
    replacing crashed worker gw0
    ...
    worker 'gw1' crashed while running
    'tests/image_gen_tests/test_image_edits.py::TestOpenAIImageEditGPTImage1::test_openai_image_edit_litellm_sdk[False]'

The auto-rewind was added defensively for parametrized + flaky-retried
tests, but BaseLLMImageEditTest::test_openai_image_edit_litellm_sdk
already calls get_base_image_edit_call_args() once per invocation and
that helper now constructs fresh streams via _make_test_images(), so
rewinding inside the stream is unnecessary. Replace with plain BytesIO
seeded with the cached image bytes.
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
… path ``get_instance_fn`` previously routed any ``s3://`` / ``gcs://`` value into ``_load_instance_from_remote_storage`` regardless of how the value got there. The function ultimately calls ``spec.loader.exec_module(module)`` — Python in the proxy process. On admin-callable endpoints that accept a ``target`` / ``custom_handler`` field from the request body (e.g. ``/config/pass_through_endpoint``, custom-callback registration), that is a one-step admin-to-RCE primitive: any future privilege-escalation bug becomes immediate code execution. The documented operator flow for remote-module loading is ``litellm_settings.callbacks: ["s3://bucket/module.instance"]`` in ``config.yaml``. That path always carries the YAML's ``config_file_path`` through to ``get_instance_fn``. Use the presence of ``config_file_path`` as the discriminator: refuse remote URLs when it is absent (the request-body path) unless the operator explicitly opts back in via ``LITELLM_ALLOW_REMOTE_INSTANCE_FN_FROM_API=true``. The three success/failure/audit-log callback-loop call sites in ``proxy_server.py:load_config`` were already running inside the startup config-file load but had stopped threading ``config_file_path`` through. Pass it through so the documented ``s3://`` callback flow continues to work unchanged. Tests cover: remote URL without ``config_file_path`` raises; remote URL with the opt-in env reaches the loader; remote URL with ``config_file_path`` passes (documented startup flow); local dotted-name imports unaffected.
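A minimal sketch of the gate, assuming it runs before any download is attempted. The function name `check_remote_instance_allowed` and the error text are hypothetical; the environment variable name, the two URL schemes, and the `ValueError` are the ones described in this PR.

```python
import os
from typing import Optional

_REMOTE_SCHEMES = ("s3://", "gcs://")
_OPT_IN_ENV = "LITELLM_ALLOW_REMOTE_INSTANCE_FN_FROM_API"


def check_remote_instance_allowed(
    value: str, config_file_path: Optional[str] = None
) -> None:
    """Raise ValueError when a remote module URL arrives outside a config load."""
    if not value.startswith(_REMOTE_SCHEMES):
        return  # local dotted-name imports are unaffected
    if config_file_path is not None:
        return  # documented startup flow: value came from config.yaml
    if os.environ.get(_OPT_IN_ENV, "").lower() == "true":
        return  # operator explicitly opted back in
    raise ValueError(
        "remote instance URLs are only accepted from a config-file load; "
        f"set {_OPT_IN_ENV}=true to allow them from the API"
    )
```

The discriminator is deliberately structural: a request body cannot supply `config_file_path`, so the request path cannot reach the loader unless the operator has set the environment variable.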
`apply_client_tag_policy_pre_auth` overwrote string-typed metadata
with `{}` before merging header tags, dropping any tags inside. A
caller could send `metadata='{"tags":["over-budget"]}'` plus
`x-litellm-tags: within-budget` and bypass `_tag_max_budget_check`
on the body tag. Parse the string via `safe_json_loads` first so
existing tags survive the merge.
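The fix can be sketched as "parse first, then merge". This example uses the standard library's `json.loads` in place of `safe_json_loads`, and `merge_header_tags` is a hypothetical name; it is not the body of `apply_client_tag_policy_pre_auth`.

```python
import json
from typing import Any, Dict, List


def merge_header_tags(body: Dict[str, Any], header_tags: List[str]) -> Dict[str, Any]:
    """Merge header tags into body metadata without dropping existing tags."""
    metadata = body.get("metadata")
    if isinstance(metadata, str):
        # String-typed metadata: parse it instead of replacing it with {}.
        try:
            parsed = json.loads(metadata)
        except ValueError:
            parsed = None
        metadata = parsed if isinstance(parsed, dict) else {}
    elif not isinstance(metadata, dict):
        metadata = {}

    existing = metadata.get("tags")
    tags = [t for t in existing if isinstance(t, str)] if isinstance(existing, list) else []
    for tag in header_tags:
        if tag not in tags:
            tags.append(tag)

    body["metadata"] = {**metadata, "tags": tags}
    return body


request_body = {"metadata": '{"tags": ["over-budget"]}'}
merged_body = merge_header_tags(request_body, ["within-budget"])
```

After the merge, both the body tag and the header tag are present, so a per-tag budget check sees the tag the caller tried to hide.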
Also drop the empty `tests/test_litellm/proxy/credential_endpoints/`
directory — the cascade-rename tests it held imported a function
that was never implemented (out of scope for this PR).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pre-existing s3:// / gcs:// custom-logger tests called ``get_instance_fn`` without ``config_file_path``, which means the new runtime gate (refuse remote URLs unless invoked from a config-file load) now raises ``ValueError`` before reaching the mocked download paths. Each test was exercising the documented startup config-file load scenario; pass ``config_file_path="/any/path"`` to make that intent explicit and route past the gate. Affected: test_s3_download_success, test_gcs_download_success, test_invalid_url_format, test_download_failure_handling, test_file_cleanup.
The pass_through prompt-caching tests
(test_prompt_caching_returns_cache_read_tokens_on_second_call,
test_prompt_caching_streaming_second_call_returns_cache_read) make a
warm-up call and then assert the *second* call sees a non-zero
cache_read_input_tokens count from the upstream's prompt-cache. VCR
replay can't model cross-call provider state — both calls match the
same cassette episode, so the second call returns the first call's
pre-warmup response and the assertion fails:

    AssertionError: Expected cache_read_input_tokens > 0 on second call,
    but got 0. Full usage: {'input_tokens': 4986,
    'cache_creation_input_tokens': 4974, 'cache_read_input_tokens': 0}

This started biting after the AWS SigV4 fingerprint stabilization
(b637d9f): Bedrock requests now produce a stable per-access-key
fingerprint instead of a per-request signature, so cassettes
successfully replay where they previously always missed and re-recorded
live. Opt these tests out via skip_nodeid_suffixes so they run live and
match the existing pattern in tests/llm_translation/conftest.py
(::test_prompt_caching).
Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>
Co-authored-by: Michael Riad Zaky <michaelr@Michaels-MacBook-Air.local> Co-authored-by: Yuneng Jiang <yuneng@berri.ai>
…aming hot paths (#28289) * perf: reduce per-request and per-chunk overhead across Anthropic streaming hot paths - Introduce pure-text fast-path in `_build_complete_streaming_response` that collapses O(N) `content_block_delta` events into a single equivalent SSE event before conversion, eliminating per-output-token Pydantic `ModelResponseStream` construction; non-text streams (tool_use, thinking, citations) fall back to the unchanged legacy path - Skip agentic streaming wrapper entirely when no callback overrides `async_should_run_agentic_loop`; the wrapper buffered every chunk and rebuilt the SSE response only to call hooks that all return `(False, {})` — a pure no-op for the default config - Serialize request body once (`json.dumps`) for both the pre-call log input and the wire, instead of twice; avoids a full O(payload) scan per request, significant for long-context Claude Code histories - Add fast path in `async_streaming_data_generator` that bypasses the per-chunk `async_post_call_streaming_hook` coroutine await, response-string materialization, and cost-injection call when no callback/guardrail/cost-injection is active (the default config) - Resolve `_DD_STREAMING_TRACE_ENABLED` once at import time; eliminate per-chunk `NullSpan` context manager allocation when Datadog tracing is disabled (the default) - Memoize `get_type_hints(AnthropicMessagesRequestOptionalParams)` with `@lru_cache(maxsize=1)` — resolves once per process instead of once per `/v1/messages` request (~80µs each) - Hoist `cost_injection_active` out of the per-chunk loop in `chunk_processor`; eliminates repeated `getattr` + endpoint-type checks on every streamed byte chunk - Extract `_build_passthrough_logging_result` from `_route_streaming_logging_to_handler` as a standalone static method to facilitate future off-loop dispatch - Convert `async_sse_data_generator` from an `async for: yield` trampoline to a direct return of the underlying generator, removing one async-generator layer per 
streamed chunk - Skip redundant `strip_empty_text_blocks_from_anthropic_messages` scan in `anthropic_messages_handler` when the async wrapper already sanitized (signalled via `_litellm_messages_presanitized` sentinel, popped before reaching provider params) - Gate debug log `f-string` evaluation behind `isEnabledFor(DEBUG)` in both the streaming generator and the transformation layer to avoid serializing entire message payloads on every request at non-debug log levels - Add benchmark script (`scripts/benchmark_anthropic_messages_perf.py`) with a local mock Anthropic SSE provider for reproducible TTFT and TPM measurement across commits/branches - Add parity tests asserting fast-path and legacy-path produce byte-identical logged/billed payloads, plus unit tests for agentic hook detection, pre-serialized body reuse, and memoized key resolution * perf: address greptile review for anthropic streaming hot path - Bail to legacy in `_collapse_pure_text_chunks` when content_block_delta events from different block indexes are observed without an intervening flush. Anthropic sends blocks strictly sequentially, but defensive bail prevents silent text-merging if the protocol ever interleaves. - Replace leaf-class `__dict__` check for `async_post_call_streaming_hook` in `_callback_capabilities` with a function-identity comparison that walks the MRO. A vendor base class can carry the override and the registered class can add nothing else; before this PR the hook was unconditionally invoked, so an inherited-override miss would silently drop the hook on the streaming path. - Add unit tests for both behaviors. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(mypy): narrow model_name to str in cost-injection branch The hoisted cost_injection_active flag in chunk_processor encodes the `bool(model_name)` requirement but mypy can't track that invariant through the local, so the per-chunk `_process_chunk_with_cost_injection( chunk, model_name)` calls flagged Optional[str] vs str. Pin a typed non-None local inside the cost-injection branch so mypy narrows correctly without changing runtime behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>
… management endpoints (#28681) * test(proxy): phase-4 payload behavior pinning for tier-2/3 key + team management endpoints Extends the Phase 1–3 behavior-pin suite at tests/proxy_behavior/management/ with a second axis: payload-shape pinning. Phase 1–3 held payload minimal and pinned (actor, target) → status across 37 routes; Phase 4 holds the caller fixed at an authorized actor, varies the payload shape, and asserts the observable DB effect (on accept) or the named guard / row-unchanged (on reject). Faithfulness contract from Phase 1–3 is unchanged. Six families + one gap-closer (59 new scenarios, 620 → 679 total): * F1 — key budget / rate-limit (test_key_budget_limits.py, 18) * F2 — key↔team reassignment (test_key_team_change.py, 6) * F3 — team budget / rate-limit (test_team_budget_limits.py, 15) * F4 — member-info validation (test_team_member_info_validation.py, 5) * F5 — permission batching (test_team_permissions_bulk_update.py, 6) * F6 — org-scoped team access (+2 detail-string pins in existing files) * F7 — coverage gap-closer (test_f7_coverage_closeout.py, 7) Harness extensions in conftest.py (additive only): * create_scratch_org() seeder with its own scratch-prefixed budget row * budget / limit fields on create_scratch_team() * scratch teardown also sweeps litellm_organizationtable Coverage telemetry (behavior-suite-only): * key_management_endpoints.py 60 % → 65 % (+82 lines) * team_endpoints.py 62 % → 72 % (+137 lines, crosses 70 % stretch) Key lands under 70 % per plan §7 escape hatch — the gap is dominated by routes outside F1–F6 scope (key list/info v2 internals) and structurally dead org-budget guards (call sites at lines 889 + 2310 + 985 + 1751 load the org without include_budget_table=True, so org.litellm_budget_table is None at guard time and the aggregate guard no-ops). Pinned as observed no-op behavior so a future fix that flips the flag turns these into reds. 
Zero source-code changes; pyproject.toml diff is empty; test_route_coverage.py stays green untouched; G3 grep guards still green; local wall-time 14 s for the full suite (no coverage), 22 s with coverage. G4 regression-replay protocol executed against three representative fix-PR parents (410ce76, 0bd49ec, 8bbc61e): all Phase 4 tests PASS at pre-fix SHAs — confirming the F1–F7 layer is a helper-body pin, not a regression-replay layer for those specific historical bypass shapes. Targeted RED-bait scenarios for each fix are left for a follow-up PR. * test(proxy): push key_management_endpoints.py past the 70% stretch (F7-extension) Adds 24 more payload-pin scenarios in test_f7_key_coverage_push.py following the same accepted-effect / rejected-guard pattern. Each scenario cites the file:line range it pins; same anti-snapshot rules apply. Target ranges (all reachable via HTTP-boundary payload variation): * 5942-6063 /key/health with metadata.logging → test_key_logging body * 4565-4692 /key/reset_spend happy + 404 + non-admin gate + value validation * 4421-4533 /key/regenerate ghost-404 + happy + new_key + grace_period * 4168-4202 _insert_deprecated_key body via grace_period * 6118-6133 _enforce_unique_key_alias duplicate-alias rejection * 6148-6169 validate_model_max_budget malformed-payload rejection * 4708-4789 validate_key_list_check user/team/org/key_hash branches * 2622-2733 /key/bulk_update mixed success/failure + admin gate + size limits * 2797-2950 /team/key/bulk_update all-keys path + explicit-keys dedupe + 404 * 5108-5207 /key/aliases admin + scoped + search-filter branches * 3253-3303 /key/info ghost + explicit-key + no-key-uses-auth-header * 3427-3436 generate_key_helper_fn budget_limits initialization * 1794-1815 prepare_key_update_data duration + budget_duration paths * 5280-5388 _build_filter_conditions across include_created_by_keys/team/sort/alias Coverage telemetry — full PR4 dataset: key_management_endpoints.py: 60 % → 71 % (+11 pts, +194 lines) 
team_endpoints.py: 62 % → 72 % (+10 pts, +137 lines) Both files now over the plan §7 PR4.M4 70 % stretch as a side effect of pinning real payload behavior. 721 tests pass in 19 s local (full suite, no coverage); 27 s with coverage. Zero source-code changes; pyproject.toml diff still empty; test_route_coverage.py + G3 grep guards still green. Honest finding (kept from the prior commit's body): four structurally-dead org-budget guards remain pinned as observed no-op behavior — they fire only when get_org_object is called with include_budget_table=True, which none of the four management-endpoint call sites currently do. Pinned so a future change that flips the flag turns these into reds. Two helper guards are honest-ceiling: _validate_reset_spend_value's isinstance check at line 4568 is unreachable from HTTP because Pydantic 422s non-float before the helper runs; same shape for /team/key/bulk_update's missing team_id / no-selector pre-handler guards. * test(proxy): address PR review — try/finally cleanup + loosen 500 envelope pins + Optional annotations Greptile review feedback on PR #28681: 1. Wrap manual budget-row cleanup in try/finally so an assertion failure doesn't leave non-scratch-prefixed budget rows orphaned across CI re-runs (test_team_new_with_team_member_budget_creates_budget_row and test_team_update_team_member_budget_upserts). 2. Loosen the two 500-status pins to in (400, 422, 500) — the named-guard substring is the real pin; the outer ValueError-wrap envelope is an implementation detail that a future improvement should be free to fix to a proper 400/422 without flipping these tests red. 3. Add missing Optional annotations on _seed_token's max_budget / metadata / team_id keyword args (they default to None). Greptile's typo flag on 'read-world' in the conftest comment is declined — 'read-world' is the project's established term for the immutable seeded world fixture (see other usages in conftest.py and actors.py). 721 tests still pass in 17 s.
…) (#28378) * feat(prometheus): emit per-token-type detail metrics (LIT-3220) (#28372) Adds five sparse counter metrics that break out the token detail fields providers already report in `usage.prompt_tokens_details` and `usage.completion_tokens_details`: - litellm_input_cached_tokens_metric (provider prompt-cache reads) - litellm_input_cache_creation_tokens_metric (Anthropic prompt-cache writes) - litellm_input_audio_tokens_metric (audio input tokens) - litellm_output_reasoning_tokens_metric (reasoning tokens) - litellm_output_audio_tokens_metric (audio output tokens) These are additive — existing input/output/total counters are unchanged, so no dashboards break. Each new counter is only incremented when the underlying detail is populated and > 0, keeping scrape output sparse for providers that don't report a given field. Data is read from the canonical Usage dict that `get_standard_logging_object_payload` already attaches at `standard_logging_payload["metadata"]["usage_object"]`, so no new plumbing through the logging pipeline is required. Tests: 10 new unit tests covering registration, label-set parity, all-types increment, zero/None/negative skip behaviour, and the no-metadata/no-usage_object no-op paths. Closes LIT-3220 Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Krrish Dholakia <krrishdholakia@berri.ai> Co-authored-by: Claude <noreply@anthropic.com> * chore: remove proof folder image --------- Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Krrish Dholakia <krrishdholakia@berri.ai> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
…8405) * fix(otel): stamp http.response.status_code on all error responses httpx.HTTPStatusError exposes status under .response.status_code, not as a top-level attr, so unified-endpoint 5xx failures left the SERVER span without a status. The admin hooks only wrote a child span and never stamped or ended the parent at all, so admin 4xx/5xx (and success) responses were invisible to dashboards. Adds a fallback to .response.status_code in get_error_information, and ends the parent SERVER span in async_management_endpoint_{success,failure}_hook with the same _record_exception_on_span helper the unified path uses. Resolves LIT-3193 * test(otel): exercise httpx.HTTPStatusError through admin path Pins the contract that get_error_information's response.status_code fallback is reachable from any entry point — without this, a future refactor that bypasses _record_exception_on_span in the admin hooks could regress for httpx-wrapped exceptions while the unified suite still passes. * chore(otel): trim verbose comments in LIT-3193 changes Tighten docstrings and remove redundant section dividers/inline narration. Behavior is unchanged. * fix(otel): set span.status on management hook parent SERVER span Mirror the unified failure path: stamp StatusCode.ERROR on the parent SERVER span before recording the exception, and StatusCode.OK before ending it on success. Without this, OTEL backends filtering on span status (the idiomatic primitive) miss admin-endpoint failures even though the http.response.status_code attribute is correct. Extend assert_server_span_attrs to assert span.status.status_code matches the expected outcome so the gap can't regress. * fix(otel): close SERVER span on body-validation and unhandled errors Stash the SERVER span on request.state in auth so FastAPI exception handlers can finish it for failures that occur after auth but before the route handler (e.g. /model/new TypeError, /key/generate RequestValidationError). 
Without this, those requests left dangling spans missing http.response.status_code. Resolves LIT-3193 * fix(otel): generic 500 body, log exception details server-side Don't leak str(exc) and type(exc).__name__ to clients on uncaught exceptions. The full traceback is logged via verbose_proxy_logger and the SERVER span still gets http.response.status_code=500. Resolves LIT-3193 * fix(otel): stamp http.response.status_code on every SERVER span path Closes three remaining gaps where the proxy SERVER span ended without the http.response.status_code attribute: 1. ProxyException raised from _read_request_body (e.g. invalid JSON body) bubbled out of user_api_key_auth before the SERVER span was created, so the FastAPI handler had nothing to close and the trace never reached the backend. Hoist the span creation to a new idempotent _ensure_parent_otel_span_on_request_state helper called at the top of user_api_key_auth; wire openai_exception_handler to close the dangling span. Covers /v1/chat/completions, /v1/messages, /v1/responses (shared handler). 2. /v1/responses success — _handle_success ends the proxy span before async_post_call_success_hook fires on this path, so the hook's set_response_status_code_attribute(200) silently no-op'd against an ended span. Stamp 200 + set OK status at the close site in _handle_success / _end_proxy_span_from_kwargs via a shared _close_proxy_span_ok helper, so the attribute lands regardless of which success hook runs first. 3. Failure path for exceptions without code/status_code (e.g. a bare TypeError surfacing through _handle_llm_api_exception) — empty error_information.error_code → _record_exception_on_span skips the stamp → the hook ends the span. Default to 500 in async_post_call_failure_hook so the attribute is always set. Resolves LIT-3193
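As a rough illustration of the status-code resolution these commits describe (top-level attribute first, then the `.response.status_code` fallback used by httpx-style errors, then a default of 500), here is a minimal sketch. `extract_status_code` is a hypothetical name, not the proxy's `get_error_information`, and the lookup order is an assumption.

```python
def extract_status_code(exc: BaseException, default: int = 500) -> int:
    """Best-effort HTTP status for an exception, falling back to `default`."""
    response = getattr(exc, "response", None)
    candidates = (
        getattr(exc, "status_code", None),       # top-level attribute
        getattr(exc, "code", None),              # some exceptions expose `code`
        getattr(response, "status_code", None),  # httpx.HTTPStatusError-style nesting
    )
    for candidate in candidates:
        try:
            return int(candidate)
        except (TypeError, ValueError):
            # None or a non-numeric code such as "invalid_request": try the next one.
            continue
    return default
```

Defaulting at the end means a bare `TypeError` still produces a value to stamp on the span, which is the gap the last commit closes.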
* fix(helm): drop main- prefix from default image tag
The default image tag in the deployment + migrations-job templates was
`main-{{ .Chart.AppVersion }}`. The current release pipeline publishes
content tags without the `main-` prefix (e.g. `v1.85.1` / `1.85.1`,
`v1.86.0-rc.1` / `1.86.0-rc.1`), so the rendered ref points at a tag
that does not exist on GHCR or DockerHub and installs fail with
ImagePullBackOff.
- templates/deployment.yaml, templates/migrations-job.yaml: render
`.Chart.AppVersion` directly instead of `main-<AppVersion>`.
- Chart.yaml: bump stale `appVersion: v1.80.12` (not on either
registry) to `v1.85.1` so local-checkout installs also resolve.
- values.yaml: update the commented tag-override hint to match.
* fix(helm): use :latest in tag override example, not pinned version
Per review: ghcr.io/berriai/litellm-database:latest is a floating
alias for the most recent stable (same digest as :main-stable),
maintained by the release pipeline's UPDATE_LATEST advance step.
Better example than a pinned version that goes stale.
The schema in test_aaamodel_prices_and_context_window_json_is_valid uses additionalProperties: false. The azure/speech/azure-stt entry added in #27482 introduced an audio_transcription_config field that the schema did not whitelist, so the test fails on every branch built on top of staging. Add the field as a string property.
…8683) * fix(team): refresh team cache on team_model_add/delete (LIT-3244) team_model_add and team_model_delete wrote to the DB but did not invalidate the in-memory LiteLLM_TeamTableCachedObj used by common_checks. After the v1.83.14 common_checks centralization made team.models authoritative on /v1/files and /v1/vector_stores/*, adding a Team-BYOK model silently failed to grant the new public model name to team members until the cache TTL expired (and a removed model kept working until then on the symmetric path). Extract the cache-refresh snippet from update_team into a small helper and apply it consistently at all three team-write sites. * test: also assert updated models in team-cache-refresh pin Strengthens the LIT-3244 regression test to also assert `call_kwargs["team_table"].models` matches the updated row, not just `team_id`. Both `existing_team` and `updated_team` share `team_id` in the test setup, so the previous assertion would have passed even if the implementation accidentally cached the pre-mutation row. Greptile review feedback. * fix(team): hydrate object_permission on cache-refreshing team updates The Prisma update calls in update_team, team_model_add, and team_model_delete returned a team row with object_permission_id set but object_permission=None (the relation was not requested via include=). _refresh_cached_team then wrote that to the in-memory LiteLLM_TeamTableCachedObj, and the cache-hit path in get_team_object returns the cached object without re-hydrating. Downstream consumers (validate_key_search_tools_against_team, the MCP/agent authz paths) treat a missing object_permission as no team-level restriction, so a team-write op silently dropped object-permission enforcement until the cache TTL expired or a DB-fetch path re-hydrated it. Add include={"object_permission": True} to all three updates so the refresh writes a complete cached team. 
Extend the LIT-3244 regression test to pin both the cached object_permission and the include shape on the Prisma call. Surfaced in PR review of LIT-3244.
… Anthropic (#28723) `getProviderModels()` matched a model into a provider's dropdown when the model's `litellm_provider` string *contained* the provider key as a substring. The intent was to admit suffix variants (e.g. `anthropic_text`, `bedrock_converse`), but the substring check is too loose: it also pulls in unrelated providers whose name happens to contain the key, most visibly `vertex_ai-anthropic_models` matching `anthropic` and `vertex_ai-openai_models` matching `openai`. Replace `.includes()` with separator-anchored prefix matching (`startsWith(provider + "_")` / `startsWith(provider + "-")`). All legitimate variants in `model_prices_and_context_window.json` still match (`anthropic_text`, `azure_text`, `azure_ai`, `bedrock_converse`, `bedrock_mantle`, `cohere_chat`, `fireworks_ai-embedding-models`, `vertex_ai-*`, `vertex_ai_beta`), and the cross-provider leak is closed. Tests: update one assertion that pinned the buggy substring behavior (`custom_openai_endpoint` matching `openai` — not a real provider value); add 6 new tests covering the leak regressions and the variant-preservation contract for vertex_ai/bedrock/fireworks.
Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
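The separator-anchored matching rule can be sketched as below. The dashboard code is TypeScript; this is a Python transliteration for illustration, `matches_provider` is a hypothetical name, and accepting an exact match in addition to the two prefixes is an assumption.

```python
def matches_provider(litellm_provider: str, provider: str) -> bool:
    """True when `litellm_provider` is `provider` itself or a `_`/`-` suffixed variant."""
    return (
        litellm_provider == provider  # assumed: exact match is always admitted
        or litellm_provider.startswith(provider + "_")
        or litellm_provider.startswith(provider + "-")
    )
```

Under this rule `anthropic_text` still lands under `anthropic`, while `vertex_ai-anthropic_models` only matches `vertex_ai`.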
…rs and signed request body (#27526) * Fix Bedrock KB pass-through SigV4 headers and signed body Coerce botocore HeadersDict to a dict for pass-through routes. When forward_headers is true, drop request headers that collide case-insensitively with signed headers so client Bearer auth does not shadow AWS SigV4. Send prepped.body as raw content so the outbound payload matches the signature after logging hooks mutate the parsed dict. Co-authored-by: Cursor <cursoragent@cursor.com> * Simplify pass-through raw body handling Read the SigV4-signed bytes directly from request.state inside pass_through_request instead of threading a custom_raw_body argument through three functions. Helper methods are restored to their original signatures, and the new branch lives in one place at each httpx call site. Co-authored-by: Cursor <cursoragent@cursor.com> * Harden pass-through raw body read from request.state Guard missing request.state (test fixtures) and ignore non-bytes/str values so MagicMock does not trigger the SigV4 raw-body path. Co-authored-by: Cursor <cursoragent@cursor.com> * Test pass_through_request state_raw_body uses httpx content= Cover non-streaming (async_client.request) and streaming (build_request) paths so SigV4 bytes on request.state are not replaced by json= of a hook-mutated dict. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com>
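The case-insensitive collision rule for forwarded headers might look like the following sketch. `merge_forwarded_headers` is a hypothetical helper and the header values in the usage note are made up; the only behavior taken from the commit is that a client header is dropped when its name collides, ignoring case, with a signed header.

```python
from typing import Dict, Mapping


def merge_forwarded_headers(
    client_headers: Mapping[str, str], signed_headers: Mapping[str, str]
) -> Dict[str, str]:
    """Forward client headers, but never let them shadow a SigV4-signed header."""
    signed = dict(signed_headers)  # coerce any mapping type to a plain dict
    signed_names = {name.lower() for name in signed}
    merged = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in signed_names
    }
    merged.update(signed)
    return merged
```

For example, a client `authorization: Bearer ...` header is discarded when the signed set carries `Authorization`, so only the SigV4 value goes upstream.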
* chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214
The original account (888602223428) was put under a security restriction by
AWS after a root access key leaked in a PR comment. While that account works
its way through the AWS Support unlock process, Bedrock-touching CI tests have
been migrated to a fresh account (941277531214).
Changes:
- Replace 26 hardcoded references to 888602223428 with 941277531214 across
8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime
ARNs, batch execution role ARN, and example proxy config).
- The provisioned-model and imported-model ARNs are referenced only from
mocked unit tests — no AWS resources to recreate.
- The batch execution IAM role has been recreated in the new account with
the same name and equivalent permissions.
- The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC,
hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account
under the same names — see tools/agentcore-deploy/ in a follow-up.
CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME
were updated separately via the CircleCI API to point at the new account.
Smoke-tested locally against the new account:
aws bedrock-runtime converse --region us-west-2 \
--model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
--messages '[{"role":"user","content":[{"text":"ping"}]}]'
→ 200, model returned 'pong'
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes
The first migration commit replaced just the account ID, but AgentCore
auto-assigns a random 10-char suffix to every runtime on creation — we
can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the
new account. Updated the AgentCore-runtime ARNs in the three files that
reference real runtime IDs (not the mock-based unit-test ARNs).
Deployed runtimes:
arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp
arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy
Both runtimes are status=READY and pass a smoke invoke:
$ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}'
→ 200, {"result": "echo: ping"}
The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the
deploy artifacts). Tests that only verify the SDK wiring will pass; if any
test asserts on agent output content, swap the echo for the real agent.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* chore(tests): point Bedrock batch tests at new-account S3 bucket
The account migration (888602223428 -> 941277531214) was a flat
account-ID swap, which only rewrites ARNs that embed the account
number. S3 bucket names carry no account ID, so the live Bedrock
batch tests still uploaded to `litellm-proxy` — a bucket that lives
in the old account. S3 names are globally unique, and the old account
still holds that name, so it can't be recreated in the new account.
Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees
global uniqueness). The bucket must be created in 941277531214 and the
batch execution role granted s3:GetObject/PutObject/ListBucket on it
before this job is run in CI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(tests): point live S3 logging test at new-account bucket
Same account-ID-free blind spot as the batch bucket: `load-testing-oct`
lives in the old account and its name can't be reused globally. The
`logging_testing` CI job is wired into the workflow and runs
test_basic_s3_logging, which uploads to this bucket with the CI env
creds, then lists and deletes objects — a live dependency.
Rename to `load-testing-oct-941277531214`. The bucket must exist in the
new account with the CI IAM principal granted
s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* chore(tests): repoint Bedrock guardrail IDs to new-account guardrails
The migration left guardrail IDs untouched (no account ID in them), so
all live guardrail tests failed with "guardrail identifier or version
does not exist" against 941277531214. Recreated both guardrails in the
new account and updated the hardcoded IDs:
- wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD,
with explicit inputAction=ANONYMIZE so masking applies to INPUT,
which is the source litellm's moderation hook sends)
- ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set
to the exact string the tests assert on)
Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the
guardrailConfig in test_bedrock_completion.py. Verified locally: the 5
previously-failing guardrail tests now pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(bedrock): migrate legacy models to current inference profiles
The new CI account (941277531214) cannot invoke legacy Bedrock models
(AWS gates them: "marked by provider as Legacy... not actively using in
the last 30 days"). Migrated the live-call tests:
- anthropic.claude-3-sonnet-20240229 -> us.anthropic.claude-sonnet-4-5-20250929-v1:0
- anthropic.claude-3-haiku-20240307 -> us.anthropic.claude-haiku-4-5-20251001-v1:0
Current Claude models on Bedrock require the us. inference-profile prefix
(bare on-demand ids are rejected).
cohere.command-r-plus has no working replacement (all Cohere is legacy-
gated in the new account): swapped to claude-haiku-4-5 in provider-
agnostic param lists. amazon.titan-image-generator skipped (no working
replacement). Mocked/transformation/cost tests that reference the legacy
strings are intentionally left unchanged. Verified live against the new
account.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(bedrock): repoint SageMaker + Knowledge Base to new-account resources
These referenced account-scoped resources by hardcoded id that only
existed in the old account, so the migration's account-ID swap missed
them. Recreated in 941277531214 and repointed:
- SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614
-> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge)
- Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless
vector store + titan-embed-text-v2, seeded with a LiteLLM doc)
Verified live: test_sagemaker.py (12 passed) and
test_bedrock_knowledgebase_hook.py (12 passed).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214)
claude-opus-4-7 is listed in the new Bedrock CI account's foundation
models but invoke is denied (AccessDeniedException: "not available for
this account"). Bedrock access to the flagship Opus requires an AWS
Sales request, not the self-serve model-access toggle, so it can't be
enabled inline with the rest of the account migration.
Add an optional `skip_reason` to ModelEntry and set it on the
bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip.
Cell count (231) and route coverage are unchanged, so the structural
asserts still pass. Restore coverage by deleting the one skip_reason
line once access is granted.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(bedrock): swap/skip legacy-gated models unavailable on new CI account
The migrated AWS account (941277531214) cannot access several models that
the old account could, so the remaining red CI jobs were hitting real
Bedrock "Access denied / Legacy" and "account not authorized" errors:
- image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is
legacy-gated), matching the existing titan skip.
- batches: skip test_async_file_and_batch (Bedrock batch inference is not
authorized on the new account; requires an AWS support case).
- litellm_overhead: swap legacy claude-3-5-haiku for the active
us.anthropic.claude-haiku-4-5 inference profile.
- test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the
active us.anthropic.claude-sonnet-4-5 inference profile.
https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa
* test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account
- e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference
is not authorized on account 941277531214) and migrate the missed
s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214.
- build_and_test: swap legacy bedrock claude-3-sonnet for the active
us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured
output e2e test.
https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa
* test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791)
Replace the silent skips added for the new CI account with noisier behavior:
- reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present)
instead of skipping, so the missing entitlement stays visible in CI; they
still skip when AWS creds are absent (local dev)
- Bedrock batch inference tests: drop the skip so they run and fail until
batch access is granted
- Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the
transform + cost-tracking path stays under test without live model access
https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT
Co-authored-by: Claude <noreply@anthropic.com>
* test(bedrock): use pytest.xfail for known-failing opus-4-7 cells
Replace pytest.fail with pytest.xfail when a model has a fail_reason,
so known-broken cells stay visible as XFAIL without keeping CI red.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
…http_request (#28794) Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local>
* chore(proxy): route path-dependent call sites through get_request_route Replace direct ``request.url.path`` reads in auth, ACL, routing, and audit-log decisions with ``get_request_route(request)`` — the helper already added in ``auth/auth_utils.py`` that returns the ASGI ``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs ``url.path`` from the Host header; ``scope["path"]`` is uvicorn's parse of the request line and matches what FastAPI dispatches on, so it's the authoritative route for any decision that should agree with the actual handler. Sites: - _experimental/mcp_server/auth/user_api_key_auth_mcp.py - management_endpoints/mcp_management_endpoints.py - vector_store_endpoints/utils.py - pass_through_endpoints/pass_through_endpoints.py - auth/route_checks.py - litellm_pre_call_utils.py - spend_tracking/spend_management_endpoints.py - common_utils/http_parsing_utils.py - management_helpers/utils.py - health_endpoints/_health_endpoints.py Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py that construct a Request with scope["path"] set to a benign route and the Host header crafted so url.path would resolve differently; each site's decision is asserted against scope["path"]. * chore(proxy): make get_request_route imports lazy at call sites Move the ``from litellm.proxy.auth.auth_utils import get_request_route`` imports added in the prior commit back to the function bodies that use them. The module-level form participates in a long-standing import cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL on the PR; the lazy form matches the pattern the proxy already uses for ``user_api_key_auth`` and related helpers elsewhere in these files. 
Also drop the ``RouteChecks._is_assistants_api_request`` delegation in ``_get_metadata_variable_name`` introduced in the prior commit — the delegation pulled ``RouteChecks`` into the same cycle, and the call site reuses the resolved route for its other branches, so inlining the substring check is both cycle-free and avoids a redundant second ``get_request_route`` call. Comment in test_proxy_routes.py acknowledges that the two MCP table entries exercise ``get_request_route`` directly rather than the full production handler (which needs ASGI scope + MCP state to invoke). --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
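For readers unfamiliar with the ASGI fields involved, a minimal sketch of deriving the route from `scope["path"]` with `root_path` stripped is shown below. It operates on a plain dict and is not the proxy's `get_request_route`; `route_from_scope` is a hypothetical name, and the boundary check on the prefix is this sketch's own choice.

```python
from typing import Any, Mapping


def route_from_scope(scope: Mapping[str, Any]) -> str:
    """Return scope["path"] with a leading root_path removed, if present."""
    path = scope.get("path", "") or ""
    root_path = (scope.get("root_path", "") or "").rstrip("/")
    if not root_path:
        return path
    if path == root_path:
        return "/"
    # Strip only on a segment boundary so "/proxyfoo" is not treated as "/proxy".
    if path.startswith(root_path + "/"):
        return path[len(root_path):]
    return path
```

Because the value never passes through the Host header, a crafted Host cannot change which route the auth and ACL checks see.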
* feat(dashboard): navbar hierarchy + Agent Platform notifications (#27543) * feat(dashboard): refine navbar zones and Agent Platform notice Restructure the admin navbar for production users: clear product vs community vs personal columns with vertical dividers, icon-only Slack/GitHub in a shared chip, and Docs/Blog typography aligned on an 8px rhythm. Add a notifications bell with popover linking to the LiteLLM Agent Platform repo and optional mark-as-read persistence. Promote the account control with initials avatar, single-line display name, and navDisplayName mapping for placeholder user ids (e.g. default_user_id). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(dashboard): address PR review — AntD buttons, public page guard, dedupe regex - Replace raw <button> with AntD Button in BlogDropdown, NotificationsBell, UserDropdown, and test mock - Guard NotificationsBell + container behind !isPublicPage to avoid rendering on public pages - Remove redundant equality checks in navDisplayName (regex already covers them) - Remove unused `lower` variable after simplification Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * fix(dashboard): drop dead useHealthReadiness import in navbar The module was removed in #27896 (replaced by useHealthReadinessDetails), but the import survived the rebase. The symbol is unused — only useHealthReadinessDetails is consumed in the file. Removing the dead import unblocks the UI TypeScript build. * fix(dashboard): align CommunityEngagementButtons test with icon-only aria-labels The component was refactored to an icon-only chip with aria-label='LiteLLM on GitHub' (squash #27543), but the test still asserted /star us on github/i. Update the query to match the rendered accessible name. 
* refactor(dashboard): drop unused props from NavbarProps
The navbar refactor moved user identity + dark-mode state to internal hooks (useAuthorized, useWorker), but the NavbarProps interface still declared userID, userEmail, userRole, premiumUser, isDarkMode, and toggleDarkMode as required, forcing every caller to thread them through. Drop them from the interface and all four call sites (page.tsx, (dashboard)/layout.tsx, public_model_hub.tsx, navbar.test.tsx). Also shrinks the destructure in layout.tsx so the now-unused locals stop being pulled out of useAuthorized().
* refactor(dashboard): use useSyncExternalStore for NotificationsBell dismiss flag
Reads/writes of the litellmHideAgentPlatformBanner key were done directly inside NotificationsBell via a useEffect + useState pair. Every other localStorage-backed flag in the dashboard (DisableShowPrompts, DisableBouncingIcon, DisableShowNewBadge, DisableUsageIndicator, DisableBlogPosts) is wrapped in a useSyncExternalStore hook over localStorageUtils so all mounted components stay in sync. Extract useHideAgentPlatformBanner to follow the same shape, swap NotificationsBell to consume it, and add a regression test that two sibling bells stay in sync without a remount when one is dismissed.
* refactor: mask credential fields in proxy settings GET responses (#28682)
* refactor: mask credential fields in proxy settings GET responses
Brings SSO settings, cache settings, and the email/Slack alerting view in /get/config/callbacks in line with the HashiCorp Vault config-override pattern, so persisted credentials are not transported back to the UI in plaintext.
* refactor: harden short-value masking and hoist alerting var constant
Closes two review observations:
- mask_sensitive_keys now replaces short values (below the visible prefix+suffix length) with an all-mask string instead of returning them unchanged, so a 1-7 character credential is no longer round-tripped verbatim.
- _ALERTING_SENSITIVE_VARS is moved out of get_config() to a module-level constant, matching the analogous _SSO_SENSITIVE_FIELDS and _CACHE_SENSITIVE_FIELDS in the SSO and cache endpoint files. --------- Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
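The short-value hardening can be illustrated with the sketch below. It is not the proxy's `mask_sensitive_keys`: `mask_value` is a hypothetical helper, the 4+4 visible window is an assumption inferred from the "1-7 character" wording, and fully masking a value whose length exactly equals the window is this sketch's own (stricter) choice.

```python
def mask_value(
    value: str, visible_prefix: int = 4, visible_suffix: int = 4, mask_char: str = "*"
) -> str:
    """Mask the middle of a credential; fully mask values too short to hide anything."""
    if len(value) <= visible_prefix + visible_suffix:
        # Too short to reveal a prefix and suffix without exposing the whole secret.
        return mask_char * len(value)
    hidden = len(value) - visible_prefix - visible_suffix
    suffix = value[len(value) - visible_suffix:]  # safe even when visible_suffix is 0
    return value[:visible_prefix] + mask_char * hidden + suffix
```

The point of the first branch is that a short credential is never returned unchanged, which is the behavior the review observation asked for.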
…28442) * feat(proxy): allow llm_api_routes virtual keys to list MCP servers Add a new `mcp_discovery_routes` group (GET /v1/mcp/server and GET /v1/mcp/server/{server_id}) and include it in `llm_api_routes` so that virtual keys configured with `allowed_routes=["llm_api_routes"]` can discover the MCP servers they have access to. Previously these calls failed with 'Virtual key is not allowed to call this route. Only allowed to call routes: [llm_api_routes]'. The GET handlers already sanitize the response for restricted virtual keys via `_sanitize_mcp_server_list_for_virtual_key`, stripping credential-bearing fields (url, headers, env). Write methods (POST/PUT/DELETE) on the same paths remain gated by the existing handler-level admin role checks. The new discovery list is intentionally kept OUT of `mcp_inference_routes`, so `is_llm_api_route()` still returns False for these paths — this preserves the existing contract that DISABLE_LLM_API_ENDPOINTS must not block the Admin UI from listing MCP servers. Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com> * refactor(proxy): make MCP discovery carve-out method-aware Replace the `mcp_discovery_routes` group in `llm_api_routes` with a method-aware special case inside `is_virtual_key_allowed_to_call_route`. Virtual keys with allowed_routes=["llm_api_routes"] are now permitted to call only GET /v1/mcp/server and GET /v1/mcp/server/{server_id} — non-GET methods and multi-segment admin sub-paths fall through to the existing 403. This keeps the general llm_api_routes list free of management paths and avoids accidentally exposing POST/PUT/DELETE writes through the route-check layer. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com>
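A method-aware carve-out of the kind the second commit describes could be sketched like this. The function name and regex are hypothetical and do not reproduce the body of `is_virtual_key_allowed_to_call_route`; only the two GET paths come from the commit message.

```python
import re

# Matches /v1/mcp/server and /v1/mcp/server/{server_id}, but no deeper sub-paths.
_MCP_DISCOVERY_PATH = re.compile(r"/v1/mcp/server(?:/[^/]+)?")


def is_mcp_discovery_request(method: str, route: str) -> bool:
    """True only for read-only MCP server discovery calls."""
    return method.upper() == "GET" and _MCP_DISCOVERY_PATH.fullmatch(route) is not None
```

Anything that fails this check (writes, or multi-segment admin sub-paths) would fall through to the existing route checks and their 403.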
…#28737)

* fix(team): keep team_alias cache in sync on _cache_team_object writes

_cache_team_object wrote only to the team_id:<id> cache key, but the JWT auth path that uses team_alias_jwt_field reads from a separate team_alias:<alias> key (get_team_object_by_alias caches under both keys on miss, but reads only the alias-keyed one). After any team-mutation endpoint (team_model_add, team_model_delete, update_team, the two access-group writes) the team_id cache was refreshed but the team_alias cache stayed stale until TTL, so JWT callers using team_alias_jwt_field kept seeing the pre-mutation team for the full cache window.

Mirror the write under the alias key inside _cache_team_object so every existing caller stays in sync without further changes. Skip the alias write when team_alias is None/empty so we don't collide across alias-less teams.

Surfaced while testing the LIT-3244 cherry-pick on patch/1.86.0: the LIT-3244 fix correctly invalidated the team_id cache but the customer's JWT used team_alias_jwt_field, so they kept hitting the stale alias-keyed entry.

* fix(team): delete (not overwrite) team_alias cache on _cache_team_object

The prior shape of this PR wrote both team_id:<id> AND team_alias:<alias> from _cache_team_object. team_alias is NOT unique in the schema (no @unique on LiteLLM_TeamTable.team_alias), and get_team_object_by_alias enforces uniqueness on its own DB-fetch path (len(teams) > 1 raises). Writing the alias-keyed cache from the generic refresh path bypassed that check: a team admin renaming their team to collide with another team's alias could silently overwrite the cached team for JWT-by-alias auth, swapping the resolved team under that alias for the cache window.

Switch the alias-keyed operation from a write to a delete (mirroring the dual-cache delete pattern in _delete_cache_key_object). After every team write, the next JWT-by-alias reader cache-misses and falls through to get_team_object_by_alias, which (a) re-fetches the fresh team from DB, closing the LIT-3244 staleness gap that motivated this PR, and (b) enforces alias uniqueness before populating either cache key. team_id:<id> writes are unchanged: team_id is the table PK and is guaranteed unique.

Surfaced in veria-ai review on #28739.

* fix(managed-files): anchor model_id regex so it doesn't match llm_output_file_model_id

extract_model_id_from_unified_id used `re.search(r"model_id,([^;]+)", ...)` which substring-matches the `model_id,` inside the file-ID encoding's `llm_output_file_model_id,<deployment_uuid>` field. parse_unified_id then fed that deployment UUID back into the auth path as a model candidate via _extract_models_from_managed_resource_id, and every team-BYOK file attach 403'd with:

    team not allowed to access model. This team can only access models=['openai/*']. Tried to access <deployment-uuid>

The team's models list correctly contains the public name (`openai/*`) that target_model_names matches, but the bogus UUID candidate fails the wildcard check first.

Anchor the regex to a field boundary (`(?:^|;)model_id,`) so it matches the legitimate top-level `model_id,<value>` field on vector_store unified IDs and skips substring matches inside other fields. File-IDs (which have no top-level `model_id` field) now return None and contribute no spurious UUID candidate.

Surfaced while reproducing LIT-3244 on patch/1.86.0 with the customer's exact flow: team with openai/* BYOK deployment, JWT-scoped user, POST /v1/vector_stores/{id}/files attaching a file uploaded with target_model_names=openai/gpt-4o.
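The difference between the unanchored and anchored patterns can be seen in a small sketch. The two regexes are taken from the commit message; the decoded ID strings and the `extract` helper are hypothetical stand-ins (the real unified-ID encoding and `extract_model_id_from_unified_id` may differ in detail).

```python
import re
from typing import Optional

# Patterns quoted in the commit message.
UNANCHORED = re.compile(r"model_id,([^;]+)")
ANCHORED = re.compile(r"(?:^|;)model_id,([^;]+)")


def extract(pattern: "re.Pattern[str]", decoded_id: str) -> Optional[str]:
    """Hypothetical helper: return the captured model id, or None if absent."""
    match = pattern.search(decoded_id)
    return match.group(1) if match else None


# Assumed shapes of decoded IDs, for illustration only.
file_id = (
    "litellm_proxy:application/json;target_model_names,openai/gpt-4o;"
    "llm_output_file_model_id,deployment-uuid-123"
)
vector_store_id = "litellm_proxy:vector_store;model_id,my-model;resource_id,vs_1"

# The unanchored pattern picks up the deployment UUID from the file ID.
leaked = extract(UNANCHORED, file_id)

# The anchored pattern ignores it but still finds a top-level model_id field.
file_result = extract(ANCHORED, file_id)
vector_result = extract(ANCHORED, vector_store_id)
```

Under these assumed inputs, `leaked` holds the deployment UUID, which is the spurious candidate that failed the wildcard check, while `file_result` is `None` and `vector_result` is the legitimate model id.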
Creates litellm_oss_agent_shin_MM_DD_YYYY from main every day at 00:00 UTC. Lets us retarget oss-agent-shin fork PRs onto a canonical branch so CircleCI runs with secrets, without granting the agent write access. Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
* test(proxy): add harness for proxy_server.py behavior-pinning

Creates tests/test_litellm/proxy/proxy_server/ with:
- conftest.py: 11 shared fixtures (app, client, mock_prisma, auth_as, mock_router with parametrized response builders, normalize, etc.)
- _coverage_check.py: per-PR coverage gate (line + branch) against a baseline, self-selects target by inspecting which placeholder files have been filled
- _pin_check.py: AST-based gate that verifies every pin-list item has >=1 happy + >=1 error test with a real assertion (no status-only)
- test_harness_smoke.py: 19 smoke tests covering every fixture + both scripts end-to-end
- 26 placeholder test files (one docstring each) reserved for follow-up PRs per the directory ownership in the Notion plan
- .coverage_baseline pinned at 0% so future PRs measure deltas against new-tests-only and aren't entangled with the broader scattered test suite

Adds a dedicated proxy-server job to test-unit-proxy-endpoints.yml so this directory's runtime + coverage are tracked independently.

Plan: https://www.notion.so/36c43b8acdab81ee845fd5365128a2fc

* ci(proxy-endpoints): allow workflow_dispatch

Lets the workflow be triggered manually on a branch via `gh workflow run`, which is needed for the verify-first flow on workflow changes before opening a PR.

* test(proxy): address review feedback on proxy_server harness

- conftest.py: anchor sys.path insert to __file__ (Path(__file__).resolve().parents[4]) instead of CWD-relative os.path.abspath("../../../../") which resolved to the wrong directory when pytest is launched from the repo root.
- _coverage_check.py: actually read .coverage_baseline and use it as the floor (line_min = max(target, baseline)). Closes the gap between the PR description's "delta semantics" and what the script was doing. With baseline=0.0 today this is a no-op; future PRs that update the baseline cause regressions (test deletions etc.) to trip the gate even if the static PR target is still met.
- _pin_check.py: drop unreachable startswith("_") guard (test_*.py glob never yields underscore-prefixed names) and read each test file once instead of twice.
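The baseline-as-floor rule (`line_min = max(target, baseline)`) might look roughly like the sketch below. The function names and the assumption that `.coverage_baseline` holds a single percentage are illustrative only; the actual `_coverage_check.py` is not shown in this PR text.

```python
from pathlib import Path


def read_baseline(path: Path) -> float:
    """Return the stored baseline percentage, or 0.0 if the file is missing or empty.

    Assumes the file contains a single number such as "0.0" (hypothetical format).
    """
    if not path.exists():
        return 0.0
    text = path.read_text().strip()
    return float(text) if text else 0.0


def line_floor(target: float, baseline: float) -> float:
    """The gate's minimum line coverage: whichever of target and baseline is higher."""
    return max(target, baseline)


def gate_passes(measured: float, target: float, baseline: float) -> bool:
    """True when measured coverage meets the floor."""
    return measured >= line_floor(target, baseline)
```

With a baseline of 0.0 the floor is just the static target, which matches the "no-op today" note; once the baseline is raised above the target, a run that still meets the target but falls below the baseline fails the gate.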
…sidency (#28626)

* feat(openai): apply regional-processing cost uplift for EU/US data residency

OpenAI charges a 10% uplift on the latest GPT models when requests are served from a regionalized hostname (eu./us.api.openai.com). Infer the region from `api_base`, expose it on `kwargs["litellm_params"]["data_residency"]`, and multiply the computed cost by a per-model `regional_processing_uplift_multiplier_<region>` field.

https://claude.ai/code/session_012ebH44s7ohYxjoix5CXzTW

* test: allow regional_processing_uplift_multiplier_{eu,us} in model_prices schema

* fix(cost): tighten data_residency inference and restore model_cost in tests

- Only infer OpenAI data_residency when custom_llm_provider == "openai"; drop the implicit None fallback so non-OpenAI callers can't accidentally pick up a regional tag from a stray OpenAI hostname.
- _local_model_cost_map fixture now snapshots and restores litellm.model_cost and LITELLM_LOCAL_MODEL_COST_MAP so tests don't leak state across the session.

* refactor(openai): move data_residency helper under llms/openai

* fix: thread data_residency through realtime stream cost calculation

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(cost): thread data_residency through batch_cost_calculator

Apply the OpenAI regional-processing uplift multiplier to retrieve_batch cost paths so Batch API requests served via eu./us.api.openai.com are priced at the same uplifted token rates as completions/transcriptions.

* refactor(openai): encapsulate provider check inside infer_openai_data_residency

Move the custom_llm_provider == "openai" guard from get_litellm_params into the helper itself so the core utility no longer carries provider-specific dispatch logic. Callers pass through the provider unconditionally; the helper returns None for any non-OpenAI provider.

* fix(responses): thread data_residency through Responses logging params

The Responses API paths build their logging litellm_params dict after provider resolution but did not include data_residency, so cost calc saw None even when the effective api_base was a regional OpenAI host.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
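As a rough illustration of the inference-then-multiply flow described in these commits, here is a minimal sketch. The function names, signatures, and the host lookup table are assumptions made for this example and are not the actual `infer_openai_data_residency` helper; only the hostnames, the provider guard, and the `regional_processing_uplift_multiplier_<region>` field name come from the commit messages.

```python
from typing import Optional
from urllib.parse import urlparse

# Regional hostnames named in the commit message, mapped to a region tag.
_REGIONAL_HOSTS = {"eu.api.openai.com": "eu", "us.api.openai.com": "us"}


def sketch_infer_data_residency(
    api_base: Optional[str], custom_llm_provider: Optional[str]
) -> Optional[str]:
    """Return "eu"/"us" for a regional OpenAI api_base, else None.

    The provider guard lives inside the helper, so callers can pass the
    provider through unconditionally.
    """
    if custom_llm_provider != "openai" or not api_base:
        return None
    host = urlparse(api_base).hostname
    return _REGIONAL_HOSTS.get(host or "")


def sketch_apply_uplift(
    cost: float, model_info: dict, data_residency: Optional[str]
) -> float:
    """Multiply cost by the per-model regional multiplier, if one is configured."""
    if data_residency is None:
        return cost
    key = f"regional_processing_uplift_multiplier_{data_residency}"
    return cost * model_info.get(key, 1.0)


# Hypothetical model entry carrying a 10% EU uplift.
model_info = {"regional_processing_uplift_multiplier_eu": 1.1}
region = sketch_infer_data_residency("https://eu.api.openai.com/v1", "openai")
uplifted = sketch_apply_uplift(1.0, model_info, region)
```

Defaulting the multiplier to 1.0 keeps models without the field priced as before, which is one plausible reading of "per-model" in the commit text.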
2663fe9 into litellm_fix_openai_moderation_streaming_end_of_stream
This reverts commit 2663fe9.
Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.
Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.
Reviewed by Cursor Bugbot for commit c23b19f.
uv tool run --from 'coverage[toml]==7.10.6' coverage xml
- codecov/upload:
    file: ./coverage.xml
    flags: circleci
Codecov upload flags at wrong YAML indentation level
Low Severity
The flags: circleci line is indented at 10 spaces, while file: ./coverage.xml above it is at 12 spaces. Since orb step parameters must be at the same indentation level, flags ends up outside the codecov/upload: step's parameter mapping. The flag won't be associated with the upload, so the Codecov coverage report from CircleCI won't be tagged with the circleci flag as intended.
Bugbot Autofix determined this is a false positive.
Both file: and flags: are at the same 10-space indentation under - codecov/upload:, so flags: circleci is correctly registered as a step parameter; the report's claim of differing indentation (10 vs 12) does not match the actual file.


Relevant issues
Linear ticket
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- Tests are added in the `tests/test_litellm/` directory. Adding at least 1 test is a hard requirement.
- Unit tests pass with `make test-unit`.
- A review was requested from @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review.
Delays in PR merge?
If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).
CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Screenshots / Proof of Fix
Type
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test
Changes
Note
High Risk
Large architectural and deployment changes (route splitting, new images, DB URL/IAM/replica wiring, migration job ownership) affect how production connects and scales; incorrect allowlist or env assembly would break API routing or schema startup.
Overview
This merge brings in a split proxy architecture: new `gateway/` and `backend/` entrypoints reuse `proxy_server` but trim routes at lifespan startup via allowlists (LLM data plane vs dashboard/management API), with matching `helm/litellm` chart (gateway, backend, ui, ingress routing, pre-upgrade migrations Job, discrete `DATABASE_*` env + optional read replica / IAM / Redis cluster).

Containers and ops drop the separate health supervisord stack; main/database Dockerfiles are slimmed (Chainguard digest bumps, smaller Prisma cache copy). `deploy/charts/litellm-helm` gains read-replica env, optional HPA `behavior`, and drops `SEPARATE_HEALTH_APP` probing.

CI and quality: pytest `--cov=./litellm` everywhere, more CircleCI shards merged into Codecov (with `circleci` flag), per-shard Codecov flags on GHA; new mutation-test and daily oss-agent-shin workflows; CodeQL uploads filtered for OCI `x-content-sha256`; many legacy workflows/docs removed. E2E forwards `LITELLM_LICENSE` for premium UI tests.

Product fix: enterprise CheckBatchCost now creates managed file IDs for batch output/error files before DB writes. Smaller doc/README and `.gitignore` (Terraform) updates; several unused Docker/K8s/publish assets deleted.

Reviewed by Cursor Bugbot for commit c23b19f. Bugbot is set up for automated code reviews on this repo.