fix(guardrails): persist disable_global_guardrails on keys #29233
The per-key/team "Disable Global Guardrails" toggle silently stopped working after #17042, which removed `disable_global_guardrails` from the key/team request models and from the premium metadata allowlist. Without those, the UI's top-level field was dropped by pydantic and never folded into key `metadata`, so the runtime gate always read False and global default_on guardrails kept running. Restore the request-model fields (KeyRequestBase, NewTeamRequest, UpdateTeamRequest) and the `LiteLLM_ManagementEndpoint_MetadataFields_Premium` entry so the flag is promoted into metadata again. Because the key edit form always submits the flag (false by default), guard the UI so it is only sent when it actually changed (edit) or is enabled (create) — this keeps the premium gate on enabling intact while not 403-ing non-premium users who edit unrelated key fields, mirroring how guardrails/tags are already stripped.
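The failure mode described above can be sketched with plain dictionaries. This is a minimal illustration, not the LiteLLM implementation: `parse_request`, `fold_into_metadata`, and the contents of `PREMIUM_METADATA_FIELDS` are hypothetical stand-ins, and the sketch assumes the request model silently ignores fields it does not declare (pydantic's default behaviour for extra fields).

```python
from typing import Any, Dict, Optional, Set

# Hypothetical stand-in for the premium metadata allowlist named in the PR.
PREMIUM_METADATA_FIELDS = ("guardrails", "tags", "disable_global_guardrails")


def parse_request(raw: Dict[str, Any], declared_fields: Set[str]) -> Dict[str, Any]:
    """Mimic a request model that drops any field it does not declare."""
    return {k: v for k, v in raw.items() if k in declared_fields}


def fold_into_metadata(
    request_fields: Dict[str, Any],
    existing_metadata: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Copy allowlisted top-level fields into a metadata dict.

    None is treated as "not sent" and skipped; an explicit False is kept
    so that it can overwrite a previously stored True.
    """
    metadata = dict(existing_metadata or {})
    for field in PREMIUM_METADATA_FIELDS:
        value = request_fields.get(field)
        if value is not None:
            metadata[field] = value
    return metadata


raw = {"key_alias": "demo", "disable_global_guardrails": True}

# Field missing from the model: it never reaches metadata.
broken = fold_into_metadata(parse_request(raw, {"key_alias"}))

# Field declared again: it is promoted into metadata.
fixed = fold_into_metadata(
    parse_request(raw, {"key_alias", "disable_global_guardrails"})
)
```

In the broken case the metadata carries no flag, so any reader that defaults to False would keep running global guardrails, which matches the symptom the description reports.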
Greptile Summary
This PR re-adds …
Confidence Score: 5/5
Safe to merge: narrow, additive change restoring a previously broken field, covered by new tests for both enable and disable paths. All four changed files make backward-compatible additions. The Python types add an …
No files require special attention.
| Filename | Overview |
|---|---|
| litellm/proxy/_types.py | Adds disable_global_guardrails: Optional[bool] = None to KeyRequestBase, NewTeamRequest, UpdateTeamRequest, and LiteLLM_ManagementEndpoint_MetadataFields_Premium; consistent with the existing pattern for other premium metadata fields. |
| tests/proxy_admin_ui_tests/test_key_management.py | Adds two parametrized cases to test_prepare_metadata_fields for enabling and disabling the flag; the False case validates that the value correctly overrides existing metadata. Tests implicitly rely on premium_user=True in the test environment (no mock), matching the pre-existing dependency for tags/guardrails cases. |
| ui/litellm-dashboard/src/components/organisms/create_key_button.tsx | Strips disable_global_guardrails from the submit payload when falsy, correctly preventing non-premium users from triggering the server premium gate on key creation. |
| ui/litellm-dashboard/src/components/templates/key_info_view.tsx | Adds a 'send only when changed' guard using Boolean() coercion for disable_global_guardrails on key edit and updates the PREMIUM_METADATA_FIELDS comment to explain why boolean premium fields need separate handling. |
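The two UI guards summarized in the table can be modeled as small payload filters. The sketch below is written in Python rather than the dashboard's TypeScript, and the function names and argument shapes are assumptions made for illustration; only the rules themselves (strip when falsy on create, send only when changed on edit, compare after boolean coercion) come from the descriptions above.

```python
from typing import Any, Dict

FLAG = "disable_global_guardrails"


def build_create_payload(form_values: Dict[str, Any]) -> Dict[str, Any]:
    """On create, send the flag only when it is enabled."""
    payload = dict(form_values)
    if not payload.get(FLAG):
        payload.pop(FLAG, None)
    return payload


def build_edit_payload(
    form_values: Dict[str, Any], stored_metadata: Dict[str, Any]
) -> Dict[str, Any]:
    """On edit, send the flag only when it differs from the stored value.

    Both sides are coerced to bool so a missing stored value compares
    equal to the form's default of False.
    """
    payload = dict(form_values)
    if bool(payload.get(FLAG)) == bool(stored_metadata.get(FLAG)):
        payload.pop(FLAG, None)
    return payload


# A non-premium user renames a key; the untouched flag is not sent.
rename_only = build_edit_payload(
    {"key_alias": "renamed", FLAG: False}, stored_metadata={}
)

# An admin turns the toggle off again; the explicit False is sent.
toggle_off = build_edit_payload({FLAG: False}, stored_metadata={FLAG: True})
```

Under these rules an edit that leaves the toggle alone never includes a premium field, while turning the toggle off still sends an explicit `False` so the server can overwrite the stored `True`.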
Reviews (2): Last reviewed commit: "test(guardrails): cover disable_global_g..."
Codecov Report
✅ All modified and coverable lines are covered by tests.
…y premium field comment Add a prepare_metadata_fields case asserting `disable_global_guardrails: False` overwrites an existing `True`, and rewrite the PREMIUM_METADATA_FIELDS comment to explain why boolean premium fields are excluded from the empty-value strip loop.
@greptileai re review
9918a9c into litellm_internal_staging
* feat: add support for claude code goal mode for bedrock opus output config (BerriAI#28898) * feat: support goal mode for claude on bedrock * fix failing lint test * addressing greptile comments * fixing failed test * address greptile: copy output_config and warn on dropped converse format * fix(bedrock): skip redundant output_config normalization on Converse reasoning_effort path When reasoning_effort is mapped via _handle_reasoning_effort_parameter, the resulting output_config is already normalized via normalize_bedrock_opus_output_config_effort. Mark it as normalized so _prepare_request_params can skip the redundant call (and the associated get_model_info lookup) on every request. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(reasoning-effort-grid): reflect Bedrock opus-4-6 xhigh→max clamping * fix(bedrock): stop leaking output_config marker and message-content mutation * fix(bedrock): guard effort key access in normalize_bedrock_opus_output_config_effort Defensively check that 'effort' is a valid key in _BEDROCK_OUTPUT_CONFIG_EFFORT_ORDER before indexing, to prevent a KeyError if the hardcoded guard tuple ever drifts from the order dict's keys. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(bedrock): drop dead second clause in effort normalization guard The 'effort not in _BEDROCK_OUTPUT_CONFIG_EFFORT_ORDER' check is unreachable once 'effort not in ("xhigh", "max")' has been ruled out, since both literals are present in the order dict. Keep the literal membership check and let the dict lookups below speak for themselves. * fix(bedrock): clamp output_config.effort against ceiling for any known value The early return when effort was not 'xhigh'/'max' meant a ceiling of 'low' or 'medium' would silently forward an out-of-range value. Gate on the known effort ordering instead so the ceiling comparison runs for every recognized effort. 
* test(grid_spec): use _CAPS_OPUS_4_7 for non-Bedrock opus-4-6 entries claude-opus-4-6 now declares supports_xhigh_reasoning_effort in the model map, so production accepts xhigh on Azure AI and Vertex AI routes. Update those grid_spec entries to match production capabilities so expected() predicts 200 for xhigh instead of 400. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(grid_spec): revert xhigh caps for non-Bedrock opus-4-6 azure_ai/claude-opus-4-6 and vertex_ai/claude-opus-4-6 do not declare supports_xhigh_reasoning_effort in model_prices_and_context_window.json. Azure AI upstream rejects xhigh with HTTP 400 ("Supported levels: high, low, max, medium"). Restore _CAPS_4_6 so the grid predicts 400 for xhigh, matching production capabilities. * fix: stop advertising xhigh effort on Opus 4.5/4.6 Only Opus 4.7 supports the xhigh reasoning effort level. Remove the supports_xhigh_reasoning_effort flag from every Opus 4.5 and Opus 4.6 entry (direct Anthropic, Bedrock, and regional variants) in both model catalog files. On the direct Anthropic path there is no effort clamp, so flagging 4.5/4.6 as xhigh-capable caused litellm to forward xhigh to a model that rejects it (and made get_model_info misreport the capability). xhigh now correctly degrades to high / raises on those models. Bedrock graceful degradation for Claude Code goal mode is unaffected: it relies solely on the bedrock_output_config_effort_ceiling clamp (4.5->high, 4.6->max, 4.7->xhigh), which runs before validation, so xhigh requests to older Bedrock Opus models are still silently lowered rather than rejected. Update effort-gating tests to reflect that 4.5/4.6 no longer accept xhigh. * fix: clamp xhigh effort on Bedrock Invoke /v1/messages instead of rejecting Claude Code "goal mode" sends output_config.effort=xhigh over the Anthropic /v1/messages API, which routes Bedrock models through AmazonAnthropicClaudeMessagesConfig. 
That path validated effort against the model's native capability and raised 400 for xhigh on Opus 4.6, while the chat-completions paths (Converse + Invoke) already clamp xhigh to the model's bedrock_output_config_effort_ceiling. That asymmetry broke goal mode on the exact API surface Claude Code uses. Apply the same ceiling clamp on the messages path before the shared effort gate runs, so xhigh degrades to max on Opus 4.6 (and stays xhigh on 4.7). Scoped to adaptive-thinking models and to models that declare a ceiling, so Sonnet 4.6 (no ceiling) and Opus 4.5 (budget mode) are unaffected and still reject xhigh. * fix(bedrock): preserve user output_config when applying reasoning_effort - Converse path: merge mapped effort into existing output_config via setdefault instead of overwriting it, matching the Anthropic Messages path. Prevents user-supplied output_config.format from being silently dropped when reasoning_effort is also provided. - tests: clear _get_local_model_cost_map lru_cache in the autouse fixture alongside get_bedrock_response_stream_shape to avoid stale cache leakage between tests. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(bedrock): pre-clamp reasoning_effort for chat invoke; correct test caps - Add _clamp_adaptive_reasoning_effort_for_bedrock to AmazonAnthropicClaudeConfig so raw reasoning_effort=xhigh degrades to the model's bedrock effort ceiling before AnthropicConfig.map_openai_params converts it to output_config. Mirrors converse path (_handle_reasoning_effort_parameter) and messages path (_clamp_adaptive_reasoning_effort_for_bedrock) so the three Bedrock paths are consistent. - grid_spec: restore caps=_CAPS_4_6 for Bedrock converse/invoke Opus 4.6 entries so the test reflects the model's actual JSON capabilities. Teach expected() to bypass the xhigh/max cap check when bedrock_effort_ceiling will clamp the wire effort, so the test still passes for Bedrock's graceful degradation contract without lying about native model caps. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: Dennis Henry <dennis.henry@okta.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> * feat(guardrails): wire apply_guardrail into proxy logging callbacks (BerriAI#28970) * feat(guardrails): wire apply_guardrail into proxy logging callbacks Route /apply_guardrail through pre/post proxy hooks and LiteLLM success/failure handlers so Langfuse and OTEL integrations receive input/output on guardrail-only requests. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(guardrails): fix Greptile review comments on apply_guardrail logging Co-authored-by: Cursor <cursoragent@cursor.com> * fix(apply_guardrail): preserve original exception and capture modified response - Capture return value from post_call_success_hook so callback-modified responses propagate to the caller. - Wrap success/failure logging calls in defensive try/except so logging infrastructure failures don't replace the user-visible response or mask the original guardrail exception. Co-authored-by: Yassin Kortam <yassin@berri.ai> * Fix mypy * fix(apply_guardrail): isolate failure logging and use post-hook response for logging - Split async_failure_handler and post_call_failure_hook into independent try/except blocks so a callback bug in one does not silently skip the other. - Build response_for_logging inside _emit_guardrail_success_logs after post_call_success_hook runs, so logged data matches the response the caller actually receives when the hook modifies the response. 
Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(apply_guardrail): fix black formatting and update tests for fastapi_request param - Run black on guardrail_endpoints.py to fix CI formatting check - Add _mock_proxy_logging() helper to enterprise guardrail tests to patch proxy-server globals imported at call time - Pass fastapi_request=Mock() in all direct apply_guardrail test calls to match updated function signature Co-authored-by: Cursor <cursoragent@cursor.com> * fix(guardrails): use transformed exception from post_call_failure_hook in apply_guardrail Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(guardrails): isolate sync/async logging handlers in apply_guardrail Separate each logging handler call into its own try/except so a failure in the async handler does not silently skip the sync handler submission (and vice versa). Matches the docstring's defensive intent. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(apply_guardrail): guard transformed_exception with isinstance check Co-authored-by: Cursor <cursoragent@cursor.com> * test(guardrails): mock proxy globals in not_found test and share apply_guardrail logging fixture - Add proxy-server global mocks to test_apply_guardrail_not_found so the failure-path post_call_failure_hook call doesn't touch the real proxy logging singleton. - Extract the duplicated _mock_proxy_logging context manager out of the two enterprise apply_guardrail test files into a shared conftest fixture so the helper stays in one place. 
* fix(guardrails): use update_messages to keep logging obj in sync Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> * chore(ci): merge dev brach (BerriAI#29192) * build(deps): bump next from 16.2.4 to 16.2.6 in /ui/litellm-dashboard (BerriAI#27665) Bumps [next](https://github.com/vercel/next.js) from 16.2.4 to 16.2.6. - [Release notes](https://github.com/vercel/next.js/releases) - [Changelog](https://github.com/vercel/next.js/blob/canary/release.js) - [Commits](vercel/next.js@v16.2.4...v16.2.6) --- updated-dependencies: - dependency-name: next dependency-version: 16.2.6 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps): bump protobufjs in /tests/pass_through_tests (BerriAI#28296) Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.5.6 to 7.6.0. - [Release notes](https://github.com/protobufjs/protobuf.js/releases) - [Changelog](https://github.com/protobufjs/protobuf.js/blob/protobufjs-v7.6.0/CHANGELOG.md) - [Commits](protobufjs/protobuf.js@protobufjs-v7.5.6...protobufjs-v7.6.0) --- updated-dependencies: - dependency-name: protobufjs dependency-version: 7.6.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps): bump ws from 8.20.0 to 8.20.1 in /tests/pass_through_tests (BerriAI#28303) Bumps [ws](https://github.com/websockets/ws) from 8.20.0 to 8.20.1. - [Release notes](https://github.com/websockets/ws/releases) - [Commits](websockets/ws@8.20.0...8.20.1) --- updated-dependencies: - dependency-name: ws dependency-version: 8.20.1 dependency-type: indirect ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix: improve bedrock streaming hot path perf (BerriAI#28720) * fix(proxy): enforce tag budgets for key-level tags (BerriAI#29108) * fix(proxy): enforce tag budgets for key-level tags Merge API key metadata.tags into request_data before _tag_max_budget_check so per-tag budgets apply when tags are set on the key at creation time. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(auth): avoid false reject for key-inherited tags Run reject_clientside_metadata_tags before key-tag injection, then inject key metadata tags immediately before tag budget checks so key tags still enforce budgets without being treated as client-supplied tags. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * fix(vertex-ai): use DB credentials in video handlers + implement Veo video edit (BerriAI#29098) * fix(vertex-ai): pass litellm_params to validate_environment in video handlers and implement video edit for Veo - Pass litellm_params to validate_environment in 11 video handler call sites (remix, create_character, get_character, edit, extension, delete) so DB-stored Vertex AI credentials are used instead of falling back to ADC - Implement transform_video_edit_request/response for VertexAI: fetches source video via fetchPredictOperation then submits a new predictLongRunning request with the video bytes/gcsUri + edit prompt Co-authored-by: Cursor <cursoragent@cursor.com> * fix(vertex-ai): hoist fetchPredictOperation into handlers to avoid blocking event loop - Add get_video_edit_prefetch_params() to BaseVideoConfig (returns None) - VertexAI overrides it to return the fetchPredictOperation URL/body - Both sync and async video_edit handlers call this and use 
their shared httpx client for the fetch, passing the result as prefetched_source_data - transform_video_edit_request is now a pure transform with no HTTP calls - Fix extra_body.pop() mutation by working on a shallow copy Co-authored-by: Cursor <cursoragent@cursor.com> * fix(vertex-ai): include prefetch call inside _handle_error try/except block Co-authored-by: Cursor <cursoragent@cursor.com> * fix(videos): add prefetched_source_data param to all transform_video_edit_request overrides Co-authored-by: Cursor <cursoragent@cursor.com> * fix(video_edit): keep transform/pre_call outside try so validation errors propagate Move transform_video_edit_request and logging_obj.pre_call outside the try/except that wraps HTTP calls in (async_)video_edit_handler so that ValueError validation errors (e.g. 'source video not complete yet') are not silently wrapped as 500s by _handle_error. The prefetch HTTP call keeps its own try/except so its errors are still mapped through the provider's error handler. Matches the pattern used by video_extension_handler and video_remix_handler. Co-authored-by: Yassin Kortam <yassin@berri.ai> * refactor(vertex_ai): delegate get_video_edit_prefetch_params to status retrieve Co-authored-by: Yassin Kortam <yassin@berri.ai> * Fix varia review * fix(video_edit): route transform errors through _handle_error Wrap transform_video_edit_request and pre_call in the same try/except as the HTTP call in sync and async handlers so validation failures (e.g. source video not complete) return typed LiteLLM exceptions. 
Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(datadog): drain cost-management queue + opt-in FinOps tag allowlist (BerriAI#28487) * fix(datadog): drain cost-management queue + opt-in FinOps tag allowlist * fix(datadog): guard non-dict callback_specific_params + log empty aggregation * fix(datadog): block user-controlled tags from overwriting reserved cost-attribution dimensions * fix(datadog): cast metadata to dict[str, Any] to satisfy mypy * feat(helm): split per-component ServiceAccounts for gateway, backend, and UI (BerriAI#28712) * feat(helm): split per-component ServiceAccounts for gateway, backend, and UI Replace the single shared serviceAccount with three separate serviceAccounts (gateway, backend, ui) so operators can attach different IRSA / Workload Identity annotations per component without granting data-plane credentials to the UI pod. Key changes: - values.yaml: rename serviceAccount → serviceAccounts with gateway/backend/ui sub-keys; UI defaults to automount: false - _helpers.tpl: replace litellm.serviceAccountName with three component-scoped helpers (litellm.gateway/backend/ui.serviceAccountName) - serviceaccount.yaml: create up to three separate ServiceAccount objects with component labels and per-SA automountServiceAccountToken - gateway/backend deployments: use their respective SA helpers - ui deployment: use litellm.ui.serviceAccountName + explicit automountServiceAccountToken: false on the pod spec so the projected token is absent even when the SA itself allows it - migrations-job: share the backend SA (both need DB write access) Resolves LIT-3171 https://claude.ai/code/session_01QPy362WnjmEpeNuJaPUqmF * fix(helm): enforce automountServiceAccountToken on all pod specs; fix leading --- in serviceaccount.yaml - gateway/backend deployments: add explicit automountServiceAccountToken on the pod spec so serviceAccounts.*.automount is 
honoured regardless of whether the SA is chart-created or operator-supplied (previously the flag only took effect on the SA object when create: true, creating an asymmetry with the UI which already enforced it at pod-spec level) - serviceaccount.yaml: use a $prev sentinel to emit --- only between documents, preventing a leading --- when gateway SA is skipped but backend or ui SA is created (avoids lint/GitOps warnings from strict YAML parsers and tools like ArgoCD) https://claude.ai/code/session_01QPy362WnjmEpeNuJaPUqmF --------- Co-authored-by: Claude <noreply@anthropic.com> * bump deps (BerriAI#29208) (BerriAI#29226) * fix(deps): bump vulnerable proxy dependencies (starlette/fastapi, granian, pyarrow, semantic-router) Resolve known CVEs flagged by osv-scanner/grype against uv.lock. All bumped versions verified to resolve, install, and pass the proxy auth/route/middleware unit suites (717 tests) plus an import smoke on the new stack. - starlette 0.50.0 -> 1.1.0 (CVE-2026-48710 "BadHost", GHSA-86qp-5c8j-p5mr): versions <1.0.1 reconstruct request.url from the unvalidated Host header, poisoning request.url.path. Required raising fastapi 0.124.4 -> 0.136.3, which dropped fastapi's starlette<0.51.0 cap; an explicit starlette>=1.0.1 floor blocks regression to a vulnerable transitive resolution. The proxy's own auth already reads scope["path"] via get_request_route, but the locked starlette still flagged in container scanners and left other request.url consumers exposed. - granian 2.5.7 -> 2.7.4 (CVE-2026-42544, unauthenticated DoS via WebSocket subprotocol header panic; CVE-2026-42545, WSGI response-header-panic DoS). granian is a selectable proxy server (proxy_cli). - pyarrow 22.0.0 -> 23.0.1 (CVE-2026-25087 / PYSEC-2026-113). - semantic-router 0.1.12 -> 0.1.15: 0.1.12 was yanked (CVE-2026-42208 — its unbounded litellm pin could resolve a credential-exfiltrating litellm==1.82.8 wheel). 
Not fixable by bump: diskcache 5.6.3 (CVE-2025-69872, unsafe pickle deserialization) has no upstream fix and is left pinned; exploiting it requires write access to the local cache directory. Relock side effect: sse-starlette 3.4.2 -> 3.4.4. * deps: relax exact pins in optional extras to compatible ranges The proxy/optional extras exact-pinned every dependency, which (1) forces downstream `pip install litellm[proxy]` consumers into version lockstep and (2) blocks them from pulling transitive security patches without forking — the structural cause behind needing a litellm release to clear the starlette CVE in the previous commit. Convert the ordinary extras deps to `>=current,<next_major` ranges, mirroring the core [project].dependencies style. Reproducibility for litellm's own Docker/CI is unaffected: images install via `uv sync --frozen`, and the lock re-resolves to the identical versions (no locked version changed). Kept exact-pinned: - litellm-proxy-extras, litellm-enterprise — litellm's own sub-packages, versioned in lockstep with the release. - opentelemetry-api/sdk/exporter-otlp — must resolve to matching versions. - grpcio — supply-chain-pinned to a vetted, aged release. Also corrects the stale comment claiming the extras are exact-pinned for Docker reproducibility (the images use the lock, not these pins). * fix(ci): resolve license-check lookup version from the floor for ranged deps check_licenses.py derived the PyPI lookup version with `next(iter(req.specifier))`, which returns an arbitrary specifier clause. For a range like `>=0.12.1,<1.0` it picked the upper bound (`1.0`) — a version that doesn't exist on PyPI — so the license lookup 404'd and the package was flagged as having an unknown license. The previous commit's switch from exact pins to ranges exposed this for soundfile, pyroscope-io, redisvl, diskcache, and mlflow (the ranged deps not already in liccheck.ini's allowlist). 
Prefer a lower-bound/exact version (a real released version) for the lookup. * fix(proxy): set strict_content_type=False on the FastAPI app Starlette 1.0 / FastAPI 0.13x flipped the default to strict_content_type=True, which refuses to parse a JSON request body when the client omits the Content-Type header. The proxy previously accepted those requests, so the fastapi/starlette bump in this PR would silently break clients that don't send a Content-Type. Restore the prior lenient behavior explicitly. Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> * fix(tests/vcr): mint Google OAuth tokens live to prevent stale-token replay (BerriAI#29229) The Redis-backed VCR layer was recording and replaying the Google OAuth2/STS token-mint call. The replayed ya29.* access token is long-expired, but its recorded expires_in keeps credentials.expired False, so litellm never refreshes it and sends the stale token to a live Vertex/Gemini endpoint, which returns 401 ACCESS_TOKEN_EXPIRED. This broke live partner-model tests whose completion call is not itself cassette-backed (e.g. test_vertex_ai_llama_tool_calling). Force credential-exchange hosts to pass through live (never recorded, never replayed) by returning None from before_record_request, mirroring the existing telemetry passthrough, so a fresh token is minted each run. Regression from BerriAI#28826, which added OAuth-token matcher tolerance plus TTL-refresh-on-read so a stale token episode matched and never expired. * chore(cookbook): bump Go directive to 1.26.3 in gollem example (BerriAI#29234) Updates the gollem_go_agent_framework example to the current Go release. Clears stale Go stdlib advisories reported by osv-scanner against the older 1.25.1 directive. No source changes; the single pinned dependency (gollem v0.1.0) is backward compatible. 
* chore(ci): bump version (BerriAI#29242) * bump: version 1.87.0 → 1.88.0 * uv lock * feat(anthropic): add Claude Opus 4.8 and prune reasoning-effort flags (BerriAI#29238) * feat(anthropic): add Claude Opus 4.8 and prune reasoning-effort flags Register claude-opus-4-8 across the anthropic/bedrock/vertex/azure cost-map entries, BEDROCK_CONVERSE_MODELS, and the setup-wizard provider list. Prune two reasoning-effort fields from the cost map: - Drop supports_minimal_reasoning_effort from the Claude fleet (58 entries). "minimal" is not a real Anthropic effort level (the API accepts only low/medium/high/xhigh/max), so LiteLLM degrades it to "low" regardless; the flag was inert and misleading on Anthropic. - Remove tool_use_system_prompt_tokens everywhere (103 entries). It is not in the ModelInfo type and is read by no production code. Update the affected config/schema tests; the reasoning-effort registry tests now assert the Claude fleet omits supports_minimal. * fix(anthropic): recognize output_config effort after minimal-flag prune Pruning supports_minimal_reasoning_effort from the Claude fleet removed the only "supports effort param" marker from 11 Opus 4.5 / mythos-preview map entries that lack supports_output_config. _model_supports_effort_param then returned False for them, so output_config was wrongly dropped under drop_params=True -- regressing test_anthropic_model_supports_effort_param_recognizes_supporting_models for claude-opus-4-5-20251101 and the mythos preview. - _model_supports_effort_param now treats supports_output_config as a sufficient signal, matching the bedrock-invoke call sites that already check supports_output_config OR a reasoning-effort flag. Shared map lookup extracted into _supports_model_capability. - Add supports_output_config: true to the 11 Opus 4.5 / mythos entries that lost their only marker, restoring prior effort-forwarding behavior without re-adding the inert minimal flag. 
* fix(ci): restore real Bedrock batch S3 bucket and role in oai_misc_config (BerriAI#29245) The OSS-staging sync (d52fbfb) overwrote the Bedrock batch model's s3_bucket_name and aws_batch_role_arn with public-safe placeholders (account 123456789012 / *_EXAMPLE role). The e2e_openai_endpoints CI job runs the proxy with AWS account 941277531214 credentials, so on file upload test_bedrock_batches_api failed with: NoSuchBucket: The specified bucket does not exist <BucketName>litellm-proxy-123456789012</BucketName> Restore the real resources that live in account 941277531214 (verified to exist) — the same values tests/batches_tests/test_bedrock_files_and_batches.py already references. Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> * fix(guardrails): persist disable_global_guardrails on keys (BerriAI#29233) * fix(guardrails): restore disable_global_guardrails persistence for keys The per-key/team "Disable Global Guardrails" toggle silently stopped working after BerriAI#17042, which removed `disable_global_guardrails` from the key/team request models and from the premium metadata allowlist. Without those, the UI's top-level field was dropped by pydantic and never folded into key `metadata`, so the runtime gate always read False and global default_on guardrails kept running. Restore the request-model fields (KeyRequestBase, NewTeamRequest, UpdateTeamRequest) and the `LiteLLM_ManagementEndpoint_MetadataFields_Premium` entry so the flag is promoted into metadata again. Because the key edit form always submits the flag (false by default), guard the UI so it is only sent when it actually changed (edit) or is enabled (create) — this keeps the premium gate on enabling intact while not 403-ing non-premium users who edit unrelated key fields, mirroring how guardrails/tags are already stripped. 
* test(guardrails): cover disable_global_guardrails toggle-off + clarify premium field comment Add a prepare_metadata_fields case asserting `disable_global_guardrails: False` overwrites an existing `True`, and rewrite the PREMIUM_METADATA_FIELDS comment to explain why boolean premium fields are excluded from the empty-value strip loop. * test(e2e): cover Team Admin view + member + key flows (BerriAI#29072) * test(e2e): cover Team Admin view + member + key flows Adds a new spec exercising the previously-uncovered team-admin manual-QA items: viewing all team keys (including other members'), adding a member, removing a member, and creating a team key with All Team Models. Also seeds a dedicated invitee user so the add-member test can run in parallel with the proxy-admin invite test without colliding on the team roster. * test(e2e): harden team-admin member specs per review feedback Address Greptile feedback on the Team Admin spec: - locate the delete action via getByTestId("delete-member") instead of the fragile svg/img .last() selector - match the seeded removable member by user_id (members_with_roles stores no email, so the roster renders user_id) - assert exact success-toast strings rather than broad regexes that could match unrelated "success" text * docs: hand-written CLAUDE.md; point GEMINI.md and AGENTS.md at it (BerriAI#29252) * docs: replace generated CLAUDE.md with hand-written guidance, remove AGENTS.md Swap the auto-generated CLAUDE.md for a concise hand-written version that captures how we actually want agents to work in this repo: minimal comments, simplicity first, meaningful tests with a high mutation kill rate, PRs based off litellm_internal_staging rather than main, and curl against a live proxy as proof of fix instead of pasted pytest output. Remove AGENTS.md so there is one source of truth for agent guidance. 
The customer and company name confidentiality policy, along with the MCP available_on_public_internet note, are carried over from the previous CLAUDE.md. * fix: further clarify communication guidelines * docs: point GEMINI.md at CLAUDE.md instead of duplicating guidance Replace the standalone GEMINI.md copy, which had already drifted from the new CLAUDE.md, with a one-line pointer so Gemini reads the same single source of truth. * docs: simplify PR template test checklist item Replace the rigid "at least 1 test is a hard requirement" checklist line with "I have added meaningful tests", which matches the testing guidance in CLAUDE.md, and tidy a comma into a semicolon in the scope-isolation item. * docs: point AGENTS.md at CLAUDE.md instead of deleting it Keep AGENTS.md so tools that read it still resolve guidance, but collapse it to the same one-line pointer to CLAUDE.md used by GEMINI.md, keeping a single source of truth. * fix: make AI-generated rules more concise * fix: spelling Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: make the .env usage more careful * docs: restore MCP available_on_public_internet note to CLAUDE.md The PR description states this note was carried over verbatim from the previous CLAUDE.md, but it was dropped in the rewrite. Restore it so the file matches the description and the team guidance is not lost. * docs: restore browser storage and CI supply-chain safety notes to CLAUDE.md These security-relevant rules were dropped in the rewrite. Restore the sessionStorage-over-localStorage (XSS) guidance and the CI supply-chain rules (no curl|bash, pin versions, verify checksums) so agents editing UI or CI code are still steered away from those pitfalls. 
* docs: move area-specific guidance into nested CLAUDE.md files The MCP, browser-storage, and CI supply-chain notes are scoped to particular parts of the tree, so move each into a nested CLAUDE.md that Claude Code loads on demand when those files are touched: the MCP note under the mcp_server gateway, the browser-storage rule under the UI dashboard, and the CI supply-chain rules under .circleci. Keeps the root CLAUDE.md focused on general guidance while the area notes surface where they are relevant. * docs: keep CI supply-chain note in root CLAUDE.md CI guidance applies beyond .circleci (it also covers downloads in GitHub workflows and any CI script), and CI work does not reliably touch a single subtree, so a nested file under .circleci would not surface it dependably. Keep it in the always-loaded root instead. The MCP and browser-storage notes stay nested where they map cleanly to one area of the tree. * fix: make it clear we prefer httpOnly * chore: make ci rule more concise * chore: make concise Fix formatting and punctuation in MCP note. 
* fix: don't include Claude attribution --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * fix: regenerate uv.lock to sync with pyproject.toml Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Dennis Henry <dennis.henry@okta.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Sameer Kankute <sameer@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: michelligabriele <gabriele.michelli@icloud.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Summary
Restores the per-key Disable Global Guardrails toggle, which silently stopped persisting after #17042: global `default_on` guardrails kept running even when an admin disabled them on a key. The fix re-adds the `disable_global_guardrails` field to the key/team request models and the premium metadata allowlist so the toggle is folded into key `metadata` again, and guards the key UI so it only sends the flag when it changed (edit) or is enabled (create). This keeps the premium gate on enabling intact without 403-ing non-premium users who edit unrelated key fields.
Screenshots
After
Test plan
- `tests/proxy_admin_ui_tests/test_key_management.py::test_prepare_metadata_fields`: asserts `disable_global_guardrails` is promoted into key metadata
- `tests/test_litellm/integrations/test_custom_guardrail.py -k disable_global`: runtime gate reads the flag from admin metadata only
- `tests/test_litellm/proxy/test_litellm_pre_call_utils.py`: 144 pass (key/team metadata propagation)
- `metadata.disable_global_guardrails` in `/key/info` and that a `default_on` guardrail is skipped

Resolves LIT-3416
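The runtime behaviour the test plan checks, a `default_on` guardrail being skipped only when admin-controlled metadata carries the flag, can be sketched as follows. The function and parameter names are hypothetical and the real gate in LiteLLM may be structured differently; in particular, treating key and team metadata as the "admin metadata" is an assumption of this sketch.

```python
from typing import Any, Dict, Optional

FLAG = "disable_global_guardrails"


def should_run_default_on_guardrail(
    key_metadata: Optional[Dict[str, Any]] = None,
    team_metadata: Optional[Dict[str, Any]] = None,
    request_metadata: Optional[Dict[str, Any]] = None,
) -> bool:
    """Decide whether a default_on guardrail runs for this request.

    Only admin-controlled metadata (assumed here to be key and team
    metadata) can disable it. request_metadata is accepted but ignored
    on purpose, so a caller cannot switch guardrails off from the
    request body.
    """
    for admin_metadata in (key_metadata, team_metadata):
        if admin_metadata and admin_metadata.get(FLAG) is True:
            return False
    return True


# Flag persisted on the key by an admin: the guardrail is skipped.
skipped = should_run_default_on_guardrail(key_metadata={FLAG: True})

# Flag supplied only by the client: the guardrail still runs.
still_runs = should_run_default_on_guardrail(request_metadata={FLAG: True})
```

Ignoring the request-supplied value is the design point here: if the flag could be set per request, any client could bypass guardrails that an operator marked as on by default.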