feat(ui): add Interactions API endpoint to playground with SSE streaming by Sameerlite · Pull Request #28156 · BerriAI/litellm

Sameerlite · 2026-05-18T11:07:03Z

Summary

Adds /v1beta/interactions as a selectable endpoint in the UI playground (EndpointType.INTERACTIONS)
New interactions_api.tsx module: sends stream: true, reads SSE response body, and streams content.delta text tokens to the chat UI in real time
Chat-compatible models are surfaced in the model selector for this endpoint

Test plan

Select /v1beta/interactions in the playground endpoint dropdown
Pick a Gemini model (e.g. gemini-2.5-flash) and send a message — response should stream in token by token
TestTransformRequest::test_stream_param_included_in_request_body — verifies the stream flag is correctly passed through the transformation layer

Note

Medium Risk
Adds a new playground endpoint and custom SSE parsing path, plus adjusts streaming event transformation ordering; bugs here could cause missing/duplicated tokens or broken streaming UX, but changes are localized to the playground and Interactions streaming adapter.

Overview
Adds /v1beta/interactions as a first-class Playground endpoint, wiring EndpointType.INTERACTIONS through model selection, input placeholders, and request dispatch.

Introduces makeInteractionsRequest() to POST Interactions requests with stream: true and incrementally parse SSE data: events, updating the chat UI from content.start/content.delta tokens while capturing the response model.
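
As a rough illustration of the SSE handling described above, here is a minimal Python sketch. The real client is the TypeScript module interactions_api.tsx, which is not reproduced here. The sketch assumes each `data:` line carries a JSON object with `event_type`, `delta`, and optional `model` keys, mirroring the fields the adapter sets in the diff; the function names `iter_sse_payloads` and `accumulate` are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON object from each `data:` line; skip everything else."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        try:
            payload = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # tolerate non-JSON frames instead of aborting the stream
        if isinstance(payload, dict):
            yield payload


def accumulate(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Return (full text, response model) for one streamed interaction."""
    text, model = "", None
    for event in iter_sse_payloads(lines):
        event_type = event.get("event_type")  # assumed key name
        if event_type in ("interaction.start", "interaction.complete"):
            model = event.get("model") or model
        elif event_type in ("content.start", "content.delta"):
            delta = event.get("delta") or {}
            # Only text deltas contribute to the displayed message.
            if delta.get("type") == "text":
                text += delta.get("text") or ""
    return text, model
```

Reading text from both content.start and content.delta matters because the adapter's fallback path can place the first token on content.start.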

Hardens the Responses→Interactions streaming adapter to handle ContentPartAddedEvent and out-of-order starts: ensures delta.type='text' is always present, avoids dropping the first token by queueing a content.start, and drains queued events in both sync/async iterators; expands tests to cover these ordering/token-preservation cases and verifies the stream flag is forwarded in Gemini request bodies.
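
The queue-and-drain behaviour can be sketched in isolation. This is a simplified stand-in, not the adapter itself: events are plain dicts, the upstream is a list of text deltas, and `BufferedAdapter` is a hypothetical name. It uses `collections.deque` where the diff uses a list with `pop(0)`.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator

Event = Dict[str, object]


class BufferedAdapter:
    """Turn a stream of text deltas into start/delta events without losing the first token."""

    def __init__(self, upstream: Iterable[str]) -> None:
        self._upstream: Iterator[str] = iter(upstream)
        self._pending: Deque[Event] = deque()
        self._started = False

    def _transform(self, delta_text: str) -> Event:
        if not self._started:
            self._started = True
            # One input must produce two outputs: return interaction.start now
            # and queue a content.start that carries the first token.
            self._pending.append(
                {"event_type": "content.start", "delta": {"type": "text", "text": delta_text}}
            )
            return {"event_type": "interaction.start"}
        return {"event_type": "content.delta", "delta": {"type": "text", "text": delta_text}}

    def __iter__(self) -> "BufferedAdapter":
        return self

    def __next__(self) -> Event:
        # Drain queued events before pulling the next upstream chunk.
        if self._pending:
            return self._pending.popleft()
        return self._transform(next(self._upstream))
```

Draining before pulling upstream is what keeps the queued content.start ahead of the second delta, and it also guarantees the queued event is emitted even if the upstream ends after a single chunk.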

Reviewed by Cursor Bugbot for commit 6cdf840. Bugbot is set up for automated code reviews on this repo.

Adds /v1beta/interactions as a selectable endpoint in the UI playground. Uses SSE streaming (stream=true) and parses content.delta events for real-time output.

Co-authored-by: Cursor <cursoragent@cursor.com>

codecov · 2026-05-18T11:10:42Z

Codecov Report

❌ Patch coverage is 81.25000% with 3 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...llm_responses_transformation/streaming_iterator.py	81.25%	3 Missing ⚠️

greptile-apps · 2026-05-18T11:11:13Z

Greptile Summary

This PR adds /v1beta/interactions as a first-class playground endpoint, wiring EndpointType.INTERACTIONS through model selection, input placeholders, and request dispatch in the UI, and introduces a custom SSE streaming client (interactions_api.tsx) that reads content.delta/content.start events to stream tokens to the chat UI in real time.

streaming_iterator.py: Adds ContentPartAddedEvent handling and a _pending_events queue so that when OutputTextDeltaEvent arrives before any start events, interaction.start is emitted immediately while content.start (carrying the first token) is queued and drained on the next iteration — preserving the first token rather than silently dropping it.
interactions_api.tsx: New SSE client POSTs with stream: true, parses data: lines, and updates the chat UI from both the native Gemini event shape and the LiteLLM bridge shape; captures the response model from interaction.start/interaction.complete for display.
Tests: Seven new TestStreamingIterator unit tests cover ContentPartAddedEvent paths, first-token preservation, and _pending_events queueing; two new TestTransformRequest tests cover stream flag forwarding in transform_request. All use mocks with no real network calls.

Confidence Score: 5/5

Changes are localized to the playground UI and the Interactions streaming adapter; the core request path is untouched and the new code is well-tested.

The streaming iterator logic is carefully implemented with a pending-events queue that correctly handles out-of-order start events in both sync and async paths. The seven new unit tests cover key edge cases including first-token preservation and fallback event ordering. No regressions to existing streaming paths are introduced.

No files require special attention.

Important Files Changed

Filename	Overview
litellm/interactions/litellm_responses_transformation/streaming_iterator.py	Adds ContentPartAddedEvent handling and a _pending_events queue to preserve the first text token; event ordering and drain logic are correct in both sync and async paths.
tests/test_litellm/interactions/test_gemini_interactions_transformation.py	Adds TestStreamingIterator (7 unit tests covering ContentPartAddedEvent, first-token preservation, and pending-events drain) and TestTransformRequest (stream flag forwarding); all tests are pure mocks with no real network calls.
ui/litellm-dashboard/src/components/playground/llm_calls/interactions_api.tsx	New SSE streaming client for /v1beta/interactions; correctly handles both native Gemini and bridge event shapes; stale comment on line 111 says content.start needs no UI action but the branch above it does update the UI for first-token events.
ui/litellm-dashboard/src/components/playground/chat_ui/ChatUI.tsx	Wires EndpointType.INTERACTIONS into model validation, request dispatch, model-selector filter, and input placeholder; changes are self-contained and follow the same pattern as RESPONSES/ANTHROPIC_MESSAGES.
ui/litellm-dashboard/src/components/playground/chat_ui/chatConstants.ts	Adds /v1beta/interactions to the endpoint dropdown; trivial one-line change.
ui/litellm-dashboard/src/components/playground/chat_ui/mode_endpoint_mapping.tsx	Adds INTERACTIONS to the EndpointType enum, consistent with the existing pattern for A2A_AGENTS, MCP, and REALTIME which also have no ModelMode mapping.

Reviews (9): Last reviewed commit: "chore(ui): remove unused InteractionOutp..."

…k via interactions API

Proxy endpoint was hardcoding custom_llm_provider="gemini" before routing, preventing non-Gemini models from using the litellm_responses bridge. Also reverts the UI Gemini-only model filter.

Co-authored-by: Cursor <cursoragent@cursor.com>

Two bugs in LiteLLMResponsesInteractionsStreamingIterator:

1. content.delta was emitted without "type":"text" in the delta dict, so the UI type check always failed and no tokens were displayed.
2. The first OutputTextDeltaEvent was silently dropped (it used to emit content.start with empty text); fixed by handling ResponsePartAddedEvent for content.start so text deltas go directly to content.delta.

Co-authored-by: Cursor <cursoragent@cursor.com>

Sameerlite · 2026-05-18T11:24:07Z

@greptile re review

…ents Co-authored-by: Yassin Kortam <yassin@berri.ai>

CLAassistant · 2026-05-18T11:34:58Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 3 committers have signed the CLA.

✅ Sameerlite
✅ mateo-berri
❌ cursoragent

The test_no_forced_gemini_provider_in_request_data check only asserted against dict literals it had just constructed, so it always passed and did not exercise the create_interaction endpoint. The endpoint deliberately defaults custom_llm_provider to gemini, so the assertion was also factually incorrect. Drop the misleading test.

Co-authored-by: Yassin Kortam <yassin@berri.ai>
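
To make the problem with the dropped test concrete, here is a small sketch contrasting a tautological check with one that exercises real code. `build_request_data` is a hypothetical stand-in for the endpoint logic, written only for this illustration; the one behaviour it copies from the commit message is that custom_llm_provider defaults to gemini.

```python
from typing import Optional


def build_request_data(model: str, provider: Optional[str] = None) -> dict:
    """Hypothetical stand-in for the endpoint's request-data construction."""
    return {"model": model, "custom_llm_provider": provider or "gemini"}


def tautological_check() -> None:
    # Asserts against a literal built on the previous line, so it can never
    # fail and says nothing about the code under test.
    data = {"model": "gpt-4o"}
    assert "custom_llm_provider" not in data


def meaningful_check() -> None:
    # Calls the function under test and asserts on what it returns.
    data = build_request_data("gemini-2.5-flash")
    assert data["custom_llm_provider"] == "gemini"
```

The first check would keep passing even if the default changed, which is the failure mode the commit message describes.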

…art ordering

- ResponsePartAddedEvent corresponds to reasoning summary parts, not text content parts. Use ContentPartAddedEvent, which is the event emitted before text output deltas (type response.content_part.added).
- Mirror the OutputTextDeltaEvent ordering guard: if interaction.start has not been sent yet, emit it first before content.start to honor the documented event ordering contract.

Co-authored-by: Yassin Kortam <yassin@berri.ai>
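
One way to check the ordering guard from the outside is a small validator over the emitted event types. This is only a sketch: the prerequisite table below is an assumption inferred from this PR (interaction.start before content.start, content.start before content.delta), not a copy of the documented contract, and `ordering_violations` is a hypothetical helper.

```python
from typing import Dict, Iterable, List

# Assumed prerequisites: each key must be preceded by its value at least once.
_REQUIRES: Dict[str, str] = {
    "content.start": "interaction.start",
    "content.delta": "content.start",
    "interaction.complete": "interaction.start",
}


def ordering_violations(event_types: Iterable[str]) -> List[str]:
    """Return a message for every event that arrives before its prerequisite."""
    seen = set()
    problems: List[str] = []
    for event_type in event_types:
        required = _REQUIRES.get(event_type)
        if required is not None and required not in seen:
            problems.append(f"{event_type} before {required}")
        seen.add(event_type)
    return problems
```

A test could collect `chunk.event_type` from the adapter for a given upstream sequence and assert that this function returns an empty list.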

mateo-berri · 2026-05-18T20:32:02Z

@greptileai

mateo-berri · 2026-05-18T20:42:06Z

@greptileai

…pt-realtime in OpenAI realtime guardrails test

VCR redis persister was raising UnicodeDecodeError on cached payloads that fail to UTF-8 decode (e.g. legacy entries written by another version of the persister), failing tests at fixture setup instead of degrading to a cache miss. Wrap decode+deserialize in a try/except so corrupt cache entries are treated as CassetteNotFoundError, surfacing the failure via the existing _record_cache_failure / VCRCassetteCacheWarning path.

OpenAI shut down gpt-4o-realtime-preview-2024-12-17 (and the entire gpt-4o-realtime-preview family) on 2026-05-07. The live realtime guardrails integration test now fails with model_not_found instead of receiving session.created. Point OPENAI_REALTIME_URL at the current GA model gpt-realtime, and relax the assertion in test_text_message_blocked_by_guardrail_no_ai_response to also accept the model's refusal to repeat the block message (gpt-realtime declines verbatim-repeat instructions, which is still a safe outcome since the original user message was blocked before reaching OpenAI). The BLOCKED_PHRASE leak check is preserved as a hard invariant.
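
The corrupt-entry handling described for the VCR redis persister can be sketched as follows. The persister code is not part of this page, so everything here is illustrative: `CassetteNotFoundError` and `CassetteCacheWarning` are local stand-ins for the names in the commit message, `load_cassette` is hypothetical, and JSON is assumed as the serialization format.

```python
import json
import warnings


class CassetteNotFoundError(FileNotFoundError):
    """Local stand-in for the cache-miss exception named in the commit message."""


class CassetteCacheWarning(UserWarning):
    """Local stand-in for the warning category used to surface cache failures."""


def load_cassette(raw: bytes) -> dict:
    """Decode and deserialize a cached payload, treating corruption as a cache miss."""
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError) as exc:
        # Surface the problem without failing fixture setup: warn, then report
        # a miss so the caller falls back to recording a fresh cassette.
        warnings.warn(f"corrupt cache entry ignored: {exc}", CassetteCacheWarning)
        raise CassetteNotFoundError("corrupt cache entry") from exc
```

Mapping both failure types to the existing miss exception means callers need no new error handling.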

mateo-berri · 2026-05-18T23:38:33Z

@greptileai

greptile-apps · 2026-05-18T23:42:55Z

@@ -92,24 +93,47 @@ def _transform_responses_chunk_to_interactions_chunk(
                    model=self.model,
                )


First text token silently dropped in OutputTextDeltaEvent double-fallback

When neither ResponseCreatedEvent nor ContentPartAddedEvent has arrived yet and OutputTextDeltaEvent is the very first event, delta_text is accumulated into collected_text and then interaction.start is returned — the text is never emitted. The next call hits the sent_content_start fallback and emits content.start with a different delta_text, so the very first streamed token is permanently absent from the SSE stream (though it does appear in the final interaction.complete.outputs). The _pending_interaction_complete pattern already in __next__/__anext__ shows how to buffer a deferred event; the same approach could save the unconsumed delta_text and re-emit it on the next iteration rather than swallowing it.

This fallback path is unreachable in practice. OutputTextDeltaEvent cannot be the first event observed by _transform_responses_chunk_to_interactions_chunk because both upstream sources guarantee ResponseCreatedEvent first:

OpenAI Responses API (native) — the documented streaming contract is response.created → response.in_progress → response.output_item.added → response.content_part.added → response.output_text.delta. The platform never emits a text delta before response.created.

Non-OpenAI providers bridged via litellm.responses() — LiteLLMCompletionStreamingIterator.__next__/__anext__ (litellm/responses/litellm_completion_transformation/streaming_iterator.py:996, 871) calls return_default_initial_events() at the top of every iteration before pulling a chat-completion chunk. That helper unconditionally emits ResponseCreatedEvent on the first call (lines 741-743) and ResponseInProgressEvent on the second, so the first chunk that downstream code ever sees is ResponseCreatedEvent — which sets sent_interaction_start=True here. OutputTextDeltaEvent only fires after _emit_output_item_added_for_message has also queued the ContentPartAddedEvent.

In both paths, by the time an OutputTextDeltaEvent reaches this transform, sent_interaction_start is already True, so the lines 84–94 branch is never taken. The interaction.start fallback exists purely as defensive code for a malformed-stream scenario the wrapping iterators won't produce.

Additionally, collected_text accumulates every delta and is emitted in full on interaction.complete.outputs[0].text (lines 174-176), so even in the hypothetical case the fallback fires, the complete output is still delivered to the client at stream end. The unit tests for the realistic event orderings (test_response_part_added_emits_content_start, test_first_text_delta_not_dropped_when_part_added_seen, test_content_delta_includes_type_field) cover the paths actually exercised in production.

…upstream models

OpenAI shut down the entire gpt-4o-realtime-preview family (including the undated alias) on 2026-05-07. The live realtime tests still connected with that dead alias and failed with messages_received=1 (an error event 'The model gpt-4o-realtime-preview does not exist' instead of session.created). Point the live OpenAI realtime tests at gpt-realtime, the current GA realtime model:

- test_openai_realtime_simple.py: get_model() -> gpt-realtime
- test_openai_realtime.py: test_openai_realtime_direct_call_no_intent and test_openai_realtime_direct_call_with_intent -> openai/gpt-realtime

Mocked unit tests (test_realtime_query_params_construction, test_realtime_query_params_use_normalized_model_name) are left as-is: they never hit the network and assert string plumbing only.

NVIDIA reached end-of-life for the hosted nvidia/llama-3.2-nv-rerankqa-1b-v2 rerank API on 2026-05-18 with no published replacement, so the live BaseLLMRerankTest.test_basic_rerank for nvidia_nim now returns HTTP 410 ('Gone'). NVIDIA's hosted catalog rotates on a schedule, so swapping in another live model would only defer the failure. Override test_basic_rerank in TestNvidiaNim to mock the sync/async HTTP transport (same pattern as test_nvidia_nim_rerank_ranking_endpoint in this file) and inject a fake NVIDIA_NIM_API_KEY via monkeypatch. The request/response transformation and cost calculation stay covered offline.

mateo-berri · 2026-05-18T23:45:02Z

@greptileai

The proxy callback-leak detector (test_check_num_callbacks_on_lowest_latency) was failing on this PR with 'abs(85 - 95) <= 4' — a bounded one-time registration jump caused by switching to latency-based-routing (+LowestLatencyLoggingHandler, +SlackAlerting). The count then plateaus under load, so this is pollution from the test's own config update, not a leak. Replace the brittle two-sample diff threshold with a sampler that settles past the deliberate config switch and only flags sustained monotonic per-type growth, with a terminal-burst confirmation pass for leaks that would otherwise escape the >=2-interval guard. Normalizes instance addresses so identical callbacks at different memory locations collapse, and names the leaking type on failure.
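
A minimal sketch of the sampling idea, under stated assumptions: callbacks are represented by their `repr` strings, the `settle` and `min_intervals` parameters are hypothetical names, and the terminal-burst confirmation pass is omitted. It is not the test's actual implementation.

```python
import re
from collections import Counter
from typing import List, Sequence

_ADDRESS = re.compile(r" at 0x[0-9a-fA-F]+")


def normalize(callback_repr: str) -> str:
    """Strip the memory address so identical callback types share one key."""
    return _ADDRESS.sub("", callback_repr)


def count_by_type(callbacks: Sequence[str]) -> Counter:
    """Count registered callbacks per normalized type name."""
    return Counter(normalize(c) for c in callbacks)


def leaking_types(samples: List[Counter], settle: int = 1, min_intervals: int = 2) -> List[str]:
    """Return types whose count rose on every interval after the settle window."""
    settled = samples[settle:]  # skip samples taken before the config switch
    if len(settled) < min_intervals + 1:
        return []
    names = set().union(*settled)
    leaks = []
    for name in sorted(names):
        counts = [sample.get(name, 0) for sample in settled]
        # A one-time jump followed by a plateau is not flagged; only growth
        # that continues across every sampled interval is.
        if all(later > earlier for earlier, later in zip(counts, counts[1:])):
            leaks.append(name)
    return leaks
```

Returning the type names rather than a boolean lets the failure message name the leaking callback, as the commit message describes.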

…streaming

…m/BerriAI/litellm into litellm_interactions_ui_streaming

mateo-berri · 2026-05-18T23:58:19Z

@greptileai

…re missing

When OutputTextDeltaEvent arrived before any ResponseCreatedEvent or ContentPartAddedEvent, the double-fallback path emitted interaction.start and silently dropped the first delta's text: the second delta's content.start carried only that chunk's delta, and the first token never made it to any content.delta event consumed by the UI. Queue a content.start that carries the first delta's text alongside the interaction.start emission, and drain pending events before pulling the next upstream chunk.

mateo-berri · 2026-05-19T00:01:38Z

@greptileai

cursor

Cursor Bugbot has reviewed your changes using high mode and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

✅ Fixed: Exported interfaces are unused outside their defining file
- Removed the unused exported InteractionOutput and InteractionResponse interfaces from interactions_api.tsx since they were not imported anywhere and the streaming function does not reference them.

Preview (6cdf840bf8)

diff --git a/litellm/interactions/litellm_responses_transformation/streaming_iterator.py b/litellm/interactions/litellm_responses_transformation/streaming_iterator.py
--- a/litellm/interactions/litellm_responses_transformation/streaming_iterator.py
+++ b/litellm/interactions/litellm_responses_transformation/streaming_iterator.py
@@ -2,7 +2,7 @@
 Streaming iterator for transforming Responses API stream to Interactions API stream.
 """
 
-from typing import Any, AsyncIterator, Dict, Iterator, Optional, cast
+from typing import Any, AsyncIterator, Dict, Iterator, List, Optional, cast
 
 from litellm.responses.streaming_iterator import (
     BaseResponsesAPIStreamingIterator,
@@ -15,6 +15,7 @@
     InteractionsAPIStreamingResponse,
 )
 from litellm.types.llms.openai import (
+    ContentPartAddedEvent,
     OutputTextDeltaEvent,
     ResponseCompletedEvent,
     ResponseCreatedEvent,
@@ -51,6 +52,7 @@
         self.collected_text = ""
         self.sent_interaction_start = False
         self.sent_content_start = False
+        self._pending_events: List[InteractionsAPIStreamingResponse] = []
 
     def _transform_responses_chunk_to_interactions_chunk(
         self,
@@ -80,9 +82,19 @@
             )
             self.collected_text += delta_text
 
-            # Send interaction.start if not sent
+            # Fallback: emit interaction.start, and queue content.start carrying this
+            # delta so the first token is preserved in the stream.
             if not self.sent_interaction_start:
                 self.sent_interaction_start = True
+                self.sent_content_start = True
+                self._pending_events.append(
+                    InteractionsAPIStreamingResponse(
+                        event_type="content.start",
+                        id=getattr(responses_chunk, "item_id", None),
+                        object="content",
+                        delta={"type": "text", "text": delta_text},
+                    )
+                )
                 return InteractionsAPIStreamingResponse(
                     event_type="interaction.start",
                     id=getattr(responses_chunk, "item_id", None)
@@ -92,24 +104,47 @@
                     model=self.model,
                 )
 
-            # Send content.start if not sent
+            # Fallback: emit content.start if ContentPartAddedEvent never arrived
             if not self.sent_content_start:
                 self.sent_content_start = True
                 return InteractionsAPIStreamingResponse(
                     event_type="content.start",
                     id=getattr(responses_chunk, "item_id", None),
                     object="content",
-                    delta={"type": "text", "text": ""},
+                    delta={"type": "text", "text": delta_text},
                 )
 
-            # Send content.delta
+            # Normal path: emit content.delta with type field
             return InteractionsAPIStreamingResponse(
                 event_type="content.delta",
                 id=getattr(responses_chunk, "item_id", None),
                 object="content",
-                delta={"text": delta_text},
+                delta={"type": "text", "text": delta_text},
             )
 
+        # Handle ContentPartAddedEvent -> content.start (arrives before text deltas)
+        if isinstance(responses_chunk, ContentPartAddedEvent):
+            # Fallback: emit interaction.start if ResponseCreatedEvent never arrived
+            if not self.sent_interaction_start:
+                self.sent_interaction_start = True
+                return InteractionsAPIStreamingResponse(
+                    event_type="interaction.start",
+                    id=getattr(responses_chunk, "item_id", None)
+                    or f"interaction_{id(self)}",
+                    object="interaction",
+                    status="in_progress",
+                    model=self.model,
+                )
+            if not self.sent_content_start:
+                self.sent_content_start = True
+                return InteractionsAPIStreamingResponse(
+                    event_type="content.start",
+                    id=getattr(responses_chunk, "item_id", None),
+                    object="content",
+                    delta={"type": "text", "text": ""},
+                )
+            return None
+
         # Handle ResponseCreatedEvent or ResponseInProgressEvent -> interaction.start
         if isinstance(responses_chunk, (ResponseCreatedEvent, ResponseInProgressEvent)):
             if not self.sent_interaction_start:
@@ -172,6 +207,10 @@
             delattr(self, "_pending_interaction_complete")
             return pending
 
+        # Drain events queued from a prior chunk (e.g. content.start emitted alongside
+        # the interaction.start fallback for the first OutputTextDeltaEvent).
+        if self._pending_events:
+            return self._pending_events.pop(0)
         # Use a loop instead of recursion to avoid stack overflow
         sync_iterator = cast(
             SyncResponsesAPIStreamingIterator, self.responses_stream_iterator
@@ -237,6 +276,10 @@
             delattr(self, "_pending_interaction_complete")
             return pending
 
+        # Drain events queued from a prior chunk (e.g. content.start emitted alongside
+        # the interaction.start fallback for the first OutputTextDeltaEvent).
+        if self._pending_events:
+            return self._pending_events.pop(0)
         # Use a loop instead of recursion to avoid stack overflow
         async_iterator = cast(
             ResponsesAPIStreamingIterator, self.responses_stream_iterator

diff --git a/tests/test_litellm/interactions/test_gemini_interactions_transformation.py b/tests/test_litellm/interactions/test_gemini_interactions_transformation.py
--- a/tests/test_litellm/interactions/test_gemini_interactions_transformation.py
+++ b/tests/test_litellm/interactions/test_gemini_interactions_transformation.py
@@ -9,15 +9,24 @@
 
 import os
 import sys
-from unittest.mock import patch
+from unittest.mock import MagicMock, patch
 
 import pytest
 
 sys.path.insert(0, os.path.abspath("../../.."))
 
+from litellm.interactions.litellm_responses_transformation.streaming_iterator import (
+    LiteLLMResponsesInteractionsStreamingIterator,
+)
 from litellm.llms.gemini.interactions.transformation import (
     GoogleAIStudioInteractionsConfig,
 )
+from litellm.types.llms.openai import (
+    ContentPartAddedEvent,
+    OutputTextDeltaEvent,
+    ResponseCompletedEvent,
+    ResponseCreatedEvent,
+)
 from litellm.types.router import GenericLiteLLMParams
 
 _PATCH_GET_API_KEY = "litellm.llms.gemini.common_utils.GeminiModelInfo.get_api_key"
@@ -113,6 +122,186 @@
                 )
 
 
+class TestStreamingIterator:
+    def _make_iterator(self) -> LiteLLMResponsesInteractionsStreamingIterator:
+        return LiteLLMResponsesInteractionsStreamingIterator(
+            model="gpt-5.4",
+            litellm_custom_stream_wrapper=MagicMock(),
+            request_input="hi",
+            optional_params={},
+        )
+
+    def _make_text_delta(
+        self, text: str, item_id: str = "item_1"
+    ) -> OutputTextDeltaEvent:
+        event = MagicMock(spec=OutputTextDeltaEvent)
+        event.delta = text
+        event.item_id = item_id
+        return event
+
+    def _make_part_added(self, item_id: str = "item_1") -> ContentPartAddedEvent:
+        event = MagicMock(spec=ContentPartAddedEvent)
+        event.item_id = item_id
+        return event
+
+    def _make_response_created(self) -> ResponseCreatedEvent:
+        event = MagicMock(spec=ResponseCreatedEvent)
+        event.response = MagicMock(id="resp_123")
+        return event
+
+    def test_content_delta_includes_type_field(self):
+        """content.delta events must carry delta.type='text' so the UI can display them."""
+        it = self._make_iterator()
+        it.sent_interaction_start = True
+        it.sent_content_start = True
+
+        chunk = it._transform_responses_chunk_to_interactions_chunk(
+            self._make_text_delta("Hello")
+        )
+
+        assert chunk is not None
+        assert chunk.event_type == "content.delta"
+        assert chunk.delta == {"type": "text", "text": "Hello"}
+
+    def test_response_part_added_emits_content_start(self):
+        """ContentPartAddedEvent (arrives before text deltas) should emit content.start
+        so the first OutputTextDeltaEvent immediately emits content.delta without dropping text.
+        """
+        it = self._make_iterator()
+        it.sent_interaction_start = True
+
+        chunk = it._transform_responses_chunk_to_interactions_chunk(
+            self._make_part_added()
+        )
+
+        assert chunk is not None
+        assert chunk.event_type == "content.start"
+        assert it.sent_content_start is True
+
+    def test_first_text_delta_not_dropped_when_part_added_seen(self):
+        """After ContentPartAddedEvent, the first text delta must yield content.delta
+        (not content.start), preserving the token text."""
+        it = self._make_iterator()
+        it.sent_interaction_start = True
+        it._transform_responses_chunk_to_interactions_chunk(self._make_part_added())
+
+        chunk = it._transform_responses_chunk_to_interactions_chunk(
+            self._make_text_delta("Hello")
+        )
+
+        assert chunk is not None
+        assert chunk.event_type == "content.delta"
+        assert chunk.delta is not None
+        assert chunk.delta.get("text") == "Hello"
+
+    def test_part_added_emits_interaction_start_fallback_when_not_sent(self):
+        """If ContentPartAddedEvent arrives before any ResponseCreatedEvent,
+        the iterator must emit interaction.start before content.start to honor
+        the documented event ordering contract."""
+        it = self._make_iterator()
+
+        chunk = it._transform_responses_chunk_to_interactions_chunk(
+            self._make_part_added(item_id="item_42")
+        )
+
+        assert chunk is not None
+        assert chunk.event_type == "interaction.start"
+        assert chunk.id == "item_42"
+        assert chunk.status == "in_progress"
+        assert chunk.model == "gpt-5.4"
+        assert it.sent_interaction_start is True
+        assert it.sent_content_start is False
+
+    def test_part_added_returns_none_when_already_started(self):
+        """A second ContentPartAddedEvent (after content.start was already emitted)
+        should be a no-op so we don't re-emit content.start."""
+        it = self._make_iterator()
+        it.sent_interaction_start = True
+        it.sent_content_start = True
+
+        chunk = it._transform_responses_chunk_to_interactions_chunk(
+            self._make_part_added()
+        )
+
+        assert chunk is None
+
+    def test_part_added_without_item_id_falls_back_to_self_id(self):
+        """When ContentPartAddedEvent has no item_id and we emit the interaction.start
+        fallback, the id must default to an interaction_<id(self)> string."""
+        it = self._make_iterator()
+        event = MagicMock(spec=ContentPartAddedEvent)
+        event.item_id = None
+
+        chunk = it._transform_responses_chunk_to_interactions_chunk(event)
+
+        assert chunk is not None
+        assert chunk.event_type == "interaction.start"
+        assert chunk.id == f"interaction_{id(it)}"
+
+    def test_first_text_delta_not_dropped_when_no_prior_start_events(self):
+        """When OutputTextDeltaEvent arrives before any ResponseCreatedEvent or
+        ContentPartAddedEvent, the iterator must emit interaction.start *and*
+        immediately follow with a content.start that carries this delta's text,
+        so the first token is never silently dropped from the stream."""
+        events = [
+            self._make_text_delta("Hello"),
+            self._make_text_delta(" World"),
+        ]
+        wrapper = MagicMock()
+        wrapper.__iter__ = lambda self: iter(events)
+        wrapper.__next__ = lambda self, _it=iter(events): next(_it)
+        it = LiteLLMResponsesInteractionsStreamingIterator(
+            model="gpt-5.4",
+            litellm_custom_stream_wrapper=wrapper,
+            request_input="hi",
+            optional_params={},
+        )
+
+        first = it._transform_responses_chunk_to_interactions_chunk(events[0])
+        assert first is not None
+        assert first.event_type == "interaction.start"
+        assert it.sent_interaction_start is True
+        assert it.sent_content_start is True
+        assert len(it._pending_events) == 1
+        pending = it._pending_events[0]
+        assert pending.event_type == "content.start"
+        assert pending.delta == {"type": "text", "text": "Hello"}
+
+        second = it._transform_responses_chunk_to_interactions_chunk(events[1])
+        assert second is not None
+        assert second.event_type == "content.delta"
+        assert second.delta == {"type": "text", "text": " World"}
+
+
+class TestTransformRequest:
+    def test_stream_param_included_in_request_body(self, config):
+        """When stream=True is in optional_params, the request body must include it
+        so the proxy forwards the SSE streaming flag to Google's backend."""
+        body = config.transform_request(
+            model="gemini-2.5-flash",
+            agent=None,
+            input="Hello",
+            optional_params={"stream": True},
+            litellm_params=GenericLiteLLMParams(api_key="test-key"),
+            headers={},
+        )
+
+        assert body.get("stream") is True
+        assert body.get("input") == "Hello"
+
+    def test_stream_false_not_included_when_absent(self, config):
+        body = config.transform_request(
+            model="gemini-2.5-flash",
+            agent=None,
+            input="Hello",
+            optional_params={},
+            litellm_params=GenericLiteLLMParams(api_key="test-key"),
+            headers={},
+        )
+
+        assert "stream" not in body
+
+
 class TestInteractionOperationUrls:
     """Test that get/delete/cancel interaction URLs exclude API key."""
 

diff --git a/ui/litellm-dashboard/src/components/playground/chat_ui/ChatUI.tsx b/ui/litellm-dashboard/src/components/playground/chat_ui/ChatUI.tsx
--- a/ui/litellm-dashboard/src/components/playground/chat_ui/ChatUI.tsx
+++ b/ui/litellm-dashboard/src/components/playground/chat_ui/ChatUI.tsx
@@ -49,6 +49,7 @@
 import { makeOpenAIImageEditsRequest } from "../llm_calls/image_edits";
 import { makeOpenAIImageGenerationRequest } from "../llm_calls/image_generation";
 import { makeOpenAIResponsesRequest } from "../llm_calls/responses_api";
+import { makeInteractionsRequest } from "../llm_calls/interactions_api";
 import A2AMetrics from "./A2AMetrics";
 import AdditionalModelSettings from "./AdditionalModelSettings";
 import AudioRenderer from "./AudioRenderer";
@@ -649,6 +650,7 @@
       EndpointType.ANTHROPIC_MESSAGES,
       EndpointType.EMBEDDINGS,
       EndpointType.TRANSCRIPTION,
+      EndpointType.INTERACTIONS,
     ];
 
     if (modelRequiredEndpoints.includes(endpointType as EndpointType) && !selectedModel) {
@@ -914,6 +916,16 @@
               customProxyBaseUrl || undefined,
             );
           }
+        } else if (endpointType === EndpointType.INTERACTIONS) {
+          await makeInteractionsRequest(
+            inputMessage,
+            (text, model) => updateTextUI("assistant", text, model),
+            selectedModel,
+            effectiveApiKey,
+            selectedTags,
+            signal,
+            customProxyBaseUrl || undefined,
+          );
         }
       }
 
@@ -1241,10 +1253,11 @@
                                 return true;
                               }
                               const optionEndpoint = getEndpointType(option.mode);
-                              // Show chat models for responses/anthropic_messages endpoints as they are compatible
+                              // Show chat models for responses/anthropic_messages/interactions endpoints as they are compatible
                               if (
                                 endpointType === EndpointType.RESPONSES ||
-                                endpointType === EndpointType.ANTHROPIC_MESSAGES
+                                endpointType === EndpointType.ANTHROPIC_MESSAGES ||
+                                endpointType === EndpointType.INTERACTIONS
                               ) {
                                 return optionEndpoint === endpointType || optionEndpoint === EndpointType.CHAT;
                               }
@@ -2089,7 +2102,8 @@
                         endpointType === EndpointType.CHAT ||
                         endpointType === EndpointType.EMBEDDINGS ||
                         endpointType === EndpointType.RESPONSES ||
-                        endpointType === EndpointType.ANTHROPIC_MESSAGES
+                        endpointType === EndpointType.ANTHROPIC_MESSAGES ||
+                        endpointType === EndpointType.INTERACTIONS
                           ? "Type your message... (Shift+Enter for new line)"
                           : endpointType === EndpointType.A2A_AGENTS
                             ? "Send a message to the A2A agent..."
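
The model-selector hunk above (`@@ -1241,10 +1253,11 @@`) lets chat-mode models appear when the endpoint is Responses, Anthropic Messages, or Interactions. A minimal sketch of that rule as a standalone predicate follows. The `EndpointType` object is a hypothetical stand-in for the dashboard enum (only the `"interactions"` value is confirmed by this diff), `isModelSelectable` is an illustrative name, and the final fallback branch is an assumption because the surrounding code is not shown.

```typescript
// Hypothetical stand-in for the dashboard's EndpointType enum.
// Only "interactions" is confirmed by the diff; the other string values are assumed.
const EndpointType = {
  CHAT: "chat",
  RESPONSES: "responses",
  ANTHROPIC_MESSAGES: "anthropic_messages",
  EMBEDDINGS: "embeddings",
  INTERACTIONS: "interactions",
} as const;
type EndpointType = (typeof EndpointType)[keyof typeof EndpointType];

// Endpoints that can also be served by chat-mode models, per the hunk above.
const CHAT_COMPATIBLE: ReadonlySet<EndpointType> = new Set<EndpointType>([
  EndpointType.RESPONSES,
  EndpointType.ANTHROPIC_MESSAGES,
  EndpointType.INTERACTIONS,
]);

// Returns true when a model whose mode maps to `optionEndpoint` should be
// listed while `endpointType` is selected in the playground.
function isModelSelectable(optionEndpoint: EndpointType, endpointType: EndpointType): boolean {
  if (CHAT_COMPATIBLE.has(endpointType)) {
    return optionEndpoint === endpointType || optionEndpoint === EndpointType.CHAT;
  }
  // Assumed fallback: other endpoints only list models of their own mode.
  return optionEndpoint === endpointType;
}
```

Under these assumptions, a chat-mode model is listed for `/v1beta/interactions`, while an embeddings-mode model is not.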

diff --git a/ui/litellm-dashboard/src/components/playground/chat_ui/chatConstants.ts b/ui/litellm-dashboard/src/components/playground/chat_ui/chatConstants.ts
--- a/ui/litellm-dashboard/src/components/playground/chat_ui/chatConstants.ts
+++ b/ui/litellm-dashboard/src/components/playground/chat_ui/chatConstants.ts
@@ -45,4 +45,5 @@
   { value: EndpointType.A2A_AGENTS, label: "/v1/a2a/message/send" },
   { value: EndpointType.MCP, label: "/mcp-rest/tools/call" },
   { value: EndpointType.REALTIME, label: "/v1/realtime" },
+  { value: EndpointType.INTERACTIONS, label: "/v1beta/interactions" },
 ];

diff --git a/ui/litellm-dashboard/src/components/playground/chat_ui/mode_endpoint_mapping.tsx b/ui/litellm-dashboard/src/components/playground/chat_ui/mode_endpoint_mapping.tsx
--- a/ui/litellm-dashboard/src/components/playground/chat_ui/mode_endpoint_mapping.tsx
+++ b/ui/litellm-dashboard/src/components/playground/chat_ui/mode_endpoint_mapping.tsx
@@ -28,6 +28,7 @@
   A2A_AGENTS = "a2a_agents",
   MCP = "mcp",
   REALTIME = "realtime",
+  INTERACTIONS = "interactions",
 }
 
 // Create a mapping between the model mode and the corresponding endpoint type

diff --git a/ui/litellm-dashboard/src/components/playground/llm_calls/interactions_api.tsx b/ui/litellm-dashboard/src/components/playground/llm_calls/interactions_api.tsx
new file mode 100644
--- /dev/null
+++ b/ui/litellm-dashboard/src/components/playground/llm_calls/interactions_api.tsx
@@ -1,0 +1,124 @@
+import NotificationManager from "@/components/molecules/notifications_manager";
+import { getGlobalLitellmHeaderName, getProxyBaseUrl } from "@/components/networking";
+
+export async function makeInteractionsRequest(
+  input: string,
+  updateUI: (text: string, model?: string) => void,
+  selectedModel: string,
+  accessToken: string,
+  tags?: string[],
+  signal?: AbortSignal,
+  customBaseUrl?: string,
+  previousInteractionId?: string,
+): Promise<void> {
+  if (!accessToken) {
+    throw new Error("Virtual Key is required");
+  }
+
+  const isLocal = process.env.NODE_ENV === "development";
+  if (isLocal !== true) {
+    console.log = function () {};
+  }
+
+  const proxyBaseUrl = customBaseUrl || getProxyBaseUrl();
+  const normalizedBaseUrl = proxyBaseUrl.endsWith("/") ? proxyBaseUrl.slice(0, -1) : proxyBaseUrl;
+  const requestUrl = `${normalizedBaseUrl}/v1beta/interactions`;
+
+  const headers: Record<string, string> = {
+    "Content-Type": "application/json",
+    [getGlobalLitellmHeaderName()]: `Bearer ${accessToken}`,
+  };
+  if (tags && tags.length > 0) {
+    headers["x-litellm-tags"] = tags.join(",");
+  }
+
+  const body: Record<string, unknown> = {
+    model: selectedModel,
+    input,
+    stream: true,
+  };
+  if (previousInteractionId) {
+    body.previous_interaction_id = previousInteractionId;
+  }
+
+  try {
+    const response = await fetch(requestUrl, {
+      method: "POST",
+      headers,
+      body: JSON.stringify(body),
+      signal,
+    });
+
+    if (!response.ok) {
+      const errorText = await response.text();
+      throw new Error(errorText || `Request failed with status ${response.status}`);
+    }
+
+    if (!response.body) {
+      throw new Error("No response body received");
+    }
+
+    const reader = response.body.getReader();
+    const decoder = new TextDecoder();
+    let responseModel: string | undefined;
+    let buffer = "";
+
+    while (true) {
+      const { done, value } = await reader.read();
+      if (done) break;
+
+      buffer += decoder.decode(value, { stream: true });
+
+      // SSE events are separated by blank lines; split on single newlines and
+      // look for "data:" prefixed lines.
+      const lines = buffer.split("\n");
+      // Keep the last (potentially incomplete) line in the buffer
+      buffer = lines.pop() ?? "";
+
+      for (const line of lines) {
+        const trimmed = line.trim();
+        if (!trimmed.startsWith("data:")) continue;
+
+        const jsonStr = trimmed.slice("data:".length).trim();
+        if (!jsonStr || jsonStr === "[DONE]") continue;
+
+        let event: Record<string, unknown>;
+        try {
+          event = JSON.parse(jsonStr);
+        } catch {
+          continue;
+        }
+
+        const eventType = event.event_type as string | undefined;
+
+        if (eventType === "interaction.start" || eventType === "interaction.complete") {
+          // Capture model from either the native Gemini shape (nested under
+          // `interaction`) or the bridge shape (top-level `model` field).
+          const interaction = event.interaction as Record<string, unknown> | undefined;
+          if (typeof interaction?.model === "string" && interaction.model) {
+            responseModel = interaction.model;
+          } else if (typeof event.model === "string" && event.model) {
+            responseModel = event.model;
+          }
+        } else if (eventType === "content.delta" || eventType === "content.start") {
+          const delta = event.delta as Record<string, unknown> | undefined;
+          // Accept both native Gemini format {"type":"text","text":"..."} and bridge
+          // format {"text":"..."} (no type discriminator)
+          if (typeof delta?.text === "string" && delta.text) {
+            updateUI(delta.text, responseModel ?? selectedModel);
+          }
+        }
+        // content.stop, interaction.status_update: no UI action needed
+      }
+    }
+  } catch (error: unknown) {
+    if (signal?.aborted) {
+      console.log("Interactions request was cancelled");
+      throw error;
+    }
+    NotificationManager.fromBackend(
+      `Error occurred while making Interactions API request. Error: ${error}`,
+    );
+    throw error;
+  }
+}
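
The buffering logic in `makeInteractionsRequest` can be exercised without a network call. The sketch below pulls the same steps (split on newlines, keep the trailing partial line, parse `data:` payloads, collect `delta.text`) into two pure functions. `parseSseBuffer` and `extractText` are hypothetical names that do not exist in the PR, and the sketch assumes each event's JSON fits on a single `data:` line, as the diff does.

```typescript
// Hypothetical helpers, not part of the PR: they mirror the parsing loop in
// makeInteractionsRequest so it can be checked in isolation.
interface SseParseResult {
  events: Record<string, unknown>[];
  rest: string; // unconsumed (possibly partial) trailing line
}

function parseSseBuffer(buffer: string): SseParseResult {
  const lines = buffer.split("\n");
  // The last element may be an incomplete line; return it to the caller.
  const rest = lines.pop() ?? "";
  const events: Record<string, unknown>[] = [];

  for (const line of lines) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice("data:".length).trim();
    if (!payload || payload === "[DONE]") continue;
    try {
      events.push(JSON.parse(payload) as Record<string, unknown>);
    } catch {
      // Malformed JSON is skipped, as in the diff.
    }
  }
  return { events, rest };
}

// Concatenates text carried by content.start / content.delta events.
function extractText(events: Record<string, unknown>[]): string {
  let text = "";
  for (const event of events) {
    const type = event.event_type;
    if (type !== "content.delta" && type !== "content.start") continue;
    const delta = event.delta as Record<string, unknown> | undefined;
    const piece = delta?.text;
    if (typeof piece === "string") text += piece;
  }
  return text;
}
```

A caller would append each decoded chunk to `rest` before calling `parseSseBuffer` again, so an event split across two network reads is only parsed once its line is complete.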

^{Reviewed by Cursor Bugbot for commit f54bd7d.}

…aces Co-authored-by: Yassin Kortam <yassin@berri.ai>

mateo-berri · 2026-05-19T00:15:43Z

@greptileai

Sameerlite · 2026-05-19T02:48:59Z

Good to merge?

mateo-berri

LGTM thanks!

…ing (BerriAI#28156)

* feat(ui): add Interactions API support to playground with streaming

  Adds /v1beta/interactions as a selectable endpoint in the UI playground. Uses SSE streaming (stream=true) and parses content.delta events for real-time output.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(interactions): remove forced gemini provider so all providers work via interactions API

  Proxy endpoint was hardcoding custom_llm_provider="gemini" before routing, preventing non-Gemini models from using the litellm_responses bridge. Also reverts the UI Gemini-only model filter.

  Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(interactions): fix streaming for non-gemini providers via bridge

  Two bugs in LiteLLMResponsesInteractionsStreamingIterator:

  1. content.delta was emitted without "type":"text" in delta dict, so the UI type-check always failed and no tokens were displayed
  2. First OutputTextDeltaEvent was silently dropped (used to emit content.start with empty text); fixed by handling ResponsePartAddedEvent for content.start so text deltas go directly to content.delta

  Co-authored-by: Cursor <cursoragent@cursor.com>

* undo unrelated changes

* fix(ui): extract model from top-level field in interactions bridge events

  Co-authored-by: Yassin Kortam <yassin@berri.ai>

* test(interactions): remove tautological gemini-provider assertion

  The test_no_forced_gemini_provider_in_request_data check only asserted against dict literals it had just constructed, so it always passed and did not exercise the create_interaction endpoint. The endpoint deliberately defaults custom_llm_provider to gemini, so the assertion was also factually incorrect. Drop the misleading test.

  Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(interactions): use ContentPartAddedEvent and guard interaction.start ordering

  - ResponsePartAddedEvent corresponds to reasoning summary parts, not text content parts. Use ContentPartAddedEvent which is the event emitted before text output deltas (type response.content_part.added).
  - Mirror the OutputTextDeltaEvent ordering guard: if interaction.start has not been sent yet, emit it first before content.start to honor the documented event ordering contract.

  Co-authored-by: Yassin Kortam <yassin@berri.ai>

* test(interactions): cover ContentPartAddedEvent ordering and no-op paths

* fix(tests): treat corrupt VCR cassette payloads as cache miss + use gpt-realtime in OpenAI realtime guardrails test

  VCR redis persister was raising UnicodeDecodeError on cached payloads that fail to UTF-8 decode (e.g. legacy entries written by another version of the persister), failing tests at fixture setup instead of degrading to a cache miss. Wrap decode+deserialize in a try/except so corrupt cache entries are treated as CassetteNotFoundError, surfacing the failure via the existing _record_cache_failure / VCRCassetteCacheWarning path.

  OpenAI shut down gpt-4o-realtime-preview-2024-12-17 (and the entire gpt-4o-realtime-preview family) on 2026-05-07. The live realtime guardrails integration test now fails with model_not_found instead of receiving session.created. Point OPENAI_REALTIME_URL at the current GA model gpt-realtime, and relax the assertion in test_text_message_blocked_by_guardrail_no_ai_response to also accept the model's refusal-to-repeat the block message (gpt-realtime declines verbatim-repeat instructions, which is still a safe outcome since the original user message was blocked before reaching OpenAI). The BLOCKED_PHRASE leak check is preserved as a hard invariant.

* fix(tests): migrate realtime + nvidia_nim rerank tests off shut-down upstream models

  OpenAI shut down the entire gpt-4o-realtime-preview family (including the undated alias) on 2026-05-07. The live realtime tests still connected with that dead alias and failed with messages_received=1 (an error event 'The model gpt-4o-realtime-preview does not exist' instead of session.created).

  Point the live OpenAI realtime tests at gpt-realtime, the current GA realtime model:

  - test_openai_realtime_simple.py: get_model() -> gpt-realtime
  - test_openai_realtime.py: test_openai_realtime_direct_call_no_intent and test_openai_realtime_direct_call_with_intent -> openai/gpt-realtime

  Mocked unit tests (test_realtime_query_params_construction, test_realtime_query_params_use_normalized_model_name) are left as-is: they never hit the network and assert string plumbing only.

  NVIDIA reached end-of-life for the hosted nvidia/llama-3.2-nv-rerankqa-1b-v2 rerank API on 2026-05-18 with no published replacement, so the live BaseLLMRerankTest.test_basic_rerank for nvidia_nim now returns HTTP 410 ('Gone'). NVIDIA's hosted catalog rotates on a schedule, so swapping in another live model would only defer the failure. Override test_basic_rerank in TestNvidiaNim to mock the sync/async HTTP transport (same pattern as test_nvidia_nim_rerank_ranking_endpoint in this file) and inject a fake NVIDIA_NIM_API_KEY via monkeypatch. The request/response transformation and cost calculation stay covered offline.

* test(callbacks): harden flaky proxy callback-leak detector

  The proxy callback-leak detector (test_check_num_callbacks_on_lowest_latency) was failing on this PR with 'abs(85 - 95) <= 4', a bounded one-time registration jump caused by switching to latency-based-routing (+LowestLatencyLoggingHandler, +SlackAlerting). The count then plateaus under load, so this is pollution from the test's own config update, not a leak.

  Replace the brittle two-sample diff threshold with a sampler that settles past the deliberate config switch and only flags sustained monotonic per-type growth, with a terminal-burst confirmation pass for leaks that would otherwise escape the >=2-interval guard. Normalizes instance addresses so identical callbacks at different memory locations collapse, and names the leaking type on failure.

* fix(interactions): preserve first text token when both start events are missing

  When OutputTextDeltaEvent arrived before any ResponseCreatedEvent or ContentPartAddedEvent, the double-fallback path emitted interaction.start and silently dropped the first delta's text; the second delta's content.start carried only that chunk's delta, and the first token never made it to any content.delta event consumed by the UI. Queue a content.start that carries the first delta's text alongside the interaction.start emission, and drain pending events before pulling the next upstream chunk.

* chore(ui): remove unused InteractionOutput/InteractionResponse interfaces

  Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
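
The last interactions fix in the commit message above (queue a `content.start` carrying the first token, then drain pending events before reading upstream again) can be illustrated with a small pull-based adapter. This is a sketch in TypeScript for consistency with the UI code in this PR, not the adapter's actual implementation. The `UpstreamEvent` shapes, the `InteractionsAdapter` class, and its method names are all hypothetical simplifications.

```typescript
// Hypothetical, simplified upstream events; the real adapter's types are not shown here.
type UpstreamEvent =
  | { kind: "created" }
  | { kind: "content_part_added" }
  | { kind: "text_delta"; text: string };

interface InteractionEvent {
  event_type: "interaction.start" | "content.start" | "content.delta";
  delta?: { type: "text"; text: string };
}

class InteractionsAdapter {
  private upstream: Iterator<UpstreamEvent>;
  private pending: InteractionEvent[] = [];
  private interactionStarted = false;
  private contentStarted = false;

  constructor(upstream: Iterator<UpstreamEvent>) {
    this.upstream = upstream;
  }

  // Returns one event per call, or undefined when the stream is exhausted.
  next(): InteractionEvent | undefined {
    for (;;) {
      // Drain queued events before pulling the next upstream chunk.
      const queued = this.pending.shift();
      if (queued) return queued;

      const step = this.upstream.next();
      if (step.done) return undefined;
      this.transform(step.value);
    }
  }

  private transform(event: UpstreamEvent): void {
    // interaction.start always precedes any content event.
    if (!this.interactionStarted) {
      this.pending.push({ event_type: "interaction.start" });
      this.interactionStarted = true;
    }
    if (event.kind === "content_part_added") {
      if (!this.contentStarted) {
        this.pending.push({ event_type: "content.start" });
        this.contentStarted = true;
      }
    } else if (event.kind === "text_delta") {
      // If no start event was seen, the first token rides on content.start
      // instead of being dropped; delta.type is always "text".
      const delta = { type: "text" as const, text: event.text };
      const eventType = this.contentStarted ? "content.delta" : "content.start";
      this.pending.push({ event_type: eventType, delta });
      this.contentStarted = true;
    }
  }
}
```

Because one upstream chunk can produce two outgoing events, a one-event-per-call iterator needs the `pending` queue. Without it, the second event (here, the `content.start` that holds the first token) would be lost.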

…to v1.89.0 (#200) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [https://github.com/BerriAI/litellm.git](https://github.com/BerriAI/litellm) | minor | `v1.85.1` → `v1.89.0` | --- > ⚠️ **Warning** > > Some dependencies could not be looked up. Check the [Dependency Dashboard](issues/155) for more information. --- ### Release Notes <details> <summary>BerriAI/litellm (https://github.com/BerriAI/litellm.git)</summary> ### [`v1.89.0`](https://github.com/BerriAI/litellm/releases/tag/v1.89.0) [Compare Source](https://github.com/BerriAI/litellm/compare/v1.88.2...v1.89.0) #### Verify Docker Image Signature All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0). **Verify using the pinned commit hash (recommended):** A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key: ```bash cosign verify \ --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \ ghcr.io/berriai/litellm:v1.89.0 ``` **Verify using the release tag (convenience):** Tags are protected in this repository and resolve to the same key. 
This option is easier to read but relies on tag protection rules: ```bash cosign verify \ --key https://raw.githubusercontent.com/BerriAI/litellm/v1.89.0/cosign.pub \ ghcr.io/berriai/litellm:v1.89.0 ``` Expected output: ``` The following checks were performed on each of these signatures: - The cosign claims were validated - The signatures were verified against the specified public key ``` *** #### What's Changed - test(responses): bump deprecated gemini-3-pro-preview to gemini-3.1-pro-preview by [@mateo-berri](https://github.com/mateo-berri) in [#29433](https://github.com/BerriAI/litellm/pull/29433) - fix: map mistral/ministral-8b-latest in model price map by [@mateo-berri](https://github.com/mateo-berri) in [#29453](https://github.com/BerriAI/litellm/pull/29453) - fix(datadog): split oversized batches on 413 instead of re-queueing forever by [@yassin-berriai](https://github.com/yassin-berriai) in [#29444](https://github.com/BerriAI/litellm/pull/29444) - feat(otel): allowlist team\_metadata sub-keys promoted to baggage by [@yassin-berriai](https://github.com/yassin-berriai) in [#29442](https://github.com/BerriAI/litellm/pull/29442) - fix: stop use\_chat\_completions\_api flag from leaking into provider request body by [@mateo-berri](https://github.com/mateo-berri) in [#29447](https://github.com/BerriAI/litellm/pull/29447) - fix(anthropic, fireworks): inline legacy $ref defs in tool schemas by [@milan-berri](https://github.com/milan-berri) in [#28646](https://github.com/BerriAI/litellm/pull/28646) - fix(proxy): omit OpenAI \[DONE] on google-genai streamGenerateContent by [@Sameerlite](https://github.com/Sameerlite) in [#29426](https://github.com/BerriAI/litellm/pull/29426) - ci(release): create stable/X.Y.x line branch on X.Y.0 tags by [@yuneng-berri](https://github.com/yuneng-berri) in [#29457](https://github.com/BerriAI/litellm/pull/29457) - fix(vector-stores): support engines URL for Vertex AI Search by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) 
in [#27885](https://github.com/BerriAI/litellm/pull/27885) - fix(ui): render caller-supplied filter options in caller order by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29462](https://github.com/BerriAI/litellm/pull/29462) - fix(batches): skip unnecessary batch input file reads by [@Sameerlite](https://github.com/Sameerlite) in [#29114](https://github.com/BerriAI/litellm/pull/29114) - docs(agents): clarify when to create new test files by [@Sameerlite](https://github.com/Sameerlite) in [#29472](https://github.com/BerriAI/litellm/pull/29472) - Litellm OSS Staging by [@Sameerlite](https://github.com/Sameerlite) in [#29161](https://github.com/BerriAI/litellm/pull/29161) - fix(mcp): clear allowed\_tools and tool overrides on MCP server edit by [@Sameerlite](https://github.com/Sameerlite) in [#29411](https://github.com/BerriAI/litellm/pull/29411) - Litellm OSS Staging 010626 by [@Sameerlite](https://github.com/Sameerlite) in [#29422](https://github.com/BerriAI/litellm/pull/29422) - fix(ci): make CircleCI rerun-failed-tests collect tests when 2+ test files fail by [@mateo-berri](https://github.com/mateo-berri) in [#29475](https://github.com/BerriAI/litellm/pull/29475) - feat(a2a): watsonx Orchestrate agent provider by [@Sameerlite](https://github.com/Sameerlite) in [#29410](https://github.com/BerriAI/litellm/pull/29410) - fix(azure\_ai): strip tool-level extra fields on 400 and retry by [@Sameerlite](https://github.com/Sameerlite) in [#29479](https://github.com/BerriAI/litellm/pull/29479) - fix(docs): remove fixed dimensions from README hero image by [@mateo-berri](https://github.com/mateo-berri) in [#29496](https://github.com/BerriAI/litellm/pull/29496) - Litellm oss staging by [@Sameerlite](https://github.com/Sameerlite) in [#29492](https://github.com/BerriAI/litellm/pull/29492) - fix: small CLAUDE.md nits by [@mateo-berri](https://github.com/mateo-berri) in [#29504](https://github.com/BerriAI/litellm/pull/29504) - Add MCP semantic conventions to 
otelv2 by [@yassin-berriai](https://github.com/yassin-berriai) in [#29468](https://github.com/BerriAI/litellm/pull/29468) - fix(passthrough): emit otel guardrail span when a guardrail blocks by [@yassin-berriai](https://github.com/yassin-berriai) in [#29470](https://github.com/BerriAI/litellm/pull/29470) - fix(proxy): strip NUL bytes from spend log payloads to prevent PostgreSQL 22P05 by [@milan-berri](https://github.com/milan-berri) in [#29515](https://github.com/BerriAI/litellm/pull/29515) - \[internal copy of [#28008](https://github.com/BerriAI/litellm/issues/28008)] Support MCP OAuth passthrough and issuer-scoped JWT auth by [@mateo-berri](https://github.com/mateo-berri) in [#28356](https://github.com/BerriAI/litellm/pull/28356) - feat(vector-stores): forward per-request params to Vertex AI Search by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29459](https://github.com/BerriAI/litellm/pull/29459) - feat(proxy): add per-MCP-server RPM rate limiting for keys and teams by [@Sameerlite](https://github.com/Sameerlite) in [#29482](https://github.com/BerriAI/litellm/pull/29482) - fix(tests): drop module-level test calls that break local\_testing collection by [@mateo-berri](https://github.com/mateo-berri) in [#29520](https://github.com/BerriAI/litellm/pull/29520) - feat(agents): add LangFlow agent provider with A2A session bridging by [@Sameerlite](https://github.com/Sameerlite) in [#28963](https://github.com/BerriAI/litellm/pull/28963) - fix(ui/agents): make A2A skill tags enterable and validated by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29512](https://github.com/BerriAI/litellm/pull/29512) - \[internal copy of [#29232](https://github.com/BerriAI/litellm/issues/29232)] feat: route future Claude models to Anthropic provider via pattern matching by [@mateo-berri](https://github.com/mateo-berri) in [#29239](https://github.com/BerriAI/litellm/pull/29239) - fix(tests): drop import-time completion call in test\_register\_model 
by [@mateo-berri](https://github.com/mateo-berri) in [#29521](https://github.com/BerriAI/litellm/pull/29521) - test: stabilize batch VCR coverage and stop live upload/network leaks by [@mateo-berri](https://github.com/mateo-berri) in [#29477](https://github.com/BerriAI/litellm/pull/29477) - \[internal copy of [#29003](https://github.com/BerriAI/litellm/issues/29003)] fix(vertex\_ai): use user-supplied api\_base as is for Model Garden OpenAI-compat path by [@mateo-berri](https://github.com/mateo-berri) in [#29530](https://github.com/BerriAI/litellm/pull/29530) - feat(proxy): native /health/drain preStop hook for graceful shutdown by [@yassin-berriai](https://github.com/yassin-berriai) in [#29439](https://github.com/BerriAI/litellm/pull/29439) - fix(auth): preserve 401 status for expired JWTs in OTel traces by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29510](https://github.com/BerriAI/litellm/pull/29510) - fix(otel): capture 401 error details in management endpoint spans by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29535](https://github.com/BerriAI/litellm/pull/29535) - test(proxy/utils): pin bottom-of-file helper behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29509](https://github.com/BerriAI/litellm/pull/29509) - test(proxy/utils): pin PrismaClient and spend-update behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29488](https://github.com/BerriAI/litellm/pull/29488) - test(proxy/utils): pin ProxyLogging behavior by [@yuneng-berri](https://github.com/yuneng-berri) in [#29485](https://github.com/BerriAI/litellm/pull/29485) - fix: missing span for guardrail passthrough by [@yassin-berriai](https://github.com/yassin-berriai) in [#29552](https://github.com/BerriAI/litellm/pull/29552) - fix(auth): let internal users view search tools by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29542](https://github.com/BerriAI/litellm/pull/29542) - fix: missing mcp otel attributes by 
[@yassin-berriai](https://github.com/yassin-berriai) in [#29554](https://github.com/BerriAI/litellm/pull/29554) - fix(proxy): resolve managed video model ids for auth by [@shivamrawat1](https://github.com/shivamrawat1) in [#29545](https://github.com/BerriAI/litellm/pull/29545) - fix(key\_generate): allow team members to create keys on org-scoped teams by [@milan-berri](https://github.com/milan-berri) in [#29310](https://github.com/BerriAI/litellm/pull/29310) - test(pass-through): move Gemini pass-through tests to gemini-3.1-flash-lite by [@mateo-berri](https://github.com/mateo-berri) in [#29595](https://github.com/BerriAI/litellm/pull/29595) - Litellm oss staging 030626 by [@Sameerlite](https://github.com/Sameerlite) in [#29578](https://github.com/BerriAI/litellm/pull/29578) - Fix : a2a bugs 030626 by [@Sameerlite](https://github.com/Sameerlite) in [#29566](https://github.com/BerriAI/litellm/pull/29566) - \[internal copy of [#29533](https://github.com/BerriAI/litellm/issues/29533)] fix(anthropic/adapter): emit thinking block for reasoning\_content-only streaming chunks by [@mateo-berri](https://github.com/mateo-berri) in [#29600](https://github.com/BerriAI/litellm/pull/29600) - ci: reproduce default-Windows wheel install to guard MAX\_PATH by [@yuneng-berri](https://github.com/yuneng-berri) in [#29597](https://github.com/BerriAI/litellm/pull/29597) - fix(vertex): strip output\_config.effort for Vertex Claude models that reject it (Haiku 4.5) by [@mateo-berri](https://github.com/mateo-berri) in [#29585](https://github.com/BerriAI/litellm/pull/29585) - Litellm websocket improvements by [@Sameerlite](https://github.com/Sameerlite) in [#29563](https://github.com/BerriAI/litellm/pull/29563) - feat(arize/phoenix): OpenInference rendering parity — tool\_calls, cost, passthrough I/O, session/user, multimodal, cache tokens by [@milan-berri](https://github.com/milan-berri) in [#28800](https://github.com/BerriAI/litellm/pull/28800) - \[internal copy of 
[#29550](https://github.com/BerriAI/litellm/issues/29550)] fix: passthrough endpoints duplicate logs by [@mateo-berri](https://github.com/mateo-berri) in [#29598](https://github.com/BerriAI/litellm/pull/29598) - fix(ci): keep coverage rename green when a parallel node runs no tests by [@mateo-berri](https://github.com/mateo-berri) in [#29608](https://github.com/BerriAI/litellm/pull/29608) - test(vcr): close out the remaining VCR live-call leaks by [@mateo-berri](https://github.com/mateo-berri) in [#29603](https://github.com/BerriAI/litellm/pull/29603) - fix(key\_generate): exempt UI/CLI session tokens from the budget ceiling for team keys by [@yuneng-berri](https://github.com/yuneng-berri) in [#29612](https://github.com/BerriAI/litellm/pull/29612) - fix(realtime): allow null transcripts in stream logging payloads by [@milan-berri](https://github.com/milan-berri) in [#29625](https://github.com/BerriAI/litellm/pull/29625) - build(ui): migrate eslint to flat config + bump eslint-config-next to 16 by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29626](https://github.com/BerriAI/litellm/pull/29626) - fix(key\_generate): scope session-token team-key budget exemption to caller-supplied team\_id by [@yuneng-berri](https://github.com/yuneng-berri) in [#29641](https://github.com/BerriAI/litellm/pull/29641) - fix(proxy): disable proxy buffering on streaming SSE responses by [@mateo-berri](https://github.com/mateo-berri) in [#29557](https://github.com/BerriAI/litellm/pull/29557) - fix(mcp): gate /public/mcp\_hub strictly on litellm.public\_mcp\_servers by [@michelligabriele](https://github.com/michelligabriele) in [#27764](https://github.com/BerriAI/litellm/pull/27764) - ci(ui): frontend-lint job enforcing prettier + eslint on changed files by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29633](https://github.com/BerriAI/litellm/pull/29633) - fix(gemini): googleSearch + server-side tools and googleMaps JSON schema by 
[@Sameerlite](https://github.com/Sameerlite) in [#29582](https://github.com/BerriAI/litellm/pull/29582) - fix(proxy): passthrough 404 when SERVER\_ROOT\_PATH is set by [@Sameerlite](https://github.com/Sameerlite) in [#29658](https://github.com/BerriAI/litellm/pull/29658) - fix(gemini-realtime): use GA event names for Pipecat 1.3.x compatibility by [@Sameerlite](https://github.com/Sameerlite) in [#29662](https://github.com/BerriAI/litellm/pull/29662) - Litellm oss staging 040626 by [@Sameerlite](https://github.com/Sameerlite) in [#29671](https://github.com/BerriAI/litellm/pull/29671) - style(ui): prettier formatting pass over the dashboard by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29622](https://github.com/BerriAI/litellm/pull/29622) - chore: ignore prettier dashboard reformat in git blame by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29695](https://github.com/BerriAI/litellm/pull/29695) - fix(helm): Enable Backend Deployment to mount Gateway config.yaml by [@tin-berri](https://github.com/tin-berri) in [#29605](https://github.com/BerriAI/litellm/pull/29605) - \[internal copy of [#29277](https://github.com/BerriAI/litellm/issues/29277)] fix(proxy): add default=None to LiteLLM\_TeamMembership.litellm\_budget\_table by [@mateo-berri](https://github.com/mateo-berri) in [#29684](https://github.com/BerriAI/litellm/pull/29684) - test: make custom\_tokenizer proxy tests hermetic by [@yuneng-berri](https://github.com/yuneng-berri) in [#29643](https://github.com/BerriAI/litellm/pull/29643) - test(proxy): stop running real-DB tests in GitHub Actions unit jobs by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29700](https://github.com/BerriAI/litellm/pull/29700) - chore(ui): remove the bare-fetch lint rule by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29712](https://github.com/BerriAI/litellm/pull/29712) - Litellm jwt mapping virtualkeys by [@shivamrawat1](https://github.com/shivamrawat1) in 
[#28510](https://github.com/BerriAI/litellm/pull/28510) - refactor(ui): shared HTTP client + location-pinned fetch() lint rule by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29723](https://github.com/BerriAI/litellm/pull/29723) - fix(proxy): stop team BYOK model name corruption on model edit by [@yuneng-berri](https://github.com/yuneng-berri) in [#29731](https://github.com/BerriAI/litellm/pull/29731) - \[internal copy of [#29511](https://github.com/BerriAI/litellm/issues/29511)] feat(guardrails): add sensitive data routing to on-premise models by [@mateo-berri](https://github.com/mateo-berri) in [#29531](https://github.com/BerriAI/litellm/pull/29531) - fix(proxy/hooks): populate llm\_provider on internal rate-limit errors by [@mateo-berri](https://github.com/mateo-berri) in [#27707](https://github.com/BerriAI/litellm/pull/27707) - fix(vertex/anthropic): handle namespace tools and strip client\_metadata for codex compatibility by [@Sameerlite](https://github.com/Sameerlite) in [#29489](https://github.com/BerriAI/litellm/pull/29489) - Support OAuth M2M for Databricks Apps A2A agents by [@mateo-berri](https://github.com/mateo-berri) in [#29586](https://github.com/BerriAI/litellm/pull/29586) - fix: small CLAUDE.md nit by [@mateo-berri](https://github.com/mateo-berri) in [#29749](https://github.com/BerriAI/litellm/pull/29749) - fix(anthropic): route Claude Opus 4.8 through adaptive thinking by [@mateo-berri](https://github.com/mateo-berri) in [#29702](https://github.com/BerriAI/litellm/pull/29702) - fix(proxy): persist oauth2\_flow on MCP server registration by [@michelligabriele](https://github.com/michelligabriele) in [#29690](https://github.com/BerriAI/litellm/pull/29690) - \[internal copy of [#27491](https://github.com/BerriAI/litellm/issues/27491)] fix(realtime): Fix Realtime Audio Token Cost Tracking by [@mateo-berri](https://github.com/mateo-berri) in [#29722](https://github.com/BerriAI/litellm/pull/29722) - fix(galileo): use ingest traces API 
and standard logging payload by [@Sameerlite](https://github.com/Sameerlite) in [#29651](https://github.com/BerriAI/litellm/pull/29651) - fix(auth): expand all-team-models sentinel in can\_key\_call\_model for batch validation by [@Sameerlite](https://github.com/Sameerlite) in [#29746](https://github.com/BerriAI/litellm/pull/29746) - test(vcr): stop refreshing cassette TTL on read so cassettes lapse after 24h by [@mateo-berri](https://github.com/mateo-berri) in [#29784](https://github.com/BerriAI/litellm/pull/29784) - test(ci): record/replay OpenAI image gen so the spend E2E isn't outage-bound by [@mateo-berri](https://github.com/mateo-berri) in [#29787](https://github.com/BerriAI/litellm/pull/29787) - fix(ui): route MCP playground auth by oauth2 mode instead of token\_url by [@tin-berri](https://github.com/tin-berri) in [#29714](https://github.com/BerriAI/litellm/pull/29714) - refactor(ui): centralize proxy base URL resolution into tested resolver by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29793](https://github.com/BerriAI/litellm/pull/29793) - Litellm oss staging 050626 by [@Sameerlite](https://github.com/Sameerlite) in [#29774](https://github.com/BerriAI/litellm/pull/29774) - test(google): add google-genai SDK proxy integration tests by [@Sameerlite](https://github.com/Sameerlite) in [#29781](https://github.com/BerriAI/litellm/pull/29781) - fix(jwt): use resolved DB user\_id for spend on legacy email match by [@milan-berri](https://github.com/milan-berri) in [#29217](https://github.com/BerriAI/litellm/pull/29217) - feat(ui): generate dashboard API types from the proxy OpenAPI spec by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29816](https://github.com/BerriAI/litellm/pull/29816) - fix(proxy): drop deleted team BYOK model name from team.models by [@yuneng-berri](https://github.com/yuneng-berri) in [#29820](https://github.com/BerriAI/litellm/pull/29820) - feat(mcp): per-server env vars with global + per-user scopes by 
[@mateo-berri](https://github.com/mateo-berri) in [#28917](https://github.com/BerriAI/litellm/pull/28917)
- refactor(ui): route behavior-preserving networking calls through apiClient by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29806](https://github.com/BerriAI/litellm/pull/29806)
- fix(mcp): persist Tools-tab MCP OAuth token to DB by [@tin-berri](https://github.com/tin-berri) in [#29809](https://github.com/BerriAI/litellm/pull/29809)
- fix(ui): require new expiration when regenerating an expired key by [@milan-berri](https://github.com/milan-berri) in [#29838](https://github.com/BerriAI/litellm/pull/29838)
- refactor(ui): route query-building networking calls through apiClient by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29815](https://github.com/BerriAI/litellm/pull/29815)
- Make the image-gen record/replay proxy report cache mode and per-request HIT/MISS by [@mateo-berri](https://github.com/mateo-berri) in [#29802](https://github.com/BerriAI/litellm/pull/29802)
- feat(proxy): hot-reload .env in dev when running with --reload by [@mateo-berri](https://github.com/mateo-berri) in [#29783](https://github.com/BerriAI/litellm/pull/29783)
- fix(ui): stop MCP playground tool calls from sending twice by [@tin-berri](https://github.com/tin-berri) in [#29821](https://github.com/BerriAI/litellm/pull/29821)
- feat(fal\_ai): add Nano Banana / Gemini 2.5 Flash Image generation support by [@mateo-berri](https://github.com/mateo-berri) in [#29798](https://github.com/BerriAI/litellm/pull/29798)
- Title: Fix managed batch cancel credential resolution by [@shivamrawat1](https://github.com/shivamrawat1) in [#29734](https://github.com/BerriAI/litellm/pull/29734)
- Title: fix(proxy): resolve vector store file list credentials from team deployments by [@shivamrawat1](https://github.com/shivamrawat1) in [#29739](https://github.com/BerriAI/litellm/pull/29739)
- refactor: convert AWS and GCP Terraform stacks into reusable modules … by
[@yassin-berriai](https://github.com/yassin-berriai) in [#28103](https://github.com/BerriAI/litellm/pull/28103)
- chore(ui): build ui for release by [@yuneng-berri](https://github.com/yuneng-berri) in [#29853](https://github.com/BerriAI/litellm/pull/29853)
- fix(terraform/gcp): prompt for image\_registry in DeployStack one-click by [@yassin-berriai](https://github.com/yassin-berriai) in [#29852](https://github.com/BerriAI/litellm/pull/29852)
- fix(terraform/gcp): abandon SQL user on destroy by [@yassin-berriai](https://github.com/yassin-berriai) in [#29855](https://github.com/BerriAI/litellm/pull/29855)
- Extend the record/replay proxy to chat, embeddings, moderations, rerank, and Anthropic by [@mateo-berri](https://github.com/mateo-berri) in [#29847](https://github.com/BerriAI/litellm/pull/29847)
- chore(deps): bump deps by [@yuneng-berri](https://github.com/yuneng-berri) in [#29860](https://github.com/BerriAI/litellm/pull/29860)
- chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29861](https://github.com/BerriAI/litellm/pull/29861)
- fix: 400 on Anthropic context overflow; seed identity on failed auth by [@yassin-berriai](https://github.com/yassin-berriai) in [#29848](https://github.com/BerriAI/litellm/pull/29848)
- chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29862](https://github.com/BerriAI/litellm/pull/29862)
- chore(release): patch v1.89.0-rc.1 with [#30064](https://github.com/BerriAI/litellm/issues/30064) (Claude Fable 5) for v1.89.0-rc.2 by [@mateo-berri](https://github.com/mateo-berri) in [#30143](https://github.com/BerriAI/litellm/pull/30143)

**Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.0...v1.89.0>

### [`v1.88.2`](https://github.com/BerriAI/litellm/releases/tag/v1.88.2)

[Compare Source](https://github.com/BerriAI/litellm/compare/v1.88.1...v1.88.2)

#### Verify Docker Image Signature

All LiteLLM Docker images are signed with
[cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0).

**Verify using the pinned commit hash (recommended):**

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.2
```

**Verify using the release tag (convenience):**

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.88.2/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.2
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

#### What's Changed

- chore(release): backport Fable 5, batch-file auth, CrowdStrike AIDR, Mantle Responses SigV4, and NetApp streaming-cost fix to stable/1.88.x and cut 1.88.2 by [@mateo-berri](https://github.com/mateo-berri) in [#30144](https://github.com/BerriAI/litellm/pull/30144)
- chore(release): backport DB-resilience, passthrough, model-info, budget, and deps fixes to stable/1.88.x by [@yuneng-berri](https://github.com/yuneng-berri) in [#30408](https://github.com/BerriAI/litellm/pull/30408)

**Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.1...v1.88.2>

### [`v1.88.1`](https://github.com/BerriAI/litellm/releases/tag/v1.88.1)

[Compare Source](https://github.com/BerriAI/litellm/compare/v1.88.0...v1.88.1)

#### Verify Docker Image Signature

All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/).
Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0).

**Verify using the pinned commit hash (recommended):**

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.1
```

**Verify using the release tag (convenience):**

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.88.1/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.1
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

#### What's Changed

- build(deps): bump pyjwt to 2.13.0 and ws override to 8.20.1 (1.88.x) by [@yuneng-berri](https://github.com/yuneng-berri) in [#29987](https://github.com/BerriAI/litellm/pull/29987)
- chore(release): bump version to 1.88.1 by [@yuneng-berri](https://github.com/yuneng-berri) in [#29989](https://github.com/BerriAI/litellm/pull/29989)

**Full Changelog**: <https://github.com/BerriAI/litellm/compare/v1.88.0...v1.88.1>

### [`v1.88.0`](https://github.com/BerriAI/litellm/releases/tag/v1.88.0)

[Compare Source](https://github.com/BerriAI/litellm/compare/v1.87.3...v1.88.0)

#### Verify Docker Image Signature

All LiteLLM Docker images are signed with [cosign](https://docs.sigstore.dev/cosign/overview/). Every release is signed with the same key introduced in [commit `0112e53`](https://github.com/BerriAI/litellm/commit/0112e53046018d726492c814b3644b7d376029d0).

**Verify using the pinned commit hash (recommended):**

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.0
```

**Verify using the release tag (convenience):**

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

```bash
cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/v1.88.0/cosign.pub \
  ghcr.io/berriai/litellm:v1.88.0
```

Expected output:

```
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
```

***

#### What's Changed

- fix(proxy): gate team allowed\_passthrough\_routes to proxy admins by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28097](https://github.com/BerriAI/litellm/pull/28097)
- fix(tests): stabilize image-edit VCR cassettes to stop live gpt-image-1 spend by [@mateo-berri](https://github.com/mateo-berri) in [#28110](https://github.com/BerriAI/litellm/pull/28110)
- fix(bedrock/cohere): send embedding\_types as JSON array, not string by [@ishaan-berri](https://github.com/ishaan-berri) in [#28172](https://github.com/BerriAI/litellm/pull/28172)
- fix(tests): migrate realtime + rerank tests off shut-down upstream models by [@yuneng-berri](https://github.com/yuneng-berri) in [#28191](https://github.com/BerriAI/litellm/pull/28191)
- fix(caching): replay openai/responses bridge cache hits as chat streams by [@Sameerlite](https://github.com/Sameerlite) in [#28158](https://github.com/BerriAI/litellm/pull/28158)
- Litellm oss staging by [@Sameerlite](https://github.com/Sameerlite) in [#28161](https://github.com/BerriAI/litellm/pull/28161)
- feat(prometheus): add user\_email and user\_alias to user
budget metrics by [@Sameerlite](https://github.com/Sameerlite) in [#28155](https://github.com/BerriAI/litellm/pull/28155)
- test(callbacks): harden flaky proxy callback-leak detector by [@yuneng-berri](https://github.com/yuneng-berri) in [#28195](https://github.com/BerriAI/litellm/pull/28195)
- fix(bedrock): sanitize batch metadata to prevent Pydantic ValidationError by [@mateo-berri](https://github.com/mateo-berri) in [#28202](https://github.com/BerriAI/litellm/pull/28202)
- fix(deepseek): use native /anthropic/v1/messages endpoint and sanitize tools by [@mateo-berri](https://github.com/mateo-berri) in [#28200](https://github.com/BerriAI/litellm/pull/28200)
- feat(ui): add Interactions API endpoint to playground with SSE streaming by [@Sameerlite](https://github.com/Sameerlite) in [#28156](https://github.com/BerriAI/litellm/pull/28156)
- fix(proxy): decode bytes and pass-through SSE for Google-native streamGenerateContent ([#27444](https://github.com/BerriAI/litellm/issues/27444)) by [@Sameerlite](https://github.com/Sameerlite) in [#28213](https://github.com/BerriAI/litellm/pull/28213)
- refactor(bedrock/sagemaker): switch to lazy loading for response stre… by [@harish-berri](https://github.com/harish-berri) in [#28189](https://github.com/BerriAI/litellm/pull/28189)
- \[Refactor] UI - Spend Logs: consolidate filter state and extract components by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#25847](https://github.com/BerriAI/litellm/pull/25847)
- fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5 by [@yuneng-berri](https://github.com/yuneng-berri) in [#28281](https://github.com/BerriAI/litellm/pull/28281)
- chore(ci): bump versions by [@yuneng-berri](https://github.com/yuneng-berri) in [#28287](https://github.com/BerriAI/litellm/pull/28287)
- feat: propagate team\_id and team\_alias to all child OTEL spans by [@yassin-berriai](https://github.com/yassin-berriai) in [#28273](https://github.com/BerriAI/litellm/pull/28273)
- Day 0
support : Gemini 3.5 Flash by [@Sameerlite](https://github.com/Sameerlite) in [#28268](https://github.com/BerriAI/litellm/pull/28268)
- Gemini managed agents support by [@Sameerlite](https://github.com/Sameerlite) in [#28270](https://github.com/BerriAI/litellm/pull/28270)
- chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#28292](https://github.com/BerriAI/litellm/pull/28292)
- feat(gemini): add gemini-3.1-flash-lite model cost map by [@Sameerlite](https://github.com/Sameerlite) in [#28320](https://github.com/BerriAI/litellm/pull/28320)
- fix(spend\_counter): seed Redis counter via SET NX to prevent cross-pod double-seed by [@milan-berri](https://github.com/milan-berri) in [#27854](https://github.com/BerriAI/litellm/pull/27854)
- fix(proxy): normalize batch file IDs before ManagedObjectTable write by [@Sameerlite](https://github.com/Sameerlite) in [#28339](https://github.com/BerriAI/litellm/pull/28339)
- fix(router): use forwarded model\_id for native Azure container IDs by [@Sameerlite](https://github.com/Sameerlite) in [#27921](https://github.com/BerriAI/litellm/pull/27921)
- fix(ui): restore log filter loading indicator by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28282](https://github.com/BerriAI/litellm/pull/28282)
- test(e2e): migrate runner to uv, add All Proxy Models key test by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28313](https://github.com/BerriAI/litellm/pull/28313)
- feat(ui): team passthrough routes create parity + edit load fix by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28098](https://github.com/BerriAI/litellm/pull/28098)
- fix(mcp): JWT on tools/list and REST tools/call server resolution by [@Sameerlite](https://github.com/Sameerlite) in [#28227](https://github.com/BerriAI/litellm/pull/28227)
- feat(interactions): migrate to Google Interactions API steps schema (May 2026) by [@Sameerlite](https://github.com/Sameerlite) in
[#28153](https://github.com/BerriAI/litellm/pull/28153)
- test(ui-e2e): admin key creation with a specific proxy model by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28365](https://github.com/BerriAI/litellm/pull/28365)
- fix(vertex\_ai): omit function\_call id on Vertex Gemini 3.5+ tool turns by [@Sameerlite](https://github.com/Sameerlite) in [#28324](https://github.com/BerriAI/litellm/pull/28324)
- feat(mcp): allow native MCP OAuth support for cursor by [@Sameerlite](https://github.com/Sameerlite) in [#28327](https://github.com/BerriAI/litellm/pull/28327)
- fix(interactions): never drop streamed text deltas; always emit terminal completion by [@mateo-berri](https://github.com/mateo-berri) in [#28394](https://github.com/BerriAI/litellm/pull/28394)
- fix(proxy): expose Prisma idle/connect timeout + extra DB URL params by [@yassin-berriai](https://github.com/yassin-berriai) in [#28395](https://github.com/BerriAI/litellm/pull/28395)
- Litellm oss staging 1 by [@Sameerlite](https://github.com/Sameerlite) in [#28337](https://github.com/BerriAI/litellm/pull/28337)
- fix: serialize guardrail\_response to JSON in OTEL traces by [@yassin-berriai](https://github.com/yassin-berriai) in [#28362](https://github.com/BerriAI/litellm/pull/28362)
- chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28314](https://github.com/BerriAI/litellm/pull/28314)
- test(realtime): expect session.created as xAI realtime initial event by [@yuneng-berri](https://github.com/yuneng-berri) in [#28424](https://github.com/BerriAI/litellm/pull/28424)
- feat(tests): behavior-pinning harness + Key Tier-1 matrix by [@yuneng-berri](https://github.com/yuneng-berri) in [#28321](https://github.com/BerriAI/litellm/pull/28321)
- fix(proxy): hydrate wildcard discovery credentials ([#28284](https://github.com/BerriAI/litellm/issues/28284)) - CCI Run by [@yuneng-berri](https://github.com/yuneng-berri) in [#28419](https://github.com/BerriAI/litellm/pull/28419)
- Litellm oss staging 04 21 2026 2 by [@Sameerlite](https://github.com/Sameerlite) in [#26569](https://github.com/BerriAI/litellm/pull/26569)
- chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28290](https://github.com/BerriAI/litellm/pull/28290)
- fix(vertex\_gemma): strip `context_management` from request body by [@mateo-berri](https://github.com/mateo-berri) in [#28438](https://github.com/BerriAI/litellm/pull/28438)
- fix(logging): recalculate cost after router retry failures by [@milan-berri](https://github.com/milan-berri) in [#28476](https://github.com/BerriAI/litellm/pull/28476)
- fix(otel): emit guardrail span on violation, surface status + categories by [@yassin-berriai](https://github.com/yassin-berriai) in [#28364](https://github.com/BerriAI/litellm/pull/28364)
- test(proxy): behavior-pinning matrix for team management endpoints by [@yuneng-berri](https://github.com/yuneng-berri) in [#28441](https://github.com/BerriAI/litellm/pull/28441)
- test(vertex\_ai): tolerate transient 500 in google maps grounding test by [@yuneng-berri](https://github.com/yuneng-berri) in [#28503](https://github.com/BerriAI/litellm/pull/28503)
- fix(docker): restore npm to non\_root builder image by [@yuneng-berri](https://github.com/yuneng-berri) in [#28519](https://github.com/BerriAI/litellm/pull/28519)
- chore(ci): bump deps by [@yuneng-berri](https://github.com/yuneng-berri) in [#28524](https://github.com/BerriAI/litellm/pull/28524)
- build(deps-dev): bump black to 26.3.1 and apply formatting by [@yuneng-berri](https://github.com/yuneng-berri) in [#28525](https://github.com/BerriAI/litellm/pull/28525)
- chore(deps): bump deps by [@yuneng-berri](https://github.com/yuneng-berri) in [#28528](https://github.com/BerriAI/litellm/pull/28528)
- test(e2e): forward LITELLM\_LICENSE to UI e2e proxy by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28398](https://github.com/BerriAI/litellm/pull/28398)
- Add granian as a ASGI compliant
web server. Provider better throughput stability, by [@harish-berri](https://github.com/harish-berri) in [#26027](https://github.com/BerriAI/litellm/pull/26027)
- Fix conflicts and UI by [@Sameerlite](https://github.com/Sameerlite) in [#28477](https://github.com/BerriAI/litellm/pull/28477)
- Add error\_description and hint for oauth flows by [@Sameerlite](https://github.com/Sameerlite) in [#28471](https://github.com/BerriAI/litellm/pull/28471)
- feat(mcp): Add tool call and tool list support via UI for Oauth mcps by [@Sameerlite](https://github.com/Sameerlite) in [#28454](https://github.com/BerriAI/litellm/pull/28454)
- feat(proxy): persist allowlisted OIDC claims in CLI SSO poll by [@Sameerlite](https://github.com/Sameerlite) in [#28463](https://github.com/BerriAI/litellm/pull/28463)
- fix(responses): use OpenAI SSEDecoder for Responses API streaming by [@Sameerlite](https://github.com/Sameerlite) in [#28566](https://github.com/BerriAI/litellm/pull/28566)
- Litellm oss staging 2 by [@Sameerlite](https://github.com/Sameerlite) in [#28582](https://github.com/BerriAI/litellm/pull/28582)
- \[internal copy of [#28269](https://github.com/BerriAI/litellm/issues/28269)] Codex cli jwt team alias by [@mateo-berri](https://github.com/mateo-berri) in [#28621](https://github.com/BerriAI/litellm/pull/28621)
- fix(check\_licenses): read PEP 639 license-expression metadata by [@yuneng-berri](https://github.com/yuneng-berri) in [#28529](https://github.com/BerriAI/litellm/pull/28529)
- test(proxy): behavior-pinning matrix for tier-2/3 key + team management endpoints by [@yuneng-berri](https://github.com/yuneng-berri) in [#28620](https://github.com/BerriAI/litellm/pull/28620)
- chore(test): remove dead old Playwright e2e suite by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28632](https://github.com/BerriAI/litellm/pull/28632)
- fix(sagemaker): send native Cohere embed payload to Cohere SageMaker endpoints by [@milan-berri](https://github.com/milan-berri) in
[#28613](https://github.com/BerriAI/litellm/pull/28613)
- style: apply black formatting to fix lint CI (LIT-3274) ([#28639](https://github.com/BerriAI/litellm/issues/28639)) by [@krrish-berri-2](https://github.com/krrish-berri-2) in [#28641](https://github.com/BerriAI/litellm/pull/28641)
- fix(bedrock): decouple STS region from Bedrock aws\_region\_name by [@milan-berri](https://github.com/milan-berri) in [#28245](https://github.com/BerriAI/litellm/pull/28245)
- test(streaming): tolerate Vertex 429 wrapped in MidStreamFallbackError by [@yuneng-berri](https://github.com/yuneng-berri) in [#28669](https://github.com/BerriAI/litellm/pull/28669)
- feat(guardrails): add Microsoft Purview DLP guardrail by [@Sameerlite](https://github.com/Sameerlite) in [#24966](https://github.com/BerriAI/litellm/pull/24966)
- fix(mcp): forward upstream initialize instructions on cold gateway init by [@milan-berri](https://github.com/milan-berri) in [#28231](https://github.com/BerriAI/litellm/pull/28231)
- chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#28680](https://github.com/BerriAI/litellm/pull/28680)
- CI: copy of [#25177](https://github.com/BerriAI/litellm/issues/25177) (OCI GenAI: embeddings, streaming/reasoning fixes, model catalog) by [@mateo-berri](https://github.com/mateo-berri) in [#28223](https://github.com/BerriAI/litellm/pull/28223)
- Encrypt callback\_vars in key/team metadata in DB by [@Michael-RZ-Berri](https://github.com/Michael-RZ-Berri) in [#27141](https://github.com/BerriAI/litellm/pull/27141)
- perf: reduce per-request and per-chunk overhead across Anthropic streaming hot paths by [@yassin-berriai](https://github.com/yassin-berriai) in [#28289](https://github.com/BerriAI/litellm/pull/28289)
- feat(azure): add Speech STT config support by [@ishaan-berri](https://github.com/ishaan-berri) in [#27482](https://github.com/BerriAI/litellm/pull/27482)
- test(proxy): phase-4 payload behavior pinning for tier-2/3 key + team
management endpoints by [@yuneng-berri](https://github.com/yuneng-berri) in [#28681](https://github.com/BerriAI/litellm/pull/28681)
- feat(prometheus): emit per-token-type detail metrics (LIT-3220) ([#28372](https://github.com/BerriAI/litellm/issues/28372)) by [@ishaan-berri](https://github.com/ishaan-berri) in [#28378](https://github.com/BerriAI/litellm/pull/28378)
- fix(otel): stamp http.response.status\_code on all error responses by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28405](https://github.com/BerriAI/litellm/pull/28405)
- chore(ui): build ui by [@yuneng-berri](https://github.com/yuneng-berri) in [#28707](https://github.com/BerriAI/litellm/pull/28707)
- fix(helm): drop main- prefix from default image tag by [@yuneng-berri](https://github.com/yuneng-berri) in [#28710](https://github.com/BerriAI/litellm/pull/28710)
- test(model\_prices): allow audio\_transcription\_config in schema by [@yuneng-berri](https://github.com/yuneng-berri) in [#28708](https://github.com/BerriAI/litellm/pull/28708)
- chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#28709](https://github.com/BerriAI/litellm/pull/28709)
- fix(team): refresh team cache on team\_model\_add/delete (LIT-3244) by [@yuneng-berri](https://github.com/yuneng-berri) in [#28683](https://github.com/BerriAI/litellm/pull/28683)
- fix(ui/add-model): stop vertex\_ai-anthropic\_models from leaking into Anthropic dropdown by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28723](https://github.com/BerriAI/litellm/pull/28723)
- Fix spend logs v2 route permissions by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28705](https://github.com/BerriAI/litellm/pull/28705)
- fix(proxy): Bedrock Knowledge Base pass-through: preserve SigV4 headers and signed request body by [@milan-berri](https://github.com/milan-berri) in [#27526](https://github.com/BerriAI/litellm/pull/27526)
- chore(tests): migrate Bedrock CI to AWS
account [`9412775`](https://github.com/BerriAI/litellm/commit/941277531214) by [@mateo-berri](https://github.com/mateo-berri) in [#28728](https://github.com/BerriAI/litellm/pull/28728)
- fix(otel): export SERVER span on management-endpoint success without http\_request by [@yassin-berriai](https://github.com/yassin-berriai) in [#28794](https://github.com/BerriAI/litellm/pull/28794)
- chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28801](https://github.com/BerriAI/litellm/pull/28801)
- chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28657](https://github.com/BerriAI/litellm/pull/28657)
- fix(ui): show 2-decimal precision for max\_budget on key overview by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28809](https://github.com/BerriAI/litellm/pull/28809)
- feat(proxy): allow `llm_api_routes` virtual keys to list MCP servers by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28442](https://github.com/BerriAI/litellm/pull/28442)
- chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28807](https://github.com/BerriAI/litellm/pull/28807)
- fix(team): keep team\_alias cache in sync on \_cache\_team\_object writes by [@yuneng-berri](https://github.com/yuneng-berri) in [#28737](https://github.com/BerriAI/litellm/pull/28737)
- chore(ci): merge dev branch by [@yuneng-berri](https://github.com/yuneng-berri) in [#28822](https://github.com/BerriAI/litellm/pull/28822)
- ci: daily oss-agent-shin canonical branch by [@ishaan-berri](https://github.com/ishaan-berri) in [#28829](https://github.com/BerriAI/litellm/pull/28829)
- test(proxy): add harness for proxy\_server.py behavior-pinning by [@yuneng-berri](https://github.com/yuneng-berri) in [#28827](https://github.com/BerriAI/litellm/pull/28827)
- feat(openai): apply regional-processing cost uplift for EU/US data residency by [@mateo-berri](https://github.com/mateo-berri) in
[#28626](https://github.com/BerriAI/litellm/pull/28626)
- chore(admin-ui): regenerate static export with trailingSlash: true by [@mateo-berri](https://github.com/mateo-berri) in [#28112](https://github.com/BerriAI/litellm/pull/28112)
- fix(azure): preserve AD token refresh in v1 OpenAI client path by [@mateo-berri](https://github.com/mateo-berri) in [#28627](https://github.com/BerriAI/litellm/pull/28627)
- fix(ui): route API Reference back to query-param page by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28726](https://github.com/BerriAI/litellm/pull/28726)
- fix(model-edit): allow clearing custom pricing on wildcard models by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28719](https://github.com/BerriAI/litellm/pull/28719)
- fix(tests/vcr): make Redis cassette cache replay deterministically (zero VCR misses on consecutive runs) by [@mateo-berri](https://github.com/mateo-berri) in [#28826](https://github.com/BerriAI/litellm/pull/28826)
- fix(proxy): strip LiteLLM policy tracking from OpenAI batch metadata by [@shivamrawat1](https://github.com/shivamrawat1) in [#28425](https://github.com/BerriAI/litellm/pull/28425)
- Litellm OpenAI double prefix bug by [@shivamrawat1](https://github.com/shivamrawat1) in [#28661](https://github.com/BerriAI/litellm/pull/28661)
- Litellm oss staging 250526 by [@Sameerlite](https://github.com/Sameerlite) in [#28770](https://github.com/BerriAI/litellm/pull/28770)
- fix(bedrock): align toolUse/toolSpec names and allow hyphens by [@Sameerlite](https://github.com/Sameerlite) in [#28874](https://github.com/BerriAI/litellm/pull/28874)
- fix(realtime): send TEXT frames and valid guardrail session.update by [@Sameerlite](https://github.com/Sameerlite) in [#28848](https://github.com/BerriAI/litellm/pull/28848)
- fix(mcp): extend key access-group union to MCP servers by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28890](https://github.com/BerriAI/litellm/pull/28890)
- fix(galileo):
support hosted v2 spans API and string output extraction by [@Sameerlite](https://github.com/Sameerlite) in [#28771](https://github.com/BerriAI/litellm/pull/28771)
- fix(proxy): exclude proxy\_server\_request from its own body snapshot by [@michelligabriele](https://github.com/michelligabriele) in [#28618](https://github.com/BerriAI/litellm/pull/28618)
- \[Feat] Add tool calling support for gemini and vertex ai live api by [@Sameerlite](https://github.com/Sameerlite) in [#26590](https://github.com/BerriAI/litellm/pull/26590)
- refactor(ui): remove dead App Router scaffolding in (dashboard)/\* by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28891](https://github.com/BerriAI/litellm/pull/28891)
- fix(docker): use system Node in componentized builders + retry apk add by [@yassin-berriai](https://github.com/yassin-berriai) in [#28888](https://github.com/BerriAI/litellm/pull/28888)
- docs(agents): require consent before writing new third-party names by [@yuneng-berri](https://github.com/yuneng-berri) in [#28908](https://github.com/BerriAI/litellm/pull/28908)
- refactor(ui): extract auth state into AuthContext by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28910](https://github.com/BerriAI/litellm/pull/28910)
- fix(mcp): resolve team.access\_group\_ids → MCP servers by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28997](https://github.com/BerriAI/litellm/pull/28997)
- test(ui): e2e cover team model edit + admin identity in navbar by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#28652](https://github.com/BerriAI/litellm/pull/28652)
- test(e2e): cover add-fallback flow in Router Settings by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29069](https://github.com/BerriAI/litellm/pull/29069)
- test(e2e): cover Team-BYOK add-model flow as proxy admin by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29068](https://github.com/BerriAI/litellm/pull/29068)
- fix(containers): record ownership for service-account keys + fix Prisma Json serialization by [@Sameerlite](https://github.com/Sameerlite) in [#28990](https://github.com/BerriAI/litellm/pull/28990)
- test(e2e): cover add-MCP-server flow via discovery → custom form by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29070](https://github.com/BerriAI/litellm/pull/29070)
- test(e2e): cover AI Hub make-public flow and public model\_hub\_table by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29071](https://github.com/BerriAI/litellm/pull/29071)
- \[internal copy of [#28877](https://github.com/BerriAI/litellm/issues/28877)] feat: add support for claude code goal mode for bedrock opus output config by [@mateo-berri](https://github.com/mateo-berri) in [#28898](https://github.com/BerriAI/litellm/pull/28898)
- feat(guardrails): wire apply\_guardrail into proxy logging callbacks by [@Sameerlite](https://github.com/Sameerlite) in [#28970](https://github.com/BerriAI/litellm/pull/28970)
- chore(ci): merge dev brach by [@yuneng-berri](https://github.com/yuneng-berri) in [#29192](https://github.com/BerriAI/litellm/pull/29192)
- perf(streaming): cut per-chunk overhead \~30% on Anthropic + Bedrock hot path by [@yassin-berriai](https://github.com/yassin-berriai) in [#28720](https://github.com/BerriAI/litellm/pull/28720)
- fix(proxy): enforce tag budgets for key-level tags by [@Sameerlite](https://github.com/Sameerlite) in [#29108](https://github.com/BerriAI/litellm/pull/29108)
- fix(vertex-ai): use DB credentials in video handlers + implement Veo video edit by [@Sameerlite](https://github.com/Sameerlite) in [#29098](https://github.com/BerriAI/litellm/pull/29098)
- fix(datadog): drain cost-management queue + opt-in FinOps tag allowlist by [@michelligabriele](https://github.com/michelligabriele) in [#28487](https://github.com/BerriAI/litellm/pull/28487)
- feat(helm): split per-component ServiceAccounts for gateway, backend, and UI by
[@yassin-berriai](https://github.com/yassin-berriai) in [#28712](https://github.com/BerriAI/litellm/pull/28712)
- chore(ci): bump deps ([#29208](https://github.com/BerriAI/litellm/issues/29208)) by [@yuneng-berri](https://github.com/yuneng-berri) in [#29226](https://github.com/BerriAI/litellm/pull/29226)
- fix(tests/vcr): mint Google OAuth tokens live to prevent stale-token replay by [@yuneng-berri](https://github.com/yuneng-berri) in [#29229](https://github.com/BerriAI/litellm/pull/29229)
- chore(cookbook): bump Go directive to 1.26.3 in gollem example by [@yuneng-berri](https://github.com/yuneng-berri) in [#29234](https://github.com/BerriAI/litellm/pull/29234)
- chore(ci): bump version by [@yuneng-berri](https://github.com/yuneng-berri) in [#29242](https://github.com/BerriAI/litellm/pull/29242)
- feat(anthropic): add Claude Opus 4.8 and prune reasoning-effort flags by [@mateo-berri](https://github.com/mateo-berri) in [#29238](https://github.com/BerriAI/litellm/pull/29238)
- chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29243](https://github.com/BerriAI/litellm/pull/29243)
- fix(ci): restore real Bedrock batch S3 bucket/role in oai\_misc\_config by [@mateo-berri](https://github.com/mateo-berri) in [#29245](https://github.com/BerriAI/litellm/pull/29245)
- fix(guardrails): persist disable\_global\_guardrails on keys by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29233](https://github.com/BerriAI/litellm/pull/29233)
- test(e2e): cover Team Admin view + member + key flows by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29072](https://github.com/BerriAI/litellm/pull/29072)
- docs: hand-written CLAUDE.md; remove AGENTS.md, point GEMINI.md at it by [@mateo-berri](https://github.com/mateo-berri) in [#29252](https://github.com/BerriAI/litellm/pull/29252)
- fix(teams): expose keys\_count on /v2/team/list and wire UI Resources badge by
[@michelligabriele](https://github.com/michelligabriele) in [#28502](https://github.com/BerriAI/litellm/pull/28502) - fix(anthropic): stop injecting unsupported output\_config.effort=xhigh for Claude Code on Sonnet/Opus 4.6 by [@mateo-berri](https://github.com/mateo-berri) in [#29304](https://github.com/BerriAI/litellm/pull/29304) - test(e2e): cover Internal Viewer nav, key, and team-info gating by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29075](https://github.com/BerriAI/litellm/pull/29075) - test(e2e): cover Internal User key modal, team info, key page by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29074](https://github.com/BerriAI/litellm/pull/29074) - test(e2e): cover navbar Logout flow as proxy admin by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29076](https://github.com/BerriAI/litellm/pull/29076) - fix(mcp): resolve key.access\_group\_ids → MCP servers (ungated) by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29195](https://github.com/BerriAI/litellm/pull/29195) - fix(router): enforce deployment budgets for dynamically added models by [@Sameerlite](https://github.com/Sameerlite) in [#29273](https://github.com/BerriAI/litellm/pull/29273) - fix(proxy): map stripped batch body.model to proxy alias for auth by [@Sameerlite](https://github.com/Sameerlite) in [#29264](https://github.com/BerriAI/litellm/pull/29264) - feat(mcp): support stateless and stateful clients via session-id routing by [@Sameerlite](https://github.com/Sameerlite) in [#26857](https://github.com/BerriAI/litellm/pull/26857) - fix(bedrock): support tool search results + chat annotations by [@Sameerlite](https://github.com/Sameerlite) in [#29120](https://github.com/BerriAI/litellm/pull/29120) - fix(mcp): ignore stale ids on key save by [@Sameerlite](https://github.com/Sameerlite) in [#29128](https://github.com/BerriAI/litellm/pull/29128) - feat(a2a): well-known agent-card discovery + LangGraph Platform mode by 
[@Sameerlite](https://github.com/Sameerlite) in [#28860](https://github.com/BerriAI/litellm/pull/28860) - fix(proxy): link passthrough success spans to the SERVER root OTEL span by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29315](https://github.com/BerriAI/litellm/pull/29315) - \[internal copy of [#29089](https://github.com/BerriAI/litellm/issues/29089)] fix: duplicate claude code traces by [@mateo-berri](https://github.com/mateo-berri) in [#29311](https://github.com/BerriAI/litellm/pull/29311) - feat(otel): typed semconv-aligned OpenTelemetry instrumentation by [@yassin-berriai](https://github.com/yassin-berriai) in [#28909](https://github.com/BerriAI/litellm/pull/28909) - tests(proxy\_server): surface current behavior in tests by [@yuneng-berri](https://github.com/yuneng-berri) in [#29309](https://github.com/BerriAI/litellm/pull/29309) - test(e2e): cover Internal User create-key flow when in no teams by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29083](https://github.com/BerriAI/litellm/pull/29083) - test(e2e): assert internal-user navbar identity is scoped to that user by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29077](https://github.com/BerriAI/litellm/pull/29077) - feat(otel): add team\_metadata, http.route, and model names to inference spans by [@yassin-berriai](https://github.com/yassin-berriai) in [#29319](https://github.com/BerriAI/litellm/pull/29319) - feat(context\_management): compact\_20260112 polyfill for non-Anthropic providers by [@Sameerlite](https://github.com/Sameerlite) in [#28868](https://github.com/BerriAI/litellm/pull/28868) - feat(enterprise): add RESEND\_FROM\_EMAIL for self-hosted Resend sends by [@shivamrawat1](https://github.com/shivamrawat1) in [#28830](https://github.com/BerriAI/litellm/pull/28830) - Revert Bedrock CI back to the reactivated AWS account ([`8886022`](https://github.com/BerriAI/litellm/commit/888602223428)) by [@mateo-berri](https://github.com/mateo-berri) 
in [#29326](https://github.com/BerriAI/litellm/pull/29326) - fix(mcp): preserve source\_url in GET /v1/mcp/server list responses by [@shivamrawat1](https://github.com/shivamrawat1) in [#29249](https://github.com/BerriAI/litellm/pull/29249) - fix(mcp): preserve omitted fields on PUT /v1/mcp/server partial updates by [@shivamrawat1](https://github.com/shivamrawat1) in [#29253](https://github.com/BerriAI/litellm/pull/29253) - fix(ci): make litellm\_internal\_staging green (logging test + Bedrock Opus 4.7 self-heal) by [@mateo-berri](https://github.com/mateo-berri) in [#29344](https://github.com/BerriAI/litellm/pull/29344) - refactor(proxy/auth): normalize Bearer prefix in safe-hash helper by [@yuneng-berri](https://github.com/yuneng-berri) in [#29343](https://github.com/BerriAI/litellm/pull/29343) - test(reasoning-effort-grid): cover Claude Opus 4.8 across provider routes by [@mateo-berri](https://github.com/mateo-berri) in [#29327](https://github.com/BerriAI/litellm/pull/29327) - fix(guardrails): return HTTP 400 for litellm content filter blocks by [@shivamrawat1](https://github.com/shivamrawat1) in [#28418](https://github.com/BerriAI/litellm/pull/28418) - fix(proxy): restrict vector store index create/delete to proxy admins by [@shivamrawat1](https://github.com/shivamrawat1) in [#29202](https://github.com/BerriAI/litellm/pull/29202) - feat(pass\_through): extend passthrough\_managed\_object\_ids to Azure by [@Sameerlite](https://github.com/Sameerlite) in [#29160](https://github.com/BerriAI/litellm/pull/29160) - fix(proxy): enforce allowed\_passthrough\_routes for auth=true pass-thr… by [@shivamrawat1](https://github.com/shivamrawat1) in [#29256](https://github.com/BerriAI/litellm/pull/29256) - feat(mcp/auth): additive key access-group grants + opt-in member assignment by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29313](https://github.com/BerriAI/litellm/pull/29313) - fix(reset\_budget): write only {spend, budget\_reset\_at} and stop pre-zeroing 
counter by [@yuneng-berri](https://github.com/yuneng-berri) in [#29358](https://github.com/BerriAI/litellm/pull/29358) - test(e2e): cover PROXY\_LOGOUT\_URL redirect on Logout by [@ryan-crabbe-berri](https://github.com/ryan-crabbe-berri) in [#29080](https://github.com/BerriAI/litellm/pull/29080) - fix(ui): break logout redirect loop across dev and proxy origins by [@yuneng-berri](https://github.com/yuneng-berri) in [#29360](https://github.com/BerriAI/litellm/pull/29360) - fix(openai-moderation): wire streaming flags through to unified dispatcher by [@michelligabriele](https://github.com/michelligabriele) in [#27324](https://github.com/BerriAI/litellm/pull/27324) - chore(ci): build ui by [@yuneng-berri](https://github.com/yuneng-berri) in [#29366](https://github.com/BerriAI/litellm/pull/29366) - fix(v3 limiter): cap no-max\_tokens TPM floor at smallest configured limit by [@michelligabriele](https://github.com/michelligabriele) in [#28805](https://github.com/BerriAI/litellm/pull/28805) - fix(e2e): tolerate trailing slash in SERVER\_ROOT\_PATH login redirect by [@yuneng-berri](https://github.com/yuneng-berri) in [#29369](https://github.com/BerriAI/litellm/pull/29369) - chore(deps): bump deps by [@yuneng-berri](https://github.com/yuneng-berri) in [#29373](https://github.com/BerriAI/litellm/pull/29373) - chore(ci): promote internal staging to main by [@yuneng-berri](https://github.com/yuneng-berri) in [#29372](https://github.com/BerriAI/litellm/pull/29372) - chore(release): patch v1.88.0-rc.1 with four staged fixes by [@mateo-berri](https://github.com/mateo-berri) in [#29632](https://github.com/BerriAI/litellm/pull/29632) - chore(release): patch v1.88.0-rc.1 with [#29612](https://github.com/BerriAI/litellm/issues/29612) (session-token budget-ceiling exemption) by [@mateo-berri](https://github.com/mateo-berri) in [#29637](https://github.com/BerriAI/litellm/pull/29637) - fix(key\_generate): harden GHSA-q775 …

feat(ui): add Interactions API support to playground with streaming

b3ccc7d

Adds /v1beta/interactions as a selectable endpoint in the UI playground. Uses SSE streaming (stream=true) and parses content.delta events for real-time output. Co-authored-by: Cursor <cursoragent@cursor.com>
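The SSE handling this commit describes can be sketched roughly as below. This is a minimal illustration, not the code in `interactions_api.tsx`: the `SseTextParser` class, the `event_type` and `delta.text` field names, and the `[DONE]` sentinel check are all assumptions made for the example. It buffers partial lines between network chunks and returns only the text carried by `content.delta` events.

```typescript
// Hypothetical event shape; the field names are assumptions for this sketch.
interface InteractionEvent {
  event_type?: string;
  delta?: { type?: string; text?: string };
}

// Accumulates decoded stream chunks and extracts text tokens from
// complete `data:` lines, keeping any partial trailing line for later.
class SseTextParser {
  private buffer = "";

  // Feed one decoded chunk; returns the text tokens completed by it.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element may be an incomplete line; hold it until the next chunk.
    this.buffer = lines.pop() ?? "";

    const tokens: string[] = [];
    for (const raw of lines) {
      const line = raw.trim();
      if (!line.startsWith("data:")) continue;

      const payload = line.slice("data:".length).trim();
      // Assumed end-of-stream sentinel; empty payloads are ignored as well.
      if (payload === "" || payload === "[DONE]") continue;

      let event: InteractionEvent;
      try {
        event = JSON.parse(payload) as InteractionEvent;
      } catch (_e) {
        continue; // skip malformed payloads instead of aborting the stream
      }

      if (
        event.event_type === "content.delta" &&
        event.delta?.type === "text" &&
        typeof event.delta.text === "string"
      ) {
        tokens.push(event.delta.text);
      }
    }
    return tokens;
  }
}
```

In a browser, each chunk would typically come from `response.body.getReader()` decoded with `new TextDecoder().decode(value, { stream: true })`, and every returned token would be appended to the chat message as it arrives.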

greptile-apps Bot reviewed May 18, 2026

Comment thread ui/litellm-dashboard/src/components/playground/llm_calls/interactions_api.tsx

Comment thread ui/litellm-dashboard/src/components/playground/llm_calls/interactions_api.tsx

Comment thread ui/litellm-dashboard/src/components/playground/chat_ui/mode_endpoint_mapping.tsx

Sameerlite and others added 3 commits May 18, 2026 16:43

undo unrelated changes

b8f4008

greptile-apps Bot reviewed May 18, 2026

Comment thread tests/test_litellm/proxy/google_endpoints/test_interactions_agent_param.py Outdated

cursor Bot reviewed May 18, 2026

Comment thread ui/litellm-dashboard/src/components/playground/llm_calls/interactions_api.tsx

fix(ui): extract model from top-level field in interactions bridge events

6381a92

Co-authored-by: Yassin Kortam <yassin@berri.ai>

cursor Bot reviewed May 18, 2026

Comment thread tests/test_litellm/proxy/google_endpoints/test_interactions_agent_param.py Outdated

cursor Bot reviewed May 18, 2026

Comment thread litellm/interactions/litellm_responses_transformation/streaming_iterator.py

Comment thread litellm/interactions/litellm_responses_transformation/streaming_iterator.py Outdated

cursor Bot reviewed May 18, 2026

Comment thread ui/litellm-dashboard/src/components/playground/llm_calls/interactions_api.tsx

Comment thread litellm/interactions/litellm_responses_transformation/streaming_iterator.py

test(interactions): cover ContentPartAddedEvent ordering and no-op paths

3d82735

greptile-apps Bot reviewed May 18, 2026

mateo-berri added 3 commits May 18, 2026 23:55

Merge branch 'litellm_internal_staging' into litellm_interactions_ui_streaming

b717c70

Merge branch 'litellm_interactions_ui_streaming' of https://github.com/BerriAI/litellm into litellm_interactions_ui_streaming

7e9b615

cursor Bot reviewed May 19, 2026

Comment thread ui/litellm-dashboard/src/components/playground/llm_calls/interactions_api.tsx Outdated

chore(ui): remove unused InteractionOutput/InteractionResponse interfaces

6cdf840

Co-authored-by: Yassin Kortam <yassin@berri.ai>

mateo-berri approved these changes May 19, 2026

mateo-berri merged commit 5818828 into litellm_internal_staging May 19, 2026
116 checks passed

mateo-berri deleted the litellm_interactions_ui_streaming branch May 19, 2026 02:55

		@@ -92,24 +93,47 @@ def _transform_responses_chunk_to_interactions_chunk(
		model=self.model,
		)

Conversation

Sameerlite commented May 18, 2026 • edited by cursor Bot

codecov Bot commented May 18, 2026 • edited

Codecov Report

greptile-apps Bot commented May 18, 2026 • edited

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sameerlite commented May 18, 2026

CLAassistant commented May 18, 2026 • edited

mateo-berri commented May 18, 2026

mateo-berri commented May 18, 2026

mateo-berri commented May 18, 2026

greptile-apps Bot May 18, 2026

mateo-berri May 18, 2026

mateo-berri commented May 18, 2026

mateo-berri commented May 18, 2026

mateo-berri commented May 19, 2026

cursor Bot left a comment • edited

mateo-berri commented May 19, 2026

Sameerlite commented May 19, 2026

mateo-berri left a comment

4 participants
