[LIT-2881] Admin UI — /agents three-pane dashboard (Cursor SDK) #27331

Open
ishaan-berri wants to merge 43 commits into litellm_internal_staging from litellm_lit-2881-admin-ui-agents-dashboard

@ishaan-berri

Contributor

Relevant issues

Linear ticket

Resolves LIT-2881

Pre-Submission checklist

  • I have added testing: 8 Playwright specs in ui/litellm-dashboard/e2e_tests/tests/agents/ (one per validation criterion in LIT-2881)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review

CI (LiteLLM team)

  • Branch creation CI run
    Link:
  • CI run for the last commit
    Link:
  • Merge / cherry-pick CI run
    Links:

Screenshots / Proof of Fix

QA pending — UI built behind NEXT_PUBLIC_USE_MOCK_AGENTS=true. Run pnpm dev then navigate to /agents. Will attach screenshots before merge.

Type

🆕 New Feature

Changes

New /agents route group on the dashboard with a three-pane, Cursor-Cloud-Agents-style layout.

Routes

  • /agents — list of cloud-agent definitions
  • /agents/{agent_id} — sessions under an agent
  • /agents/{agent_id}/sessions/{session_id} — three-pane (sessions sidebar / conversation / Git+Terminal tabs)

Backend

  • API client targets /v2/ (the existing /v1/agents is the A2A registry — the new VM-agent API moves to /v2/)
  • Until Epic A (LIT-2877) lands real endpoints, mock provider drives the UI via NEXT_PUBLIC_USE_MOCK_AGENTS=true

Components (antd, not Tremor)

  • AgentList, AgentDetail, NewAgentDialog
  • SessionList, SessionRow, NewSessionDialog
  • Three-pane: Conversation + MessageBubble + ToolCallCard + FilesChangedAccordion + Composer, RightPanel with GitTab + TerminalTab
  • useSessionEventStream SSE hook with seq-cursor resume + dedup

Coordination with G1 (LIT-2891)

  • Settings hand-off button on /agents/{aid} links to /settings/cloud-agents/ (Epic G's territory). No settings inline.

Tests

  • 8 Playwright specs under e2e_tests/tests/agents/ (one per validation criterion)
  • New pnpm e2e:agents script + dedicated playwright.agents.config.ts (separate from proxy globalSetup)

ishaan-berri and others added 30 commits May 6, 2026 14:54
Mirrors the API spec from LIT-2877 (Epic A). Single source of truth for
the dashboard so components don't redefine shapes inline. Namespaced
`Cloud*` to avoid colliding with the legacy proxy-side Agent type.
Wired via NEXT_PUBLIC_USE_MOCK_AGENTS=true. Temporary shim until Epic A
(LIT-2877) lands the real /v1/agents, /v1/sessions, conversation, and
event-stream endpoints. Shapes mirror the API spec for a one-line swap.
Includes a canned MOCK_RUN_EVENTS sequence used by the SSE hook.
Proxy-routed fetches for agents, sessions, runs, conversation, and
followup. Mock-aware: short-circuits to mock-agents.ts when
NEXT_PUBLIC_USE_MOCK_AGENTS=true. Centralizes the SSE URL shape via
buildRunEventStreamUrl so the hook just opens the EventSource.
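
A minimal sketch of what a centralized SSE URL builder could look like. The three-argument call shape matches how the hook invokes `buildRunEventStreamUrl` in the review excerpt further down, and `since_seq` is the resume parameter named in the hook's commit note. The path layout and the `API_BASE` constant are assumptions for illustration; the real route comes from the LIT-2877 API spec.

```typescript
// Hypothetical base path and route layout; not taken from the PR.
const API_BASE = "/v2";

function buildRunEventStreamUrl(
  sessionId: string,
  runId: string,
  sinceSeq?: number,
): string {
  const path =
    `${API_BASE}/sessions/${encodeURIComponent(sessionId)}` +
    `/runs/${encodeURIComponent(runId)}/events`;
  // Only attach the cursor when resuming; a fresh stream sends no since_seq.
  return sinceSeq !== undefined && sinceSeq > 0
    ? `${path}?since_seq=${sinceSeq}`
    : path;
}
```

Keeping the URL shape in one function means the hook only has to decide which cursor to pass, not how to encode it.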
EventSource hook with auto-reconnect, seq-cursor resume, and dedup.
On error closes the stream and re-opens after 1s backoff, passing
since_seq=<lastSeq> so the server replays missed events. In mock mode
replays MOCK_RUN_EVENTS at 400ms cadence so the UI looks live, and
listens for window offline/online so Playwright can exercise the
reconnect path.
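
The seq-cursor resume and dedup described above can be sketched without React. This is an illustration, not the hook's actual code: `EventLog` and `SeqEvent` are hypothetical names, and it assumes the server assigns strictly increasing `seq` values starting at 1 and delivers events in order.

```typescript
interface SeqEvent {
  seq: number;
  type: string;
}

class EventLog<E extends SeqEvent> {
  private lastSeq = 0;
  private readonly events: E[] = [];

  /** Appends the event unless it is a replay; returns whether it was kept. */
  append(evt: E): boolean {
    // After a reconnect the server may resend events at or below the cursor.
    if (evt.seq <= this.lastSeq) return false;
    this.events.push(evt);
    this.lastSeq = evt.seq;
    return true;
  }

  /** Value to send as since_seq when re-opening the stream. */
  get cursor(): number {
    return this.lastSeq;
  }

  get all(): readonly E[] {
    return this.events;
  }
}
```

With this shape, a reconnect only needs the current `cursor`, and any overlap between the replayed range and already-rendered events is dropped in `append`.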
antd Table that renders cloud-agent definitions and links each row to
/agents/{agent_id}. Uses Tag for the model and dayjs.fromNow for the
last-activity column. Empty state is the antd Empty placeholder.
antd Modal + Form that posts a new cloud-agent definition. Definition-
only — no VM is provisioned at creation time.
Renders the AgentList with a 'New Agent' affordance. Uses useAuthorized
for the access token and the cloud-agents-client for the fetch. In mock
mode this works without a backend so the UI can be developed alongside
Epic A.
The existing /v1/agents endpoint is reserved for the A2A registry. The new
VM-agent API used by this dashboard moves under /v2/.
Single sidebar entry showing session title, status pill (antd Tag, gold for
provisioning), branch and last-updated timestamp. Links to the three-pane
view at /agents/{aid}/sessions/{sid}.
Vertical list of SessionRow under the active agent with a + New action
pinned to the header. Shared by /agents/{aid} and the three-pane view.
Modal collecting a repo URL, posts to createCloudSession, then surfaces
the new session to the parent for redirect into the three-pane view.
Renders user/assistant/tool/system messages with role-tagged styling.
Tool calls render via ToolCallCard, not this bubble.
Collapsible card for assistant tool invocations from tool_call events.
Cursor Cloud Agents-style: collapsed shows tool + preview; expanded
shows full input (and result, when present).
Aggregates file_diff events into a 'N Files Changed' collapsible at
the bottom of the conversation pane. Cumulative across the run per
LIT-2881 spec — latest patch wins, additions/deletions sum per path.
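
The aggregation rule stated above (latest patch wins, additions and deletions sum per path) could be implemented roughly as follows. The `FileDiff` field names are assumptions; the PR's `FileDiffPayload` type is not shown in this page.

```typescript
// Assumed payload shape for a file_diff event.
interface FileDiff {
  path: string;
  patch: string;
  additions: number;
  deletions: number;
}

function aggregateFileDiffs(diffs: readonly FileDiff[]): FileDiff[] {
  const byPath = new Map<string, FileDiff>();
  for (const d of diffs) {
    const prev = byPath.get(d.path);
    byPath.set(
      d.path,
      prev
        ? {
            path: d.path,
            patch: d.patch, // latest patch wins
            additions: prev.additions + d.additions,
            deletions: prev.deletions + d.deletions,
          }
        : { ...d },
    );
  }
  // Map keeps first-insertion order, so files appear in the order first touched.
  return Array.from(byPath.values());
}
```

The accordion header count would then be the length of the returned array.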
Textarea + Send at the bottom of the conversation pane. POSTs to
/v2/sessions/{sid}/followup; the resulting user_message lands via the
SSE stream.
Middle pane unioning the initial conversation snapshot with live SSE
events. user_message and assistant_message render as MessageBubble;
tool_call as ToolCallCard; file_diff folds into FilesChangedAccordion.
Combines the active Run's git.branches with live git_commit / pr_opened
events. Shows branches, PR link (live event wins over snapshot), and
commits sorted newest-first.
Read-only ANSI tail of terminal_chunk events. Tiny SGR parser handles
foreground colors and bold; resets on \x1b[0m. Anything else (cursor
moves, 256-color, truecolor) is dropped. data-testid='ansi-#ff0000'
exposes the red span for Validation #7.
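
A small SGR parser along the lines described above might look like this sketch. It is not the code from terminal-tab.tsx: the regex-based approach, the `AnsiSpan` type, and every palette value except red (`#ff0000`, implied by the test id) are assumptions. Unlike the behavior flagged later in review, a truncated escape sequence is left in the output as raw text rather than dropped.

```typescript
interface AnsiSpan {
  text: string;
  color: string | null;
  bold: boolean;
}

// Assumed palette for SGR 30-37; only red is implied by the PR.
const FG: Record<number, string> = {
  30: "#000000", 31: "#ff0000", 32: "#00ff00", 33: "#ffff00",
  34: "#0000ff", 35: "#ff00ff", 36: "#00ffff", 37: "#ffffff",
};

function parseAnsi(text: string): AnsiSpan[] {
  // Matches any complete CSI sequence: ESC [ params final-letter.
  const csi = /\x1b\[([0-9;]*)([A-Za-z])/g;
  const spans: AnsiSpan[] = [];
  let color: string | null = null;
  let bold = false;
  let last = 0;

  const emit = (chunk: string): void => {
    if (chunk.length > 0) spans.push({ text: chunk, color, bold });
  };

  let match: RegExpExecArray | null;
  while ((match = csi.exec(text)) !== null) {
    emit(text.slice(last, match.index));
    last = match.index + match[0].length;
    if (match[2] !== "m") continue; // cursor moves etc. are dropped
    // An empty parameter list maps to Number("") === 0, i.e. reset.
    for (const code of match[1].split(";").map(Number)) {
      if (code === 38 || code === 48) break; // 256-color / truecolor: ignore
      if (code === 0) {
        color = null;
        bold = false;
      } else if (code === 1) {
        bold = true;
      } else if (code in FG) {
        color = FG[code];
      }
    }
  }
  emit(text.slice(last)); // includes any truncated escape as raw text
  return spans;
}
```

Each span can then be rendered as an element whose style reflects `color` and `bold`.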
Wraps GitTab and TerminalTab in an antd Tabs component. Defaults to
Git per the LIT-2881 layout.
Owns the SessionList sidebar, conversation snapshot fetch, run snapshot
fetch, and SSE subscription. Distributes events to Conversation and
RightPanel as props — the children are presentational.
Renders the agent identity, system prompt, and SessionList for this
agent. Settings hand-off link points at /settings/cloud-agents/ (Epic
G's territory) per coordination note in E1.md.
Per-agent landing page. Resolves auth via useAuthorized and passes the
agent_id param to AgentDetail.
Three-pane session view. Resolves auth via useAuthorized and passes
the agent_id + session_id params to ThreePane.
Targets http://localhost:3000 (Next.js dev server) directly. The agents
UI lives in App Router routes which only render under `next dev`, not
the proxy's static export — separate config skips the proxy globalSetup
that the rest of the suite needs.
Wraps the agents-suite Playwright config so it can run alongside the
existing e2e suite without colliding on globalSetup.
useAuthorized requires an unexpired JWT in the `token` cookie before it
renders. We mint an unsigned 1-hour token here — jwt-decode never
verifies the signature, so any structurally valid base64 payload works.
Helper plus an AGENTS_DEV_URL constant (overridable via env).
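
For illustration, an unsigned test token of the kind described above can be assembled from three dot-separated parts. This sketch is not the PR's helper: `mintFakeToken` and `base64UrlAscii` are hypothetical names, the claim set is an assumption, and the encoder only handles ASCII input. It is written without runtime dependencies; in Node, `Buffer.from(s).toString("base64url")` should give the same encoding.

```typescript
const B64URL =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Base64url-encode an ASCII string, without padding.
function base64UrlAscii(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i += 3) {
    const a = input.charCodeAt(i);
    const b = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
    const c = i + 2 < input.length ? input.charCodeAt(i + 2) : 0;
    const n = (a << 16) | (b << 8) | c;
    out += B64URL[(n >> 18) & 63] + B64URL[(n >> 12) & 63];
    if (i + 1 < input.length) out += B64URL[(n >> 6) & 63];
    if (i + 2 < input.length) out += B64URL[n & 63];
  }
  return out;
}

// Builds header.payload.signature with a placeholder signature.
// Only suitable for tests against code that decodes without verifying.
function mintFakeToken(
  claims: Record<string, unknown>,
  nowMs: number = Date.now(),
  ttlSeconds: number = 3600,
): string {
  const header = base64UrlAscii(JSON.stringify({ alg: "none", typ: "JWT" }));
  const payload = base64UrlAscii(
    JSON.stringify({ ...claims, exp: Math.floor(nowMs / 1000) + ttlSeconds }),
  );
  return `${header}.${payload}.unsigned`;
}
```

A Playwright helper would typically place the resulting string in the `token` cookie before navigating.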
ishaan-jaff added 13 commits May 6, 2026 15:09
Routes load — /agents, /agents/{aid}, /agents/{aid}/sessions/{sid}
each render their primary container. Captures console errors and
asserts none on the list view.
Open New Agent dialog, fill name + model, submit; assert the new row
shows up in the table within 5s. Definition only — no VM.
From /agents/{aid}, click + New Session, fill repo URL; assert the URL
redirects into the three-pane view and the status pill shows
'provisioning'.
Inside a session, poll until ≥3 events have rendered in the conversation
pane (combination of message bubbles and tool-call cards). Mock provider
ticks every 400ms so this lands well under the 10s budget.
Mid-stream, page.context().setOffline(true)/(false); event count never
regresses and continues climbing after reconnect. Exercises the seq
dedup branch of useSessionEventStream.
Type a message, send; assert the user_message bubble count strictly
grows. Mock provider acks the user_message synchronously — that's
enough to verify the composer plumbing without Epic A.
Switch to the Terminal tab; mock streamer emits a terminal_chunk with
ANSI red. Assert the rendered span has computed color rgb(255, 0, 0).
Exercises the auth gate plumbing — present a fake token, navigate, then
swap to a fresh token and reload. Real backend partitioning is gated
on Epic A; the spec is structured so its assertions can be tightened
once the real /v2/ endpoints land.
Centralizes dayjs.extend(relativeTime) so fromNow() is typed and
loaded across the agents components. relativeOrAbsolute() falls back
to '—' for null/invalid timestamps so callers don't have to
re-implement the guard.
Optional-chaining dayjs(...).fromNow?.() was a TS error because the
relativeTime plugin wasn't loaded. Use relativeOrAbsolute() from the
shared helper instead.
@ant-design/icons isn't a direct dependency of the dashboard. Use a
text glyph (▸/▾) instead — keeps the toggle visible without adding
a runtime import.
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ ishaan-jaff
❌ ishaan-berri

@greptile-apps

greptile-apps Bot commented May 6, 2026

Contributor

Greptile Summary

Introduces the /agents route group — a Cursor-Cloud-Agents-style three-pane dashboard (sessions sidebar, conversation with live SSE, Git+Terminal right panel) built entirely in antd and gated behind NEXT_PUBLIC_USE_MOCK_AGENTS=true until the Epic A backend (LIT-2877) lands. Eight Playwright specs cover the main validation criteria.

  • useSessionEventStream has a reconnect-after-unmount race where the onerror backoff timer can fire on a dead component, and the SSE stream carries no bearer token — only the token cookie via withCredentials.
  • Conversation merges a REST snapshot with live SSE events without deduplication; on first render the stream starts from seq=0, replaying events already included in the snapshot and showing them twice.
  • Module-level mutable arrays in mock-agents.ts (MOCK_AGENTS, MOCK_SESSIONS) are mutated by the create helpers, causing state to bleed across browser tabs and potentially across SSR requests.

Confidence Score: 3/5

Safe to land behind the mock flag, but three behavioural defects in the live-stream path need fixing before NEXT_PUBLIC_USE_MOCK_AGENTS is turned off in any real environment.

The SSE hook has a reconnect timer that can fire on an unmounted component and no bearer-token forwarding for API-key-only users. The conversation pane duplicates messages on initial load because the snapshot and the SSE stream both emit the same events from seq=0 with no deduplication. The mock-data mutation is a lower-stakes quality issue. All defects are in new code with no production users yet, but they are present bugs rather than speculative risks.

useSessionEventStream.ts and conversation.tsx need the most attention before the mock flag is lifted.

Important Files Changed

Filename | Overview
--- | ---
ui/litellm-dashboard/src/app/(dashboard)/agents/hooks/useSessionEventStream.ts | SSE hook with two defects: reconnect setTimeout can fire after unmount (state update on dead component), and the stream carries no bearer token, only cookies.
ui/litellm-dashboard/src/app/(dashboard)/agents/components/three-pane/conversation.tsx | Merges snapshot messages and live SSE events without deduplication; on initial load the SSE stream starts from seq=0 and replays events already in the snapshot, causing visible duplicates.
ui/litellm-dashboard/src/lib/cloud-agents-client.ts | Clean REST client with mock short-circuit, proper error propagation, and centralised URL building; SSE URL is built here but auth is not threaded through to the hook.
ui/litellm-dashboard/src/lib/mock-agents.ts | Module-level mutable arrays mutated by mockCreateAgent/mockCreateSession; state bleeds across requests in SSR and across test runs in the same process.
ui/litellm-dashboard/src/app/(dashboard)/agents/components/three-pane/terminal-tab.tsx | ANSI SGR parser is well-structured; silently drops all text after a malformed escape sequence rather than emitting the raw bytes.
ui/litellm-dashboard/src/types/cloud-agents.ts | Well-typed definitions mirroring the LIT-2877 API spec; clean discriminated union for run event types.
ui/litellm-dashboard/src/app/(dashboard)/agents/components/three-pane/index.tsx | Orchestrates data fetching for sessions, conversation snapshot, and active run; correctly uses cancellation flags in async effects.
ui/litellm-dashboard/e2e_tests/tests/agents/tenant-isolation.spec.ts | Explicitly deferred test: verifies only the auth gate, not actual data partitioning; follow-up ticket referenced.
ui/litellm-dashboard/e2e_tests/tests/agents/_helpers.ts | Mints a structurally valid fake JWT for cookie injection; clean helper with no real network calls.

Reviews (1): Last reviewed commit: "fix(ui/agents): drop @ant-design/icons f..."

Comment on lines +90 to +122
// Real EventSource streamer with reconnect+resume.
const startRealStream = useCallback(() => {
  if (!sessionId || !runId) return;

  const url = buildRunEventStreamUrl(sessionId, runId, lastSeqRef.current || undefined);
  const es = new EventSource(url, { withCredentials: true });
  esRef.current = es;

  es.onopen = () => {
    setConnected(true);
    setError(null);
  };
  es.onmessage = (msg: MessageEvent<string>) => {
    try {
      const evt = JSON.parse(msg.data) as CloudAgentRunEvent;
      if (typeof evt.seq === "number") {
        appendEvent(evt);
      }
    } catch (e) {
      // ignore malformed events; the server is responsible for shape
    }
  };
  es.onerror = () => {
    setConnected(false);
    setError("stream interrupted; reconnecting");
    es.close();
    esRef.current = null;
    // Reconnect after backoff, replaying from lastSeq.
    setTimeout(() => {
      if (esRef.current === null) startRealStream();
    }, RECONNECT_BACKOFF_MS);
  };
}, [sessionId, runId, appendEvent]);

P1 Reconnect loop fires after component unmount, causing React state updates on unmounted components. The setTimeout in onerror captures startRealStream and checks esRef.current === null, but stopRealStream (the cleanup) also sets esRef.current = null. If the component unmounts while a reconnect is pending, the timer fires, sees null, and calls startRealStream(), triggering setConnected, setError, and new EventSource on a dead component. The fix is to also gate on a mounted ref.

Suggested change
const mountedRef = useRef(true);
useEffect(() => {
  mountedRef.current = true;
  return () => { mountedRef.current = false; };
}, []);
// Real EventSource streamer with reconnect+resume.
const startRealStream = useCallback(() => {
  if (!sessionId || !runId) return;
  const url = buildRunEventStreamUrl(sessionId, runId, lastSeqRef.current || undefined);
  const es = new EventSource(url, { withCredentials: true });
  esRef.current = es;
  es.onopen = () => {
    setConnected(true);
    setError(null);
  };
  es.onmessage = (msg: MessageEvent<string>) => {
    try {
      const evt = JSON.parse(msg.data) as CloudAgentRunEvent;
      if (typeof evt.seq === "number") {
        appendEvent(evt);
      }
    } catch (e) {
      // ignore malformed events; the server is responsible for shape
    }
  };
  es.onerror = () => {
    setConnected(false);
    setError("stream interrupted; reconnecting");
    es.close();
    esRef.current = null;
    // Reconnect after backoff, replaying from lastSeq.
    setTimeout(() => {
      if (mountedRef.current && esRef.current === null) startRealStream();
    }, RECONNECT_BACKOFF_MS);
  };
}, [sessionId, runId, appendEvent]);
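
A complementary option to the mounted-ref guard is to keep the timer handle and cancel it during cleanup, so that no reconnect callback is left pending after teardown. The following framework-free sketch illustrates that idea only; `createReconnector` and `Timers` are hypothetical names that do not appear in the PR, and the injectable timer interface exists purely to make the behavior easy to test.

```typescript
interface Timers {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

function createReconnector(
  connect: () => void,
  backoffMs: number,
  timers: Timers = realTimers,
) {
  let handle: unknown = null;
  let pending = false;
  let disposed = false;

  return {
    /** Schedules one reconnect attempt; no-op if one is pending or after dispose. */
    schedule(): void {
      if (disposed || pending) return;
      pending = true;
      handle = timers.set(() => {
        pending = false;
        if (!disposed) connect();
      }, backoffMs);
    },
    /** Cancels any pending attempt and blocks future ones. */
    dispose(): void {
      disposed = true;
      if (pending) {
        timers.clear(handle);
        pending = false;
      }
    },
  };
}
```

In a hook, `schedule()` would be called from `onerror` and `dispose()` from the effect cleanup.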

Comment on lines +40 to +75
  toolCall?: ToolCallPayload;
}

function buildDisplayItems(
  initialMessages: CloudAgentConversationMessage[],
  events: CloudAgentRunEvent[],
): { items: DisplayItem[]; diffs: FileDiffPayload[] } {
  const items: DisplayItem[] = initialMessages.map((m) => ({
    key: `msg-${m.id}`,
    kind: "message",
    message: m,
  }));
  const diffs: FileDiffPayload[] = [];
  for (const evt of events) {
    if (evt.type === "user_message" || evt.type === "assistant_message") {
      const role = evt.type === "user_message" ? "user" : "assistant";
      const content = String((evt.payload as { content?: unknown }).content ?? "");
      items.push({
        key: `evt-${evt.seq}`,
        kind: "message",
        message: {
          id: `evt-${evt.seq}`,
          role: role as CloudAgentConversationMessage["role"],
          content,
          created_at: evt.created_at,
        },
      });
    } else if (evt.type === "tool_call") {
      items.push({
        key: `evt-${evt.seq}`,
        kind: "tool_call",
        toolCall: evt.payload as unknown as ToolCallPayload,
      });
    } else if (evt.type === "file_diff") {
      diffs.push(evt.payload as unknown as FileDiffPayload);
    }

P1 Snapshot/SSE event duplication on initial load. buildDisplayItems appends initialMessages (from the REST snapshot) followed by all SSE events without any deduplication between the two sources. When a session has an active run, the SSE stream starts from seq=0 (no since_seq on first render because lastSeqRef.current is 0) and replays every event the server has for that run. Those same events are already included in the conversation snapshot, so user_message and assistant_message entries appear twice in the conversation pane. The fix is either to pass a sinceSeq derived from the snapshot length/cursor when opening the stream, or to deduplicate in buildDisplayItems by storing seen message IDs from the snapshot.
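
The second option (deduplicating against snapshot message IDs) might be sketched as below. This rests on an assumption that is not confirmed by the code shown: that each message event carries the ID of the snapshot message it corresponds to, here called `payload.message_id`. The excerpt above keys event-derived messages by `seq` instead, so the real fix may need an API field or a snapshot cursor.

```typescript
interface SnapshotMessage {
  id: string;
  role: string;
  content: string;
}

// Assumed event shape; message_id is a hypothetical field.
interface RunEvent {
  seq: number;
  type: string;
  payload: { message_id?: string; content?: string };
}

function dropEventsCoveredBySnapshot(
  snapshot: readonly SnapshotMessage[],
  events: readonly RunEvent[],
): RunEvent[] {
  const seen = new Set(snapshot.map((m) => m.id));
  return events.filter((evt) => {
    const id = evt.payload.message_id;
    // Events without an ID (tool calls, diffs) are always kept.
    return id === undefined || !seen.has(id);
  });
}
```

The filtered list could then be passed to `buildDisplayItems` in place of the raw event array.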

Comment on lines +93 to +98

const url = buildRunEventStreamUrl(sessionId, runId, lastSeqRef.current || undefined);
const es = new EventSource(url, { withCredentials: true });
esRef.current = es;

es.onopen = () => {

P1 SSE stream sends no auth credentials beyond cookies. EventSource does not support custom request headers, so the bearer token from useAuthorized is never forwarded. The stream relies entirely on the token cookie via withCredentials: true. Users who authenticate with an API key (no session cookie) will receive silent auth failures — the onerror handler retries indefinitely but never succeeds. accessToken is not threaded into useSessionEventStream at all, so there is currently no path to pass it as a query-param fallback without an API contract change. Does the LiteLLM proxy accept the token cookie as a valid credential for the SSE events endpoint, or does it require the bearer header? If cookie-based auth is supported for this endpoint the current approach is fine; if not, users without a session cookie will lose the stream silently.

Comment on lines +69 to +73
const end = text.indexOf("m", i + 2);
if (end === -1) {
// malformed — drop the rest
break;
}

P2 When an ANSI escape sequence is malformed (e.g., \x1b[31 with no trailing m), the parser breaks out of the main loop entirely. Any plain text that follows the malformed sequence is silently discarded. In a real build log, a truncated write to the stream could result in the second half of a large output block disappearing from the terminal view with no indication to the user.

Suggested change
const end = text.indexOf("m", i + 2);
if (end === -1) {
  // malformed escape: emit the raw bytes rather than silently dropping
  buf += text.slice(i);
  break;
}

Comment on lines +224 to +260
    if (found) return found;
  }
  return null;
}

export async function mockGetRun(runId: string): Promise<CloudAgentRun | null> {
  return MOCK_RUNS[runId] ?? null;
}

export async function mockGetConversation(sessionId: string): Promise<CloudAgentConversationMessage[]> {
  return MOCK_CONVERSATION[sessionId] ?? [];
}

export async function mockCreateAgent(input: {
  name: string;
  model: string;
  system_prompt?: string;
}): Promise<CloudAgent> {
  const agent: CloudAgent = {
    agent_id: `agt_${Math.random().toString(36).slice(2, 8)}`,
    name: input.name,
    model: input.model,
    system_prompt: input.system_prompt ?? "",
    session_count: 0,
    last_activity_at: NOW(),
    created_at: NOW(),
  };
  MOCK_AGENTS.push(agent);
  return agent;
}

export async function mockCreateSession(input: { agent_id: string; repo_url: string }): Promise<CloudAgentSession> {
  const session: CloudAgentSession = {
    session_id: `ses_${Math.random().toString(36).slice(2, 8)}`,
    agent_id: input.agent_id,
    repo_url: input.repo_url,
    branch: "main",

P2 Mutable module-level state leaks across requests in SSR. MOCK_AGENTS, MOCK_SESSIONS, and MOCK_CONVERSATION are module-level arrays/objects. mockCreateAgent and mockCreateSession mutate them directly via push. In Next.js, server-side module caches are shared across requests in the same process, so agents created by one user's page load will appear for every subsequent user that hits the same server worker. Even if the mock is only used in dev, running next start, or opening several browser tabs against the same dev server process, would produce cross-tab state bleed whenever the mock executes server-side, and that bleed is hard to diagnose.
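
One way to avoid shared mutable module state is to build the mock data through a factory, so each caller (a test, a request, a provider instance) owns its own copy. The sketch below is illustrative only: `createMockStore`, `MockAgent`, and the seed record are hypothetical and much smaller than the PR's mock-agents.ts.

```typescript
interface MockAgent {
  agent_id: string;
  name: string;
  model: string;
}

// Read-only seed data; never mutated after module load.
const SEED_AGENTS: readonly MockAgent[] = [
  { agent_id: "agt_seed", name: "Seed agent", model: "example-model" },
];

function createMockStore() {
  // Each store copies the seed, so writes never leak between stores.
  const agents: MockAgent[] = SEED_AGENTS.map((a) => ({ ...a }));
  let counter = 0;

  return {
    listAgents(): MockAgent[] {
      return agents.map((a) => ({ ...a }));
    },
    createAgent(input: { name: string; model: string }): MockAgent {
      counter += 1;
      const agent: MockAgent = { agent_id: `agt_${counter}`, ...input };
      agents.push(agent);
      return { ...agent };
    },
  };
}
```

Where the store instance should live (per test, per request, or in a client-side context) is a design decision this sketch leaves open.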


3 participants