feat(web-shell): per-turn time & tokens on the collapse seam, below the prompt by wenshao · Pull Request #5163 · QwenLM/qwen-code

wenshao · 2026-06-15T11:17:25Z

What this PR does

In the web-shell transcript each completed turn can be folded to just its prompt and final answer. This moves the fold control off the prompt row's right edge onto its own line in the seam between the prompt and the steps, and turns that line into a per-turn metrics readout: step count, elapsed time, and ↑input ↓output token usage (with cached reads broken out). Only the leading ▸/▾ chevron toggles the turn — the trailing 3 steps · 12.4s · ↑3.1k (2.8k cached) ↓5.1k summary is inert text, identical collapsed vs expanded so toggling only flips the chevron and never reflows the row.

It applies to every turn, not just folded ones:

The active (streaming) turn shows the seam live — step / time / token metrics update as the agent works — while staying expanded so the streaming rows remain visible. Collapsing a live turn folds down to just the prompt + seam (it has no final answer yet, so no intermediate line is stranded).
A step-less turn (a plain "hi" reply that runs no tools or thinking) shows a chevron-less metrics line, since there is nothing to fold but the cost is still worth seeing.

Surfacing tokens needed an SDK daemon-ui change: the daemon already reports each round's usage (including sub-agent rounds) on an otherwise-empty agent_message_chunk (_meta.usage), but the normalizer dropped empty-text chunks. It now emits a dedicated assistant.usage event and the reducer folds the counts — sub-agent rounds included — onto the round's top-level assistant block, so the web-shell can read block.usage and sum a turn's true total.
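The normalizer change described above can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: the `RawChunk`, `TurnUsage`, and `UiEvent` shapes and the `extractUsageSketch` / `normalizeChunkSketch` names are hypothetical, and the `_meta.usage` field names (`inputTokens`, `outputTokens`, `cachedReadTokens`) are assumed from how they are mentioned elsewhere in this thread.

```typescript
// Hypothetical shapes for illustration; the real SDK types may differ.
interface TurnUsage {
  inputTokens: number;
  outputTokens: number;
  cachedTokens?: number; // subset of inputTokens, not an additive figure
}

interface RawChunk {
  text: string;
  _meta?: { usage?: Record<string, unknown> };
}

type UiEvent =
  | { type: "assistant.text.delta"; text: string }
  | { type: "assistant.usage"; usage: TurnUsage };

// Accept only finite numbers; anything else is treated as absent.
function numberField(v: unknown): number | undefined {
  return typeof v === "number" && Number.isFinite(v) ? v : undefined;
}

function extractUsageSketch(chunk: RawChunk): TurnUsage | undefined {
  const raw = chunk._meta?.usage;
  if (!raw) return undefined;
  const input = numberField(raw["inputTokens"]);
  const output = numberField(raw["outputTokens"]);
  const cached = numberField(raw["cachedReadTokens"]);
  // No recognized field at all: emit nothing rather than a {0, 0} event.
  if (input === undefined && output === undefined && cached === undefined) {
    return undefined;
  }
  const usage: TurnUsage = { inputTokens: input ?? 0, outputTokens: output ?? 0 };
  if (cached !== undefined && cached > 0) usage.cachedTokens = cached;
  return usage;
}

// An empty-text chunk no longer vanishes: its usage becomes its own event.
function normalizeChunkSketch(chunk: RawChunk): UiEvent[] {
  const events: UiEvent[] = [];
  if (chunk.text.length > 0) {
    events.push({ type: "assistant.text.delta", text: chunk.text });
  }
  const usage = extractUsageSketch(chunk);
  if (usage) events.push({ type: "assistant.usage", usage });
  return events;
}
```

Keeping the usage in a separate event means text handling is untouched, and agents that never send `_meta.usage` produce no extra events.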

Why it's needed

The previous toggle sat at the prompt row's right edge, needed a hard-coded gutter to dodge the hover timestamp, and showed only a hidden-step count. Putting the control in the seam is more intuitive and removes the gutter hack, and showing per-turn time + tokens (incl. cached and sub-agent cost) lets you see what each response cost at a glance while scanning a transcript.

Reviewer Test Plan

How to verify

Unit tests cover each layer:

cd packages/sdk-typescript && npx vitest run test/unit/daemonUi.test.ts — normalizer emits assistant.usage (with cached) from an empty-text usage chunk; the reducer folds/accumulates onto the active block, including sub-agent rounds; no stray block when none is active.
cd packages/web-shell && npx vitest run — transcriptToMessages carries/sums block.usage (incl. cached); applyTurnCollapse sums per-turn tokens + cached and derives elapsed from timestamps, tags active and step-less turns; UserMessage renders a chevron-only / chevron-less seam whose summary is identical collapsed vs expanded.

In a running web-shell: complete a turn with steps, let it collapse, confirm ▸ N steps · <time> · ↑<in> (<cached> cached) ↓<out>; click only the chevron to expand; confirm no horizontal shift on toggle. Send a plain "hi" and confirm its reply shows a chevron-less <time> · ↑in ↓out line. Run a turn that spawns sub-agents and confirm its token total reflects the sub-agent cost (much closer to /stats, though /stats is whole-session and still larger). Reopen a past session and confirm tokens still render.

Evidence (Before & After)

Before — toggle right-aligned on the prompt row, step count only:

❯ refactor the auth module                     ⌄ 3 steps
done — extracted validateToken()…

After — seam below the prompt; chevron-only control; time + tokens (+ cached):

❯ refactor the auth module
  ▸ 3 steps · 12.4s · ↑3.1k (2.8k cached) ↓5.1k
  done — extracted validateToken()…

Verified via the unit tests, tsc, and eslint; real-app screenshots not captured.

Tested on

OS	Status
🍏 macOS	✅ unit tests + tsc + eslint
🪟 Windows	⚠️ not tested
🐧 Linux	⚠️ not tested

Environment (optional)

Unit tests only (vitest); no live daemon run.

Risk & Scope

Main risk or tradeoff: elapsed is derived from block start timestamps (prompt → last step), so it slightly under-counts the final step's own runtime; the accurate per-turn durationMs is live-only and not persisted, so timestamps are used for consistency across live and replay.
Sub-agent token attribution: a sub-agent round's usage is folded onto the parent turn's active top-level block (the parent is blocked on the Task call while it runs). A round that arrives with no active top-level block (rare) is dropped; background agents that span turns may attribute to the turn active when their usage lands.
Scope: per-turn totals are intentionally narrower than /stats (which is the whole session across all turns and models), so they will not match exactly.
Breaking changes / migration notes: none. The new SDK assistant.usage event and DaemonTextTranscriptBlock.usage field (with optional cachedTokens) are additive; older sessions that carry no usage simply show the step count (and time when available).
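As a rough sketch of the tradeoff above, per-turn metrics can be derived from nothing but block timestamps and per-message usage. The `TurnMessage` shape and `summarizeTurnSketch` name are hypothetical; the sketch assumes the first message of a turn is the prompt and that each message carries its block's start timestamp.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cachedTokens?: number;
}

// Hypothetical message shape: timestamp is the block's start time in ms.
interface TurnMessage {
  role: "user" | "assistant" | "tool";
  timestamp?: number;
  usage?: Usage;
}

interface TurnMetrics {
  elapsedMs?: number;
  usage?: Usage;
}

function summarizeTurnSketch(messages: TurnMessage[]): TurnMetrics {
  const metrics: TurnMetrics = {};
  const start = messages[0]?.timestamp; // prompt timestamp
  let last: number | undefined;
  let input = 0;
  let output = 0;
  let cached = 0;
  let sawUsage = false;

  for (const m of messages.slice(1)) {
    if (m.timestamp !== undefined) {
      last = last === undefined ? m.timestamp : Math.max(last, m.timestamp);
    }
    if (m.role === "assistant" && m.usage) {
      sawUsage = true;
      input += m.usage.inputTokens;
      output += m.usage.outputTokens;
      cached += m.usage.cachedTokens ?? 0;
    }
  }

  // prompt → start of the last step: the last step's own runtime is not counted.
  if (start !== undefined && last !== undefined && last >= start) {
    metrics.elapsedMs = last - start;
  }
  // Older sessions without usage leave `usage` undefined instead of {0, 0}.
  if (sawUsage) {
    const usage: Usage = { inputTokens: input, outputTokens: output };
    if (cached > 0) usage.cachedTokens = cached;
    metrics.usage = usage;
  }
  return metrics;
}
```

Because only persisted fields are read, the same function gives the same answer for a live turn and for a replayed session.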

Linked Issues

None.

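Chinese description

What this PR does

In the web-shell transcript, each completed turn can be folded down to just the prompt and the final answer. This PR moves the fold control from the right end of the prompt row onto its own line in the seam between the prompt and the steps, and turns that line into a per-turn metrics readout: step count, elapsed time, and ↑input ↓output tokens (with cache hits listed separately). Only the leading ▸/▾ chevron folds the turn; the trailing 3 steps · 12.4s · ↑3.1k (2.8k cached) ↓5.1k is inert text, identical in the collapsed and expanded states, so toggling only flips the chevron and the row width does not change.

It applies to every turn, not only folded ones:

A running (streaming) turn shows the seam live (step, time, and token metrics update as the agent works) while staying expanded so the streaming output remains visible. Collapsing a running turn folds it down to just the prompt + seam (there is no final answer yet, so no intermediate line is left behind).
A step-less turn (a plain "hello" that runs no tools or thinking) shows a chevron-less metrics line: there is nothing to fold, but the cost is still worth seeing.

Showing tokens requires a change to the SDK's daemon-ui: the daemon already reports per-round usage (including sub-agent rounds) on an empty-text agent_message_chunk (_meta.usage), but the normalizer previously dropped empty-text chunks. It now emits a dedicated assistant.usage event, and the reducer accumulates the counts (including sub-agent rounds) onto that round's top-level assistant block, so the web-shell can read block.usage and sum the turn's true total.

Why it's needed

Previously the toggle sat at the right end of the prompt row, needed hard-coded padding to avoid the hover timestamp, and showed only the step count. Putting the control in the seam is more intuitive and removes the padding hack; showing per-turn elapsed time + tokens (including cached and sub-agent cost) lets you see the cost of each reply at a glance while browsing the transcript.

Reviewer Test Plan

How to verify

Unit tests cover each layer:

cd packages/sdk-typescript && npx vitest run test/unit/daemonUi.test.ts - the normalizer emits assistant.usage (including cached) from an empty-text usage chunk; the reducer accumulates onto the active block, including sub-agent rounds; no empty block is created when none is active.
cd packages/web-shell && npx vitest run - transcriptToMessages passes through/sums block.usage (including cached); applyTurnCollapse sums tokens + cached per turn, computes elapsed from timestamps, and tags running and step-less turns; UserMessage renders a "chevron-only / chevron-less" seam whose summary is identical in both states.

In a running web-shell: complete a turn with steps and let it collapse, then confirm ▸ N steps · <time> · ↑<in> (<cached> cached) ↓<out>; only clicking the chevron expands it; toggling causes no horizontal jump. Send "hello" and confirm its reply shows a chevron-less <time> · ↑in ↓out. Run a turn that spawns sub-agents and confirm its token total reflects the sub-agent cost (closer to /stats, though /stats is cumulative over the whole session and still larger). Reopen a past session and confirm tokens still display.

Evidence (Before / After)

Before: toggle right-aligned on the prompt row, step count only: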

❯ refactor the auth module                     ⌄ 3 steps
done — extracted validateToken()…

After: seam below the prompt; only the chevron is clickable; with elapsed time + tokens (+ cached):

❯ refactor the auth module
  ▸ 3 steps · 12.4s · ↑3.1k (2.8k cached) ↓5.1k
  done — extracted validateToken()…

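Verified via unit tests, tsc, and eslint; no real-app screenshots were captured.

Tested on

OS	Status
🍏 macOS	✅ unit tests + tsc + eslint
🪟 Windows	⚠️ not tested
🐧 Linux	⚠️ not tested

Environment (optional)

Unit tests only (vitest); no live daemon run.

Risk & Scope

Main tradeoff: elapsed time is derived from block start timestamps (prompt → last step), so it slightly under-counts the last step's own runtime; the accurate per-turn durationMs exists only live and is not persisted, so timestamps are used throughout to keep live and replay consistent.
Sub-agent attribution: a sub-agent round's usage is accumulated onto the parent turn's active top-level block (the parent turn is blocked on the Task call while the sub-agent runs). In the rare case where there is no active top-level block, it is dropped; background agents that span turns may be attributed to the turn that is active when their usage lands.
Scope: per-turn totals are intentionally narrower than /stats (cumulative across the whole session, all turns and models), so they will not match exactly.
Breaking changes: none. The new SDK assistant.usage event and DaemonTextTranscriptBlock.usage (with optional cachedTokens) are both additive; older sessions without usage show only the step count (and elapsed time when timestamps are available).

Linked Issues

None.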

…he prompt

Move each completed turn's fold control off the prompt row's right edge onto its own line in the seam between the prompt and the steps, and show the turn's elapsed time and ↑input ↓output token usage beside the step count.

- UserMessage: only the leading ▸/▾ chevron toggles the turn; the trailing "N steps · 12.4s · ↑3.1k ↓5.1k" summary is inert text and is identical whether collapsed or expanded, so toggling only flips the chevron and never reflows the row. The chevron matches the tool-row disclosure glyphs.
- MessageList.applyTurnCollapse: sum per-turn token usage across the turn's assistant messages and derive elapsed from block timestamps (works live and on replay). turnCollapseEqual now compares the new fields so the memoized row re-renders when they change.
- sdk daemon-ui: the daemon already emits a turn's per-round usage on an otherwise-empty agent_message_chunk (_meta.usage), but the normalizer dropped empty-text chunks. Emit a new assistant.usage event and fold the counts onto the round's assistant block so the web-shell can read block.usage. Tokens survive replay (persisted usageMetadata is re-emitted); duration is timestamp-derived since per-turn durationMs is live-only.

Build the collapse head for the in-progress turn too — defaulting to expanded so the streaming rows stay visible — so its live step / time / token metrics show while the agent is working, not only after the turn completes. The turn can still be collapsed manually; approval-pending and step-less turns are unchanged.

… step

An in-progress turn has no final answer yet, so collapsing it kept the latest intermediate assistant line (e.g. a "…launching N review agents" status) instead of folding away. Treat an active turn as having no answer so the fold drops every streaming step, leaving just the prompt row and its metrics seam; the real final answer is kept once the turn completes.

wenshao · 2026-06-15T11:52:46Z

@qwen-code /triage

qwen-code-ci-bot · 2026-06-15T12:00:09Z

Thanks for the PR, @wenshao!

Template looks good ✓ — all required headings present, bilingual body, test plan with evidence, risk section.

On direction: this is a solid web-shell UX improvement. Showing per-turn cost (time + tokens incl. cached and sub-agent) directly on the collapse seam is exactly the kind of visibility users need — right now you have to run /stats for a whole-session number, which doesn't help when scanning history. The CHANGELOG shows prior web-shell investment (Option+Enter shortcuts, etc.) and token usage display in other surfaces (#2445), so this extends an established direction. Aligned.

On approach: the three-layer architecture (SDK normalizer → transcript reducer → web-shell UI) is the right decomposition. Each layer is independently testable. The scope (~1011 additions across 16 files) is justified — every file change is directly needed for the data to flow from daemon _meta.usage to the rendered seam. I don't see any scope creep or drive-by refactors. The CSS cleanup (removing the margin-right: 64px gutter hack) is a natural consequence of moving the control.

One observation: the shouldFollow.current = false on toggle is a nice detail — collapsing a turn while the tail is auto-scrolling would otherwise yank the viewport. Not something I'd have thought of upfront, but clearly the right call.

Moving on to code review. 🔍

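Chinese description

Thanks for the contribution, @wenshao!

Template is complete ✓: all required headings present, bilingual body, with a verification plan, evidence, and risk notes.

Direction: this is a solid web-shell UX improvement. Showing per-turn cost (elapsed time + tokens, including cached and sub-agent) directly on the collapse seam is exactly the visibility users need when browsing history; today the only option is the whole-session number from /stats. The CHANGELOG shows earlier web-shell investment (Option+Enter shortcuts, etc.) and token usage display on other surfaces (#2445), so this continues an established direction. Aligned.

Approach: the three-layer architecture (SDK normalizer → transcript reducer → web-shell UI) is the right decomposition. Each layer is independently testable. The scope (about 1011 additions across 16 files) is reasonable: every file change is needed for the data to flow from daemon _meta.usage to the rendered seam. No scope creep or drive-by refactors seen. The CSS cleanup (removing the margin-right: 64px padding hack) is a natural result of moving the control.

One detail: setting shouldFollow.current = false on toggle is a good idea; otherwise, collapsing a turn while the tail is auto-scrolling would yank the viewport.

Moving on to code review 🔍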

— Qwen Code · qwen3.7-max

qwen-code-ci-bot · 2026-06-15T12:00:58Z

Code Review

Independent proposal (before reading the diff): I'd have taken the same three-layer approach — extract _meta.usage from agent_message_chunk as a new event type, accumulate onto assistant blocks in the reducer, carry through the adapter, and render on the collapse seam. The PR matches this exactly.

Comparison with the diff: The implementation is well-layered and correct. A few things that stood out:

Normalizer (normalizer.ts): Clean extraction via extractAssistantUsage(). Correctly returns undefined when no usage is present (no spurious events for older agents). The cachedReadTokens → cachedTokens rename is good — it clarifies these are the cached subset, not an additive field.
Reducer (transcript.ts): applyAssistantUsage correctly folds onto the active top-level block and accumulates. The decision to include sub-agent usage (via parentToolCallId) is well-reasoned and documented — the parent is blocked on the Task call, so its turn should bear the cost. Dropping usage when no active block exists (rather than minting one) is the safe choice.
Adapter (transcriptToMessages.ts): mergeAssistantUsage correctly sums when consecutive assistant blocks merge into one message. Returns undefined when neither side has usage, avoiding spurious {0, 0} objects.
UI (UserMessage.tsx): The useNowTicker hook is a clean approach for live elapsed — 1s interval, properly cleaned up. The monotonic elapsedSeenRef prevents backward-stepping when a live turn settles onto its frozen value. The chevron + step count as one toggle target (not a lone glyph) is good for accessibility.
Collapse logic (MessageList.tsx): The change from "active turn → skip entirely" to "active turn → tag with metrics, keep expanded" is the right evolution. Step-less turns with metrics now get a chevron-less seam — nice touch.
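The adapter behaviour noted above (sum when both sides carry usage, return undefined when neither does) might look like the sketch below. The name `mergeUsageSketch` and the `Usage` shape are hypothetical stand-ins, not the PR's actual `mergeAssistantUsage` signature.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cachedTokens?: number;
}

// Merge usage from two consecutive assistant blocks that collapse into
// one rendered message.
function mergeUsageSketch(a?: Usage, b?: Usage): Usage | undefined {
  // Neither side measured anything: stay undefined, never {0, 0}.
  if (!a && !b) return undefined;
  const merged: Usage = {
    inputTokens: (a?.inputTokens ?? 0) + (b?.inputTokens ?? 0),
    outputTokens: (a?.outputTokens ?? 0) + (b?.outputTokens ?? 0),
  };
  const cached = (a?.cachedTokens ?? 0) + (b?.cachedTokens ?? 0);
  // Keep the optional field absent unless there is something to show.
  if (cached > 0) merged.cachedTokens = cached;
  return merged;
}
```

Returning undefined rather than a zeroed object lets the UI distinguish "no data" (older sessions) from "zero tokens".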

No correctness bugs, security concerns, or AGENTS.md violations found.

Test Results

Unit Tests

SDK (packages/sdk-typescript):

 ✓ test/unit/daemonUi.test.ts  (236 tests) 106ms
 Test Files  1 passed (1)
      Tests  236 passed (236)

6 new tests covering: empty-text usage chunk extraction, no-usage guard, per-round accumulation onto active block, sub-agent usage folding, cached-read token carry/accumulation, and stray-block prevention.

Web-shell (packages/web-shell):

 ✓ adapters/transcriptToMessages.test.ts (92 tests) 42ms
 ✓ components/MessageList.test.ts (48 tests) 15ms
 ✓ components/messages/UserMessage.test.tsx (13 tests) 94ms
 Test Files  22 passed (22)
      Tests  362 passed (362)

New tests cover: usage carry-through on merge, step-less turn metrics, cached token display, live ticker, chevron-less seam, elapsed calculation, token summation across turns, and the toggle/metrics separation.

Typecheck

npx tsc --noEmit — all errors are pre-existing in scripts/ and integration-tests/ (vitest version mismatch, bun types, index signature access). Zero new errors in the PR's changed files.

Real-Scenario Testing

This PR modifies web-shell React components (collapse seam UI). Tmux cannot drive a browser-based React app — the visual changes (chevron repositioning, metrics rendering, toggle reflow behavior) require a running web-shell instance with a browser. The unit tests comprehensively cover the rendering logic: chevron + step count separation from inert metrics, collapsed/expanded stability (no reflow), cached token parenthetical formatting, live ticker behavior, and step-less turn rendering.

No tmux test is feasible for this type of change.

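Chinese description

Code Review

Independent proposal (before reading the diff): I would take the same three-layer approach: extract _meta.usage from agent_message_chunk as a new event type, accumulate it onto the assistant block in the reducer, pass it through the adapter, and render it on the collapse seam. The PR matches this approach exactly.

Comparison with the diff: The implementation is cleanly layered and correct. A few highlights:

Normalizer: Clean extraction via extractAssistantUsage(). Correctly returns undefined when there is no usage (older agents produce no extra events).
Reducer: applyAssistantUsage correctly folds onto the active top-level block and accumulates. The decision to include sub-agent usage is well-reasoned and documented.
Adapter: mergeAssistantUsage correctly sums when consecutive assistant blocks merge.
UI: The useNowTicker hook is a concise approach to live timing. The monotonically increasing elapsedSeenRef prevents stepping backward when a live turn transitions to its frozen value.
Collapse logic: Changing from "active turn → skip entirely" to "active turn → tag with metrics, keep expanded" is the right evolution.

No correctness bugs, security issues, or AGENTS.md violations found.

Test Results

All unit tests pass: SDK 236 (+6 new), Web-shell 362 (+22 new). Typecheck shows no new errors.

This PR modifies web-shell React components (collapse seam UI), and tmux cannot drive a browser-based React app. The unit tests comprehensively cover the rendering logic.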

— Qwen Code · qwen3.7-max

qwen-code-ci-bot · 2026-06-15T12:01:33Z

This is a well-executed feature PR. Let me think through it holistically.

The motivation is clear: users scanning a web-shell transcript want to see what each turn cost without running /stats. The implementation does exactly what the PR promises — per-turn time, input/output tokens, cached-read breakdown, all on the collapse seam. The data flow from daemon _meta.usage through three layers to the rendered UI is clean and each layer is independently tested.

Going back to my independent proposal: the PR matches it. I don't see a materially simpler path. The three-layer decomposition is necessary because the data originates in the daemon and needs to flow through normalization, reduction, and rendering. Each layer adds exactly what it needs.

A few reflections:

Scope discipline: 1011 additions across 16 files sounds large, but ~40% is tests (which are thorough and well-structured), CSS changes are minimal, and the source changes are tightly coupled to the feature. No drive-by refactors, no speculative abstractions.
Edge cases handled well: step-less turns (plain "hi" replies) get a chevron-less metrics seam; active turns tick elapsed live; collapsed live turns fold away the provisional answer; shouldFollow pauses on toggle to avoid viewport jumps. These are the kinds of details that separate a polished feature from a rough one.
Backward compatibility: older sessions without usage data gracefully degrade to step count + time only. The DaemonTextTranscriptBlock.usage field is optional. No breaking changes.
Sub-agent attribution: folding sub-agent usage into the parent turn is the right call — the parent is blocked while the sub-agent runs, and excluding it made turns dramatically under-count vs /stats. The documented caveat (background agents spanning turns may mis-attribute) is honest.

If I had to maintain this in six months, I'd thank the author. The code is well-organized, the tests are comprehensive, and the comments explain the why where it's non-obvious.

LGTM, ready to ship. ✅

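Chinese description

This is a well-executed feature PR.

The motivation is clear: users browsing a web-shell transcript want to see the cost of each turn without running /stats. The implementation fully delivers on the PR's promise: per-turn elapsed time, input/output tokens, and a cached-read breakdown, all on the collapse seam. Data flows from daemon _meta.usage through three layers to the rendered UI, cleanly, and each layer is independently testable.

Looking back at my independent proposal: the PR matches it. I see no clearly simpler path. The three-layer decomposition is necessary because the data originates in the daemon and has to flow through normalization, reduction, and rendering. Each layer adds only what it needs.

A few reflections:

Scope discipline: 1011 additions across 16 files sounds like a lot, but about 40% is tests (thorough and well-structured), CSS changes are minimal, and the source changes stay tightly on the feature. No drive-by refactors, no speculative abstractions.
Edge cases handled well: step-less turns show a chevron-less metrics seam; active turns tick elapsed time live; collapsing an active turn folds away the provisional answer; shouldFollow pauses on toggle to avoid viewport jumps.
Backward compatibility: older sessions without usage data degrade gracefully to step count + time only. No breaking changes.
Sub-agent attribution: folding sub-agent usage into the parent turn is the right choice.

Ready to merge ✅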

— Qwen Code · qwen3.7-max

qwen-code-ci-bot

LGTM, looks ready to ship. ✅

DragonnZhang

Well-designed feature adding per-turn time & token metrics to the web-shell collapse seam. The SDK layer correctly introduces assistant.usage events with DaemonTurnUsage (inputTokens/outputTokens), folds them onto active assistant blocks via applyAssistantUsage, and excludes sub-agent usage via parentToolCallId filtering. The web-shell layer renders N steps · 12.4s · ↑3.1k ↓5.1k beside the fold chevron with proper formatting helpers (formatDuration, formatTokenCount). Comprehensive test coverage (accumulation, sub-agent exclusion, empty-text usage chunks, formatting edge cases).

⚠️ CI is failing across all checks (Lint, Tests on all platforms, CodeQL, review-pr). This needs investigation before merge.

Downgraded from Approve to Comment: CI failing across all checks.

— claude-opus-4-6 via Qwen Code /review

wenshao · 2026-06-15T12:13:15Z

✅ Local verification — build + real tests (Linux)

Verified the head commit c8a1d5f7 in an isolated git worktree with a clean npm ci. All functional checks pass across both layers (SDK daemon-ui + web-shell), including a real headless-Chromium render and a cross-package end-to-end. The two red marks below are pre-existing on main, not introduced by this PR.

Environment: Linux (Debian 13, kernel 6.12), Node v22.22.2, npm 10.9.7. Worktree at c8a1d5f7, merge-base with main = d53a484d.

Results

Check	Command	Result
SDK unit tests	`sdk-typescript` → `vitest run test/unit/daemonUi.test.ts`	✅ 235 passed (incl. 5 new usage cases + 1 updated)
web-shell unit tests	`web-shell` → `vitest run`	✅ 352 passed / 22 files
SDK typecheck	`tsc --noEmit`	✅ clean
web-shell typecheck	`tsc -p tsconfig.json`	⚠️ 2 errors — both pre-existing on `main` (see note)
web-shell lint	`eslint` on changed files	✅ clean
SDK lint (source)	`eslint` on changed `src/daemon/ui/*`	✅ clean
Real-browser render	Playwright + Chromium, real CSS	✅ 32/32 assertions, 0 page/console errors
Cross-package E2E	built `@qwen-code/sdk/daemon` → web-shell adapter	✅ 2/2 passed

The new-feature suites all land: transcriptToMessages.test.ts (91), UserMessage.test.tsx (9), MessageList.test.ts (43), and the SDK daemonUi.test.ts usage cases (normalizer emits assistant.usage from an empty-text chunk; reducer folds + accumulates; sub-agent excluded; no stray block).

Real-browser render (headless Chromium, real CSS — not jsdom)

Mounted the real UserMessage with the real I18nProvider. Chromium rendered (verbatim textContent):

❯ refactor the auth module
  ▸ 3 steps · 12.4s · ↑3.1k ↓5.1k      (collapsed)
  ▾ 3 steps · 12.4s · ↑3.1k ↓5.1k      (expanded — same summary, only the glyph flips)

❯ what does this function do?
  ▸ 3 steps                            (no metrics → step count only, no “·”)

❯ tiny task
  ▸ 2 steps · 820ms · ↑50 ↓12          (sub-second + small tokens)

❯ long running task
  ▸ 7 steps · 1m 5s · ↑128.0k ↓9.0k    (minutes + abbreviated tokens)

❯ single step turn
  ▸ 1 step · 1.5s                      (singular)

Asserted against a live browser (not the DOM mock):

Chevron-only control — the <button> textContent is exactly ▸/▾; the summary is an inert sibling <span>. Clicking the summary span does not toggle.
No horizontal reflow on toggle — clicking the real chevron flips ▸→▾; the summary text is byte-identical and its left edge moves < 0.5px.
Seam is its own line below the prompt (button.top > prompt.top), not pinned far-right.
aria-expanded tracks state; all formatDuration / formatTokenCount branches render correctly.

(Real-Chromium screenshot captured locally; the text above is the exact textContent Chromium produced.)

Cross-package end-to-end (the novel/risky path)

Drove the built @qwen-code/sdk/daemon normalizer + reducer with raw ACP session_update frames, then the real web-shell adapter:

The empty-text _meta.usage chunk (which the old normalizer dropped) now yields assistant.usage.
Multi-round usage accumulates on the block (100/20 + 50/30 → 150/50); a delegated frame carrying parentToolCallId is excluded.
transcriptBlocksToDaemonMessages surfaces the same { inputTokens: 150, outputTokens: 50 } on the rendered message.
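The accumulation checked above (100/20 + 50/30 → 150/50) can be sketched as a small reducer step. Everything here is illustrative: the `Block`, `TranscriptState`, and `UsageEvent` shapes are assumptions, and the `foldSubAgents` flag is a hypothetical switch, set to false to mirror the exclusion described at this commit and to true for the later behaviour in this thread where sub-agent rounds are folded in.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cachedTokens?: number;
}

interface Block {
  kind: "assistant" | "tool" | "thought";
  usage?: Usage;
}

// Hypothetical state: `active` points at the current top-level block, if any.
interface TranscriptState {
  blocks: Block[];
  active?: Block;
}

interface UsageEvent {
  usage: Usage;
  parentToolCallId?: string; // set on frames delegated from a sub-agent
}

function applyUsageSketch(
  state: TranscriptState,
  ev: UsageEvent,
  foldSubAgents: boolean,
): void {
  if (ev.parentToolCallId !== undefined && !foldSubAgents) return;
  const active = state.active;
  // No active assistant block: drop the usage rather than mint a stray block.
  if (!active || active.kind !== "assistant") return;
  const prev = active.usage;
  const next: Usage = {
    inputTokens: (prev?.inputTokens ?? 0) + ev.usage.inputTokens,
    outputTokens: (prev?.outputTokens ?? 0) + ev.usage.outputTokens,
  };
  const cached = (prev?.cachedTokens ?? 0) + (ev.usage.cachedTokens ?? 0);
  if (cached > 0) next.cachedTokens = cached;
  active.usage = next;
}
```

With the flag off, two top-level rounds of 100/20 and 50/30 plus one delegated frame leave the block at 150/50; with it on, the delegated frame's tokens are added to the same block.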

Notes / caveats (none block merge)

web-shell tsc (2 errors) is pre-existing. Both are on lines this PR never touches (transcriptAdapter.test.ts:9, MessageList.test.ts:287, blamed to a non-PR commit). Running the identical tsc -p tsconfig.json on the pre-PR (merge-base) web-shell reproduces the same two errors → this PR adds zero new type errors. (web-shell has no typecheck script, so the root typecheck --workspaces --if-present skips it and CI stays green.)
SDK test-file lint could not run here: a nested eslint@8.57.1 (in sdk-typescript) vs hoisted eslint@9.29.0 makes @typescript-eslint/no-unused-expressions fail to load — a fresh-install version skew, reproducible via the canonical npm run lint, unrelated to PR code (SDK changed source files lint clean).
Not covered: a live end-to-end daemon run against a real model (same scope note as the PR). The cross-package E2E above exercises the exact frame→reducer→adapter contract with real ACP frame shapes instead.

Recommendation: functionally solid and safe to merge on these results; the only static-analysis red marks are pre-existing repo/tooling issues, not regressions from this PR.

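Chinese version

✅ Local verification: build + real tests (Linux)

Verified the head commit c8a1d5f7 in an isolated git worktree with a fresh npm ci. All functional checks pass across both layers (SDK daemon-ui + web-shell), including a real headless-Chromium render and a cross-package end-to-end test. The two "red" marks below are both pre-existing issues on main, not introduced by this PR.

Environment: Linux (Debian 13, kernel 6.12), Node v22.22.2, npm 10.9.7. Worktree at c8a1d5f7, merge-base with main = d53a484d.

Results

Check	Command	Result
SDK unit tests	`sdk-typescript` → `vitest run test/unit/daemonUi.test.ts`	✅ 235 passed (incl. 5 new usage cases + 1 rewritten)
web-shell unit tests	`web-shell` → `vitest run`	✅ 352 passed / 22 files
SDK typecheck	`tsc --noEmit`	✅ clean
web-shell typecheck	`tsc -p tsconfig.json`	⚠️ 2 errors, both pre-existing on `main` (see note)
web-shell lint	`eslint` on changed files	✅ clean
SDK lint (source)	`eslint` on changed `src/daemon/ui/*`	✅ clean
Real-browser render	Playwright + Chromium, real CSS	✅ 32/32 assertions, 0 page/console errors
Cross-package E2E	built `@qwen-code/sdk/daemon` → web-shell adapter	✅ 2/2 passed

The test suites related to the new feature all land: transcriptToMessages.test.ts (91), UserMessage.test.tsx (9), MessageList.test.ts (43), and the SDK daemonUi.test.ts usage cases (normalizer emits assistant.usage from an empty-text chunk; reducer folds + accumulates; sub-agent excluded; no empty block created when none is active).

Real-browser render (headless Chromium, real CSS, not jsdom)

Mounted the real UserMessage + the real I18nProvider. The textContent Chromium actually rendered: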

❯ refactor the auth module
  ▸ 3 steps · 12.4s · ↑3.1k ↓5.1k      (collapsed)
  ▾ 3 steps · 12.4s · ↑3.1k ↓5.1k      (expanded: summary identical, only the chevron flips)

❯ what does this function do?
  ▸ 3 steps                            (no metrics → step count only, no "·")

❯ tiny task
  ▸ 2 steps · 820ms · ↑50 ↓12          (sub-second + small tokens)

❯ long running task
  ▸ 7 steps · 1m 5s · ↑128.0k ↓9.0k    (minutes + abbreviated tokens)

❯ single step turn
  ▸ 1 step · 1.5s                      (singular)

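Asserted in a real browser (not a DOM mock):

Only the chevron is clickable: the <button> textContent is exactly ▸/▾; the summary is an inert sibling <span>. Clicking the summary span does not toggle.
No horizontal reflow on toggle: clicking the real chevron flips ▸→▾; the summary text is byte-identical and its left edge moves < 0.5px.
The seam is its own line below the prompt (button.top > prompt.top), not pinned to the far right.
aria-expanded follows the state; all formatDuration / formatTokenCount branches render correctly.

(A real Chromium screenshot was kept locally; the text above is the textContent Chromium actually produced.)

Cross-package end-to-end (the new / highest-risk path)

Drove the built @qwen-code/sdk/daemon normalizer + reducer with raw ACP session_update frames, then fed the real web-shell adapter:

The empty-text _meta.usage chunk (which the old normalizer dropped) now emits assistant.usage.
Multi-round usage accumulates on the block (100/20 + 50/30 → 150/50); a delegated frame carrying parentToolCallId is excluded.
transcriptBlocksToDaemonMessages outputs the same { inputTokens: 150, outputTokens: 50 } on the rendered message.

Notes / caveats (none block merge)

The 2 web-shell tsc errors are pre-existing. Both are on lines this PR does not touch (transcriptAdapter.test.ts:9, MessageList.test.ts:287, blamed to a non-PR commit). Running the identical tsc -p tsconfig.json on the pre-PR (merge-base) web-shell reproduces the same two errors → this PR adds no new type errors. (web-shell has no typecheck script, so the root typecheck --workspaces --if-present skips it and CI stays green.)
SDK test-file lint cannot run in this environment: the eslint@8.57.1 nested in sdk-typescript conflicts with the eslint@9.29.0 hoisted to the root, so the @typescript-eslint/no-unused-expressions rule fails to load. This is a fresh-install version mismatch, also reproducible with the canonical npm run lint, and unrelated to PR code (the changed SDK source files lint clean).
Not covered: a live end-to-end daemon run against a real model (consistent with the scope stated in the PR). The cross-package E2E above already exercises the frame→reducer→adapter contract with real ACP frame shapes.

Conclusion: based on these results, the change is functionally solid and safe to merge; the only two static-check "red" marks are pre-existing repo/tooling issues, not regressions from this PR.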

qwen-code-ci-bot · 2026-06-15T12:18:10Z

Qwen Code review did not complete successfully: Qwen review aborted with an API error before posting comments. See workflow logs.

DragonnZhang

Incremental review at c8a1d5f7: one new commit fixes a folding edge case where collapsing the active (streaming) turn would strand the intermediate assistant text instead of folding it down to the prompt + seam. The fix correctly guards final-answer detection with !isActiveTurn — active turns have no final answer yet, so their intermediate text is treated as another step. Test coverage added for both the corrected hidden count and the new "collapsing active turn" behavior.

Downgraded from Approve to Comment: CI failing (review-pr workflow check, not actual tests).

— claude-opus-4-6 via Qwen Code /review

…ep-less turns

Three fixes to the per-turn metrics seam, all about making the token figures trustworthy:

- Sub-agent rounds were dropped from a turn's token total (the SDK reducer skipped usage carrying a parentToolCallId), so a turn that spawns agents under-counted badly against /stats. Fold those rounds onto the turn's top-level assistant block: the parent is blocked on the Task call while they run, so the active block is that turn's, and their tokens are part of its real cost. (Per-turn is still narrower than /stats, which is the whole session.)
- Carry cached-read tokens through the pipeline (SDK usage → block → message → turn) and show them parenthetically on input, as "↑3.1k (2.8k cached) ↓5.1k", only when > 0. Cached reads are a subset already counted in input, not an additive figure, so the parenthetical reads as "of which N cached".
- Surface the seam on a step-less turn too (e.g. a plain "hi" reply that runs no tools or thinking): a chevron-less "<time> · ↑in ↓out" line. There is nothing to fold, but the cost is still worth seeing; turns with neither steps nor any measured metric stay untagged.
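A minimal sketch of the token readout this commit describes, with the cached figure shown only when it is greater than zero. The helper names are hypothetical, and the abbreviation rule (one decimal plus "k" from 1000 upward) is inferred from the sample renders in this thread rather than taken from the PR's `formatTokenCount`.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cachedTokens?: number; // subset of inputTokens
}

// Assumed rule: plain integer below 1000, otherwise thousands with one decimal.
function formatTokenCountSketch(n: number): string {
  return n < 1000 ? String(n) : `${(n / 1000).toFixed(1)}k`;
}

function formatTokenSummarySketch(u: Usage): string {
  const cached = u.cachedTokens ?? 0;
  // The parenthetical reads as "of which N cached", so it sits on the input side.
  const cachedPart =
    cached > 0 ? ` (${formatTokenCountSketch(cached)} cached)` : "";
  return (
    `↑${formatTokenCountSketch(u.inputTokens)}${cachedPart} ` +
    `↓${formatTokenCountSketch(u.outputTokens)}`
  );
}
```

For example, `formatTokenSummarySketch({ inputTokens: 3100, outputTokens: 5100, cachedTokens: 2800 })` yields `↑3.1k (2.8k cached) ↓5.1k`.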

wenshao · 2026-06-15T12:53:39Z

@qwen-code /triage

ytahdn · 2026-06-15T12:56:09Z

Qwen Code Review — PR #5163

Overall: Good PR overall

Findings:

#	Source	File	Issue	Impact	Suggested Fix	Severity
1	6 agents (correctness, security, code quality, performance, test coverage, attacker, 3AM oncall)	`packages/web-shell/client/components/messages/UserMessage.tsx` L29-31	`formatDuration` produces `"1m 60s"` when the seconds component rounds up to 60. For example, `119500ms` (1m 59.5s) renders as `"1m 60s"` instead of `"2m 0s"`. `Math.round(totalSeconds - minutes * 60)` yields 60 without carrying into minutes. The sub-60 path can also produce `"60.0s"` when `totalSeconds` is in `[59.95, 60)`.	Visible to users on the turn fold seam: any turn lasting ~X:59.5 through X:59.9 displays a nonsensical duration. Trivially reproducible and embarrassing in a bug report.	Round to integer seconds first, then decompose: ``const totalSec = Math.round(ms / 1000); const minutes = Math.floor(totalSec / 60); const seconds = totalSec % 60; return `${minutes}m ${seconds}s`;`` This eliminates the carry bug entirely. If sub-second precision in the sub-60s range matters, keep `toFixed(1)` but clamp: `if (totalSeconds >= 59.5) fall through to minutes path`.	Suggestion
2	test coverage agent	`packages/web-shell/client/components/messages/UserMessage.test.tsx`	`formatDuration` has three branches (`< 1000` ms, `1000..59999` seconds, `>= 60000` minutes) but only the seconds branch is exercised (`elapsedMs: 12_400` → "12.4s"). The sub-second ("820ms") and minutes ("1m 5s") branches are never tested — and the minutes branch contains the rounding bug from #1.	Untested branches harbor the exact bug that slipped through. Future regressions in formatting will go undetected.	Add parameterized tests covering each branch: `0` → `"0ms"`, `500` → `"500ms"`, `60_000` → `"1m 0s"`, `65_000` → `"1m 5s"`, `119_500` → `"2m 0s"` (the rounding edge case).	Suggestion
3	attacker mindset agent	`packages/sdk-typescript/src/daemon/ui/normalizer.ts` L575-586	`extractAssistantUsage` does not reject negative token counts. `numberField` validates `typeof v === 'number' && Number.isFinite(v)` but allows negatives. A daemon event with `_meta.usage.inputTokens: -10000` passes validation and accumulates into the display as e.g. "↑-10.0k".	Token usage is untrusted input from the daemon/model-API boundary. Negative token counts in the UI are a data-integrity flaw — a security researcher could demonstrate absurd negative consumption via a crafted `agent_message_chunk`, undermining trust in the metrics feature.	Clamp to zero: `const inputTokens = rawInput !== undefined ? Math.max(0, rawInput) : 0;` (same for `outputTokens`).	Suggestion
4	test coverage agent	`packages/sdk-typescript/test/unit/daemonUi.test.ts`	Two guard branches in the new code lack test coverage: (a) `extractAssistantUsage` when `_meta.usage` exists but contains no recognized fields (`{}`), and (b) `applyAssistantUsage` when `block.kind !== 'assistant'` (usage arriving while a tool/thought block is active).	Without regression tests, future refactors could silently break these guards — e.g., an empty usage object might cause a spurious `{inputTokens: 0, outputTokens: 0}` event instead of being dropped.	Add tests: (a) `_meta.usage: {}` emits no event; (b) `assistant.usage` after `tool.update` without intervening `assistant.text.delta` leaves the tool block untouched.	Nice to have
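One way to address findings 1 and 3 together is sketched below. It is not the PR's `formatDuration`; the names are hypothetical, and the approach (round once, then pick the branch from the rounded value) is one option among several, slightly different from the reviewer's suggested clamp.

```typescript
// Round before choosing a branch, so a value can never round up to
// "60.0s" or "1m 60s" inside a branch that was chosen from the raw number.
function formatDurationSketch(ms: number): string {
  const whole = Math.round(ms);
  if (whole < 1000) return `${whole}ms`;

  // Tenths of a second, rounded once; 600 tenths = 60.0s belongs to minutes.
  const tenths = Math.round(ms / 100);
  if (tenths < 600) return `${(tenths / 10).toFixed(1)}s`;

  // Integer seconds first, then decompose: the carry happens naturally.
  const totalSec = Math.round(ms / 1000);
  const minutes = Math.floor(totalSec / 60);
  const seconds = totalSec % 60;
  return `${minutes}m ${seconds}s`;
}

// Finding 3: treat negative counts from the daemon boundary as zero.
function clampTokenCountSketch(raw: number | undefined): number {
  return raw !== undefined ? Math.max(0, raw) : 0;
}
```

The edge cases from the table then come out as `119_500` → `2m 0s` and `59_950` → `1m 0s`, while `12_400` still renders as `12.4s`.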

Severity summary:

🟡 Suggestions: 3
🔵 Nice to have: 1

Review performed by Qwen Code. This is an AI-generated review; please verify findings before acting on them.

qwen-code-ci-bot

LGTM, looks ready to ship. ✅ One minor doc fix: the DaemonTurnUsage and DaemonUiAssistantUsageEvent.parentToolCallId JSDoc both say sub-agent usage is 'excluded', but the reducer correctly includes it — the docs should match the code.

Polish to the per-turn seam:

- The chevron and step count toggle together now, so the click target is a comfortable "▸ 3 steps" instead of a lone glyph; the trailing time/tokens stay inert. The toggle differs between states only by the chevron (same-width in the mono font), so the row still never reflows on toggle.
- A live turn's elapsed ticks once a second (now − prompt) instead of jumping per step, so it no longer looks frozen during a long step. The shown value is clamped monotonically, so it never steps backward when the turn settles onto its timestamp-derived final figure.
- A step-less reply (a plain "hi" that runs no tools/thinking) no longer flashes a "1 step" chevron while it streams: its provisional streamed answer is not counted as a step, though it is still folded away if a live turn is collapsed, so nothing is stranded.
- Cached reads show their share of input: "↑3.1k (2.8k cached, 90%) ↓5.1k".
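The monotonic clamp on the live elapsed value can be illustrated without React. This sketch replaces the hook and ref mentioned in the thread with a plain class; `MonotonicElapsedSketch` and `liveElapsedMs` are hypothetical names.

```typescript
// Elapsed for a running turn: wall clock now minus the prompt timestamp.
function liveElapsedMs(nowMs: number, promptMs: number): number {
  return Math.max(0, nowMs - promptMs);
}

// Remembers the largest value shown so far for one turn, so the display
// never steps backward when the live figure is replaced by the
// timestamp-derived final figure (which can be slightly smaller).
class MonotonicElapsedSketch {
  private seen = 0;

  next(candidateMs: number): number {
    this.seen = Math.max(this.seen, candidateMs);
    return this.seen;
  }
}
```

In a component, `next` would be called on each one-second tick while the turn is active and once more when the turn settles; the returned value is what gets formatted.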

Expanding or collapsing a turn changes the transcript height, which the follow-bottom auto-scroll treated as new output and yanked the viewport to the bottom — pulling the row the user just clicked off screen. Pause follow on a manual toggle so the height change does not auto-scroll; the row stays where it is, and follow re-engages when the user scrolls back to the bottom (Rule 3).
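
The pause-and-re-engage behavior described in this commit message can be modeled as a tiny state holder. This is a sketch under assumptions: the class `FollowBottomSketch`, its method names, and the 4-pixel slack are invented for illustration, and the real web-shell presumably wires this into its scroll container differently.

```typescript
// Tracks whether new transcript height should auto-scroll the viewport.
class FollowBottomSketch {
  private following = true;

  // A manual expand/collapse changes height without being new output,
  // so follow is paused and the clicked row stays where it is.
  onManualToggle(): void {
    this.following = false;
  }

  // On a user scroll, follow is engaged only while the viewport sits
  // at (or within `slack` pixels of) the bottom of the content.
  onUserScroll(
    scrollTop: number,
    clientHeight: number,
    scrollHeight: number,
    slack = 4,
  ): void {
    this.following = scrollTop + clientHeight >= scrollHeight - slack;
  }

  // Consulted whenever the transcript height changes.
  shouldAutoScroll(): boolean {
    return this.following;
  }
}
```

With this shape, a height change right after a toggle sees `shouldAutoScroll()` return `false`, and it returns `true` again once the user has scrolled back to the bottom.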

DragonnZhang

Incremental review at 78b5b1d0: two new commits with significant behavioral improvements:

Sub-agent tokens now included in parent turn usage — previously excluded, which made turns under-count against /stats. The applyAssistantUsage function no longer skips parentToolCallId events; sub-agent tokens are part of the spawning turn's real cost.
Cached-read tokens tracked and displayed — cachedReadTokens extracted from _meta.usage, carried in DaemonTurnUsage, displayed parenthetically (e.g., "↑3.1k (2.8k cached, 90%)").
Live elapsed clock — useNowTicker hook re-renders once per second while a turn is active, with monotonic clamping via elapsedSeenRef to prevent backward stepping when a live turn settles onto its final figure.
Roomier fold toggle — UI improvements to the collapse/expand interaction.
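
The sub-agent accounting in the first point can be illustrated with a simple fold over usage events. The types and the `sumTurnUsage` helper below are hypothetical; the only detail taken from the review is that events carrying a `parentToolCallId` are no longer skipped.

```typescript
// Hypothetical event shape; parentToolCallId marks a sub-agent round.
interface UsageEventSketch {
  inputTokens: number;
  outputTokens: number;
  cachedReadTokens?: number;
  parentToolCallId?: string;
}

interface TurnTotalSketch {
  inputTokens: number;
  outputTokens: number;
  cachedReadTokens: number;
}

// Sums every round's usage into one per-turn total. Sub-agent rounds
// are deliberately NOT filtered out: they are part of the turn's cost.
function sumTurnUsage(events: UsageEventSketch[]): TurnTotalSketch | undefined {
  return events.reduce<TurnTotalSketch | undefined>((total, ev) => {
    const base = total ?? { inputTokens: 0, outputTokens: 0, cachedReadTokens: 0 };
    return {
      inputTokens: base.inputTokens + ev.inputTokens,
      outputTokens: base.outputTokens + ev.outputTokens,
      cachedReadTokens: base.cachedReadTokens + (ev.cachedReadTokens ?? 0),
    };
  }, undefined);
}
```

Returning `undefined` for a turn with no usage events lets the renderer omit the token readout rather than print zeros; that choice is an assumption of this sketch.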

Comprehensive test coverage for all changes. CI pending.

Downgraded from Approve to Comment: CI pending.

— claude-opus-4-6 via Qwen Code /review

wenshao · 2026-06-15T13:20:29Z

@qwen-code /triage

DragonnZhang

Well-designed feature adding per-turn time & tokens to the web-shell collapse seam.

Key strengths:

Clean SDK layer: DaemonTurnUsage type with inputTokens/outputTokens/cachedTokens, assistant.usage event
Sub-agent tokens correctly included in parent turn (fixes under-counting against /stats)
useNowTicker hook provides smooth live elapsed with monotonic clamping
Cached tokens shown parenthetically ("↑3.1k (2.8k cached, 90%)") — reads as "of which N cached", not additive
Step-less turns correctly show metrics without toggle button
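
The monotonic clamping mentioned in the list above amounts to remembering the largest elapsed value shown so far. The class below is a framework-free sketch with invented names; the PR itself is described as using a `useNowTicker` hook and an `elapsedSeenRef`, whose exact code is not shown here.

```typescript
// Keeps the displayed elapsed time from ever moving backward.
class MonotonicElapsedSketch {
  private seenMs = 0;

  // rawMs is the freshly computed elapsed time, e.g. (now - promptAt)
  // while live, or the timestamp-derived final figure once settled.
  next(rawMs: number): number {
    this.seenMs = Math.max(this.seenMs, rawMs);
    return this.seenMs;
  }
}

const clock = new MonotonicElapsedSketch();
clock.next(12_400); // live tick shows 12.4s
clock.next(12_100); // final figure is slightly lower: display stays at 12.4s
```

The trade-off is that the settled figure may read a fraction of a second higher than the timestamp-derived value, in exchange for a readout that never appears to run backward.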

Code quality is good with clear separation between SDK (data) and web-shell (rendering) layers.

⚠️ CI failing across multiple checks: Test (all platforms), Lint, CodeQL, review-pr. Needs investigation before merge.

Downgraded from Approve to Comment: CI failing.

— claude-opus-4-6 via Qwen Code /review

qwen-code-ci-bot

LGTM, looks ready to ship. ✅

DragonnZhang

Well-executed feature PR adding per-turn metrics to the web-shell collapse seam. The three-layer approach (SDK daemon-ui event, reducer accumulation, web-shell rendering) is clean and well-tested. Sub-agent token attribution is now correctly included in parent turn totals. CI currently pending.

— claude-opus-4-6 via Qwen Code /review

qwen-code-ci-bot · 2026-06-15T13:56:53Z

Qwen Code review did not complete successfully: Qwen review aborted with an API error before posting comments. See workflow logs.

ytahdn

LGTM — review findings are non-blocking suggestions.

wenshao added 3 commits June 15, 2026 19:15

qwen-code-ci-bot previously approved these changes Jun 15, 2026

DragonnZhang reviewed Jun 15, 2026

wenshao dismissed qwen-code-ci-bot’s stale review via 736e972 June 15, 2026 12:48

qwen-code-ci-bot previously approved these changes Jun 15, 2026

wenshao dismissed qwen-code-ci-bot’s stale review via 78b5b1d June 15, 2026 13:12

DragonnZhang reviewed Jun 15, 2026

qwen-code-ci-bot approved these changes Jun 15, 2026

DragonnZhang reviewed Jun 15, 2026

ytahdn approved these changes Jun 15, 2026

wenshao merged commit 7ace758 into QwenLM:main Jun 15, 2026
37 of 38 checks passed
