feat(web-shell): per-turn time & tokens on the collapse seam, below the prompt #5163
…he prompt
Move each completed turn's fold control off the prompt row's right edge onto its own line in the seam between the prompt and the steps, and show the turn's elapsed time and ↑input ↓output token usage beside the step count.
- UserMessage: only the leading ▸/▾ chevron toggles the turn; the trailing "N steps · 12.4s · ↑3.1k ↓5.1k" summary is inert text and is identical whether collapsed or expanded, so toggling only flips the chevron and never reflows the row. The chevron matches the tool-row disclosure glyphs.
- MessageList.applyTurnCollapse: sum per-turn token usage across the turn's assistant messages and derive elapsed from block timestamps (works live and on replay). turnCollapseEqual now compares the new fields so the memoized row re-renders when they change.
- sdk daemon-ui: the daemon already emits a turn's per-round usage on an otherwise-empty agent_message_chunk (_meta.usage), but the normalizer dropped empty-text chunks. Emit a new assistant.usage event and fold the counts onto the round's assistant block so the web-shell can read block.usage. Tokens survive replay (persisted usageMetadata is re-emitted); duration is timestamp-derived since per-turn durationMs is live-only.
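The per-turn aggregation described for MessageList.applyTurnCollapse could look roughly like the sketch below. This is not the PR's code: the Block shape, its field names, and the summarizeTurn and metricsEqual helpers are assumptions made for illustration. Only the two ideas (sum usage across the turn's assistant blocks, derive elapsed time from block timestamps) come from the commit message.

```typescript
// Hypothetical shapes; field names are assumptions, not the SDK's real types.
interface TurnUsage {
  inputTokens: number;
  outputTokens: number;
}

interface Block {
  role: "user" | "assistant";
  timestamp: number; // epoch milliseconds
  usage?: TurnUsage; // present only on blocks that received usage
}

interface TurnMetrics {
  elapsedMs?: number;
  usage?: TurnUsage;
}

// Sum usage over the turn's assistant blocks and derive elapsed time from the
// earliest and latest block timestamps, so live and replayed turns agree.
function summarizeTurn(blocks: Block[]): TurnMetrics {
  let usage: TurnUsage | undefined;
  let first = Infinity;
  let last = -Infinity;
  for (const block of blocks) {
    first = Math.min(first, block.timestamp);
    last = Math.max(last, block.timestamp);
    if (block.role === "assistant" && block.usage) {
      usage = {
        inputTokens: (usage?.inputTokens ?? 0) + block.usage.inputTokens,
        outputTokens: (usage?.outputTokens ?? 0) + block.usage.outputTokens,
      };
    }
  }
  const elapsedMs = last > first ? last - first : undefined;
  return { elapsedMs, usage };
}

// Equality check in the spirit of turnCollapseEqual: the new fields are
// compared too, so a memoized row re-renders when the metrics change.
function metricsEqual(a: TurnMetrics, b: TurnMetrics): boolean {
  return (
    a.elapsedMs === b.elapsedMs &&
    a.usage?.inputTokens === b.usage?.inputTokens &&
    a.usage?.outputTokens === b.usage?.outputTokens
  );
}
```

A turn with a prompt at t=1000 ms and two assistant blocks at t=5000 ms and t=13400 ms would report 12400 ms elapsed and the sum of both blocks' usage.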
Build the collapse head for the in-progress turn too — defaulting to expanded so the streaming rows stay visible — so its live step / time / token metrics show while the agent is working, not only after the turn completes. The turn can still be collapsed manually; approval-pending and step-less turns are unchanged.
… step
An in-progress turn has no final answer yet, so collapsing it kept the latest intermediate assistant line (e.g. a "…launching N review agents" status) instead of folding away. Treat an active turn as having no answer so the fold drops every streaming step, leaving just the prompt row and its metrics seam; the real final answer is kept once the turn completes.
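The guard described above can be sketched as follows. The Row shape, the row kinds, and the rowsWhenCollapsed helper are hypothetical, and the sketch assumes that a completed turn's final answer is its last assistant-text row; the real detection in the web-shell may use different criteria.

```typescript
// Hypothetical row model for one turn; not the web-shell's actual types.
type RowKind = "prompt" | "step" | "assistant-text";

interface Row {
  id: string;
  kind: RowKind;
}

// Rows that stay visible when the turn is collapsed.
// Assumption: the final answer is the last assistant-text row, and only a
// completed turn has one. An active turn keeps nothing but its prompt.
function rowsWhenCollapsed(turn: Row[], isActiveTurn: boolean): Row[] {
  let finalIndex = -1;
  if (!isActiveTurn) {
    turn.forEach((row, i) => {
      if (row.kind === "assistant-text") finalIndex = i; // last match wins
    });
  }
  return turn.filter((row, i) => row.kind === "prompt" || i === finalIndex);
}
```

With the same three rows (prompt, one step, one intermediate status line), an active turn collapses to the prompt alone, while a completed turn keeps the prompt plus the last assistant text.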
@qwen-code /triage
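Thanks for the PR, @wenshao!
Template looks good ✓: all required headings present, bilingual body, test plan with evidence, risk section.
On direction: this is a solid web-shell UX improvement. Showing per-turn cost (time + tokens, incl. cached and sub-agent) directly on the collapse seam is exactly the kind of visibility users need; right now you have to run […]
On approach: the three-layer architecture (SDK normalizer → transcript reducer → web-shell UI) is the right decomposition. Each layer is independently testable. The scope (~1011 additions across 16 files) is justified: every file change is directly needed for the data to flow from daemon […]
One observation: the […]
Moving on to code review. 🔍
Chinese version (translated):
Thanks for the contribution, @wenshao!
The template is complete ✓: all required headings are present, the body is bilingual, and it includes a verification plan with evidence and a risk statement.
Direction: this is a solid web-shell UX improvement. Showing per-turn cost (elapsed time + tokens, including cached and sub-agent) directly on the collapse seam is exactly the visibility users need when browsing history; currently it is only available through […]
Approach: the three-layer architecture (SDK normalizer → transcript reducer → web-shell UI) is the right decomposition. Each layer can be tested independently. The scope (about 1011 added lines across 16 files) is reasonable: every file change is needed to get the data from the daemon […]
One detail: on toggle, it sets […]
Moving on to code review 🔍
— Qwen Code · qwen3.7-max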
Code Review
Independent proposal (before reading the diff): I'd have taken the same three-layer approach: extract […]
Comparison with the diff: The implementation is well-layered and correct. A few things that stood out: […]
No correctness bugs, security concerns, or AGENTS.md violations found.
Test Results
Unit Tests
SDK: 236 tests pass. 6 new tests cover empty-text usage chunk extraction, the no-usage guard, per-round accumulation onto the active block, sub-agent usage folding, cached-read token carry/accumulation, and stray-block prevention.
Web-shell: 362 tests pass, 22 of them new. The new tests cover usage carry-through on merge, step-less turn metrics, cached token display, the live ticker, the chevron-less seam, elapsed calculation, token summation across turns, and the toggle/metrics separation.
Typecheck
No new errors.
Real-Scenario Testing
This PR modifies web-shell React components (collapse seam UI). Tmux cannot drive a browser-based React app; the visual changes (chevron repositioning, metrics rendering, toggle reflow behavior) require a running web-shell instance with a browser. The unit tests comprehensively cover the rendering logic: chevron + step count separation from inert metrics, collapsed/expanded stability (no reflow), cached token parenthetical formatting, live ticker behavior, and step-less turn rendering. No tmux test is feasible for this type of change.
Chinese version (translated):
Code review
Independent proposal (before reading the diff): I would take the same three-layer approach, from […]
Comparison with the diff: the implementation is cleanly layered and correct. A few highlights: […]
No correctness bugs, security issues, or AGENTS.md violations found.
Test results
All unit tests pass: SDK 236 (+6 new), web-shell 362 (+22 new). Typecheck shows no new errors.
This PR modifies web-shell React components (collapse seam UI), and tmux cannot drive a browser React app. The unit tests cover the rendering logic comprehensively.
— Qwen Code · qwen3.7-max
This is a well-executed feature PR. Let me think through it holistically.
The motivation is clear: users scanning a web-shell transcript want to see what each turn cost without running […]
Going back to my independent proposal: the PR matches it. I don't see a materially simpler path. The three-layer decomposition is necessary because the data originates in the daemon and needs to flow through normalization, reduction, and rendering. Each layer adds exactly what it needs.
A few reflections: […]
If I had to maintain this in six months, I'd thank the author. The code is well-organized, the tests are comprehensive, and the comments explain the why where it's non-obvious.
LGTM, ready to ship. ✅
Chinese version (translated):
This is a well-executed feature PR.
The motivation is clear: users browsing a web-shell transcript want to see each turn's cost without running […]
Looking back at my independent proposal: the PR matches it. I see no clearly simpler path. The three-layer decomposition is necessary because the data originates in the daemon and has to flow through normalization, reduction, and rendering. Each layer adds only what it needs.
A few thoughts: […]
Ready to merge ✅
— Qwen Code · qwen3.7-max
qwen-code-ci-bot
left a comment
LGTM, looks ready to ship. ✅
DragonnZhang
left a comment
Well-designed feature adding per-turn time & token metrics to the web-shell collapse seam. The SDK layer correctly introduces assistant.usage events with DaemonTurnUsage (inputTokens/outputTokens), folds them onto active assistant blocks via applyAssistantUsage, and excludes sub-agent usage via parentToolCallId filtering. The web-shell layer renders N steps · 12.4s · ↑3.1k ↓5.1k beside the fold chevron with proper formatting helpers (formatDuration, formatTokenCount). Comprehensive test coverage (accumulation, sub-agent exclusion, empty-text usage chunks, formatting edge cases).
Downgraded from Approve to Comment: CI failing across all checks.
— claude-opus-4-6 via Qwen Code /review
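For readers unfamiliar with the seam format, the sketch below shows one way helpers named like formatDuration and formatTokenCount could produce the "N steps · 12.4s · ↑3.1k ↓5.1k" text quoted in the review. The function bodies are assumptions, not the PR's implementation: the rounding rules, the minutes format, the lack of any unit above "k", and the formatSeam wrapper are all invented for illustration.

```typescript
// Assumed behavior: counts below 1000 are shown as-is, larger counts are
// shown in thousands with one decimal ("3.1k"). No larger unit is handled.
function formatTokenCount(tokens: number): string {
  if (tokens < 1000) return String(tokens);
  return `${(tokens / 1000).toFixed(1)}k`;
}

// Assumed behavior: under a minute, seconds with one decimal ("12.4s");
// from one minute up, whole minutes and whole seconds ("1m 15s").
function formatDuration(ms: number): string {
  const seconds = ms / 1000;
  if (seconds < 60) return `${seconds.toFixed(1)}s`;
  const minutes = Math.floor(seconds / 60);
  const rest = Math.floor(seconds % 60);
  return `${minutes}m ${rest}s`;
}

// Hypothetical wrapper that joins the parts with the separator seen in the PR.
function formatSeam(
  steps: number,
  elapsedMs: number,
  inputTokens: number,
  outputTokens: number,
): string {
  const stepLabel = `${steps} ${steps === 1 ? "step" : "steps"}`;
  const tokens = `↑${formatTokenCount(inputTokens)} ↓${formatTokenCount(outputTokens)}`;
  return [stepLabel, formatDuration(elapsedMs), tokens].join(" · ");
}
```

Calling formatSeam(3, 12400, 3100, 5100) under these assumptions yields the string "3 steps · 12.4s · ↑3.1k ↓5.1k".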
✅ Local verification: build + real tests (Linux)
Verified the head commit […]
Environment: Linux (Debian 13, kernel 6.12), Node v22.22.2, npm 10.9.7. Worktree at […]
Results
[…]
The new-feature suites all land: […]
Real-browser render (headless Chromium, real CSS, not jsdom)
Mounted the real […]
Asserted against a live browser (not the DOM mock): […]
(Real-Chromium screenshot captured locally; the text above is the exact […])
Cross-package end-to-end (the novel/risky path)
Drove the built […]
Notes / caveats (none block merge)
[…]
Recommendation: functionally solid and safe to merge on these results; the only static-analysis red marks are pre-existing repo/tooling issues, not regressions from this PR.
Chinese version (translated):
✅ Local verification: build + real tests (Linux)
Verified in an isolated git worktree with a fresh […]
Environment: Linux (Debian 13, kernel 6.12), Node v22.22.2, npm 10.9.7. Worktree at […]
Results
[…]
The test suites related to the new feature all land: […]
Real-browser rendering (headless Chromium, real CSS, not jsdom)
Mounted the real […]
Asserted in a real browser (not a DOM mock): […]
(A real Chromium screenshot is kept locally; the above is exactly what Chromium produced […])
Cross-package end-to-end (the new / highest-risk path)
Using raw ACP […]
Notes / caveats (none block merge)
[…]
Conclusion: based on these results, the feature is solid and safe to merge; the only two static-check "red" marks are pre-existing repo/toolchain issues, not regressions from this PR.
Qwen Code review did not complete successfully: Qwen review aborted with an API error before posting comments. See workflow logs.
DragonnZhang
left a comment
Incremental review at c8a1d5f7: one new commit fixes a folding edge case where collapsing the active (streaming) turn would strand the intermediate assistant text instead of folding it down to the prompt + seam. The fix correctly guards final-answer detection with !isActiveTurn — active turns have no final answer yet, so their intermediate text is treated as another step. Test coverage added for both the corrected hidden count and the new "collapsing active turn" behavior.
Downgraded from Approve to Comment: CI failing (review-pr workflow check, not actual tests).
— claude-opus-4-6 via Qwen Code /review
…ep-less turns
Three fixes to the per-turn metrics seam, all about making the token figures trustworthy:
- Sub-agent rounds were dropped from a turn's token total (the SDK reducer skipped usage carrying a parentToolCallId), so a turn that spawns agents under-counted badly against /stats. Fold those rounds onto the turn's top-level assistant block: the parent is blocked on the Task call while they run, so the active block is that turn's, and their tokens are part of its real cost. (Per-turn is still narrower than /stats, which is the whole session.)
- Carry cached-read tokens through the pipeline (SDK usage → block → message → turn) and show them parenthetically on input ("↑3.1k (2.8k cached) ↓5.1k"), only when > 0. Cached reads are a subset already counted in input, not an additive figure, so the parenthetical reads as "of which N cached".
- Surface the seam on a step-less turn too (e.g. a plain "hi" reply that runs no tools or thinking): a chevron-less "<time> · ↑in ↓out" line. There is nothing to fold, but the cost is still worth seeing; turns with neither steps nor any measured metric stay untagged.
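A minimal sketch of the folding rule in the first two fixes, under stated assumptions: the State, AssistantBlock, and UsageEvent shapes and the applyUsage and addUsage helpers are hypothetical and are not the SDK reducer's actual code. What the sketch takes from the commit message is the behavior: usage is added to the active top-level block whether or not it carries a parentToolCallId, cached tokens accumulate alongside input and output, and nothing is created when no block is active.

```typescript
// Hypothetical shapes; the real SDK types may differ.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cachedTokens?: number; // subset of inputTokens, not an extra amount
}

interface UsageEvent {
  type: "assistant.usage";
  usage: Usage;
  parentToolCallId?: string; // set when the round belongs to a sub-agent
}

interface AssistantBlock {
  id: string;
  text: string;
  usage?: Usage;
}

interface State {
  blocks: AssistantBlock[];
  activeBlockId?: string;
}

// Add one round's usage to a running total, keeping cachedTokens only when > 0.
function addUsage(total: Usage | undefined, round: Usage): Usage {
  const cached = (total?.cachedTokens ?? 0) + (round.cachedTokens ?? 0);
  return {
    inputTokens: (total?.inputTokens ?? 0) + round.inputTokens,
    outputTokens: (total?.outputTokens ?? 0) + round.outputTokens,
    ...(cached > 0 ? { cachedTokens: cached } : {}),
  };
}

// Fold a usage event onto the active top-level block. Sub-agent rounds are
// not skipped: event.parentToolCallId is deliberately ignored here.
function applyUsage(state: State, event: UsageEvent): State {
  const activeId = state.activeBlockId;
  if (activeId === undefined) return state; // no active block: no stray block
  return {
    ...state,
    blocks: state.blocks.map((block) =>
      block.id === activeId
        ? { ...block, usage: addUsage(block.usage, event.usage) }
        : block,
    ),
  };
}
```

In this model a top-level round of 1000 input / 200 output tokens followed by a sub-agent round of 2100 / 300 (500 cached) leaves the active block with 3100 input, 500 output, and 500 cached tokens.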
@qwen-code /triage
Qwen Code Review: PR #5163
Overall: Good PR overall
Findings: […]
Severity summary: […]
Review performed by Qwen Code. This is an AI-generated review; please verify findings before acting on them.
qwen-code-ci-bot
left a comment
LGTM, looks ready to ship. ✅ One minor doc fix: the DaemonTurnUsage and DaemonUiAssistantUsageEvent.parentToolCallId JSDoc both say sub-agent usage is 'excluded', but the reducer correctly includes it — the docs should match the code.
Polish to the per-turn seam:
- The chevron and step count toggle together now, so the click target is a comfortable "▸ 3 steps" instead of a lone glyph; the trailing time/tokens stay inert. The toggle differs between states only by the chevron (same-width in the mono font), so the row still never reflows on toggle.
- A live turn's elapsed ticks once a second (now − prompt) instead of jumping per step, so it no longer looks frozen during a long step. The shown value is clamped monotonically, so it never steps backward when the turn settles onto its timestamp-derived final figure.
- A step-less reply (a plain "hi" that runs no tools/thinking) no longer flashes a "1 step" chevron while it streams: its provisional streamed answer is not counted as a step, though it is still folded away if a live turn is collapsed, so nothing is stranded.
- Cached reads show their share of input: "↑3.1k (2.8k cached, 90%) ↓5.1k".
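Two of the points above, the cached share and the monotonic clamp, can be illustrated with a small sketch. The names compact, formatTokens, and createElapsedClamp are hypothetical, and rounding the share to a whole percent is an assumption; the PR itself implements the clamp inside a React component, which this framework-free version does not reproduce.

```typescript
// Compact count: assumed one-decimal thousands above 999.
function compact(n: number): string {
  return n < 1000 ? String(n) : `${(n / 1000).toFixed(1)}k`;
}

// Token summary with the cached share of input shown only when cached > 0.
// Cached reads are a subset of input, so the share is cached / input.
function formatTokens(input: number, output: number, cached = 0): string {
  let inputPart = `↑${compact(input)}`;
  if (cached > 0 && input > 0) {
    const share = Math.round((cached / input) * 100);
    inputPart += ` (${compact(cached)} cached, ${share}%)`;
  }
  return `${inputPart} ↓${compact(output)}`;
}

// Monotonic clamp: remembers the largest elapsed value shown so far and never
// returns anything smaller, so the display cannot step backward when a live
// value (now - prompt) is replaced by a slightly smaller final figure.
function createElapsedClamp(): (elapsedMs: number) => number {
  let seen = 0;
  return (elapsedMs: number): number => {
    seen = Math.max(seen, elapsedMs);
    return seen;
  };
}
```

Feeding the clamp 5000, then 4800, then 6000 returns 5000, 5000, and 6000: the dip to 4800 is never displayed.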
Expanding or collapsing a turn changes the transcript height, which the follow-bottom auto-scroll treated as new output and yanked the viewport to the bottom — pulling the row the user just clicked off screen. Pause follow on a manual toggle so the height change does not auto-scroll; the row stays where it is, and follow re-engages when the user scrolls back to the bottom (Rule 3).
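The follow-bottom behavior described in this commit can be modeled as a small state machine. The sketch below is an assumption-laden illustration, not the web-shell's implementation: the event names, the FollowState shape, and reduceFollow are invented, and a real component would wire these events to scroll and resize observers.

```typescript
// Hypothetical model of follow-bottom auto-scroll.
interface FollowState {
  following: boolean; // true while the viewport should track new output
}

type FollowEvent =
  | { type: "manual-toggle" } // user expanded or collapsed a turn
  | { type: "content-resized" } // transcript height changed
  | { type: "user-scrolled"; atBottom: boolean };

interface FollowResult {
  state: FollowState;
  scrollToBottom: boolean; // whether the caller should scroll now
}

function reduceFollow(state: FollowState, event: FollowEvent): FollowResult {
  if (event.type === "manual-toggle") {
    // Pause follow so the height change caused by the toggle does not scroll.
    return { state: { following: false }, scrollToBottom: false };
  }
  if (event.type === "user-scrolled") {
    // Follow re-engages once the user is back at the bottom.
    return { state: { following: event.atBottom }, scrollToBottom: false };
  }
  // content-resized: scroll only while following.
  return { state, scrollToBottom: state.following };
}
```

The design choice modeled here is that a toggle changes state before the resulting resize is observed, so the resize that follows a click is ignored while ordinary streaming output still scrolls.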
DragonnZhang
left a comment
Incremental review at 78b5b1d0: two new commits with significant behavioral improvements:
- Sub-agent tokens now included in parent turn usage: previously excluded, which made turns under-count against /stats. The applyAssistantUsage function no longer skips parentToolCallId events; sub-agent tokens are part of the spawning turn's real cost.
- Cached-read tokens tracked and displayed: cachedReadTokens extracted from _meta.usage, carried in DaemonTurnUsage, displayed parenthetically (e.g., "↑3.1k (2.8k cached, 90%)").
- Live elapsed clock: useNowTicker hook re-renders once per second while a turn is active, with monotonic clamping via elapsedSeenRef to prevent backward stepping when a live turn settles onto its final figure.
- Roomier fold toggle: UI improvements to the collapse/expand interaction.
Comprehensive test coverage for all changes. CI pending.
Downgraded from Approve to Comment: CI pending.
— claude-opus-4-6 via Qwen Code /review
@qwen-code /triage
DragonnZhang
left a comment
Well-designed feature adding per-turn time & tokens to the web-shell collapse seam.
Key strengths:
- Clean SDK layer: DaemonTurnUsage type with inputTokens/outputTokens/cachedTokens, assistant.usage event
- Sub-agent tokens correctly included in parent turn (fixes under-counting against /stats)
- useNowTicker hook provides smooth live elapsed with monotonic clamping
- Cached tokens shown parenthetically ("↑3.1k (2.8k cached, 90%)"): reads as "of which N cached", not additive
- Step-less turns correctly show metrics without toggle button
Code quality is good with clear separation between SDK (data) and web-shell (rendering) layers.
Downgraded from Approve to Comment: CI failing.
— claude-opus-4-6 via Qwen Code /review
qwen-code-ci-bot
left a comment
LGTM, looks ready to ship. ✅
DragonnZhang
left a comment
Well-executed feature PR adding per-turn metrics to the web-shell collapse seam. The three-layer approach (SDK daemon-ui event, reducer accumulation, web-shell rendering) is clean and well-tested. Sub-agent token attribution is now correctly included in parent turn totals. CI currently pending.
— claude-opus-4-6 via Qwen Code /review
Qwen Code review did not complete successfully: Qwen review aborted with an API error before posting comments. See workflow logs.
ytahdn
left a comment
LGTM — review findings are non-blocking suggestions.
What this PR does
In the web-shell transcript each completed turn can be folded to just its prompt and final answer. This moves the fold control off the prompt row's right edge onto its own line in the seam between the prompt and the steps, and turns that line into a per-turn metrics readout: step count, elapsed time, and ↑input ↓output token usage (with cached reads broken out). Only the leading ▸/▾ chevron toggles the turn; the trailing 3 steps · 12.4s · ↑3.1k (2.8k cached) ↓5.1k summary is inert text, identical collapsed vs expanded, so toggling only flips the chevron and never reflows the row.
It applies to every turn, not just folded ones: […]
Surfacing tokens needed an SDK daemon-ui change: the daemon already reports each round's usage (including sub-agent rounds) on an otherwise-empty agent_message_chunk (_meta.usage), but the normalizer dropped empty-text chunks. It now emits a dedicated assistant.usage event and the reducer folds the counts (sub-agent rounds included) onto the round's top-level assistant block, so the web-shell can read block.usage and sum a turn's true total.
Why it's needed
The previous toggle sat at the prompt row's right edge, needed a hard-coded gutter to dodge the hover timestamp, and showed only a hidden-step count. Putting the control in the seam is more intuitive and removes the gutter hack, and showing per-turn time + tokens (incl. cached and sub-agent cost) lets you see what each response cost at a glance while scanning a transcript.
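The normalizer change described under "What this PR does" can be sketched as below. The chunk and event shapes are simplified assumptions: the flat text field, the field names inside _meta.usage, and the "assistant.text" event name are hypothetical. Only the agent_message_chunk and assistant.usage names and the rule itself (an empty-text chunk that carries usage must still produce an event) come from the description.

```typescript
// Hypothetical, simplified shapes; the daemon's real payload may differ.
interface RawUsage {
  inputTokens: number;
  outputTokens: number;
  cachedTokens?: number;
}

interface AgentMessageChunk {
  type: "agent_message_chunk";
  text: string; // may be empty on a usage-only chunk
  _meta?: { usage?: RawUsage };
}

type UiEvent =
  | { type: "assistant.text"; text: string } // assumed event name
  | { type: "assistant.usage"; usage: RawUsage };

// Normalize one chunk into UI events. An empty-text chunk no longer vanishes:
// if it carries usage, a dedicated assistant.usage event is emitted.
function normalizeChunk(chunk: AgentMessageChunk): UiEvent[] {
  const events: UiEvent[] = [];
  if (chunk.text.length > 0) {
    events.push({ type: "assistant.text", text: chunk.text });
  }
  const usage = chunk._meta?.usage;
  if (usage) {
    events.push({ type: "assistant.usage", usage: { ...usage } });
  }
  return events;
}
```

A chunk with empty text and no usage still yields no events, which matches the "no-usage guard" case the unit tests are said to cover.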
Reviewer Test Plan
How to verify
Unit tests cover each layer:
- cd packages/sdk-typescript && npx vitest run test/unit/daemonUi.test.ts: normalizer emits assistant.usage (with cached) from an empty-text usage chunk; the reducer folds/accumulates onto the active block, including sub-agent rounds; no stray block when none is active.
- cd packages/web-shell && npx vitest run: transcriptToMessages carries/sums block.usage (incl. cached); applyTurnCollapse sums per-turn tokens + cached and derives elapsed from timestamps, tags active and step-less turns; UserMessage renders a chevron-only / chevron-less seam whose summary is identical collapsed vs expanded.
In a running web-shell: complete a turn with steps, let it collapse, confirm ▸ N steps · <time> · ↑<in> (<cached> cached) ↓<out>; click only the chevron to expand; confirm no horizontal shift on toggle. Send a plain "hi" and confirm its reply shows a chevron-less <time> · ↑in ↓out line. Run a turn that spawns sub-agents and confirm its token total reflects the sub-agent cost (much closer to /stats, though /stats is whole-session and still larger). Reopen a past session and confirm tokens still render.
Evidence (Before & After)
Before: toggle right-aligned on the prompt row, step count only: […]
After: seam below the prompt; chevron-only control; time + tokens (+ cached): […]
Verified via the unit tests, tsc, and eslint; real-app screenshots not captured.
Tested on
[…]
Environment (optional)
Unit tests only (vitest); no live daemon run.
Risk & Scope
- […] durationMs is live-only and not persisted, so timestamps are used for consistency across live and replay.
- Per-turn totals are narrower than /stats (which is the whole session across all turns and models), so they will not match exactly.
- […] assistant.usage event and DaemonTextTranscriptBlock.usage field (with optional cachedTokens) are additive; older sessions that carry no usage simply show the step count (and time when available).
Linked Issues
None.
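Chinese version (translated)
What this PR does
In the web-shell transcript, each completed turn can be folded down to just the prompt and the final answer. This PR moves the fold control from the right end of the prompt row onto its own line in the seam between the prompt and the steps, and turns that line into per-turn metrics: step count, elapsed time, and ↑input ↓output (with cache hits listed separately). Only the leading ▸/▾ arrow folds the turn; the trailing 3 steps · 12.4s · ↑3.1k (2.8k cached) ↓5.1k is inert text, identical in both states, so toggling only flips the arrow and the row width does not change.
It applies to all turns, not only folded ones: […]
Showing tokens requires a change to the SDK's daemon-ui: the daemon already reports per-round usage (including sub-agent rounds) on an empty-text agent_message_chunk (_meta.usage), but the normalizer previously dropped empty-text chunks. It now emits a dedicated assistant.usage event, and the reducer accumulates the counts (including sub-agent rounds) onto that round's top-level assistant block, so the web-shell can read block.usage and sum the turn's true total.
Why it's needed
Previously the toggle sat at the right end of the prompt row, needed hard-coded padding to avoid the hover timestamp, and showed only the step count. Putting the control in the seam is more intuitive and removes the padding hack; showing per-turn elapsed time + tokens (including cached and sub-agent cost) lets you see the cost of each reply at a glance while browsing the transcript.
Reviewer Test Plan
How to verify
Unit tests cover each layer:
- cd packages/sdk-typescript && npx vitest run test/unit/daemonUi.test.ts: the normalizer emits assistant.usage (with cached) from an empty-text usage chunk; the reducer accumulates onto the active block, including sub-agent rounds; no empty block is created when there is no active block.
- cd packages/web-shell && npx vitest run: transcriptToMessages passes through/sums block.usage (incl. cached); applyTurnCollapse sums tokens + cached per turn, computes elapsed from timestamps, and tags running and step-less turns; UserMessage renders an "arrow-only / arrow-less" seam whose summary is identical in both states.
In a running web-shell: complete a turn with steps and collapse it, confirm ▸ N steps · <time> · ↑<in> (<cached> cached) ↓<out>; only clicking the arrow expands it; toggling causes no horizontal jump. Send "你好" ("hello") and confirm its reply shows the arrow-less <time> · ↑in ↓out. Run a turn that spawns sub-agents and confirm its token total reflects the sub-agent cost (closer to /stats, though /stats is a whole-session total and still larger). Reopen a past session and confirm tokens still show.
Evidence (Before / After)
Before: toggle right-aligned on the prompt row, step count only: […]
After: seam below the prompt; only the arrow is clickable; with elapsed time + tokens (+ cached): […]
Verified via unit tests, tsc, and eslint; no real-app screenshots taken.
Tested on
[…]
Environment (optional)
Unit tests only (vitest); no live daemon run.
Risk & Scope
- […] durationMs exists only live and is not persisted, so timestamps are used throughout to keep live and replay consistent.
- […] narrower than /stats (whole session, accumulated across all turns and models), so they will not be exactly equal.
- […] assistant.usage event and DaemonTextTranscriptBlock.usage (with optional cachedTokens) are both additive; older sessions without usage show only the step count (and elapsed time when timestamps are available).
Linked Issues
None.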