FEA-1437: UI Numbers Audit Phase 3/4/6 — selector-bound DOM audits + CI gate#252
Conversation
Read-only pre-explorer pass (PLN-760) ahead of Phase 3 UI-test work. No app code changed — manifest annotations + four docs only. - Renderer map: located the React file+line for all 53 manifest tiles (45 tiles + 8 structural assertions) across in-repo overlays and the pinned upstream agent-dashboard-client. PHASE3-RENDERER-MAP.md, grouped by screen, with a render_kind per tile (text / embedded / per-group / DOM-count / chart-only / absent). - Selector annotation: added a `selector` object to every manifest tile. No data-testid exists on any tile render path today, so every selector is a PROPOSED value gated on a one-line React follow-up (present_in_code:false). - Weak triage: classified all 118 cross_ref_weak detections — 12 reclassify to out_of_scope, 4 auto-resolve via selector binding, 102 are chart aggregates needing an explicit covered-by annotation. PHASE3-WEAK-TRIAGE.md. - Sidecar reuse: confirmed helpers/launch-sidecar.mjs already serves Playwright (via playwright-global-setup.ts) and node:test alike; SANDBOX_BASE_DIRECTORY=/ is baked into the shared helper and UI specs bypass the hook enqueue/drain path entirely. PHASE3-SIDECAR-REUSE.md. Testing: `node manifest-loader.mjs` loads the patched manifest cleanly (45 tiles / 8 structural). No app code or tests modified; nothing to run. Risks: none to runtime — manifest `selector` field is additive and the loader tolerates it; docs are non-executable. Line numbers for upstream files are pinned to the current package hash and will drift on an upstream bump. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two accuracy fixes from independent codex review (gate: pass, P2+P3): - Weak triage: the 4 Analytics costData.total_cost detections were marked category (a) auto-resolve, but coverage-classifier.mjs:178 binds candidates screen-scoped (t.screen === row.screen) and there is no Analytics cost tile — only dashboard.monitor.total_cost. Moved them to category (b). Net: (a) is now 0, which sharpens the headline: NO weak detection is a simple tile-text value awaiting a selector; every one is out-of-scope (c) or a chart aggregate (b). - successRate selector follow-up: the generic per-group text said "sum across groups", but workflow_success_rate is a percentage oracle. Corrected to expose the aggregate percentage, not a summed count. Testing: manifest still loads via manifest-loader.mjs (45 tiles / 8 structural). Risks: none — docs + one manifest selector.followup string. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pre-explorer done → what Phase 3 should pick up firstLinking back: FEA-1437 · PLN-760 (Phase 3 task list). This PR lands the pre-explorer deliverables PLN-760 asked for before Phase 3 kickoff. Start with the Dashboard screen. It's the right first target because:
Defer / flag before writing specs (from PHASE3-RENDERER-MAP.md): Sidecar reuse — confirmed yes, no new boot path. Weak-count reality check: driving |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 006f865600
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "render_kind": "text", | ||
| "renders_at": "scripts/agent-monitor-client/Dashboard.tsx:1814", | ||
| "followup": "Add data-testid to the StatPill value element, keyed per tile." |
There was a problem hiding this comment.
Correct active-agent selector annotation
For the dashboard.monitor.active_agents Phase-3 binding, the current Dashboard render path puts active_agents only in the StatPill sub text (Dashboard.tsx:1814); the corresponding value text is overview.total_agents (Dashboard.tsx:1812), and there is no separate “Active Agents” stat pill. If the follow-up adds this selector to the value element as instructed here, the audit will read total agents while comparing against the active-agents oracle, producing a false failure or masking the intended binding.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Good catch — fixed in bb-range commit just pushed. active_agents renders only in the sub text of the "Total Agents" pill ({active_agents} active), and that pill's value is overview.total_agents. Corrected the manifest annotation to render_kind: text_sub with a follow-up that targets the sub element (subTestid), not the value — and updated the matching PHASE3-RENDERER-MAP.md row. No test impact today (the tile is still label-sliced and passing); this prevents the wrong follow-up binding later.
…r tiles First Phase-3 increment: replace fragile label-slicing with stable data-testid selectors on the Dashboard Monitor StatPills. - StatPill gains optional testid/subTestid props (additive); wired at the three genuinely-text Monitor pills: Total Sessions (+ active trend), Total Cost, Total Events. - New helpers/playwright-region.ts: sliceAtSelector / sliceAtLabel / sliceForTile (the PLN-760 helper layer). sliceForTile prefers the data-testid when the manifest records selector.present_in_code, else falls back to label slice. - dashboard.ui-audit.spec.ts refactored to use the shared helper. - manifest.json: selector.present_in_code flipped to true for the four testid'd entries. Build fix (latent bug): build-agent-monitor.mjs currentStamp() omitted the client Dashboard.tsx overlay, so editing it never invalidated the build stamp — the sidecar shipped a stale Dashboard bundle. Added clientOverlayDashboardSource to the stamp inputs (matches the CLAUDE.md learned pattern: overlay changes must invalidate the stamp). This is what made the testids actually reach the DOM. Testing: `pnpm --filter desktop test:audit:ui` → 4 passed / 1 skipped (total_cost, bug FEA-1418) / 3 failed. The 3 failures (active_subagents, its trend, events_today) are PRE-EXISTING and unchanged by this commit — those tiles do not render in the current overlay (confirmed by the pre-explorer). They need a triage decision (file as bug/gap features with bug_ref so they skip, or drop the manifest rows) before the Monitor audit can go fully green. Risks: testid props are additive and default undefined, so non-audit rendering is unchanged. Version already bumped to 0.15.98 earlier in this PR. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Phase 3 audit proved 3 Monitor manifest tiles render no number in the UI. Filed as gap features (RELATES_TO FEA-1415) and set bug_ref so the audit test.skips them — Dashboard Monitor audit now green (4 passed / 4 skipped). - active_subagents + active_subagents.trend_total -> FEA-1442 (no Active Subagents tile; subagents only appear as agent-tree nodes) - events_today -> FEA-1443 (no Events Today tile rendered at all) Each gap feature documents expected-vs-actual and the open decision (build the tile with a data-testid, or descope the manifest row). The skipped tests auto-flip to real assertions when the tile ships. Testing: `pnpm --filter desktop test:audit:ui` -> 4 passed / 4 skipped / 0 failed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The pre-explorer marked dashboard.monitor.active_agents as render_kind "text"
with "add data-testid to the value element". Wrong: active_agents renders only
in the SUB text of the "Total Agents" pill ("{active_agents} active"); that
pill's VALUE is overview.total_agents. A follow-up that put the testid on the
value element would read total_agents while comparing against the active_agents
oracle — a false failure / masked binding.
Corrected the manifest selector annotation (render_kind text_sub, renders_at,
followup -> target the sub element) and the matching PHASE3-RENDERER-MAP.md row.
No code/test impact now (tile is present_in_code:false, still label-sliced and
passing); this prevents the wrong follow-up later.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… helper - helpers/audit-tile.ts: shared assertTileMatchesOracle(page,row) + resolveFixtureDbPath(), lifted from dashboard.ui-audit.spec.ts so each screen spec is a thin table loop (count/trend/money branches in one place). - dashboard.ui-audit.spec.ts refactored onto the shared helper (no behavior change). - PullRequests.tsx: data-testid on the 3 summary stat-grid values. - specs/audit/pull-requests.ui-audit.spec.ts: new manifest-driven spec. - manifest.json: present_in_code true for the 3 pr.stats.* tiles. Testing: test:audit:ui -> 7 passed / 4 skipped / 0 failed (Dashboard + PR). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Sessions.tsx: data-testid on the list-subtitle <p> (the "N sessions" count). - specs/audit/sessions.ui-audit.spec.ts: new manifest-driven spec. - manifest.json: present_in_code true for sessions.list.total. Testing: test:audit:ui -> 8 passed / 4 skipped / 0 failed (Dashboard + PR + Sessions). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wires the audit coverage validator + headless tile audit into a CI workflow. The validator already enforces the Phase 6 rules (needs_review=0, every detection classified, cross_ref files exist, out_of_scope reasons, bug_ref format) but was never run in CI (Tests runs `pnpm test`, not `test:audit`). - scripts/check-audit-coverage.sh (+ `audit:coverage` npm script): regenerates scan+classify, asserts every harness parser has a contract test, runs the coverage validator, prints by_status. Fast, no build. - apps/desktop/ci/audit-gate.yml: the workflow (coverage gate + headless Playwright ui-audit jobs). Staged here, NOT .github/workflows/, because the automation PAT lacks the `workflow` scope — see apps/desktop/ci/README.md for the one-command activation a maintainer with workflow scope runs. - coverage.json: refreshed detection line numbers after the Dashboard testid edits (line shifts only; by_status unchanged). Testing: `pnpm -C apps/desktop audit:coverage` -> 203 checks pass, needs_review=0. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…cope Per PHASE3-WEAK-TRIAGE.md category (c): the 12 detections that were never in scope (server-runtime health gauges, local hooks config state, timestamp formatters) now classify as out_of_scope via three new content-pattern rules in coverage-classifier.mjs (HEALTH_GAUGE / CONFIG_STATE / TIMESTAMP). cross_ref_weak 118 -> 106; out_of_scope 29 -> 41; needs_review still 0. The remaining 106 weak detections are category (b) — chart/sparkline aggregates whose underlying numbers ARE oracle-backed at the API layer but don't render as sliceable per-value DOM. Those need the covered_by annotation convention (next). Testing: coverage-validator 203/203 pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- audit-tile.ts: new dom_count branch — for list_length/section_card_count tiles (no single numeric element), assert the rendered item count == oracle via toHaveCount(selector). Generalizes the table-driven audit to count tiles. - Tools.tsx: data-testid='audit-tool-row' on each tool button. tools.list.length now asserts count == events_facets_tool_names_length (passes). - specs/audit/tools.ui-audit.spec.ts: new spec; skips tiles not yet bound (tools.event_types.length — event_type has no countable rendered list; only shows per-event in the detail panel) with a visible "(skip — selector pending)". - Plans.tsx: data-testid='audit-plan-row' added (infra for later) but the Plans UI audit is NOT enabled — the fixture seeds zero plans, so it can't be meaningfully asserted yet; manifest followup documents the fixture-seeding prerequisite. No Plans spec shipped (would be empty/false-confidence). - coverage.json: detection line numbers refreshed after the testid edits. Testing: test:audit:ui -> 9 passed / 5 skipped / 0 failed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Skills.tsx: data-testid='audit-skill-row' on each skill button (across pack groups). specs/audit/skills.ui-audit.spec.ts asserts toHaveCount == skills_total (= sum of per-group counts). Passes. - PacksInstalled.tsx: data-testid='audit-pack-row' added (infra), but the Packs UI audit is NOT enabled — in the test sidecar the installed-packs list renders 0 rows (SKIP_CATALOG_DETECTORS=1; /api/packs may need install detection to surface the seeded agent_packs). manifest followup documents the open question (possible real bug vs test-env limitation). No Packs spec shipped. - coverage.json: detection line numbers refreshed. Testing: test:audit:ui -> 10 passed / 5 skipped / 0 failed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Overnight handoff (FEA-1437 Phase 3/4/6)PR is green (10 passed / 5 skipped / 0 failed on Landed & verified: selector-bound DOM audits for Dashboard Monitor, PullRequests, Sessions, Tools, Skills; shared 3 things that need you (can't be done unattended):
Not started: Phase 5 (visual regression + real-Electron) — deferred as flaky/baseline-heavy per PLN-760. |
…w High)
Independent local review caught a High bug in the staged workflow: the coverage
summary step did `require(process.env.COV)` with a bare relative path, which
Node treats as a node_modules specifier — it throws "Cannot find module" once
coverage.json exists, failing the (non-continue-on-error) step and thus the
whole coverage gate job, blocking PRs on activation.
- Resolve to an absolute path: require(require("path").resolve(process.env.COV)).
- Mark the summary step continue-on-error: true — reporting must never fail the
gate.
Verified: the fixed snippet prints the by_status table from repo root.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| // is patched. Skip so the audit-ui suite stays green without masking the | ||
| // bug — the node-side audit keeps the assertion alive as a `todo` that | ||
| // flips to "todo passed" the moment the fix lands. | ||
| const testFn = row.bug_ref ? test.skip : test; |
There was a problem hiding this comment.
This only skips on bug_ref, but tools.ui-audit and skills.ui-audit also skip tiles where selector.present_in_code is false. active_agents is present_in_code:false with no bug_ref, so it runs here through the label fallback. sliceAtLabel("Active Agents") lands on the full-width Active Agents section heading (t("activeAgentsSection")), not the Total Agents pill sub where the oracle value actually renders, then asserts the number shows up somewhere in an 80-char window. So that tile is green because some number in the agent grid happened to match, not because the rendered active_agents value equals the oracle. And it'll flip to a hard failure the day the activeAgentsSection translation stops containing "Active Agents". Either give it the same present_in_code skip the other two specs have, or hoist that guard into assertTileMatchesOracle so all five specs behave the same.
There was a problem hiding this comment.
Fixed in dd3cec2 — and you're right that it was the better lever than a one-off skip. Two changes: (1) bound active_agents for real — added a subTestid to the Total Agents pill's sub (data-testid='audit-dashboard-monitor-active-agents') and flipped present_in_code: true, so it now asserts the rendered active count against the oracle on the element where it actually renders, not the section heading. (2) Hoisted the guard: new tileSkip(row) in helpers/audit-tile.ts (skip on bug_ref OR !present_in_code) now used by all five specs, so nothing can silently fall through to the label slice again. Verified active_agents passes via the selector path (not coincidence).
| pull_request: | ||
| types: [opened, synchronize, reopened] | ||
| paths: | ||
| - "apps/desktop/**" |
There was a problem hiding this comment.
Heads up for when you git mv this into .github/workflows. The README calls coverage the merge gate, so it'll presumably be a required check. A required check plus a paths filter is the classic trap: a PR that doesn't touch apps/desktop/** never triggers these jobs, the required status sits pending forever, and the PR can't merge. If it gates merges you need the no-op-on-skip job pattern or a non-paths trigger.
There was a problem hiding this comment.
Good catch — fixed in dd3cec2. Removed the paths: filter entirely so the workflow always triggers (the coverage job is fast), and documented in apps/desktop/ci/README.md to mark only coverage as required and leave ui-audit optional. That avoids the required-check + paths deadlock when this gets git mv'd into .github/workflows/.
thadeusb
left a comment
There was a problem hiding this comment.
Test infra is solid and the build-stamp fix is a real one. The catch: active_agents passes for the wrong reason because this spec doesn't skip unbound selectors the way the tools/skills specs do, so a green suite is overstating its coverage by one tile. See the comment.
…dex P2) Codex review caught a race: the Sessions subtitle renders "0 sessions" as its initial placeholder before api.sessions.list() resolves. The beforeEach poll used /\d/, which matches that "0" immediately, so the assertion could read a stale zero and compare it against a non-zero oracle. Changed the poll to /[1-9]/ (the same anti-race trick the Dashboard "Total Sessions" audit uses) so it waits for real, non-placeholder data. Testing: test:audit:ui -> 10 passed / 5 skipped / 0 failed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ate trigger
Review (approved-with-comments) caught two real issues:
1. active_agents was a false green. It was present_in_code:false with no bug_ref,
and the Dashboard spec (unlike tools/skills) didn't skip unbound tiles — so it
ran through the label fallback, which sliced the "Active Agents" SECTION
heading (not the Total Agents pill sub where active_agents renders) and passed
only because some number landed in the 80-char window (and would break if the
translation changed). Fix:
- Bind it for real: subTestid on the Total Agents pill sub
(data-testid='audit-dashboard-monitor-active-agents'), manifest
present_in_code:true. Now asserts the rendered active count == oracle.
- Unify skip logic: new tileSkip() helper (bug_ref OR !present_in_code) used by
ALL five specs, so no spec can silently fall back to a coincidental label
match again.
2. audit-gate.yml required-check + paths-filter deadlock. A required check with a
paths filter leaves PRs that don't touch apps/desktop/** pending forever.
Removed the paths filter (coverage job is fast, always triggers) and documented
in README: mark only `coverage` required, leave `ui-audit` optional.
Testing: test:audit:ui -> 10 passed / 5 skipped / 0 failed (active_agents now
passes via its real selector).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PLN-750 merged to main and took 0.15.98 — the same version this PR had. On a pull_request event GitHub builds a merge commit, so version-bump-check's merge-base became main's tip (0.15.98), equal to ours -> "not bumped" failure. Bumping to 0.15.99 (above main) resolves the gate and avoids a release-version collision with PLN-750. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…plorer # Conflicts: # apps/desktop/package.json
Pre-explorer pass and Phase 3/4/6 increments for FEA-1437 (UI Numbers Audit Phase 3-6), per PLN-760. Builds on FEA-1415 / PLN-738 (PR #246,
2a3a371).Part 1 — Pre-explorer (manifest annotations + 4 docs)
inventory/gainsPHASE3-RENDERER-MAP.md(render file:line + render_kind for all 53 entries),PHASE3-WEAK-TRIAGE.md(all 118 cross_ref_weak classified),PHASE3-SIDECAR-REUSE.md(launch-sidecar.mjs already serves Playwright), and aselectorobject on every manifest tile.Part 2 — Phase 3/4: selector-bound DOM audits
Shared helpers
helpers/playwright-region.ts+helpers/audit-tile.ts(assertTileMatchesOracle, with text / trend / money / dom_count branches) make each screen spec a ~10-line table loop.data-testids added to in-repo overlays only.Bound + green (manifest-driven audit specs):
total_sessions(+active trend),total_cost,total_events(StatPill testids)pull_requests,sessions_with_pr,repos(stat-grid testids)sessions.list.total(i18n subtitle)tools.list.length(count of tool rows ==events_facets_tool_names_length)skills.list.length(count of skill rows ==skills_total)test:audit:ui→ 10 passed / 5 skipped / 0 failed. Skips: FEA-1418 (cost bug), FEA-1442/1443 (non-rendering tiles),tools.event_types.length(selector pending).Build bug fixed:
build-agent-monitor.mjscurrentStamp()omitted the Dashboard overlay, so editing it shipped a stale bundle. Added to the stamp inputs.Part 3 — Phase 6: CI gate
scripts/check-audit-coverage.sh+audit:coveragenpm script +apps/desktop/ci/audit-gate.yml(coverage gate + headless ui-audit jobs). Staged inapps/desktop/ci/(not.github/workflows/) because the automation PAT lacks theworkflowscope —apps/desktop/ci/README.mdhas the one-command activation.Part 4 — cross_ref_weak reduction
Reclassified the 12 never-in-scope detections (server-runtime health gauges, hooks config state, timestamp formatters) to
out_of_scopevia new classifier rules. cross_ref_weak 118 → 106.Triage — audit findings filed as gap features
Deferred (documented in manifest
followupnotes — next session)data-testids + the dom_count assertion are in place, but blocked on test-env data: the fixture seeds no plans, and the installed-packs list renders 0 in the test sidecar (SKIP_CATALOG_DETECTORS=1). Needs fixture seeding / an/api/packscheck (possible real bug).?type=filter).data-testids without extending the build overlay to those upstream pages; their numbers are API-covered today. Needs an overlay-infra decision.Tip raw=), not visible text; needs a product decision (surface as text vs assert tooltip).Review
Pre-explorer + first Phase 3 commit: independent
/codex review(gate PASS, findings fixed) + GitHub Codex bot (1 P2 fixed). Phase 4/6 commits: codex CLI rate-limited overnight → reviewed via an independent local reviewer instead. All CI checks green.🤖 Generated with Claude Code