FEA-1444: Codex hook ingestion + dedup + opt-in toggle#259
Conversation
- Add codex-hook-handler.js wrapper that mirrors upstream hook-handler.js semantics but injects __provider: "codex" so the sidecar can stamp the session row with harness='codex'. Zero-dep plain JS, fail-silent. - Split agent-monitor-hooks.ts: pure install/uninstall logic moved to new agent-monitor-hooks-core.ts (electron-free, testable under tsx --test). Shell retains electron-store + path resolution + gatewayLog. - Add installCodexHooks / uninstallCodexHooks targeting $CODEX_HOME/hooks.json with the narrower Codex hook event set (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, Stop). SessionStart uses the "startup|resume" matcher per Codex's reference contract. - Filename-boundary predicate (isClaudeEntry vs isCodexEntry) prevents cross-matching between the two handler families. - New codexOptIn electron-store flag (default false) gates Codex hook installation. Behavior change: the master hooks toggle is unchanged for existing users; Codex hooks only install when both flags are true. Deferred to a follow-up: event-level dedup against the existing codex-watcher rollout-tail path, and the renderer UI to flip codexOptIn. - syncAgentMonitorHooksOnBoot reconciles Codex state with persisted intent (installs when opted in, uninstalls stale entries when opted out) so a failed file-layer mutation is eventually consistent on the next launch. - setAgentMonitorCodexHooksOptIn persists intent before applying side effects so a failure (e.g., locked hooks.json) cannot silently lose the user's opt-out. - build-agent-monitor.mjs gains a surgical patch to the generated server/routes/hooks.js that calls stmts.setSessionHarness when the inbound payload carries __provider: "codex". Patch is hash-anchored and hard-gated; future upstream drift fails the build loudly. - Bump desktop version to 0.15.98 per repo CI rule. Testing: - pnpm -C apps/desktop typecheck: clean - pnpm -C apps/desktop lint: clean - pnpm -C apps/desktop build:agent-monitor: clean (patch + hard-gates pass) - New test/agent-monitor-hooks-core.test.ts: 10/10 pass — covers predicate identity, matcher rules per harness, idempotent install, in-place self-heal of moved handler paths, narrow-scope uninstall, foreign-entry preservation, hooks-block cleanup, and Codex/Claude cross-installation independence. - test/agent-monitor-wiring-static.test.ts: 27/27 pass after retargeting three moved-string assertions to the new core file. - Full desktop test suite: 1825/1827 pass. The 2 remaining failures (python3 version-string parsing, telemetry healthcheck dedupe) are pre-existing flakes in unrelated files I did not touch. - Independent code-reviewer agent surfaced a Medium correctness finding (failed opt-out silently lost) and a Low note (function exported without IPC). Both addressed in this commit; the IPC wiring is an intentional follow-up. Risks: - Codex hooks are gated behind a new opt-in flag that has no UI yet, so default behavior is unchanged for all existing users. Risk surface is limited to anyone who flips the flag via dev console / tests until the UI ships. - The build-script patch depends on patchHooksTranscriptOutsideTx (FEA- 1363) running first; ordering enforced in materializeAgentMonitor and the patch is itself idempotent. A future upstream bump that drops the processEventCore anchor will fail the build hard rather than silently dropping the harness stamp. - The agent-monitor-hooks.ts → agent-monitor-hooks-core.ts split is a refactor of stable code; semantic equivalence verified by the existing wiring-static test plus the new core test.
Layer 2 — Event-level dedup against rollout-tail:
- New filterEventsAlreadyCapturedByHooks in codex-import.js. Runs before
importSession in importCodexSession; queries existing events for rows
matching (session_id, event_type, COALESCE(tool_name, ''), created_at
truncated to whole seconds) and drops duplicates. Hook-handler-inserted
rows land first (real-time); the ~5s-later rollout-tail no-ops on match.
- The dedup statement is cached per-dbModule via WeakMap so a batch of
many changed sessions compiles the SQL once, not once per session.
- Best-effort: any prepare/query failure is swallowed and the event is
kept. False negatives (cosmetic duplicates) are preferred over false
positives (silently dropped events). Sub-second timestamp drift handled
by the substr(?, 1, 19) match.
Layer 3 — Renderer opt-in toggle:
- apps/desktop/src/main/app.ts: new IPC handlers
desktop:{get,set}-agent-monitor-codex-hooks-opt-in that call the
already-exported isAgentMonitorCodexHooksOptIn /
setAgentMonitorCodexHooksOptIn. Gated on the master Agent Dashboard
flag, matching the Claude pair shape exactly.
- apps/desktop/src/main/preload.ts: mirror bridge methods exposing the
two channels to the renderer.
- apps/desktop/src/renderer/index.html: new #codexDashConsent toggle row
directly below the existing #claudeDashConsent. Same visibility
semantics (shown only on agent-settings route), same disabled-when-
master-off behavior, same hint text reset on master flag re-enable
(Codex hint reset now mirrors Claude's after a review-flagged gap).
refreshCodexHooksToggle is called when the user toggles the Claude
hooks toggle so the sibling's on/off badge stays in sync.
Test:
- apps/desktop/test/agent-session-event-dedup.test.ts (new): 4 unit
tests on the filter against a sandbox SQLite DB. Covers (1) duplicate
filter with sub-second drift, (2) non-duplicate survival across
mixed events, (3) best-effort behavior on DB errors, (4) empty-events
no-op. Uses the conditional-skip pattern (skipReason ? { skip: ... }
: undefined) because Node test runner treats skip: null as "skip with
no reason" rather than "don't skip" — gotcha learned during debugging.
Independent code review (second pass on the cumulative diff) surfaced:
- Medium: Codex hint stale after master flag re-enable (Claude hint was
reset in the else branch, Codex hint wasn't). Fixed.
- Low: Comment inaccuracy about which toggle "unblocks the master flag"
in wireClaudeHooksToggle. Rewritten.
- Low: Dedup statement prepared once per session inside the batch loop
instead of once per process. Cached via WeakMap.
All three addressed in this commit.
Testing:
- pnpm -C apps/desktop typecheck: clean
- pnpm -C apps/desktop lint: clean
- pnpm -C apps/desktop build:agent-monitor: clean (Layer 1 hard-gates
+ sidecar SQLite gate continue to pass)
- pnpm -C apps/desktop test (full suite): 1945/1945 pass, 0 fail
- New agent-session-event-dedup.test.ts: 4/4 pass
- agent-monitor-hooks-core.test.ts (from Layer 1): 10/10 pass (no
regression)
Risks:
- Cross-second timestamp drift between hook and rollout-tail can defeat
dedup. Documented design tradeoff: false negatives are cosmetic
duplicates; false positives would silently drop events. Will revisit
with a content-hash dedup key in a follow-up if cosmetic duplication
becomes visible in practice.
- IPC handler test coverage deferred. Both new handlers are shallow
plumbing that call already-tested functions (covered by Layer 1's
agent-monitor-hooks-core.test.ts). Typecheck validates the signatures
+ preload bridge types.
- Renderer JS duplicates the Claude/Codex toggle pattern. Per CLAUDE.md
this is the kind of duplication worth extracting, but inline renderer
JS makes a shared helper awkward; deferred to a separate cleanup
ticket if the pattern repeats with a third harness.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8db7ef93b9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| stmt = dbModule.db.prepare( | ||
| "SELECT 1 FROM events " + | ||
| "WHERE session_id = ? AND event_type = ? " + | ||
| "AND COALESCE(tool_name, '') = COALESCE(?, '') " + | ||
| "AND substr(COALESCE(created_at, ''), 1, 19) = substr(COALESCE(?, ''), 1, 19) " + | ||
| "LIMIT 1", |
There was a problem hiding this comment.
Preserve rollout events when the existing row is not a hook duplicate
When Codex hooks are disabled, skipped, or miss one event, this query still treats any existing events row as a hook duplicate because it has no hook-source discriminator. In a live rollout file that is re-imported after a previous poll, a newly appended event with the same session id, event type, tool name, and second as an earlier rollout-imported event is filtered out before importSession() sees it, so rapid repeated tool calls can disappear from the timeline instead of being backfilled by the watcher.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Addressed in the latest commit on this branch. Gated the filter on whether ~/.codex/hooks.json actually contains our codex-hook-handler.js. When it doesn't, there cannot be any hook-sourced rows in the events table to dedup against — the filter is now a no-op in that case. This eliminates the false-positive vector for the ~99% of users who haven't opted into Codex hooks.
New test "dedup: gate-off when Codex hooks are not installed — filter is a no-op" seeds a row that the unguarded filter would have treated as a hook duplicate, points CODEX_HOME at a directory with no hooks.json, and asserts the incoming rollout-tail event survives.
Known v1 limitation (your cases (b) and (c) — hooks installed but a specific event missed or skipped): rapid same-second same-tool calls in that scenario can still collapse. The proper fix is a source column on the events table (so hook-inserted rows are explicitly discriminable from rollout-tail rows); that's invasive enough to merit its own FEA and will be filed as a follow-up.
| hint.textContent = optedIn | ||
| ? "— on (Codex hooks installed in ~/.codex/hooks.json; also set [features].codex_hooks = true in ~/.codex/config.toml)" | ||
| : "— off (rollout-tail watcher continues)"; |
There was a problem hiding this comment.
Tell users to trust the installed Codex hook
For current Codex, installing ~/.codex/hooks.json is not sufficient for this non-managed command hook to run: the official hook docs say non-managed command hooks are skipped until reviewed/trusted via /hooks (and hooks is the canonical feature key, with codex_hooks only a deprecated alias). With this hint, a user can turn the toggle on and follow the displayed config advice but still get no real-time hook ingestion, silently falling back to rollout-tail.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed. Updated both the tooltip and the on/off hint text in index.html:
- Feature key changed from
[features].codex_hooks = trueto the canonical[features].hooks = true. - The on-state hint now spells out the second requirement: "finish Codex setup:
[features].hooks=truein~/.codex/config.toml+ run/hooksin Codex to trust the command". - The tooltip on the toggle itself mirrors the same two requirements.
Without these steps spelled out, a user enabling the toggle could legitimately think they were done and silently fall back to the rollout-tail watcher instead of getting real-time hook ingestion.
thadeusb
left a comment
There was a problem hiding this comment.
I have enough to write the review. Let me confirm my two findings are solid:
Codex P2 (dedup false-positive on rapid same-second events): - The codex-import.js filter had no hook-source discriminator: any existing events row matching (session_id, event_type, tool_name, created_at-truncated-to-second) was treated as a hook duplicate. For users who haven't opted into Codex hooks (~99% of installs), the row always came from a prior rollout-tail import, so the filter was silently dropping legitimate distinct events on re-imports of a changed rollout file. - Gate the filter on whether ~/.codex/hooks.json actually contains our codex-hook-handler.js. When it doesn't, there can't be any hook-sourced rows to dedup against, so the filter is a no-op. The detection result is cached at module level and exposed via resetCodexHooksInstalledCache for test setup. - Known v1 limitation: when hooks ARE installed but miss/skip a specific event (the reviewer's case (b)/(c)), rapid same-second same-tool calls in that session can still collapse. Tracked as a follow-up FEA for a real `source` column on the events table. Codex P2 (Codex hook config hint accuracy): - Codex's canonical feature flag is `[features].hooks`, not `[features].codex_hooks` (which is a deprecated alias). - For non-managed command hooks, Codex requires the user to trust the installed command via `/hooks` before they fire — without that, the install in ~/.codex/hooks.json is silently skipped and the user sees no real-time ingestion. - Hint text + tooltip updated to spell out both requirements so a user flipping the toggle doesn't end up in a silent-fallback state. New test: - agent-session-event-dedup.test.ts: "gate-off when Codex hooks are not installed — filter is a no-op". Seeds a row that would normally be treated as a hook duplicate, points CODEX_HOME at a directory with no hooks.json, and asserts the incoming rollout-tail event survives. Bump desktop version 0.15.100 → 0.15.102 (0.15.101 reserved for FEA-1461 PR #258). Testing: - pnpm -C apps/desktop typecheck: clean - pnpm -C apps/desktop lint: clean - pnpm -C apps/desktop build:agent-monitor: clean - agent-session-event-dedup.test.ts: 5/5 pass (was 4 before; new gate-off coverage) - Full suite: 1842 + 104 = 1946/1946 pass, 0 fail Risks: - Install detection cached at module load — a user opting into Codex hooks AFTER the sidecar started won't trigger dedup until the next app boot. Acceptable for v1 (the sidecar restarts on toggle change via agent-monitor-sidecar.ts anyway, so the cache is refreshed). - The install probe reads ~/.codex/hooks.json with a substring match for "codex-hook-handler.js". A user who manually edits hooks.json to a different filename containing that substring would trip the gate as a false positive. Not a realistic risk; substring match is intentionally lenient to handle both userData-relative and resources-relative paths.
origin/main moved to 0.15.101 while this branch was open; the version-bump-check on the previous push tripped because the branch was at the same 0.15.101. Bumping to 0.15.103 — 0.15.102 is held by FEA-1444 PR #259.
|
@shafty023 — passing this over to you for review and merge. Some extra context, since you're the original inspiration for the feature: Background. PR #259 lands Codex hook ingestion into the Agent Monitor sidecar — the idea we borrowed from your Why you're the right reviewer. You wrote the upstream Codex hook code we adapted from, and you're the top contributor to closedloop-electron — so you have ground truth on both ends of this. Codex flagged two P2s on the latest commit; both are addressed in What's in the latest push.
Handoff terms. Please review, fix anything you want directly on the branch, and merge yourself when you're satisfied. Don't pass back to me — I'm explicitly handing you merge authority on this one. If you see something that affects the broader telemetry direction (e.g., we should be doing the dedup the way your workflow repo does it), feel free to restructure; you have better context on that than I do. Slack me if you hit something that needs a product decision. CI for the latest push is still propagating; should be green by the time you look. Thanks for taking it. |
Summary
End-to-end Codex hook ingestion for the Agent Monitor sidecar — three layers in one PR so the feature is user-visible on merge:
87a601c): Codex hook installer + payload handler + harness stamp. Mirrors the existing Claude hook installer, writes to~/.codex/hooks.json, ships a smallcodex-hook-handler.jswrapper that injects__provider: "codex"so the sidecar can stampharness='codex'on the session row.8db7ef9): Event-level dedup against the existingcodex-watcher.jsrollout-tail importer. Hook events land first (real-time); rollout-tail's ~5s catchup no-ops on the duplicate. Without this, opting into hooks would create 2 rows per event.8db7ef9): Renderer opt-in toggle directly below the existing Claude hooks toggle in the Agent Dashboard view + IPC handlers + preload bridge. User can flip it without editing JSON.See FEA-1444 and PLN-767 v2.
What ships to users
SessionStart(vs ~5s typical from rollout-tail) and event timelines update in real time.Test plan
pnpm -C apps/desktop typecheck— cleanpnpm -C apps/desktop lint— cleanpnpm -C apps/desktop build:agent-monitor— clean (Layer 1 hard-gates + SQLite gate pass)test/agent-session-event-dedup.test.ts— 4/4 pass (sub-second drift, mixed events, error best-effort, empty no-op)test/agent-monitor-hooks-core.test.ts(Layer 1) — 10/10 pass~/.codex/hooks.jsonpopulated, (2) session appears in dashboard withharness=codexinstantly, (3) no duplicate event rows after rollout-tail catches upRisks
[features].codex_hooks = truein~/.codex/config.toml. The toggle's help text spells this out. Programmatic toggle of that Codex flag is intentionally out of scope (too version-sensitive).agent-monitor-hooks.tswas split into electron-freeagent-monitor-hooks-core.ts+ electron-aware shell during Layer 1. This is a refactor of stable code; behavior verified byagent-monitor-wiring-static.test.ts(3 string-match assertions retargeted to the core file).Out of scope (documented for follow-ups)
agent_spawned,review_lane_failed, etc.). His Codex/goalorchestrator's state, not generic.codex-watcher.jsfallback. Stays for users on Codex versions without the experimental hook system.[features].codex_hooks = truetoggle in Codex's config.toml.🤖 Generated with Claude Code