Skip to content

FEA-1444: Codex hook ingestion + dedup + opt-in toggle#259

Open
aeyeCEO wants to merge 3 commits into
mainfrom
FEA-1444-codex-hooks
Open

FEA-1444: Codex hook ingestion + dedup + opt-in toggle#259
aeyeCEO wants to merge 3 commits into
mainfrom
FEA-1444-codex-hooks

Conversation

@aeyeCEO
Copy link
Copy Markdown
Contributor

@aeyeCEO aeyeCEO commented May 29, 2026

Summary

End-to-end Codex hook ingestion for the Agent Monitor sidecar — three layers in one PR so the feature is user-visible on merge:

  • Layer 1 (commit 87a601c): Codex hook installer + payload handler + harness stamp. Mirrors the existing Claude hook installer, writes to ~/.codex/hooks.json, ships a small codex-hook-handler.js wrapper that injects __provider: "codex" so the sidecar can stamp harness='codex' on the session row.
  • Layer 2 (commit 8db7ef9): Event-level dedup against the existing codex-watcher.js rollout-tail importer. Hook events land first (real-time); rollout-tail's ~5s catchup no-ops on the duplicate. Without this, opting into hooks would create 2 rows per event.
  • Layer 3 (commit 8db7ef9): Renderer opt-in toggle directly below the existing Claude hooks toggle in the Agent Dashboard view + IPC handlers + preload bridge. User can flip it without editing JSON.

See FEA-1444 and PLN-767 v2.

What ships to users

  • New "Enable Codex CLI session tracking (experimental)" toggle in the Agent Dashboard consent banner, next to the Claude toggle.
  • When enabled, Codex sessions appear in the dashboard within 1s of SessionStart (vs ~5s typical from rollout-tail) and event timelines update in real time.
  • When disabled, behavior is identical to before this PR — rollout-tail watcher continues to import Codex sessions on its 5s tick.
  • Dedup prevents duplicate event rows when both hook stream and rollout-tail are active.

Test plan

  • pnpm -C apps/desktop typecheck — clean
  • pnpm -C apps/desktop lint — clean
  • pnpm -C apps/desktop build:agent-monitor — clean (Layer 1 hard-gates + SQLite gate pass)
  • Full suite: 1945/1945 pass, 0 fail
  • New test/agent-session-event-dedup.test.ts — 4/4 pass (sub-second drift, mixed events, error best-effort, empty no-op)
  • New test/agent-monitor-hooks-core.test.ts (Layer 1) — 10/10 pass
  • Two independent code-review passes (one per layer round) — all findings addressed in-PR
  • Manual smoke (recommended before merge): flip toggle on, run a Codex session, confirm: (1) ~/.codex/hooks.json populated, (2) session appears in dashboard with harness=codex instantly, (3) no duplicate event rows after rollout-tail catches up

Risks

  • Cross-second timestamp drift between hook and rollout-tail can defeat dedup → cosmetic duplicates. False-negative-preferred-over-false-positive tradeoff. Content-hash dedup key is a follow-up if drift duplicates show up in practice.
  • Codex hook feature flag is server-side. Hooks only fire if user sets [features].codex_hooks = true in ~/.codex/config.toml. The toggle's help text spells this out. Programmatic toggle of that Codex flag is intentionally out of scope (too version-sensitive).
  • agent-monitor-hooks.ts was split into electron-free agent-monitor-hooks-core.ts + electron-aware shell during Layer 1. This is a refactor of stable code; behavior verified by agent-monitor-wiring-static.test.ts (3 string-match assertions retargeted to the core file).
  • Version bump 0.15.99 → 0.15.100 (origin/main moved to 0.15.99 from PLN-750 + FEA-1437 while this branch was in development; rebased onto current main and bumped to 0.15.100 to leave room for FEA-1461 PR FEA-1461: Dead-letter agent-session sync on persistent rate_limited #258 at 0.15.99).

Out of scope (documented for follow-ups)

  • Daniel Levesque's workflow event taxonomy (agent_spawned, review_lane_failed, etc.). His Codex /goal orchestrator's state, not generic.
  • Loop orchestrator workflow events as first-class session events.
  • Removing the codex-watcher.js fallback. Stays for users on Codex versions without the experimental hook system.
  • Programmatic [features].codex_hooks = true toggle in Codex's config.toml.
  • Live-source badge on session cards distinguishing hook-sourced vs rollout-tail-sourced sessions.

🤖 Generated with Claude Code

Andrew Eye added 2 commits May 29, 2026 15:07
- Add codex-hook-handler.js wrapper that mirrors upstream hook-handler.js
  semantics but injects __provider: "codex" so the sidecar can stamp the
  session row with harness='codex'. Zero-dep plain JS, fail-silent.
- Split agent-monitor-hooks.ts: pure install/uninstall logic moved to new
  agent-monitor-hooks-core.ts (electron-free, testable under tsx --test).
  Shell retains electron-store + path resolution + gatewayLog.
- Add installCodexHooks / uninstallCodexHooks targeting $CODEX_HOME/hooks.json
  with the narrower Codex hook event set (SessionStart, UserPromptSubmit,
  PreToolUse, PostToolUse, Stop). SessionStart uses the "startup|resume"
  matcher per Codex's reference contract.
- Filename-boundary predicate (isClaudeEntry vs isCodexEntry) prevents
  cross-matching between the two handler families.
- New codexOptIn electron-store flag (default false) gates Codex hook
  installation. Behavior change: the master hooks toggle is unchanged for
  existing users; Codex hooks only install when both flags are true.
  Deferred to a follow-up: event-level dedup against the existing
  codex-watcher rollout-tail path, and the renderer UI to flip codexOptIn.
- syncAgentMonitorHooksOnBoot reconciles Codex state with persisted intent
  (installs when opted in, uninstalls stale entries when opted out) so a
  failed file-layer mutation is eventually consistent on the next launch.
- setAgentMonitorCodexHooksOptIn persists intent before applying side
  effects so a failure (e.g., locked hooks.json) cannot silently lose the
  user's opt-out.
- build-agent-monitor.mjs gains a surgical patch to the generated
  server/routes/hooks.js that calls stmts.setSessionHarness when the
  inbound payload carries __provider: "codex". Patch is hash-anchored and
  hard-gated; future upstream drift fails the build loudly.
- Bump desktop version to 0.15.98 per repo CI rule.

Testing:
- pnpm -C apps/desktop typecheck: clean
- pnpm -C apps/desktop lint: clean
- pnpm -C apps/desktop build:agent-monitor: clean (patch + hard-gates pass)
- New test/agent-monitor-hooks-core.test.ts: 10/10 pass — covers
  predicate identity, matcher rules per harness, idempotent install,
  in-place self-heal of moved handler paths, narrow-scope uninstall,
  foreign-entry preservation, hooks-block cleanup, and Codex/Claude
  cross-installation independence.
- test/agent-monitor-wiring-static.test.ts: 27/27 pass after retargeting
  three moved-string assertions to the new core file.
- Full desktop test suite: 1825/1827 pass. The 2 remaining failures
  (python3 version-string parsing, telemetry healthcheck dedupe) are
  pre-existing flakes in unrelated files I did not touch.
- Independent code-reviewer agent surfaced a Medium correctness finding
  (failed opt-out silently lost) and a Low note (function exported without
  IPC). Both addressed in this commit; the IPC wiring is an intentional
  follow-up.

Risks:
- Codex hooks are gated behind a new opt-in flag that has no UI yet, so
  default behavior is unchanged for all existing users. Risk surface is
  limited to anyone who flips the flag via dev console / tests until the
  UI ships.
- The build-script patch depends on patchHooksTranscriptOutsideTx (FEA-
  1363) running first; ordering enforced in materializeAgentMonitor and
  the patch is itself idempotent. A future upstream bump that drops the
  processEventCore anchor will fail the build hard rather than silently
  dropping the harness stamp.
- The agent-monitor-hooks.ts → agent-monitor-hooks-core.ts split is a
  refactor of stable code; semantic equivalence verified by the existing
  wiring-static test plus the new core test.
Layer 2 — Event-level dedup against rollout-tail:
- New filterEventsAlreadyCapturedByHooks in codex-import.js. Runs before
  importSession in importCodexSession; queries existing events for rows
  matching (session_id, event_type, COALESCE(tool_name, ''), created_at
  truncated to whole seconds) and drops duplicates. Hook-handler-inserted
  rows land first (real-time); the ~5s-later rollout-tail no-ops on match.
- The dedup statement is cached per-dbModule via WeakMap so a batch of
  many changed sessions compiles the SQL once, not once per session.
- Best-effort: any prepare/query failure is swallowed and the event is
  kept. False negatives (cosmetic duplicates) are preferred over false
  positives (silently dropped events). Sub-second timestamp drift handled
  by the substr(?, 1, 19) match.

Layer 3 — Renderer opt-in toggle:
- apps/desktop/src/main/app.ts: new IPC handlers
  desktop:{get,set}-agent-monitor-codex-hooks-opt-in that call the
  already-exported isAgentMonitorCodexHooksOptIn /
  setAgentMonitorCodexHooksOptIn. Gated on the master Agent Dashboard
  flag, matching the Claude pair shape exactly.
- apps/desktop/src/main/preload.ts: mirror bridge methods exposing the
  two channels to the renderer.
- apps/desktop/src/renderer/index.html: new #codexDashConsent toggle row
  directly below the existing #claudeDashConsent. Same visibility
  semantics (shown only on agent-settings route), same disabled-when-
  master-off behavior, same hint text reset on master flag re-enable
  (Codex hint reset now mirrors Claude's after a review-flagged gap).
  refreshCodexHooksToggle is called when the user toggles the Claude
  hooks toggle so the sibling's on/off badge stays in sync.

Test:
- apps/desktop/test/agent-session-event-dedup.test.ts (new): 4 unit
  tests on the filter against a sandbox SQLite DB. Covers (1) duplicate
  filter with sub-second drift, (2) non-duplicate survival across
  mixed events, (3) best-effort behavior on DB errors, (4) empty-events
  no-op. Uses the conditional-skip pattern (skipReason ? { skip: ... }
  : undefined) because Node test runner treats skip: null as "skip with
  no reason" rather than "don't skip" — gotcha learned during debugging.

Independent code review (second pass on the cumulative diff) surfaced:
- Medium: Codex hint stale after master flag re-enable (Claude hint was
  reset in the else branch, Codex hint wasn't). Fixed.
- Low: Comment inaccuracy about which toggle "unblocks the master flag"
  in wireClaudeHooksToggle. Rewritten.
- Low: Dedup statement prepared once per session inside the batch loop
  instead of once per process. Cached via WeakMap.
All three addressed in this commit.

Testing:
- pnpm -C apps/desktop typecheck: clean
- pnpm -C apps/desktop lint: clean
- pnpm -C apps/desktop build:agent-monitor: clean (Layer 1 hard-gates
  + sidecar SQLite gate continue to pass)
- pnpm -C apps/desktop test (full suite): 1945/1945 pass, 0 fail
- New agent-session-event-dedup.test.ts: 4/4 pass
- agent-monitor-hooks-core.test.ts (from Layer 1): 10/10 pass (no
  regression)

Risks:
- Cross-second timestamp drift between hook and rollout-tail can defeat
  dedup. Documented design tradeoff: false negatives are cosmetic
  duplicates; false positives would silently drop events. Will revisit
  with a content-hash dedup key in a follow-up if cosmetic duplication
  becomes visible in practice.
- IPC handler test coverage deferred. Both new handlers are shallow
  plumbing that call already-tested functions (covered by Layer 1's
  agent-monitor-hooks-core.test.ts). Typecheck validates the signatures
  + preload bridge types.
- Renderer JS duplicates the Claude/Codex toggle pattern. Per CLAUDE.md
  this is the kind of duplication worth extracting, but inline renderer
  JS makes a shared helper awkward; deferred to a separate cleanup
  ticket if the pattern repeats with a third harness.
@aeyeCEO aeyeCEO requested a review from a team May 29, 2026 20:29
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8db7ef93b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +77 to +82
stmt = dbModule.db.prepare(
"SELECT 1 FROM events " +
"WHERE session_id = ? AND event_type = ? " +
"AND COALESCE(tool_name, '') = COALESCE(?, '') " +
"AND substr(COALESCE(created_at, ''), 1, 19) = substr(COALESCE(?, ''), 1, 19) " +
"LIMIT 1",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve rollout events when the existing row is not a hook duplicate

When Codex hooks are disabled, skipped, or miss one event, this query still treats any existing events row as a hook duplicate because it has no hook-source discriminator. In a live rollout file that is re-imported after a previous poll, a newly appended event with the same session id, event type, tool name, and second as an earlier rollout-imported event is filtered out before importSession() sees it, so rapid repeated tool calls can disappear from the timeline instead of being backfilled by the watcher.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in the latest commit on this branch. Gated the filter on whether ~/.codex/hooks.json actually contains our codex-hook-handler.js. When it doesn't, there cannot be any hook-sourced rows in the events table to dedup against — the filter is now a no-op in that case. This eliminates the false-positive vector for the ~99% of users who haven't opted into Codex hooks.

New test "dedup: gate-off when Codex hooks are not installed — filter is a no-op" seeds a row that the unguarded filter would have treated as a hook duplicate, points CODEX_HOME at a directory with no hooks.json, and asserts the incoming rollout-tail event survives.

Known v1 limitation (your cases (b) and (c) — hooks installed but a specific event missed or skipped): rapid same-second same-tool calls in that scenario can still collapse. The proper fix is a source column on the events table (so hook-inserted rows are explicitly discriminable from rollout-tail rows); that's invasive enough to merit its own FEA and will be filed as a follow-up.

Comment on lines +4168 to +4170
hint.textContent = optedIn
? "— on (Codex hooks installed in ~/.codex/hooks.json; also set [features].codex_hooks = true in ~/.codex/config.toml)"
: "— off (rollout-tail watcher continues)";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Tell users to trust the installed Codex hook

For current Codex, installing ~/.codex/hooks.json is not sufficient for this non-managed command hook to run: the official hook docs say non-managed command hooks are skipped until reviewed/trusted via /hooks (and hooks is the canonical feature key, with codex_hooks only a deprecated alias). With this hint, a user can turn the toggle on and follow the displayed config advice but still get no real-time hook ingestion, silently falling back to rollout-tail.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Updated both the tooltip and the on/off hint text in index.html:

  • Feature key changed from [features].codex_hooks = true to the canonical [features].hooks = true.
  • The on-state hint now spells out the second requirement: "finish Codex setup: [features].hooks=true in ~/.codex/config.toml + run /hooks in Codex to trust the command".
  • The tooltip on the toggle itself mirrors the same two requirements.

Without these steps spelled out, a user enabling the toggle could legitimately think they were done and silently fall back to the rollout-tail watcher instead of getting real-time hook ingestion.

Copy link
Copy Markdown
Contributor

@thadeusb thadeusb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have enough to write the review. Let me confirm my two findings are solid:

Codex P2 (dedup false-positive on rapid same-second events):
- The codex-import.js filter had no hook-source discriminator: any
  existing events row matching (session_id, event_type, tool_name,
  created_at-truncated-to-second) was treated as a hook duplicate. For
  users who haven't opted into Codex hooks (~99% of installs), the row
  always came from a prior rollout-tail import, so the filter was
  silently dropping legitimate distinct events on re-imports of a
  changed rollout file.
- Gate the filter on whether ~/.codex/hooks.json actually contains our
  codex-hook-handler.js. When it doesn't, there can't be any
  hook-sourced rows to dedup against, so the filter is a no-op. The
  detection result is cached at module level and exposed via
  resetCodexHooksInstalledCache for test setup.
- Known v1 limitation: when hooks ARE installed but miss/skip a
  specific event (the reviewer's case (b)/(c)), rapid same-second
  same-tool calls in that session can still collapse. Tracked as a
  follow-up FEA for a real `source` column on the events table.

Codex P2 (Codex hook config hint accuracy):
- Codex's canonical feature flag is `[features].hooks`, not
  `[features].codex_hooks` (which is a deprecated alias).
- For non-managed command hooks, Codex requires the user to trust the
  installed command via `/hooks` before they fire — without that, the
  install in ~/.codex/hooks.json is silently skipped and the user
  sees no real-time ingestion.
- Hint text + tooltip updated to spell out both requirements so a user
  flipping the toggle doesn't end up in a silent-fallback state.

New test:
- agent-session-event-dedup.test.ts: "gate-off when Codex hooks are not
  installed — filter is a no-op". Seeds a row that would normally be
  treated as a hook duplicate, points CODEX_HOME at a directory with no
  hooks.json, and asserts the incoming rollout-tail event survives.

Bump desktop version 0.15.100 → 0.15.102 (0.15.101 reserved for
FEA-1461 PR #258).

Testing:
- pnpm -C apps/desktop typecheck: clean
- pnpm -C apps/desktop lint: clean
- pnpm -C apps/desktop build:agent-monitor: clean
- agent-session-event-dedup.test.ts: 5/5 pass (was 4 before; new
  gate-off coverage)
- Full suite: 1842 + 104 = 1946/1946 pass, 0 fail

Risks:
- Install detection cached at module load — a user opting into Codex
  hooks AFTER the sidecar started won't trigger dedup until the next
  app boot. Acceptable for v1 (the sidecar restarts on toggle change
  via agent-monitor-sidecar.ts anyway, so the cache is refreshed).
- The install probe reads ~/.codex/hooks.json with a substring match
  for "codex-hook-handler.js". A user who manually edits hooks.json
  to a different filename containing that substring would trip the
  gate as a false positive. Not a realistic risk; substring match is
  intentionally lenient to handle both userData-relative and
  resources-relative paths.
aeyeCEO pushed a commit that referenced this pull request May 30, 2026
origin/main moved to 0.15.101 while this branch was open; the
version-bump-check on the previous push tripped because the branch
was at the same 0.15.101. Bumping to 0.15.103 — 0.15.102 is held by
FEA-1444 PR #259.
@aeyeCEO aeyeCEO requested a review from shafty023 May 30, 2026 19:00
@aeyeCEO
Copy link
Copy Markdown
Contributor Author

aeyeCEO commented May 30, 2026

@shafty023 — passing this over to you for review and merge. Some extra context, since you're the original inspiration for the feature:

Background. PR #259 lands Codex hook ingestion into the Agent Monitor sidecar — the idea we borrowed from your closedloop-ai/workflow repo (telemetry/codex-hook-writer.js + telemetry/identifier-events.ts). Three layers in one PR so the feature is end-to-end user-visible on merge: hook installer + payload handler + harness stamp (Layer 1), event-level dedup against the existing rollout-tail importer (Layer 2), and a renderer opt-in toggle alongside the Claude hooks toggle (Layer 3).

Why you're the right reviewer. You wrote the upstream Codex hook code we adapted from, and you're the top contributor to closedloop-electron — so you have ground truth on both ends of this. Codex flagged two P2s on the latest commit; both are addressed in fabf164.

What's in the latest push.

  • P2 (Codex): dedup had no hook-source discriminator — gated the filter on actual install detection (~/.codex/hooks.json containing our handler). Documented v1 limitation for hooks-installed-but-missed events; tracked for a follow-up source column FEA.
  • P2 (Codex): hint text said [features].codex_hooks and missed the /hooks trust step — updated to canonical [features].hooks and spelled out the /hooks slash command requirement on both the tooltip and the on-state hint.

Handoff terms. Please review, fix anything you want directly on the branch, and merge yourself when you're satisfied. Don't pass back to me — I'm explicitly handing you merge authority on this one. If you see something that affects the broader telemetry direction (e.g., we should be doing the dedup the way your workflow repo does it), feel free to restructure; you have better context on that than I do. Slack me if you hit something that needs a product decision.

CI for the latest push is still propagating; should be green by the time you look. Thanks for taking it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants