# Issue 9: Telemetry + experiment instrumentation

**Epic:** Task-queue orchestrator runner · **Depends on:** #624 (emits during
execution), finalize after #626 and #627

## Why

The dark launch is judged primarily on responsiveness: a short time to first
visible progress, and steady incremental progress. Per-task latency and model are
the headline metrics, and tokens are a secondary cost to watch. We need clean
per-task and per-run telemetry, segmentable from the linear baseline by `VARIANT`.

## Scope / deliverable

1. **Variant tagging.** `VARIANT: 'orchestrator'` flows into existing events via
   headers (`buildAgentEnv`, `agent-interface.ts:371`) once #621 lands. Also
   `analytics.setTag('variant', 'orchestrator')` so `sessionProperties`
   (`analytics.ts:16`) and the `setup wizard finished` shutdown event (`:166`)
   carry it. The existing per-task `agent started` and `agent completed` events
   become the A/B spine.
2. **Orchestrator events** via `analytics.wizardCapture`:
   - `orchestrator seeded { task_count, types }`
   - `orchestrator task started|completed|failed { type, model, attempts,
     duration_ms }`
   - `orchestrator task enqueued { type, enqueued_by, depth, dynamic }`
   - `orchestrator guard tripped { guard, type }`, emitted from #623's guards
   - `orchestrator run finished { tasks_total, tasks_done, tasks_failed,
     total_duration_ms }`. A `resumed` flag is added by #629.
3. **Responsiveness metrics, the headline.** Capture:
   - time to first task started, from launch to the first task reaching
     `in_progress`, the key "no long silent gap" signal,
   - per-task `duration_ms`, and the gap between consecutive task starts, so no
     single step dominates wall-clock,
   - per-task `model`, to confirm cheap models carry the cheap work.
   Keep each task's `agent completed` event and its `duration_ms` (`agent-interface.ts:862`).
4. **Per-task token capture, secondary.** Surface token usage per task. The sum
   across tasks, compared with the linear baseline's single `agent completed`
   event, is a cost to watch, not the pass/fail metric.
5. **Run-end remark once.** Wire the Stop-hook `requestRemark` flag from #624 so the
   remark (`WIZARD_REMARK_EVENT_NAME`, `:859`) fires once at run end, not per task.
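The responsiveness metrics in item 3 can be derived from per-task start timestamps. The following is a minimal sketch, not the runner's actual code: `TaskTiming`, `ResponsivenessSummary`, `summarizeResponsiveness`, `startedAtMs`, and `longestTaskShare` are hypothetical names introduced for illustration, and only `type`, `model`, and `duration_ms` come from the event payloads listed above.

```typescript
// Hypothetical input shape: one record per task that reached `in_progress`.
interface TaskTiming {
  type: string;
  model: string;
  startedAtMs: number; // assumed: wall-clock time the task reached `in_progress`
  duration_ms: number;
}

// Hypothetical summary shape for the headline responsiveness numbers.
interface ResponsivenessSummary {
  timeToFirstTaskStartedMs: number | null;
  gapsBetweenStartsMs: number[];
  maxGapMs: number | null;
  longestTaskShare: number | null; // longest duration_ms / total_duration_ms
}

function summarizeResponsiveness(
  launchAtMs: number,
  tasks: TaskTiming[],
  totalDurationMs: number,
): ResponsivenessSummary {
  // Sort a copy by start time so the result does not depend on input order.
  const sorted = [...tasks].sort((a, b) => a.startedAtMs - b.startedAtMs);
  const first = sorted[0];
  if (first === undefined) {
    return {
      timeToFirstTaskStartedMs: null,
      gapsBetweenStartsMs: [],
      maxGapMs: null,
      longestTaskShare: null,
    };
  }

  // Gap between each task start and the previous task start.
  const gaps: number[] = [];
  let prevStart: number | null = null;
  for (const task of sorted) {
    if (prevStart !== null) gaps.push(task.startedAtMs - prevStart);
    prevStart = task.startedAtMs;
  }

  const longest = Math.max(...sorted.map((t) => t.duration_ms));
  return {
    timeToFirstTaskStartedMs: first.startedAtMs - launchAtMs,
    gapsBetweenStartsMs: gaps,
    maxGapMs: gaps.length > 0 ? Math.max(...gaps) : null,
    longestTaskShare: totalDurationMs > 0 ? longest / totalDurationMs : null,
  };
}

// Illustrative values only; the task types and model names are made up.
const exampleSummary = summarizeResponsiveness(
  1000,
  [
    { type: 'b', model: 'large', startedAtMs: 2000, duration_ms: 1200 },
    { type: 'a', model: 'small', startedAtMs: 1400, duration_ms: 500 },
    { type: 'c', model: 'small', startedAtMs: 3500, duration_ms: 800 },
  ],
  4000,
);
console.log(exampleSummary);
```

A large `maxGapMs` or a `longestTaskShare` close to 1 would indicate a single step dominating wall-clock, which is the failure mode these metrics are meant to expose.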

## Key files

- `src/lib/programs/orchestrator/orchestrator-runner.ts` (emit run and task events)
- `src/lib/programs/orchestrator/queue-tools.ts` (emit enqueue and guard events)
- `src/lib/agent/agent-interface.ts` (remark gating, per-task token capture)
- `src/utils/analytics.ts` (`setTag('variant', ...)`)

## Acceptance criteria

- [ ] A run emits `orchestrator seeded`, then per-task `started` and `completed`,
      then `orchestrator run finished`, all carrying `VARIANT=orchestrator`.
- [ ] Baseline and orchestrator runs are distinguishable in PostHog by `VARIANT`.
- [ ] Time to first task started, per-task `duration_ms`, gap between starts, and
      per-task `model` are captured as the responsiveness headline metrics.
- [ ] Per-task token usage is captured as a secondary cost to watch.
- [ ] The remark fires once per run.
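
The ordering, tagging, and remark criteria above could be checked against a recorded list of captured events. This is a sketch under assumptions: `CapturedEvent` and `checkRunEvents` are hypothetical, the variant is assumed to appear as a `VARIANT` property on each event, and the remark event name is passed in as a parameter because the value of `WIZARD_REMARK_EVENT_NAME` is not given here.

```typescript
// Hypothetical record of one captured analytics event.
interface CapturedEvent {
  name: string;
  properties: Record<string, unknown>;
}

// Returns a list of human-readable problems; an empty list means the run passed.
function checkRunEvents(events: CapturedEvent[], remarkEventName: string): string[] {
  const problems: string[] = [];
  const orchestratorEvents = events.filter((e) => e.name.startsWith('orchestrator '));
  const names = orchestratorEvents.map((e) => e.name);

  if (names[0] !== 'orchestrator seeded') {
    problems.push('first orchestrator event is not `orchestrator seeded`');
  }
  if (names[names.length - 1] !== 'orchestrator run finished') {
    problems.push('last orchestrator event is not `orchestrator run finished`');
  }
  if (!names.includes('orchestrator task started')) {
    problems.push('no `orchestrator task started` event');
  }
  if (!names.includes('orchestrator task completed')) {
    problems.push('no `orchestrator task completed` event');
  }

  // Assumption: the variant is exposed as a `VARIANT` property on each event.
  for (const e of orchestratorEvents) {
    if (e.properties['VARIANT'] !== 'orchestrator') {
      problems.push(`\`${e.name}\` is missing VARIANT=orchestrator`);
    }
  }

  const remarkCount = events.filter((e) => e.name === remarkEventName).length;
  if (remarkCount !== 1) {
    problems.push(`remark fired ${remarkCount} times, expected 1`);
  }
  return problems;
}

// Illustrative run; 'remark' stands in for the real remark event name.
const tagged = { VARIANT: 'orchestrator' };
const exampleRun: CapturedEvent[] = [
  { name: 'orchestrator seeded', properties: { ...tagged, task_count: 1 } },
  { name: 'orchestrator task started', properties: { ...tagged } },
  { name: 'orchestrator task completed', properties: { ...tagged, duration_ms: 500 } },
  { name: 'orchestrator run finished', properties: { ...tagged, tasks_total: 1 } },
  { name: 'remark', properties: {} },
];
console.log(checkRunEvents(exampleRun, 'remark'));
```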


Issue 9: Telemetry + experiment instrumentation #628
