You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Epic: Task-queue orchestrator runner · Depends on:#624 (emits during
execution), finalize after #626 and #627
Why
The dark launch is judged primarily on responsiveness: a short time to first
visible progress, and steady incremental progress. Per-task latency and model are
the headline metrics, and tokens are a secondary cost to watch. We need clean
per-task and per-run telemetry, segmentable from the linear baseline by VARIANT.
Scope / deliverable
Variant tagging.VARIANT: 'orchestrator' flows into existing events via
headers (buildAgentEnv, agent-interface.ts:371) once Issue 1: Shared bootstrap extraction + variant gating #621 lands. Also analytics.setTag('variant', 'orchestrator') so sessionProperties
(analytics.ts:16) and the setup wizard finished shutdown event (:166)
carry it. The existing per-task agent started and agent completed events
become the A/B spine.
time to first task started, from launch to the first task reaching in_progress, the key "no long silent gap" signal,
per-task duration_ms, and the gap between consecutive task starts, so no
single step dominates wall-clock,
per-task model, to confirm cheap models carry the cheap work.
Keep each task's agent completed and duration_ms (agent-interface.ts:862).
Per-task token capture, secondary. Surface token usage per task. Summing
across tasks versus the linear baseline's single agent completed is a cost to
watch, not the pass/fail metric.
Issue 9: Telemetry + experiment instrumentation
Epic: Task-queue orchestrator runner · Depends on: #624 (emits during
execution), finalize after #626 and #627
Why
The dark launch is judged primarily on responsiveness: a short time to first
visible progress, and steady incremental progress. Per-task latency and model are
the headline metrics, and tokens are a secondary cost to watch. We need clean
per-task and per-run telemetry, segmentable from the linear baseline by
VARIANT.Scope / deliverable
VARIANT: 'orchestrator'flows into existing events viaheaders (
buildAgentEnv,agent-interface.ts:371) once Issue 1: Shared bootstrap extraction + variant gating #621 lands. Alsoanalytics.setTag('variant', 'orchestrator')sosessionProperties(
analytics.ts:16) and thesetup wizard finishedshutdown event (:166)carry it. The existing per-task
agent startedandagent completedeventsbecome the A/B spine.
analytics.wizardCapture:orchestrator seeded { task_count, types }orchestrator task started|completed|failed { type, model, attempts, duration_ms }orchestrator task enqueued { type, enqueued_by, depth, dynamic }orchestrator guard tripped { guard, type }, emitted from Issue 4: Orchestrator MCP tools (inwizard-tools) #623's guardsorchestrator run finished { tasks_total, tasks_done, tasks_failed, total_duration_ms }. Aresumedflag is added by Issue 10: Resume across runs/crashes (deferred, low priority) #629.in_progress, the key "no long silent gap" signal,duration_ms, and the gap between consecutive task starts, so nosingle step dominates wall-clock,
model, to confirm cheap models carry the cheap work.Keep each task's
agent completedandduration_ms(agent-interface.ts:862).across tasks versus the linear baseline's single
agent completedis a cost towatch, not the pass/fail metric.
requestRemarkflag from Issue 5: Executor framework + fresh per-task agent #624 so theremark (
WIZARD_REMARK_EVENT_NAME,:859) fires once at run end, not per task.Key files
src/lib/programs/orchestrator/orchestrator-runner.ts(emit run and task events)src/lib/programs/orchestrator/queue-tools.ts(emit enqueue and guard events)src/lib/agent/agent-interface.ts(remark gating, per-task token capture)src/utils/analytics.ts(setTag('variant', ...))Acceptance criteria
orchestrator seeded, then per-taskstartedandcompleted,then
orchestrator run finished, all carryingVARIANT=orchestrator.VARIANT.duration_ms, gap between starts, andper-task
modelare captured, the responsiveness headline.