Issue 9: Telemetry + experiment instrumentation #628

@gewenyu99

Epic: Task-queue orchestrator runner · Depends on: #624 (emits during
execution), finalize after #626 and #627

Why

The dark launch is judged primarily on responsiveness: a short time to first
visible progress, and steady incremental progress. Per-task latency and model are
the headline metrics, and tokens are a secondary cost to watch. We need clean
per-task and per-run telemetry, segmentable from the linear baseline by VARIANT.

Scope / deliverable

  1. Variant tagging. VARIANT: 'orchestrator' flows into existing events via
    headers (buildAgentEnv, agent-interface.ts:371) once #621 (Issue 1: Shared
    bootstrap extraction + variant gating) lands. Also call
    analytics.setTag('variant', 'orchestrator') so that sessionProperties
    (analytics.ts:16) and the setup wizard finished shutdown event (:166)
    carry it. The existing per-task agent started and agent completed events
    become the A/B spine.
  2. Orchestrator events via analytics.wizardCapture:
  3. Responsiveness metrics, the headline. Capture:
    • time to first task started, from launch to the first task reaching
      in_progress, the key "no long silent gap" signal,
    • per-task duration_ms, and the gap between consecutive task starts, so no
      single step dominates wall-clock,
    • per-task model, to confirm cheap models carry the cheap work.
    Keep each task's agent completed event and its duration_ms
    (agent-interface.ts:862).
  4. Per-task token capture, secondary. Surface token usage per task. Summing
    across tasks versus the linear baseline's single agent completed is a cost to
    watch, not the pass/fail metric.
  5. Run-end remark once. Wire the Stop-hook requestRemark flag from #624
    (Issue 5: Executor framework + fresh per-task agent) so the remark
    (WIZARD_REMARK_EVENT_NAME, :859) fires once at run end, not per task.
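
The responsiveness and token figures in items 3 and 4 could be derived from per-task timestamps roughly as sketched below. This is only an illustration: the TaskTiming and RunMetrics shapes, the summarizeRun helper, and every field name are hypothetical and do not come from the codebase.

```typescript
// Hypothetical per-task record; the real data would come from the
// orchestrator runner and the per-task agent completed events.
interface TaskTiming {
  taskId: string;
  model: string;
  startedAt: number;   // epoch ms when the task reached in_progress
  completedAt: number; // epoch ms when the task finished
  tokens: number;      // total tokens used by the task's agent
}

// Hypothetical summary shape for one run.
interface RunMetrics {
  timeToFirstTaskStartedMs: number | null;
  perTask: { taskId: string; model: string; durationMs: number; tokens: number }[];
  gapsBetweenStartsMs: number[];
  totalTokens: number;
}

function summarizeRun(launchedAt: number, tasks: TaskTiming[]): RunMetrics {
  // Order by start time so gaps are measured between consecutive starts.
  const byStart = [...tasks].sort((a, b) => a.startedAt - b.startedAt);
  const first = byStart[0];

  const gaps: number[] = [];
  let prev: TaskTiming | undefined;
  for (const t of byStart) {
    if (prev) gaps.push(t.startedAt - prev.startedAt);
    prev = t;
  }

  return {
    // null when no task ever started, which is itself a signal worth seeing.
    timeToFirstTaskStartedMs: first ? first.startedAt - launchedAt : null,
    perTask: byStart.map((t) => ({
      taskId: t.taskId,
      model: t.model,
      durationMs: t.completedAt - t.startedAt,
      tokens: t.tokens,
    })),
    gapsBetweenStartsMs: gaps,
    // Summed for comparison against the linear baseline's single total.
    totalTokens: byStart.reduce((sum, t) => sum + t.tokens, 0),
  };
}

const metrics = summarizeRun(1_000, [
  { taskId: "b", model: "large", startedAt: 4_000, completedAt: 9_000, tokens: 700 },
  { taskId: "a", model: "small", startedAt: 1_500, completedAt: 3_500, tokens: 300 },
]);
console.log(metrics.timeToFirstTaskStartedMs); // 500
console.log(metrics.totalTokens);              // 1000
```

Computing the gaps from sorted start times, not from completion times, keeps the "no long silent gap" signal independent of how long any single task runs.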

Key files

  • src/lib/programs/orchestrator/orchestrator-runner.ts (emit run and task events)
  • src/lib/programs/orchestrator/queue-tools.ts (emit enqueue and guard events)
  • src/lib/agent/agent-interface.ts (remark gating, per-task token capture)
  • src/utils/analytics.ts (setTag('variant', ...))

Acceptance criteria

  • A run emits orchestrator seeded, then per-task started and completed,
    then orchestrator run finished, all carrying VARIANT=orchestrator.
  • Baseline and orchestrator runs are distinguishable in PostHog by VARIANT.
  • Time to first task started, per-task duration_ms, gap between starts, and
    per-task model are captured, the responsiveness headline.
  • Per-task token usage is captured as a secondary cost to watch.
  • The remark fires once per run.
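
One way to check the ordering, tagging, and remark criteria above is to run a captured event trace through a small validator. The sketch below is not the project's test harness. The CapturedEvent shape, the checkRunTrace helper, the "task started" and "task completed" names, the taskId property, and the "wizard remark" string (standing in for the value of WIZARD_REMARK_EVENT_NAME) are all assumptions made for illustration.

```typescript
// Hypothetical captured-event shape; event names and property keys are
// placeholders, not the strings the wizard actually emits.
interface CapturedEvent {
  name: string;
  properties: Record<string, unknown>;
}

// Returns a list of human-readable problems; an empty list means the trace
// satisfies the ordering, tagging, and remark-once checks.
function checkRunTrace(events: CapturedEvent[], remarkEventName: string): string[] {
  const problems: string[] = [];
  const seeded = events.findIndex((e) => e.name === "orchestrator seeded");
  const finished = events.findIndex((e) => e.name === "orchestrator run finished");
  if (seeded === -1) problems.push("missing 'orchestrator seeded'");
  if (finished === -1) problems.push("missing 'orchestrator run finished'");

  const started = new Set<string>();
  events.forEach((e, i) => {
    if (e.name === remarkEventName) return; // counted separately below
    if (e.properties["VARIANT"] !== "orchestrator") {
      problems.push(`'${e.name}' lacks VARIANT=orchestrator`);
    }
    if (e.name !== "task started" && e.name !== "task completed") return;
    if (i < seeded || (finished !== -1 && i > finished)) {
      problems.push(`'${e.name}' falls outside the seeded..finished window`);
    }
    const taskId = String(e.properties["taskId"]);
    if (e.name === "task started") started.add(taskId);
    else if (!started.has(taskId)) problems.push(`task ${taskId} completed before it started`);
  });

  const remarks = events.filter((e) => e.name === remarkEventName).length;
  if (remarks !== 1) problems.push(`expected 1 remark, saw ${remarks}`);
  return problems;
}

const tag = { VARIANT: "orchestrator" };
const trace: CapturedEvent[] = [
  { name: "orchestrator seeded", properties: tag },
  { name: "task started", properties: { ...tag, taskId: "t1" } },
  { name: "task completed", properties: { ...tag, taskId: "t1" } },
  { name: "orchestrator run finished", properties: tag },
  { name: "wizard remark", properties: {} },
];
console.log(checkRunTrace(trace, "wizard remark").length); // 0
```

Returning a list of problems instead of throwing on the first one makes a failed dark-launch run easier to diagnose, since a missing tag and a duplicated remark would both show up in one pass.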
