Deferred. The disk reflection is already built in #622 (queue, handoffs, audit
log). This issue is the one place that reads a leftover queue back. Pick it up
after the experiment is running, and only if resume proves worth it.
Issue 10: Resume across runs/crashes (deferred, low priority)
Epic: Task-queue orchestrator runner · Sequenced after: #628 ·
Functionally builds on: #622, #624, #627 · Priority: low
Why
A wizard run can crash or be killed mid-drain. The disk reflection (#622) means the
state survives. This issue makes a subsequent run continue that state instead of
starting fresh. It is additive on top of the persisted schema.
Prior art in the PostHog monorepo, worth following rather than reinventing: the
Tasks product keeps per-run JSON state with a resume_from_run_id chain and
atomic state mutation under select_for_update (products/tasks/backend/models.py,
get_resume_chain and mutate_state_atomic), and Signals avoids double-spawning
on reruns with run_count in workflow ids and ack_id dedupe. Borrow the
resume-chain shape for inheriting prior outputs, and the dedupe so a resumed queue
does not re-run or double-spawn a task.
Scope / deliverable
Resume detection. On construction, if <installDir>/.posthog-wizard/queue.json exists, matches the current version,
and its runId and installDir identify the same run, continue it instead of
starting fresh (the #622 default, Issue 3: Queue + persistence layer).
In_progress recovery. A task left in_progress is suspect, since it implies a crash
mid-task. If attempts < maxAttempts, reset it to pending and re-run,
otherwise mark it failed. Record the reset in audit.jsonl.
[...] write, and the {type, inputs} dedup guard (Issue 4: Orchestrator MCP tools (in wizard-tools), #623) backs it up.
Audit the three real task bodies (Issue 8: Real task bodies + full 1:1 integration flow, #627) for safe re-execution,
and fix any that are not.
A queue with a version mismatch, or a runId from an unrelated run, is discarded rather than adopted.
Add a resumed: boolean flag to orchestrator run finished (stubbed in Issue 9: Telemetry + experiment instrumentation, #628),
and emit an orchestrator resumed { tasks_pending, tasks_done } event.
Resume is automatic, guarded by the runId and installDir match, and clearly logged.
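The resume-detection and in_progress-recovery rules above can be sketched as two pure functions. This is a minimal sketch, not the queue.ts implementation: the type names, the numeric version, the "done" status, and the audit entry fields are all assumptions made for illustration. Only version, runId, installDir, attempts, maxAttempts, and the pending / in_progress / failed statuses come from the issue text.

```typescript
// Hypothetical shapes. Only version, runId, installDir, attempts, maxAttempts
// and the pending / in_progress / failed statuses come from the issue text.
type TaskStatus = "pending" | "in_progress" | "done" | "failed";

interface Task {
  id: string;
  status: TaskStatus;
  attempts: number;
}

interface RunIdentity {
  version: number; // assumed numeric; the real schema may use a string
  runId: string;
  installDir: string;
}

interface PersistedQueue extends RunIdentity {
  tasks: Task[];
}

// One line of audit.jsonl; the field names here are assumptions.
interface AuditEntry {
  event: "in_progress_reset";
  taskId: string;
  attempts: number;
}

// Adopt a leftover queue only when version, runId and installDir all match.
function shouldResume(saved: PersistedQueue | null, current: RunIdentity): boolean {
  return (
    saved !== null &&
    saved.version === current.version &&
    saved.runId === current.runId &&
    saved.installDir === current.installDir
  );
}

// Treat in_progress as a crash mid-task: retry while attempts remain, else fail.
function recoverInProgress(
  tasks: Task[],
  maxAttempts: number,
): { tasks: Task[]; audit: AuditEntry[] } {
  const audit: AuditEntry[] = [];
  const recovered = tasks.map((task): Task => {
    if (task.status !== "in_progress") return task;
    if (task.attempts < maxAttempts) {
      // Only the reset is audited, as the scope item asks.
      audit.push({ event: "in_progress_reset", taskId: task.id, attempts: task.attempts });
      return { ...task, status: "pending" };
    }
    return { ...task, status: "failed" };
  });
  return { tasks: recovered, audit };
}
```

Keeping both functions free of file I/O means the queue constructor can read queue.json, call them, and then write the audit entries itself, which keeps the decision logic easy to unit test.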
Key files
src/lib/programs/orchestrator/queue.ts (resume path on construction)
src/lib/programs/orchestrator/orchestrator-runner.ts (resumed telemetry)
install, init, and instrument-events agent prompts and mini-skills (idempotency audit)
Acceptance criteria
A kill -9 mid-drain, then a re-run, resumes the queue. The in_progress task resets to pending and completes, and no task runs twice to completion.
A queue with a version mismatch, or a foreign runId, is discarded rather than adopted.
orchestrator run finished carries resumed: true on a resumed run.
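For the telemetry criterion, the two payloads might look like the sketch below. The event names and the resumed, tasks_pending, and tasks_done properties come from the issue. The QueueTask and TelemetryEvent types and both function names are hypothetical, and the real capture call in orchestrator-runner.ts is not shown.

```typescript
// Hypothetical task shape; "done" and "pending" mirror tasks_done / tasks_pending.
interface QueueTask {
  status: "pending" | "in_progress" | "done" | "failed";
}

// Assumed envelope; the wizard's actual telemetry wrapper may differ.
interface TelemetryEvent {
  event: string;
  properties: Record<string, number | boolean>;
}

// Built once, when a leftover queue is adopted.
function resumedEvent(tasks: QueueTask[]): TelemetryEvent {
  const count = (status: QueueTask["status"]): number =>
    tasks.filter((task) => task.status === status).length;
  return {
    event: "orchestrator resumed",
    properties: { tasks_pending: count("pending"), tasks_done: count("done") },
  };
}

// Built at the end of every run; resumed is false on a fresh start.
function runFinishedEvent(resumed: boolean): TelemetryEvent {
  return { event: "orchestrator run finished", properties: { resumed } };
}
```

Counting after in_progress recovery has run would make tasks_pending include the reset task, which seems to match the intent of the event, though the issue does not say which ordering is expected.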