fix: TASK-357 — TASK-322 Wave 4 lows (10 findings) + transcription tests#3603
Merged
B6a: delete recordedMerges entry in Controller.removePR alongside activePRs/prFailures; recordMergeSuccess set it but nothing deleted it.
B6b: delete discoveryStart[sha] on CIMonitor grace-expired terminal paths (no-CI and commit-status branches); previously only the checks-found path evicted, leaking every no-CI SHA and superseded intermediate commit.
E7: add retryLastSeen TTL map + evictStaleRetryTrackers sweep (24h) to the existing checkStuckTasks ticker, mirroring GH-2204 orphan eviction for taskLastProgress; abandoned/escalated sources no longer leak forever.
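The E7 sweep can be pictured with a minimal sketch. The `retryTracker` type, its fields, and the `touch`/`evictStale` methods are hypothetical stand-ins, not the engine's actual code; only the idea (a last-seen map swept on a ticker with a 24h TTL) comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// retryTracker is a hypothetical stand-in for the engine's retry bookkeeping.
type retryTracker struct {
	mu       sync.Mutex
	lastSeen map[string]time.Time // source -> last time a retry was observed
	counts   map[string]int       // source -> retry count
}

func newRetryTracker() *retryTracker {
	return &retryTracker{lastSeen: map[string]time.Time{}, counts: map[string]int{}}
}

// touch records a retry for source at time now.
func (t *retryTracker) touch(source string, now time.Time) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.counts[source]++
	t.lastSeen[source] = now
}

// evictStale deletes every entry not seen within ttl and returns how many were removed.
// Deleting from a map while ranging over it is safe in Go.
func (t *retryTracker) evictStale(now time.Time, ttl time.Duration) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	removed := 0
	for source, seen := range t.lastSeen {
		if now.Sub(seen) > ttl {
			delete(t.lastSeen, source)
			delete(t.counts, source)
			removed++
		}
	}
	return removed
}

// demo touches one stale and one fresh source, then sweeps with a 24h TTL.
func demo() (removed, remaining int) {
	base := time.Date(2026, 6, 1, 0, 0, 0, 0, time.UTC)
	tr := newRetryTracker()
	tr.touch("abandoned", base)
	tr.touch("active", base.Add(23*time.Hour))
	removed = tr.evictStale(base.Add(25*time.Hour), 24*time.Hour)
	return removed, len(tr.counts)
}

func main() {
	removed, remaining := demo()
	fmt.Println(removed, remaining) // prints "1 1"
}
```

Passing `now` in explicitly, rather than calling `time.Now()` inside the sweep, keeps the eviction logic testable without sleeping.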
G1: CleanupOrphanedWorktrees now returns (removedCount, freedBytes, err) with err reserved for real failures, instead of abusing a non-nil error to signal a successful N-orphan cleanup (a trap for idiomatic if err != nil callers). Updated 4 call sites + 3 tests.
A4b: clean up the MkdirTemp hook dir on the WriteEmbeddedScripts failure branch (runner.go); it previously logged and fell through, leaking the temp dir for the process lifetime.
G2: orchestrator/bridge.go runPython wraps with %w not %v so callers can errors.Is(context.DeadlineExceeded) / errors.As(*exec.ExitError).
G3: discord heartbeat logs the WriteJSON error and returns to trigger immediate reconnect, instead of silently discarding it.
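The G2 change is easiest to see side by side. In this sketch the helper names `wrapV`, `wrapW`, and `detectable` and the message text are illustrative assumptions, not code from `bridge.go`; the behaviour of `%v` versus `%w` in `fmt.Errorf` is standard library behaviour.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// wrapV formats the cause with %v, which flattens it into plain text.
func wrapV(err error) error { return fmt.Errorf("python bridge failed: %v", err) }

// wrapW formats the cause with %w, which keeps it in the error chain.
func wrapW(err error) error { return fmt.Errorf("python bridge failed: %w", err) }

// detectable reports whether a caller can still recognise a deadline
// error after it has passed through the given wrapper.
func detectable(wrap func(error) error) bool {
	return errors.Is(wrap(context.DeadlineExceeded), context.DeadlineExceeded)
}

func main() {
	fmt.Println(detectable(wrapV)) // prints "false"
	fmt.Println(detectable(wrapW)) // prints "true"
}
```

Both wrappers produce the same message string; only the `%w` variant lets `errors.Is` and `errors.As` walk back to the original cause.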
…l, coverage
A4a: research subagent now passes --model as TWO argv elements ("--model",
name) instead of one joined "--model haiku" string the claude CLI parsed as a
single unknown flag — subagents were silently running on the default model.
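A minimal sketch of the A4a argv difference follows. The function names are hypothetical; the point is only that a process launched without a shell receives each slice element as one argument, so a joined string arrives as a single token.

```go
package main

import "fmt"

// buildArgsJoined reproduces the bug: flag and value share one argv element.
func buildArgsJoined(model string) []string {
	return []string{"--model " + model}
}

// buildArgsSplit passes the flag and its value as separate argv elements.
func buildArgsSplit(model string) []string {
	return []string{"--model", model}
}

func main() {
	fmt.Printf("%q\n", buildArgsJoined("haiku")) // prints ["--model haiku"]
	fmt.Printf("%q\n", buildArgsSplit("haiku"))  // prints ["--model" "haiku"]
}
```

Either slice would be handed to something like `exec.Command(name, args...)`, which does not invoke a shell and therefore never re-splits on whitespace.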
board-low: ExecuteGraphQL aggregates ALL gqlResp.Errors (message + type + path
via GraphQLError.String) instead of surfacing only Errors[0], so multi-error
Projects V2 responses are diagnosable. Error prefix unchanged; partial-data
tolerance stays on the TASK-319 board track.
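The aggregation can be sketched as below. The `GraphQLError` field layout, the `String` format, the `aggregate` helper, and the `graphql errors:` prefix are all assumptions made for illustration; the source only states that message, type, and path are included and that the existing prefix is unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// GraphQLError is a hypothetical shape for one entry of a response's errors array.
type GraphQLError struct {
	Message string
	Type    string
	Path    []string
}

// String renders message, then type and path when present.
func (e GraphQLError) String() string {
	s := e.Message
	if e.Type != "" {
		s += " (type: " + e.Type + ")"
	}
	if len(e.Path) > 0 {
		s += " (path: " + strings.Join(e.Path, ".") + ")"
	}
	return s
}

// aggregate joins every error into one message instead of reporting only errs[0].
func aggregate(errs []GraphQLError) error {
	if len(errs) == 0 {
		return nil
	}
	parts := make([]string, len(errs))
	for i, e := range errs {
		parts[i] = e.String()
	}
	return fmt.Errorf("graphql errors: %s", strings.Join(parts, "; "))
}

// sample returns a two-error response of the kind a multi-error reply might carry.
func sample() []GraphQLError {
	return []GraphQLError{
		{Message: "Field not found", Type: "NOT_FOUND", Path: []string{"node", "items"}},
		{Message: "Rate limited"},
	}
}

func main() {
	fmt.Println(aggregate(sample()))
}
```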
G4: add table-driven tests for internal/transcription (0 -> 3 test files):
whisper_api error/parse/empty-transcript paths via httptest + URL-rewrite
RoundTripper, setup status, and Service primary/fallback routing.
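The httptest plus URL-rewrite RoundTripper technique can be sketched as follows. The `rewriteTransport` type, the `fetchViaStub` helper, and the `api.example.invalid` URL are hypothetical; the actual tests in `internal/transcription` may be structured differently.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// rewriteTransport redirects every request to target, keeping path and query.
type rewriteTransport struct {
	target *url.URL
	base   http.RoundTripper
}

func (t rewriteTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	clone := req.Clone(req.Context()) // RoundTrippers must not mutate the original request
	clone.URL.Scheme = t.target.Scheme
	clone.URL.Host = t.target.Host
	clone.Host = t.target.Host
	return t.base.RoundTrip(clone)
}

// fetchViaStub calls a hard-coded API URL but lands on a local stub server
// that simulates an authentication failure.
func fetchViaStub() (int, string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusUnauthorized)
		fmt.Fprint(w, `{"error":"bad key"}`)
	}))
	defer srv.Close()

	target, err := url.Parse(srv.URL)
	if err != nil {
		return 0, "", err
	}
	client := &http.Client{Transport: rewriteTransport{target: target, base: http.DefaultTransport}}

	resp, err := client.Get("https://api.example.invalid/v1/transcriptions")
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, "", err
	}
	return resp.StatusCode, string(body), nil
}

func main() {
	code, body, err := fetchViaStub()
	fmt.Println(code, body, err)
}
```

Rewriting at the transport level lets a test exercise code whose endpoint URL is a constant, without adding a base-URL parameter purely for testing.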
… task shipped (PR #3603)
This was referenced Jun 15, 2026
alekspetrov added a commit that referenced this pull request on Jun 15, 2026
… TASK-365 (#3608) - README Current State: lead with v2.186.9 (TASK-357 Wave 4 closes the TASK-322 audit via #3603; flaky-test de-flake shipped by Pilot via #3606), soften the v2.186.8 restart note to past tense, collapse the archived v2.186.0/.1 bullets to keep the block lean. - Archive TASK-365 (de-flake task doc) — shipped in v2.186.9.
Context
Closes the final tranche of the TASK-322 security/code audit: the 13 deferred low-severity findings (Wave 4). All were re-verified against `main` (`ab15125`) before fixing, per the audit gate; none had been incidentally fixed by Waves 2–3, so all 10 distinct findings are addressed here.

What changed

Leak evictions (`autopilot`, `alerts`)
- `controller.go`: `removePR` now evicts `recordedMerges` alongside `activePRs`/`prFailures` (was never deleted → unbounded over daemon lifetime).
- `ci_monitor.go`: grace-expired terminal paths now `delete(discoveryStart, sha)`; previously only the "checks found" path evicted, leaking every no-CI SHA + superseded commit.
- `engine.go`: added `retryLastSeen` + `evictStaleRetryTrackers` (24h TTL) on the existing `checkStuckTasks` ticker, mirroring the #2204 orphan eviction (fix(alerts): task_stuck floods Slack: progress never emitted, per-rule cooldown rotates, no orphan cleanup); abandoned/escalated sources no longer leak.

Error-handling hygiene (`executor`, `orchestrator`, `discord`)
- `worktree.go`: `CleanupOrphanedWorktrees` returns `(removedCount, freedBytes, err)` with `err` reserved for real failures, instead of abusing a non-nil error to signal success (a trap for idiomatic callers). 4 call sites + 3 tests updated.
- `runner.go`: clean up the `MkdirTemp` hook dir on the `WriteEmbeddedScripts` failure branch (was leaking for the process lifetime).
- `bridge.go`: `runPython` wraps with `%w` not `%v` so callers can `errors.Is(context.DeadlineExceeded)` / `errors.As(*exec.ExitError)`.
- `transport.go`: discord heartbeat logs the `WriteJSON` error and returns to trigger immediate reconnect instead of discarding it.

Bug + diagnosability (`executor`, `github`)
- `parallel.go`: research subagent passes `--model` as two argv elements (`"--model", name`) instead of one joined `"--model haiku"` string the CLI parsed as a single unknown flag. Subagents were silently running on the default model. (tagged `bug` · conf high)
- `client.go`: `ExecuteGraphQL` aggregates all `gqlResp.Errors` (message + type + path) instead of only `Errors[0]`. Partial-data tolerance stays on the TASK-319 board track per the audit doc.

Test coverage (`transcription`)
- `internal/transcription` 0 → 3 test files: whisper_api error/parse/empty-transcript paths (httptest + URL-rewrite RoundTripper), setup status, Service primary/fallback routing.

Acceptance
- `go build ./...` ✅
- `make lint` ✅ 0 issues
- … (`autopilot`, `alerts`, `executor`, `orchestrator`, `discord`, `github`, `transcription`, `cmd/pilot`)

Re-audit + ledger tick-off and `TASK-357` archival to follow on merge; that closes the entire TASK-322 audit.