feat(batch-bug-shepherd): operator visibility + fold-in invariant by danielmeppiel · Pull Request #1451 · microsoft/apm

danielmeppiel · 2026-05-22T07:47:41Z

TL;DR

Refactors the batch-bug-shepherd skill to (1) make long sagas visible to the operator at every phase boundary, (2) stop silently dropping non-blocking panel recommendations, and (3) add a post-wave mergeability gate that re-probes every "ready" PR before the report claims anything is mergeable. All three gaps were caught by genesis validation passes and confirmed by real microsoft/apm sweep-all runs.

Problem (WHY)

Three failure modes surfaced during live sweep-all runs:

Silent fan-outs. Triage / shepherd / completion waves each take many minutes; the operator could not see which phase was active or which subagents had been dispatched without scrolling the chat. "Are you still running?" became the most common operator message.
Recommendations vanished. apm-review-panel returned severity:recommended follow-ups, but the skill had no channel for them. Completion subagents only acted on severity:blocking; everything else was dropped on the floor.
Stale "ready-to-merge" claims. A PR marked ready by Phase 4 can stop being mergeable the moment the maintainer lands another PR onto main. The skill had no post-wave gate; the operator only discovered conflicts when GitHub refused the merge. In the live PR fleet that drove this work, 2 of 6 fold-in PRs (fix(hooks): keep project-scope hook commands repo-relative (closes #1394) #1396, fix(audit): pre-create target dirs in drift replay scratch (#1411) #1441) hit exactly this failure mode.

Each gap drove a genesis design pass with a persisted handoff packet; this PR ships the union of all three.

Approach (WHAT)

Operator visibility is a contract

New asset assets/progress-diagram.md defines a mermaid template + 5-state palette (pending / active / done / blocked / skipped) + per-phase render rules + dispatch-table contract. SKILL.md adds the invariant; each phase boundary re-renders the diagram with current-phase coloring and prints a subagent_id -> target dispatch table BEFORE fan-out.

Bias toward folding recommended items

assets/verdict-schema.json: shepherd_return gains required recommended_followups[] channel; completion_return gains folded_followups[] + deferred_followups[]; extracted reusable followup_item definition.
assets/shepherd-prompt.md: fixed verdict mapping bug (ship_with_followups + 0 blocking now maps to ready-to-merge).
assets/completion-prompt.md: encodes FOLD vs DEFER classifier; DEFER items filed as gh issue create tracking issues (never silently dropped).
SKILL.md Phase 4 rewritten as a 9-step contract that threads the channel end-to-end.

Mergeability is post-wave truth (NEW)

NEW Phase 5 (mergeability gate) between completion (Phase 4) and renamed final report (Phase 6). Sub-phases 5a probe (gh pr view --json mergeStateStatus,mergeable,maintainerCanModify), 5b fan-out one conflict-resolution subagent per CONFLICTING PR, 5c trust-but-verify re-probe + four-way partition.
NEW assets/conflict-resolution-prompt.md spawn body: rebase, faithful merge of both intents, mutation-break re-check, lint silent, --force-with-lease push (NEVER bare --force), re-probe, single resolution-confirmation comment.
NEW references/mergeability-gate.md load-on-demand orchestrator step-by-step. Load trigger: WHEN ENTERING PHASE 5. Keeps SKILL.md under budget.
Schema extends verdict-schema.json oneOf with conflict_resolution_return; --force-with-lease enforced via regex pattern guard on push_command; bare --force rejected. 5 rejection cases validated.
Four post-gate statuses route to four new final-report sections: resolved / requires-author-action (fork + maintainerCanModify=false) / requires-human-judgment (semantic conflict) / resolution-failed.
Two-comment-per-PR cap as new architecture invariant: at most one completion-confirmation (Phase 4) + one resolution-confirmation (Phase 5b) per PR.

Validation evidence

Gate	Result
Schema (jsonschema Draft7)	4 oneOf shapes; all 5 mergeability rejection cases enforced
SKILL.md size	388 lines / 4867 tokens (caps: 500 / 5000)
Description	965 / 1024 chars
ASCII purity	0 non-ASCII bytes across all changed files
JSON parse	4 / 4 changed JSON files valid
Eval value delta	16 rubric anchors (FOLD + mergeability) MATCH `with_skill` fixtures, MISS `without_skill`
Trigger near-misses	+5 no-fire items (single-PR fold-in phrasing; single-branch rebase phrasing; single-PR conflict phrasing)
CI lint pair	`ruff check src/ tests/` + `ruff format --check src/ tests/` both silent

Real-task evidence

This iteration was driven by a live sweep-all run over microsoft/apm:

FOLD/DEFER channel: 35 panel items folded directly in-PR + 11 deferred-to-tracking across PRs fix(integration): respect comma-separated applyTo globs in Claude/Cursor/Windsurf #1387, fix(hooks): keep project-scope hook commands repo-relative (closes #1394) #1396, fix(audit): pre-create target dirs in drift replay scratch (#1411) #1441, fix(install): persist --skill filter to apm.yml (#1395) #1442, fix(mcp): honour per-dep registry URL during install (#1393) #1443, fix(vscode): handle v0.1 runtimeArguments with variables for Docker MCP (#1391) #1444, fix(applyTo): fold #1387 panel follow-ups (docs + DRY + YAML escape) #1449.
Mergeability gate: post-wave probe found fix(hooks): keep project-scope hook commands repo-relative (closes #1394) #1396 and fix(audit): pre-create target dirs in drift replay scratch (#1411) #1441 DIRTY against current main. Both rebased by conflict-resolution subagents this session (commits 41ee035b and 36d878f4), pushed with --force-with-lease, re-probed CLEAN. Without the gate, the report would have claimed both ready-to-merge and the maintainer would have hit the merge failure manually.

How to test

Evals at packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/evals/:

content/{three-issues-mixed,sweep-bug-queue}.json — rubric runs prompt with/without the skill, asserts value delta via anchors including the new mergeability-probe-cli, force-with-lease-on-push, post-wave-partition-columns, two-comment-cap.
triggers.json — should-fire vs should-not-fire dispatch eval, train/val split. Validation split is the ship gate.
fixtures/sweep-bug-queue.with_skill.md — ideal output (Phase 5 mermaid + 5b dispatch table + 4-column report).
fixtures/sweep-bug-queue.without_skill.md — explicit failure mode (stale ready-to-merge claim, maintainer hits the conflict at merge time).

Trade-offs

Verbosity in chat. Each phase boundary now renders a mermaid block + a dispatch table. Acceptable cost for the visibility the operator gets across multi-PR sagas with up to 4 fan-out waves.
Completion subagents may spawn nested Task subagents for per-item panelist consultation. This is by design — the FOLD path is where the panel's expertise meets the diff.
Mid-flight reclassify to DEFER is allowed ("fold-in bias is a default, not a suicide pact"). Prevents one stubborn item from blocking the whole PR.
Conflict-resolution subagents do the rebase in-session. The skill does NOT delegate the rebase to a nested agent — the subagent owns its PR end-to-end. Cost: each subagent's session is single-threaded for the duration of one rebase + push + re-probe. Benefit: trust-but-verify re-probe in 5c is independent.
Trust-but-verify re-probe. Sub-phase 5c re-probes every 5b return independently of the subagent's claim. Costs one extra gh pr view per resolved PR; pays for itself the first time GitHub finishes computing mergeStateStatus after the subagent already returned.

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Update: Phase 1.5 strategic-alignment gate (commit `3786d9c6`)

A FOURTH gap surfaced during the live sweep that this PR ships a fix for: the skill happily shepherded reproducible-on-HEAD bugs whose fixes the project did not actually want. A LEGIT triage was necessary but not sufficient.

What changed

NEW PRINCIPLES.md at repo root. Numbered rejection contract (P1..P7) encoding the project's hard nos:
- P1: No invented frontmatter schemas on AGENTS / SKILL / PROMPT primitives
- P2: Multi-harness, but only harnesses with real adoption traction
- P3: Vendor-neutral (no Copilot / Cursor / Claude lock-in in the core primitives)
- P4: UX floor is not a trade (we do not break the adoption funnel to fix things)
- P5..P7: portability over lock-in, reliability over magic, community over feature count
- Bound to apm-ceo, bbs Phase 1.5, apm-triage-panel, apm-review-panel as the rejection contract
NEW bbs Phase 1.5 (WAVE 1.5) between Phase 1 (triage) and Phase 2 (PR-in-flight cross-reference):
- one apm-ceo subagent per LEGIT row, in parallel
- 4-state verdict (aligned / aligned-with-reservations / out-of-scope / wrong-direction)
- fails OPEN on infrastructure failure (malformed-x2 or non-citable principle) — never silently demotes a legit bug under gate breakage
- aborts only when apm-ceo.agent.md or PRINCIPLES.md itself is missing (operator-actionable)
- demoted rows flip to status triaged-deferred and SKIP Phase 2/3/4/5, surface in Phase 6 under "Recommend close as out-of-scope"
- deferred-PR strategic-rejection comment subagent (S7+S4+A9) posts one courtesy comment on any open PR for a demoted issue, using the would-be Phase-4 completion-comment slot (two-comment-per-PR cap preserved)
Verdict schema +5th oneOf member strategic_alignment_return (kind / issue / verdict / cited_principle / rationale / reservations).
Ground-truth table +2 columns (strategic_verdict, strategic_rationale) +1 status value (triaged-deferred).
Progress diagram inserts P15 between P1 and P2; dispatch-table contract extends.
Final-report template +2 partitions ("Recommend close as out-of-scope" + "Aligned with reservations").
Evals +2 fire / +1 no-fire (PRINCIPLES.md authoring guard) +1 rubric anchor.

Retroactive Phase 1.5 evidence (this sweep's 8 open PRs)

The gate was authored, then run against the open PRs in this sweep. Results:

PR	verdict	cited principle
#1396	aligned	P5 portability over lock-in
#1416	aligned	P6 reliability over magic
#1440	aligned	P4 UX floor is not a trade
#1441	aligned	P6 reliability over magic
#1442	aligned	P4 UX floor is not a trade (P1 explicitly cleared)
#1443	aligned	P6 reliability over magic
#1444	aligned-with-reservations	P3 vendor neutral (VS Code-only fix on a vendor-neutral MCP adapter surface)
#1449	aligned	P6 reliability over magic

Zero out-of-scope / wrong-direction verdicts. The single reservation on #1444 was tracked in #1452 (extend MCP v0.1 variables handling to all client adapters) to satisfy P3.

Updated validation

Gate	Result
Schema (jsonschema Draft7)	5 oneOf shapes; strategic_alignment_return enforces `reservations` when verdict=`aligned-with-reservations`
SKILL.md size	406 lines / 4943 tokens (caps: 500 / 5000) — trimmed pre-existing verbose passages to fit
ASCII purity	0 non-ASCII bytes across all 10 changed files (including new `PRINCIPLES.md`)
JSON parse	all changed JSON files valid
Genesis design	SHIP DESIGN, no BLOCKER (handoff packet persisted to session plan store)
Real-task rehearsal	8 PRs retroactively gated; 7 aligned, 1 aligned-with-reservations (tracked via #1452); 0 demoted

Refactors the batch-bug-shepherd skill to address two genesis-validated blockers and ship two missing capabilities discovered during a real sweep-all run over 14 microsoft/apm bugs: 1. OPERATOR VISIBILITY (was: silent 30-minute fan-outs) - New asset assets/progress-diagram.md: mermaid template + 5-state palette (pending/active/done/blocked/skipped) + per-phase render rules + dispatch-table contract. - SKILL.md adds 'Operator visibility is a contract' invariant; each phase boundary re-renders the diagram with current-phase coloring and prints a subagent_id -> target dispatch table BEFORE fan-out. - Operator can follow long sagas at a glance instead of waiting in the dark for the next checkpoint. 2. FOLD-IN INVARIANT (was: panel recommendations silently dropped) - assets/verdict-schema.json: shepherd_return gains required recommended_followups[] channel; completion_return gains folded_followups[] + deferred_followups[]; extracted reusable followup_item definition. - assets/shepherd-prompt.md: fixed verdict mapping bug (ship_with_followups + 0 blocking -> ready-to-merge, not needs-author-changes); added recommended_followups extraction step with required source_persona + optional fold_hint tagging. - assets/completion-prompt.md: full rewrite. Adds RECOMMENDED_FOLLOWUPS input; encodes FOLD vs DEFER classifier (FOLD: touches diff / single helper / regression trap / hermetic test / inline comment; DEFER: cross-cutting refactor / new feature / broad doc / architectural addition); per-FOLD item consultation with source_persona + python-architect lens; DEFER items filed as gh issue create tracking issues (never silently dropped); mid-flight reclassify rule to avoid stalls. - SKILL.md adds 'Bias toward folding recommended items' invariant and rewrites Phase 4 spawn contract (9 steps) to thread the recommended_followups channel end-to-end. Eval gate - +3 rubric anchors per content fixture (progress-diagram-header, mermaid-flowchart-rendered, dispatch-table-before-fanout) and +3 invariant anchors (recommended-followups-channel, fold-defer-classifier, tracking-issue-for-defer). - All 12 new anchors MATCH with_skill fixtures and MISS without_skill fixtures (clean value delta). - +3 no-fire trigger items for single-PR fold-in phrasing so the dispatcher will not misfire the batch outer-loop on single-PR fold work (e.g. 'fold the panel recommendations into PR #1234' remains apm-review-panel completion territory). Validation - Schema validates via jsonschema Draft7; accepts new shapes, rejects shepherd_return missing recommended_followups[]. - SKILL.md: 367 lines / 4483 tokens (caps: 500 / 5000). - Description: 965 / 1024 chars; mentions FOLD invariant. - 0 non-ASCII bytes across all modified files. - All 4 changed JSON files parse. Real-task evidence (this skill iteration was driven by a live run) - 5 of 6 in-flight community PRs had their panel recommendations folded in-PR by completion subagents following the new contract, yielding 22 folded items and 8 deferred-to-tracking items across PRs #1387, #1396, #1441, #1443, #1444. The 6th (#1442) is in flight as this lands. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Refactors the batch-bug-shepherd skill contract to improve long-running saga operator visibility (progress diagram + dispatch tables at phase boundaries) and to stop silently dropping non-blocking review-panel recommendations by flowing them through a recommended_followups[] channel and enforcing a FOLD-vs-DEFER completion workflow.

Changes:

Adds an operator-visibility contract (assets/progress-diagram.md) and threads “render progress diagram + dispatch table” requirements into SKILL.md, the root prompt, and eval fixtures.
Extends the verdict schema to carry recommended_followups[] from shepherd -> completion, plus completion reporting via folded_followups[] / deferred_followups[].
Updates prompts/evals to encode the fold-in bias (classify recommended items as FOLD vs DEFER; file DEFER items as tracking issues).

Show a summary per file

File	Description
packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/SKILL.md	Updates architecture invariants and phase-by-phase contract to require progress renders and fold-in handling.
packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/evals/triggers.json	Adds new “no-fire” trigger cases to prevent mis-dispatch on single-PR fold-in phrasing.
packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/evals/fixtures/three-issues-mixed.with_skill.md	Updates idealized “with_skill” transcript to include progress/dispatch visibility and fold/defer outcomes.
packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/evals/fixtures/sweep-bug-queue.with_skill.md	Updates sweep-all transcript to include progress/dispatch visibility and fold/defer outcomes at scale.
packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/evals/content/three-issues-mixed.json	Adds rubric anchors for progress diagram, dispatch table, recommended followups, FOLD/DEFER behavior.
packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/evals/content/sweep-bug-queue.json	Mirrors new rubric anchors for the sweep-bug-queue scenario.
packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/assets/verdict-schema.json	Extends schema with `recommended_followups` and completion follow-up reporting structures.
packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/assets/shepherd-prompt.md	Fixes verdict mapping for `ship_with_followups` with 0 blocking; extracts `recommended_followups[]`.
packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/assets/progress-diagram.md	Introduces the canonical progress-diagram template, palette, and dispatch-table contract.
packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/assets/completion-prompt.md	Rewrites completion workflow to classify recommended items (FOLD/DEFER) and file tracking issues for DEFER.
packages/batch-bug-shepherd/.apm/prompts/batch-bug-shepherd.prompt.md	Threads the progress/dispatch rendering requirements into the top-level prompt procedure.

Copilot's findings

Files reviewed: 11/11 changed files
Comments generated: 8

+- **Pending** -- every phase at the very first render.
+- **Active** -- the phase the orchestrator is about to spawn into.
+  EXACTLY ONE node should be `active` at any time. Use stroke-width
+  3px so the operator's attention lands there.
+- **Done** -- the phase completed and the table reflects its


+    classDef pending fill:#eee,stroke:#999,color:#333
+    classDef active  fill:#fff3b0,stroke:#b8860b,color:#333,stroke-width:2px
+    classDef done    fill:#cfe8c9,stroke:#2e7d32,color:#1b5e20
+    classDef blocked fill:#f8c7c7,stroke:#b71c1c,color:#7f0000
+    classDef skipped fill:#f0f0f0,stroke:#bbb,color:#888,stroke-dasharray:3 3


+    classDef pending fill:#eee,stroke:#999,color:#333
+    classDef active  fill:#fff3b0,stroke:#b8860b,color:#333,stroke-width:2px
+    classDef done    fill:#cfe8c9,stroke:#2e7d32,color:#1b5e20
+    classDef blocked fill:#f8c7c7,stroke:#b71c1c,color:#7f0000
+    classDef skipped fill:#f0f0f0,stroke:#bbb,color:#888,stroke-dasharray:3 3


+3. ONCE at the end of the run with every phase styled `done` (or
+   `blocked` where the human-escalation queue is non-empty).


+| state    | fill      | stroke    | stroke-width | semantics                          |
+|----------|-----------|-----------|--------------|------------------------------------|
+| pending  | `#f5f5f5` | `#9ca3af` | 1px          | not started yet                    |
+| active   | `#dbeafe` | `#2563eb` | 3px          | currently executing (one at a time)|
+| done     | `#dcfce7` | `#16a34a` | 1px          | completed cleanly                  |
+| blocked  | `#fef3c7` | `#d97706` | 2px          | partial completion, human follow-up|
+| skipped  | `#f3f4f6` | `#6b7280` | 1px,dasharray| no work in this phase (e.g. 0 fix) |


+flowchart TB
+    P0[Phase 0 scope]:::active
+    P1[Phase 1 triage]:::pending
+    P2[Phase 2 cross-ref]:::pending
+    P3[Phase 3 shepherd-or-fix]:::pending


+    P3[Phase 3 shepherd-or-fix]:::pending
+    P4[Phase 4 completion]:::pending
+    P5[Phase 5 final report]:::pending
+    P0 --> P1 --> P2 --> P3 --> P4 --> P5


+   a tracking issue), then act on each per classification.
+
+Push to the contributor's fork if possible, otherwise open a
+superseding PR that preserves author authorship, and post ONE


Adds a post-wave gate that re-probes mergeability for every PR the saga marked ready-to-merge, dispatches one conflict-resolution subagent per CONFLICTING PR, and partitions returns into four post-gate statuses before the final report claims anything is mergeable. Mergeability is post-wave truth, not pre-wave assumption: a PR that Phase 4 marked ready can stop being mergeable the moment the maintainer lands another PR onto main. Without this gate the report ships stale ready claims. Design driven through the genesis skill end-to-end (steps 1-6 handoff packet, steps 7a-7b coder pass, step 8 validation): - NEW Phase 5 (mergeability gate) between completion (Phase 4) and renamed final report (Phase 5 -> Phase 6). - Sub-phases 5a probe (read-only, single-thread, gh pr view --json mergeStateStatus), 5b fan-out (one conflict-resolution subagent per CONFLICTING PR), 5c trust-but-verify re-probe + four-way partition (resolved / requires-author-action / requires-human-judgment / resolution-failed). - NEW assets/conflict-resolution-prompt.md spawn body for 5b. Encodes rebase, faithful merge of both intents, mutation-break re-check, lint silent, --force-with-lease push, re-probe, resolution-confirmation comment. - NEW references/mergeability-gate.md load-on-demand orchestrator step-by-step (load trigger: WHEN ENTERING PHASE 5). Keeps SKILL.md under 5000-token budget. - Schema extends verdict-schema.json oneOf with conflict_resolution_return; --force-with-lease enforced via regex pattern guard on push_command field; bare --force rejected. Five rejection cases validated. - Two-comment-per-PR cap as new architecture invariant: at most one completion-confirmation (Phase 4) + one resolution-confirmation (Phase 5b) per PR. - Progress diagram extended with WAVE4 subgraph (P5a/P5b/P5c), skipped-state semantics, P5b dispatch table requirement. - Final report extended with three new partition sections plus a RESOLUTION CONFIRMATION COMMENT block and mergeability-gate disciplines line. - Evals: +3 content rubric anchors (mergeability-probe-cli, force-with-lease-on-push, post-wave-partition-columns) + 1 optional anchor (two-comment-cap); +1 fire + 2 no-fire trigger items; fixture diff shows the gate firing on a sweep with 2 conflicting PRs and the without-skill failure mode (stale ready claim). SKILL.md: 388 lines / 4867 tokens (budget 500/5000). ASCII only. CI lint pair silent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds a new wave between Phase 1 (triage) and Phase 2 (PR-in-flight cross-reference) that checks every LEGIT bug against the project's rejection contract before spending shepherd / fix / completion work on it. What changes: - NEW PRINCIPLES.md at repo root: 7 numbered principles encoding the project's hard nos (P1 no invented frontmatter; P2 multi-harness with traction gating; P3 vendor neutral; P4 UX floor is not a trade) plus 3 supporting principles (P5 portability; P6 reliability over magic; P7 community over feature count). Bound to apm-ceo + bbs Phase 1.5 + apm-triage-panel + apm-review-panel as the rejection contract. - NEW bbs Phase 1.5 strategic-alignment gate (WAVE 1.5): - one apm-ceo subagent per LEGIT row, in parallel - 4-state verdict: aligned | aligned-with-reservations | out-of-scope | wrong-direction - schema-validated returns; FAILS OPEN on infrastructure failure (malformed-x2 or non-citable principle) so legit bugs are never silently demoted under gate breakage - ABORTS only when apm-ceo.agent.md or PRINCIPLES.md itself is missing (operator-actionable error) - demoted rows flip to status triaged-deferred and SKIP Phase 2/3/4/5; surface in Phase 6 under 'Recommend close as out-of-scope' partition - aligned-with-reservations rows stay in saga; downstream phases surface the reservations in review prose - deferred-PR strategic-rejection comment subagent (S7+S4+A9) posts a courtesy comment on any open PR whose underlying issue was demoted, using the would-be Phase-4 completion-comment slot (two-comments-per-PR cap preserved) - Verdict schema extended with 5th oneOf member strategic_alignment_return (kind, issue, verdict, cited_principle, rationale, reservations). - Ground-truth table grows two columns (strategic_verdict + strategic_rationale) and one status value (triaged-deferred). - Progress diagram inserts P15 between P1 and P2; dispatch-table contract extends to Phase 1.5. - Final-report template adds 'Recommend close as out-of-scope' partition and 'Aligned with reservations' surfacing section. - 2 new fire trigger evals + 1 no-fire (PRINCIPLES.md authoring guard) + 1 new rubric anchor on the three-issues-mixed scenario. Genesis design artifact lives in the session plan store; SKILL.md body remains within 500-line / 5000-token budget (406 lines / 4943 tokens after trimming pre-existing verbose passages on operator-visibility, mergeability, fold-in, composition, and operating-contract sections to make room). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danielmeppiel requested a review from sergio-sisternes-epam as a code owner May 22, 2026 07:47

Copilot AI review requested due to automatic review settings May 22, 2026 07:47

Copilot started reviewing on behalf of danielmeppiel May 22, 2026 07:47 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

danielmeppiel and others added 2 commits May 22, 2026 10:08

danielmeppiel merged commit 9b4279e into main May 22, 2026
19 checks passed

danielmeppiel deleted the danielmeppiel/bbs-skill-fold-in-and-visibility branch May 22, 2026 14:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(batch-bug-shepherd): operator visibility + fold-in invariant#1451

feat(batch-bug-shepherd): operator visibility + fold-in invariant#1451
danielmeppiel merged 3 commits into
mainfrom
danielmeppiel/bbs-skill-fold-in-and-visibility

danielmeppiel commented May 22, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		3. ONCE at the end of the run with every phase styled `done` (or
		`blocked` where the human-escalation queue is non-empty).

Conversation

danielmeppiel commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TL;DR

Problem (WHY)

Approach (WHAT)

Operator visibility is a contract

Bias toward folding recommended items

Mergeability is post-wave truth (NEW)

Validation evidence

Real-task evidence

How to test

Trade-offs

Update: Phase 1.5 strategic-alignment gate (commit 3786d9c6)

What changed

Retroactive Phase 1.5 evidence (this sweep's 8 open PRs)

Updated validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Copilot's findings

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

danielmeppiel commented May 22, 2026 •

edited

Loading

Update: Phase 1.5 strategic-alignment gate (commit `3786d9c6`)