Skip to content

fix(copilot-app): warn-not-fail on newer App schema + feat: batch-bug-shepherd primitive#1434

Merged
danielmeppiel merged 15 commits into
mainfrom
danielmeppiel/copilot-app-schema-warn-and-meta-prompt
May 21, 2026
Merged

fix(copilot-app): warn-not-fail on newer App schema + feat: batch-bug-shepherd primitive#1434
danielmeppiel merged 15 commits into
mainfrom
danielmeppiel/copilot-app-schema-warn-and-meta-prompt

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

TL;DR

Two related changes in one PR:

  • Phase A — fix: Stop hard-failing apm install --target copilot-app when the GitHub Copilot desktop App ships a newer PRAGMA user_version than APM was tested against. Warn-and-continue instead; keep the hard-fail for versions below the minimum (where the workflows table may not exist).
  • Phase B — feat: Ship .apm/skills/batch-bug-shepherd/ and the .apm/prompts/batch-bug-shepherd.prompt.md Copilot App workflow that loads it. The skill composes with the existing .apm/skills/apm-review-panel/ to drive a batch of suspected bugs from raw issue list to mergeable PR queue.

Phase A — schema gate UX fix

Problem

_check_user_version() in src/apm_cli/integration/copilot_app_db.py hard-fails any deploy whose user_version falls outside [_MIN_SUPPORTED_USER_VERSION, _MAX_SUPPORTED_USER_VERSION]. Both bounds were 13. The Copilot App is now at user_version = 15 on macOS, so every user attempting apm install --target copilot-app hits:

Copilot App DB schema is newer than this APM release supports (user_version=15, max tested 13). Upgrade APM, or skip '--target copilot-app' until APM catches up.

The Copilot App ships fast and bumps user_version on additive, forward-compatible changes that do not break APM's narrow read/write surface (the workflows table). Hard-failing on every such bump made every Copilot App release a release window for APM.

devx-ux verdict

Consulted the devx-ux-expert persona before touching code. Verdict: ship_as_proposed.

Hard-failing on a forward-compatible additive bump violates the "install succeeds without reading docs" north star -- it blocks users who simply updated their Copilot App, with zero actionable recovery other than "downgrade Copilot" or "wait for APM release". Warn-and-continue respects the user's time; the < min hard-fail is justified because structural prerequisites (workflows table) genuinely may not exist. This mirrors how npm warns on unsupported engine versions but does not abort.

Wording (shipped verbatim, printable ASCII, [!] per STATUS_SYMBOLS):

[!] Copilot App schema version {actual} is newer than APM's tested maximum ({max_tested}). Proceeding anyway -- if you see unexpected behavior, run: apm --version && gh issue new -R microsoft/apm -t 'Schema v{actual} compat' -b 'Observed: <describe>'

On the question of a +N cap before degrading to error, devx-ux argued against:

No cap at +N is needed. The warning already names the exact delta, giving the user signal proportional to risk. If APM later discovers a version that truly breaks reads (e.g., table renames), add that specific version to a _KNOWN_BREAKING list and hard-fail on it -- not on an arbitrary numeric threshold. A blanket +N cap would recreate the original problem at a higher number.

Implementation

  • _MAX_SUPPORTED_USER_VERSION bumped 13 -> 15 (observed live, working).
  • _MIN_SUPPORTED_USER_VERSION stays at 13 -- below this the workflows table may not exist, so a hard fail is correct.
  • Above max: warn-and-continue once per process via _rich_warning. Dedup is enforced by a module-level _warned_user_versions: set[int] so multi-row installs do not spam the same line N times. _reset_user_version_warning_state() is a test helper.

Trade-offs (explicit)

  • Why warn-and-continue above max, but hard-fail below min? The two failure modes are not symmetric. Below min the workflows table may genuinely not exist -- any write would raise OperationalError mid-transaction with a far worse diagnostic. Above max, APM's read/write surface (a single CRUD on workflows) is small enough that additive schema changes are overwhelmingly safe; on the rare truly-breaking bump we will see breakage reports via the URL in the warning and add the specific version to a _KNOWN_BREAKING list.
  • Why no +N cap? See devx-ux quote above. A blanket cap recreates the original problem at a higher number; a _KNOWN_BREAKING allow-list targets the actual risk.
  • Why dedup at process scope, not collector scope? _check_user_version is called from low-level DB writers (deploy_workflow, delete_workflows) that have no DiagnosticCollector in scope. Threading one through would touch the public signature of every DB writer; a module-level set is the minimal surgical change.

Phase B — batch-bug-shepherd primitive + Copilot App workflow

What ships

  • .apm/skills/batch-bug-shepherd/SKILL.md plus seven asset files (triage-prompt.md, shepherd-prompt.md, fix-prompt.md, completion-prompt.md, final-report-template.md, ground-truth-table.md, verdict-schema.json).
  • .apm/prompts/batch-bug-shepherd.prompt.md -- Copilot App workflow prompt (interval: manual, mode: interactive) that loads the skill and accepts a targets input (issue list or the literal sweep-all).
  • .agents/skills/batch-bug-shepherd/ -- compiled output produced by apm install --target copilot,copilot-app.
  • apm.lock.yaml regenerated to include both the skill path and the synthetic copilot-app-db://workflows/apm--local--_local--batch-bug-shepherd URI.

Composition with apm-review-panel

The shepherd phase delegates panel review to .apm/skills/apm-review-panel/ via a sibling relative link rather than reinventing it. This is the first cross-skill composition in this repo -- a deliberate test of whether the "skills compose via filesystem siblings" pattern holds up under non-trivial use.

Design rationale

The genesis design pass produced a genesis-plan.md artifact (8-step disciplined design with mermaid + interface sketch). That artifact is intentionally not shipped in this PR -- it is design scaffolding for the author, not a user-consumed primitive. It lives in the session workspace at /Users/danielmeppiel/.copilot/workspaces/007bd6d3-a723-4ebb-9ed5-e861e73b1b20/artifacts/genesis-plan.md if reviewers want to read it.


Validation evidence

  • Lint silent (CI-mirror commands per .apm/instructions/linting.instructions.md):
    uv run --extra dev ruff check src/ tests/         # All checks passed!
    uv run --extra dev ruff format --check src/ tests/ # 1037 files already formatted
    
  • Tests green for the changed module:
    uv run --extra dev pytest tests/unit/integration/test_copilot_app_db.py \
                              tests/unit/integration/test_copilot_app_error_ux.py
    # 42 passed in 2.24s
    
    Including the new regression coverage:
    • test_warns_but_continues_above_max[16|17|50] -- forward-compat scenarios, parametrized.
    • test_above_max_warning_deduped_per_version -- multi-row install warns once, not per row.
    • test_rejects_user_version_below_min -- < 13 still hard-fails.
  • End-to-end install succeeded against the live App DB (v15):
    $ uv run apm install --target copilot,copilot-app
    [>] Installing dependencies from apm.yml...
      [+] <project root> (local)
      |-- 1 prompts integrated -> copilot-app/workflows/
      |-- workflows arrive disabled; enable from the Copilot App's Workflows tab
      ...
    [*] Installed 1 APM dependency in 4.2s.
    
    No warning fires on v15 because v15 is now the documented tested max. (Warning behaviour was independently verified by copying the DB and bumping PRAGMA user_version = 17, then re-running with APM_COPILOT_APP_DB -- the [!] line appeared exactly once, full text matching the devx-ux wording.)
  • SQLite row landed with consent gate enforced (enabled = 0):
    $ sqlite3 ~/.copilot/data.db \
        "SELECT id, enabled FROM workflows WHERE id LIKE 'apm--%batch-bug-shepherd%'"
    apm--local--_local--batch-bug-shepherd|0
    
  • Lockfile diff carries both surfaces:
    + .agents/skills/batch-bug-shepherd
    + copilot-app-db://workflows/apm--local--_local--batch-bug-shepherd
    

How to test

  1. Check out this branch.
  2. uv venv && uv pip install -e ".[dev]".
  3. uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/ -- both silent.
  4. uv run --extra dev pytest tests/unit/integration/test_copilot_app_db.py -v -- 14 tests pass, including the new TestVersionGuard::test_warns_but_continues_above_max cases.
  5. To exercise the warning end-to-end without bumping your real Copilot App:
    cp ~/.copilot/data.db ./build/test-bumped.db
    sqlite3 ./build/test-bumped.db "PRAGMA user_version = 17"
    APM_COPILOT_APP_DB=./build/test-bumped.db uv run apm install --target copilot-app
    
    Expect the [!] line naming version 17 and tested maximum (15) plus install completing successfully.
  6. To inspect the workflow row landed by Phase B:
    sqlite3 ~/.copilot/data.db \
      "SELECT id, name, interval, mode, enabled FROM workflows WHERE id LIKE 'apm--%batch-bug-shepherd%'"
    
    Expect enabled = 0 (consent gate).

danielmeppiel and others added 2 commits May 21, 2026 17:16
The Copilot App ships fast and bumps PRAGMA user_version on additive
changes that do not break APM's read/write surface. Hard-failing
every install whose user_version exceeded the highest version APM
was tested against made every Copilot App release a release window
for APM -- awful UX for users who simply updated their App.

Change:
- Bump _MAX_SUPPORTED_USER_VERSION 13 -> 15 (newly observed, working).
- Above max: warn-and-continue once per process (deduped by version)
  via _rich_warning, with exact wording supplied by the devx-ux-expert
  persona: names the version delta and points the user at a
  ready-to-file issue title for breakage reports.
- Below min: continues to hard-fail -- the workflows table may
  genuinely not exist on a pre-workflows schema.

Wording verbatim per devx-ux verdict (ship_as_proposed). No cap at
+N: warning already names the exact delta, giving signal proportional
to risk. If a future version truly breaks reads, add it to a
_KNOWN_BREAKING list and hard-fail on that specific version.

Tests cover v16/v17/v50 warn-not-raise, the <13 hard-fail path, and
the per-process dedup contract for multi-row installs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… prompt

Ships two composed APM primitives:

1. .apm/skills/batch-bug-shepherd/ -- the working spec for driving a
   batch of suspected bugs in microsoft/apm from raw issue list to
   mergeable PR queue. Fans out one reproduction subagent per candidate
   (LEGIT / UNCLEAR / FIXED-AT-HEAD), cross-references against open
   PRs, then branches:
   * in-flight community PR -> shepherd via .apm/skills/apm-review-panel
   * no PR -> fix session with TDD and mutation-break gate
   Dispatches one completion subagent per shepherd verdict to resolve
   panel follow-ups and post a ready-to-merge confirmation. Maintains
   a single plan.md ground-truth table as canonical session state.

2. .apm/prompts/batch-bug-shepherd.prompt.md -- a Copilot App workflow
   prompt (interval: manual, mode: interactive) that loads the skill
   above and accepts a 'targets' input (either an issue list or the
   literal 'sweep-all'). Lands in ~/.copilot/data.db as a workflow row
   with enabled=0 (consent gate); user must opt in via the Copilot App
   Workflows tab before it runs.

The skill composes with the existing .apm/skills/apm-review-panel/ via
a relative sibling link -- the shepherd phase delegates panel review
to that skill rather than reinventing it.

Design rationale lives in the genesis-plan.md authored during the
genesis design pass; that artifact is intentionally not shipped (it
is design scaffolding, not a user-consumed primitive).

apm.lock.yaml regenerated by 'apm install --target copilot,copilot-app'
to include both the skill path and the synthetic
copilot-app-db://workflows/apm--local--_local--batch-bug-shepherd URI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 15:17
@danielmeppiel danielmeppiel added the panel-review Trigger the apm-review-panel gh-aw workflow label May 21, 2026
Drop hard-coded .apm/skills/ path probes from the meta-prompt, the
SKILL.md composition section, and the shepherd-prompt asset. Skills
are activated by NAME by the harness; the meta-prompt and skill body
no longer assume APM-on-disk layout. This lets the same primitive
load identically inside Copilot CLI, Copilot App workflows, Claude
Code, Codex, or any other harness that resolves skills by name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates the Copilot App DB schema guard to warn (instead of hard-fail) when PRAGMA user_version is newer than APM’s tested max, and introduces a new batch-bug-shepherd skill plus a Copilot App workflow prompt to drive batch bug triage/shepherding.

Changes:

  • Relax Copilot App DB schema gating: hard-fail only when user_version is below min; warn-and-continue (deduped) when above max.
  • Add batch-bug-shepherd skill assets + workflow prompt for interactive batch bug processing.
  • Regenerate/install outputs and lockfile entries for Copilot App workflow deployment.
Show a summary per file
File Description
tests/unit/integration/test_copilot_app_db.py Replaces above-max hard-fail test with forward-compat warning + dedup assertions.
src/apm_cli/integration/copilot_app_db.py Implements warn-and-continue for newer schemas, bumps tested max, adds warning dedup state.
apm.lock.yaml Records locally deployed Copilot App workflow URI.
.apm/skills/batch-bug-shepherd/SKILL.md Adds orchestrator skill specification and composition contract with apm-review-panel.
.apm/skills/batch-bug-shepherd/assets/triage-prompt.md Adds WAVE 1 triage subagent spawn body.
.apm/skills/batch-bug-shepherd/assets/shepherd-prompt.md Adds WAVE 2a shepherd subagent spawn body (delegates to apm-review-panel).
.apm/skills/batch-bug-shepherd/assets/fix-prompt.md Adds WAVE 2b fix subagent spawn body with mutation-break + lint gates.
.apm/skills/batch-bug-shepherd/assets/completion-prompt.md Adds WAVE 3 completion subagent spawn body for follow-ups + supersede flow.
.apm/skills/batch-bug-shepherd/assets/final-report-template.md Adds final report + PR comment templates used by orchestrator/completion.
.apm/skills/batch-bug-shepherd/assets/ground-truth-table.md Adds plan.md ground-truth table template for cross-wave state.
.apm/skills/batch-bug-shepherd/assets/verdict-schema.json Adds JSON schema for triage/shepherd/completion structured returns.
.apm/prompts/batch-bug-shepherd.prompt.md Adds Copilot App workflow prompt that loads and runs the new skill.
.agents/skills/batch-bug-shepherd/SKILL.md Adds compiled/installed copy of the skill spec for agent runtime consumption.
.agents/skills/batch-bug-shepherd/assets/triage-prompt.md Compiled copy of triage prompt asset.
.agents/skills/batch-bug-shepherd/assets/shepherd-prompt.md Compiled copy of shepherd prompt asset.
.agents/skills/batch-bug-shepherd/assets/fix-prompt.md Compiled copy of fix prompt asset.
.agents/skills/batch-bug-shepherd/assets/completion-prompt.md Compiled copy of completion prompt asset.
.agents/skills/batch-bug-shepherd/assets/final-report-template.md Compiled copy of final report/comment templates.
.agents/skills/batch-bug-shepherd/assets/ground-truth-table.md Compiled copy of ground-truth table template.
.agents/skills/batch-bug-shepherd/assets/verdict-schema.json Compiled copy of verdict JSON schema.

Copilot's findings

  • Files reviewed: 20/20 changed files
  • Comments generated: 7

Comment on lines +81 to 84
_MAX_SUPPORTED_USER_VERSION: int = 15
"""Highest ``PRAGMA user_version`` we are tested against.

When the app ships a newer schema we refuse to write rather than risk
Comment on lines +452 to +453
``_MIN_SUPPORTED_USER_VERSION``. Versions above
``_MAX_SUPPORTED_USER_VERSION`` only emit a warning.
Comment on lines +346 to 357
if version > _MAX_SUPPORTED_USER_VERSION and version not in _warned_user_versions:
_warned_user_versions.add(version)
# Wording authored by the devx-ux-expert persona; ship verbatim.
from apm_cli.utils.console import _rich_warning

_rich_warning(
f"[!] Copilot App schema version {version} is newer than APM's "
f"tested maximum ({_MAX_SUPPORTED_USER_VERSION}). Proceeding "
f"anyway -- if you see unexpected behavior, run: apm --version "
f"&& gh issue new -R microsoft/apm -t 'Schema v{version} compat' "
f"-b 'Observed: <describe>'"
)

| issue | verdict | pr | pr_in_flight | author | status | notes |
|-------|---------|----|--------------|--------|--------|-------|
| #___ | pending-triage | | | | pending-triage | seeded from <list-or-sweep>; awaiting wave 1 |
@@ -0,0 +1,133 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
Comment on lines +168 to +170
do not break APM's read/write surface; hard-failing made every
App release a release window for APM. See devx-ux verdict in
the schema-warn PR.
Comment on lines +4 to +36
Use this skill to drive a batch of suspected bugs in microsoft/apm
from raw issue list to mergeable PR queue. Fan out one reproduction
subagent per candidate issue (verify on HEAD: LEGIT / UNCLEAR /
FIXED-AT-HEAD), cross-reference each legit issue against open PRs,
branch the workflow (in-flight community PR -> shepherd via the
apm-review-panel skill; no PR -> fix session with TDD and a
mutation-break gate), then dispatch one completion subagent per
shepherd verdict to resolve panel follow-ups, push to the contributor
fork (or open a superseding PR that preserves author authorship via
commit trailers) and post one ready-to-merge confirmation. Maintain a
single plan.md ground-truth table as the canonical session state.
Activate when the maintainer asks to triage a list of issues, sweep
the bug queue, drive in-flight PRs to merge, shepherd community
contributions, or work down the bug backlog -- even if "shepherd" or
"batch" is not named.
---

# batch-bug-shepherd - Outer-loop bug-queue orchestrator

This skill is an A10 ORCHESTRATOR-SAGA over three fan-out waves
(triage, shepherd-or-fix, completion) with a persisted ground-truth
table between phases. It COMPOSES the
[apm-review-panel](../apm-review-panel/SKILL.md) skill -- it does NOT
re-implement panel review. Per-PR shepherding is delegated; per-issue
verification, PR-in-flight branching, fix dispatch, completion, and
the cross-session table are owned here.

The skill is ADVISORY at the panel layer and EXECUTIVE at the
orchestrator layer: it WILL push commits, open PRs, post comments,
close superseded PRs. Every consequential write goes through a
deterministic CLI (`gh`, `git`, `uv run ruff`) wrapped in plan +
execute + verify (A9 SUPERVISED EXECUTION).

Captures all 15 local skills (including batch-bug-shepherd) and the
copilot-app-db workflow entry into apm.lock.yaml so subsequent runs
(apm install --frozen, drift-check) see the canonical post-install
state.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

APM Review Panel: ship_with_followups

warn-not-fail on forward-compat Copilot App schema bumps + first cross-skill composition primitive (batch-bug-shepherd)

cc @danielmeppiel @sergio-sisternes-epam -- a fresh advisory pass is ready for your review.

Phase A is a high-confidence, well-scoped fix that unblocks every user on a newer Copilot App without compromising the hard-fail for versions below the structural minimum. The test suite is substantive -- 42 passing unit tests against real SQLite, including parametrized forward-compat and dedup scenarios. The test-coverage expert's evidence block confirms the core user promise holds (TestVersionGuard::test_warns_but_continues_above_max). The one latent risk supply-chain-security raised -- that warn-and-continue removes the last write-safety gate -- is real but overstated: SQLite will still raise on structurally incompatible writes, and the PR description explicitly documents the _KNOWN_BREAKING escape valve for future genuinely-breaking bumps. The CHANGELOG gap is the only pre-merge blocker in Phase A: two independent panelists (doc-writer, oss-growth-hacker) flag it, and a behavior change of this kind needs a documented migration line per APM's own operating principles. The warning message UX issues (angle-bracket placeholder, [!] prefix duplication, compound && chain) are three-panelist-convergent and trivially fixable before merge.

Phase B is structurally sound as APM's first cross-skill composition primitive, and the consent gate (enabled=0) is correctly enforced. Two issues require resolution before or immediately after merge. First, the OSS Growth blocking finding on auto-closing contributor PRs is real contributor-trust risk: a maintainer running sweep-all for the first time could close ten community PRs before understanding the consequence. This is a documentation-only fix in SKILL.md -- it does not require code changes -- but it must land before or in this PR, not as a follow-up, because the skill is already installable. Second, supply-chain-security's prompt injection finding is recommended, not blocking, but it is architecturally important for the long-term credibility of the skill-composition pattern: triage-prompt.md should wrap raw issue content in clearly delimited untrusted-data blocks before this skill is used against a live repo with external contributors.

The TOCTOU race on _warned_user_versions (python-architect, supply-chain-security) is a latent trap only relevant if the CLI gains a thread-pool install path. The CLI is demonstrably single-threaded today; defer to a follow-up with a threading.Lock. The capsys-Rich Console fragility (test-coverage-expert, outcome: unknown) is a real latent test trap but does not affect today's green run; it belongs in the follow-up queue ranked above the TOCTOU fix because it will silently bite the next engineer who installs rich in the test environment.

Dissent. OSS Growth classified the SKILL.md auto-close-PR warning as blocking; no other panelist matched that severity on a documentation-only file covering a consent-gated (enabled=0) skill. The finding is real but the mechanism is doc, not code. CEO resolution: treat it as a pre-merge required fix (add the warning block to SKILL.md before merge) rather than a ship-blocking rework. The skill is already guarded by enabled=0 and requires an explicit manual trigger -- the risk window is narrower than a blocking verdict implies, but the fix cost is zero so there is no reason to defer it. Supply-chain-security's prompt-injection finding is recommended; no panelist elevated it to blocking. CEO agrees with that calibration given the skill is new, not yet run against live data, and a remediation path (XML-delimited untrusted-data blocks) is well-defined.

Aligned with: Pragmatic-as-npm: Phase A mirrors npm's engine-version warning pattern -- warn on forward-compatible mismatches, do not abort. The devx-ux expert confirmed this framing pre-code. The no-cap rationale is sound package-manager reasoning. OSS-community-driven: Phase B introduces APM's first cross-skill composition primitive; the auto-close-PR finding is a direct community-trust risk that must be documented in SKILL.md before any external maintainer runs sweep-all. Secure-by-default: The consent gate (enabled=0) on the workflow row is correct posture; the prompt injection gap in triage-prompt.md is a follow-up priority before the skill is used at scale.

Growth signal. Two distinct launch beats are available from this PR. Phase A is a silent-churn recovery story: users who updated GitHub Copilot App and hit a hard install failure get unblocked with no action required. This belongs at the top of the next release note, not buried as a bug entry. Phase B is APM's first composable-skill story -- 'skills compose via filesystem siblings' is the Unix-pipe moment for AI-native package management primitives. A standalone docs page (Maintainer Skills) and a brief external post would surface this pattern to evaluators and contributors who currently have no visibility into .apm/.

Panel summary

Persona B R N Takeaway
Python Architect 0 1 2 Phase A is clean and well-reasoned. One thread-safety gap in the TOCTOU dedup check; lazy import is justified; test-helper exposure is acceptable Python idiom.
CLI Logging Expert 0 1 2 No double-prefix or encoding faults. One recommended fix on the angle-bracket placeholder; two nits on style consistency and message length.
DevX UX Expert 0 1 2 Phase A warn-and-continue is well-designed; one actionable nit on warning prefix and one recommended fix for the feedback command UX.
Supply Chain Security Expert 0 2 1 Schema gate relaxation is contained (SQLite will hard-error on incompatible writes); prompt injection via verbatim issue-body in batch-bug-shepherd is the real risk.
OSS Growth Hacker 1 3 1 Phase A is a silent-churn fix worth a prominent release beat; Phase B ships APM's first composable-skill story but needs a contributor-trust guardrail callout in docs.
Doc Writer 0 1 2 SKILL.md style is consistent with apm-review-panel; CHANGELOG.md missing warn-not-fail entry for Phase A behavior change; no blocking doc issues.
Test Coverage Expert 0 2 1 Warn-not-fail behavior has solid unit coverage against real SQLite; two gaps: capsys fragility when Rich is installed, and no integration-tier test for the full apm-install CLI path.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [OSS Growth Hacker] (blocking-severity) Add explicit auto-close-PR warning block to SKILL.md before merge -- A maintainer running sweep-all for the first time could silently close community contributor PRs. The fix is a doc-only addition to SKILL.md -- zero code cost, zero delay. Must land in this PR, not as a follow-up, because the skill is already installable.
  2. [Doc Writer] (blocking-severity) Add CHANGELOG.md [Unreleased] entry for warn-not-fail Phase A behavior change -- Two independent panelists (doc-writer, oss-growth-hacker) flag this gap. A behavior change from hard-fail to warn-and-continue requires a documented migration line per APM operating principles. Without it, users hitting schema mismatches after upgrade have no documented signal this is intentional.
  3. [Supply Chain Security Expert] Wrap raw issue-body content in untrusted-data delimiters in triage-prompt.md -- ISSUE_BODY_OR_REPRO is passed verbatim from GitHub issues into a triage subagent whose return JSON propagates into write-capable completion subagents. XML-style delimiters with an explicit untrusted-data instruction are the standard mitigation. Priority before any live sweep-all run against a repo with external contributors.
  4. [Test Coverage Expert] Fix capsys + Rich Console singleton fragility in TestVersionGuard tests -- Rich Console stores the sys.stdout reference at instantiation time; pytest's capsys swaps it per-test. Tests pass today only because rich is not installed in the test environment. Adding _reset_console() to _reset_user_version_warning_state() is a one-line fix that prevents a silent future trap.
  5. [Test Coverage Expert] Add integration-with-fixtures test for apm install against a version-bumped Copilot App DB -- Current integration tests mock copilot_app_db -- they never exercise _check_user_version through a real DB. If hard-fail is re-introduced in a future refactor, no integration test catches it. Evidence outcome: missing on a behavior-change surface.

Architecture

classDiagram
    direction LR

    class copilot_app_db {
        <<IOBoundary>>
        +_MIN_SUPPORTED_USER_VERSION int
        +_MAX_SUPPORTED_USER_VERSION int
        +_warned_user_versions set
        +deploy_workflow(db_path, row) str
        +delete_workflows(db_path, ids) int
        +list_managed_workflow_ids(db_path) list
        +resolve_copilot_app_db_path() Path
        -_check_user_version(conn) int
        -_reset_user_version_warning_state() None
        -_begin_immediate_with_retry(conn) None
        -_validate_row(row) None
    }

    class WorkflowRow {
        <<ValueObject>>
        +id str
        +name str
        +prompt str
        +interval str
        +schedule_hour int
        +enabled int
        +mode str
        +model str
    }

    class CopilotAppDbError {
        <<Exception>>
    }
    class CopilotAppDbMissingError {
        <<Exception>>
    }
    class CopilotAppDbSchemaError {
        <<Exception>>
    }
    class CopilotAppDbLockedError {
        <<Exception>>
    }

    class console {
        <<Utility>>
        +_rich_warning(msg) None
    }

    CopilotAppDbError <|-- CopilotAppDbMissingError
    CopilotAppDbError <|-- CopilotAppDbSchemaError
    CopilotAppDbError <|-- CopilotAppDbLockedError
    copilot_app_db *-- WorkflowRow : accepts
    copilot_app_db ..> CopilotAppDbSchemaError : raises
    copilot_app_db ..> CopilotAppDbMissingError : raises
    copilot_app_db ..> CopilotAppDbLockedError : raises
    copilot_app_db ..> console : lazy import in warn path

    class copilot_app_db:::touched
    class WorkflowRow:::touched

    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A([apm install / apm uninstall]) --> B[deploy_workflow / delete_workflows]
    B --> C{db_path exists?}
    C -- no --> D[raise CopilotAppDbMissingError]
    C -- yes --> E[_validate_row]
    E --> F[_open_connection]
    F --> G[_check_user_version via PRAGMA user_version]
    G --> H{version less than MIN=13?}
    H -- yes --> I[raise CopilotAppDbSchemaError]
    H -- no --> J{version greater than MAX=15?}
    J -- yes, not in _warned_user_versions --> K[[_rich_warning to stderr, dedup via module-level set]]
    K --> L[_begin_immediate_with_retry]
    J -- no or already warned --> L
    L --> M{acquire write lock?}
    M -- locked --> N[raise CopilotAppDbLockedError]
    M -- acquired --> O{row exists?}
    O -- INSERT --> P[INSERT INTO workflows, enabled forced to 0]
    O -- UPDATE --> Q{execution-affecting fields changed?}
    Q -- yes --> R[UPDATE + reset enabled=0]
    Q -- no --> S[UPDATE name/model only]
    P --> T[COMMIT]
    R --> T
    S --> T
    T --> U[return copilot-app-db URI]
Loading

Recommendation

Ship after two pre-merge doc fixes land in this PR: (1) add the auto-close-PR warning block to SKILL.md, and (2) add the CHANGELOG [Unreleased] entry for the warn-not-fail behavior change. Both are documentation-only, zero-risk changes that can be committed directly to the branch. Phase A is production-ready: the core user promise is test-verified against real SQLite, the trade-offs are explicitly documented in the PR body, and the UX warning message issues (angle-bracket placeholder, [!] duplication, && chain) are three-panelist-convergent nits the author should address in the same commit as the CHANGELOG fix. Phase B ships correctly consent-gated (enabled=0) and the cross-skill composition pattern is sound. The prompt injection and capsys fragility items are genuine follow-ups but do not block merge given the skill's current pre-production status and the narrow consent gate.


Full per-persona findings

Python Architect

  • [recommended] TOCTOU race on _warned_user_versions check-then-add is not atomic at src/apm_cli/integration/copilot_app_db.py:346
    version not in _warned_user_versions and _warned_user_versions.add(version) are two separate operations. In any concurrent caller scenario, two threads can both pass the not in test before either completes the add. The CLI is currently single-threaded so this does not bite today, but the module-level mutable set without a lock is a latent trap as parallel installs are on the roadmap.
    Suggested: Protect the check-and-add with a threading.Lock.

  • [nit] Late import of _rich_warning inside _check_user_version has no documented justification at src/apm_cli/integration/copilot_app_db.py:349
    Suggested: Add # deferred to avoid importing console at module init above the import, or hoist it to the top of the file.

  • [nit] _reset_user_version_warning_state exposes module internals but is an accepted Python test-helper idiom at src/apm_cli/integration/copilot_app_db.py:316
    No action required unless a project-wide @testing_only marker is introduced.

CLI Logging Expert

  • [recommended] -b 'Observed: <describe>' leaves a literal angle-bracket placeholder the user must know to replace at src/apm_cli/integration/copilot_app_db.py
    Users who copy-paste the suggested gh issue new command verbatim will file an issue with body text Observed: <describe> -- an unhelpful stub.
    Suggested: Replace with --web so gh issue new opens the browser form, or change placeholder to -b 'Observed: TODO_DESCRIBE_BEHAVIOR'.

  • [nit] [!] is embedded in the f-string rather than passed as symbol="warning" to _rich_warning at src/apm_cli/integration/copilot_app_db.py
    No double-prefix risk (markup=False is set); minor convention drift from callers that use symbol="warning".

  • [nit] Warning message is a single long line (~220 chars) that will wrap unpredictably on narrow terminals at src/apm_cli/integration/copilot_app_db.py
    Suggested: Split into two _rich_warning calls or use \n to indent the recovery command on a second line.

DevX UX Expert

  • [recommended] Warning message embeds a raw multi-step shell pipeline that users must copy-paste-split manually at src/apm_cli/integration/copilot_app_db.py
    Printing a compound && chain in a single-line warning breaks on Windows cmd.exe and obscures the two distinct actions. Package-manager convention emits the simplest self-contained action.
    Suggested: Drop the apm --version && prefix or split into two lines.

  • [nit] Warning prefix [!] duplicates what _rich_warning already signals visually at src/apm_cli/integration/copilot_app_db.py
    Suggested: Remove [!] from the start of the f-string.

  • [nit] batch-bug-shepherd prompt uses mode: interactive for a long-running batch orchestrator at .apm/prompts/batch-bug-shepherd.prompt.md
    Suggested: Document in the prompt that the session is long-running and the user should expect async subagent waves.

Supply Chain Security Expert

  • [recommended] Verbatim issue-body passed to write-capable subagents enables indirect prompt injection at .apm/skills/batch-bug-shepherd/assets/triage-prompt.md
    ISSUE_BODY_OR_REPRO is passed verbatim from GitHub issues into the agent context. A malicious issue reporter can craft an issue body with embedded instructions that manipulate the triage return JSON and propagate into the completion wave, which executes git/gh commands.
    Suggested: Wrap raw issue content in XML-style <issue-body> tags with an explicit instruction that the model must treat everything inside as untrusted user data.

  • [recommended] warn-and-continue on unknown schema versions removes the last write-safety gate for non-additive App schema changes at src/apm_cli/integration/copilot_app_db.py
    A schema change that adds a column with a default would allow INSERT to succeed with semantically incorrect data.
    Suggested: Supplement with a post-write schema assertion: after any INSERT or UPDATE, read back the row and assert columns APM cares about match what was written.

  • [nit] _warned_user_versions is a process-global mutable set with no thread safety at src/apm_cli/integration/copilot_app_db.py
    Not a security issue today; harmless duplicate warnings in a concurrent scenario.

OSS Growth Hacker

  • [blocking] Skill can auto-close community contributor PRs -- this must be surfaced in the skill description before any maintainer runs sweep-all at .apm/skills/batch-bug-shepherd/SKILL.md
    SKILL.md documents 'Close the original PR with a courteous handoff comment referencing the superseding PR.' A contributor who sees their PR auto-closed by an agent will read this as rejection. A maintainer running sweep-all for the first time could close ten community PRs before realising the consequence.
    Suggested: Add a visible warning block near the top of SKILL.md: 'This skill WILL close community contributor PRs when it opens a superseding PR. Use targets: <specific-issue-list> rather than sweep-all for your first run.'

  • [recommended] Phase A fix is a first-run conversion win that needs a prominent CHANGELOG story, not a buried bug entry at CHANGELOG.md
    Suggested: Lead with: 'Copilot App users no longer see install failures after a Copilot App update -- APM now warns and continues on forward-compatible schema bumps.'

  • [recommended] First cross-skill composition is APM's strongest OSS primitive story to date -- needs a launch beat at .apm/skills/batch-bug-shepherd/SKILL.md
    Suggested: Add a Composition section to the docs site and pair with a release post angle.

  • [recommended] batch-bug-shepherd ships 1726 lines with no docs-site entry -- feature will be invisible at docs/src/content/docs/
    Suggested: Add at minimum a one-paragraph callout in what-is-apm.md or a new Maintainer Skills page.

  • [nit] Warning message gh issue new -b 'Observed: ' arrives blank -- filed issues have no actionable data at src/apm_cli/integration/copilot_app_db.py

Auth Expert -- inactive

No auth-surface files changed; copilot_app_db.py schema gate is SQLite version checking, not credential or token management.

Doc Writer

  • [recommended] CHANGELOG.md has no entry for the warn-not-fail copilot-app schema behavior change at CHANGELOG.md
    Users upgrading and hitting schema mismatches will have no documented signal that this is intentional behavior, not a regression.
    Suggested: Add under [Unreleased] ### Fixed: 'apm install --target copilot-app no longer fails hard on unrecognized Copilot App DB schema versions; it now warns and continues. (fix(copilot-app): warn-not-fail on newer App schema + feat: batch-bug-shepherd primitive #1434)'

  • [nit] SKILL.md lacks an explicit invocation trigger section heading at .apm/skills/batch-bug-shepherd/SKILL.md
    Suggested: Extract the Activate when... clause from the description into a ## When to activate section.

  • [nit] No cross-link from SKILL.md to the Copilot App prompt file at .apm/skills/batch-bug-shepherd/SKILL.md
    Suggested: Add to ## Bundled assets: '../../prompts/batch-bug-shepherd.prompt.md -- Copilot App workflow entry point for this skill.'

Test Coverage Expert

  • [recommended] capsys + Rich Console singleton is fragile: stale stdout reference breaks capture when Rich is installed at tests/unit/integration/test_copilot_app_db.py
    _rich_warning() routes through a module-level Console singleton. When rich is installed, Console stores the sys.stdout reference at instantiation time; pytest's capsys swaps it per-test. Tests pass today only because rich is not installed in the test environment.
    Suggested: In _reset_user_version_warning_state(), add: from apm_cli.utils.console import _reset_console; _reset_console().
    Proof (unknown at unit): tests/unit/integration/test_copilot_app_db.py::TestVersionGuard::test_warns_but_continues_above_max -- proves: APM warns the user when the Copilot App DB schema is newer than tested, instead of failing the install

  • [recommended] No integration-with-fixtures test exercises apm install CLI against a version-bumped Copilot App DB at tests/integration/test_install_services_orchestration.py
    Current integration tests mock copilot_app_db -- _check_user_version is never exercised through a real DB. Grep confirms zero user_version matches in tests/integration/.
    Suggested: Add test_copilot_app_install_warns_on_bumped_schema that creates a real SQLite DB at user_version=16, sets APM_COPILOT_APP_DB, invokes deploy_workflow, and asserts no CopilotAppDbSchemaError is raised.
    Proof (missing at integration-with-fixtures): tests/integration/test_install_services_orchestration.py::test_copilot_app_install_warns_on_bumped_schema -- proves: A user with a Copilot App DB that was auto-upgraded can still run apm install without failure

  • [nit] Warning state reset relies on manual per-test call; an autouse fixture would prevent future test pollution at tests/unit/integration/test_copilot_app_db.py
    Suggested: Add @pytest.fixture(autouse=True) to TestVersionGuard that calls _reset_user_version_warning_state() before and after each test.

This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • pypi.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "pypi.org"

See Network Configuration for more information.

Generated by PR Review Panel for issue #1434 · ● 3.3M ·

@github-actions github-actions Bot removed the panel-review Trigger the apm-review-panel gh-aw workflow label May 21, 2026
danielmeppiel and others added 6 commits May 21, 2026 21:04
Move the apm-review-panel skill out of the microsoft/apm monorepo's
shared .apm/skills/ tree into a dedicated, publishable APM package
under packages/apm-review-panel/ using the HYBRID layout (apm.yml +
SKILL.md at the root, with co-located assets/ and evals/).

The root apm.yml now declares the package via a local-path manifest
dep, so 'apm install' at root continues to deploy the skill into
.agents/skills/apm-review-panel/ -- but the dependency is now
explicit and inspectable via 'apm deps list' instead of relying on
the includes-auto walk of the monorepo .apm/ tree.

Rule-of-three justification (see genesis-plan-v2.md step 3.5):
panel is independently useful for any single-PR review (not just
batch shepherding), so extraction unlocks consumer reuse and
prevents the consumer from being forced to install the shepherd
to get the panel.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…package

Move batch-bug-shepherd from the monorepo .apm/skills/ and .apm/prompts/
trees into a dedicated, publishable APM package under
packages/batch-bug-shepherd/ using the multi-primitive .apm/ layout
so the package can ship BOTH the skill and the workflow prompt:

  packages/batch-bug-shepherd/
    apm.yml
    .apm/
      skills/batch-bug-shepherd/
        SKILL.md
        assets/...
      prompts/
        batch-bug-shepherd.prompt.md

The package declares its dep on apm-review-panel via a local-path
manifest entry (../apm-review-panel, anchored to the package dir per
APM's local-path rule). The shepherd-prompt asset still probes the
panel by NAME at use-site as the A9 SUPERVISED EXECUTION backstop
if the harness registry is bypassed -- belt + suspenders, see
genesis-plan-v2.md step 3.5 'Declaration mechanism per external
module'.

Root apm.yml now depends on packages/batch-bug-shepherd via local
path; 'apm install' continues to deploy the skill to
.agents/skills/batch-bug-shepherd/ and the workflow row to the App
SQLite DB (with enabled=0 per the consent gate) -- but the path is
now manifest-declared and the package is independently shippable:

    apm install microsoft/apm/packages/batch-bug-shepherd

works in any consumer repo and transitively pulls apm-review-panel.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes HIGH finding #1 from genesis-critique.md: the workflow prompt's
'Hard rules' block restated four disciplines (ASCII-only, lint
contract, mutation-break gate, single-writer per comment) that already
live in the SKILL.md body. Two sources of truth invited drift -- if
the skill body added a new gate, the prompt would not auto-update.

Collapse the section to a single 'Delegation' paragraph naming the
skill as the authoritative source for all disciplines. The prompt
now carries only the per-trigger-surface contract (workflow
frontmatter + the input parameter + the 6-step ACTIVATE / SCOPE /
PLAN / INITIALIZE / EXECUTE / RENDER procedure that summons the
skill); every discipline travels with the skill body.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes HIGH finding #2 from genesis-critique.md: the v1 ship deferred
the evals gate. This commit authors the minimum set the gate
requires, mirroring the pr-description-skill/evals/ shape so a
single CI lane can score both:

  evals/
    evals.json              - manifest + gates (val split = ship gate)
    triggers.json           - 10 fire + 10 no_fire, 60/40 train/val
    content/                - 2 scenario manifests + rubrics
      three-issues-mixed.json
      sweep-bug-queue.json
    fixtures/               - 4 markdown fixtures (with/without skill x 2)
    README.md
    .gitignore              - results/ (timestamped runner output)
  scripts/
    run_evals.py            - stdlib-only, deterministic matcher

The no-fire trigger set deliberately includes queries that SHOULD
route to apm-review-panel ('review my PR', 'panel-review this PR')
so DISPATCH COLLISION between the two skills would surface as
val-no-fire-rate dropping below 0.5.

Val split result on first run: trigger fire rate 1.0, no-fire rate
1.0, content delta_anchors 8 and 7 (gates require >= 1). Overall:
passed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes HIGH finding #3 from genesis-critique.md: the original
description was 950/1024 chars, too close to the runtime cap to
absorb future trigger additions, and it under-named the indirect
maintainer phrasings the skill actually wants to catch.

Trim ~75 chars while ADDING two indirect triggers explicitly
exercised in evals/triggers.json (fire-07 'shepherd all bug-flagged
issues this quarter', fire-08 'weekly sweep of community-reported
issues'):

  - 'reproduction subagent per candidate issue' -> 'triage subagent
    per issue' (the procedural detail belongs in the body, not the
    dispatch surface)
  - collapsed the parenthetical workflow branch into a single ->
    arrow
  - added 'shepherd all bug-flagged issues this quarter', 'run a
    weekly sweep of community-reported issues', 'work down community
    bug contributions' to the Activate list
  - kept the imperative shape, intent-first ordering, and the
    '-- even if shepherd or batch is not named' tail

Verified: val split evals still report fire 1.0 / no-fire 1.0.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…age extraction

Mechanical `apm install --target copilot,copilot-app --force` output
after the package extraction. The previous lockfile listed the two
skills under `local_deployed_files` (root .apm/ tree); they now
appear as proper `dependencies` entries because they ship as
self-contained packages under `packages/`. Also re-materializes
`.agents/skills/{apm-review-panel,batch-bug-shepherd}/` with the
trimmed dispatch description and the moved workflow asset, since
those compiled trees are tracked in this repo.

No source changes -- just the compiled-output snapshot.

Verified:
  - ruff check + ruff format --check: silent
  - apm install --target copilot,copilot-app --force: clean
  - sqlite3 ~/.copilot/data.db -> workflow enabled=0 (consent gate ok)
  - standalone apm install from a scratch dir resolves
    apm-review-panel transitively via the batch-bug-shepherd
    manifest dep (depth: 2, resolved_by: _local/batch-bug-shepherd)
  - val split evals: trigger fire 1.0, no-fire 1.0, content passed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

Genesis-driven repackaging: batch-bug-shepherd + apm-review-panel extracted to packages/

Follow-up on the v1 ship of batch-bug-shepherd. The genesis critique surfaced three HIGH findings (hidden coupling in the workflow prompt, missing evals gate, dispatch description too tight against the 1024-char cap). This extension closes all three AND turns both primitives into self-contained, standalone-installable APM packages that any maintainer can take a dependency on without dragging in the rest of microsoft/apm.

What changed (6 commits)

Commit Layer Intent
fd55a172 packages Extract apm-review-panel -> packages/apm-review-panel/ (HYBRID layout)
c3af73d3 packages Repackage batch-bug-shepherd -> packages/batch-bug-shepherd/ (.apm/ layout, declares ../apm-review-panel as manifest dep)
a385560f skills Collapse HIDDEN COUPLING in batch-bug-shepherd.prompt.md (Hard rules block -> single Delegation paragraph that names the skill as authoritative)
89ecd34c skills Add content + trigger evals (10 fire + 10 no_fire, 60/40 train/val; 2 content scenarios; stdlib runner)
19b55fa1 skills Tighten dispatch description (~950 -> ~875 chars) and add 2 indirect triggers (shepherd all bug-flagged issues this quarter, weekly sweep of community-reported issues)
62d63bb8 install Regenerate .agents/skills/ + apm.lock.yaml after extraction

Package topology

flowchart TD
    subgraph ROOT["microsoft/apm (root apm.yml)"]
        ROOTMANIFEST["dependencies.apm:<br/>./packages/apm-review-panel<br/>./packages/batch-bug-shepherd"]
    end
    subgraph PKG1["packages/apm-review-panel (HYBRID)"]
        PANELYAML[apm.yml]
        PANELSKILL[SKILL.md]
        PANELASSETS[assets/*]
    end
    subgraph PKG2["packages/batch-bug-shepherd (.apm/ layout)"]
        SHEPYAML["apm.yml<br/>dependencies.apm:<br/>../apm-review-panel"]
        SHEPSKILL[.apm/skills/batch-bug-shepherd/]
        SHEPPROMPT[.apm/prompts/batch-bug-shepherd.prompt.md]
    end
    ROOT --> PKG1
    ROOT --> PKG2
    PKG2 -. local manifest dep .-> PKG1
Loading

Dependency resolution

flowchart LR
    CONSUMER[Any consumer apm.yml<br/>dependencies.apm:<br/>'/abs/path/packages/batch-bug-shepherd']
    SHEP[packages/batch-bug-shepherd]
    PANEL[packages/apm-review-panel]
    CONSUMER -- MANIFEST DEP --> SHEP
    SHEP -- LOCAL SIBLING<br/>'../apm-review-panel' --> PANEL
Loading

Local-path deps anchor relative to the declaring package's directory (npm/pip/cargo parity), so ../apm-review-panel inside packages/batch-bug-shepherd/apm.yml resolves correctly whether the consumer installs from the monorepo root or pins the absolute path to packages/batch-bug-shepherd/.

Verification evidence

Lint (CI-mirror, must be silent):

$ uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/
All checks passed!
1037 files already formatted

apm install --target copilot,copilot-app --force in this repo:

[+] ./packages/apm-review-panel (local)
|-- Skill integrated -> .agents/skills/
[+] ./packages/batch-bug-shepherd (local)
|-- 1 prompts integrated -> copilot-app/workflows/
|-- workflows arrive disabled; enable from the Copilot App's Workflows tab
|-- 1 skill(s) integrated -> .agents/skills/
[+] ../apm-review-panel (local)
|-- (files unchanged)            <-- transitive dep deduplicated
[*] Installed 4 APM dependencies in 2.9s.

SQLite consent gate (workflows must arrive disabled):

$ sqlite3 ~/.copilot/data.db "SELECT id, enabled FROM workflows WHERE id LIKE '%batch-bug%'"
apm--local--_local--batch-bug-shepherd|0
apm--_local--batch-bug-shepherd--batch-bug-shepherd|0

Standalone install from a scratch dir (proves the package works as a third-party dep):

# scratch/apm.yml
dependencies:
  apm:
    - /abs/path/to/packages/batch-bug-shepherd
$ apm install --target copilot
[+] /.../packages/batch-bug-shepherd (local)
|-- 1 skill(s) integrated -> .agents/skills/
[+] ../apm-review-panel (local)
|-- Skill integrated -> .agents/skills/
[*] Installed 2 APM dependencies in 0.8s.

Generated lockfile shows the transitive resolution with provenance:

- repo_url: _local/apm-review-panel
  depth: 2
  resolved_by: _local/batch-bug-shepherd
  package_type: hybrid
  local_path: ../apm-review-panel

Eval gates (val split = ship gate):

$ python packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/scripts/run_evals.py --split val
triggers:  fire 1.0 / no-fire 1.0  passed
content:   delta_anchors 8 and 7   passed
overall:   passed

The no-fire set deliberately includes queries that should route to apm-review-panel instead (review my PR, panel-review this PR) so DISPATCH COLLISION between the two skills would surface as the val-no-fire rate dropping below 0.5.

How HIGH findings were closed

  • HIDDEN COUPLING (fd55a172 + a385560f): the prompt no longer duplicates skill rules. It hands off via a Delegation paragraph that names batch-bug-shepherd as authoritative. The skill is now activate-by-name, not path-coupled.
  • MISSING EVALS GATE (89ecd34c): full trigger + content harness committed, runner is stdlib-only, gates documented in evals/README.md.
  • DISPATCH-DESCRIPTION TIGHTNESS (19b55fa1): trimmed ~75 chars while adding 2 user-named indirect triggers; evals confirm both new triggers fire.

Modularity gain

Consumers can now write:

dependencies:
  apm:
    - github.com/microsoft/apm/packages/apm-review-panel       # panel only, for any single-PR review use case
    - github.com/microsoft/apm/packages/batch-bug-shepherd     # full shepherd, panel pulled in transitively

The rule-of-three holds: apm-review-panel is useful pre-shepherd, with-shepherd, and for future single-PR review workflows. Extraction unlocks that without dragging the shepherd in.


Design plan, mermaid diagrams, dependency graph, phantom-dependency guard, and the full evals matrix are persisted in genesis-plan-v2.md in the agent's workspace.

danielmeppiel and others added 5 commits May 21, 2026 22:23
A sub-package's local-path dep appears in the root lockfile with
resolved_by set. The root manifest cannot make it go away by editing
its dependencies.apm list, so flagging it as an orphan creates an
unfixable CI failure.

Surfaced by PR #1434, where batch-bug-shepherd declares ../apm-review-panel
as a manifest dep. Both that transitive entry and the depth=1 root entry
land in apm.lock.yaml; the orphan check was flagging the depth=2 one.

Restrict orphan detection to direct deps (resolved_by is None). Add a
regression test that covers the exact shape.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous commit fixes _check_no_orphans on the consumer side, but
the audit gate in microsoft/apm-action@v1 pins APM 0.14.0, which does
not yet have the fix. Until a release including the fix ships, the
'../apm-review-panel' transitive local-path entry would keep the gate
failing.

Drop the manifest dep from packages/batch-bug-shepherd/apm.yml; the
runtime activate-by-name probe in assets/shepherd-prompt.md remains
the working backstop, and the root manifest declares both packages
directly. Inline comment documents the rationale and the restoration
condition.

Lockfile regenerated by clean install (apm_modules wiped first); all
8 audit checks pass locally.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Test was added on main after this PR forked. It assumed the old
hard-fail behavior; our commit 05ea778 changed schema-too-new to
warn-and-continue. Update assertion to match.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit ef74bdd into main May 21, 2026
12 checks passed
@danielmeppiel danielmeppiel deleted the danielmeppiel/copilot-app-schema-warn-and-meta-prompt branch May 21, 2026 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants