fix(copilot-app): warn-not-fail on newer App schema + feat: batch-bug-shepherd primitive#1434
Conversation
The Copilot App ships fast and bumps PRAGMA user_version on additive changes that do not break APM's read/write surface. Hard-failing every install whose user_version exceeded the highest version APM was tested against made every Copilot App release a release window for APM -- awful UX for users who simply updated their App. Change: - Bump _MAX_SUPPORTED_USER_VERSION 13 -> 15 (newly observed, working). - Above max: warn-and-continue once per process (deduped by version) via _rich_warning, with exact wording supplied by the devx-ux-expert persona: names the version delta and points the user at a ready-to-file issue title for breakage reports. - Below min: continues to hard-fail -- the workflows table may genuinely not exist on a pre-workflows schema. Wording verbatim per devx-ux verdict (ship_as_proposed). No cap at +N: warning already names the exact delta, giving signal proportional to risk. If a future version truly breaks reads, add it to a _KNOWN_BREAKING list and hard-fail on that specific version. Tests cover v16/v17/v50 warn-not-raise, the <13 hard-fail path, and the per-process dedup contract for multi-row installs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… prompt Ships two composed APM primitives: 1. .apm/skills/batch-bug-shepherd/ -- the working spec for driving a batch of suspected bugs in microsoft/apm from raw issue list to mergeable PR queue. Fans out one reproduction subagent per candidate (LEGIT / UNCLEAR / FIXED-AT-HEAD), cross-references against open PRs, then branches: * in-flight community PR -> shepherd via .apm/skills/apm-review-panel * no PR -> fix session with TDD and mutation-break gate Dispatches one completion subagent per shepherd verdict to resolve panel follow-ups and post a ready-to-merge confirmation. Maintains a single plan.md ground-truth table as canonical session state. 2. .apm/prompts/batch-bug-shepherd.prompt.md -- a Copilot App workflow prompt (interval: manual, mode: interactive) that loads the skill above and accepts a 'targets' input (either an issue list or the literal 'sweep-all'). Lands in ~/.copilot/data.db as a workflow row with enabled=0 (consent gate); user must opt in via the Copilot App Workflows tab before it runs. The skill composes with the existing .apm/skills/apm-review-panel/ via a relative sibling link -- the shepherd phase delegates panel review to that skill rather than reinventing it. Design rationale lives in the genesis-plan.md authored during the genesis design pass; that artifact is intentionally not shipped (it is design scaffolding, not a user-consumed primitive). apm.lock.yaml regenerated by 'apm install --target copilot,copilot-app' to include both the skill path and the synthetic copilot-app-db://workflows/apm--local--_local--batch-bug-shepherd URI. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Drop hard-coded .apm/skills/ path probes from the meta-prompt, the SKILL.md composition section, and the shepherd-prompt asset. Skills are activated by NAME by the harness; the meta-prompt and skill body no longer assume APM-on-disk layout. This lets the same primitive load identically inside Copilot CLI, Copilot App workflows, Claude Code, Codex, or any other harness that resolves skills by name. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR updates the Copilot App DB schema guard to warn (instead of hard-fail) when PRAGMA user_version is newer than APM’s tested max, and introduces a new batch-bug-shepherd skill plus a Copilot App workflow prompt to drive batch bug triage/shepherding.
Changes:
- Relax Copilot App DB schema gating: hard-fail only when
user_versionis below min; warn-and-continue (deduped) when above max. - Add
batch-bug-shepherdskill assets + workflow prompt for interactive batch bug processing. - Regenerate/install outputs and lockfile entries for Copilot App workflow deployment.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/integration/test_copilot_app_db.py | Replaces above-max hard-fail test with forward-compat warning + dedup assertions. |
| src/apm_cli/integration/copilot_app_db.py | Implements warn-and-continue for newer schemas, bumps tested max, adds warning dedup state. |
| apm.lock.yaml | Records locally deployed Copilot App workflow URI. |
| .apm/skills/batch-bug-shepherd/SKILL.md | Adds orchestrator skill specification and composition contract with apm-review-panel. |
| .apm/skills/batch-bug-shepherd/assets/triage-prompt.md | Adds WAVE 1 triage subagent spawn body. |
| .apm/skills/batch-bug-shepherd/assets/shepherd-prompt.md | Adds WAVE 2a shepherd subagent spawn body (delegates to apm-review-panel). |
| .apm/skills/batch-bug-shepherd/assets/fix-prompt.md | Adds WAVE 2b fix subagent spawn body with mutation-break + lint gates. |
| .apm/skills/batch-bug-shepherd/assets/completion-prompt.md | Adds WAVE 3 completion subagent spawn body for follow-ups + supersede flow. |
| .apm/skills/batch-bug-shepherd/assets/final-report-template.md | Adds final report + PR comment templates used by orchestrator/completion. |
| .apm/skills/batch-bug-shepherd/assets/ground-truth-table.md | Adds plan.md ground-truth table template for cross-wave state. |
| .apm/skills/batch-bug-shepherd/assets/verdict-schema.json | Adds JSON schema for triage/shepherd/completion structured returns. |
| .apm/prompts/batch-bug-shepherd.prompt.md | Adds Copilot App workflow prompt that loads and runs the new skill. |
| .agents/skills/batch-bug-shepherd/SKILL.md | Adds compiled/installed copy of the skill spec for agent runtime consumption. |
| .agents/skills/batch-bug-shepherd/assets/triage-prompt.md | Compiled copy of triage prompt asset. |
| .agents/skills/batch-bug-shepherd/assets/shepherd-prompt.md | Compiled copy of shepherd prompt asset. |
| .agents/skills/batch-bug-shepherd/assets/fix-prompt.md | Compiled copy of fix prompt asset. |
| .agents/skills/batch-bug-shepherd/assets/completion-prompt.md | Compiled copy of completion prompt asset. |
| .agents/skills/batch-bug-shepherd/assets/final-report-template.md | Compiled copy of final report/comment templates. |
| .agents/skills/batch-bug-shepherd/assets/ground-truth-table.md | Compiled copy of ground-truth table template. |
| .agents/skills/batch-bug-shepherd/assets/verdict-schema.json | Compiled copy of verdict JSON schema. |
Copilot's findings
- Files reviewed: 20/20 changed files
- Comments generated: 7
| _MAX_SUPPORTED_USER_VERSION: int = 15 | ||
| """Highest ``PRAGMA user_version`` we are tested against. | ||
|
|
||
| When the app ships a newer schema we refuse to write rather than risk |
| ``_MIN_SUPPORTED_USER_VERSION``. Versions above | ||
| ``_MAX_SUPPORTED_USER_VERSION`` only emit a warning. |
| if version > _MAX_SUPPORTED_USER_VERSION and version not in _warned_user_versions: | ||
| _warned_user_versions.add(version) | ||
| # Wording authored by the devx-ux-expert persona; ship verbatim. | ||
| from apm_cli.utils.console import _rich_warning | ||
|
|
||
| _rich_warning( | ||
| f"[!] Copilot App schema version {version} is newer than APM's " | ||
| f"tested maximum ({_MAX_SUPPORTED_USER_VERSION}). Proceeding " | ||
| f"anyway -- if you see unexpected behavior, run: apm --version " | ||
| f"&& gh issue new -R microsoft/apm -t 'Schema v{version} compat' " | ||
| f"-b 'Observed: <describe>'" | ||
| ) |
|
|
||
| | issue | verdict | pr | pr_in_flight | author | status | notes | | ||
| |-------|---------|----|--------------|--------|--------|-------| | ||
| | #___ | pending-triage | | | | pending-triage | seeded from <list-or-sweep>; awaiting wave 1 | |
| @@ -0,0 +1,133 @@ | |||
| { | |||
| "$schema": "http://json-schema.org/draft-07/schema#", | |||
| do not break APM's read/write surface; hard-failing made every | ||
| App release a release window for APM. See devx-ux verdict in | ||
| the schema-warn PR. |
| Use this skill to drive a batch of suspected bugs in microsoft/apm | ||
| from raw issue list to mergeable PR queue. Fan out one reproduction | ||
| subagent per candidate issue (verify on HEAD: LEGIT / UNCLEAR / | ||
| FIXED-AT-HEAD), cross-reference each legit issue against open PRs, | ||
| branch the workflow (in-flight community PR -> shepherd via the | ||
| apm-review-panel skill; no PR -> fix session with TDD and a | ||
| mutation-break gate), then dispatch one completion subagent per | ||
| shepherd verdict to resolve panel follow-ups, push to the contributor | ||
| fork (or open a superseding PR that preserves author authorship via | ||
| commit trailers) and post one ready-to-merge confirmation. Maintain a | ||
| single plan.md ground-truth table as the canonical session state. | ||
| Activate when the maintainer asks to triage a list of issues, sweep | ||
| the bug queue, drive in-flight PRs to merge, shepherd community | ||
| contributions, or work down the bug backlog -- even if "shepherd" or | ||
| "batch" is not named. | ||
| --- | ||
|
|
||
| # batch-bug-shepherd - Outer-loop bug-queue orchestrator | ||
|
|
||
| This skill is an A10 ORCHESTRATOR-SAGA over three fan-out waves | ||
| (triage, shepherd-or-fix, completion) with a persisted ground-truth | ||
| table between phases. It COMPOSES the | ||
| [apm-review-panel](../apm-review-panel/SKILL.md) skill -- it does NOT | ||
| re-implement panel review. Per-PR shepherding is delegated; per-issue | ||
| verification, PR-in-flight branching, fix dispatch, completion, and | ||
| the cross-session table are owned here. | ||
|
|
||
| The skill is ADVISORY at the panel layer and EXECUTIVE at the | ||
| orchestrator layer: it WILL push commits, open PRs, post comments, | ||
| close superseded PRs. Every consequential write goes through a | ||
| deterministic CLI (`gh`, `git`, `uv run ruff`) wrapped in plan + | ||
| execute + verify (A9 SUPERVISED EXECUTION). | ||
|
|
Captures all 15 local skills (including batch-bug-shepherd) and the copilot-app-db workflow entry into apm.lock.yaml so subsequent runs (apm install --frozen, drift-check) see the canonical post-install state. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
APM Review Panel:
|
| Persona | B | R | N | Takeaway |
|---|---|---|---|---|
| Python Architect | 0 | 1 | 2 | Phase A is clean and well-reasoned. One thread-safety gap in the TOCTOU dedup check; lazy import is justified; test-helper exposure is acceptable Python idiom. |
| CLI Logging Expert | 0 | 1 | 2 | No double-prefix or encoding faults. One recommended fix on the angle-bracket placeholder; two nits on style consistency and message length. |
| DevX UX Expert | 0 | 1 | 2 | Phase A warn-and-continue is well-designed; one actionable nit on warning prefix and one recommended fix for the feedback command UX. |
| Supply Chain Security Expert | 0 | 2 | 1 | Schema gate relaxation is contained (SQLite will hard-error on incompatible writes); prompt injection via verbatim issue-body in batch-bug-shepherd is the real risk. |
| OSS Growth Hacker | 1 | 3 | 1 | Phase A is a silent-churn fix worth a prominent release beat; Phase B ships APM's first composable-skill story but needs a contributor-trust guardrail callout in docs. |
| Doc Writer | 0 | 1 | 2 | SKILL.md style is consistent with apm-review-panel; CHANGELOG.md missing warn-not-fail entry for Phase A behavior change; no blocking doc issues. |
| Test Coverage Expert | 0 | 2 | 1 | Warn-not-fail behavior has solid unit coverage against real SQLite; two gaps: capsys fragility when Rich is installed, and no integration-tier test for the full apm-install CLI path. |
B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.
Top 5 follow-ups
- [OSS Growth Hacker] (blocking-severity) Add explicit auto-close-PR warning block to SKILL.md before merge -- A maintainer running
sweep-allfor the first time could silently close community contributor PRs. The fix is a doc-only addition to SKILL.md -- zero code cost, zero delay. Must land in this PR, not as a follow-up, because the skill is already installable. - [Doc Writer] (blocking-severity) Add CHANGELOG.md
[Unreleased]entry for warn-not-fail Phase A behavior change -- Two independent panelists (doc-writer, oss-growth-hacker) flag this gap. A behavior change from hard-fail to warn-and-continue requires a documented migration line per APM operating principles. Without it, users hitting schema mismatches after upgrade have no documented signal this is intentional. - [Supply Chain Security Expert] Wrap raw issue-body content in untrusted-data delimiters in
triage-prompt.md--ISSUE_BODY_OR_REPROis passed verbatim from GitHub issues into a triage subagent whose return JSON propagates into write-capable completion subagents. XML-style delimiters with an explicit untrusted-data instruction are the standard mitigation. Priority before any livesweep-allrun against a repo with external contributors. - [Test Coverage Expert] Fix capsys + Rich Console singleton fragility in TestVersionGuard tests -- Rich Console stores the
sys.stdoutreference at instantiation time; pytest's capsys swaps it per-test. Tests pass today only because rich is not installed in the test environment. Adding_reset_console()to_reset_user_version_warning_state()is a one-line fix that prevents a silent future trap. - [Test Coverage Expert] Add integration-with-fixtures test for
apm installagainst a version-bumped Copilot App DB -- Current integration tests mockcopilot_app_db-- they never exercise_check_user_versionthrough a real DB. If hard-fail is re-introduced in a future refactor, no integration test catches it. Evidence outcome: missing on a behavior-change surface.
Architecture
classDiagram
direction LR
class copilot_app_db {
<<IOBoundary>>
+_MIN_SUPPORTED_USER_VERSION int
+_MAX_SUPPORTED_USER_VERSION int
+_warned_user_versions set
+deploy_workflow(db_path, row) str
+delete_workflows(db_path, ids) int
+list_managed_workflow_ids(db_path) list
+resolve_copilot_app_db_path() Path
-_check_user_version(conn) int
-_reset_user_version_warning_state() None
-_begin_immediate_with_retry(conn) None
-_validate_row(row) None
}
class WorkflowRow {
<<ValueObject>>
+id str
+name str
+prompt str
+interval str
+schedule_hour int
+enabled int
+mode str
+model str
}
class CopilotAppDbError {
<<Exception>>
}
class CopilotAppDbMissingError {
<<Exception>>
}
class CopilotAppDbSchemaError {
<<Exception>>
}
class CopilotAppDbLockedError {
<<Exception>>
}
class console {
<<Utility>>
+_rich_warning(msg) None
}
CopilotAppDbError <|-- CopilotAppDbMissingError
CopilotAppDbError <|-- CopilotAppDbSchemaError
CopilotAppDbError <|-- CopilotAppDbLockedError
copilot_app_db *-- WorkflowRow : accepts
copilot_app_db ..> CopilotAppDbSchemaError : raises
copilot_app_db ..> CopilotAppDbMissingError : raises
copilot_app_db ..> CopilotAppDbLockedError : raises
copilot_app_db ..> console : lazy import in warn path
class copilot_app_db:::touched
class WorkflowRow:::touched
classDef touched fill:#fff3b0,stroke:#d47600
flowchart TD
A([apm install / apm uninstall]) --> B[deploy_workflow / delete_workflows]
B --> C{db_path exists?}
C -- no --> D[raise CopilotAppDbMissingError]
C -- yes --> E[_validate_row]
E --> F[_open_connection]
F --> G[_check_user_version via PRAGMA user_version]
G --> H{version less than MIN=13?}
H -- yes --> I[raise CopilotAppDbSchemaError]
H -- no --> J{version greater than MAX=15?}
J -- yes, not in _warned_user_versions --> K[[_rich_warning to stderr, dedup via module-level set]]
K --> L[_begin_immediate_with_retry]
J -- no or already warned --> L
L --> M{acquire write lock?}
M -- locked --> N[raise CopilotAppDbLockedError]
M -- acquired --> O{row exists?}
O -- INSERT --> P[INSERT INTO workflows, enabled forced to 0]
O -- UPDATE --> Q{execution-affecting fields changed?}
Q -- yes --> R[UPDATE + reset enabled=0]
Q -- no --> S[UPDATE name/model only]
P --> T[COMMIT]
R --> T
S --> T
T --> U[return copilot-app-db URI]
Recommendation
Ship after two pre-merge doc fixes land in this PR: (1) add the auto-close-PR warning block to SKILL.md, and (2) add the CHANGELOG [Unreleased] entry for the warn-not-fail behavior change. Both are documentation-only, zero-risk changes that can be committed directly to the branch. Phase A is production-ready: the core user promise is test-verified against real SQLite, the trade-offs are explicitly documented in the PR body, and the UX warning message issues (angle-bracket placeholder, [!] duplication, && chain) are three-panelist-convergent nits the author should address in the same commit as the CHANGELOG fix. Phase B ships correctly consent-gated (enabled=0) and the cross-skill composition pattern is sound. The prompt injection and capsys fragility items are genuine follow-ups but do not block merge given the skill's current pre-production status and the narrow consent gate.
Full per-persona findings
Python Architect
-
[recommended] TOCTOU race on
_warned_user_versionscheck-then-add is not atomic atsrc/apm_cli/integration/copilot_app_db.py:346
version not in _warned_user_versionsand_warned_user_versions.add(version)are two separate operations. In any concurrent caller scenario, two threads can both pass thenot intest before either completes theadd. The CLI is currently single-threaded so this does not bite today, but the module-level mutable set without a lock is a latent trap as parallel installs are on the roadmap.
Suggested: Protect the check-and-add with athreading.Lock. -
[nit] Late import of
_rich_warninginside_check_user_versionhas no documented justification atsrc/apm_cli/integration/copilot_app_db.py:349
Suggested: Add# deferred to avoid importing console at module initabove the import, or hoist it to the top of the file. -
[nit]
_reset_user_version_warning_stateexposes module internals but is an accepted Python test-helper idiom atsrc/apm_cli/integration/copilot_app_db.py:316
No action required unless a project-wide@testing_onlymarker is introduced.
CLI Logging Expert
-
[recommended]
-b 'Observed: <describe>'leaves a literal angle-bracket placeholder the user must know to replace atsrc/apm_cli/integration/copilot_app_db.py
Users who copy-paste the suggestedgh issue newcommand verbatim will file an issue with body textObserved: <describe>-- an unhelpful stub.
Suggested: Replace with--websogh issue newopens the browser form, or change placeholder to-b 'Observed: TODO_DESCRIBE_BEHAVIOR'. -
[nit]
[!]is embedded in the f-string rather than passed assymbol="warning"to_rich_warningatsrc/apm_cli/integration/copilot_app_db.py
No double-prefix risk (markup=Falseis set); minor convention drift from callers that usesymbol="warning". -
[nit] Warning message is a single long line (~220 chars) that will wrap unpredictably on narrow terminals at
src/apm_cli/integration/copilot_app_db.py
Suggested: Split into two_rich_warningcalls or use\nto indent the recovery command on a second line.
DevX UX Expert
-
[recommended] Warning message embeds a raw multi-step shell pipeline that users must copy-paste-split manually at
src/apm_cli/integration/copilot_app_db.py
Printing a compound&&chain in a single-line warning breaks on Windows cmd.exe and obscures the two distinct actions. Package-manager convention emits the simplest self-contained action.
Suggested: Drop theapm --version &&prefix or split into two lines. -
[nit] Warning prefix
[!]duplicates what_rich_warningalready signals visually atsrc/apm_cli/integration/copilot_app_db.py
Suggested: Remove[!]from the start of the f-string. -
[nit]
batch-bug-shepherdprompt usesmode: interactivefor a long-running batch orchestrator at.apm/prompts/batch-bug-shepherd.prompt.md
Suggested: Document in the prompt that the session is long-running and the user should expect async subagent waves.
Supply Chain Security Expert
-
[recommended] Verbatim issue-body passed to write-capable subagents enables indirect prompt injection at
.apm/skills/batch-bug-shepherd/assets/triage-prompt.md
ISSUE_BODY_OR_REPROis passed verbatim from GitHub issues into the agent context. A malicious issue reporter can craft an issue body with embedded instructions that manipulate the triage return JSON and propagate into the completion wave, which executes git/gh commands.
Suggested: Wrap raw issue content in XML-style<issue-body>tags with an explicit instruction that the model must treat everything inside as untrusted user data. -
[recommended] warn-and-continue on unknown schema versions removes the last write-safety gate for non-additive App schema changes at
src/apm_cli/integration/copilot_app_db.py
A schema change that adds a column with a default would allow INSERT to succeed with semantically incorrect data.
Suggested: Supplement with a post-write schema assertion: after any INSERT or UPDATE, read back the row and assert columns APM cares about match what was written. -
[nit]
_warned_user_versionsis a process-global mutable set with no thread safety atsrc/apm_cli/integration/copilot_app_db.py
Not a security issue today; harmless duplicate warnings in a concurrent scenario.
OSS Growth Hacker
-
[blocking] Skill can auto-close community contributor PRs -- this must be surfaced in the skill description before any maintainer runs
sweep-allat.apm/skills/batch-bug-shepherd/SKILL.md
SKILL.md documents 'Close the original PR with a courteous handoff comment referencing the superseding PR.' A contributor who sees their PR auto-closed by an agent will read this as rejection. A maintainer runningsweep-allfor the first time could close ten community PRs before realising the consequence.
Suggested: Add a visible warning block near the top of SKILL.md: 'This skill WILL close community contributor PRs when it opens a superseding PR. Usetargets: <specific-issue-list>rather thansweep-allfor your first run.' -
[recommended] Phase A fix is a first-run conversion win that needs a prominent CHANGELOG story, not a buried bug entry at
CHANGELOG.md
Suggested: Lead with: 'Copilot App users no longer see install failures after a Copilot App update -- APM now warns and continues on forward-compatible schema bumps.' -
[recommended] First cross-skill composition is APM's strongest OSS primitive story to date -- needs a launch beat at
.apm/skills/batch-bug-shepherd/SKILL.md
Suggested: Add a Composition section to the docs site and pair with a release post angle. -
[recommended]
batch-bug-shepherdships 1726 lines with no docs-site entry -- feature will be invisible atdocs/src/content/docs/
Suggested: Add at minimum a one-paragraph callout inwhat-is-apm.mdor a new Maintainer Skills page. -
[nit] Warning message
gh issue new -b 'Observed: 'arrives blank -- filed issues have no actionable data atsrc/apm_cli/integration/copilot_app_db.py
Auth Expert -- inactive
No auth-surface files changed; copilot_app_db.py schema gate is SQLite version checking, not credential or token management.
Doc Writer
-
[recommended] CHANGELOG.md has no entry for the warn-not-fail copilot-app schema behavior change at
CHANGELOG.md
Users upgrading and hitting schema mismatches will have no documented signal that this is intentional behavior, not a regression.
Suggested: Add under[Unreleased] ### Fixed: 'apm install --target copilot-appno longer fails hard on unrecognized Copilot App DB schema versions; it now warns and continues. (fix(copilot-app): warn-not-fail on newer App schema + feat: batch-bug-shepherd primitive #1434)' -
[nit] SKILL.md lacks an explicit invocation trigger section heading at
.apm/skills/batch-bug-shepherd/SKILL.md
Suggested: Extract theActivate when...clause from the description into a## When to activatesection. -
[nit] No cross-link from SKILL.md to the Copilot App prompt file at
.apm/skills/batch-bug-shepherd/SKILL.md
Suggested: Add to## Bundled assets: '../../prompts/batch-bug-shepherd.prompt.md-- Copilot App workflow entry point for this skill.'
Test Coverage Expert
-
[recommended] capsys + Rich Console singleton is fragile: stale stdout reference breaks capture when Rich is installed at
tests/unit/integration/test_copilot_app_db.py
_rich_warning()routes through a module-level Console singleton. When rich is installed, Console stores thesys.stdoutreference at instantiation time; pytest's capsys swaps it per-test. Tests pass today only because rich is not installed in the test environment.
Suggested: In_reset_user_version_warning_state(), add:from apm_cli.utils.console import _reset_console; _reset_console().
Proof (unknown at unit):tests/unit/integration/test_copilot_app_db.py::TestVersionGuard::test_warns_but_continues_above_max-- proves: APM warns the user when the Copilot App DB schema is newer than tested, instead of failing the install -
[recommended] No integration-with-fixtures test exercises
apm installCLI against a version-bumped Copilot App DB attests/integration/test_install_services_orchestration.py
Current integration tests mockcopilot_app_db--_check_user_versionis never exercised through a real DB. Grep confirms zerouser_versionmatches intests/integration/.
Suggested: Addtest_copilot_app_install_warns_on_bumped_schemathat creates a real SQLite DB atuser_version=16, setsAPM_COPILOT_APP_DB, invokesdeploy_workflow, and asserts noCopilotAppDbSchemaErroris raised.
Proof (missing at integration-with-fixtures):tests/integration/test_install_services_orchestration.py::test_copilot_app_install_warns_on_bumped_schema-- proves: A user with a Copilot App DB that was auto-upgraded can still run apm install without failure -
[nit] Warning state reset relies on manual per-test call; an autouse fixture would prevent future test pollution at
tests/unit/integration/test_copilot_app_db.py
Suggested: Add@pytest.fixture(autouse=True)toTestVersionGuardthat calls_reset_user_version_warning_state()before and after each test.
This panel is advisory. It does not block merge. Re-apply the
panel-review label after addressing feedback to re-run.
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
pypi.org
To allow these domains, add them to the
network.allowedlist in your workflow frontmatter:
network:
allowed:
- defaults
- "pypi.org"See Network Configuration for more information.
Generated by PR Review Panel for issue #1434 · ● 3.3M · ◷
Move the apm-review-panel skill out of the microsoft/apm monorepo's shared .apm/skills/ tree into a dedicated, publishable APM package under packages/apm-review-panel/ using the HYBRID layout (apm.yml + SKILL.md at the root, with co-located assets/ and evals/). The root apm.yml now declares the package via a local-path manifest dep, so 'apm install' at root continues to deploy the skill into .agents/skills/apm-review-panel/ -- but the dependency is now explicit and inspectable via 'apm deps list' instead of relying on the includes-auto walk of the monorepo .apm/ tree. Rule-of-three justification (see genesis-plan-v2.md step 3.5): panel is independently useful for any single-PR review (not just batch shepherding), so extraction unlocks consumer reuse and prevents the consumer from being forced to install the shepherd to get the panel. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…package
Move batch-bug-shepherd from the monorepo .apm/skills/ and .apm/prompts/
trees into a dedicated, publishable APM package under
packages/batch-bug-shepherd/ using the multi-primitive .apm/ layout
so the package can ship BOTH the skill and the workflow prompt:
packages/batch-bug-shepherd/
apm.yml
.apm/
skills/batch-bug-shepherd/
SKILL.md
assets/...
prompts/
batch-bug-shepherd.prompt.md
The package declares its dep on apm-review-panel via a local-path
manifest entry (../apm-review-panel, anchored to the package dir per
APM's local-path rule). The shepherd-prompt asset still probes the
panel by NAME at use-site as the A9 SUPERVISED EXECUTION backstop
if the harness registry is bypassed -- belt + suspenders, see
genesis-plan-v2.md step 3.5 'Declaration mechanism per external
module'.
Root apm.yml now depends on packages/batch-bug-shepherd via local
path; 'apm install' continues to deploy the skill to
.agents/skills/batch-bug-shepherd/ and the workflow row to the App
SQLite DB (with enabled=0 per the consent gate) -- but the path is
now manifest-declared and the package is independently shippable:
apm install microsoft/apm/packages/batch-bug-shepherd
works in any consumer repo and transitively pulls apm-review-panel.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes HIGH finding #1 from genesis-critique.md: the workflow prompt's 'Hard rules' block restated four disciplines (ASCII-only, lint contract, mutation-break gate, single-writer per comment) that already live in the SKILL.md body. Two sources of truth invited drift -- if the skill body added a new gate, the prompt would not auto-update. Collapse the section to a single 'Delegation' paragraph naming the skill as the authoritative source for all disciplines. The prompt now carries only the per-trigger-surface contract (workflow frontmatter + the input parameter + the 6-step ACTIVATE / SCOPE / PLAN / INITIALIZE / EXECUTE / RENDER procedure that summons the skill); every discipline travels with the skill body. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes HIGH finding #2 from genesis-critique.md: the v1 ship deferred the evals gate. This commit authors the minimum set the gate requires, mirroring the pr-description-skill/evals/ shape so a single CI lane can score both: evals/ evals.json - manifest + gates (val split = ship gate) triggers.json - 10 fire + 10 no_fire, 60/40 train/val content/ - 2 scenario manifests + rubrics three-issues-mixed.json sweep-bug-queue.json fixtures/ - 4 markdown fixtures (with/without skill x 2) README.md .gitignore - results/ (timestamped runner output) scripts/ run_evals.py - stdlib-only, deterministic matcher The no-fire trigger set deliberately includes queries that SHOULD route to apm-review-panel ('review my PR', 'panel-review this PR') so DISPATCH COLLISION between the two skills would surface as val-no-fire-rate dropping below 0.5. Val split result on first run: trigger fire rate 1.0, no-fire rate 1.0, content delta_anchors 8 and 7 (gates require >= 1). Overall: passed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes HIGH finding #3 from genesis-critique.md: the original description was 950/1024 chars, too close to the runtime cap to absorb future trigger additions, and it under-named the indirect maintainer phrasings the skill actually wants to catch. Trim ~75 chars while ADDING two indirect triggers explicitly exercised in evals/triggers.json (fire-07 'shepherd all bug-flagged issues this quarter', fire-08 'weekly sweep of community-reported issues'): - 'reproduction subagent per candidate issue' -> 'triage subagent per issue' (the procedural detail belongs in the body, not the dispatch surface) - collapsed the parenthetical workflow branch into a single -> arrow - added 'shepherd all bug-flagged issues this quarter', 'run a weekly sweep of community-reported issues', 'work down community bug contributions' to the Activate list - kept the imperative shape, intent-first ordering, and the '-- even if shepherd or batch is not named' tail Verified: val split evals still report fire 1.0 / no-fire 1.0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…age extraction
Mechanical `apm install --target copilot,copilot-app --force` output
after the package extraction. The previous lockfile listed the two
skills under `local_deployed_files` (root .apm/ tree); they now
appear as proper `dependencies` entries because they ship as
self-contained packages under `packages/`. Also re-materializes
`.agents/skills/{apm-review-panel,batch-bug-shepherd}/` with the
trimmed dispatch description and the moved workflow asset, since
those compiled trees are tracked in this repo.
No source changes -- just the compiled-output snapshot.
Verified:
- ruff check + ruff format --check: silent
- apm install --target copilot,copilot-app --force: clean
- sqlite3 ~/.copilot/data.db -> workflow enabled=0 (consent gate ok)
- standalone apm install from a scratch dir resolves
apm-review-panel transitively via the batch-bug-shepherd
manifest dep (depth: 2, resolved_by: _local/batch-bug-shepherd)
- val split evals: trigger fire 1.0, no-fire 1.0, content passed
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Genesis-driven repackaging:
|
| Commit | Layer | Intent |
|---|---|---|
fd55a172 |
packages | Extract apm-review-panel -> packages/apm-review-panel/ (HYBRID layout) |
c3af73d3 |
packages | Repackage batch-bug-shepherd -> packages/batch-bug-shepherd/ (.apm/ layout, declares ../apm-review-panel as manifest dep) |
a385560f |
skills | Collapse HIDDEN COUPLING in batch-bug-shepherd.prompt.md (Hard rules block -> single Delegation paragraph that names the skill as authoritative) |
89ecd34c |
skills | Add content + trigger evals (10 fire + 10 no_fire, 60/40 train/val; 2 content scenarios; stdlib runner) |
19b55fa1 |
skills | Tighten dispatch description (~950 -> ~875 chars) and add 2 indirect triggers (shepherd all bug-flagged issues this quarter, weekly sweep of community-reported issues) |
62d63bb8 |
install | Regenerate .agents/skills/ + apm.lock.yaml after extraction |
Package topology
flowchart TD
subgraph ROOT["microsoft/apm (root apm.yml)"]
ROOTMANIFEST["dependencies.apm:<br/>./packages/apm-review-panel<br/>./packages/batch-bug-shepherd"]
end
subgraph PKG1["packages/apm-review-panel (HYBRID)"]
PANELYAML[apm.yml]
PANELSKILL[SKILL.md]
PANELASSETS[assets/*]
end
subgraph PKG2["packages/batch-bug-shepherd (.apm/ layout)"]
SHEPYAML["apm.yml<br/>dependencies.apm:<br/>../apm-review-panel"]
SHEPSKILL[.apm/skills/batch-bug-shepherd/]
SHEPPROMPT[.apm/prompts/batch-bug-shepherd.prompt.md]
end
ROOT --> PKG1
ROOT --> PKG2
PKG2 -. local manifest dep .-> PKG1
Dependency resolution
flowchart LR
CONSUMER[Any consumer apm.yml<br/>dependencies.apm:<br/>'/abs/path/packages/batch-bug-shepherd']
SHEP[packages/batch-bug-shepherd]
PANEL[packages/apm-review-panel]
CONSUMER -- MANIFEST DEP --> SHEP
SHEP -- LOCAL SIBLING<br/>'../apm-review-panel' --> PANEL
Local-path deps anchor relative to the declaring package's directory (npm/pip/cargo parity), so ../apm-review-panel inside packages/batch-bug-shepherd/apm.yml resolves correctly whether the consumer installs from the monorepo root or pins the absolute path to packages/batch-bug-shepherd/.
Verification evidence
Lint (CI-mirror, must be silent):
$ uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/
All checks passed!
1037 files already formatted
apm install --target copilot,copilot-app --force in this repo:
[+] ./packages/apm-review-panel (local)
|-- Skill integrated -> .agents/skills/
[+] ./packages/batch-bug-shepherd (local)
|-- 1 prompts integrated -> copilot-app/workflows/
|-- workflows arrive disabled; enable from the Copilot App's Workflows tab
|-- 1 skill(s) integrated -> .agents/skills/
[+] ../apm-review-panel (local)
|-- (files unchanged) <-- transitive dep deduplicated
[*] Installed 4 APM dependencies in 2.9s.
SQLite consent gate (workflows must arrive disabled):
$ sqlite3 ~/.copilot/data.db "SELECT id, enabled FROM workflows WHERE id LIKE '%batch-bug%'"
apm--local--_local--batch-bug-shepherd|0
apm--_local--batch-bug-shepherd--batch-bug-shepherd|0
Standalone install from a scratch dir (proves the package works as a third-party dep):
# scratch/apm.yml
dependencies:
apm:
- /abs/path/to/packages/batch-bug-shepherd$ apm install --target copilot
[+] /.../packages/batch-bug-shepherd (local)
|-- 1 skill(s) integrated -> .agents/skills/
[+] ../apm-review-panel (local)
|-- Skill integrated -> .agents/skills/
[*] Installed 2 APM dependencies in 0.8s.
Generated lockfile shows the transitive resolution with provenance:
- repo_url: _local/apm-review-panel
depth: 2
resolved_by: _local/batch-bug-shepherd
package_type: hybrid
local_path: ../apm-review-panelEval gates (val split = ship gate):
$ python packages/batch-bug-shepherd/.apm/skills/batch-bug-shepherd/scripts/run_evals.py --split val
triggers: fire 1.0 / no-fire 1.0 passed
content: delta_anchors 8 and 7 passed
overall: passed
The no-fire set deliberately includes queries that should route to apm-review-panel instead (review my PR, panel-review this PR) so DISPATCH COLLISION between the two skills would surface as the val-no-fire rate dropping below 0.5.
How HIGH findings were closed
- HIDDEN COUPLING (
fd55a172+a385560f): the prompt no longer duplicates skill rules. It hands off via a Delegation paragraph that namesbatch-bug-shepherdas authoritative. The skill is now activate-by-name, not path-coupled. - MISSING EVALS GATE (
89ecd34c): full trigger + content harness committed, runner is stdlib-only, gates documented inevals/README.md. - DISPATCH-DESCRIPTION TIGHTNESS (
19b55fa1): trimmed ~75 chars while adding 2 user-named indirect triggers; evals confirm both new triggers fire.
Modularity gain
Consumers can now write:
dependencies:
apm:
- github.com/microsoft/apm/packages/apm-review-panel # panel only, for any single-PR review use case
- github.com/microsoft/apm/packages/batch-bug-shepherd # full shepherd, panel pulled in transitivelyThe rule-of-three holds: apm-review-panel is useful pre-shepherd, with-shepherd, and for future single-PR review workflows. Extraction unlocks that without dragging the shepherd in.
Design plan, mermaid diagrams, dependency graph, phantom-dependency guard, and the full evals matrix are persisted in genesis-plan-v2.md in the agent's workspace.
A sub-package's local-path dep appears in the root lockfile with resolved_by set. The root manifest cannot make it go away by editing its dependencies.apm list, so flagging it as an orphan creates an unfixable CI failure. Surfaced by PR #1434, where batch-bug-shepherd declares ../apm-review-panel as a manifest dep. Both that transitive entry and the depth=1 root entry land in apm.lock.yaml; the orphan check was flagging the depth=2 one. Restrict orphan detection to direct deps (resolved_by is None). Add a regression test that covers the exact shape. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous commit fixes _check_no_orphans on the consumer side, but the audit gate in microsoft/apm-action@v1 pins APM 0.14.0, which does not yet have the fix. Until a release including the fix ships, the '../apm-review-panel' transitive local-path entry would keep the gate failing. Drop the manifest dep from packages/batch-bug-shepherd/apm.yml; the runtime activate-by-name probe in assets/shepherd-prompt.md remains the working backstop, and the root manifest declares both packages directly. Inline comment documents the rationale and the restoration condition. Lockfile regenerated by clean install (apm_modules wiped first); all 8 audit checks pass locally. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-app-schema-warn-and-meta-prompt
Test was added on main after this PR forked. It assumed the old hard-fail behavior; our commit 05ea778 changed schema-too-new to warn-and-continue. Update assertion to match. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…-app-schema-warn-and-meta-prompt
TL;DR
Two related changes in one PR:
apm install --target copilot-appwhen the GitHub Copilot desktop App ships a newerPRAGMA user_versionthan APM was tested against. Warn-and-continue instead; keep the hard-fail for versions below the minimum (where theworkflowstable may not exist)..apm/skills/batch-bug-shepherd/and the.apm/prompts/batch-bug-shepherd.prompt.mdCopilot App workflow that loads it. The skill composes with the existing.apm/skills/apm-review-panel/to drive a batch of suspected bugs from raw issue list to mergeable PR queue.Phase A — schema gate UX fix
Problem
_check_user_version()insrc/apm_cli/integration/copilot_app_db.pyhard-fails any deploy whoseuser_versionfalls outside[_MIN_SUPPORTED_USER_VERSION, _MAX_SUPPORTED_USER_VERSION]. Both bounds were 13. The Copilot App is now atuser_version = 15on macOS, so every user attemptingapm install --target copilot-apphits:The Copilot App ships fast and bumps
user_versionon additive, forward-compatible changes that do not break APM's narrow read/write surface (theworkflowstable). Hard-failing on every such bump made every Copilot App release a release window for APM.devx-ux verdict
Consulted the
devx-ux-expertpersona before touching code. Verdict:ship_as_proposed.Wording (shipped verbatim, printable ASCII,
[!]perSTATUS_SYMBOLS):On the question of a
+Ncap before degrading to error, devx-ux argued against:Implementation
_MAX_SUPPORTED_USER_VERSIONbumped13 -> 15(observed live, working)._MIN_SUPPORTED_USER_VERSIONstays at13-- below this theworkflowstable may not exist, so a hard fail is correct._rich_warning. Dedup is enforced by a module-level_warned_user_versions: set[int]so multi-row installs do not spam the same line N times._reset_user_version_warning_state()is a test helper.Trade-offs (explicit)
OperationalErrormid-transaction with a far worse diagnostic. Above max, APM's read/write surface (a single CRUD onworkflows) is small enough that additive schema changes are overwhelmingly safe; on the rare truly-breaking bump we will see breakage reports via the URL in the warning and add the specific version to a_KNOWN_BREAKINGlist.+Ncap? See devx-ux quote above. A blanket cap recreates the original problem at a higher number; a_KNOWN_BREAKINGallow-list targets the actual risk._check_user_versionis called from low-level DB writers (deploy_workflow,delete_workflows) that have noDiagnosticCollectorin scope. Threading one through would touch the public signature of every DB writer; a module-level set is the minimal surgical change.Phase B — batch-bug-shepherd primitive + Copilot App workflow
What ships
.apm/skills/batch-bug-shepherd/SKILL.mdplus seven asset files (triage-prompt.md,shepherd-prompt.md,fix-prompt.md,completion-prompt.md,final-report-template.md,ground-truth-table.md,verdict-schema.json)..apm/prompts/batch-bug-shepherd.prompt.md-- Copilot App workflow prompt (interval: manual,mode: interactive) that loads the skill and accepts atargetsinput (issue list or the literalsweep-all)..agents/skills/batch-bug-shepherd/-- compiled output produced byapm install --target copilot,copilot-app.apm.lock.yamlregenerated to include both the skill path and the syntheticcopilot-app-db://workflows/apm--local--_local--batch-bug-shepherdURI.Composition with apm-review-panel
The shepherd phase delegates panel review to
.apm/skills/apm-review-panel/via a sibling relative link rather than reinventing it. This is the first cross-skill composition in this repo -- a deliberate test of whether the "skills compose via filesystem siblings" pattern holds up under non-trivial use.Design rationale
The genesis design pass produced a
genesis-plan.mdartifact (8-step disciplined design with mermaid + interface sketch). That artifact is intentionally not shipped in this PR -- it is design scaffolding for the author, not a user-consumed primitive. It lives in the session workspace at/Users/danielmeppiel/.copilot/workspaces/007bd6d3-a723-4ebb-9ed5-e861e73b1b20/artifacts/genesis-plan.mdif reviewers want to read it.Validation evidence
.apm/instructions/linting.instructions.md):test_warns_but_continues_above_max[16|17|50]-- forward-compat scenarios, parametrized.test_above_max_warning_deduped_per_version-- multi-row install warns once, not per row.test_rejects_user_version_below_min-- < 13 still hard-fails.PRAGMA user_version = 17, then re-running withAPM_COPILOT_APP_DB-- the[!]line appeared exactly once, full text matching the devx-ux wording.)enabled = 0):How to test
uv venv && uv pip install -e ".[dev]".uv run --extra dev ruff check src/ tests/ && uv run --extra dev ruff format --check src/ tests/-- both silent.uv run --extra dev pytest tests/unit/integration/test_copilot_app_db.py -v-- 14 tests pass, including the newTestVersionGuard::test_warns_but_continues_above_maxcases.[!]line namingversion 17andtested maximum (15)plus install completing successfully.enabled = 0(consent gate).