fix(audit): pre-create target dirs in drift replay scratch (#1411) by sergio-sisternes-epam · Pull Request #1441 · microsoft/apm

sergio-sisternes-epam · 2026-05-21T20:22:34Z

Description

Fix false orphaned drift reports for non-skill primitives on targets with auto_create=False.

During drift replay, the scratch directory is empty, causing integrators with auto_create=False (windsurf, cursor, claude, codex, copilot, gemini, opencode) to skip all non-skill primitives. The diff then reports these correctly-installed files as orphaned.

The fix pre-creates all target root directories in the scratch dir before the replay loop, so the auto_create guard passes.

Fixes #1411

Type of change

Bug fix
New feature
Documentation
Maintenance / refactor

Testing

Tested locally
All existing tests pass (72 drift tests)
Added tests for new functionality

Test added

tests/unit/install/test_drift_auto_create.py — regression test confirming:

Without pre-created target root: auto_create=False skips integration (the bug)
With pre-created target root: integration proceeds normally (the fix)

Copilot

Pull request overview

Fixes false orphaned drift findings during apm audit by making the drift replay scratch tree more closely mirror a real install for targets where auto_create=False (e.g., windsurf), ensuring non-skill primitives are replayed instead of silently skipped.

Changes:

Pre-create each resolved target’s root_dir under the drift replay scratch directory before running integrators.
Add a unit regression test illustrating the auto_create=False guard behavior when the target root directory is missing vs present.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`src/apm_cli/install/drift.py`	Seeds scratch with target root directories so replay doesn’t skip non-skill primitives for `auto_create=False` targets.
`tests/unit/install/test_drift_auto_create.py`	Adds a regression test demonstrating integration skip vs success depending on whether the target root directory exists.

danielmeppiel · 2026-05-21T22:05:18Z

APM Review Panel: `ship_with_followups`

Fixes false-orphan drift reports on every auto_create=False AI-coding IDE target, restoring audit truthfulness across the majority IDE matrix.

cc @sergio-sisternes-epam @danielmeppiel -- a fresh advisory pass is ready for your review.

All six active panelists converge: the fix is architecturally correct, security-neutral, and restores a critical user-facing promise -- that apm audit tells the truth about repo cleanliness. The 8-line change seeds scratch-root target directories at exactly the right lifecycle point (after target resolution, before the replay loop), mirroring the real-world invariant that these directories exist on disk when the IDE integration ran. The alternative -- pushing scratch awareness into each integrator -- would have leaked replay concerns into 5+ classes for no user benefit.

The strongest post-merge signal comes from test-coverage-expert: the new unit test is a durable regression trap (the empty-lockfile trick ensures the directory MUST be created by the explicit loop, not as a side-effect). However, the existing integration test test_d2_multi_target_drift_reports_secondary_target_files uses a disjunction (in {'modified','orphaned'}) that accepts the exact false-orphan symptom this PR fixes. That means removing the fix would not break integration CI. This is a real gap on a governed-by-policy surface -- the user promise has no integration-tier assertion that would catch a regression at the symptom level. This follow-up is the single highest-priority item post-merge.

No panelist disagreement exists. The fix is correct, the unit trap is tight, and the follow-ups are all improvement-tier polish or future-proofing.

Aligned with: Governed by policy (audit truthfulness is a governance promise; this fix restores it for the full auto_create=False target set); Multi-harness/multi-host (covers windsurf, cursor, claude, codex, copilot, gemini, opencode -- the majority AI-coding IDE matrix); OSS community-driven (community contributor fix for a community-reported bug; fast triage-to-merge signals contributor respect).

Growth signal. This is a trust-recovery moment worth amplifying: apm audit was lying about repo cleanliness on every popular AI-coding IDE except VSCode. The fix -- from a community contributor -- restores truthfulness across Cursor, Windsurf, Claude Code, Codex, Copilot, Gemini, and OpenCode. Release notes should frame this as "audit now tells the truth on every AI-coding IDE" -- a strong hook for evaluators comparing APM against manual dotfile management.

Panel summary

Persona	R	N	Takeaway
Python Architect	0	2	Architecturally sound: pre-create at the right lifecycle point. Test sequencing could be tighter; duplicated auto_create guard across 5 integrators is a future extraction target.
CLI Logging Expert	0	1	Render functions are clean; pre-create loop is silent even under --verbose -- one verbose-only seed log would aid observability at zero default noise.
DevX UX Expert	0	1	Fix protects audit-truthfulness across the matrix; tests only exercise windsurf -- parametrization would lock the cross-IDE contract explicitly.
Supply Chain Security Expert	0	1	KNOWN_TARGETS are safe static paths and scratch_root is bounded -- no current threat. A path-traversal guard would future-proof against dynamic root_dir overrides.
OSS Growth Hacker	0	2	High-value trust-recovery fix from a community contributor on the AI-coding-IDE majority -- worth a thank-you and a release-note beat.
Test Coverage Expert	1	1	Unit regression trap is durable (verified by pytest run); integration test_d2 accepts both 'modified' and 'orphaned' so a re-regression to false-orphan would slip past CI.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

[Test Coverage Expert] Tighten test_d2 disjunction to assert == 'modified' (or add test_d4_auto_create_false_target_tamper_reports_modified_not_orphaned) -- The integration test currently accepts both 'modified' and 'orphaned', meaning a regression to the exact [BUG] apm audit reports false orphaned drift for non-skill primitives on targets with auto_create=False #1411 symptom would pass CI silently. This is a governed-by-policy surface with no integration-tier guardrail on the fixed behavior.
[DevX UX Expert] Parametrize unit test across all 7-8 auto_create=False targets -- Current test exercises only windsurf. A parametrized matrix locks in the cross-IDE contract and catches future targets with divergent root_dir layouts.
[Python Architect] Add a side_effect assertion in test_run_replay_precreates_scratch_target_dirs verifying dirs exist on first integrate_package_primitives call -- Current test would still pass if the pre-create were accidentally moved inside the per-dep loop; a sequencing assertion makes the trap tighter.
[OSS Growth Hacker] CHANGELOG entry should name affected IDE targets explicitly for ctrl-F discoverability -- Users searching for their specific IDE (cursor, windsurf, claude) will find the fix without reading the full entry.
[Python Architect] Extract duplicated auto_create guard from 5 integrators into BaseIntegrator -- Reduces future drift risk as new integrators are added; not blocking because the duplication is stable today.

Architecture

classDiagram
    direction LR
    class TargetProfile {
      <<ValueObject>>
      +name str
      +root_dir str
      +auto_create bool
      +primitives dict
    }
    class BaseIntegrator {
      <<Base>>
      +integrate(target, project_root, ...) IntegrationResult
    }
    class InstructionIntegrator {
      +integrate(target, project_root, ...) IntegrationResult
    }
    class PromptIntegrator {
      +integrate(target, project_root, ...) IntegrationResult
    }
    class AgentIntegrator {
      +integrate(target, project_root, ...) IntegrationResult
    }
    class CommandIntegrator {
      +integrate(target, project_root, ...) IntegrationResult
    }
    class SkillIntegrator {
      +integrate(target, project_root, ...) IntegrationResult
    }
    class run_replay {
      <<IOBoundary>>
      +run_replay(config) Path
    }
    class integrate_package_primitives {
      <<Facade>>
      +integrate_package_primitives(...) dict
    }
    class IntegrationResult {
      <<ValueObject>>
      +files_integrated int
      +files_skipped int
    }
    BaseIntegrator <|-- InstructionIntegrator
    BaseIntegrator <|-- PromptIntegrator
    BaseIntegrator <|-- AgentIntegrator
    BaseIntegrator <|-- CommandIntegrator
    BaseIntegrator <|-- SkillIntegrator
    run_replay ..> integrate_package_primitives : delegates per dep
    integrate_package_primitives ..> InstructionIntegrator : dispatches
    integrate_package_primitives ..> PromptIntegrator : dispatches
    integrate_package_primitives ..> AgentIntegrator : dispatches
    integrate_package_primitives ..> CommandIntegrator : dispatches
    integrate_package_primitives ..> SkillIntegrator : dispatches
    InstructionIntegrator ..> TargetProfile : reads auto_create
    PromptIntegrator ..> TargetProfile : reads auto_create
    AgentIntegrator ..> TargetProfile : reads auto_create
    CommandIntegrator ..> TargetProfile : reads auto_create
    SkillIntegrator ..> TargetProfile : reads auto_create
    InstructionIntegrator ..> IntegrationResult : returns
    class run_replay:::touched
    classDef touched fill:#fff3b0,stroke:#d47600

flowchart TD
    A["apm audit --check drift"] --> B["run_replay(config)"]
    B --> C["_make_scratch_root(project_root)"]
    C --> D["resolve_targets(project_root)"]
    D --> E["_make_integrators()"]
    E --> F["PRE-CREATE LOOP<br/>for target in targets:<br/>  (scratch_root / target.root_dir).mkdir()"]
    F --> G["_ReadOnlyProjectGuard context"]
    G --> H{"for lock_dep in lock.get_all_dependencies()"}
    H --> I["_materialize_install_path"]
    I --> J["integrate_package_primitives<br/>project_root=scratch_root"]
    J --> K{"auto_create check:<br/>(project_root / target.root_dir).is_dir()?"}
    K -- "True (dir pre-created)" --> L["deploy files into scratch"]
    K -- "False (no pre-create)" --> M["return IntegrationResult(0,0,0,[])<br/>FALSE ORPHANED"]
    L --> N["next dep"]
    N --> H
    H -- "all deps done" --> O["diff_scratch_against_project"]
    O --> P["DriftReport"]

    style F fill:#fff3b0,stroke:#d47600
    style M fill:#ffcccc,stroke:#cc0000

sequenceDiagram
    participant User
    participant CLI as apm audit --check drift
    participant Replay as run_replay
    participant Scratch as scratch tmpdir
    participant Integ as InstructionIntegrator
    participant Diff as diff_scratch_against_project

    User->>CLI: apm audit --check drift
    CLI->>Replay: run_replay(config)
    Replay->>Scratch: mkdir(scratch_root)
    Replay->>Replay: resolve_targets()
    Note over Replay,Scratch: NEW: pre-create target root dirs
    Replay->>Scratch: mkdir(scratch_root / .windsurf)
    Replay->>Scratch: mkdir(scratch_root / .cursor)
    loop each locked dependency
        Replay->>Integ: integrate(target, project_root=scratch_root)
        Integ->>Scratch: check (scratch_root / target.root_dir).is_dir()
        Note over Integ: Previously False -> skip -> false orphan
        Note over Integ: Now True -> integrate correctly
        Integ->>Scratch: write files
    end
    Replay->>Diff: diff_scratch_against_project()
    Diff-->>CLI: DriftReport (clean, no false orphans)
    CLI-->>User: No drift detected

Recommendation

Merge as-is. The fix is correct, regression-trapped at unit tier, and restores a critical audit-truthfulness promise across the majority IDE matrix. Track the test_d2 disjunction tightening (follow-up #1) as the highest-signal post-merge item -- it is the only path where a future regression could slip past CI undetected on this governed-by-policy surface.

Full per-persona findings

Python Architect

[nit] Test does not defend against pre-create loop being moved inside the per-package block at tests/unit/install/test_drift_auto_create.py
test_run_replay_precreates_scratch_target_dirs asserts the directory exists after run_replay returns, but it would still pass if the pre-create were accidentally moved inside the for-loop (per-dep). Correctness requires the dirs to exist BEFORE the first integrator call.
Suggested: Add a side_effect to a mock of integrate_package_primitives that asserts (scratch_root / target.root_dir).is_dir() on first call, proving sequencing.
[nit] Duplicated auto_create guard across 5 integrators is a future extraction candidate at src/apm_cli/integration/instruction_integrator.py:86
instruction_integrator.py:86, prompt_integrator.py:110, agent_integrator.py:111, command_integrator.py:481, skill_integrator.py:341 all duplicate if not target.auto_create and not (project_root / target.root_dir).is_dir(): return .... Not blocking for this PR; a BaseIntegrator._should_skip_target(target, project_root) -> bool would make the guard discoverable and testable in one place.

CLI Logging Expert

[nit] Pre-create loop is silent even in verbose mode at src/apm_cli/install/drift.py:427
A single logger.verbose-gated line (e.g. [i] Seeded N scratch target root(s)) would help confirm the fix fired during apm audit --verbose without adding noise for default users.

DevX UX Expert

[nit] New tests only exercise windsurf, not the full auto_create=False target matrix at tests/unit/install/test_drift_auto_create.py:21
pytest.mark.parametrize across the 7-8 auto_create=False targets would prove the contract holds across the matrix and catch a future target whose root_dir layout diverges. Low risk today since the fix loop is target-agnostic, but parametrization makes the regression net explicit.

Supply Chain Security Expert

[nit] Future-proof against dynamic _target.root_dir path-traversal at src/apm_cli/install/drift.py:431
All KNOWN_TARGETS root_dir values are safe static strings today, and scratch_root is bound outside the project tree. If a future target profile or apm.yml override introduces a dynamic root_dir, the join would silently escape scratch. A one-line assertion that the resolved path stays under scratch_root would future-proof.

OSS Growth Hacker

[nit] CHANGELOG entry should name affected IDE targets explicitly at CHANGELOG.md
Users searching their IDE name will ctrl-F the changelog. Listing windsurf, cursor, claude, codex, copilot, gemini, opencode beats the generic "non-VSCode targets" phrasing for discoverability.
[nit] Inline comment in drift.py could name affected targets at src/apm_cli/install/drift.py:427
A short comment listing the auto_create=False targets makes the blast radius obvious to future readers without cross-referencing targets.py.

Test Coverage Expert

[recommended] Integration test accepts false-orphan kind -- cannot catch [BUG] apm audit reports false orphaned drift for non-skill primitives on targets with auto_create=False #1411 re-regression at tests/integration/test_drift_check.py
test_d2_multi_target_drift_reports_secondary_target_files asserts kinds['.claude/rules/rules.md'] in {'modified','orphaned'}. Before this fix the audit falsely reported orphaned; after it reports modified. The disjunction accepts both, so removing the pre-create loop would NOT fail this test. The user-visible promise has no integration-tier assertion that would fail on the exact symptom of [BUG] apm audit reports false orphaned drift for non-skill primitives on targets with auto_create=False #1411.
Suggested: Either tighten test_d2 to assert == 'modified' or add a sibling integration test (e.g. test_d4_auto_create_false_target_tamper_reports_modified_not_orphaned) that pins the fix at integration tier.
Proof (missing at): tests/integration/test_drift_check.py::test_d4_auto_create_false_target_tamper_reports_modified_not_orphaned -- proves: apm audit reports 'modified' (not false-orphan) when a deployed file is tampered on an auto_create=False target after a clean install [devx,governed-by-policy]
assert kinds['.claude/rules/rules.md'] == 'modified'
[nit] Unit regression trap is durable (verified by running tests) at tests/unit/install/test_drift_auto_create.py
test_run_replay_precreates_scratch_target_dirs uses an empty lockfile so the pre-create loop body never runs as a side-effect of dep integration -- the asserted directory MUST be created by the explicit loop. This catches the pre-create-loop being removed. test_auto_create_false_targets_still_integrate_in_scratch directly confirms the InstructionIntegrator contract.
Proof (passed): tests/unit/install/test_drift_auto_create.py::test_run_replay_precreates_scratch_target_dirs -- proves: run_replay pre-creates target root dirs in scratch before the integration loop [devx]
assert (scratch / windsurf.root_dir).is_dir()

Auth Expert -- inactive

Diff touches only drift.py + a new test file; no auth.py, token_manager.py, azure_cli.py, github_downloader.py, marketplace/client.py, github_host.py, install/validation.py, install/pipeline.py, deps/registry_proxy.py -- and no host classification, token-source, or clone-URL change.

Doc Writer -- inactive

No README/CHANGELOG/MANIFESTO/docs/.apm/.github prose changed; the diff is a runtime bug fix + unit test, not a documentation surface. (growth-hacker still flags an OPTIONAL CHANGELOG-entry call-out for release time -- not a doc drift today.)

Performance Expert -- inactive

Diff is outside the perf hot path (cache/, deps/, install/phases/, install/pipeline.py, install/resolve.py, scripts/perf/) and makes no perf claim; pre-creating up to ~10 directories per audit run is negligible vs. the existing replay cost.

_{This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.}

Fold recommended follow-ups from apm-review-panel into PR #1441: - Tighten test_d2_multi_target_drift_reports_secondary_target_files to assert kinds['.claude/rules/rules.md'] == 'modified' (was in {'modified','orphaned'}). The prior disjunction accepted the pre-fix false-orphan symptom, so a regression to #1411 would pass CI silently. Mutation-break verified: removing the scratch pre-create loop in drift.py turns this assertion RED with 'orphaned'. - Update TestSectionDMultiTarget docstring to describe the post-fix invariant (auto_create=False targets re-deploy into scratch) and enumerate the affected IDE matrix. - CHANGELOG: add Fixed entry under Unreleased naming each auto_create=False IDE target (cursor, windsurf, claude, codex, copilot, gemini, opencode) for ctrl-F discoverability. Co-authored-by: Sergio Sisternes <sergio.sisternes@epam.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danielmeppiel · 2026-05-22T07:09:04Z

Folded recommended follow-ups -- commit `87eefac`

Per panel recommendation, folded the single RECOMMENDED follow-up plus one related nit directly into this PR instead of deferring to a tracking issue.

Items folded

[Test Coverage Expert -- recommended] Tighten test_d2 disjunction at integration tier.
- tests/integration/test_drift_check.py:558-590 -- assert kinds[".claude/rules/rules.md"] == "modified" (was in {"modified","orphaned"}).
- tests/integration/test_drift_check.py:537-547 -- updated TestSectionDMultiTarget docstring to describe the post-[BUG] apm audit reports false orphaned drift for non-skill primitives on targets with auto_create=False #1411 invariant and list the affected auto_create=False IDE matrix.
- Mutation-break verified: temporarily reverting the pre-create loop in src/apm_cli/install/drift.py:431-433 turns this assertion RED with {'.claude/rules/rules.md': 'orphaned'} -- the exact [BUG] apm audit reports false orphaned drift for non-skill primitives on targets with auto_create=False #1411 symptom. Production code restored byte-for-byte before commit.
[OSS Growth Hacker -- nit] CHANGELOG entry naming each affected IDE.
- CHANGELOG.md:10-12 -- new ### Fixed block under ## [Unreleased] lists cursor, windsurf, claude, codex, copilot, gemini, opencode for ctrl-F discoverability.

Items deferred

[DevX UX -- nit] Parametrize unit test across all auto_create=False targets -- separable; would expand the unit harness and is not on this PR's critical path.
[Python Architect -- nit] Add side_effect sequencing assertion in test_run_replay_precreates_scratch_target_dirs -- separable; current empty-lockfile trap already catches the most likely regression mode (loop removal).
[Python Architect -- nit] Extract duplicated auto_create guard into BaseIntegrator._should_skip_target -- cross-cutting refactor across 5 integrators; separate PR.
[CLI Logging -- nit] Verbose-only seed-count log line -- separable observability improvement.
[Supply Chain Security -- nit] Path-traversal guard for dynamic root_dir overrides -- future-proofing; no current threat (all KNOWN_TARGETS root_dir values are safe static strings).

Local validation

uv run --extra dev ruff check src/ tests/ -- silent.
uv run --extra dev ruff format --check src/ tests/ -- 1034 files already formatted.
uv run --extra dev pytest tests/integration/test_drift_check.py tests/unit/install/test_drift_auto_create.py -- 35 passed.
Mutation-break gate on the tightened test_d2: RED on production guard removal, GREEN with guard restored.

CI status

Workflow runs for SHA 87eefac were not auto-created on push -- this repo gates outside-contributor PR runs behind maintainer approval. @danielmeppiel please click "Approve and run" on the Actions tab to spin up CI for the new SHA. All local checks passed.

Verdict

ready-to-merge once CI runs are approved and green.

Fold recommended follow-ups from apm-review-panel into PR #1441: - Tighten test_d2_multi_target_drift_reports_secondary_target_files to assert kinds['.claude/rules/rules.md'] == 'modified' (was in {'modified','orphaned'}). The prior disjunction accepted the pre-fix false-orphan symptom, so a regression to #1411 would pass CI silently. Mutation-break verified: removing the scratch pre-create loop in drift.py turns this assertion RED with 'orphaned'. - Update TestSectionDMultiTarget docstring to describe the post-fix invariant (auto_create=False targets re-deploy into scratch) and enumerate the affected IDE matrix. - CHANGELOG: add Fixed entry under Unreleased naming each auto_create=False IDE target (cursor, windsurf, claude, codex, copilot, gemini, opencode) for ctrl-F discoverability. Co-authored-by: Sergio Sisternes <sergio.sisternes@epam.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danielmeppiel · 2026-05-22T07:58:24Z

Rebased onto current main and resolved conflicts at 36d878f. Local validation: lint silent, custom CI guard (no str-relative-to pattern) silent in src/ and no new occurrences in tests/ diff, targeted tests pass (1339 passed), mutation-break gate verified (removing the pre-create-dirs guard fails 2 regression tests including test_d2).

During drift replay, integrators with auto_create=False skip non-skill primitives because the scratch directory is empty. Pre-create all target root dirs in scratch before the replay loop so the auto_create guard passes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

) * feat(batch-bug-shepherd): operator visibility + fold-in invariant Refactors the batch-bug-shepherd skill to address two genesis-validated blockers and ship two missing capabilities discovered during a real sweep-all run over 14 microsoft/apm bugs: 1. OPERATOR VISIBILITY (was: silent 30-minute fan-outs) - New asset assets/progress-diagram.md: mermaid template + 5-state palette (pending/active/done/blocked/skipped) + per-phase render rules + dispatch-table contract. - SKILL.md adds 'Operator visibility is a contract' invariant; each phase boundary re-renders the diagram with current-phase coloring and prints a subagent_id -> target dispatch table BEFORE fan-out. - Operator can follow long sagas at a glance instead of waiting in the dark for the next checkpoint. 2. FOLD-IN INVARIANT (was: panel recommendations silently dropped) - assets/verdict-schema.json: shepherd_return gains required recommended_followups[] channel; completion_return gains folded_followups[] + deferred_followups[]; extracted reusable followup_item definition. - assets/shepherd-prompt.md: fixed verdict mapping bug (ship_with_followups + 0 blocking -> ready-to-merge, not needs-author-changes); added recommended_followups extraction step with required source_persona + optional fold_hint tagging. - assets/completion-prompt.md: full rewrite. Adds RECOMMENDED_FOLLOWUPS input; encodes FOLD vs DEFER classifier (FOLD: touches diff / single helper / regression trap / hermetic test / inline comment; DEFER: cross-cutting refactor / new feature / broad doc / architectural addition); per-FOLD item consultation with source_persona + python-architect lens; DEFER items filed as gh issue create tracking issues (never silently dropped); mid-flight reclassify rule to avoid stalls. - SKILL.md adds 'Bias toward folding recommended items' invariant and rewrites Phase 4 spawn contract (9 steps) to thread the recommended_followups channel end-to-end. Eval gate - +3 rubric anchors per content fixture (progress-diagram-header, mermaid-flowchart-rendered, dispatch-table-before-fanout) and +3 invariant anchors (recommended-followups-channel, fold-defer-classifier, tracking-issue-for-defer). - All 12 new anchors MATCH with_skill fixtures and MISS without_skill fixtures (clean value delta). - +3 no-fire trigger items for single-PR fold-in phrasing so the dispatcher will not misfire the batch outer-loop on single-PR fold work (e.g. 'fold the panel recommendations into PR #1234' remains apm-review-panel completion territory). Validation - Schema validates via jsonschema Draft7; accepts new shapes, rejects shepherd_return missing recommended_followups[]. - SKILL.md: 367 lines / 4483 tokens (caps: 500 / 5000). - Description: 965 / 1024 chars; mentions FOLD invariant. - 0 non-ASCII bytes across all modified files. - All 4 changed JSON files parse. Real-task evidence (this skill iteration was driven by a live run) - 5 of 6 in-flight community PRs had their panel recommendations folded in-PR by completion subagents following the new contract, yielding 22 folded items and 8 deferred-to-tracking items across PRs #1387, #1396, #1441, #1443, #1444. The 6th (#1442) is in flight as this lands. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * batch-bug-shepherd: add Phase 5 mergeability gate Adds a post-wave gate that re-probes mergeability for every PR the saga marked ready-to-merge, dispatches one conflict-resolution subagent per CONFLICTING PR, and partitions returns into four post-gate statuses before the final report claims anything is mergeable. Mergeability is post-wave truth, not pre-wave assumption: a PR that Phase 4 marked ready can stop being mergeable the moment the maintainer lands another PR onto main. Without this gate the report ships stale ready claims. Design driven through the genesis skill end-to-end (steps 1-6 handoff packet, steps 7a-7b coder pass, step 8 validation): - NEW Phase 5 (mergeability gate) between completion (Phase 4) and renamed final report (Phase 5 -> Phase 6). - Sub-phases 5a probe (read-only, single-thread, gh pr view --json mergeStateStatus), 5b fan-out (one conflict-resolution subagent per CONFLICTING PR), 5c trust-but-verify re-probe + four-way partition (resolved / requires-author-action / requires-human-judgment / resolution-failed). - NEW assets/conflict-resolution-prompt.md spawn body for 5b. Encodes rebase, faithful merge of both intents, mutation-break re-check, lint silent, --force-with-lease push, re-probe, resolution-confirmation comment. - NEW references/mergeability-gate.md load-on-demand orchestrator step-by-step (load trigger: WHEN ENTERING PHASE 5). Keeps SKILL.md under 5000-token budget. - Schema extends verdict-schema.json oneOf with conflict_resolution_return; --force-with-lease enforced via regex pattern guard on push_command field; bare --force rejected. Five rejection cases validated. - Two-comment-per-PR cap as new architecture invariant: at most one completion-confirmation (Phase 4) + one resolution-confirmation (Phase 5b) per PR. - Progress diagram extended with WAVE4 subgraph (P5a/P5b/P5c), skipped-state semantics, P5b dispatch table requirement. - Final report extended with three new partition sections plus a RESOLUTION CONFIRMATION COMMENT block and mergeability-gate disciplines line. - Evals: +3 content rubric anchors (mergeability-probe-cli, force-with-lease-on-push, post-wave-partition-columns) + 1 optional anchor (two-comment-cap); +1 fire + 2 no-fire trigger items; fixture diff shows the gate firing on a sweep with 2 conflicting PRs and the without-skill failure mode (stale ready claim). SKILL.md: 388 lines / 4867 tokens (budget 500/5000). ASCII only. CI lint pair silent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat(bbs): add Phase 1.5 strategic-alignment gate + PRINCIPLES.md Adds a new wave between Phase 1 (triage) and Phase 2 (PR-in-flight cross-reference) that checks every LEGIT bug against the project's rejection contract before spending shepherd / fix / completion work on it. What changes: - NEW PRINCIPLES.md at repo root: 7 numbered principles encoding the project's hard nos (P1 no invented frontmatter; P2 multi-harness with traction gating; P3 vendor neutral; P4 UX floor is not a trade) plus 3 supporting principles (P5 portability; P6 reliability over magic; P7 community over feature count). Bound to apm-ceo + bbs Phase 1.5 + apm-triage-panel + apm-review-panel as the rejection contract. - NEW bbs Phase 1.5 strategic-alignment gate (WAVE 1.5): - one apm-ceo subagent per LEGIT row, in parallel - 4-state verdict: aligned | aligned-with-reservations | out-of-scope | wrong-direction - schema-validated returns; FAILS OPEN on infrastructure failure (malformed-x2 or non-citable principle) so legit bugs are never silently demoted under gate breakage - ABORTS only when apm-ceo.agent.md or PRINCIPLES.md itself is missing (operator-actionable error) - demoted rows flip to status triaged-deferred and SKIP Phase 2/3/4/5; surface in Phase 6 under 'Recommend close as out-of-scope' partition - aligned-with-reservations rows stay in saga; downstream phases surface the reservations in review prose - deferred-PR strategic-rejection comment subagent (S7+S4+A9) posts a courtesy comment on any open PR whose underlying issue was demoted, using the would-be Phase-4 completion-comment slot (two-comments-per-PR cap preserved) - Verdict schema extended with 5th oneOf member strategic_alignment_return (kind, issue, verdict, cited_principle, rationale, reservations). - Ground-truth table grows two columns (strategic_verdict + strategic_rationale) and one status value (triaged-deferred). - Progress diagram inserts P15 between P1 and P2; dispatch-table contract extends to Phase 1.5. - Final-report template adds 'Recommend close as out-of-scope' partition and 'Aligned with reservations' surfacing section. - 2 new fire trigger evals + 1 no-fire (PRINCIPLES.md authoring guard) + 1 new rubric anchor on the three-issues-mixed scenario. Genesis design artifact lives in the session plan store; SKILL.md body remains within 500-line / 5000-token budget (406 lines / 4943 tokens after trimming pre-existing verbose passages on operator-visibility, mergeability, fold-in, composition, and operating-contract sections to make room). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: danielmeppiel <danielmeppiel@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 21, 2026 20:22

sergio-sisternes-epam requested a review from danielmeppiel as a code owner May 21, 2026 20:22

Copilot started reviewing on behalf of sergio-sisternes-epam May 21, 2026 20:22 View session

Copilot AI reviewed May 21, 2026

View reviewed changes

Comment thread src/apm_cli/install/drift.py

sergio-sisternes-epam force-pushed the fix/1411-audit-false-orphaned-drift branch from 18d8ed7 to b759525 Compare May 21, 2026 20:32

sergio-sisternes-epam mentioned this pull request May 21, 2026

[BUG] apm audit reports false orphaned drift for non-skill primitives on targets with auto_create=False #1411

Closed

danielmeppiel mentioned this pull request May 22, 2026

feat(batch-bug-shepherd): operator visibility + fold-in invariant #1451

Merged

danielmeppiel force-pushed the fix/1411-audit-false-orphaned-drift branch from 87eefac to 36d878f Compare May 22, 2026 07:58

sergio-sisternes-epam enabled auto-merge May 22, 2026 08:06

sergio-sisternes-epam force-pushed the fix/1411-audit-false-orphaned-drift branch from 6214352 to 7884ec8 Compare May 22, 2026 08:09

sergio-sisternes-epam force-pushed the fix/1411-audit-false-orphaned-drift branch from 7884ec8 to 559f5ae Compare May 22, 2026 08:13

Merge branch 'main' into fix/1411-audit-false-orphaned-drift

16e34aa

danielmeppiel disabled auto-merge May 22, 2026 12:41

danielmeppiel merged commit 75d9c77 into main May 22, 2026
10 checks passed

danielmeppiel deleted the fix/1411-audit-false-orphaned-drift branch May 22, 2026 12:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(audit): pre-create target dirs in drift replay scratch (#1411)#1441

fix(audit): pre-create target dirs in drift replay scratch (#1411)#1441
danielmeppiel merged 2 commits into
mainfrom
fix/1411-audit-false-orphaned-drift

sergio-sisternes-epam commented May 21, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

danielmeppiel commented May 21, 2026

Python Architect

CLI Logging Expert

DevX UX Expert

Supply Chain Security Expert

OSS Growth Hacker

Test Coverage Expert

Auth Expert -- inactive

Doc Writer -- inactive

Performance Expert -- inactive

Uh oh!

danielmeppiel commented May 22, 2026

Uh oh!

danielmeppiel commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

sergio-sisternes-epam commented May 21, 2026

Description

Type of change

Testing

Test added

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

danielmeppiel commented May 21, 2026

APM Review Panel: ship_with_followups

Panel summary

Top 5 follow-ups

Architecture

Recommendation

Python Architect

CLI Logging Expert

DevX UX Expert

Supply Chain Security Expert

OSS Growth Hacker

Test Coverage Expert

Auth Expert -- inactive

Doc Writer -- inactive

Performance Expert -- inactive

Uh oh!

danielmeppiel commented May 22, 2026

Folded recommended follow-ups -- commit 87eefac

Items folded

Items deferred

Local validation

CI status

Verdict

Uh oh!

danielmeppiel commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

APM Review Panel: `ship_with_followups`

Folded recommended follow-ups -- commit `87eefac`