Skip to content

fix(audit): pre-create target dirs in drift replay scratch (#1411)#1441

Merged
danielmeppiel merged 2 commits into
mainfrom
fix/1411-audit-false-orphaned-drift
May 22, 2026
Merged

fix(audit): pre-create target dirs in drift replay scratch (#1411)#1441
danielmeppiel merged 2 commits into
mainfrom
fix/1411-audit-false-orphaned-drift

Conversation

@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

Description

Fix false orphaned drift reports for non-skill primitives on targets with auto_create=False.

During drift replay, the scratch directory is empty, causing integrators with auto_create=False (windsurf, cursor, claude, codex, copilot, gemini, opencode) to skip all non-skill primitives. The diff then reports these correctly-installed files as orphaned.

The fix pre-creates all target root directories in the scratch dir before the replay loop, so the auto_create guard passes.

Fixes #1411

Type of change

  • Bug fix
  • New feature
  • Documentation
  • Maintenance / refactor

Testing

  • Tested locally
  • All existing tests pass (72 drift tests)
  • Added tests for new functionality

Test added

tests/unit/install/test_drift_auto_create.py — regression test confirming:

  1. Without pre-created target root: auto_create=False skips integration (the bug)
  2. With pre-created target root: integration proceeds normally (the fix)

Copilot AI review requested due to automatic review settings May 21, 2026 20:22
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes false orphaned drift findings during apm audit by making the drift replay scratch tree more closely mirror a real install for targets where auto_create=False (e.g., windsurf), ensuring non-skill primitives are replayed instead of silently skipped.

Changes:

  • Pre-create each resolved target’s root_dir under the drift replay scratch directory before running integrators.
  • Add a unit regression test illustrating the auto_create=False guard behavior when the target root directory is missing vs present.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/apm_cli/install/drift.py Seeds scratch with target root directories so replay doesn’t skip non-skill primitives for auto_create=False targets.
tests/unit/install/test_drift_auto_create.py Adds a regression test demonstrating integration skip vs success depending on whether the target root directory exists.

Comment thread src/apm_cli/install/drift.py
@danielmeppiel
Copy link
Copy Markdown
Collaborator

APM Review Panel: ship_with_followups

Fixes false-orphan drift reports on every auto_create=False AI-coding IDE target, restoring audit truthfulness across the majority IDE matrix.

cc @sergio-sisternes-epam @danielmeppiel -- a fresh advisory pass is ready for your review.

All six active panelists converge: the fix is architecturally correct, security-neutral, and restores a critical user-facing promise -- that apm audit tells the truth about repo cleanliness. The 8-line change seeds scratch-root target directories at exactly the right lifecycle point (after target resolution, before the replay loop), mirroring the real-world invariant that these directories exist on disk when the IDE integration ran. The alternative -- pushing scratch awareness into each integrator -- would have leaked replay concerns into 5+ classes for no user benefit.

The strongest post-merge signal comes from test-coverage-expert: the new unit test is a durable regression trap (the empty-lockfile trick ensures the directory MUST be created by the explicit loop, not as a side-effect). However, the existing integration test test_d2_multi_target_drift_reports_secondary_target_files uses a disjunction (in {'modified','orphaned'}) that accepts the exact false-orphan symptom this PR fixes. That means removing the fix would not break integration CI. This is a real gap on a governed-by-policy surface -- the user promise has no integration-tier assertion that would catch a regression at the symptom level. This follow-up is the single highest-priority item post-merge.

No panelist disagreement exists. The fix is correct, the unit trap is tight, and the follow-ups are all improvement-tier polish or future-proofing.

Aligned with: Governed by policy (audit truthfulness is a governance promise; this fix restores it for the full auto_create=False target set); Multi-harness/multi-host (covers windsurf, cursor, claude, codex, copilot, gemini, opencode -- the majority AI-coding IDE matrix); OSS community-driven (community contributor fix for a community-reported bug; fast triage-to-merge signals contributor respect).

Growth signal. This is a trust-recovery moment worth amplifying: apm audit was lying about repo cleanliness on every popular AI-coding IDE except VSCode. The fix -- from a community contributor -- restores truthfulness across Cursor, Windsurf, Claude Code, Codex, Copilot, Gemini, and OpenCode. Release notes should frame this as "audit now tells the truth on every AI-coding IDE" -- a strong hook for evaluators comparing APM against manual dotfile management.

Panel summary

Persona B R N Takeaway
Python Architect 0 0 2 Architecturally sound: pre-create at the right lifecycle point. Test sequencing could be tighter; duplicated auto_create guard across 5 integrators is a future extraction target.
CLI Logging Expert 0 0 1 Render functions are clean; pre-create loop is silent even under --verbose -- one verbose-only seed log would aid observability at zero default noise.
DevX UX Expert 0 0 1 Fix protects audit-truthfulness across the matrix; tests only exercise windsurf -- parametrization would lock the cross-IDE contract explicitly.
Supply Chain Security Expert 0 0 1 KNOWN_TARGETS are safe static paths and scratch_root is bounded -- no current threat. A path-traversal guard would future-proof against dynamic root_dir overrides.
OSS Growth Hacker 0 0 2 High-value trust-recovery fix from a community contributor on the AI-coding-IDE majority -- worth a thank-you and a release-note beat.
Test Coverage Expert 0 1 1 Unit regression trap is durable (verified by pytest run); integration test_d2 accepts both 'modified' and 'orphaned' so a re-regression to false-orphan would slip past CI.

B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.

Top 5 follow-ups

  1. [Test Coverage Expert] Tighten test_d2 disjunction to assert == 'modified' (or add test_d4_auto_create_false_target_tamper_reports_modified_not_orphaned) -- The integration test currently accepts both 'modified' and 'orphaned', meaning a regression to the exact [BUG] apm audit reports false orphaned drift for non-skill primitives on targets with auto_create=False #1411 symptom would pass CI silently. This is a governed-by-policy surface with no integration-tier guardrail on the fixed behavior.
  2. [DevX UX Expert] Parametrize unit test across all 7-8 auto_create=False targets -- Current test exercises only windsurf. A parametrized matrix locks in the cross-IDE contract and catches future targets with divergent root_dir layouts.
  3. [Python Architect] Add a side_effect assertion in test_run_replay_precreates_scratch_target_dirs verifying dirs exist on first integrate_package_primitives call -- Current test would still pass if the pre-create were accidentally moved inside the per-dep loop; a sequencing assertion makes the trap tighter.
  4. [OSS Growth Hacker] CHANGELOG entry should name affected IDE targets explicitly for ctrl-F discoverability -- Users searching for their specific IDE (cursor, windsurf, claude) will find the fix without reading the full entry.
  5. [Python Architect] Extract duplicated auto_create guard from 5 integrators into BaseIntegrator -- Reduces future drift risk as new integrators are added; not blocking because the duplication is stable today.

Architecture

classDiagram
    direction LR
    class TargetProfile {
      <<ValueObject>>
      +name str
      +root_dir str
      +auto_create bool
      +primitives dict
    }
    class BaseIntegrator {
      <<Base>>
      +integrate(target, project_root, ...) IntegrationResult
    }
    class InstructionIntegrator {
      +integrate(target, project_root, ...) IntegrationResult
    }
    class PromptIntegrator {
      +integrate(target, project_root, ...) IntegrationResult
    }
    class AgentIntegrator {
      +integrate(target, project_root, ...) IntegrationResult
    }
    class CommandIntegrator {
      +integrate(target, project_root, ...) IntegrationResult
    }
    class SkillIntegrator {
      +integrate(target, project_root, ...) IntegrationResult
    }
    class run_replay {
      <<IOBoundary>>
      +run_replay(config) Path
    }
    class integrate_package_primitives {
      <<Facade>>
      +integrate_package_primitives(...) dict
    }
    class IntegrationResult {
      <<ValueObject>>
      +files_integrated int
      +files_skipped int
    }
    BaseIntegrator <|-- InstructionIntegrator
    BaseIntegrator <|-- PromptIntegrator
    BaseIntegrator <|-- AgentIntegrator
    BaseIntegrator <|-- CommandIntegrator
    BaseIntegrator <|-- SkillIntegrator
    run_replay ..> integrate_package_primitives : delegates per dep
    integrate_package_primitives ..> InstructionIntegrator : dispatches
    integrate_package_primitives ..> PromptIntegrator : dispatches
    integrate_package_primitives ..> AgentIntegrator : dispatches
    integrate_package_primitives ..> CommandIntegrator : dispatches
    integrate_package_primitives ..> SkillIntegrator : dispatches
    InstructionIntegrator ..> TargetProfile : reads auto_create
    PromptIntegrator ..> TargetProfile : reads auto_create
    AgentIntegrator ..> TargetProfile : reads auto_create
    CommandIntegrator ..> TargetProfile : reads auto_create
    SkillIntegrator ..> TargetProfile : reads auto_create
    InstructionIntegrator ..> IntegrationResult : returns
    class run_replay:::touched
    classDef touched fill:#fff3b0,stroke:#d47600
Loading
flowchart TD
    A["apm audit --check drift"] --> B["run_replay(config)"]
    B --> C["_make_scratch_root(project_root)"]
    C --> D["resolve_targets(project_root)"]
    D --> E["_make_integrators()"]
    E --> F["PRE-CREATE LOOP<br/>for target in targets:<br/>  (scratch_root / target.root_dir).mkdir()"]
    F --> G["_ReadOnlyProjectGuard context"]
    G --> H{"for lock_dep in lock.get_all_dependencies()"}
    H --> I["_materialize_install_path"]
    I --> J["integrate_package_primitives<br/>project_root=scratch_root"]
    J --> K{"auto_create check:<br/>(project_root / target.root_dir).is_dir()?"}
    K -- "True (dir pre-created)" --> L["deploy files into scratch"]
    K -- "False (no pre-create)" --> M["return IntegrationResult(0,0,0,[])<br/>FALSE ORPHANED"]
    L --> N["next dep"]
    N --> H
    H -- "all deps done" --> O["diff_scratch_against_project"]
    O --> P["DriftReport"]

    style F fill:#fff3b0,stroke:#d47600
    style M fill:#ffcccc,stroke:#cc0000
Loading
sequenceDiagram
    participant User
    participant CLI as apm audit --check drift
    participant Replay as run_replay
    participant Scratch as scratch tmpdir
    participant Integ as InstructionIntegrator
    participant Diff as diff_scratch_against_project

    User->>CLI: apm audit --check drift
    CLI->>Replay: run_replay(config)
    Replay->>Scratch: mkdir(scratch_root)
    Replay->>Replay: resolve_targets()
    Note over Replay,Scratch: NEW: pre-create target root dirs
    Replay->>Scratch: mkdir(scratch_root / .windsurf)
    Replay->>Scratch: mkdir(scratch_root / .cursor)
    loop each locked dependency
        Replay->>Integ: integrate(target, project_root=scratch_root)
        Integ->>Scratch: check (scratch_root / target.root_dir).is_dir()
        Note over Integ: Previously False -> skip -> false orphan
        Note over Integ: Now True -> integrate correctly
        Integ->>Scratch: write files
    end
    Replay->>Diff: diff_scratch_against_project()
    Diff-->>CLI: DriftReport (clean, no false orphans)
    CLI-->>User: No drift detected
Loading

Recommendation

Merge as-is. The fix is correct, regression-trapped at unit tier, and restores a critical audit-truthfulness promise across the majority IDE matrix. Track the test_d2 disjunction tightening (follow-up #1) as the highest-signal post-merge item -- it is the only path where a future regression could slip past CI undetected on this governed-by-policy surface.


Full per-persona findings

Python Architect

  • [nit] Test does not defend against pre-create loop being moved inside the per-package block at tests/unit/install/test_drift_auto_create.py
    test_run_replay_precreates_scratch_target_dirs asserts the directory exists after run_replay returns, but it would still pass if the pre-create were accidentally moved inside the for-loop (per-dep). Correctness requires the dirs to exist BEFORE the first integrator call.
    Suggested: Add a side_effect to a mock of integrate_package_primitives that asserts (scratch_root / target.root_dir).is_dir() on first call, proving sequencing.
  • [nit] Duplicated auto_create guard across 5 integrators is a future extraction candidate at src/apm_cli/integration/instruction_integrator.py:86
    instruction_integrator.py:86, prompt_integrator.py:110, agent_integrator.py:111, command_integrator.py:481, skill_integrator.py:341 all duplicate if not target.auto_create and not (project_root / target.root_dir).is_dir(): return .... Not blocking for this PR; a BaseIntegrator._should_skip_target(target, project_root) -> bool would make the guard discoverable and testable in one place.

CLI Logging Expert

  • [nit] Pre-create loop is silent even in verbose mode at src/apm_cli/install/drift.py:427
    A single logger.verbose-gated line (e.g. [i] Seeded N scratch target root(s)) would help confirm the fix fired during apm audit --verbose without adding noise for default users.

DevX UX Expert

  • [nit] New tests only exercise windsurf, not the full auto_create=False target matrix at tests/unit/install/test_drift_auto_create.py:21
    pytest.mark.parametrize across the 7-8 auto_create=False targets would prove the contract holds across the matrix and catch a future target whose root_dir layout diverges. Low risk today since the fix loop is target-agnostic, but parametrization makes the regression net explicit.

Supply Chain Security Expert

  • [nit] Future-proof against dynamic _target.root_dir path-traversal at src/apm_cli/install/drift.py:431
    All KNOWN_TARGETS root_dir values are safe static strings today, and scratch_root is bound outside the project tree. If a future target profile or apm.yml override introduces a dynamic root_dir, the join would silently escape scratch. A one-line assertion that the resolved path stays under scratch_root would future-proof.

OSS Growth Hacker

  • [nit] CHANGELOG entry should name affected IDE targets explicitly at CHANGELOG.md
    Users searching their IDE name will ctrl-F the changelog. Listing windsurf, cursor, claude, codex, copilot, gemini, opencode beats the generic "non-VSCode targets" phrasing for discoverability.
  • [nit] Inline comment in drift.py could name affected targets at src/apm_cli/install/drift.py:427
    A short comment listing the auto_create=False targets makes the blast radius obvious to future readers without cross-referencing targets.py.

Test Coverage Expert

  • [recommended] Integration test accepts false-orphan kind -- cannot catch [BUG] apm audit reports false orphaned drift for non-skill primitives on targets with auto_create=False #1411 re-regression at tests/integration/test_drift_check.py
    test_d2_multi_target_drift_reports_secondary_target_files asserts kinds['.claude/rules/rules.md'] in {'modified','orphaned'}. Before this fix the audit falsely reported orphaned; after it reports modified. The disjunction accepts both, so removing the pre-create loop would NOT fail this test. The user-visible promise has no integration-tier assertion that would fail on the exact symptom of [BUG] apm audit reports false orphaned drift for non-skill primitives on targets with auto_create=False #1411.
    Suggested: Either tighten test_d2 to assert == 'modified' or add a sibling integration test (e.g. test_d4_auto_create_false_target_tamper_reports_modified_not_orphaned) that pins the fix at integration tier.
    Proof (missing at): tests/integration/test_drift_check.py::test_d4_auto_create_false_target_tamper_reports_modified_not_orphaned -- proves: apm audit reports 'modified' (not false-orphan) when a deployed file is tampered on an auto_create=False target after a clean install [devx,governed-by-policy]
    assert kinds['.claude/rules/rules.md'] == 'modified'
  • [nit] Unit regression trap is durable (verified by running tests) at tests/unit/install/test_drift_auto_create.py
    test_run_replay_precreates_scratch_target_dirs uses an empty lockfile so the pre-create loop body never runs as a side-effect of dep integration -- the asserted directory MUST be created by the explicit loop. This catches the pre-create-loop being removed. test_auto_create_false_targets_still_integrate_in_scratch directly confirms the InstructionIntegrator contract.
    Proof (passed): tests/unit/install/test_drift_auto_create.py::test_run_replay_precreates_scratch_target_dirs -- proves: run_replay pre-creates target root dirs in scratch before the integration loop [devx]
    assert (scratch / windsurf.root_dir).is_dir()

Auth Expert -- inactive

Diff touches only drift.py + a new test file; no auth.py, token_manager.py, azure_cli.py, github_downloader.py, marketplace/client.py, github_host.py, install/validation.py, install/pipeline.py, deps/registry_proxy.py -- and no host classification, token-source, or clone-URL change.

Doc Writer -- inactive

No README/CHANGELOG/MANIFESTO/docs/.apm/.github prose changed; the diff is a runtime bug fix + unit test, not a documentation surface. (growth-hacker still flags an OPTIONAL CHANGELOG-entry call-out for release time -- not a doc drift today.)

Performance Expert -- inactive

Diff is outside the perf hot path (cache/, deps/, install/phases/, install/pipeline.py, install/resolve.py, scripts/perf/) and makes no perf claim; pre-creating up to ~10 directories per audit run is negligible vs. the existing replay cost.

This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.

danielmeppiel pushed a commit that referenced this pull request May 22, 2026
Fold recommended follow-ups from apm-review-panel into PR #1441:

- Tighten test_d2_multi_target_drift_reports_secondary_target_files
  to assert kinds['.claude/rules/rules.md'] == 'modified' (was
  in {'modified','orphaned'}). The prior disjunction accepted the
  pre-fix false-orphan symptom, so a regression to #1411 would
  pass CI silently. Mutation-break verified: removing the scratch
  pre-create loop in drift.py turns this assertion RED with
  'orphaned'.
- Update TestSectionDMultiTarget docstring to describe the post-fix
  invariant (auto_create=False targets re-deploy into scratch) and
  enumerate the affected IDE matrix.
- CHANGELOG: add Fixed entry under Unreleased naming each
  auto_create=False IDE target (cursor, windsurf, claude, codex,
  copilot, gemini, opencode) for ctrl-F discoverability.

Co-authored-by: Sergio Sisternes <sergio.sisternes@epam.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Copy link
Copy Markdown
Collaborator

Folded recommended follow-ups -- commit 87eefac

Per panel recommendation, folded the single RECOMMENDED follow-up plus one related nit directly into this PR instead of deferring to a tracking issue.

Items folded

Items deferred

  • [DevX UX -- nit] Parametrize unit test across all auto_create=False targets -- separable; would expand the unit harness and is not on this PR's critical path.
  • [Python Architect -- nit] Add side_effect sequencing assertion in test_run_replay_precreates_scratch_target_dirs -- separable; current empty-lockfile trap already catches the most likely regression mode (loop removal).
  • [Python Architect -- nit] Extract duplicated auto_create guard into BaseIntegrator._should_skip_target -- cross-cutting refactor across 5 integrators; separate PR.
  • [CLI Logging -- nit] Verbose-only seed-count log line -- separable observability improvement.
  • [Supply Chain Security -- nit] Path-traversal guard for dynamic root_dir overrides -- future-proofing; no current threat (all KNOWN_TARGETS root_dir values are safe static strings).

Local validation

  • uv run --extra dev ruff check src/ tests/ -- silent.
  • uv run --extra dev ruff format --check src/ tests/ -- 1034 files already formatted.
  • uv run --extra dev pytest tests/integration/test_drift_check.py tests/unit/install/test_drift_auto_create.py -- 35 passed.
  • Mutation-break gate on the tightened test_d2: RED on production guard removal, GREEN with guard restored.

CI status

Workflow runs for SHA 87eefac were not auto-created on push -- this repo gates outside-contributor PR runs behind maintainer approval. @danielmeppiel please click "Approve and run" on the Actions tab to spin up CI for the new SHA. All local checks passed.

Verdict

ready-to-merge once CI runs are approved and green.

danielmeppiel pushed a commit that referenced this pull request May 22, 2026
Fold recommended follow-ups from apm-review-panel into PR #1441:

- Tighten test_d2_multi_target_drift_reports_secondary_target_files
  to assert kinds['.claude/rules/rules.md'] == 'modified' (was
  in {'modified','orphaned'}). The prior disjunction accepted the
  pre-fix false-orphan symptom, so a regression to #1411 would
  pass CI silently. Mutation-break verified: removing the scratch
  pre-create loop in drift.py turns this assertion RED with
  'orphaned'.
- Update TestSectionDMultiTarget docstring to describe the post-fix
  invariant (auto_create=False targets re-deploy into scratch) and
  enumerate the affected IDE matrix.
- CHANGELOG: add Fixed entry under Unreleased naming each
  auto_create=False IDE target (cursor, windsurf, claude, codex,
  copilot, gemini, opencode) for ctrl-F discoverability.

Co-authored-by: Sergio Sisternes <sergio.sisternes@epam.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel force-pushed the fix/1411-audit-false-orphaned-drift branch from 87eefac to 36d878f Compare May 22, 2026 07:58
@danielmeppiel
Copy link
Copy Markdown
Collaborator

Rebased onto current main and resolved conflicts at 36d878f. Local validation: lint silent, custom CI guard (no str-relative-to pattern) silent in src/ and no new occurrences in tests/ diff, targeted tests pass (1339 passed), mutation-break gate verified (removing the pre-create-dirs guard fails 2 regression tests including test_d2).

@sergio-sisternes-epam sergio-sisternes-epam force-pushed the fix/1411-audit-false-orphaned-drift branch from 6214352 to 7884ec8 Compare May 22, 2026 08:09
During drift replay, integrators with auto_create=False skip
non-skill primitives because the scratch directory is empty.
Pre-create all target root dirs in scratch before the replay
loop so the auto_create guard passes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam sergio-sisternes-epam force-pushed the fix/1411-audit-false-orphaned-drift branch from 7884ec8 to 559f5ae Compare May 22, 2026 08:13
@danielmeppiel danielmeppiel disabled auto-merge May 22, 2026 12:41
@danielmeppiel danielmeppiel merged commit 75d9c77 into main May 22, 2026
10 checks passed
@danielmeppiel danielmeppiel deleted the fix/1411-audit-false-orphaned-drift branch May 22, 2026 12:41
danielmeppiel added a commit that referenced this pull request May 22, 2026
)

* feat(batch-bug-shepherd): operator visibility + fold-in invariant

Refactors the batch-bug-shepherd skill to address two genesis-validated
blockers and ship two missing capabilities discovered during a real
sweep-all run over 14 microsoft/apm bugs:

1. OPERATOR VISIBILITY (was: silent 30-minute fan-outs)
   - New asset assets/progress-diagram.md: mermaid template + 5-state
     palette (pending/active/done/blocked/skipped) + per-phase render
     rules + dispatch-table contract.
   - SKILL.md adds 'Operator visibility is a contract' invariant; each
     phase boundary re-renders the diagram with current-phase coloring
     and prints a subagent_id -> target dispatch table BEFORE fan-out.
   - Operator can follow long sagas at a glance instead of waiting in
     the dark for the next checkpoint.

2. FOLD-IN INVARIANT (was: panel recommendations silently dropped)
   - assets/verdict-schema.json: shepherd_return gains required
     recommended_followups[] channel; completion_return gains
     folded_followups[] + deferred_followups[]; extracted reusable
     followup_item definition.
   - assets/shepherd-prompt.md: fixed verdict mapping bug
     (ship_with_followups + 0 blocking -> ready-to-merge, not
     needs-author-changes); added recommended_followups extraction
     step with required source_persona + optional fold_hint tagging.
   - assets/completion-prompt.md: full rewrite. Adds
     RECOMMENDED_FOLLOWUPS input; encodes FOLD vs DEFER classifier
     (FOLD: touches diff / single helper / regression trap / hermetic
     test / inline comment; DEFER: cross-cutting refactor / new
     feature / broad doc / architectural addition); per-FOLD item
     consultation with source_persona + python-architect lens;
     DEFER items filed as gh issue create tracking issues (never
     silently dropped); mid-flight reclassify rule to avoid stalls.
   - SKILL.md adds 'Bias toward folding recommended items' invariant
     and rewrites Phase 4 spawn contract (9 steps) to thread the
     recommended_followups channel end-to-end.

Eval gate
  - +3 rubric anchors per content fixture
    (progress-diagram-header, mermaid-flowchart-rendered,
    dispatch-table-before-fanout) and +3 invariant anchors
    (recommended-followups-channel, fold-defer-classifier,
    tracking-issue-for-defer).
  - All 12 new anchors MATCH with_skill fixtures and MISS
    without_skill fixtures (clean value delta).
  - +3 no-fire trigger items for single-PR fold-in phrasing so the
    dispatcher will not misfire the batch outer-loop on single-PR
    fold work (e.g. 'fold the panel recommendations into PR #1234'
    remains apm-review-panel completion territory).

Validation
  - Schema validates via jsonschema Draft7; accepts new shapes,
    rejects shepherd_return missing recommended_followups[].
  - SKILL.md: 367 lines / 4483 tokens (caps: 500 / 5000).
  - Description: 965 / 1024 chars; mentions FOLD invariant.
  - 0 non-ASCII bytes across all modified files.
  - All 4 changed JSON files parse.

Real-task evidence (this skill iteration was driven by a live run)
  - 5 of 6 in-flight community PRs had their panel recommendations
    folded in-PR by completion subagents following the new contract,
    yielding 22 folded items and 8 deferred-to-tracking items across
    PRs #1387, #1396, #1441, #1443, #1444. The 6th (#1442) is in
    flight as this lands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* batch-bug-shepherd: add Phase 5 mergeability gate

Adds a post-wave gate that re-probes mergeability for every PR the
saga marked ready-to-merge, dispatches one conflict-resolution
subagent per CONFLICTING PR, and partitions returns into four
post-gate statuses before the final report claims anything is
mergeable.

Mergeability is post-wave truth, not pre-wave assumption: a PR
that Phase 4 marked ready can stop being mergeable the moment the
maintainer lands another PR onto main. Without this gate the
report ships stale ready claims.

Design driven through the genesis skill end-to-end (steps 1-6
handoff packet, steps 7a-7b coder pass, step 8 validation):

- NEW Phase 5 (mergeability gate) between completion (Phase 4) and
  renamed final report (Phase 5 -> Phase 6).
- Sub-phases 5a probe (read-only, single-thread, gh pr view --json
  mergeStateStatus), 5b fan-out (one conflict-resolution subagent
  per CONFLICTING PR), 5c trust-but-verify re-probe + four-way
  partition (resolved / requires-author-action /
  requires-human-judgment / resolution-failed).
- NEW assets/conflict-resolution-prompt.md spawn body for 5b.
  Encodes rebase, faithful merge of both intents, mutation-break
  re-check, lint silent, --force-with-lease push, re-probe,
  resolution-confirmation comment.
- NEW references/mergeability-gate.md load-on-demand orchestrator
  step-by-step (load trigger: WHEN ENTERING PHASE 5). Keeps
  SKILL.md under 5000-token budget.
- Schema extends verdict-schema.json oneOf with
  conflict_resolution_return; --force-with-lease enforced via
  regex pattern guard on push_command field; bare --force
  rejected. Five rejection cases validated.
- Two-comment-per-PR cap as new architecture invariant: at most
  one completion-confirmation (Phase 4) + one
  resolution-confirmation (Phase 5b) per PR.
- Progress diagram extended with WAVE4 subgraph (P5a/P5b/P5c),
  skipped-state semantics, P5b dispatch table requirement.
- Final report extended with three new partition sections plus a
  RESOLUTION CONFIRMATION COMMENT block and mergeability-gate
  disciplines line.
- Evals: +3 content rubric anchors (mergeability-probe-cli,
  force-with-lease-on-push, post-wave-partition-columns) + 1
  optional anchor (two-comment-cap); +1 fire + 2 no-fire trigger
  items; fixture diff shows the gate firing on a sweep with 2
  conflicting PRs and the without-skill failure mode (stale ready
  claim).

SKILL.md: 388 lines / 4867 tokens (budget 500/5000). ASCII only.
CI lint pair silent.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(bbs): add Phase 1.5 strategic-alignment gate + PRINCIPLES.md

Adds a new wave between Phase 1 (triage) and Phase 2 (PR-in-flight
cross-reference) that checks every LEGIT bug against the project's
rejection contract before spending shepherd / fix / completion work
on it.

What changes:

- NEW PRINCIPLES.md at repo root: 7 numbered principles encoding
  the project's hard nos (P1 no invented frontmatter; P2 multi-harness
  with traction gating; P3 vendor neutral; P4 UX floor is not a
  trade) plus 3 supporting principles (P5 portability; P6 reliability
  over magic; P7 community over feature count). Bound to apm-ceo +
  bbs Phase 1.5 + apm-triage-panel + apm-review-panel as the
  rejection contract.

- NEW bbs Phase 1.5 strategic-alignment gate (WAVE 1.5):
  - one apm-ceo subagent per LEGIT row, in parallel
  - 4-state verdict: aligned | aligned-with-reservations |
    out-of-scope | wrong-direction
  - schema-validated returns; FAILS OPEN on infrastructure failure
    (malformed-x2 or non-citable principle) so legit bugs are
    never silently demoted under gate breakage
  - ABORTS only when apm-ceo.agent.md or PRINCIPLES.md itself is
    missing (operator-actionable error)
  - demoted rows flip to status triaged-deferred and SKIP Phase
    2/3/4/5; surface in Phase 6 under 'Recommend close as
    out-of-scope' partition
  - aligned-with-reservations rows stay in saga; downstream phases
    surface the reservations in review prose
  - deferred-PR strategic-rejection comment subagent (S7+S4+A9)
    posts a courtesy comment on any open PR whose underlying issue
    was demoted, using the would-be Phase-4 completion-comment
    slot (two-comments-per-PR cap preserved)

- Verdict schema extended with 5th oneOf member
  strategic_alignment_return (kind, issue, verdict, cited_principle,
  rationale, reservations).

- Ground-truth table grows two columns (strategic_verdict +
  strategic_rationale) and one status value (triaged-deferred).

- Progress diagram inserts P15 between P1 and P2; dispatch-table
  contract extends to Phase 1.5.

- Final-report template adds 'Recommend close as out-of-scope'
  partition and 'Aligned with reservations' surfacing section.

- 2 new fire trigger evals + 1 no-fire (PRINCIPLES.md authoring
  guard) + 1 new rubric anchor on the three-issues-mixed scenario.

Genesis design artifact lives in the session plan store; SKILL.md
body remains within 500-line / 5000-token budget (406 lines /
4943 tokens after trimming pre-existing verbose passages on
operator-visibility, mergeability, fold-in, composition, and
operating-contract sections to make room).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: danielmeppiel <danielmeppiel@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] apm audit reports false orphaned drift for non-skill primitives on targets with auto_create=False

3 participants