Skip to content

fix(install): allow credential helpers in --update preflight for generic hosts#1084

Merged
danielmeppiel merged 10 commits into
microsoft:mainfrom
tillig:feature/install-update-auth-fix
May 1, 2026
Merged

fix(install): allow credential helpers in --update preflight for generic hosts#1084
danielmeppiel merged 10 commits into
microsoft:mainfrom
tillig:feature/install-update-auth-fix

Conversation

@tillig
Copy link
Copy Markdown
Contributor

@tillig tillig commented Apr 30, 2026

Description

The preflight auth probe in apm install --update used the locked-down git env (GIT_CONFIG_GLOBAL=/dev/null, GIT_CONFIG_NOSYSTEM=1, GIT_ASKPASS=echo) for all hosts, which blocked credential helpers configured in ~/.gitconfig. For generic hosts (GHES, GitLab, Bitbucket, etc.) where no token is embedded in the URL, this caused authentication to always fail even though apm install (without --update) worked fine.

The fix builds the probe env from _dl.git_env + dep_ctx.git_env (preserving TLS/CA/proxy settings from the resolver), then strips only the three credential-helper-blocking vars (GIT_CONFIG_GLOBAL, GIT_CONFIG_NOSYSTEM, GIT_ASKPASS) for generic hosts. ADO hosts continue using the full locked-down env with tokens embedded in the URL.

Fixes #1082

Type of change

  • Bug fix
  • New feature
  • Documentation
  • Maintenance / refactor

Testing

  • Tested locally
  • All existing tests pass
  • Added tests for new functionality (if applicable)

Test details

  • Added 4 new unit tests in tests/unit/install/test_pipeline_auth_preflight.py:
    • Generic hosts omit GIT_CONFIG_GLOBAL=/dev/null
    • Generic hosts don't set GIT_ASKPASS=echo
    • Generic host auth failures still raise AuthenticationError
    • ADO hosts retain credential-blocking env vars
  • Validated end-to-end against a GHES-hosted private repo that previously failed
  • Full unit suite passes (5853 tests); ruff check and ruff format clean

…ric hosts (microsoft#1082)

The preflight auth probe in `apm install --update` used the locked-down
git env (GIT_CONFIG_GLOBAL=/dev/null) for all hosts, which blocked
credential helpers configured in ~/.gitconfig. For generic hosts (GHES,
GitLab, Bitbucket, etc.) where no token is embedded in the URL, this
caused authentication to always fail. Use _build_noninteractive_git_env()
for generic hosts, matching the behavior of the clone fallback path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 30, 2026 19:41
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes apm install --update preflight auth probing so generic (non-GitHub / non-ADO) git hosts can use credential helpers configured in user git config, aligning the probe with the clone fallback behavior.

Changes:

  • Use a relaxed, noninteractive git environment for generic hosts during the git ls-remote preflight probe.
  • Keep the locked-down environment for Azure DevOps hosts (token-in-URL path).
  • Add unit tests covering generic-host env expectations and ADO behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/apm_cli/install/pipeline.py Switches preflight probe env selection based on host type (generic vs ADO).
tests/unit/install/test_pipeline_auth_preflight.py Adds tests asserting env vars are not set in a way that blocks credential helpers and that ADO doesn’t use the relaxed env.
CHANGELOG.md Documents the fix for apm install --update on generic hosts using credential helpers.

Comment thread src/apm_cli/install/pipeline.py Outdated
Comment thread src/apm_cli/install/pipeline.py Outdated
- Merge dep_ctx.git_env for generic hosts to preserve TLS/CA/proxy
  settings from the resolver, then strip only the credential-helper-
  blocking vars (GIT_CONFIG_GLOBAL, GIT_CONFIG_NOSYSTEM, GIT_ASKPASS).
- Remove call to private _build_noninteractive_git_env() method,
  reducing cross-module coupling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tillig tillig requested a review from Copilot April 30, 2026 20:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/apm_cli/install/pipeline.py
Comment thread tests/unit/install/test_pipeline_auth_preflight.py Outdated
Rename test_generic_host_env_allows_askpass to
test_generic_host_env_does_not_force_askpass_echo to better reflect
what is actually being asserted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread tests/unit/install/test_pipeline_auth_preflight.py Outdated
Comment thread tests/unit/install/test_pipeline_auth_preflight.py Outdated
tillig and others added 2 commits April 30, 2026 13:52
Consolidate two weaker tests into one that asserts GIT_CONFIG_GLOBAL,
GIT_CONFIG_NOSYSTEM, and GIT_ASKPASS are all absent from the probe env
for generic hosts, rather than checking individual values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Comment thread tests/unit/install/test_pipeline_auth_preflight.py Fixed
@danielmeppiel
Copy link
Copy Markdown
Collaborator

APM Review Panel — Verdict: APPROVE

Run locally (no gh-aw) at the board's request. One comment, no labels touched. Calibrated ship-bar: only correctness/architecture/security-blocking issues qualify as required[]; everything else is a nit.

Deterministic verdict: sum(required) across 6 active panelists = 0APPROVE.


CEO arbitration (synthesizer)

This PR eliminates a hard blocker for every enterprise user running GHES, GitLab, or Bitbucket with a native credential helper — the exact population APM must win to prove multi-host credibility. By surgically scoping the locked-down env to github.com and ADO (where APM controls the token path), it restores the contract that generic hosts delegate auth to the user's configured credential stack. This is the kind of fix that turns "does not work in my org" into "just works" — table-stakes for enterprise adoption velocity.

Principle alignment

  • Multi-harness / multi-host: directly restores GHES, GitLab, and Bitbucket as supported dependency sources by unblocking their credential flows.
  • Pragmatic-as-npm: npm never blocks system credential helpers for registry auth — APM now matches that expectation for git hosts outside its managed token path.

Growth amplification. @tillig is an external contributor filing a real bug from a real enterprise environment and shipping the fix — the contributor-funnel beachhead the project needs. The oss-growth-hacker correctly identifies enterprise-private-git as a concrete adoption surface. Frame the next release notes around "credential-helper support for enterprise git hosts" to convert this bug-fix into a positioning signal.

Dissent. None. All six active panelists returned APPROVE with zero required findings; nits are unanimous polish suggestions.


Per-persona summary

Persona Verdict Required Nits One-line
Python Architect ✅ APPROVE 0 2 Surgical change; suggests hoisting env-var tuple to a module-level constant.
CLI Logging Expert ✅ APPROVE 0 2 No new CLI strings, no encoding regression; auth-signal substring list could miss credential-helper-specific failures (follow-up).
DevX UX Expert ✅ APPROVE 0 0 "Pure behavioral fix that restores symmetry. CHANGELOG meets 'failure mode is the product' bar. Ship."
Supply Chain Security Expert ✅ APPROVE 0 2 insteadOf-redirect not exploitable (probe is read-only ls-remote); host classification uses exact/suffix matching (not spoofable); no token leak introduced.
OSS Growth Hacker ✅ APPROVE 0 3 First external bug-fix landing the GHES/GitLab/Bitbucket + credential-helper path; mine for release-notes beat.
Auth Expert (active) ✅ APPROVE 0 3 GHES correctly classified as generic; symmetry with _clone_with_fallback._env_for confirmed; resolve_for_dep invariant intact; bearer-header injection preserved; per-iteration env dict prevents cross-host leakage.

Architecture diagrams

Component / data flow — what the fix touches in the install pipeline:

flowchart TD
    A[apm install --update] --> B[InstallPipeline._preflight_auth_check]
    B --> C{is_generic = not GitHub and not ADO}
    C -->|github.com| D["probe_env: locked down<br/>(GIT_CONFIG_GLOBAL=/dev/null,<br/>GIT_CONFIG_NOSYSTEM=1,<br/>GIT_ASKPASS=echo)"]
    C -->|*.dev.azure.com<br/>*.visualstudio.com| D
    C -->|GHES, GitLab, Bitbucket,<br/>self-hosted| E["probe_env: pop the 3<br/>blocking vars<br/>(allow ~/.gitconfig +<br/>credential helpers)"]
    D --> F[git ls-remote probe]
    E --> F
    F --> G{probe ok?}
    G -->|yes| H[continue install]
    G -->|no, auth signal| I["AuthenticationError<br/>(host named, redacted URL)"]
    F -.preserved.-> J["bearer header injection<br/>via GIT_CONFIG_COUNT/KEY/VALUE<br/>(NOT popped)"]
Loading

Sequence — the unblocked path for a GHES dep with git-credential-manager:

sequenceDiagram
    participant U as User
    participant CLI as apm install --update
    participant PF as _preflight_auth_check
    participant Env as probe_env
    participant Git as git ls-remote
    participant Helper as git-credential-manager

    U->>CLI: apm install --update (GHES dep)
    CLI->>PF: probe(host="ghes.corp.example.com")
    PF->>PF: is_generic? yes (not github.com, not ADO)
    PF->>Env: pop GIT_CONFIG_GLOBAL,<br/>GIT_CONFIG_NOSYSTEM,<br/>GIT_ASKPASS
    PF->>Git: ls-remote with relaxed env
    Git->>Helper: read credential (via ~/.gitconfig)
    Helper-->>Git: token
    Git-->>PF: refs OK
    PF-->>CLI: probe ok
    CLI-->>U: install proceeds
Loading

Recommended follow-ups (post-merge, none blocking)

  1. Auth Expert nit, highest signal: add a regression test asserting is_github_hostname("ghes.corp.example.com") is False so the GHES classification contract this fix depends on cannot silently regress.
  2. Growth Hacker nit: frame the next release-notes line around "credential-helper support for enterprise git hosts (GHES / GitLab / Bitbucket)" and credit @tillig as the first external bug-fix on this surface.
  3. Python Architect nit: hoist the (GIT_CONFIG_GLOBAL, GIT_CONFIG_NOSYSTEM, GIT_ASKPASS) tuple to a module-level _CREDENTIAL_BLOCKING_ENV_VARS constant when a third call site appears.

Ship recommendation: merge as-is. Surgical, well-tested (4 new unit tests covering all three env vars + ADO retention + auth-failure still raising), CHANGELOG entry accurate, no architecture/security/auth blockers across the full 6-persona panel.

…ring-sanitization

Replace substring 'in' check with .startswith() against the exact
'Authentication failed for {host}' prefix raised by _preflight_auth_check.
Functionally equivalent for this assertion (the host is the first token
of the error message), but precise enough that CodeQL no longer flags it
as an incomplete URL substring sanitization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit 7817060 into microsoft:main May 1, 2026
10 checks passed
danielmeppiel added a commit that referenced this pull request May 1, 2026
… self-healing (#1093)

* refactor(panel): VERDICT regime -> ADVISORY regime; add doc-writer; eval harness; legacy-label sweep

The apm-review-panel skill posted binary REJECT verdicts on PRs whose
real signal was a handful of recommended improvements + polish nits.
Root cause: the panelist schema forced every finding into required[]
(blocks merge) XOR nits[] (skip), with no middle slot. Careful reviewers
defaulted to required defensively -> deterministic REJECT -> noise.

This PR refactors the panel from a gate regime to an advisory regime:

- New panelist schema: unified findings[] with severity enum
  (blocking | recommended | nit). Severity is signal strength,
  never read by the orchestrator to gate anything.
- New CEO schema: returns ship_recommendation { stance, prose }
  where stance is one of ship_now | ship_with_followups |
  needs_discussion | needs_rework. Prose for the human reviewer,
  never auto-applied as a label or status check.
- New comment template (assets/recommendation-template.md): top-loads
  CEO synthesis + per-persona summary table + top-N curated follow-ups
  + (optional) architecture diagrams + recommendation prose. Full
  per-persona findings collapsed at the bottom. Mirrors the manual
  comment shape on PR #1084.
- New conditional persona: doc-writer. Activates on README, CHANGELOG,
  docs/, .apm/skills|agents/, .github/skills|agents|instructions/,
  and .github/workflows/*.md changes. Also detects DRIFT (code changed
  but docs didn't), not just doc edits.
- Workflow safe-outputs: drop add-labels (no verdict to encode);
  remove-labels now sweeps panel-review (re-run idempotency) AND
  legacy panel-approved / panel-rejected (defensive cleanup of stale
  verdict labels left over from the pre-advisory regime).
- Delete pr-panel-label-reset.yml: no verdict labels to reset.
- Eval harness: render_eval.py + two fixtures (ship-now shape from
  PR #1084, needs-rework shape with blocking-severity findings).
  Specification test that confirms the rendered comment is short,
  scannable, and doesn't bury the lede.

Closes #1091

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(panel): add test-coverage-expert conditional persona

Pairs with the DevX UX lens to assess whether behavior-change PRs are
protected by tests that catch regressions in user-facing promises.
Conditional like auth-expert and doc-writer: orchestrator always spawns
for schema-shape uniformity; persona self-reports active=false when
off-condition (pure docs / pure refactor / pure asset bumps).

Calibrated severity: blocking ONLY when a critical user-promise surface
(auth, lockfile, install, marketplace, hooks, exit codes, error wording)
changes AND no test exercises the new behavior AND the persona can name
the specific test file path. Mandatory probe: must view/grep the test
tree before emitting any 'no test exists' claim -- false positives here
destroy maintainer trust.

Boundaries documented vs python-architect (test code design),
auth-expert (auth-specific assertions), and devx-ux-expert (which user
promises exist).

- New agent: .apm/agents/test-coverage-expert.agent.md (mirrored to
  .github/agents/)
- SKILL.md: roster (5 mandatory + 3 conditional), topology, conditional
  intro count, new section, routing matrix, fan-out list, gotchas
- panelist-return-schema.json: persona enum + active description
- Eval fixtures 01 + 02: TC row added; needs-rework demonstrates the
  highest-value case (path-traversal regression where the regression-
  trap test is the test that prevents re-shipping)
- render_eval.py: PERSONA_LABELS + canonical_order extended

Verified: render_eval.py emits [OK] on both fixtures; .apm/ and
.github/ mirrors verified identical via diff -rq.

Refs #1091

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(pr-desc, panel): add Scenario Evidence rubric -- proof that it works

Single shared asset (scenario-evidence-rubric.md) consumed by two
primitives: pr-description-skill (as a required Validation subsection
on every behavior-change PR) AND test-coverage-expert.agent.md (as the
evaluation lens applied during panel review).

The rubric answers ONE question from the maintainer's chair: for each
user promise this PR touches, which test exercises the scenario, and
which APM principle does that scenario serve? Not pure coverage --
SCENARIO coverage tied to the principles named in the README:
Portability by manifest, Secure by default, Governed by policy,
Multi-harness support, Vendor-neutral, DevX (pragmatic as npm), OSS.

Composition (R3 EXTRACT + DUAL CONSUMER):
- Asset lives once at .github/skills/pr-description-skill/assets/
  scenario-evidence-rubric.md (mirrored to .apm/).
- pr-description-skill: pr-body-template.md adds Scenario Evidence
  subsection; section-rubric.md adds 9b acceptance + refusal criteria
  and a final-pass checkbox; SKILL.md activation contract adds
  scenario-test-mapping input; required-body-structure row 9 updated.
- test-coverage-expert: review procedure now READS the table FIRST,
  audits each row against the diff, then probes for unmapped behavior.
  New gap classes added: mocked-boundary-on-security-scenario
  (blocking; the rubric explicitly refuses tautological tests) and
  principle-mismapping (recommended).

Skip clause for the table: docs-only / asset-bump-only / pure-refactor
PRs (author states which case applies in trade-offs).

Bug-fix PRs MUST include the regression-trap test row tagged with the
issue it would have caught -- closes the loop with the
test-coverage-expert's missing-regression-trap finding class.

Verified: render_eval emits [OK] on both fixtures (no panel-side schema
change). .apm/ and .github/ mirrors verified identical via diff -rq.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(panel): test evidence is load-bearing -- proof beats opinion

Tests, when coded right, are irrefutable on a given commit. The CEO
synthesizer should weight a passed/failed test outcome above any
opinion-only finding. This commit wires that discipline into the
panel contract.

Schema: panelist-return-schema.json gains optional finding.evidence
{ outcome, test_file, test_name, assertion_excerpt, proves, principles }.
test-coverage-expert REQUIRES it on every finding (its contract is
stricter than other panelists -- the evidence IS its value-add).

apm-ceo: new 'Treat test evidence as load-bearing' subsection.
- outcome=passed: proves the user promise holds; cite test+assertion
- outcome=failed: strongest signal short of a CVE; surface in headline
- outcome=missing on secure-by-default / governed-by-policy / portability
  surface ranks at or near blocking-tier opinion findings (the absence
  of an automated guardrail IS a real defect on those surfaces)
- outcome=manual: arbitrated as missing
- outcome=unknown: discarded from arbitration weight
- two-tier rule for follow-ups: failed-test rows always rank above
  opinion-only rows of the same severity; missing-test on critical
  promise rank above any opinion 'recommended' finding

Render: render_eval.py + recommendation-template.md surface a 'Proof'
line per finding showing test ref, outcome, what it proves, and the
APM principles it covers, with the verbatim assertion excerpt below.

Fixtures 01 + 02 demonstrate both sides:
- 01: outcome=passed on the credential-helper env-var contract
- 02: outcome=missing on path-traversal regression-trap (secure-by-
  default, blocking) + integration boundary (portability, recommended)

Both fixtures still render [OK].

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(panel): @-ping audience + tighten test-coverage activation

Two coupled fixes folded into PR #1093 via the genesis discipline.

1) Restore the maintainer-notification signal that the verdict-label
   sweep removed. The panel now resolves an @-ping audience from
   'gh pr view --json author,reviewRequests' (PR author +
   CODEOWNERS-resolved requested reviewers + requested teams), filters
   bots, caps at 6 handles, and renders a 'cc @author @Reviewer ...'
   line right under the headline. This is the only mechanism by which
   a fresh advisory pass surfaces in inboxes; without it, a panel
   re-run on a previously-reviewed PR is silent.

2) Make test-coverage-expert mandatory on every PR that touches
   'src/**/*.py'. The persona was conditional for symmetry with
   auth/doc-writer, but that symmetry broke when test evidence became
   load-bearing in CEO arbitration (commit e502fa2). A persona whose
   findings outweigh opinion cannot be silently skipped on a
   heuristic. Skip ONLY on documentation-only PRs (zero src/*.py
   files in the diff).

Eval fixtures updated: fixture 02 now carries a non-empty
notify_audience to prove the cc line renders; fixture 01 leaves it
empty to prove the line is skipped (not rendered as placeholder).
Both evals green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(panel): re-sync stale .apm/ fixture rendering

The .apm/ rendered fixture for 01 was stale from before commit
e502fa2 (proof-evidence rendering). 'apm install' regenerates
.github/ from .apm/, so the integration drift check failed because
.github/ had the Proof line but .apm/ source did not. Verified
clean via local 'apm install'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(panel): land python-architect class diagram in top comment

Adds extras.diagrams.class_diagram schema slot, renders it first under
### Architecture, asks the persona for it explicitly, and regenerates
both rendered.md fixtures so the panel comment shows load-bearing
visual evidence without requiring readers to expand the per-persona
findings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <copilot-rework@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tillig tillig deleted the feature/install-update-auth-fix branch May 4, 2026 22:35
sergio-sisternes-epam pushed a commit that referenced this pull request May 19, 2026
…ric hosts (#1084)

* fix(install): allow credential helpers in --update preflight for generic hosts (#1082)

The preflight auth probe in `apm install --update` used the locked-down
git env (GIT_CONFIG_GLOBAL=/dev/null) for all hosts, which blocked
credential helpers configured in ~/.gitconfig. For generic hosts (GHES,
GitLab, Bitbucket, etc.) where no token is embedded in the URL, this
caused authentication to always fail. Use _build_noninteractive_git_env()
for generic hosts, matching the behavior of the clone fallback path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs(changelog): add unreleased entry for #1082

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(install): address review feedback on preflight auth probe

- Merge dep_ctx.git_env for generic hosts to preserve TLS/CA/proxy
  settings from the resolver, then strip only the credential-helper-
  blocking vars (GIT_CONFIG_GLOBAL, GIT_CONFIG_NOSYSTEM, GIT_ASKPASS).
- Remove call to private _build_noninteractive_git_env() method,
  reducing cross-module coupling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: rename test for clarity per review feedback

Rename test_generic_host_env_allows_askpass to
test_generic_host_env_does_not_force_askpass_echo to better reflect
what is actually being asserted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: tighten assertions to check vars are absent, not just wrong

Consolidate two weaker tests into one that asserts GIT_CONFIG_GLOBAL,
GIT_CONFIG_NOSYSTEM, and GIT_ASKPASS are all absent from the probe env
for generic hosts, rather than checking individual values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: tighten URL assertion to satisfy CodeQL py/incomplete-url-substring-sanitization

Replace substring 'in' check with .startswith() against the exact
'Authentication failed for {host}' prefix raised by _preflight_auth_check.
Functionally equivalent for this assertion (the host is the first token
of the error message), but precise enough that CodeQL no longer flags it
as an incomplete URL substring sanitization.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* style: apply ruff format to test_pipeline_auth_preflight.py

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Daniel Meppiel <51440732+danielmeppiel@users.noreply.github.com>
Co-authored-by: Copilot <copilot-rework@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sergio-sisternes-epam pushed a commit that referenced this pull request May 19, 2026
… self-healing (#1093)

* refactor(panel): VERDICT regime -> ADVISORY regime; add doc-writer; eval harness; legacy-label sweep

The apm-review-panel skill posted binary REJECT verdicts on PRs whose
real signal was a handful of recommended improvements + polish nits.
Root cause: the panelist schema forced every finding into required[]
(blocks merge) XOR nits[] (skip), with no middle slot. Careful reviewers
defaulted to required defensively -> deterministic REJECT -> noise.

This PR refactors the panel from a gate regime to an advisory regime:

- New panelist schema: unified findings[] with severity enum
  (blocking | recommended | nit). Severity is signal strength,
  never read by the orchestrator to gate anything.
- New CEO schema: returns ship_recommendation { stance, prose }
  where stance is one of ship_now | ship_with_followups |
  needs_discussion | needs_rework. Prose for the human reviewer,
  never auto-applied as a label or status check.
- New comment template (assets/recommendation-template.md): top-loads
  CEO synthesis + per-persona summary table + top-N curated follow-ups
  + (optional) architecture diagrams + recommendation prose. Full
  per-persona findings collapsed at the bottom. Mirrors the manual
  comment shape on PR #1084.
- New conditional persona: doc-writer. Activates on README, CHANGELOG,
  docs/, .apm/skills|agents/, .github/skills|agents|instructions/,
  and .github/workflows/*.md changes. Also detects DRIFT (code changed
  but docs didn't), not just doc edits.
- Workflow safe-outputs: drop add-labels (no verdict to encode);
  remove-labels now sweeps panel-review (re-run idempotency) AND
  legacy panel-approved / panel-rejected (defensive cleanup of stale
  verdict labels left over from the pre-advisory regime).
- Delete pr-panel-label-reset.yml: no verdict labels to reset.
- Eval harness: render_eval.py + two fixtures (ship-now shape from
  PR #1084, needs-rework shape with blocking-severity findings).
  Specification test that confirms the rendered comment is short,
  scannable, and doesn't bury the lede.

Closes #1091

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(panel): add test-coverage-expert conditional persona

Pairs with the DevX UX lens to assess whether behavior-change PRs are
protected by tests that catch regressions in user-facing promises.
Conditional like auth-expert and doc-writer: orchestrator always spawns
for schema-shape uniformity; persona self-reports active=false when
off-condition (pure docs / pure refactor / pure asset bumps).

Calibrated severity: blocking ONLY when a critical user-promise surface
(auth, lockfile, install, marketplace, hooks, exit codes, error wording)
changes AND no test exercises the new behavior AND the persona can name
the specific test file path. Mandatory probe: must view/grep the test
tree before emitting any 'no test exists' claim -- false positives here
destroy maintainer trust.

Boundaries documented vs python-architect (test code design),
auth-expert (auth-specific assertions), and devx-ux-expert (which user
promises exist).

- New agent: .apm/agents/test-coverage-expert.agent.md (mirrored to
  .github/agents/)
- SKILL.md: roster (5 mandatory + 3 conditional), topology, conditional
  intro count, new section, routing matrix, fan-out list, gotchas
- panelist-return-schema.json: persona enum + active description
- Eval fixtures 01 + 02: TC row added; needs-rework demonstrates the
  highest-value case (path-traversal regression where the regression-
  trap test is the test that prevents re-shipping)
- render_eval.py: PERSONA_LABELS + canonical_order extended

Verified: render_eval.py emits [OK] on both fixtures; .apm/ and
.github/ mirrors verified identical via diff -rq.

Refs #1091

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(pr-desc, panel): add Scenario Evidence rubric -- proof that it works

Single shared asset (scenario-evidence-rubric.md) consumed by two
primitives: pr-description-skill (as a required Validation subsection
on every behavior-change PR) AND test-coverage-expert.agent.md (as the
evaluation lens applied during panel review).

The rubric answers ONE question from the maintainer's chair: for each
user promise this PR touches, which test exercises the scenario, and
which APM principle does that scenario serve? Not pure coverage --
SCENARIO coverage tied to the principles named in the README:
Portability by manifest, Secure by default, Governed by policy,
Multi-harness support, Vendor-neutral, DevX (pragmatic as npm), OSS.

Composition (R3 EXTRACT + DUAL CONSUMER):
- Asset lives once at .github/skills/pr-description-skill/assets/
  scenario-evidence-rubric.md (mirrored to .apm/).
- pr-description-skill: pr-body-template.md adds Scenario Evidence
  subsection; section-rubric.md adds 9b acceptance + refusal criteria
  and a final-pass checkbox; SKILL.md activation contract adds
  scenario-test-mapping input; required-body-structure row 9 updated.
- test-coverage-expert: review procedure now READS the table FIRST,
  audits each row against the diff, then probes for unmapped behavior.
  New gap classes added: mocked-boundary-on-security-scenario
  (blocking; the rubric explicitly refuses tautological tests) and
  principle-mismapping (recommended).

Skip clause for the table: docs-only / asset-bump-only / pure-refactor
PRs (author states which case applies in trade-offs).

Bug-fix PRs MUST include the regression-trap test row tagged with the
issue it would have caught -- closes the loop with the
test-coverage-expert's missing-regression-trap finding class.

Verified: render_eval emits [OK] on both fixtures (no panel-side schema
change). .apm/ and .github/ mirrors verified identical via diff -rq.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(panel): test evidence is load-bearing -- proof beats opinion

Tests, when coded right, are irrefutable on a given commit. The CEO
synthesizer should weight a passed/failed test outcome above any
opinion-only finding. This commit wires that discipline into the
panel contract.

Schema: panelist-return-schema.json gains optional finding.evidence
{ outcome, test_file, test_name, assertion_excerpt, proves, principles }.
test-coverage-expert REQUIRES it on every finding (its contract is
stricter than other panelists -- the evidence IS its value-add).

apm-ceo: new 'Treat test evidence as load-bearing' subsection.
- outcome=passed: proves the user promise holds; cite test+assertion
- outcome=failed: strongest signal short of a CVE; surface in headline
- outcome=missing on secure-by-default / governed-by-policy / portability
  surface ranks at or near blocking-tier opinion findings (the absence
  of an automated guardrail IS a real defect on those surfaces)
- outcome=manual: arbitrated as missing
- outcome=unknown: discarded from arbitration weight
- two-tier rule for follow-ups: failed-test rows always rank above
  opinion-only rows of the same severity; missing-test on critical
  promise rank above any opinion 'recommended' finding

Render: render_eval.py + recommendation-template.md surface a 'Proof'
line per finding showing test ref, outcome, what it proves, and the
APM principles it covers, with the verbatim assertion excerpt below.

Fixtures 01 + 02 demonstrate both sides:
- 01: outcome=passed on the credential-helper env-var contract
- 02: outcome=missing on path-traversal regression-trap (secure-by-
  default, blocking) + integration boundary (portability, recommended)

Both fixtures still render [OK].

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(panel): @-ping audience + tighten test-coverage activation

Two coupled fixes folded into PR #1093 via the genesis discipline.

1) Restore the maintainer-notification signal that the verdict-label
   sweep removed. The panel now resolves an @-ping audience from
   'gh pr view --json author,reviewRequests' (PR author +
   CODEOWNERS-resolved requested reviewers + requested teams), filters
   bots, caps at 6 handles, and renders a 'cc @author @Reviewer ...'
   line right under the headline. This is the only mechanism by which
   a fresh advisory pass surfaces in inboxes; without it, a panel
   re-run on a previously-reviewed PR is silent.

2) Make test-coverage-expert mandatory on every PR that touches
   'src/**/*.py'. The persona was conditional for symmetry with
   auth/doc-writer, but that symmetry broke when test evidence became
   load-bearing in CEO arbitration (commit e502fa2). A persona whose
   findings outweigh opinion cannot be silently skipped on a
   heuristic. Skip ONLY on documentation-only PRs (zero src/*.py
   files in the diff).

Eval fixtures updated: fixture 02 now carries a non-empty
notify_audience to prove the cc line renders; fixture 01 leaves it
empty to prove the line is skipped (not rendered as placeholder).
Both evals green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(panel): re-sync stale .apm/ fixture rendering

The .apm/ rendered fixture for 01 was stale from before commit
e502fa2 (proof-evidence rendering). 'apm install' regenerates
.github/ from .apm/, so the integration drift check failed because
.github/ had the Proof line but .apm/ source did not. Verified
clean via local 'apm install'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(panel): land python-architect class diagram in top comment

Adds extras.diagrams.class_diagram schema slot, renders it first under
### Architecture, asks the persona for it explicitly, and regenerates
both rendered.md fixtures so the panel comment shows load-bearing
visual evidence without requiring readers to expand the per-persona
findings.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <copilot-rework@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] apm install --update fails for GHES/generic hosts due to pipeline probe blocking credential helpers

4 participants