History-trained agentic files + expert reviewer #35198

Merged
PureWeen merged 16 commits into main from feature/expert-reviewer-extraction on May 7, 2026

kubaflo commented Apr 28, 2026

Contributor

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Description

Replaces review-rules.md (flat 345-line checklist) with a dimensional expert review agent. Single source of truth for all review rules, organized into 30 dimensions for per-dimension sub-agent evaluation. Adds inline file:line PR comments alongside the existing wall-of-text summary.

Extracted from 28k review comments across 5 maintainers via extraction-pipeline. No functional code changes.

Recreated from #35062 on a dotnet/maui branch (originally opened from a fork).

What changed

Before: review-rules.md had 345 lines of flat rules. code-review skill loaded them all into one context. Output was a single wall-of-text PR comment.

After: Rules absorbed into maui-expert-reviewer.md as 30 dimensions with 200+ CHECK items. Each dimension runs as an independent sub-agent with focused context. Output is inline file:line PR comments via inline-findings.json.

CI Flow

Review-PR.ps1 prompt:
  1. code-review → maui-expert-reviewer agent → inline-findings.json
  2. pr-review → Pre-Flight → Try-Fix → Report (sees findings, no duplication)

Posting:
  post-inline-review.ps1    → .json → GitHub file:line comments (NEW)
  post-ai-summary-comment.ps1 → {phase}/content.md → wall-of-text (existing)

CI: COMMENTS_VIA_FILE=true → agent writes .json, script posts
Local: agent writes .json, code-review posts directly via gh api

Files

| Action | File | What |
| --- | --- | --- |
| Add | agents/maui-expert-reviewer.md | 30 dimensions, 200+ CHECKs, routing table |
| Add | instructions/collectionview-{android,ios,windows} | Platform-isolated CV rules |
| Add | instructions/{handler-patterns,layout-system,performance-hotpaths,public-api,threading-async} | Domain-specific ambient guidance |
| Add | scripts/post-inline-review.ps1 | Posts .json as GitHub PR review |
| Del | skills/code-review/references/review-rules.md | Absorbed into agent |
| Mod | skills/code-review/SKILL.md | Delegates to agent |
| Mod | scripts/Review-PR.ps1 | Prompt + inline posting wiring |
| Mod | eng/pipelines/ci-copilot.yml | COMMENTS_VIA_FILE env var |

Copilot AI review requested due to automatic review settings April 28, 2026 14:03
@kubaflo kubaflo marked this pull request as draft April 28, 2026 14:04

Copilot AI left a comment

Contributor

Pull request overview

This PR restructures the Copilot PR code review guidance from a single flat checklist into a “dimensional” expert reviewer agent that emits file/line inline findings, and wires CI/scripts to post those findings as a GitHub PR review.

Changes:

  • Replace the legacy review-rules.md checklist with the new .github/agents/maui-expert-reviewer.md and domain/platform instruction files.
  • Update the code-review skill to delegate to the expert reviewer and support inline-findings.json output.
  • Add post-inline-review.ps1 and CI wiring to post inline review comments via GitHub’s Reviews API.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| eng/pipelines/ci-copilot.yml | Enables COMMENTS_VIA_FILE in CI for file-based inline findings flow. |
| .github/skills/code-review/tests/eval.yaml | Updates eval rubric text to reflect expert reviewer dimensions. |
| .github/skills/code-review/references/review-rules.md | Removes the legacy flat checklist source file. |
| .github/skills/code-review/SKILL.md | Switches code-review guidance to delegate to the expert reviewer + inline findings. |
| .github/scripts/post-inline-review.ps1 | New script to post inline-findings.json as a single PR review with inline comments. |
| .github/scripts/Review-PR.ps1 | Wires the review pipeline to request inline findings and post them after the summary. |
| .github/instructions/threading-async.instructions.md | Adds threading/async guidance for platform and handler code. |
| .github/instructions/public-api.instructions.md | Adds PublicAPI.Unshipped guidance for API surface changes. |
| .github/instructions/performance-hotpaths.instructions.md | Adds hot-path performance rules for layout/handlers areas. |
| .github/instructions/layout-system.instructions.md | Adds layout measure/arrange contract guidance. |
| .github/instructions/handler-patterns.instructions.md | Adds handler mapper/lifecycle patterns guidance. |
| .github/instructions/collectionview-windows.instructions.md | Adds Windows CollectionView (Items/) guidance. |
| .github/instructions/collectionview-ios.instructions.md | Adds iOS/MacCatalyst CollectionView (Items2/) guidance. |
| .github/instructions/collectionview-android.instructions.md | Adds Android CollectionView (Items/) guidance. |
| .github/agents/maui-expert-reviewer.md | Introduces the 30-dimension expert reviewer agent and inline findings contract. |

@MauiBot MauiBot left a comment

Collaborator

Expert Review — 4 findings

See inline comments for details.

@MauiBot added the s/agent-changes-requested, s/agent-fix-win, and s/agent-reviewed labels Apr 28, 2026
@dotnet dotnet deleted a comment from MauiBot Apr 28, 2026
@PureWeen

Member

Great domain knowledge — let's restructure the wiring

The 30 dimensions and CHECK rules are a significant improvement over review-rules.md — more comprehensive, better structured, properly severity-graded. The instruction files, posting script, and COMMENTS_VIA_FILE decoupling are all solid building blocks. I want to keep all of that.

However, the wiring needs to change to match how we do PR reviews in MAUI.

Our core principle: the PR's fix is just another try-fix candidate — not special. Try-fix models must be independent of the PR, and the same workflow needs to work for issue-only flows where there's no PR at all. The goal is to make try-fix amazing, not to grade the PR.

(Background: PR #35105 established the firewall architecture that informs this restructuring.)

Current wiring (this PR):

Gate → Expert reviewer reviews PR → pr-review (try-fix sees reviewer output in context) → Post

Proposed restructuring:

Gate → Pre-Flight (context only) → Try-Fix ×4 (each loads domain knowledge from dimensions) → Report (expert reviewer evaluates ALL candidates: PR + try-fix results) → Post inline findings on winning fix

Key changes needed:

  1. Domain knowledge → try-fix: The 30 dimensions should feed INTO try-fix as fix-quality guidance (not review the PR upfront). Try-fix models loading these CHECK rules will produce better fixes.
  2. Expert review → Report phase: After try-fix completes, the expert reviewer evaluates all candidates symmetrically — the PR's fix is candidate #5.
  3. Separate domain knowledge from workflow: The dimensions (lines 29-436) need to be loadable independently of the wave workflow (lines 528-599), so try-fix can consume the domain rules without the review orchestration.
  4. Candidate-scoring output mode: inline-findings.json is great for posting the final review. But Report phase also needs a candidate-comparison format to rank PR vs try-fix results.

The content you've built is excellent — this is about repositioning where and when it runs in the pipeline. See inline comments for specific file-level feedback.

@PureWeen PureWeen left a comment

Member

See inline comments for specific feedback. Top-level architectural feedback posted as a separate comment above.

Comment thread .github/instructions/performance-hotpaths.instructions.md Outdated
Comment thread .github/instructions/public-api.instructions.md
@@ -559,7 +556,8 @@ $gateStatusForPrompt = switch ($gateResult) {
}

Member

This Step 2 prompt runs the expert reviewer as an upfront PR review before try-fix. In our pipeline, the PR's fix is just another candidate — we don't want to grade it before try-fix runs because expert reviewer conclusions about the PR will be in context when try-fix starts (no firewall), and it also won't generalize to issue-only flows where there's no PR to review upfront.

What do you think about removing the "First code-review" step here and moving the expert reviewer invocation into the Report phase? Try-fix would load the domain knowledge (dimensions/CHECKs) directly, and the expert reviewer would evaluate all candidates after try-fix completes.

Contributor Author

Addressed by the multi-candidate restructure already on the branch (commits dac5150 "Multi-candidate review" and 1078580 "try-fix self-apply expert-reviewer"). The Step 2 prompt now runs PR-fix evaluation in parallel with try-fix×4, and Report compares all candidates symmetrically. See discussion comment for the agreed flow.

Not resolving this thread because the related context-contamination concern (raised on May 5) is still architecturally open — both branches still share one Invoke-CopilotStep session today.

```

### Step 2: Load Review Rules
### Step 2: Delegate to Expert Reviewer

Member

Step 2 currently delegates to the expert reviewer as the first substantive action. For the restructured pipeline, this skill would be invoked in the Report phase (to evaluate all candidates) rather than upfront.

The domain knowledge (30 dimensions, CHECK rules) should be separable from the review workflow (waves, routing, output format) so try-fix can load the knowledge without invoking the full review machinery.

Consider extracting lines 29-436 (Overarching Principles + 30 Dimensions + "What NOT to Flag") into a standalone reference file (e.g., references/maui-review-dimensions.md) that both try-fix and the expert reviewer agent can load.

Contributor Author

Partial: try-fix now invokes the expert reviewer (per #35231) and per ade495a each invocation gets an attempt-scoped output path, so the dimensions are loaded indirectly via the reviewer rather than as a flat reference file. Extracting lines 29-436 into a standalone references/maui-review-dimensions.md and having try-fix Read it directly would be a cleaner factoring, but it duplicates the dimensions text and would need a sync mechanism to keep the two copies aligned. Leaving open for follow-up — happy to do the extraction if you'd prefer that shape over the current invoke-and-read pattern.


### Wave 0 — Build Briefing Pack

1. Read PR diff (`gh pr diff`) and list changed files — form your own assessment BEFORE reading PR description (independence-first)

Member

Wave 0 currently assumes evaluating a single PR diff via gh pr diff. If the expert reviewer is restructured to evaluate all candidates in the Report phase, how do you see this working when it needs to evaluate N candidate diffs (the PR's fix + 4 try-fix results)?

One option: parameterize the diff source so the wave workflow accepts a diff input rather than hardcoding gh pr diff. That would let the same workflow evaluate any candidate.

Contributor Author

Addressed in spirit by the multi-candidate restructure (dac5150) — Wave 0 is no longer the only entry point. The Report phase now compares the PR fix against all 4 try-fix candidates symmetrically, and ade495a made the agent's findings output path configurable so per-candidate evaluations can be redirected to attempt-scoped paths.

Wave 0 itself still hardcodes gh pr diff though. The cleaner long-term fix is what you suggested — parameterize the diff source so the same wave workflow can evaluate any candidate diff. Leaving open as a follow-up since it's an agent-prompt refactor, not a wiring change.

@@ -0,0 +1,32 @@
---

Member

The instruction files you've added are well-scoped and the glob patterns are thoughtful. One coverage gap to flag: 14 of the 22 review-rules.md topics don't yet have a corresponding instruction file (Navigation/Shell, Memory Leaks, XAML/Bindings, Accessibility, Images, Gestures, Build, iOS Platform, Windows Platform, etc.) — they exist only in the expert reviewer's dimensions.

This matters because applyTo: may not reliably fire in task() sub-agent contexts (try-fix runs as a sub-agent). I lean toward having try-fix explicitly load the expert reviewer's dimensions as context — single source of truth, and it covers all 30 topics rather than 8.

Contributor Author

Addressed via a different route. Rather than backfilling 14 instruction files for the missing dimensions, #35231 wired try-fix to invoke @maui-expert-reviewer, and ade495a fixed that invocation to use an attempt-scoped output path. Net effect: try-fix now sees all 30 dimensions through the reviewer pass instead of relying on applyTo: glob coverage in sub-agent contexts. The instruction files we DO have (collectionview-android, hotpaths, threading-async, etc.) still serve as ambient guidance for human + Copilot edits outside the review flow.

@MauiBot MauiBot left a comment

Collaborator

Expert Review — 5 findings

See inline comments for details.

kubaflo commented Apr 29, 2026

Contributor Author

Proposed review flow evolution (capturing discussion)

Recording an in-progress discussion so it's reviewable. Not a code change request — feedback welcome before we commit to an implementation.

Today

Gate → Expert reviewer reviews PR → pr-review (Try-Fix sees reviewer output in context) → Post

Shane's proposal

Gate → Pre-Flight (context only)
     → Try-Fix ×4 (each loads domain knowledge from dimensions)
     → Report (expert reviewer evaluates ALL candidates: PR + 4 try-fix results)
     → Post inline findings on the winning fix

Proposed modifications on top of Shane's

Try-Fix always runs — there is no early exit. Every PR gets the full ×4 try-fix sweep so the expert reviewer always has the same set of candidates to compare against.

  1. Run the expert-reviewer evaluation of the PR fix before Try-Fix kicks off, in a sandbox with reviewer feedback applied. This produces an additional candidate (the "PR fix + reviewer feedback") that goes into the Report stage alongside the raw PR fix and the 4 try-fix outputs. It does not short-circuit Try-Fix — Try-Fix still runs in parallel.

  2. Don't post inline findings if the winning candidate isn't the PR's fix. Inline file:line comments only make sense against lines that exist in the PR's diff. (We already filter for this in post-inline-review.ps1, but it would be a no-op against a non-PR candidate.)

  3. If a non-PR candidate wins → request changes asking the author to apply the AI-suggested fix. When the author pushes the change, the workflow re-triggers naturally.
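
The constraint in item 2 (inline comments only make sense on lines present in the PR's diff) can be illustrated with a small Python sketch. This is not the logic of post-inline-review.ps1, which is PowerShell and is not shown in this thread; the function names, and the choice to treat both added and context lines as commentable, are assumptions made for illustration.

```python
import re

# Matches a unified-diff hunk header such as "@@ -10,3 +10,4 @@".
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def changed_lines(unified_diff):
    """Map each file path to the new-side line numbers visible in the diff.

    Assumption: added lines and context lines both count as commentable.
    """
    result = {}
    path = None
    old_left = new_left = 0  # lines still expected in the current hunk
    new_line = 0
    for raw in unified_diff.splitlines():
        if old_left > 0 or new_left > 0:
            if raw.startswith("\\"):  # "\ No newline at end of file"
                continue
            if raw.startswith("-"):
                old_left -= 1
                continue
            if not raw.startswith("+"):  # context line exists on both sides
                old_left -= 1
            if path is not None:
                result.setdefault(path, set()).add(new_line)
            new_line += 1
            new_left -= 1
            continue
        match = HUNK_RE.match(raw)
        if match:
            old_left = int(match.group(1)) if match.group(1) is not None else 1
            new_line = int(match.group(2))
            new_left = int(match.group(3)) if match.group(3) is not None else 1
        elif raw.startswith("+++ "):
            target = raw[4:].strip()
            if target == "/dev/null":
                path = None
            else:
                path = target[2:] if target.startswith("b/") else target
    return result


def split_findings(findings, unified_diff):
    """Separate findings that land on a diff line from those that do not."""
    lines = changed_lines(unified_diff)
    postable, dropped = [], []
    for finding in findings:
        on_diff = finding["line"] in lines.get(finding["path"], set())
        (postable if on_diff else dropped).append(finding)
    return postable, dropped
```

Run against a non-PR candidate's findings, `split_findings` would typically return everything in `dropped`, which is why the proposal skips inline posting in that case and falls back to requesting changes.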

Combined shape

Gate
  └─ Pre-Flight (context only)
       ├─ Expert Reviewer eval of PR fix (sandbox-applied) ──┐
       └─ Try-Fix ×4 (each loads dimension knowledge) ──────┤
                                                             ▼
                                                          Report
                                  (expert reviewer evaluates ALL candidates:
                                   PR fix, PR fix + reviewer feedback, 4 try-fix candidates)
                                                             │
                              ┌──────────────────────────────┴───────────────────────────────┐
                              ▼                                                              ▼
                     winner = a PR-diff candidate                                  winner = non-PR candidate
                     (raw PR or PR + reviewer)                                              │
                              │                                                             ▼
                              ▼                                              request changes; surface
                  post inline findings on PR diff                            winning candidate diff;
                                                                             author push re-triggers

Open questions

  • How do we represent "PR fix" vs "PR fix + sandbox-applied reviewer feedback" as two distinct candidates in the Report scoring? Same scoring rubric, or weighted differently?
  • For the request-changes path: surface the winning candidate diff in the review body, as a suggested-changes block, or attach a patch file?
  • Pre-Flight runs the expert reviewer eval and kicks off Try-Fix — do they run truly in parallel, or sequentially with Try-Fix waiting on the reviewer eval result for context?

@dotnet deleted 10 comments from MauiBot Apr 29, 2026

@MauiBot MauiBot left a comment

Collaborator

Expert Review — 4 findings

See inline comments for details.

@dotnet dotnet deleted a comment from github-actions Bot Apr 29, 2026
@dotnet dotnet deleted a comment from github-actions Bot Apr 29, 2026
@dotnet dotnet deleted a comment from MauiBot Apr 29, 2026
@dotnet dotnet deleted a comment from Copilot AI Apr 29, 2026

@PureWeen PureWeen left a comment

Member

Round 2 Review — 3-model adversarial consensus (Opus 4.6 / Sonnet 4.6 / GPT-5.3)

Good progress since the last review — the multi-candidate flow, winner.json, and gate diagnostics are substantial improvements. 8 findings below, focused on correctness and architecture.

Methodology: 3 independent reviewers with adversarial consensus. Findings marked with reviewer agreement count.

Comment thread .github/skills/try-fix/SKILL.md Outdated
Comment thread .github/scripts/Review-PR.ps1
Comment thread .github/scripts/Review-PR.ps1 Outdated
Comment thread .github/scripts/shared/Detect-TestsInDiff.ps1
Comment thread .github/scripts/Review-PR.ps1
Comment thread .github/scripts/Review-PR.ps1 Outdated
Comment thread .github/scripts/post-inline-review.ps1 Outdated
Comment thread .github/instructions/performance-hotpaths.instructions.md Outdated
Copilot AI added 2 commits May 6, 2026 11:26
…act fixes

Apply actionable findings from PR #35198 review (May 5):

Review-PR.ps1
- Generalize build-error regex to '\berror\s+[A-Z]{2,}\d+\b' so it catches
  CS/MSB/NU/MAUI/NETSDK/XA codes without false-positiving on "0 error(s)"
  status lines.
- Replace O(n²) truncation loop (trim 512 chars + recount per iter) with an
  O(log n) binary search on UTF-8 byte budget; reserve marker bytes upfront.
- Defend against markdown fence injection by sizing the outer code fence as
  max(backtick run in diff)+1 (min 4) instead of mutating the diff text.
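
Two of the Review-PR.ps1 changes above are easy to check in isolation. The following Python sketch uses the regex exactly as quoted in the commit message and shows one way to size an outer code fence; the function names are hypothetical and the real script is PowerShell, so treat this as an illustration of the idea rather than the script's code.

```python
import re

# Pattern copied from the commit message: the word "error", whitespace,
# then a code made of 2+ uppercase letters followed by digits.
BUILD_ERROR_RE = re.compile(r"\berror\s+[A-Z]{2,}\d+\b")


def has_build_error(line):
    """True for lines like 'error CS1002', False for '0 error(s)' summaries."""
    return BUILD_ERROR_RE.search(line) is not None


def outer_fence(diff_text):
    """Return a backtick fence longer than any backtick run inside the diff.

    The fence is at least 4 backticks, so the diff text itself never has to
    be altered to keep it from closing the surrounding code block.
    """
    longest = max((len(run) for run in re.findall(r"`+", diff_text)), default=0)
    return "`" * max(longest + 1, 4)
```

Note that Python's `re` is case-sensitive by default, so this sketch only matches lowercase `error` with an uppercase code; whether the PowerShell call site is case-sensitive depends on how the script applies the pattern, which is not shown here.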

post-inline-review.ps1
- Validate finding.path before posting: reject empty, '..', backslashes,
  rooted paths, drive letters, and control chars so a malformed/hostile
  finding cannot poison the review post (especially when the diff fetch
  fallback runs without cross-validation).
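
A minimal Python sketch of the path checks listed above (empty, '..', backslashes, rooted paths, drive letters, control characters). The function name is hypothetical, and treating '..' as a whole path segment rather than a substring is an assumption; the actual PowerShell validation may differ in detail.

```python
import re

_DRIVE_RE = re.compile(r"^[A-Za-z]:")


def is_safe_finding_path(path):
    """Accept only repo-relative, forward-slash paths for a finding."""
    if not isinstance(path, str) or not path.strip():
        return False  # empty or missing
    if "\\" in path:
        return False  # backslashes
    if path.startswith("/"):
        return False  # rooted path
    if _DRIVE_RE.match(path):
        return False  # drive letter such as C:
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in path):
        return False  # control characters
    if ".." in path.split("/"):
        return False  # parent-directory segment (assumed segment-level check)
    return True
```

Findings that fail the check would be skipped before the review is posted, so a single malformed entry cannot invalidate the whole request.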

Detect-TestsInDiff.ps1
- Tighten Get-ClassNameFromFile regex to skip 'abstract' and 'static'
  modifiers as the comment intends — a 'public abstract class BaseTest'
  declared above the concrete test class was being captured and turned
  into a non-matching test filter.
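
The failure mode described above can be reproduced with a short Python sketch: scan class declarations in order and skip any that carry `abstract` or `static`. The regex and function name are hypothetical stand-ins; the real Get-ClassNameFromFile pattern is not shown in this thread.

```python
import re

# Hypothetical pattern: optional C# modifiers, then "class Name".
CLASS_DECL_RE = re.compile(
    r"^[ \t]*(?P<mods>(?:(?:public|internal|private|protected|sealed|partial"
    r"|abstract|static)\s+)*)class\s+(?P<name>[A-Za-z_]\w*)",
    re.MULTILINE,
)


def first_concrete_class(source):
    """Return the first class name that is neither abstract nor static."""
    for match in CLASS_DECL_RE.finditer(source):
        modifiers = match.group("mods").split()
        if "abstract" in modifiers or "static" in modifiers:
            continue  # base classes and helper holders cannot be test filters
        return match.group("name")
    return None
```

With a file that declares `public abstract class BaseTest` before the concrete test class, this returns the concrete class name, which is the behavior the commit message says the comment always intended.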

performance-hotpaths.instructions.md
- Replace overly broad src/Controls/src/Core/Handlers/** glob (which fired
  on all 60+ handlers, most of which are not hot paths) with specific
  scopes: Layouts, Platform, Items/Items2 handlers, ScrollView. Document
  scope rationale.

public-api.instructions.md
- Add src/Core/src/**/*.cs, src/Controls/src/**/*.cs, src/Essentials/src/**/*.cs
  globs so guidance loads when designing 'public class Foo' in Button.cs,
  not only when editing PublicAPI.Unshipped.txt afterward. Add activation
  guard at top of file so it ignores internal-only changes.

maui-expert-reviewer.md + try-fix/SKILL.md
- Resolve agent-contract conflict. Make the reviewer's findings JSON path
  configurable via the invoker prompt (default unchanged for the PR-level
  flow). Update try-fix to invoke the reviewer with an attempt-scoped
  output path (try-fix-{N}/reviewer-findings.json) and read the JSON back
  — preserves the per-dimension self-review pass added in #35231 while
  preventing try-fix attempts from clobbering the PR-level inline-findings.json
  consumed by post-inline-review.ps1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
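
The truncation change in the Review-PR.ps1 notes above (binary search on a UTF-8 byte budget, with the marker's bytes reserved upfront) could look roughly like this in Python. The function name and the default marker text are hypothetical; this is a sketch of the technique, not a port of the script.

```python
def truncate_to_byte_budget(text, max_bytes, marker="\n[diff truncated]"):
    """Cut text so that text + marker fits in max_bytes of UTF-8.

    Binary-searches the largest character prefix that fits, instead of
    repeatedly trimming a fixed chunk and re-measuring.
    """
    if len(text.encode("utf-8")) <= max_bytes:
        return text
    budget = max_bytes - len(marker.encode("utf-8"))  # reserve marker upfront
    if budget < 0:
        raise ValueError("max_bytes is smaller than the truncation marker")
    lo, hi = 0, len(text)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if len(text[:mid].encode("utf-8")) <= budget:
            lo = mid  # prefix fits, try a longer one
        else:
            hi = mid - 1  # prefix too large, shrink
    return text[:lo] + marker
```

Searching over character positions rather than raw bytes means a multi-byte character is never split, and the number of measurements grows with the logarithm of the text length rather than with the amount that has to be trimmed.
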
…pliance from ~18% to ~100%)

Empirical analysis of 11 recent CI runs (44 try-fix attempts on AzDO
pipeline 27723) showed the expert-reviewer self-check only fired on
8/44 attempts (18.2%). Per-attempt: attempt-1=0%, attempt-2=45%,
attempt-3=27%, attempt-4=0%. Six of eleven runs had zero invocations.
One attempt (build 14027179/attempt-4) hallucinated EXPERT_REVIEW_MAJOR_ISSUES
text without writing any JSON file.
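
The figures above are internally consistent, which a few lines of Python can confirm. The per-attempt invocation counts used here (5 and 3) are inferred from the rounded percentages over 11 runs; they are not stated in the commit message.

```python
def compliance(invocations_by_attempt, runs):
    """Return (per-attempt percentages, overall percentage)."""
    per_attempt = {
        attempt: round(100 * count / runs)
        for attempt, count in invocations_by_attempt.items()
    }
    total_attempts = runs * len(invocations_by_attempt)
    overall = round(100 * sum(invocations_by_attempt.values()) / total_attempts, 1)
    return per_attempt, overall


# 11 runs x 4 attempts = 44 attempts; counts of 5 and 3 are inferred values.
per_attempt, overall = compliance({1: 0, 2: 5, 3: 3, 4: 0}, runs=11)
```

Under those inferred counts the per-attempt rates come out to 0%, 45%, 27%, and 0%, and the overall rate is 8 of 44 attempts, or 18.2%.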

Root cause: the reviewer instruction was a single 60+ word clause
buried in 'Core Principles' (line 29 of SKILL.md), instructing the
model to spawn @maui-expert-reviewer as a sub-agent. It was NOT a
numbered Workflow step. The Required Files table didn't list
reviewer-findings.json. The verify-files-exist check didn't include it.
Nothing enforced the buried clause.

Fix:
1. Replace sub-agent spawn with inline self-check. Model reads sections
   of .github/agents/maui-expert-reviewer.md (Overarching Principles +
   Dimension Routing + relevant CHECK lists) and walks the diff against
   them — no sub-agent spawn, no path argument, no JSON parse.
2. Promote to numbered Step 7: Expert Self-Review (MANDATORY).
   Renumbered old steps 7→8 (Capture), 8→9 (Restore), 9→10 (Report).
3. Step 7 runs for ALL outcomes (Pass/Fail/Blocked), not just Pass.
4. Add reviewer-findings.json to Required Files table. The Step 8
   verify gate detects missing artifacts but DEFERS the throw — Step 9
   restore ALWAYS runs first to keep the worktree clean for the next
   sequential attempt.
5. Add findings_count to Outputs table and Report template.
6. Add 'Self-review performed' to Completion Criteria.
7. Update pr-review SKILL.md attempt-{N} artifact tree to include
   reviewer-findings.json (and other previously-omitted files).
8. Cap self-review iteration at one correction round to keep total
   Step 6+7 iteration bounded.
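
Item 4's ordering (detect missing artifacts, always restore, only then fail) is the part most likely to be implemented wrong, so here is a minimal Python sketch of it. The function names and the callback shape are hypothetical; the skill itself expresses this in PowerShell and prose.

```python
from pathlib import Path


def missing_artifacts(attempt_dir, required):
    """List required file names that do not exist in the attempt directory."""
    return [name for name in required if not (Path(attempt_dir) / name).is_file()]


def finish_attempt(attempt_dir, required, restore):
    """Verify artifacts, restore the worktree, then report any gap.

    The check runs first but the failure is deferred, so the restore step
    always executes and the next sequential attempt starts from a clean tree.
    """
    missing = missing_artifacts(attempt_dir, required)  # verify: detect only
    restore()                                           # restore: always runs
    if missing:
        raise RuntimeError("Required artifacts missing: " + ", ".join(missing))
```

Raising before calling `restore()` would leave the failed attempt's edits in the worktree, which is the situation the deferred throw is meant to avoid.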

Schema for reviewer-findings.json matches @maui-expert-reviewer agent
exactly: JSON array of {path, line, body} where line is on the changed
side of the diff (line 1 only as fallback for file-level concerns).
Same format as inline-findings.json so any future tooling can consume
either.
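
A small Python sketch of a validator for the schema described above: a JSON array of objects with `path`, `line`, and `body`, where an empty array means a clean review. The function name and the exact strictness (non-empty strings, positive integer line) are assumptions, not rules stated by the agent.

```python
import json


def validate_findings(raw_json):
    """Parse a findings file and check the {path, line, body} shape."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("findings must be a JSON array")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"finding {index} is not an object")
        path, line, body = item.get("path"), item.get("line"), item.get("body")
        if not isinstance(path, str) or not path:
            raise ValueError(f"finding {index}: 'path' must be a non-empty string")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(line, bool) or not isinstance(line, int) or line < 1:
            raise ValueError(f"finding {index}: 'line' must be a positive integer")
        if not isinstance(body, str) or not body.strip():
            raise ValueError(f"finding {index}: 'body' must be non-empty text")
    return data
```

Because reviewer-findings.json and inline-findings.json share one format, a single check like this could serve both files.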

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo kubaflo force-pushed the feature/expert-reviewer-extraction branch from 1ab1329 to 55d7b26 Compare May 6, 2026 18:53
Self-review now runs BEFORE build+test (Step 6) instead of after (was
Step 7). This catches design flaws before spending 5-15 min on a test
cycle, and runs when context is lightest — before test output floods
the context window.

Before: attempt-1 compliance 0%, attempt-4 compliance 0%, overall 18%
After inline fix: attempt-1 100%, attempt-4 60%, overall 85%
This positional change should push attempt-3/4 higher by running the
review when less prior-attempt context has accumulated.

New flow: Design → Apply → Self-Review → Test → Capture → Restore → Report

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI added 2 commits May 7, 2026 15:54
Switches the Copilot CLI invocation to a model with larger context
window, which should improve instruction-following compliance on
later try-fix attempts where context pressure caused the self-review
step to be dropped.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 'result' event in Copilot CLI JSON output is a top-level event
without a 'data' wrapper. Reading $event.data.usage always returned
null, so the file/line counts were silently shown as 0.

Read $event.usage directly, and wrap filesModified in @() to ensure
.Count works whether PowerShell deserializes a single-element array
as a scalar or an array.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kubaflo commented May 7, 2026

Contributor Author

Round 3 Review — 3-model adversarial consensus (Opus 4.7 xhigh / Sonnet 4.6 / GPT-5.5)

Reviewing the 4 commits since the Round 2 baseline (ade495a6f0..bc9349e52b), focused on the new artifact gate, the Step 6↔7 swap, the inline self-check restructure, and the file-change-count fix.

Methodology: 3 independent reviewers, top-tier models, no shared context. Findings annotated with reviewer agreement count.


❌ Error — Example invocation still shows the OLD pre-swap order — try-fix/references/example-invocation.md:26 (3/3 reviewers)

**Skill execution:** Reads context → Analyzes target files → Designs fix → Applies fix → Runs test (PASS) → Performs inline expert self-review … → Captures artifacts → Reports result → Reverts changes

Commit 73ebff80a8 swapped the order so Step 6 (Self-Review) runs before Step 7 (Test). The very next prose under Step 6 in SKILL.md justifies this: "runs BEFORE testing so you can catch design flaws before spending time on build+test cycles." But the canonical example here still says Runs test (PASS) → Performs inline expert self-review. Examples carry disproportionate weight in few-shot pattern matching — an agent shortcutting to the example will execute the pre-swap order and silently regress the compliance gain the swap was meant to lock in. Opus also notes Reports result → Reverts changes is reversed (Step 9 Restore precedes Step 10 Report).

Fix:

… → Applies fix → Performs inline expert self-review against `.github/agents/maui-expert-reviewer.md` rules and writes `reviewer-findings.json` (`[]` if clean) → Runs test (PASS) → Captures artifacts → Reverts changes → Reports result

⚠️ Warning — Gate error message cites the wrong step number — try-fix/SKILL.md:429 (2/3 reviewers)

$gateFailureMessage = "Required artifacts missing: $($missing -join ', '). If 'reviewer-findings.json' is missing, Step 7 (Expert Self-Review) was not performed — it is mandatory and must contain at least '[]'."

After the Step 6↔7 swap, SKILL.md defines Step 6 = Expert Self-Review (line 267) and Step 7 = Test and Iterate (line 341). This message — written verbatim to host output via Write-Host — tells the agent that Step 7 is "Expert Self-Review", contradicting the same file. An agent reading the failure and re-reading the SKILL gets inconsistent guidance about which numbered step to re-perform.

Fix: Replace Step 7 (Expert Self-Review) with Step 6 (Expert Self-Review) on line 429.


⚠️ Warning — Section header mislabels which step the artifact gate enforces — try-fix/SKILL.md:411 (1/3 reviewers, related to #2)

**Verify all required files exist (this is the enforcement gate for Step 7):**

The whole rationale for adding reviewer-findings.json to the gate (per 55d7b26f7c's commit message: "the verify-files-exist check didn't include it. Nothing enforced the buried clause") is that the gate enforces Step 6 compliance. Calling it "the enforcement gate for Step 7" inverts that intent.

Fix: Reword to e.g. (this is the enforcement gate for Steps 6 and 7 — primarily reviewer-findings.json from Step 6).


❌ Error — Self-review can become stale after test-loop fixes — try-fix/SKILL.md:362-368 (1/3 reviewers, GPT-5.5)

**Testing Loop (Iterate until SUCCESS or exhausted):**

1. **Run the test command** - It will build, deploy, and test automatically
2. **Check the result:**
   - **Tests PASS** → Move to Step 8 (Capture)
   - **Compile errors** → Fix compilation issues (see below), go to step 1
   - **Tests FAIL (runtime)** → Analyze failure, fix code, go to step 1

Step 6 writes reviewer-findings.json against the diff before Step 7's test loop. But the test loop explicitly permits later code changes for compile/runtime failures. Those post-review changes can become the final fix.diff without any fresh expert self-review — the gate at Step 8 only checks that reviewer-findings.json exists, not that it corresponds to the final diff.

This is the deeper architectural concern of the swap: moving self-review earlier did catch design flaws faster, but it also opened a window where the recorded findings can lie about what got shipped.

Suggested fix: Either (a) require re-running Step 6 every time Step 7 modifies code (the loop body), or (b) move Self-Review to run again after the test loop converges and validate that the saved findings correspond to the final git diff.
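
Option (b) needs some way to tell whether the saved findings still describe the final diff. One possible check, sketched in Python under the assumption that the diff reviewed in Step 6 is kept alongside the findings, is to compare fingerprints of the two diffs; the function names are hypothetical.

```python
import hashlib


def diff_fingerprint(diff_text):
    """Hash a diff after normalizing line endings only."""
    normalized = diff_text.replace("\r\n", "\n")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def review_is_stale(reviewed_diff, final_diff):
    """True when the code changed after the self-review was recorded."""
    return diff_fingerprint(reviewed_diff) != diff_fingerprint(final_diff)
```

If `review_is_stale` returns true after the test loop converges, the self-review would be repeated against the final diff before the artifacts are captured. Only line endings are normalized here, so a whitespace-only code change still counts as a change.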


💡 Suggestion — Hardcoded internal-only model has no fallback — Review-PR.ps1:302 (2/3 reviewers)

& copilot -p $Prompt --allow-all --output-format json --model claude-opus-4.7-1m-internal 2>&1 | ForEach-Object {

claude-opus-4.7-1m-internal is marked "Internal only" in the model catalog. Other agentic workflows in this repo (copilot-evaluate-tests.md, skill-validation.yml, ci-doctor.lock.yml) use publicly-available models. If this script is ever run by a contributor whose Copilot CLI installation can't resolve the internal model, every Invoke-CopilotStep call fails immediately. Lower severity because Review-PR.ps1 is currently manually-invoked only.

Suggested fix:

$copilotModel = if ($env:COPILOT_REVIEW_MODEL) { $env:COPILOT_REVIEW_MODEL } else { 'claude-opus-4.7-1m-internal' }
& copilot -p $Prompt --allow-all --output-format json --model $copilotModel 2>&1 | ForEach-Object {

✅ Verified Correct (no findings)

  • Review-PR.ps1:395-403, the file-change-count fix ($event.data.usage → $event.usage, @($changes.filesModified).Count). All three reviewers confirmed against the actual Copilot CLI JSON shape: result is a top-level event without a data wrapper, and the @() wrapping correctly handles PowerShell's scalar-flattening of single-element JSON arrays.
  • The Step 6↔7 swap itself (the rationale and ordering) is sound.
  • The model bump to claude-opus-4.7-1m-internal for the orchestrator is reasonable (modulo the portability concern above).
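
The @() point in the first bullet can be illustrated in isolation. This is a minimal sketch, not the code from Review-PR.ps1: the JSON payload and the Where-Object filter are hypothetical, added only to force the single-element unrolling that @() guards against.

```powershell
# Hypothetical payload with a single modified file (not real CLI output).
$changes = '{"filesModified":["src/Foo.cs"]}' | ConvertFrom-Json

# Sending a one-element array through the pipeline unrolls it,
# so the variable ends up holding a bare string instead of an array.
$one = $changes.filesModified | Where-Object { $_ -like '*.cs' }
$isScalar = $one -is [string]

# Wrapping in @() restores array semantics, so Count is the item count.
$count = @($one).Count

# Without the wrapper, a property such as Length describes the string itself.
$lengthPitfall = $one.Length
```

Here `$count` is the number of files, while `$lengthPitfall` is the character count of the path, which is why the wrapped form is the safer way to count results.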

Consensus Verdict: NEEDS_CHANGES

Confidence: high
Summary: The PowerShell delta is correct and the architectural intent of the swap is right. But this delta introduced 3 prompt-correctness regressions in try-fix/SKILL.md and example-invocation.md — stale "Step 7" references and an out-of-order canonical example — that directly undermine the compliance gain that 55d7b26f7c and 73ebff80a8 were designed to lock in. The staleness concern (GPT-5.5) is a deeper architectural question worth resolving, not just a typo.

| Severity | Count |
|----------|-------|
| ❌ Error | 2 |
| ⚠️ Warning | 2 |
| 💡 Suggestion | 1 |

…taleness, model fallback

Round 3 multi-model review (Opus 4.7 xhigh / Sonnet 4.6 / GPT-5.5)
flagged 5 issues introduced by the Step 6↔7 swap and the inline
self-check restructure. This commit addresses all of them.

1. example-invocation.md (3/3 reviewers)
   - Reordered to Self-Review → Test → Capture → Restore → Report
   - Added the new Step 7.5 refresh step to the narrative

2. try-fix/SKILL.md gate error message (2/3 reviewers)
   - Changed 'Step 7 (Expert Self-Review)' to 'Step 6 (Expert Self-Review)'
   - Mentions Step 7.5 refresh requirement in the diagnostic

3. try-fix/SKILL.md gate section header (1/3 reviewers, related)
   - 'enforcement gate for Step 7' -> 'for Steps 6 and 7'

4. Self-review staleness (1/3, GPT-5.5 — deeper architectural issue)
   - Added Step 7.5: 'Refresh Self-Review If Code Changed'
   - Step 6 now snapshots the reviewed diff to reviewer-findings.diff
   - Step 7.5 compares current diff to the snapshot; if changed, re-runs
     the self-review against the final diff and overwrites both files
   - Step 8 artifact gate now requires reviewer-findings.diff
   - Workflow grew from 10 steps to 11 (Step 7.5)

5. Hardcoded internal-only model in Review-PR.ps1 (2/3)
   - Now reads $env:COPILOT_REVIEW_MODEL with claude-opus-4.7-1m-internal
     as default, so contributors without internal-model access can run
     the script with e.g. claude-opus-4.6 or claude-sonnet-4.6.

Also updated pr-review/SKILL.md output-tree diagram to mention the
new reviewer-findings.diff artifact and clarify that
reviewer-findings.json reflects the FINAL diff.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo

kubaflo commented May 7, 2026

Contributor Author

Round 4 — Adversarial Multi-Model Review

Re-ran top-tier models (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5) against commit caaf080376. All three converge on the same critical bug in the new Step 7.5; Opus xhigh found two additional issues. Verdict from every model: NEEDS_CHANGES.

✅ Round 3 fixes confirmed correct (3/3 reviewers)

  • Example-invocation execution chain reordered (Self-Review → Test → Refresh → Capture → Restore → Report)
  • Gate error message now references "Step 6"
  • Section header reads "enforcement gate for Steps 6 and 7"
  • reviewer-findings.diff written by Step 6 and required by Step 8 gate
  • Review-PR.ps1 reads $env:COPILOT_REVIEW_MODEL with internal model as fallback

🔴 New findings

1. [error] Step 7.5 drift comparison is broken — refresh fires unconditionally — 3/3 agreement

File: .github/skills/try-fix/SKILL.md:389–394

$currentDiff = git diff                                              # → string[]
$reviewedDiff = Get-Content "$OUTPUT_DIR/reviewer-findings.diff" -Raw # → string
if ($currentDiff -ne $reviewedDiff) { ... }                          # ← always truthy

PowerShell's -ne operator with an array on the left does element-wise filtering: it returns the array of elements not equal to the right operand. Since no individual diff line equals the joined-string snapshot, the result is the entire array (truthy). Reproduced empirically (165 elements returned). Even worse, when git diff is empty, $null -ne <string> is also $true, so the branch fires on no-change scenarios too.

Net effect: the "Diff unchanged → no refresh needed" branch is dead code in normal operation. Step 7.5 burns extra fix-batch loops every attempt, defeating the purpose of the comparison.
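
The operator behavior described here can be reproduced without git. The following is a minimal sketch: `$diffLines` and `$snapshot` are hypothetical stand-ins for the output of git diff and the content of reviewer-findings.diff, not values taken from SKILL.md.

```powershell
# Hypothetical stand-ins: git diff yields one string per line,
# while Get-Content -Raw yields the whole file as a single string.
$diffLines = @('diff --git a/f.cs b/f.cs', '-old', '+new')
$snapshot  = ($diffLines | Out-String)

# Array on the left: -ne filters, returning every element that differs
# from the right operand. No single line equals the joined text,
# so all three lines come back and the result is truthy.
$filtered = $diffLines -ne $snapshot
$filteredCount = @($filtered).Count

# An empty git diff assigns $null, and $null -ne <string> is $true.
$emptyVsSnapshot = ($null -ne $snapshot)

# Normalizing both sides to one string turns this into a real equality test.
$normalized = ($diffLines | Out-String)
$driftDetected = ($normalized -ne $snapshot)
```

With identical content on both sides, only the normalized comparison reports no drift; the array form and the `$null` form both take the refresh branch.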

Fix:

$currentDiff  = (git diff | Out-String)
$reviewedDiff = if (Test-Path "$OUTPUT_DIR/reviewer-findings.diff") {
    Get-Content "$OUTPUT_DIR/reviewer-findings.diff" -Raw
} else { '' }
if ($currentDiff -ne $reviewedDiff) { ... }

2. [error] reviewer-findings.diff gate fails on documented no-diff Blocked path — 1/3 (Opus xhigh)

File: .github/skills/try-fix/SKILL.md:331 (Step 6 snapshot) and :454 (gate)

git diff | Set-Content "$OUTPUT_DIR/reviewer-findings.diff"   # ← empty pipe creates NO file

When git diff produces no output, Set-Content from an empty pipeline does not create the file (verified on pwsh 7.5.4). The Round 3 gate now requires reviewer-findings.diff, so any attempt with no diff to review — explicitly documented at SKILL.md:288 ("If you have NO code changes (e.g., Blocked because no device available before any fix was applied), still proceed to step 4 and write '[]'") — now fails the gate and is force-marked Blocked at line 468. Regression from Round 3, where '[]' in reviewer-findings.json was sufficient.

Fix:

Set-Content -Path "$OUTPUT_DIR/reviewer-findings.diff" -Value (git diff | Out-String) -NoNewline

(Set-Content -Value always creates the file, even with empty content.)
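
A minimal sketch of the -Value form, assuming a scratch directory in place of $OUTPUT_DIR; the temp path and variable names are hypothetical, and an empty pipeline stands in for a git diff with no output.

```powershell
# Hypothetical scratch location standing in for $OUTPUT_DIR.
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $outDir | Out-Null
$snapshotPath = Join-Path $outDir 'reviewer-findings.diff'

# Stand-in for an empty `git diff`: Out-String still emits one (empty) string.
$emptyDiffText = (@() | Out-String)

# Passing the text through -Value, rather than piping it in, means the
# cmdlet always has a value to write, even when that value is empty.
Set-Content -Path $snapshotPath -Value $emptyDiffText -NoNewline

$snapshotExists = Test-Path $snapshotPath
$snapshotSize   = (Get-Item $snapshotPath).Length
```

The file exists afterwards with a size of 0 bytes, which is what an existence-only gate needs on the no-diff path.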

3. [warning] Step 7.5 procedure is hidden in PowerShell comments inside a code block — 2/3 (Opus xhigh, GPT-5.5)

File: .github/skills/try-fix/SKILL.md:394–409

if ($currentDiff -ne $reviewedDiff) {
    # Re-run the procedure from Step 6 against the final diff:
    #   - walk the same Overarching Principles + routed dimensions
    #   - OVERWRITE $OUTPUT_DIR/reviewer-findings.json with the new findings
    git diff | Set-Content "$OUTPUT_DIR/reviewer-findings.diff"
    $findings = @(Get-Content ... | ConvertFrom-Json)
    $findingsCount = $findings.Count
}

The actual self-review work is described only in PowerShell comments inside the snippet. An LLM that executes the script literally will re-snapshot the diff and re-validate the existing JSON without ever rewriting it — leaving stale findings while the gate happily reports success. Compare to Step 6, which spells out the procedure as numbered markdown bullets outside the code block. Combined with finding #1 (refresh always fires), this could routinely produce re-snapshotted diffs paired with stale findings — the exact bug Step 7.5 was designed to prevent.

Fix: Lift the "walk dimensions / overwrite JSON" instructions out of the PowerShell comments into a numbered markdown list above the snippet, mirroring Step 6's structure (Identify → Walk → Write → Validate). Keep only mechanical operations in the code block.

Plan

Applying all three fixes in the next commit, then re-running this same 3-model review.

Round 4 reviewers: claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5 — all returned NEEDS_CHANGES with high confidence.

…ntent, Step 7.5 procedure clarity

Three reviewers (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5) converged on
empirically-verified bugs in Round 3's new Step 7.5 self-review refresh:

1. [error] Drift comparison always evaluated truthy.
   git diff -> string[] (line per element); Get-Content -Raw -> single string.
   PowerShell -ne with array-on-left does element-wise filtering, not equality,
   so the if-branch always fires. Even worse: empty diff ($null -ne <string>)
   also fires. Fix: normalize both sides via (git diff | Out-String) so the
   comparison is single-string vs single-string.

2. [error] reviewer-findings.diff gate failed on documented no-diff Blocked path.
   git diff | Set-Content does not create the file when the pipe is empty.
   The Round 3 gate now requires reviewer-findings.diff, so any attempt with
   no code changes (the explicitly-documented Blocked-with-no-fix path at
   SKILL.md:288) silently failed. Same hazard applied to Step 8's fix.diff
   write -- fixed both for consistency. Fix: Set-Content -Path X -Value (git diff
   | Out-String) -NoNewline always creates the file, even with empty content.

3. [warning] Step 7.5 procedure was buried in PowerShell comments inside a
   single code block. An LLM following the script literally would re-snapshot
   the diff and re-validate the existing JSON without ever rewriting findings,
   defeating the purpose of the refresh. Fix: hoisted the walk-rules /
   rewrite-JSON instructions out of comments into a numbered markdown
   procedure that mirrors Step 6's structure (Detect drift -> Re-do self-review
   -> Re-snapshot and validate). Added a SHA256 hash check at the start of
   sub-step 3 that throws if reviewer-findings.json was not actually rewritten
   in sub-step 2 (defends against literal-script-execution agents).

All scenarios validated empirically (pwsh 7.5.4):
- Diff unchanged: skips refresh
- Diff changed + agent forgot rewrite: hash check throws
- Diff changed + agent rewrote: hash differs, proceeds to validate
- Empty diff: file created with size 0 (gate satisfied)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo

kubaflo commented May 7, 2026

Contributor Author

Round 5 — Adversarial Multi-Model Review

Re-ran the same 3 top-tier models (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5) against commit d4acf348a0. All three converged on a cascading bug introduced by Round 4's fixes interacting with each other. Verdict from every model: NEEDS_CHANGES.

✅ Round 4 fixes confirmed correct (3/3 reviewers)

  • Drift comparison normalized via (git diff | Out-String) on both sides — array-vs-scalar bug eliminated for non-empty diffs ✅
  • Set-Content -Value (git diff | Out-String) -NoNewline creates the diff file even when the pipe is empty ✅
  • Step 7.5 procedure hoisted out of PowerShell comments into a numbered markdown list mirroring Step 6 ✅

🔴 New findings — all reviewers, same root cause

Cascading regression: 0-byte file → Get-Content -Raw → $null → false drift → hash-sentinel deadlock

The Round 4 fix #2 intentionally creates a 0-byte reviewer-findings.diff for the documented Blocked-with-no-diff path. But Step 7.5 reads it back with Get-Content -Raw, which returns $null on a 0-byte file (not ""). Empirically reproduced:

After Step 6 (clean tree):  reviewer-findings.diff size: 0
Step 7.5 sub-step 1:
  $currentDiff  = (git diff | Out-String)         # ""
  $reviewedDiff = Get-Content X -Raw              # $null
  $diffChanged  = ("" -ne $null)                  # True ← false positive!
Step 7.5 sub-step 2 (correctly executed):
  agent re-walks rules, writes '[]'               # byte-identical to Step 6
Step 7.5 sub-step 3:
  $preHash  = SHA256('[]') = 37517E5F...
  $postHash = SHA256('[]') = 37517E5F...          # ← same!
  → throw "reviewer-findings.json was not rewritten"
  → unhandled exception → Step 9 worktree restore SKIPPED → next attempt corrupted

Two distinct defects compose:

| # | Reviewers | Defect |
|---|-----------|--------|
| 1 | Sonnet 4.6, Opus xhigh (direct), GPT-5.5 (downstream) | Get-Content -Raw on 0-byte file returns $null, not "": false drift detection on Blocked-with-no-diff |
| 2 | GPT-5.5 (direct), Opus xhigh (direct), Sonnet 4.6 (cascade) | SHA256 hash sentinel throws on legitimate byte-identical refresh ([] → [], single-finding → same single-finding). Common, not "extremely rare". The "touch trailing whitespace" escape hatch corrupts the JSON artifact. |

This is the exact "always evaluates truthy" failure mode Round 4 was supposed to eliminate — just shifted from array -ne scalar to string -ne null.

Fixes to be applied

  1. Coalesce $null to "" when reading the diff snapshot back. Empirically verified: [string]$null does NOT coerce to "" in pwsh 7.5.4 (it stays null), but (...) ?? '' does work:
$reviewedDiff = if (Test-Path "$OUTPUT_DIR/reviewer-findings.diff") {
    (Get-Content "$OUTPUT_DIR/reviewer-findings.diff" -Raw) ?? ''
} else { '' }
  2. Drop the SHA256 hash sentinel entirely. Step 6 has no equivalent "did you actually walk the rules" programmatic check; it relies on procedural enforcement (the numbered markdown sub-steps and the example-invocation chain). Round 4 fix #3 already moved Step 7.5 to the same enforcement model. The hash sentinel rejects the common byte-identical case (e.g., [] → []) and the documented escape hatch ("touch a string body") corrupts the artifact. Replace with a callout explaining the trade-off:
> Why no programmatic "did you actually rewrite the JSON" check? A SHA256 hash
> sentinel rejects the legitimate byte-identical case (e.g., [] → [] after a
> small compile fix that introduces no new violations), and that case is common.
> The procedural enforcement is sub-step 2's explicit numbered list above, plus
> the example-invocation chain that walks the dimensions explicitly.

Empirical validation of fixes (full Step 6 → Step 7.5 round-trip)

| Scenario | Result |
|----------|--------|
| Empty diff (Blocked, no code changes) | diffChanged: False |
| Non-empty diff, unchanged after Step 7 | diffChanged: False |
| Diff actually changed during Step 7 | diffChanged: True |
| Clean → clean refresh ([] → []) | Validates, no throw ✅ |
| Refresh writes invalid JSON | Throws as expected ✅ |
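
The first three scenarios can be exercised without a repository. This is a sketch under stated assumptions rather than the text of SKILL.md: `Test-DiffDrift` is a hypothetical helper, literal strings stand in for `(git diff | Out-String)`, a temp directory stands in for $OUTPUT_DIR, and the `??` operator requires PowerShell 7 or later.

```powershell
# Hypothetical helper wrapping the drift check; requires PowerShell 7+ for ??.
function Test-DiffDrift {
    param(
        [string]$CurrentDiff,    # stand-in for (git diff | Out-String)
        [string]$SnapshotPath    # stand-in for reviewer-findings.diff
    )
    # Coalesce to '' so a 0-byte snapshot compares as an empty string.
    $reviewedDiff = if (Test-Path $SnapshotPath) {
        (Get-Content $SnapshotPath -Raw) ?? ''
    } else { '' }
    return ($CurrentDiff -ne $reviewedDiff)
}

$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$snap = Join-Path $dir 'reviewer-findings.diff'

# Scenario: empty diff at snapshot time, still empty at refresh time.
Set-Content -Path $snap -Value '' -NoNewline
$emptyUnchanged = Test-DiffDrift -CurrentDiff '' -SnapshotPath $snap

# Scenario: non-empty diff that did not change during the test loop.
$diffText = "-old`n+new`n"
Set-Content -Path $snap -Value $diffText -NoNewline
$sameDiff = Test-DiffDrift -CurrentDiff $diffText -SnapshotPath $snap

# Scenario: the test loop added a line after the snapshot was taken.
$changedDiff = Test-DiffDrift -CurrentDiff ($diffText + "+extra`n") -SnapshotPath $snap
```

Only the last call reports drift. One design note: `-ne` compares strings case-insensitively, so an edit that changes nothing but letter case would not register; `-cne` is the case-sensitive form if that distinction matters.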

Applying fixes now and re-running the same 3-model review.

Round 5 reviewers: claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5 — all returned NEEDS_CHANGES with high confidence, converging on the same cascading defect.

All three Round 5 reviewers (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5)
converged on a cascading bug introduced by Round 4's fixes interacting:

1. Round 4 fix #2 intentionally creates a 0-byte reviewer-findings.diff for the
   documented Blocked-with-no-diff path. But Get-Content -Raw on a 0-byte file
   returns $null, not "". So Step 7.5's drift detection becomes
   '"" -ne $null' → True → false-positive drift on every Blocked attempt.
   This re-introduces the 'always evaluates truthy' failure mode Round 4 was
   supposed to eliminate (just shifted from array-vs-scalar to string-vs-null).

2. The new SHA256 hash sentinel throws on legitimate byte-identical refreshes
   (e.g., '[]' → '[]' after a small compile fix that introduces no new
   violations, or single-finding → same-single-finding). The case is common,
   not 'extremely rare' as the error message claimed. Compounds with #1: the
   false-positive drift forces a re-walk that correctly writes '[]' again,
   then the hash check throws → unhandled exception → Step 9 worktree restore
   skipped → next attempt corrupted.

Fixes:
- Coalesce $null to "" via '?? '''' on the Get-Content -Raw call.
  Empirically verified: [string]$null does NOT coerce to '' in pwsh 7.5.4
  (stays null), but '... ?? '''' does work.
- Drop the SHA256 hash sentinel entirely. Step 6 has no equivalent
  programmatic 'did you walk the rules' check; it relies on procedural
  enforcement (the numbered markdown sub-steps and the example-invocation
  chain). Round 4 fix #3 already moved Step 7.5 to the same enforcement
  model. Replaced the throw with a callout explaining the trade-off.

All 5 scenarios validated empirically (pwsh 7.5.4):
- Empty diff (Blocked path): diffChanged=False (no false positive)
- Non-empty diff unchanged: diffChanged=False
- Diff changed during Step 7: diffChanged=True
- Clean → clean refresh: validates, no throw
- Invalid JSON: throws as expected

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo

kubaflo commented May 7, 2026

Contributor Author

Round 6 — Adversarial Multi-Model Review: ✅ LGTM (3/3)

Re-ran the same 3 top-tier models (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5) against commit 936b8e4750. All three returned LGTM with high confidence. The iterative review→fix loop has converged.

Reviewer verdicts

| Reviewer | Verdict | Confidence | Notes |
|----------|---------|------------|-------|
| claude-opus-4.7-xhigh | LGTM | high | All scenarios (Blocked path, byte-identical refresh, normal-drift refresh) pass empirically. No orphan references to removed hash sentinel. |
| claude-sonnet-4.6 | LGTM | high | Both Round 5 findings resolved. ?? operator confirmed safe (pwsh 7.5.4 environment only). Set/Get round-trip symmetric. |
| gpt-5.5 | LGTM | high | No regressions. ?? syntax already used in Review-PR.ps1, so no new pwsh-version requirement. |

Round 5 fixes — all confirmed correct

  1. Get-Content -Raw on 0-byte file (reviewer-findings.diff at SKILL.md:403) — ?? '' coalesce eliminates the false-positive "" -ne $null drift that broke the Blocked-with-no-diff path. Verified end-to-end on pwsh 7.5.4.
  2. SHA256 hash sentinel: Removed cleanly. Byte-identical refreshes ([] → [], single-finding → same single-finding) no longer throw. Step 9 worktree restore is reliably reachable. No orphan preRewriteJsonHash/postRewriteJsonHash/Get-FileHash references in .github/.

What changed across 4 review rounds (Round 2 → Round 6)

| Round | Findings | Reviewers in agreement | Fix commit |
|-------|----------|------------------------|------------|
| 3 | 5 (file-count display, stale "Step 7" refs, header label, example order, hardcoded model) | varied (3/3 on example order; 2/3 on others) | caaf080376 |
| 4 | 3 (PowerShell array-vs-string, empty-pipe Set-Content, procedure clarity) | 3/3 on the array-vs-string bug | d4acf348a0 |
| 5 | 1 cascading defect (0-byte → null → false drift → hash-sentinel deadlock) | 3/3 same root cause, different facets | 936b8e4750 |
| 6 | 0 | 3/3 LGTM | |

Final state

  • Branch: feature/expert-reviewer-extraction
  • HEAD: 936b8e4750 (Round 5 fix)
  • Step 7.5 procedure: Detect drift (with proper null coalesce + string normalization) → re-walk rules in markdown sub-steps → re-snapshot diff and validate JSON. No programmatic "did you walk the rules" sentinel; relies on procedural enforcement, same as Step 6.
  • Empty-diff Blocked path: Works correctly (no false drift, no hash deadlock).
  • Common clean→clean refresh: Works correctly (no spurious throws).

PR is ready for human review and merge from this multi-model adversarial review's perspective.

Round 6 reviewers: claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5 — all returned LGTM with high confidence. Loop ran 5 fix iterations across 4 review rounds (Rounds 3, 4, 5, 6).

@PureWeen

PureWeen commented May 7, 2026

Member

Round 5 Review — LGTM ✅

3-model adversarial review of the latest 7 commits (inline self-check adoption, Step 7.5 refresh, artifact gate, model/display fixes).

The big change landed well. Replacing the @maui-expert-reviewer sub-agent invocation with inline dimension loading (commit 55d7b26f) directly addresses the 18% compliance problem we identified across 44 CI attempts. The new Step 6 self-review is well-structured — clear procedure, proper JSON format, validation + count tracking, and the Step 7.5 drift refresh handles the case where test iterations modify code after the initial self-review.

Consensus findings (minor — not blocking):

| Finding | Severity | Notes |
|---------|----------|-------|
| Gate deferred-throw is soft | ⚠️ | $gateFailureMessage is never checked programmatically after Step 9. result.txt still gets set to Blocked which is sufficient, but analysis.md won't automatically explain which artifact was missing. |
| Here-string closing delimiter | ⚠️ | The multi-finding JSON example's closing '@ has leading spaces in the raw markdown. PowerShell requires column 0. Agents copying the template literally could get a parse error. Low probability since most agents generate JSON programmatically. |
| Comment accuracy | 💡 | Step 7.5 comment says "" -ne $null is a false-positive; it isn't (PowerShell coerces $null to "" for string comparison). The ?? '' coalesce is correct but the stated reason is slightly wrong. |

Disputed and discarded:

  • $missing += $_ inside ForEach-Object — one reviewer flagged as critical scoping bug, but ForEach-Object runs in caller's scope (unlike Invoke-Command). Verified correct.
  • Step 7.5 complexity — one reviewer flagged as unnecessary, but it handles a real case (code changes during test iterations).
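
The scoping point in the first bullet can be checked directly. This is a small sketch with hypothetical artifact names; it is not the gate code from SKILL.md, only the same `+=` pattern run under ForEach-Object and under Invoke-Command.

```powershell
# Hypothetical artifact lists; only the scoping behavior matters here.
$required = @('reviewer-findings.json', 'reviewer-findings.diff', 'fix.diff')
$present  = @('fix.diff')

# ForEach-Object runs its script block in the caller's scope,
# so += inside the block updates the outer $missing.
$missing = @()
$required | ForEach-Object {
    if ($present -notcontains $_) { $missing += $_ }
}
$missingCount = $missing.Count

# Invoke-Command runs its script block in a new child scope by default:
# += there creates a local copy and leaves the outer variable untouched.
$untouched = @()
Invoke-Command -ScriptBlock { $untouched += 'x' }
$untouchedCount = $untouched.Count
```

After this runs, `$missing` holds the two absent artifacts while `$untouched` is still empty, which matches the reviewers' conclusion that the ForEach-Object form is correct.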

Architecture alignment: Step 6 placement (after implementation, before testing) is a reasonable compromise vs our original Step 3 suggestion. It catches design flaws before expensive build+test cycles while keeping the workflow step count manageable.

Ready to merge. 🚀

@PureWeen PureWeen merged commit b71adea into main May 7, 2026
36 of 42 checks passed
@PureWeen PureWeen deleted the feature/expert-reviewer-extraction branch May 7, 2026 18:34
@github-actions github-actions Bot added this to the .NET 10 SR7 milestone May 7, 2026
T-Gro added a commit that referenced this pull request May 12, 2026
Resolve 3 conflict zones from PR #35198 (expert reviewer):
- Zone 1: combine --model copilotModel with --secret-env-vars on copilot invocation
- Zone 2: use renamed post-gate-comment.ps1 with ScriptsDir path + phase guard
- Zone 3: keep 4-task split (main had no split)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
devanathan-vaithiyanathan pushed a commit to devanathan-vaithiyanathan/maui that referenced this pull request Jun 1, 2026
> [!NOTE]
> Are you waiting for the changes in this PR to be merged?
> It would be very helpful if you could [test the resulting artifacts](https://github.com/dotnet/maui/wiki/Testing-PR-Builds) from this PR and let us know in a comment if this change resolves your issue. Thank you!

## Description

Replaces `review-rules.md` (flat 345-line checklist) with a dimensional
expert review agent. Single source of truth for all review rules,
organized into 30 dimensions for per-dimension sub-agent evaluation.
Adds inline file:line PR comments alongside the existing wall-of-text
summary.

Extracted from 28k review comments across 5 maintainers via
[extraction-pipeline](https://github.com/dotnet/fsharp/blob/main/.github/agents/extraction-pipeline.md).
No functional code changes.

Recreated from dotnet#35062 on a dotnet/maui branch (originally opened from a
fork).

## What changed

**Before:** `review-rules.md` had 345 lines of flat rules. `code-review`
skill loaded them all into one context. Output was a single wall-of-text
PR comment.

**After:** Rules absorbed into `maui-expert-reviewer.md` as 30
dimensions with 200+ CHECK items. Each dimension runs as an independent
sub-agent with focused context. Output is inline file:line PR comments
via `inline-findings.json`.

## CI Flow

```
Review-PR.ps1 prompt:
  1. code-review → maui-expert-reviewer agent → inline-findings.json
  2. pr-review → Pre-Flight → Try-Fix → Report (sees findings, no duplication)

Posting:
  post-inline-review.ps1    → .json → GitHub file:line comments (NEW)
  post-ai-summary-comment.ps1 → {phase}/content.md → wall-of-text (existing)

CI: COMMENTS_VIA_FILE=true → agent writes .json, script posts
Local: agent writes .json, code-review posts directly via gh api
```

## Files

| Action | File | What |
|--------|------|------|
| **Add** | `agents/maui-expert-reviewer.md` | 30 dimensions, 200+ CHECKs, routing table |
| **Add** | `instructions/collectionview-{android,ios,windows}` | Platform-isolated CV rules |
| **Add** | `instructions/{handler-patterns,layout-system,performance-hotpaths,public-api,threading-async}` | Domain-specific ambient guidance |
| **Add** | `scripts/post-inline-review.ps1` | Posts .json as GitHub PR review |
| **Del** | `skills/code-review/references/review-rules.md` | Absorbed into agent |
| **Mod** | `skills/code-review/SKILL.md` | Delegates to agent |
| **Mod** | `scripts/Review-PR.ps1` | Prompt + inline posting wiring |
| **Mod** | `eng/pipelines/ci-copilot.yml` | `COMMENTS_VIA_FILE` env var |

---------

Co-authored-by: kubaflo <kubaflo@users.noreply.github.com>
Co-authored-by: T-Gro <46543583+T-Gro@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Tomas Grosup <tomasgrosup@microsoft.com>
devanathan-vaithiyanathan pushed a commit to devanathan-vaithiyanathan/maui that referenced this pull request Jun 5, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 7, 2026

Labels

s/agent-changes-requested AI agent recommends changes - found a better alternative or issues s/agent-fix-win AI found a better alternative fix than the PR s/agent-reviewed PR was reviewed by AI agent workflow (full 4-phase review)


6 participants