History-trained agentic files + expert reviewer#35198
Conversation
There was a problem hiding this comment.
Pull request overview
This PR restructures the Copilot PR code review guidance from a single flat checklist into a “dimensional” expert reviewer agent that emits file/line inline findings, and wires CI/scripts to post those findings as a GitHub PR review.
Changes:
- Replace the legacy
review-rules.mdchecklist with the new.github/agents/maui-expert-reviewer.mdand domain/platform instruction files. - Update the
code-reviewskill to delegate to the expert reviewer and supportinline-findings.jsonoutput. - Add
post-inline-review.ps1and CI wiring to post inline review comments via GitHub’s Reviews API.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
eng/pipelines/ci-copilot.yml |
Enables COMMENTS_VIA_FILE in CI for file-based inline findings flow. |
.github/skills/code-review/tests/eval.yaml |
Updates eval rubric text to reflect expert reviewer dimensions. |
.github/skills/code-review/references/review-rules.md |
Removes the legacy flat checklist source file. |
.github/skills/code-review/SKILL.md |
Switches code-review guidance to delegate to the expert reviewer + inline findings. |
.github/scripts/post-inline-review.ps1 |
New script to post inline-findings.json as a single PR review with inline comments. |
.github/scripts/Review-PR.ps1 |
Wires the review pipeline to request inline findings and post them after the summary. |
.github/instructions/threading-async.instructions.md |
Adds threading/async guidance for platform and handler code. |
.github/instructions/public-api.instructions.md |
Adds PublicAPI.Unshipped guidance for API surface changes. |
.github/instructions/performance-hotpaths.instructions.md |
Adds hot-path performance rules for layout/handlers areas. |
.github/instructions/layout-system.instructions.md |
Adds layout measure/arrange contract guidance. |
.github/instructions/handler-patterns.instructions.md |
Adds handler mapper/lifecycle patterns guidance. |
.github/instructions/collectionview-windows.instructions.md |
Adds Windows CollectionView (Items/) guidance. |
.github/instructions/collectionview-ios.instructions.md |
Adds iOS/MacCatalyst CollectionView (Items2/) guidance. |
.github/instructions/collectionview-android.instructions.md |
Adds Android CollectionView (Items/) guidance. |
.github/agents/maui-expert-reviewer.md |
Introduces the 30-dimension expert reviewer agent and inline findings contract. |
MauiBot
left a comment
There was a problem hiding this comment.
Expert Review — 4 findings
See inline comments for details.
Great domain knowledge — let's restructure the wiringThe 30 dimensions and CHECK rules are a significant improvement over However, the wiring needs to change to match how we do PR reviews in MAUI. Our core principle: the PR's fix is just another try-fix candidate — not special. Try-fix models must be independent of the PR, and the same workflow needs to work for issue-only flows where there's no PR at all. The goal is to make try-fix amazing, not to grade the PR. (Background: PR #35105 established the firewall architecture that informs this restructuring.) Current wiring (this PR): Proposed restructuring: Key changes needed:
The content you've built is excellent — this is about repositioning where and when it runs in the pipeline. See inline comments for specific file-level feedback. |
PureWeen
left a comment
There was a problem hiding this comment.
See inline comments for specific feedback. Top-level architectural feedback posted as a separate comment above.
| @@ -559,7 +556,8 @@ $gateStatusForPrompt = switch ($gateResult) { | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
This Step 2 prompt runs the expert reviewer as an upfront PR review before try-fix. In our pipeline, the PR's fix is just another candidate — we don't want to grade it before try-fix runs because expert reviewer conclusions about the PR will be in context when try-fix starts (no firewall), and it also won't generalize to issue-only flows where there's no PR to review upfront.
What do you think about removing the "First code-review" step here and moving the expert reviewer invocation into the Report phase? Try-fix would load the domain knowledge (dimensions/CHECKs) directly, and the expert reviewer would evaluate all candidates after try-fix completes.
There was a problem hiding this comment.
Addressed by the multi-candidate restructure already on the branch (commits dac5150 "Multi-candidate review" and 1078580 "try-fix self-apply expert-reviewer"). The Step 2 prompt now runs PR-fix evaluation in parallel with try-fix×4, and Report compares all candidates symmetrically. See discussion comment for the agreed flow.
Not resolving this thread because the related context-contamination concern (raised on May 5) is still architecturally open — both branches still share one Invoke-CopilotStep session today.
| ``` | ||
|
|
||
| ### Step 2: Load Review Rules | ||
| ### Step 2: Delegate to Expert Reviewer |
There was a problem hiding this comment.
Step 2 currently delegates to the expert reviewer as the first substantive action. For the restructured pipeline, this skill would be invoked in the Report phase (to evaluate all candidates) rather than upfront.
The domain knowledge (30 dimensions, CHECK rules) should be separable from the review workflow (waves, routing, output format) so try-fix can load the knowledge without invoking the full review machinery.
Consider extracting lines 29-436 (Overarching Principles + 30 Dimensions + "What NOT to Flag") into a standalone reference file (e.g., references/maui-review-dimensions.md) that both try-fix and the expert reviewer agent can load.
There was a problem hiding this comment.
Partial: try-fix now invokes the expert reviewer (per #35231) and per ade495a each invocation gets an attempt-scoped output path, so the dimensions are loaded indirectly via the reviewer rather than as a flat reference file. Extracting lines 29-436 into a standalone references/maui-review-dimensions.md and having try-fix Read it directly would be a cleaner factoring, but it duplicates the dimensions text and would need a sync mechanism to keep the two copies aligned. Leaving open for follow-up — happy to do the extraction if you'd prefer that shape over the current invoke-and-read pattern.
|
|
||
| ### Wave 0 — Build Briefing Pack | ||
|
|
||
| 1. Read PR diff (`gh pr diff`) and list changed files — form your own assessment BEFORE reading PR description (independence-first) |
There was a problem hiding this comment.
Wave 0 currently assumes evaluating a single PR diff via gh pr diff. If the expert reviewer is restructured to evaluate all candidates in the Report phase, how do you see this working when it needs to evaluate N candidate diffs (the PR's fix + 4 try-fix results)?
One option: parameterize the diff source so the wave workflow accepts a diff input rather than hardcoding gh pr diff. That would let the same workflow evaluate any candidate.
There was a problem hiding this comment.
Addressed in spirit by the multi-candidate restructure (dac5150) — Wave 0 is no longer the only entry point. The Report phase now compares the PR fix against all 4 try-fix candidates symmetrically, and ade495a made the agent's findings output path configurable so per-candidate evaluations can be redirected to attempt-scoped paths.
Wave 0 itself still hardcodes gh pr diff though. The cleaner long-term fix is what you suggested — parameterize the diff source so the same wave workflow can evaluate any candidate diff. Leaving open as a follow-up since it's an agent-prompt refactor, not a wiring change.
| @@ -0,0 +1,32 @@ | |||
| --- | |||
There was a problem hiding this comment.
The instruction files you've added are well-scoped and the glob patterns are thoughtful. One coverage gap to flag: 14 of the 22 review-rules.md topics don't yet have a corresponding instruction file (Navigation/Shell, Memory Leaks, XAML/Bindings, Accessibility, Images, Gestures, Build, iOS Platform, Windows Platform, etc.) — they exist only in the expert reviewer's dimensions.
This matters because applyTo: may not reliably fire in task() sub-agent contexts (try-fix runs as a sub-agent). I lean toward having try-fix explicitly load the expert reviewer's dimensions as context — single source of truth, and it covers all 30 topics rather than 8.
There was a problem hiding this comment.
Addressed via a different route. Rather than backfilling 14 instruction files for the missing dimensions, #35231 wired try-fix to invoke @maui-expert-reviewer, and ade495a fixed that invocation to use an attempt-scoped output path. Net effect: try-fix now sees all 30 dimensions through the reviewer pass instead of relying on applyTo: glob coverage in sub-agent contexts. The instruction files we DO have (collectionview-android, hotpaths, threading-async, etc.) still serve as ambient guidance for human + Copilot edits outside the review flow.
MauiBot
left a comment
There was a problem hiding this comment.
Expert Review — 5 findings
See inline comments for details.
Proposed review flow evolution (capturing discussion)Recording an in-progress discussion so it's reviewable. Not a code change request — feedback welcome before we commit to an implementation. TodayShane's proposalProposed modifications on top of Shane'sTry-Fix always runs — there is no early exit. Every PR gets the full ×4 try-fix sweep so the expert reviewer always has the same set of candidates to compare against.
Combined shapeOpen questions
|
MauiBot
left a comment
There was a problem hiding this comment.
Expert Review — 4 findings
See inline comments for details.
PureWeen
left a comment
There was a problem hiding this comment.
Round 2 Review — 3-model adversarial consensus (Opus 4.6 / Sonnet 4.6 / GPT-5.3)
Good progress since the last review — the multi-candidate flow, winner.json, and gate diagnostics are substantial improvements. 8 findings below, focused on correctness and architecture.
Methodology: 3 independent reviewers with adversarial consensus. Findings marked with reviewer agreement count.
…act fixes Apply actionable findings from PR #35198 review (May 5): Review-PR.ps1 - Generalize build-error regex to '\berror\s+[A-Z]{2,}\d+\b' so it catches CS/MSB/NU/MAUI/NETSDK/XA codes without false-positiving on "0 error(s)" status lines. - Replace O(n²) truncation loop (trim 512 chars + recount per iter) with an O(log n) binary search on UTF-8 byte budget; reserve marker bytes upfront. - Defend against markdown fence injection by sizing the outer code fence as max(backtick run in diff)+1 (min 4) instead of mutating the diff text. post-inline-review.ps1 - Validate finding.path before posting: reject empty, '..', backslashes, rooted paths, drive letters, and control chars so a malformed/hostile finding cannot poison the review post (especially when the diff fetch fallback runs without cross-validation). Detect-TestsInDiff.ps1 - Tighten Get-ClassNameFromFile regex to skip 'abstract' and 'static' modifiers as the comment intends — a 'public abstract class BaseTest' declared above the concrete test class was being captured and turned into a non-matching test filter. performance-hotpaths.instructions.md - Replace overly broad src/Controls/src/Core/Handlers/** glob (which fired on all 60+ handlers, most of which are not hot paths) with specific scopes: Layouts, Platform, Items/Items2 handlers, ScrollView. Document scope rationale. public-api.instructions.md - Add src/Core/src/**/*.cs, src/Controls/src/**/*.cs, src/Essentials/src/**/*.cs globs so guidance loads when designing 'public class Foo' in Button.cs, not only when editing PublicAPI.Unshipped.txt afterward. Add activation guard at top of file so it ignores internal-only changes. maui-expert-reviewer.md + try-fix/SKILL.md - Resolve agent-contract conflict. Make the reviewer's findings JSON path configurable via the invoker prompt (default unchanged for the PR-level flow). Update try-fix to invoke the reviewer with an attempt-scoped output path (try-fix-{N}/reviewer-findings.json) and read the JSON back — preserves the per-dimension self-review pass added in #35231 while preventing try-fix attempts from clobbering the PR-level inline-findings.json consumed by post-inline-review.ps1. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…pliance from ~18% to ~100%)
Empirical analysis of 11 recent CI runs (44 try-fix attempts on AzDO
pipeline 27723) showed the expert-reviewer self-check only fired on
8/44 attempts (18.2%). Per-attempt: attempt-1=0%, attempt-2=45%,
attempt-3=27%, attempt-4=0%. Six of eleven runs had zero invocations.
One attempt (build 14027179/attempt-4) hallucinated EXPERT_REVIEW_MAJOR_ISSUES
text without writing any JSON file.
Root cause: the reviewer instruction was a single 60+ word clause
buried in 'Core Principles' (line 29 of SKILL.md), instructing the
model to spawn @maui-expert-reviewer as a sub-agent. It was NOT a
numbered Workflow step. The Required Files table didn't list
reviewer-findings.json. The verify-files-exist check didn't include it.
Nothing enforced the buried clause.
Fix:
1. Replace sub-agent spawn with inline self-check. Model reads sections
of .github/agents/maui-expert-reviewer.md (Overarching Principles +
Dimension Routing + relevant CHECK lists) and walks the diff against
them — no sub-agent spawn, no path argument, no JSON parse.
2. Promote to numbered Step 7: Expert Self-Review (MANDATORY).
Renumbered old steps 7→8 (Capture), 8→9 (Restore), 9→10 (Report).
3. Step 7 runs for ALL outcomes (Pass/Fail/Blocked), not just Pass.
4. Add reviewer-findings.json to Required Files table. The Step 8
verify gate detects missing artifacts but DEFERS the throw — Step 9
restore ALWAYS runs first to keep the worktree clean for the next
sequential attempt.
5. Add findings_count to Outputs table and Report template.
6. Add 'Self-review performed' to Completion Criteria.
7. Update pr-review SKILL.md attempt-{N} artifact tree to include
reviewer-findings.json (and other previously-omitted files).
8. Cap self-review iteration at one correction round to keep total
Step 6+7 iteration bounded.
Schema for reviewer-findings.json matches @maui-expert-reviewer agent
exactly: JSON array of {path, line, body} where line is on the changed
side of the diff (line 1 only as fallback for file-level concerns).
Same format as inline-findings.json so any future tooling can consume
either.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1ab1329 to
55d7b26
Compare
Self-review now runs BEFORE build+test (Step 6) instead of after (was Step 7). This catches design flaws before spending 5-15 min on a test cycle, and runs when context is lightest — before test output floods the context window. Before: attempt-1 compliance 0%, attempt-4 compliance 0%, overall 18% After inline fix: attempt-1 100%, attempt-4 60%, overall 85% This positional change should push attempt-3/4 higher by running the review when less prior-attempt context has accumulated. New flow: Design → Apply → Self-Review → Test → Capture → Restore → Report Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Switches the Copilot CLI invocation to a model with larger context window, which should improve instruction-following compliance on later try-fix attempts where context pressure caused the self-review step to be dropped. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 'result' event in Copilot CLI JSON output is a top-level event without a 'data' wrapper. Reading $event.data.usage always returned null, so the file/line counts were silently shown as 0. Read $event.usage directly, and wrap filesModified in @() to ensure .Count works whether PowerShell deserializes a single-element array as a scalar or an array. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Round 3 Review — 3-model adversarial consensus (Opus 4.7 xhigh / Sonnet 4.6 / GPT-5.5)Reviewing the 4 commits since the Round 2 baseline ( Methodology: 3 independent reviewers, top-tier models, no shared context. Findings annotated with reviewer agreement count. ❌ Error — Example invocation still shows the OLD pre-swap order —
|
| Severity | Count |
|---|---|
| ❌ Error | 2 |
| 2 | |
| 💡 Suggestion | 1 |
…taleness, model fallback
Round 3 multi-model review (Opus 4.7 xhigh / Sonnet 4.6 / GPT-5.5)
flagged 5 issues introduced by the Step 6↔7 swap and the inline
self-check restructure. This commit addresses all of them.
1. example-invocation.md (3/3 reviewers)
- Reordered to Self-Review → Test → Capture → Restore → Report
- Added the new Step 7.5 refresh step to the narrative
2. try-fix/SKILL.md gate error message (2/3 reviewers)
- Changed 'Step 7 (Expert Self-Review)' to 'Step 6 (Expert Self-Review)'
- Mentions Step 7.5 refresh requirement in the diagnostic
3. try-fix/SKILL.md gate section header (1/3 reviewers, related)
- 'enforcement gate for Step 7' -> 'for Steps 6 and 7'
4. Self-review staleness (1/3, GPT-5.5 — deeper architectural issue)
- Added Step 7.5: 'Refresh Self-Review If Code Changed'
- Step 6 now snapshots the reviewed diff to reviewer-findings.diff
- Step 7.5 compares current diff to the snapshot; if changed, re-runs
the self-review against the final diff and overwrites both files
- Step 8 artifact gate now requires reviewer-findings.diff
- Workflow grew from 10 steps to 11 (Step 7.5)
5. Hardcoded internal-only model in Review-PR.ps1 (2/3)
- Now reads $env:COPILOT_REVIEW_MODEL with claude-opus-4.7-1m-internal
as default, so contributors without internal-model access can run
the script with e.g. claude-opus-4.6 or claude-sonnet-4.6.
Also updated pr-review/SKILL.md output-tree diagram to mention the
new reviewer-findings.diff artifact and clarify that
reviewer-findings.json reflects the FINAL diff.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Round 4 — Adversarial Multi-Model ReviewRe-ran top-tier models ( ✅ Round 3 fixes confirmed correct (3/3 reviewers)
🔴 New findings1.
|
…ntent, Step 7.5 procedure clarity Three reviewers (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5) converged on empirically-verified bugs in Round 3's new Step 7.5 self-review refresh: 1. [error] Drift comparison always evaluated truthy. git diff -> string[] (line per element); Get-Content -Raw -> single string. PowerShell -ne with array-on-left does element-wise filtering, not equality, so the if-branch always fires. Even worse: empty diff ($null -ne <string>) also fires. Fix: normalize both sides via (git diff | Out-String) so the comparison is single-string vs single-string. 2. [error] reviewer-findings.diff gate failed on documented no-diff Blocked path. git diff | Set-Content does not create the file when the pipe is empty. The Round 3 gate now requires reviewer-findings.diff, so any attempt with no code changes (the explicitly-documented Blocked-with-no-fix path at SKILL.md:288) silently failed. Same hazard applied to Step 8's fix.diff write -- fixed both for consistency. Fix: Set-Content -Path X -Value (git diff | Out-String) -NoNewline always creates the file, even with empty content. 3. [warning] Step 7.5 procedure was buried in PowerShell comments inside a single code block. An LLM following the script literally would re-snapshot the diff and re-validate the existing JSON without ever rewriting findings, defeating the purpose of the refresh. Fix: hoisted the walk-rules / rewrite-JSON instructions out of comments into a numbered markdown procedure that mirrors Step 6's structure (Detect drift -> Re-do self-review -> Re-snapshot and validate). Added a SHA256 hash check at the start of sub-step 3 that throws if reviewer-findings.json was not actually rewritten in sub-step 2 (defends against literal-script-execution agents). All scenarios validated empirically (pwsh 7.5.4): - Diff unchanged: skips refresh - Diff changed + agent forgot rewrite: hash check throws - Diff changed + agent rewrote: hash differs, proceeds to validate - Empty diff: file created with size 0 (gate satisfied) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Round 5 — Adversarial Multi-Model ReviewRe-ran the same 3 top-tier models ( ✅ Round 4 fixes confirmed correct (3/3 reviewers)
🔴 New findings — all reviewers, same root causeCascading regression: 0-byte file →
|
| # | Reviewers | Defect |
|---|---|---|
| 1 | Sonnet 4.6, Opus xhigh (direct), GPT-5.5 (downstream) | Get-Content -Raw on 0-byte file returns $null, not "" — false drift detection on Blocked-with-no-diff |
| 2 | GPT-5.5 (direct), Opus xhigh (direct), Sonnet 4.6 (cascade) | SHA256 hash sentinel throws on legitimate byte-identical refresh ([] → [], single-finding → same single-finding). Common, not "extremely rare". The "touch trailing whitespace" escape hatch corrupts the JSON artifact. |
This is the exact "always evaluates truthy" failure mode Round 4 was supposed to eliminate — just shifted from array -ne scalar to string -ne null.
Fixes to be applied
- Coalesce
$nullto""when reading the diff snapshot back. Empirically verified:[string]$nulldoes NOT coerce to""in pwsh 7.5.4 (it stays null), but(...) ?? ''does work:
$reviewedDiff = if (Test-Path "$OUTPUT_DIR/reviewer-findings.diff") {
(Get-Content "$OUTPUT_DIR/reviewer-findings.diff" -Raw) ?? ''
} else { '' }- Drop the SHA256 hash sentinel entirely. Step 6 has no equivalent "did you actually walk the rules" programmatic check — it relies on procedural enforcement (the numbered markdown sub-steps and the example-invocation chain). Round 4 fix Third #3 already moved Step 7.5 to the same enforcement model. The hash sentinel rejects the common byte-identical case (e.g.,
[]→[]) and the documented escape hatch ("touch a string body") corrupts the artifact. Replace with a callout explaining the trade-off:
> Why no programmatic "did you actually rewrite the JSON" check? A SHA256 hash
> sentinel rejects the legitimate byte-identical case (e.g., [] → [] after a
> small compile fix that introduces no new violations), and that case is common.
> The procedural enforcement is sub-step 2's explicit numbered list above, plus
> the example-invocation chain that walks the dimensions explicitly.
Empirical validation of fixes (full Step 6 → Step 7.5 round-trip)
| Scenario | Result |
|---|---|
| Empty diff (Blocked, no code changes) | diffChanged: False ✅ |
| Non-empty diff, unchanged after Step 7 | diffChanged: False ✅ |
| Diff actually changed during Step 7 | diffChanged: True ✅ |
Clean → clean refresh ([] → []) |
Validates, no throw ✅ |
| Refresh writes invalid JSON | Throws as expected ✅ |
Applying fixes now and re-running the same 3-model review.
Round 5 reviewers: claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5 — all returned NEEDS_CHANGES with high confidence, converging on the same cascading defect.
All three Round 5 reviewers (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5) converged on a cascading bug introduced by Round 4's fixes interacting: 1. Round 4 fix #2 intentionally creates a 0-byte reviewer-findings.diff for the documented Blocked-with-no-diff path. But Get-Content -Raw on a 0-byte file returns $null, not "". So Step 7.5's drift detection becomes '"" -ne $null' → True → false-positive drift on every Blocked attempt. This re-introduces the 'always evaluates truthy' failure mode Round 4 was supposed to eliminate (just shifted from array-vs-scalar to string-vs-null). 2. The new SHA256 hash sentinel throws on legitimate byte-identical refreshes (e.g., '[]' → '[]' after a small compile fix that introduces no new violations, or single-finding → same-single-finding). The case is common, not 'extremely rare' as the error message claimed. Compounds with #1: the false-positive drift forces a re-walk that correctly writes '[]' again, then the hash check throws → unhandled exception → Step 9 worktree restore skipped → next attempt corrupted. Fixes: - Coalesce $null to "" via '?? '''' on the Get-Content -Raw call. Empirically verified: [string]$null does NOT coerce to '' in pwsh 7.5.4 (stays null), but '... ?? '''' does work. - Drop the SHA256 hash sentinel entirely. Step 6 has no equivalent programmatic 'did you walk the rules' check; it relies on procedural enforcement (the numbered markdown sub-steps and the example-invocation chain). Round 4 fix #3 already moved Step 7.5 to the same enforcement model. Replaced the throw with a callout explaining the trade-off. All 5 scenarios validated empirically (pwsh 7.5.4): - Empty diff (Blocked path): diffChanged=False (no false positive) - Non-empty diff unchanged: diffChanged=False - Diff changed during Step 7: diffChanged=True - Clean → clean refresh: validates, no throw - Invalid JSON: throws as expected Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Round 6 — Adversarial Multi-Model Review: ✅ LGTM (3/3)Re-ran the same 3 top-tier models ( Reviewer verdicts
Round 5 fixes — all confirmed correct
What changed across 4 review rounds (Round 2 → Round 6)
Final state
PR is ready for human review and merge from this multi-model adversarial review's perspective. Round 6 reviewers: claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5 — all returned LGTM with high confidence. Loop ran 5 fix iterations across 4 review rounds (Rounds 3, 4, 5, 6). |
Round 5 Review — LGTM ✅3-model adversarial review of the latest 7 commits (inline self-check adoption, Step 7.5 refresh, artifact gate, model/display fixes). The big change landed well. Replacing the Consensus findings (minor — not blocking):
Disputed and discarded:
Architecture alignment: Step 6 placement (after implementation, before testing) is a reasonable compromise vs our original Step 3 suggestion. It catches design flaws before expensive build+test cycles while keeping the workflow step count manageable. Ready to merge. 🚀 |
Resolve 3 conflict zones from PR #35198 (expert reviewer): - Zone 1: combine --model copilotModel with --secret-env-vars on copilot invocation - Zone 2: use renamed post-gate-comment.ps1 with ScriptsDir path + phase guard - Zone 3: keep 4-task split (main had no split) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
> [!NOTE] > Are you waiting for the changes in this PR to be merged? > It would be very helpful if you could [test the resulting artifacts](https://github.com/dotnet/maui/wiki/Testing-PR-Builds) from this PR and let us know in a comment if this change resolves your issue. Thank you! ## Description Replaces `review-rules.md` (flat 345-line checklist) with a dimensional expert review agent. Single source of truth for all review rules, organized into 30 dimensions for per-dimension sub-agent evaluation. Adds inline file:line PR comments alongside the existing wall-of-text summary. Extracted from 28k review comments across 5 maintainers via [extraction-pipeline](https://github.com/dotnet/fsharp/blob/main/.github/agents/extraction-pipeline.md). No functional code changes. Recreated from dotnet#35062 on a dotnet/maui branch (originally opened from a fork). ## What changed **Before:** `review-rules.md` had 345 lines of flat rules. `code-review` skill loaded them all into one context. Output was a single wall-of-text PR comment. **After:** Rules absorbed into `maui-expert-reviewer.md` as 30 dimensions with 200+ CHECK items. Each dimension runs as an independent sub-agent with focused context. Output is inline file:line PR comments via `inline-findings.json`. ## CI Flow ``` Review-PR.ps1 prompt: 1. code-review → maui-expert-reviewer agent → inline-findings.json 2. pr-review → Pre-Flight → Try-Fix → Report (sees findings, no duplication) Posting: post-inline-review.ps1 → .json → GitHub file:line comments (NEW) post-ai-summary-comment.ps1 → {phase}/content.md → wall-of-text (existing) CI: COMMENTS_VIA_FILE=true → agent writes .json, script posts Local: agent writes .json, code-review posts directly via gh api ``` ## Files | Action | File | What | |--------|------|------| | **Add** | `agents/maui-expert-reviewer.md` | 30 dimensions, 200+ CHECKs, routing table | | **Add** | `instructions/collectionview-{android,ios,windows}` | Platform-isolated CV rules | | **Add** | `instructions/{handler-patterns,layout-system,performance-hotpaths,public-api,threading-async}` | Domain-specific ambient guidance | | **Add** | `scripts/post-inline-review.ps1` | Posts .json as GitHub PR review | | **Del** | `skills/code-review/references/review-rules.md` | Absorbed into agent | | **Mod** | `skills/code-review/SKILL.md` | Delegates to agent | | **Mod** | `scripts/Review-PR.ps1` | Prompt + inline posting wiring | | **Mod** | `eng/pipelines/ci-copilot.yml` | `COMMENTS_VIA_FILE` env var | --------- Co-authored-by: kubaflo <kubaflo@users.noreply.github.com> Co-authored-by: T-Gro <46543583+T-Gro@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Tomas Grosup <tomasgrosup@microsoft.com>
> [!NOTE] > Are you waiting for the changes in this PR to be merged? > It would be very helpful if you could [test the resulting artifacts](https://github.com/dotnet/maui/wiki/Testing-PR-Builds) from this PR and let us know in a comment if this change resolves your issue. Thank you! ## Description Replaces `review-rules.md` (flat 345-line checklist) with a dimensional expert review agent. Single source of truth for all review rules, organized into 30 dimensions for per-dimension sub-agent evaluation. Adds inline file:line PR comments alongside the existing wall-of-text summary. Extracted from 28k review comments across 5 maintainers via [extraction-pipeline](https://github.com/dotnet/fsharp/blob/main/.github/agents/extraction-pipeline.md). No functional code changes. Recreated from dotnet#35062 on a dotnet/maui branch (originally opened from a fork). ## What changed **Before:** `review-rules.md` had 345 lines of flat rules. `code-review` skill loaded them all into one context. Output was a single wall-of-text PR comment. **After:** Rules absorbed into `maui-expert-reviewer.md` as 30 dimensions with 200+ CHECK items. Each dimension runs as an independent sub-agent with focused context. Output is inline file:line PR comments via `inline-findings.json`. ## CI Flow ``` Review-PR.ps1 prompt: 1. code-review → maui-expert-reviewer agent → inline-findings.json 2. pr-review → Pre-Flight → Try-Fix → Report (sees findings, no duplication) Posting: post-inline-review.ps1 → .json → GitHub file:line comments (NEW) post-ai-summary-comment.ps1 → {phase}/content.md → wall-of-text (existing) CI: COMMENTS_VIA_FILE=true → agent writes .json, script posts Local: agent writes .json, code-review posts directly via gh api ``` ## Files | Action | File | What | |--------|------|------| | **Add** | `agents/maui-expert-reviewer.md` | 30 dimensions, 200+ CHECKs, routing table | | **Add** | `instructions/collectionview-{android,ios,windows}` | Platform-isolated CV rules | | **Add** | `instructions/{handler-patterns,layout-system,performance-hotpaths,public-api,threading-async}` | Domain-specific ambient guidance | | **Add** | `scripts/post-inline-review.ps1` | Posts .json as GitHub PR review | | **Del** | `skills/code-review/references/review-rules.md` | Absorbed into agent | | **Mod** | `skills/code-review/SKILL.md` | Delegates to agent | | **Mod** | `scripts/Review-PR.ps1` | Prompt + inline posting wiring | | **Mod** | `eng/pipelines/ci-copilot.yml` | `COMMENTS_VIA_FILE` env var | --------- Co-authored-by: kubaflo <kubaflo@users.noreply.github.com> Co-authored-by: T-Gro <46543583+T-Gro@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Tomas Grosup <tomasgrosup@microsoft.com>
Note
Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!
Description
Replaces
review-rules.md(flat 345-line checklist) with a dimensional expert review agent. Single source of truth for all review rules, organized into 30 dimensions for per-dimension sub-agent evaluation. Adds inline file:line PR comments alongside the existing wall-of-text summary.Extracted from 28k review comments across 5 maintainers via extraction-pipeline. No functional code changes.
Recreated from #35062 on a dotnet/maui branch (originally opened from a fork).
What changed
Before:
review-rules.mdhad 345 lines of flat rules.code-reviewskill loaded them all into one context. Output was a single wall-of-text PR comment.After: Rules absorbed into
maui-expert-reviewer.mdas 30 dimensions with 200+ CHECK items. Each dimension runs as an independent sub-agent with focused context. Output is inline file:line PR comments viainline-findings.json.CI Flow
Files
agents/maui-expert-reviewer.mdinstructions/collectionview-{android,ios,windows}instructions/{handler-patterns,layout-system,performance-hotpaths,public-api,threading-async}scripts/post-inline-review.ps1skills/code-review/references/review-rules.mdskills/code-review/SKILL.mdscripts/Review-PR.ps1eng/pipelines/ci-copilot.ymlCOMMENTS_VIA_FILEenv var