History-trained agentic files + expert reviewer by kubaflo · Pull Request #35198 · dotnet/maui

kubaflo · 2026-04-28T14:03:48Z

Description

Replaces review-rules.md (flat 345-line checklist) with a dimensional expert review agent. Single source of truth for all review rules, organized into 30 dimensions for per-dimension sub-agent evaluation. Adds inline file:line PR comments alongside the existing wall-of-text summary.

Extracted from 28k review comments across 5 maintainers via extraction-pipeline. No functional code changes.

Recreated from #35062 on a dotnet/maui branch (originally opened from a fork).

What changed

Before: review-rules.md had 345 lines of flat rules. code-review skill loaded them all into one context. Output was a single wall-of-text PR comment.

After: Rules absorbed into maui-expert-reviewer.md as 30 dimensions with 200+ CHECK items. Each dimension runs as an independent sub-agent with focused context. Output is inline file:line PR comments via inline-findings.json.

CI Flow

Review-PR.ps1 prompt:
  1. code-review → maui-expert-reviewer agent → inline-findings.json
  2. pr-review → Pre-Flight → Try-Fix → Report (sees findings, no duplication)

Posting:
  post-inline-review.ps1    → .json → GitHub file:line comments (NEW)
  post-ai-summary-comment.ps1 → {phase}/content.md → wall-of-text (existing)

CI: COMMENTS_VIA_FILE=true → agent writes .json, script posts
Local: agent writes .json, code-review posts directly via gh api
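
As a rough illustration of the posting half of this flow, the sketch below turns a findings list into one review request body. It assumes the `{path, line, body}` shape for `inline-findings.json` that is spelled out later in this thread, and the `event`/`comments` fields of GitHub's "create a review" endpoint. The function name and the sample path are hypothetical, and this is not the actual `post-inline-review.ps1` logic.

```python
import json

def build_review_payload(findings, summary="Expert Review"):
    """Turn a list of {path, line, body} findings into one review request body."""
    comments = [
        {
            "path": finding["path"],
            "line": int(finding["line"]),
            "side": "RIGHT",  # comment on the new version of the file
            "body": finding["body"],
        }
        for finding in findings
    ]
    return {
        "event": "COMMENT",
        "body": f"{summary}: {len(comments)} findings",
        "comments": comments,
    }

# Hypothetical sample; a real run would read the agent's inline-findings.json.
sample = '[{"path": "src/Example/Foo.cs", "line": 42, "body": "Example finding"}]'
payload = build_review_payload(json.loads(sample))
print(json.dumps(payload, indent=2))
```

Posting all findings in a single review keeps the PR timeline to one entry instead of one notification per comment.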

Files

| Action | File | What |
|---|---|---|
| Add | `agents/maui-expert-reviewer.md` | 30 dimensions, 200+ CHECKs, routing table |
| Add | `instructions/collectionview-{android,ios,windows}` | Platform-isolated CV rules |
| Add | `instructions/{handler-patterns,layout-system,performance-hotpaths,public-api,threading-async}` | Domain-specific ambient guidance |
| Add | `scripts/post-inline-review.ps1` | Posts .json as GitHub PR review |
| Del | `skills/code-review/references/review-rules.md` | Absorbed into agent |
| Mod | `skills/code-review/SKILL.md` | Delegates to agent |
| Mod | `scripts/Review-PR.ps1` | Prompt + inline posting wiring |
| Mod | `eng/pipelines/ci-copilot.yml` | `COMMENTS_VIA_FILE` env var |

Copilot

Pull request overview

This PR restructures the Copilot PR code review guidance from a single flat checklist into a “dimensional” expert reviewer agent that emits file/line inline findings, and wires CI/scripts to post those findings as a GitHub PR review.

Changes:

Replace the legacy review-rules.md checklist with the new .github/agents/maui-expert-reviewer.md and domain/platform instruction files.
Update the code-review skill to delegate to the expert reviewer and support inline-findings.json output.
Add post-inline-review.ps1 and CI wiring to post inline review comments via GitHub’s Reviews API.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| `eng/pipelines/ci-copilot.yml` | Enables `COMMENTS_VIA_FILE` in CI for file-based inline findings flow. |
| `.github/skills/code-review/tests/eval.yaml` | Updates eval rubric text to reflect expert reviewer dimensions. |
| `.github/skills/code-review/references/review-rules.md` | Removes the legacy flat checklist source file. |
| `.github/skills/code-review/SKILL.md` | Switches `code-review` guidance to delegate to the expert reviewer + inline findings. |
| `.github/scripts/post-inline-review.ps1` | New script to post `inline-findings.json` as a single PR review with inline comments. |
| `.github/scripts/Review-PR.ps1` | Wires the review pipeline to request inline findings and post them after the summary. |
| `.github/instructions/threading-async.instructions.md` | Adds threading/async guidance for platform and handler code. |
| `.github/instructions/public-api.instructions.md` | Adds PublicAPI.Unshipped guidance for API surface changes. |
| `.github/instructions/performance-hotpaths.instructions.md` | Adds hot-path performance rules for layout/handlers areas. |
| `.github/instructions/layout-system.instructions.md` | Adds layout measure/arrange contract guidance. |
| `.github/instructions/handler-patterns.instructions.md` | Adds handler mapper/lifecycle patterns guidance. |
| `.github/instructions/collectionview-windows.instructions.md` | Adds Windows CollectionView (Items/) guidance. |
| `.github/instructions/collectionview-ios.instructions.md` | Adds iOS/MacCatalyst CollectionView (Items2/) guidance. |
| `.github/instructions/collectionview-android.instructions.md` | Adds Android CollectionView (Items/) guidance. |
| `.github/agents/maui-expert-reviewer.md` | Introduces the 30-dimension expert reviewer agent and inline findings contract. |

MauiBot

Expert Review — 4 findings

See inline comments for details.

PureWeen · 2026-04-28T20:26:52Z

Great domain knowledge — let's restructure the wiring

The 30 dimensions and CHECK rules are a significant improvement over review-rules.md — more comprehensive, better structured, properly severity-graded. The instruction files, posting script, and COMMENTS_VIA_FILE decoupling are all solid building blocks. I want to keep all of that.

However, the wiring needs to change to match how we do PR reviews in MAUI.

Our core principle: the PR's fix is just another try-fix candidate — not special. Try-fix models must be independent of the PR, and the same workflow needs to work for issue-only flows where there's no PR at all. The goal is to make try-fix amazing, not to grade the PR.

(Background: PR #35105 established the firewall architecture that informs this restructuring.)

Current wiring (this PR):

Gate → Expert reviewer reviews PR → pr-review (try-fix sees reviewer output in context) → Post

Proposed restructuring:

Gate → Pre-Flight (context only) → Try-Fix ×4 (each loads domain knowledge from dimensions) → Report (expert reviewer evaluates ALL candidates: PR + try-fix results) → Post inline findings on winning fix

Key changes needed:

Domain knowledge → try-fix: The 30 dimensions should feed INTO try-fix as fix-quality guidance (not review the PR upfront). Try-fix models loading these CHECK rules will produce better fixes.
Expert review → Report phase: After try-fix completes, the expert reviewer evaluates all candidates symmetrically; the PR's fix is candidate #5.
Separate domain knowledge from workflow: The dimensions (lines 29-436) need to be loadable independently of the wave workflow (lines 528-599), so try-fix can consume the domain rules without the review orchestration.
Candidate-scoring output mode: inline-findings.json is great for posting the final review. But Report phase also needs a candidate-comparison format to rank PR vs try-fix results.

The content you've built is excellent — this is about repositioning where and when it runs in the pipeline. See inline comments for specific file-level feedback.

PureWeen

See inline comments for specific feedback. Top-level architectural feedback posted as a separate comment above.

PureWeen · 2026-04-28T20:31:58Z

@@ -559,7 +556,8 @@ $gateStatusForPrompt = switch ($gateResult) {
 }



This Step 2 prompt runs the expert reviewer as an upfront PR review before try-fix. In our pipeline, the PR's fix is just another candidate, and we don't want to grade it before try-fix runs, for two reasons: the expert reviewer's conclusions about the PR would be in context when try-fix starts (no firewall), and an upfront review doesn't generalize to issue-only flows where there's no PR to review.

What do you think about removing the "First code-review" step here and moving the expert reviewer invocation into the Report phase? Try-fix would load the domain knowledge (dimensions/CHECKs) directly, and the expert reviewer would evaluate all candidates after try-fix completes.

Addressed by the multi-candidate restructure already on the branch (commits dac5150 "Multi-candidate review" and 1078580 "try-fix self-apply expert-reviewer"). The Step 2 prompt now runs PR-fix evaluation in parallel with try-fix×4, and Report compares all candidates symmetrically. See discussion comment for the agreed flow.

Not resolving this thread because the related context-contamination concern (raised on May 5) is still architecturally open — both branches still share one Invoke-CopilotStep session today.

PureWeen · 2026-04-28T20:31:58Z

   ```

-### Step 2: Load Review Rules
+### Step 2: Delegate to Expert Reviewer


Step 2 currently delegates to the expert reviewer as the first substantive action. For the restructured pipeline, this skill would be invoked in the Report phase (to evaluate all candidates) rather than upfront.

The domain knowledge (30 dimensions, CHECK rules) should be separable from the review workflow (waves, routing, output format) so try-fix can load the knowledge without invoking the full review machinery.

Consider extracting lines 29-436 (Overarching Principles + 30 Dimensions + "What NOT to Flag") into a standalone reference file (e.g., references/maui-review-dimensions.md) that both try-fix and the expert reviewer agent can load.

Partial: try-fix now invokes the expert reviewer (per #35231) and per ade495a each invocation gets an attempt-scoped output path, so the dimensions are loaded indirectly via the reviewer rather than as a flat reference file. Extracting lines 29-436 into a standalone references/maui-review-dimensions.md and having try-fix Read it directly would be a cleaner factoring, but it duplicates the dimensions text and would need a sync mechanism to keep the two copies aligned. Leaving open for follow-up — happy to do the extraction if you'd prefer that shape over the current invoke-and-read pattern.

PureWeen · 2026-04-28T20:31:58Z

+
+### Wave 0 — Build Briefing Pack
+
+1. Read PR diff (`gh pr diff`) and list changed files — form your own assessment BEFORE reading PR description (independence-first)


Wave 0 currently assumes evaluating a single PR diff via gh pr diff. If the expert reviewer is restructured to evaluate all candidates in the Report phase, how do you see this working when it needs to evaluate N candidate diffs (the PR's fix + 4 try-fix results)?

One option: parameterize the diff source so the wave workflow accepts a diff input rather than hardcoding gh pr diff. That would let the same workflow evaluate any candidate.

Addressed in spirit by the multi-candidate restructure (dac5150) — Wave 0 is no longer the only entry point. The Report phase now compares the PR fix against all 4 try-fix candidates symmetrically, and ade495a made the agent's findings output path configurable so per-candidate evaluations can be redirected to attempt-scoped paths.

Wave 0 itself still hardcodes gh pr diff though. The cleaner long-term fix is what you suggested — parameterize the diff source so the same wave workflow can evaluate any candidate diff. Leaving open as a follow-up since it's an agent-prompt refactor, not a wiring change.

PureWeen · 2026-04-28T20:31:58Z

@@ -0,0 +1,32 @@
+---


The instruction files you've added are well-scoped and the glob patterns are thoughtful. One coverage gap to flag: 14 of the 22 review-rules.md topics don't yet have a corresponding instruction file (Navigation/Shell, Memory Leaks, XAML/Bindings, Accessibility, Images, Gestures, Build, iOS Platform, Windows Platform, etc.) — they exist only in the expert reviewer's dimensions.

This matters because applyTo: may not reliably fire in task() sub-agent contexts (try-fix runs as a sub-agent). I lean toward having try-fix explicitly load the expert reviewer's dimensions as context — single source of truth, and it covers all 30 topics rather than 8.

Addressed via a different route. Rather than backfilling 14 instruction files for the missing dimensions, #35231 wired try-fix to invoke @maui-expert-reviewer, and ade495a fixed that invocation to use an attempt-scoped output path. Net effect: try-fix now sees all 30 dimensions through the reviewer pass instead of relying on applyTo: glob coverage in sub-agent contexts. The instruction files we DO have (collectionview-android, hotpaths, threading-async, etc.) still serve as ambient guidance for human + Copilot edits outside the review flow.

MauiBot

Expert Review — 5 findings

See inline comments for details.

kubaflo · 2026-04-29T12:42:20Z

Proposed review flow evolution (capturing discussion)

Recording an in-progress discussion so it's reviewable. Not a code change request — feedback welcome before we commit to an implementation.

Today

Gate → Expert reviewer reviews PR → pr-review (Try-Fix sees reviewer output in context) → Post

Shane's proposal

Gate → Pre-Flight (context only)
     → Try-Fix ×4 (each loads domain knowledge from dimensions)
     → Report (expert reviewer evaluates ALL candidates: PR + 4 try-fix results)
     → Post inline findings on the winning fix

Proposed modifications on top of Shane's

Try-Fix always runs — there is no early exit. Every PR gets the full ×4 try-fix sweep so the expert reviewer always has the same set of candidates to compare against.

Run the expert-reviewer evaluation of the PR fix before Try-Fix kicks off, in a sandbox with reviewer feedback applied. This produces an additional candidate (the "PR fix + reviewer feedback") that goes into the Report stage alongside the raw PR fix and the 4 try-fix outputs. It does not short-circuit Try-Fix — Try-Fix still runs in parallel.
Don't post inline findings if the winning candidate isn't the PR's fix. Inline file:line comments only make sense against lines that exist in the PR's diff. (We already filter for this in post-inline-review.ps1, but it would be a no-op against a non-PR candidate.)
If a non-PR candidate wins → request changes asking the author to apply the AI-suggested fix. When the author pushes the change, the workflow re-triggers naturally.
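
To make the "lines that exist in the PR's diff" constraint concrete, here is a minimal sketch of that kind of filter. It assumes git-format unified diff text (as `gh pr diff` produces) and the `{path, line, body}` findings shape. The function names and sample data are hypothetical, and the real filter in `post-inline-review.ps1` may work differently.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")

def right_side_lines(diff_text):
    """Map each changed file to the new-file line numbers that appear in its hunks."""
    visible = {}
    path = None      # file currently being read, None until a '+++ b/...' header
    new_line = None  # next right-side line number, None outside a hunk
    for raw in diff_text.splitlines():
        if raw.startswith("diff --git "):
            path, new_line = None, None
        elif new_line is None and raw.startswith("+++ "):
            target = raw[4:].strip()
            path = target[2:] if target.startswith("b/") else None  # '/dev/null' = deleted file
        elif HUNK_HEADER.match(raw):
            new_line = int(HUNK_HEADER.match(raw).group(1))
        elif path is not None and new_line is not None and raw[:1] in ("+", " "):
            visible.setdefault(path, set()).add(new_line)
            new_line += 1
        # '-' lines and '\ No newline at end of file' never advance the right side
    return visible

def keep_postable(findings, diff_text):
    """Drop findings whose path/line is not visible on the right side of the diff."""
    visible = right_side_lines(diff_text)
    return [f for f in findings if f["line"] in visible.get(f["path"], set())]

SAMPLE_DIFF = "\n".join([
    "diff --git a/src/Foo.cs b/src/Foo.cs",
    "index 1111111..2222222 100644",
    "--- a/src/Foo.cs",
    "+++ b/src/Foo.cs",
    "@@ -10,3 +10,4 @@ class Foo",
    " context",
    "-old",
    "+new",
    "+added",
    " tail",
])
findings = [
    {"path": "src/Foo.cs", "line": 12, "body": "on an added line"},
    {"path": "src/Foo.cs", "line": 40, "body": "outside every hunk"},
    {"path": "src/Bar.cs", "line": 1, "body": "file not in the diff"},
]
print(keep_postable(findings, SAMPLE_DIFF))
```

Run against a non-PR candidate, the same filter would discard most or all findings, which is why posting is skipped in that case.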

Combined shape

Gate
  └─ Pre-Flight (context only)
       ├─ Expert Reviewer eval of PR fix (sandbox-applied) ──┐
       └─ Try-Fix ×4 (each loads dimension knowledge) ──────┤
                                                             ▼
                                                          Report
                                  (expert reviewer evaluates ALL candidates:
                                   PR fix, PR fix + reviewer feedback, 4 try-fix candidates)
                                                             │
                              ┌──────────────────────────────┴───────────────────────────────┐
                              ▼                                                              ▼
                     winner = a PR-diff candidate                                  winner = non-PR candidate
                     (raw PR or PR + reviewer)                                              │
                              │                                                             ▼
                              ▼                                              request changes; surface
                  post inline findings on PR diff                            winning candidate diff;
                                                                             author push re-triggers
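
The two branches at the bottom of the diagram can be sketched as a small decision function. The candidate labels and the returned field names below are hypothetical; only the branching itself comes from the proposal above.

```python
# Candidates whose changes are expressed against the PR's own diff (labels are assumptions).
PR_DIFF_CANDIDATES = {"pr", "pr+reviewer"}

def plan_posting(winner):
    """Decide what to post once the Report phase has picked a winning candidate."""
    if winner in PR_DIFF_CANDIDATES:
        return {
            "review_event": "COMMENT",
            "post_inline_findings": True,
            "surface_candidate_diff": False,
        }
    # A try-fix candidate won: inline comments would not map onto the PR diff.
    return {
        "review_event": "REQUEST_CHANGES",
        "post_inline_findings": False,
        "surface_candidate_diff": True,
    }

for candidate in ("pr", "pr+reviewer", "try-fix-3"):
    print(candidate, plan_posting(candidate))
```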

Open questions

How do we represent "PR fix" vs "PR fix + sandbox-applied reviewer feedback" as two distinct candidates in the Report scoring? Same scoring rubric, or weighted differently?
For the request-changes path: surface the winning candidate diff in the review body, as a suggested-changes block, or attach a patch file?
Pre-Flight runs the expert reviewer eval and kicks off Try-Fix — do they run truly in parallel, or sequentially with Try-Fix waiting on the reviewer eval result for context?

MauiBot

Expert Review — 4 findings

See inline comments for details.

PureWeen

Round 2 Review — 3-model adversarial consensus (Opus 4.6 / Sonnet 4.6 / GPT-5.3)

Good progress since the last review — the multi-candidate flow, winner.json, and gate diagnostics are substantial improvements. 8 findings below, focused on correctness and architecture.

Methodology: 3 independent reviewers with adversarial consensus. Findings marked with reviewer agreement count.

…act fixes

Apply actionable findings from PR #35198 review (May 5):

Review-PR.ps1
- Generalize build-error regex to '\berror\s+[A-Z]{2,}\d+\b' so it catches CS/MSB/NU/MAUI/NETSDK/XA codes without false-positiving on "0 error(s)" status lines.
- Replace O(n²) truncation loop (trim 512 chars + recount per iter) with an O(log n) binary search on UTF-8 byte budget; reserve marker bytes upfront.
- Defend against markdown fence injection by sizing the outer code fence as max(backtick run in diff)+1 (min 4) instead of mutating the diff text.

post-inline-review.ps1
- Validate finding.path before posting: reject empty, '..', backslashes, rooted paths, drive letters, and control chars so a malformed/hostile finding cannot poison the review post (especially when the diff fetch fallback runs without cross-validation).

Detect-TestsInDiff.ps1
- Tighten Get-ClassNameFromFile regex to skip 'abstract' and 'static' modifiers as the comment intends: a 'public abstract class BaseTest' declared above the concrete test class was being captured and turned into a non-matching test filter.

performance-hotpaths.instructions.md
- Replace overly broad src/Controls/src/Core/Handlers/** glob (which fired on all 60+ handlers, most of which are not hot paths) with specific scopes: Layouts, Platform, Items/Items2 handlers, ScrollView. Document scope rationale.

public-api.instructions.md
- Add src/Core/src/**/*.cs, src/Controls/src/**/*.cs, src/Essentials/src/**/*.cs globs so guidance loads when designing 'public class Foo' in Button.cs, not only when editing PublicAPI.Unshipped.txt afterward. Add activation guard at top of file so it ignores internal-only changes.

maui-expert-reviewer.md + try-fix/SKILL.md
- Resolve agent-contract conflict. Make the reviewer's findings JSON path configurable via the invoker prompt (default unchanged for the PR-level flow). Update try-fix to invoke the reviewer with an attempt-scoped output path (try-fix-{N}/reviewer-findings.json) and read the JSON back, which preserves the per-dimension self-review pass added in #35231 while preventing try-fix attempts from clobbering the PR-level inline-findings.json consumed by post-inline-review.ps1.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
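
Several of the hardening items in this commit message are small, self-contained techniques. The sketch below restates four of them in Python for illustration only. The regex is the one quoted above; the function names, the truncation marker text, and the exact rejection rules are assumptions, not the PowerShell that was committed.

```python
import re

BUILD_ERROR = re.compile(r"\berror\s+[A-Z]{2,}\d+\b")  # regex quoted in the commit message

def has_build_error(line):
    """True for compiler/SDK error codes, False for '0 error(s)' style status lines."""
    return BUILD_ERROR.search(line) is not None

def outer_fence(text):
    """Return a backtick fence one longer than the longest run in text (minimum 4)."""
    longest = max((len(run) for run in re.findall(r"`+", text)), default=0)
    return "`" * max(longest + 1, 4)

def truncate_utf8(text, budget, marker="\n[truncated]"):
    """Longest prefix whose UTF-8 size plus the marker fits in budget bytes.

    Assumes budget is at least the size of the marker itself.
    """
    if len(text.encode("utf-8")) <= budget:
        return text
    room = max(budget - len(marker.encode("utf-8")), 0)  # reserve marker bytes upfront
    lo, hi = 0, len(text)
    while lo < hi:  # binary search on the prefix length
        mid = (lo + hi + 1) // 2
        if len(text[:mid].encode("utf-8")) <= room:
            lo = mid
        else:
            hi = mid - 1
    return text[:lo] + marker

def is_safe_finding_path(path):
    """Reject empty, rooted, drive-letter, backslash, '..' and control-char paths."""
    if not path or "\\" in path or path.startswith("/"):
        return False
    if re.match(r"^[A-Za-z]:", path):
        return False
    if any(ord(ch) < 32 or ord(ch) == 127 for ch in path):
        return False
    return ".." not in path.split("/")

print(has_build_error("Foo.cs(1,2): error CS1002: ; expected"))
print(has_build_error("    0 error(s)"))
print(len(outer_fence("plain diff text")))
print(is_safe_finding_path("../secrets.txt"))
```

Sizing the fence from the content, rather than escaping backticks inside it, leaves the diff byte-for-byte intact.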

…pliance from ~18% to ~100%)

Empirical analysis of 11 recent CI runs (44 try-fix attempts on AzDO pipeline 27723) showed the expert-reviewer self-check only fired on 8/44 attempts (18.2%). Per-attempt: attempt-1=0%, attempt-2=45%, attempt-3=27%, attempt-4=0%. Six of eleven runs had zero invocations. One attempt (build 14027179/attempt-4) hallucinated EXPERT_REVIEW_MAJOR_ISSUES text without writing any JSON file.

Root cause: the reviewer instruction was a single 60+ word clause buried in 'Core Principles' (line 29 of SKILL.md), instructing the model to spawn @maui-expert-reviewer as a sub-agent. It was NOT a numbered Workflow step. The Required Files table didn't list reviewer-findings.json. The verify-files-exist check didn't include it. Nothing enforced the buried clause.

Fix:
1. Replace sub-agent spawn with inline self-check. Model reads sections of .github/agents/maui-expert-reviewer.md (Overarching Principles + Dimension Routing + relevant CHECK lists) and walks the diff against them: no sub-agent spawn, no path argument, no JSON parse.
2. Promote to numbered Step 7: Expert Self-Review (MANDATORY). Renumbered old steps 7→8 (Capture), 8→9 (Restore), 9→10 (Report).
3. Step 7 runs for ALL outcomes (Pass/Fail/Blocked), not just Pass.
4. Add reviewer-findings.json to Required Files table. The Step 8 verify gate detects missing artifacts but DEFERS the throw; Step 9 restore ALWAYS runs first to keep the worktree clean for the next sequential attempt.
5. Add findings_count to Outputs table and Report template.
6. Add 'Self-review performed' to Completion Criteria.
7. Update pr-review SKILL.md attempt-{N} artifact tree to include reviewer-findings.json (and other previously-omitted files).
8. Cap self-review iteration at one correction round to keep total Step 6+7 iteration bounded.

Schema for reviewer-findings.json matches @maui-expert-reviewer agent exactly: JSON array of {path, line, body} where line is on the changed side of the diff (line 1 only as fallback for file-level concerns). Same format as inline-findings.json so any future tooling can consume either.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
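
The "detect missing artifacts but defer the throw until restore has run" ordering, together with the `{path, line, body}` schema check, can be sketched as follows. This is a Python illustration under stated assumptions: the required-files list is shortened to the one file named here, and `restore` stands in for whatever reverts the worktree. The actual gate is PowerShell inside SKILL.md.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_FILES = ["reviewer-findings.json"]  # assumption: the real table lists more files

def validate_findings(path):
    """Check the file is a JSON array of {path, line, body} and return its length."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("findings must be a JSON array ('[]' when clean)")
    for item in data:
        ok = (
            isinstance(item, dict)
            and isinstance(item.get("path"), str)
            and isinstance(item.get("body"), str)
            and isinstance(item.get("line"), int)
            and item["line"] >= 1
        )
        if not ok:
            raise ValueError(f"malformed finding: {item!r}")
    return len(data)

def capture_then_restore(output_dir, restore):
    """Verify artifacts, always restore the worktree, and only then report failure."""
    out = Path(output_dir)
    missing = [name for name in REQUIRED_FILES if not (out / name).is_file()]
    try:
        count = None if missing else validate_findings(out / "reviewer-findings.json")
    finally:
        restore()  # runs on every path so the next attempt starts clean
    if missing:
        raise RuntimeError("Required artifacts missing: " + ", ".join(missing))
    return count  # the findings_count reported for the attempt

with tempfile.TemporaryDirectory() as attempt_dir:
    restored = []
    try:
        capture_then_restore(attempt_dir, lambda: restored.append(True))
    except RuntimeError as err:
        print(err, "| restore ran:", bool(restored))
```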

Self-review now runs BEFORE build+test (Step 6) instead of after (was Step 7). This catches design flaws before spending 5-15 min on a test cycle, and runs when context is lightest, before test output floods the context window.

Before: attempt-1 compliance 0%, attempt-4 compliance 0%, overall 18%
After inline fix: attempt-1 100%, attempt-4 60%, overall 85%

This positional change should push attempt-3/4 higher by running the review when less prior-attempt context has accumulated.

New flow: Design → Apply → Self-Review → Test → Capture → Restore → Report

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Switches the Copilot CLI invocation to a model with larger context window, which should improve instruction-following compliance on later try-fix attempts where context pressure caused the self-review step to be dropped. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The 'result' event in Copilot CLI JSON output is a top-level event without a 'data' wrapper. Reading $event.data.usage always returned null, so the file/line counts were silently shown as 0. Read $event.usage directly, and wrap filesModified in @() to ensure .Count works whether PowerShell deserializes a single-element array as a scalar or an array. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kubaflo · 2026-05-07T15:19:23Z

Round 3 Review — 3-model adversarial consensus (Opus 4.7 xhigh / Sonnet 4.6 / GPT-5.5)

Reviewing the 4 commits since the Round 2 baseline (ade495a6f0..bc9349e52b), focused on the new artifact gate, the Step 6↔7 swap, the inline self-check restructure, and the file-change-count fix.

Methodology: 3 independent reviewers, top-tier models, no shared context. Findings annotated with reviewer agreement count.

❌ Error — Example invocation still shows the OLD pre-swap order — `try-fix/references/example-invocation.md:26` (3/3 reviewers)

**Skill execution:** Reads context → Analyzes target files → Designs fix → Applies fix → Runs test (PASS) → Performs inline expert self-review … → Captures artifacts → Reports result → Reverts changes

Commit 73ebff80a8 swapped the order so Step 6 (Self-Review) runs before Step 7 (Test). The very next prose under Step 6 in SKILL.md justifies this: "runs BEFORE testing so you can catch design flaws before spending time on build+test cycles." But the canonical example here still says Runs test (PASS) → Performs inline expert self-review. Examples carry disproportionate weight in few-shot pattern matching — an agent shortcutting to the example will execute the pre-swap order and silently regress the compliance gain the swap was meant to lock in. Opus also notes Reports result → Reverts changes is reversed (Step 9 Restore precedes Step 10 Report).

Fix:

… → Applies fix → Performs inline expert self-review against `.github/agents/maui-expert-reviewer.md` rules and writes `reviewer-findings.json` (`[]` if clean) → Runs test (PASS) → Captures artifacts → Reverts changes → Reports result

⚠️ Warning — Gate error message cites the wrong step number — `try-fix/SKILL.md:429` (2/3 reviewers)

$gateFailureMessage = "Required artifacts missing: $($missing -join ', '). If 'reviewer-findings.json' is missing, Step 7 (Expert Self-Review) was not performed — it is mandatory and must contain at least '[]'."

After the Step 6↔7 swap, SKILL.md defines Step 6 = Expert Self-Review (line 267) and Step 7 = Test and Iterate (line 341). This message — written verbatim to host output via Write-Host — tells the agent that Step 7 is "Expert Self-Review", contradicting the same file. An agent reading the failure and re-reading the SKILL gets inconsistent guidance about which numbered step to re-perform.

Fix: Replace Step 7 (Expert Self-Review) with Step 6 (Expert Self-Review) on line 429.

⚠️ Warning — Section header mislabels which step the artifact gate enforces — `try-fix/SKILL.md:411` (1/3 reviewers, related to #2)

**Verify all required files exist (this is the enforcement gate for Step 7):**

The whole rationale for adding reviewer-findings.json to the gate (per 55d7b26f7c's commit message: "the verify-files-exist check didn't include it. Nothing enforced the buried clause") is that the gate enforces Step 6 compliance. Calling it "the enforcement gate for Step 7" inverts that intent.

Fix: Reword to e.g. (this is the enforcement gate for Steps 6 and 7 — primarily reviewer-findings.json from Step 6).

❌ Error — Self-review can become stale after test-loop fixes — `try-fix/SKILL.md:362-368` (1/3 reviewers, GPT-5.5)

**Testing Loop (Iterate until SUCCESS or exhausted):**

1. **Run the test command** - It will build, deploy, and test automatically
2. **Check the result:**
   - ✅ **Tests PASS** → Move to Step 8 (Capture)
   - ❌ **Compile errors** → Fix compilation issues (see below), go to step 1
   - ❌ **Tests FAIL (runtime)** → Analyze failure, fix code, go to step 1

Step 6 writes reviewer-findings.json against the diff before Step 7's test loop. But the test loop explicitly permits later code changes for compile/runtime failures. Those post-review changes can become the final fix.diff without any fresh expert self-review — the gate at Step 8 only checks that reviewer-findings.json exists, not that it corresponds to the final diff.

This is the deeper architectural concern of the swap: moving self-review earlier did catch design flaws faster, but it also opened a window where the recorded findings can lie about what got shipped.

Suggested fix: Either (a) require re-running Step 6 every time Step 7 modifies code (the loop body), or (b) move Self-Review to run again after the test loop converges and validate that the saved findings correspond to the final git diff.
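
Option (b) amounts to keeping a snapshot of the diff that was reviewed and comparing it with the final diff before capture. A minimal sketch, assuming the snapshot is stored next to the findings file; the file name and helper names here are illustrative, not taken from SKILL.md at this point in the thread:

```python
import tempfile
from pathlib import Path

SNAPSHOT_NAME = "reviewer-findings.diff"  # assumed name for the reviewed-diff snapshot

def snapshot_reviewed_diff(output_dir, diff_text):
    """Record exactly which diff the self-review looked at (file is created even if empty)."""
    Path(output_dir, SNAPSHOT_NAME).write_text(diff_text, encoding="utf-8")

def findings_are_stale(output_dir, current_diff):
    """True when the code changed after the self-review was written."""
    snapshot = Path(output_dir, SNAPSHOT_NAME)
    reviewed = snapshot.read_text(encoding="utf-8") if snapshot.is_file() else ""
    return reviewed != current_diff

with tempfile.TemporaryDirectory() as attempt_dir:
    snapshot_reviewed_diff(attempt_dir, "+first version of the fix\n")
    # The test loop then edits the code, so the final diff differs from the snapshot.
    if findings_are_stale(attempt_dir, "+second version of the fix\n"):
        print("diff changed after review: re-run the self-review before capture")
```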

💡 Suggestion — Hardcoded internal-only model has no fallback — `Review-PR.ps1:302` (2/3 reviewers)

& copilot -p $Prompt --allow-all --output-format json --model claude-opus-4.7-1m-internal 2>&1 | ForEach-Object {

claude-opus-4.7-1m-internal is marked "Internal only" in the model catalog. Other agentic workflows in this repo (copilot-evaluate-tests.md, skill-validation.yml, ci-doctor.lock.yml) use publicly-available models. If this script is ever run by a contributor whose Copilot CLI installation can't resolve the internal model, every Invoke-CopilotStep call fails immediately. Lower severity because Review-PR.ps1 is currently manually-invoked only.

Suggested fix:

$copilotModel = if ($env:COPILOT_REVIEW_MODEL) { $env:COPILOT_REVIEW_MODEL } else { 'claude-opus-4.7-1m-internal' }
& copilot -p $Prompt --allow-all --output-format json --model $copilotModel 2>&1 | ForEach-Object {
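
The same override-with-default pattern, sketched in Python for illustration. The environment variable name, the default model id, and the CLI flags are copied from the snippets above; the helper names are hypothetical and nothing is executed here.

```python
import os

DEFAULT_MODEL = "claude-opus-4.7-1m-internal"

def resolve_model(env=None):
    """Use COPILOT_REVIEW_MODEL when set and non-empty, otherwise the default."""
    env = os.environ if env is None else env
    return env.get("COPILOT_REVIEW_MODEL") or DEFAULT_MODEL

def copilot_argv(prompt, env=None):
    """Build the argument list for the CLI call without running it."""
    return [
        "copilot", "-p", prompt, "--allow-all",
        "--output-format", "json", "--model", resolve_model(env),
    ]

print(copilot_argv("review this PR", env={"COPILOT_REVIEW_MODEL": "claude-sonnet-4.6"}))
```

Treating an empty value the same as an unset one mirrors the PowerShell `if ($env:COPILOT_REVIEW_MODEL)` test, which is also false for an empty string.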

✅ Verified Correct (no findings)

Review-PR.ps1:395-403 — the file-change-count fix ($event.data.usage → $event.usage, @($changes.filesModified).Count). All three reviewers confirmed against the actual Copilot CLI JSON shape: result is a top-level event without a data wrapper, and the @() wrapping correctly handles PowerShell's scalar-flattening of single-element JSON arrays.
The Step 6↔7 swap itself (the rationale and ordering) is sound.
The model bump to claude-opus-4.7-1m-internal for the orchestrator is reasonable (modulo the portability concern above).

Consensus Verdict: NEEDS_CHANGES

Confidence: high
Summary: The PowerShell delta is correct and the architectural intent of the swap is right. But this delta introduced 3 prompt-correctness regressions in try-fix/SKILL.md and example-invocation.md — stale "Step 7" references and an out-of-order canonical example — that directly undermine the compliance gain that 55d7b26f7c and 73ebff80a8 were designed to lock in. The staleness concern (GPT-5.5) is a deeper architectural question worth resolving, not just a typo.

| Severity | Count |
|---|---|
| ❌ Error | 2 |
| ⚠️ Warning | 2 |
| 💡 Suggestion | 1 |

…taleness, model fallback

Round 3 multi-model review (Opus 4.7 xhigh / Sonnet 4.6 / GPT-5.5) flagged 5 issues introduced by the Step 6↔7 swap and the inline self-check restructure. This commit addresses all of them.

1. example-invocation.md (3/3 reviewers)
   - Reordered to Self-Review → Test → Capture → Restore → Report
   - Added the new Step 7.5 refresh step to the narrative
2. try-fix/SKILL.md gate error message (2/3 reviewers)
   - Changed 'Step 7 (Expert Self-Review)' to 'Step 6 (Expert Self-Review)'
   - Mentions Step 7.5 refresh requirement in the diagnostic
3. try-fix/SKILL.md gate section header (1/3 reviewers, related)
   - 'enforcement gate for Step 7' -> 'for Steps 6 and 7'
4. Self-review staleness (1/3, GPT-5.5; deeper architectural issue)
   - Added Step 7.5: 'Refresh Self-Review If Code Changed'
   - Step 6 now snapshots the reviewed diff to reviewer-findings.diff
   - Step 7.5 compares current diff to the snapshot; if changed, re-runs the self-review against the final diff and overwrites both files
   - Step 8 artifact gate now requires reviewer-findings.diff
   - Workflow grew from 10 steps to 11 (Step 7.5)
5. Hardcoded internal-only model in Review-PR.ps1 (2/3)
   - Now reads $env:COPILOT_REVIEW_MODEL with claude-opus-4.7-1m-internal as default, so contributors without internal-model access can run the script with e.g. claude-opus-4.6 or claude-sonnet-4.6.

Also updated pr-review/SKILL.md output-tree diagram to mention the new reviewer-findings.diff artifact and clarify that reviewer-findings.json reflects the FINAL diff.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kubaflo · 2026-05-07T15:38:20Z

Round 4 — Adversarial Multi-Model Review

Re-ran top-tier models (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5) against commit caaf080376. All three converge on the same critical bug in the new Step 7.5; Opus xhigh found two additional issues. Verdict from every model: NEEDS_CHANGES.

✅ Round 3 fixes confirmed correct (3/3 reviewers)

Example-invocation execution chain reordered (Self-Review → Test → Refresh → Capture → Restore → Report)
Gate error message now references "Step 6"
Section header reads "enforcement gate for Steps 6 and 7"
reviewer-findings.diff written by Step 6 and required by Step 8 gate
Review-PR.ps1 reads $env:COPILOT_REVIEW_MODEL with internal model as fallback

🔴 New findings

1. `[error]` Step 7.5 drift comparison is broken — refresh fires unconditionally — 3/3 agreement

File: .github/skills/try-fix/SKILL.md:389–394

$currentDiff = git diff                                              # → string[]
$reviewedDiff = Get-Content "$OUTPUT_DIR/reviewer-findings.diff" -Raw # → string
if ($currentDiff -ne $reviewedDiff) { ... }                          # ← always truthy

PowerShell's -ne operator with an array on the left does element-wise filtering — it returns the array of elements not equal to the right operand. Since no individual diff line equals the joined-string snapshot, the result is the entire array (truthy). Reproduced empirically (165 elements returned). Even worse, when git diff is empty $null -ne <string> is also $true, so the branch fires on no-change scenarios too.

Net effect: the "Diff unchanged → no refresh needed" branch is dead code in normal operation. Step 7.5 burns extra fix-batch loops every attempt, defeating the purpose of the comparison.

Fix:

$currentDiff  = (git diff | Out-String)
$reviewedDiff = if (Test-Path "$OUTPUT_DIR/reviewer-findings.diff") {
    Get-Content "$OUTPUT_DIR/reviewer-findings.diff" -Raw
} else { '' }
if ($currentDiff -ne $reviewedDiff) { ... }
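
The underlying mistake is comparing a sequence of lines with a single string. Python has no element-wise `-ne`, but a loose analog shows the same trap: a list of lines is never equal to the joined text, so a naive comparison reports a change every time. The sketch below normalizes both sides to one string first. The helper names are hypothetical, and it assumes the snapshot was written with one trailing newline per line.

```python
def as_text(diff):
    """Accept None, a list of lines, or a single string and return one string."""
    if diff is None:
        return ""
    if isinstance(diff, str):
        return diff
    return "".join(line + "\n" for line in diff)

def diff_changed(current, reviewed):
    """Compare two diffs only after both have been normalized to plain text."""
    return as_text(current) != as_text(reviewed)

current_lines = ["diff --git a/x.cs b/x.cs", "+new line"]  # command output captured as lines
reviewed_text = "diff --git a/x.cs b/x.cs\n+new line\n"    # snapshot read back as one string

print(current_lines != reviewed_text)              # naive comparison: always reports a change
print(diff_changed(current_lines, reviewed_text))  # normalized comparison
```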

2. `[error]` `reviewer-findings.diff` gate fails on documented no-diff Blocked path — 1/3 (Opus xhigh)

File: .github/skills/try-fix/SKILL.md:331 (Step 6 snapshot) and :454 (gate)

git diff | Set-Content "$OUTPUT_DIR/reviewer-findings.diff"   # ← empty pipe creates NO file

When git diff produces no output, Set-Content from an empty pipeline does not create the file (verified on pwsh 7.5.4). The Round 3 gate now requires reviewer-findings.diff, so any attempt with no diff to review — explicitly documented at SKILL.md:288 ("If you have NO code changes (e.g., Blocked because no device available before any fix was applied), still proceed to step 4 and write '[]'") — now fails the gate and is force-marked Blocked at line 468. This is a regression introduced by the Round 3 fixes: before them, '[]' in reviewer-findings.json was sufficient.

Fix:

Set-Content -Path "$OUTPUT_DIR/reviewer-findings.diff" -Value (git diff | Out-String) -NoNewline

(Set-Content -Value always creates the file, even with empty content.)
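The `-Value` form can be checked on its own with a scratch file. This is a sketch under stated assumptions: `$path` is a hypothetical temp location rather than `$OUTPUT_DIR`, and the literal `''` stands in for what the thread reports `(git diff | Out-String)` yields on a clean tree.

```powershell
# Hypothetical scratch path; the thread's real target is
# "$OUTPUT_DIR/reviewer-findings.diff".
$path = Join-Path ([System.IO.Path]::GetTempPath()) "snapshot-$([guid]::NewGuid()).diff"

# '' stands in for (git diff | Out-String) on a clean working tree.
$emptyDiff = ''

# Passing the text through -Value writes the file even when the text is empty.
Set-Content -Path $path -Value $emptyDiff -NoNewline

$exists = Test-Path $path          # True
$size   = (Get-Item $path).Length  # 0

# Clean up the scratch file.
Remove-Item $path
```

A 0-byte file satisfies an existence check such as `Test-Path`, which is all the gate described above needs.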

3. `[warning]` Step 7.5 procedure is hidden in PowerShell comments inside a code block — 2/3 (Opus xhigh, GPT-5.5)

File: .github/skills/try-fix/SKILL.md:394–409

if ($currentDiff -ne $reviewedDiff) {
    # Re-run the procedure from Step 6 against the final diff:
    #   - walk the same Overarching Principles + routed dimensions
    #   - OVERWRITE $OUTPUT_DIR/reviewer-findings.json with the new findings
    git diff | Set-Content "$OUTPUT_DIR/reviewer-findings.diff"
    $findings = @(Get-Content ... | ConvertFrom-Json)
    $findingsCount = $findings.Count
}

The actual self-review work is described only in PowerShell comments inside the snippet. An LLM that executes the script literally will re-snapshot the diff and re-validate the existing JSON without ever rewriting it — leaving stale findings while the gate happily reports success. Compare to Step 6, which spells out the procedure as numbered markdown bullets outside the code block. Combined with finding #1 (refresh always fires), this could routinely produce re-snapshotted diffs paired with stale findings — the exact bug Step 7.5 was designed to prevent.

Fix: Lift the "walk dimensions / overwrite JSON" instructions out of the PowerShell comments into a numbered markdown list above the snippet, mirroring Step 6's structure (Identify → Walk → Write → Validate). Keep only mechanical operations in the code block.

Plan

Applying all three fixes in the next commit, then re-running this same 3-model review.

Round 4 reviewers: claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5 — all returned NEEDS_CHANGES with high confidence.

…ntent, Step 7.5 procedure clarity

Three reviewers (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5) converged on empirically-verified bugs in Round 3's new Step 7.5 self-review refresh:

1. [error] Drift comparison always evaluated truthy. git diff -> string[] (line per element); Get-Content -Raw -> single string. PowerShell -ne with array-on-left does element-wise filtering, not equality, so the if-branch always fires. Even worse: empty diff ($null -ne <string>) also fires.
   Fix: normalize both sides via (git diff | Out-String) so the comparison is single-string vs single-string.

2. [error] reviewer-findings.diff gate failed on documented no-diff Blocked path. git diff | Set-Content does not create the file when the pipe is empty. The Round 3 gate now requires reviewer-findings.diff, so any attempt with no code changes (the explicitly-documented Blocked-with-no-fix path at SKILL.md:288) silently failed. Same hazard applied to Step 8's fix.diff write -- fixed both for consistency.
   Fix: Set-Content -Path X -Value (git diff | Out-String) -NoNewline always creates the file, even with empty content.

3. [warning] Step 7.5 procedure was buried in PowerShell comments inside a single code block. An LLM following the script literally would re-snapshot the diff and re-validate the existing JSON without ever rewriting findings, defeating the purpose of the refresh.
   Fix: hoisted the walk-rules / rewrite-JSON instructions out of comments into a numbered markdown procedure that mirrors Step 6's structure (Detect drift -> Re-do self-review -> Re-snapshot and validate). Added a SHA256 hash check at the start of sub-step 3 that throws if reviewer-findings.json was not actually rewritten in sub-step 2 (defends against literal-script-execution agents).

All scenarios validated empirically (pwsh 7.5.4):
- Diff unchanged: skips refresh
- Diff changed + agent forgot rewrite: hash check throws
- Diff changed + agent rewrote: hash differs, proceeds to validate
- Empty diff: file created with size 0 (gate satisfied)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kubaflo · 2026-05-07T15:50:13Z

Round 5 — Adversarial Multi-Model Review

Re-ran the same 3 top-tier models (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5) against commit d4acf348a0. All three converged on a cascading bug introduced by Round 4's fixes interacting with each other. Verdict from every model: NEEDS_CHANGES.

✅ Round 4 fixes confirmed correct (3/3 reviewers)

Drift comparison normalized via (git diff | Out-String) on both sides — array-vs-scalar bug eliminated for non-empty diffs ✅
Set-Content -Value (git diff | Out-String) -NoNewline creates the diff file even when the pipe is empty ✅
Step 7.5 procedure hoisted out of PowerShell comments into a numbered markdown list mirroring Step 6 ✅

🔴 New findings — all reviewers, same root cause

Cascading regression: 0-byte file → `Get-Content -Raw` → `$null` → false drift → hash-sentinel deadlock

The Round 4 fix #2 intentionally creates a 0-byte reviewer-findings.diff for the documented Blocked-with-no-diff path. But Step 7.5 reads it back with Get-Content -Raw, which returns $null on a 0-byte file (not ""). Empirically reproduced:

After Step 6 (clean tree):  reviewer-findings.diff size: 0
Step 7.5 sub-step 1:
  $currentDiff  = (git diff | Out-String)         # ""
  $reviewedDiff = Get-Content X -Raw              # $null
  $diffChanged  = ("" -ne $null)                  # True ← false positive!
Step 7.5 sub-step 2 (correctly executed):
  agent re-walks rules, writes '[]'               # byte-identical to Step 6
Step 7.5 sub-step 3:
  $preHash  = SHA256('[]') = 37517E5F...
  $postHash = SHA256('[]') = 37517E5F...          # ← same!
  → throw "reviewer-findings.json was not rewritten"
  → unhandled exception → Step 9 worktree restore SKIPPED → next attempt corrupted

Two distinct defects compose:

#	Reviewers	Defect
1	Sonnet 4.6, Opus xhigh (direct), GPT-5.5 (downstream)	`Get-Content -Raw` on 0-byte file returns `$null`, not `""` — false drift detection on Blocked-with-no-diff
2	GPT-5.5 (direct), Opus xhigh (direct), Sonnet 4.6 (cascade)	SHA256 hash sentinel throws on legitimate byte-identical refresh (`[]` → `[]`, single-finding → same single-finding). Common, not "extremely rare". The "touch trailing whitespace" escape hatch corrupts the JSON artifact.

This is the exact "always evaluates truthy" failure mode Round 4 was supposed to eliminate — just shifted from array -ne scalar to string -ne null.

Fixes to be applied

Coalesce $null to "" when reading the diff snapshot back. Empirically verified: [string]$null does NOT coerce to "" in pwsh 7.5.4 (it stays null), but (...) ?? '' does work:

$reviewedDiff = if (Test-Path "$OUTPUT_DIR/reviewer-findings.diff") {
    (Get-Content "$OUTPUT_DIR/reviewer-findings.diff" -Raw) ?? ''
} else { '' }

Drop the SHA256 hash sentinel entirely. Step 6 has no equivalent "did you actually walk the rules" programmatic check — it relies on procedural enforcement (the numbered markdown sub-steps and the example-invocation chain). Round 4 fix #3 already moved Step 7.5 to the same enforcement model. The hash sentinel rejects the common byte-identical case (e.g., [] → []) and the documented escape hatch ("touch a string body") corrupts the artifact. Replace with a callout explaining the trade-off:

> Why no programmatic "did you actually rewrite the JSON" check? A SHA256 hash
> sentinel rejects the legitimate byte-identical case (e.g., [] → [] after a
> small compile fix that introduces no new violations), and that case is common.
> The procedural enforcement is sub-step 2's explicit numbered list above, plus
> the example-invocation chain that walks the dimensions explicitly.

Empirical validation of fixes (full Step 6 → Step 7.5 round-trip)

Scenario	Result
Empty diff (Blocked, no code changes)	`diffChanged: False` ✅
Non-empty diff, unchanged after Step 7	`diffChanged: False` ✅
Diff actually changed during Step 7	`diffChanged: True` ✅
Clean → clean refresh (`[]` → `[]`)	Validates, no throw ✅
Refresh writes invalid JSON	Throws as expected ✅
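The first three scenarios in the table could be exercised with a small helper that takes the current diff as a string instead of calling `git`. This is a sketch only: `Test-DiffDrift` and the temp-file path are hypothetical names introduced for illustration, and the `??` operator assumes PowerShell 7 or later.

```powershell
# Hypothetical helper (not part of SKILL.md). Requires PowerShell 7+ for ??.
function Test-DiffDrift {
    param(
        [string]$CurrentDiff,   # stand-in for (git diff | Out-String)
        [string]$SnapshotPath   # stand-in for reviewer-findings.diff
    )
    # Treat a missing or empty snapshot as the empty string.
    $reviewed = if (Test-Path $SnapshotPath) {
        (Get-Content $SnapshotPath -Raw) ?? ''
    } else { '' }
    return ($CurrentDiff -ne $reviewed)
}

$snap = Join-Path ([System.IO.Path]::GetTempPath()) "reviewed-$([guid]::NewGuid()).diff"

# Scenario: empty diff, 0-byte snapshot.
Set-Content -Path $snap -Value '' -NoNewline
$emptyCase = Test-DiffDrift -CurrentDiff '' -SnapshotPath $snap              # False

# Scenario: non-empty diff, unchanged since the snapshot.
$diff = "-old`n+new`n"
Set-Content -Path $snap -Value $diff -NoNewline
$unchangedCase = Test-DiffDrift -CurrentDiff $diff -SnapshotPath $snap        # False

# Scenario: diff changed after the snapshot was taken.
$changedCase = Test-DiffDrift -CurrentDiff ($diff + "+more`n") -SnapshotPath $snap  # True

Remove-Item $snap
```

One design point to weigh: `-ne` compares strings case-insensitively by default, so a case-only edit would not count as drift here; `-cne` is the case-sensitive variant if that distinction matters.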

Applying fixes now and re-running the same 3-model review.

Round 5 reviewers: claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5 — all returned NEEDS_CHANGES with high confidence, converging on the same cascading defect.

All three Round 5 reviewers (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5) converged on a cascading bug introduced by Round 4's fixes interacting:

1. Round 4 fix #2 intentionally creates a 0-byte reviewer-findings.diff for the documented Blocked-with-no-diff path. But Get-Content -Raw on a 0-byte file returns $null, not "". So Step 7.5's drift detection becomes '"" -ne $null' → True → false-positive drift on every Blocked attempt. This re-introduces the 'always evaluates truthy' failure mode Round 4 was supposed to eliminate (just shifted from array-vs-scalar to string-vs-null).

2. The new SHA256 hash sentinel throws on legitimate byte-identical refreshes (e.g., '[]' → '[]' after a small compile fix that introduces no new violations, or single-finding → same-single-finding). The case is common, not 'extremely rare' as the error message claimed. Compounds with #1: the false-positive drift forces a re-walk that correctly writes '[]' again, then the hash check throws → unhandled exception → Step 9 worktree restore skipped → next attempt corrupted.

Fixes:
- Coalesce $null to "" via ?? '' on the Get-Content -Raw call. Empirically verified: [string]$null does NOT coerce to '' in pwsh 7.5.4 (stays null), but ... ?? '' does work.
- Drop the SHA256 hash sentinel entirely. Step 6 has no equivalent programmatic 'did you walk the rules' check; it relies on procedural enforcement (the numbered markdown sub-steps and the example-invocation chain). Round 4 fix #3 already moved Step 7.5 to the same enforcement model. Replaced the throw with a callout explaining the trade-off.

All 5 scenarios validated empirically (pwsh 7.5.4):
- Empty diff (Blocked path): diffChanged=False (no false positive)
- Non-empty diff unchanged: diffChanged=False
- Diff changed during Step 7: diffChanged=True
- Clean → clean refresh: validates, no throw
- Invalid JSON: throws as expected

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kubaflo · 2026-05-07T15:56:57Z

Round 6 — Adversarial Multi-Model Review: ✅ LGTM (3/3)

Re-ran the same 3 top-tier models (claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5) against commit 936b8e4750. All three returned LGTM with high confidence. The iterative review→fix loop has converged.

Reviewer verdicts

Reviewer	Verdict	Confidence	Notes
`claude-opus-4.7-xhigh`	LGTM	high	All scenarios (Blocked path, byte-identical refresh, normal-drift refresh) pass empirically. No orphan references to removed hash sentinel.
`claude-sonnet-4.6`	LGTM	high	Both Round 5 findings resolved. `??` operator confirmed safe (pwsh 7.5.4 environment only). Set/Get round-trip symmetric.
`gpt-5.5`	LGTM	high	No regressions. `??` syntax already used in `Review-PR.ps1`, so no new pwsh-version requirement.

Round 5 fixes — all confirmed correct

Get-Content -Raw on 0-byte file (reviewer-findings.diff at SKILL.md:403) — ?? '' coalesce eliminates the false-positive "" -ne $null drift that broke the Blocked-with-no-diff path. Verified end-to-end on pwsh 7.5.4.
SHA256 hash sentinel — Removed cleanly. Byte-identical refreshes ([] → [], single-finding → same single-finding) no longer throw. Step 9 worktree restore is reliably reachable. No orphan preRewriteJsonHash/postRewriteJsonHash/Get-FileHash references in .github/.

What changed across 4 review rounds (Round 2 → Round 6)

Round	Findings	Reviewers in agreement	Fix commit
3	5 (file-count display, stale "Step 7" refs, header label, example order, hardcoded model)	varied (3/3 on example order; 2/3 on others)	`caaf080376`
4	3 (PowerShell array-vs-string, empty-pipe Set-Content, procedure clarity)	3/3 on the array-vs-string bug	`d4acf348a0`
5	1 cascading defect (0-byte → null → false drift → hash-sentinel deadlock)	3/3 same root cause, different facets	`936b8e4750`
6	0	3/3 LGTM	—

Final state

Branch: feature/expert-reviewer-extraction
HEAD: 936b8e4750 (Round 5 fix)
Step 7.5 procedure: Detect drift (with proper null coalesce + string normalization) → re-walk rules in markdown sub-steps → re-snapshot diff and validate JSON. No programmatic "did you walk the rules" sentinel; relies on procedural enforcement, same as Step 6.
Empty-diff Blocked path: Works correctly (no false drift, no hash deadlock).
Common clean→clean refresh: Works correctly (no spurious throws).

PR is ready for human review and merge from this multi-model adversarial review's perspective.

Round 6 reviewers: claude-opus-4.7-xhigh, claude-sonnet-4.6, gpt-5.5 — all returned LGTM with high confidence. Loop ran 5 fix iterations across 4 review rounds (Rounds 3, 4, 5, 6).

PureWeen · 2026-05-07T18:33:11Z

Round 5 Review — LGTM ✅

3-model adversarial review of the latest 7 commits (inline self-check adoption, Step 7.5 refresh, artifact gate, model/display fixes).

The big change landed well. Replacing the @maui-expert-reviewer sub-agent invocation with inline dimension loading (commit 55d7b26f) directly addresses the 18% compliance problem we identified across 44 CI attempts. The new Step 6 self-review is well-structured — clear procedure, proper JSON format, validation + count tracking, and the Step 7.5 drift refresh handles the case where test iterations modify code after the initial self-review.

Consensus findings (minor — not blocking):

Finding	Severity	Notes
Gate deferred-throw is soft	⚠️	`$gateFailureMessage` is never checked programmatically after Step 9. `result.txt` still gets set to `Blocked` which is sufficient, but `analysis.md` won't automatically explain which artifact was missing.
Here-string closing delimiter	⚠️	The multi-finding JSON example's closing `'@` has leading spaces in the raw markdown. PowerShell requires column 0. Agents copying the template literally could get a parse error. Low probability since most agents generate JSON programmatically.
Comment accuracy	💡	Step 7.5 comment says `"" -ne $null` is a false-positive — it isn't (PowerShell coerces `$null` to `""` for string comparison). The `?? ''` coalesce is correct but the stated reason is slightly wrong.

Disputed and discarded:

$missing += $_ inside ForEach-Object — one reviewer flagged as critical scoping bug, but ForEach-Object runs in caller's scope (unlike Invoke-Command). Verified correct.
Step 7.5 complexity — one reviewer flagged as unnecessary, but it handles a real case (code changes during test iterations).
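The scoping point in the first discarded finding can be illustrated with a short sketch. The file names and the `$other` variable are hypothetical; the contrast uses a local `Invoke-Command -ScriptBlock`, which by default runs its block in a new scope.

```powershell
$missing = @()

# ForEach-Object runs its script block in the caller's scope,
# so += updates the outer $missing.
'reviewer-findings.json', 'reviewer-findings.diff' | ForEach-Object { $missing += $_ }
$missing.Count          # 2

# Contrast: Invoke-Command runs the block in its own scope by default,
# so the += creates a local copy and the outer variable is untouched.
$other = @()
Invoke-Command -ScriptBlock { $other += 'fix.diff' }
$other.Count            # 0
```

This matches the reviewers' conclusion that accumulating into `$missing` from inside `ForEach-Object` works as written.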

Architecture alignment: Step 6 placement (after implementation, before testing) is a reasonable compromise vs our original Step 3 suggestion. It catches design flaws before expensive build+test cycles while keeping the workflow step count manageable.

Ready to merge. 🚀

Resolve 3 conflict zones from PR #35198 (expert reviewer):
- Zone 1: combine --model copilotModel with --secret-env-vars on copilot invocation
- Zone 2: use renamed post-gate-comment.ps1 with ScriptsDir path + phase guard
- Zone 3: keep 4-task split (main had no split)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

> [!NOTE]
> Are you waiting for the changes in this PR to be merged?
> It would be very helpful if you could [test the resulting artifacts](https://github.com/dotnet/maui/wiki/Testing-PR-Builds) from this PR and let us know in a comment if this change resolves your issue. Thank you!

## Description

Replaces `review-rules.md` (flat 345-line checklist) with a dimensional expert review agent. Single source of truth for all review rules, organized into 30 dimensions for per-dimension sub-agent evaluation. Adds inline file:line PR comments alongside the existing wall-of-text summary.

Extracted from 28k review comments across 5 maintainers via [extraction-pipeline](https://github.com/dotnet/fsharp/blob/main/.github/agents/extraction-pipeline.md). No functional code changes.

Recreated from dotnet#35062 on a dotnet/maui branch (originally opened from a fork).

## What changed

**Before:** `review-rules.md` had 345 lines of flat rules. `code-review` skill loaded them all into one context. Output was a single wall-of-text PR comment.

**After:** Rules absorbed into `maui-expert-reviewer.md` as 30 dimensions with 200+ CHECK items. Each dimension runs as an independent sub-agent with focused context. Output is inline file:line PR comments via `inline-findings.json`.

## CI Flow

```
Review-PR.ps1 prompt:
  1. code-review → maui-expert-reviewer agent → inline-findings.json
  2. pr-review → Pre-Flight → Try-Fix → Report (sees findings, no duplication)

Posting:
  post-inline-review.ps1    → .json → GitHub file:line comments (NEW)
  post-ai-summary-comment.ps1 → {phase}/content.md → wall-of-text (existing)

CI: COMMENTS_VIA_FILE=true → agent writes .json, script posts
Local: agent writes .json, code-review posts directly via gh api
```

## Files

| Action | File | What |
|--------|------|------|
| **Add** | `agents/maui-expert-reviewer.md` | 30 dimensions, 200+ CHECKs, routing table |
| **Add** | `instructions/collectionview-{android,ios,windows}` | Platform-isolated CV rules |
| **Add** | `instructions/{handler-patterns,layout-system,performance-hotpaths,public-api,threading-async}` | Domain-specific ambient guidance |
| **Add** | `scripts/post-inline-review.ps1` | Posts .json as GitHub PR review |
| **Del** | `skills/code-review/references/review-rules.md` | Absorbed into agent |
| **Mod** | `skills/code-review/SKILL.md` | Delegates to agent |
| **Mod** | `scripts/Review-PR.ps1` | Prompt + inline posting wiring |
| **Mod** | `eng/pipelines/ci-copilot.yml` | `COMMENTS_VIA_FILE` env var |

---------

Co-authored-by: kubaflo <kubaflo@users.noreply.github.com>
Co-authored-by: T-Gro <46543583+T-Gro@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Tomas Grosup <tomasgrosup@microsoft.com>

Copilot AI review requested due to automatic review settings April 28, 2026 14:03

kubaflo marked this pull request as draft April 28, 2026 14:04

Copilot started reviewing on behalf of kubaflo April 28, 2026 14:05

Copilot AI reviewed Apr 28, 2026

MauiBot reviewed Apr 28, 2026

MauiBot added s/agent-changes-requested AI agent recommends changes - found a better alternative or issues s/agent-fix-win AI found a better alternative fix than the PR s/agent-reviewed PR was reviewed by AI agent workflow (full 4-phase review) labels Apr 28, 2026

dotnet deleted a comment from MauiBot Apr 28, 2026

PureWeen reviewed Apr 28, 2026

MauiBot reviewed Apr 28, 2026

dotnet deleted a comment from MauiBot Apr 29, 2026

MauiBot reviewed Apr 29, 2026

dotnet deleted a comment from github-actions Bot Apr 29, 2026

dotnet deleted a comment from MauiBot Apr 29, 2026

dotnet deleted a comment from Copilot AI Apr 29, 2026

PureWeen reviewed May 5, 2026

Copilot AI added 2 commits May 6, 2026 11:26

kubaflo force-pushed the feature/expert-reviewer-extraction branch from 1ab1329 to 55d7b26 May 6, 2026 18:53

Copilot AI added 2 commits May 7, 2026 15:54

PureWeen approved these changes May 7, 2026

PureWeen merged commit b71adea into main May 7, 2026
36 of 42 checks passed

PureWeen deleted the feature/expert-reviewer-extraction branch May 7, 2026 18:34

github-actions Bot added this to the .NET 10 SR7 milestone May 7, 2026

github-actions Bot locked and limited conversation to collaborators Jun 7, 2026

Conversation

kubaflo commented Apr 28, 2026

Description

What changed

CI Flow

Files

Copilot AI left a comment

Pull request overview

Reviewed changes

MauiBot left a comment

Expert Review — 4 findings

PureWeen commented Apr 28, 2026

Great domain knowledge — let's restructure the wiring

PureWeen left a comment

MauiBot left a comment

Expert Review — 5 findings

kubaflo commented Apr 29, 2026 • edited

Proposed review flow evolution (capturing discussion)

Today

Shane's proposal

Proposed modifications on top of Shane's

Combined shape

Open questions

MauiBot left a comment

Expert Review — 4 findings

PureWeen left a comment

Round 2 Review — 3-model adversarial consensus (Opus 4.6 / Sonnet 4.6 / GPT-5.3)

kubaflo commented May 7, 2026

Round 3 Review — 3-model adversarial consensus (Opus 4.7 xhigh / Sonnet 4.6 / GPT-5.5)

❌ Error — Example invocation still shows the OLD pre-swap order — try-fix/references/example-invocation.md:26 (3/3 reviewers)

⚠️ Warning — Gate error message cites the wrong step number — try-fix/SKILL.md:429 (2/3 reviewers)

⚠️ Warning — Section header mislabels which step the artifact gate enforces — try-fix/SKILL.md:411 (1/3 reviewers, related to #2)

❌ Error — Self-review can become stale after test-loop fixes — try-fix/SKILL.md:362-368 (1/3 reviewers, GPT-5.5)

💡 Suggestion — Hardcoded internal-only model has no fallback — Review-PR.ps1:302 (2/3 reviewers)

✅ Verified Correct (no findings)

Consensus Verdict: NEEDS_CHANGES
