Update pr-review skill model lineup by kubaflo · Pull Request #35174 · dotnet/maui

kubaflo · 2026-04-27T21:03:05Z

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts (https://github.com/dotnet/maui/wiki/Testing-PR-Builds) from this PR and let us know in a comment if this change resolves your issue. Thank you!

Updates the Phase 2 multi-model exploration list in the pr-review skill:

| Order | Before | After |
|-------|--------|-------|
| 1 | claude-opus-4.6 | claude-opus-4.6 (unchanged) |
| 2 | claude-sonnet-4.6 | claude-opus-4.7 |
| 3 | gpt-5.3-codex | gpt-5.3-codex (unchanged) |
| 4 | gemini-3-pro-preview | gpt-5.5 |

Updated in both the model config table and the Phase 2 launch checklist in .github/skills/pr-review/SKILL.md.

Replace claude-sonnet-4.6 with claude-opus-4.7 and gemini-3-pro-preview with gpt-5.5 in the Phase 2 multi-model exploration list.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-04-27T21:03:17Z

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35174

Or

Run remotely in PowerShell:

iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35174"

github-actions · 2026-04-27T21:03:38Z

🔍 Skill Validation Results

✅ Static Checks Passed

Skills checked: 15 | Agents checked: 3

Full validator output

Found 1 skill(s)
[pr-review] 📊 pr-review: 3,267 BPE tokens [chars/4: 3,153] (standard ~), 22 sections, 7 code blocks
[pr-review]    ⚠  Skill is 3,267 BPE tokens (chars/4 estimate: 3,153) — approaching "comprehensive" range where gains diminish.
✅ All checks passed (1 skill(s))
Found 3 agent(s)
Validated 3 agent(s)

✅ All checks passed (3 agent(s))

⏭️ LLM Evaluation: Skipped

No changed skills with eval tests found.

Copilot

Pull request overview

Updates the .github/skills/pr-review skill instructions to reflect a new 4-model lineup used during Phase 2 (Try-Fix) multi-model exploration.

Changes:

- Replace Phase 2 model #2 from claude-sonnet-4.6 to claude-opus-4.7.
- Replace Phase 2 model #4 from gemini-3-pro-preview to gpt-5.5.
- Update both the model configuration table and the Phase 2 launch checklist to stay consistent.

kubaflo · 2026-04-28T18:47:21Z

/review

github-actions · 2026-04-28T18:48:01Z

✅ Expert Code Review completed successfully!

github-actions

Expert Code Review — PR #35174

Methodology: 3 independent reviewers with adversarial consensus

Findings

| # | Severity | Consensus | File | Lines | Finding |
|---|----------|-----------|------|-------|---------|
| 1 | 🔴 CRITICAL | 3/3 reviewers | `.github/skills/pr-review/SKILL.md` | 53, 113 | `claude-opus-4.7` is not in the platform's available model catalog. Attempt 2 of Phase 2 will fail at runtime. |
| 2 | 🔴 CRITICAL | 3/3 reviewers | `.github/skills/pr-review/SKILL.md` | 55, 117 | `gpt-5.5` is not in the platform's available model catalog. Attempt 4 of Phase 2 will fail at runtime. |
| 3 | 🟢 MINOR | 3/3 reviewers | Other workflow files | - | `claude-sonnet-4.6` is still referenced in `.github/workflows/shared/review-shared.md` and `.github/workflows/copilot-evaluate-tests.md`, but these are separate workflows using it for their own purposes, not stale references from this PR. `gemini-3-pro-preview` is fully removed. No action needed. |

Details

Findings 1 & 2 — Unavailable model identifiers

The platform's task tool currently exposes these models: claude-opus-4.6, claude-opus-4.6-1m, claude-opus-4.5, claude-sonnet-4.6, claude-sonnet-4.5, claude-sonnet-4, claude-haiku-4.5, gpt-5.4, gpt-5.3-codex, gpt-5.2-codex, gpt-5.2, gpt-5.4-mini, gpt-5-mini, gpt-4.1.

Neither claude-opus-4.7 nor gpt-5.5 appears in this list. If these model IDs are not resolvable at runtime, 2 of 4 Phase 2 try-fix attempts will fail on every PR review, silently reducing fix exploration diversity by 50%.

If these models are expected to become available soon, consider gating the merge on their deployment. Otherwise, substitute with confirmed models.
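The check described in Findings 1 & 2 can be sketched as a set-membership test. This is a minimal illustration, not the reviewer's actual tooling: the catalog is copied from the list above (a snapshot that may be stale), and `missing_models` is a hypothetical helper name.

```python
# Catalog as listed in the review comment (a snapshot; it may be out of date).
AVAILABLE_MODELS = {
    "claude-opus-4.6", "claude-opus-4.6-1m", "claude-opus-4.5",
    "claude-sonnet-4.6", "claude-sonnet-4.5", "claude-sonnet-4",
    "claude-haiku-4.5", "gpt-5.4", "gpt-5.3-codex", "gpt-5.2-codex",
    "gpt-5.2", "gpt-5.4-mini", "gpt-5-mini", "gpt-4.1",
}

# Phase 2 lineup proposed by this PR, in attempt order.
PHASE2_LINEUP = ["claude-opus-4.6", "claude-opus-4.7", "gpt-5.3-codex", "gpt-5.5"]


def missing_models(lineup, catalog):
    """Return (attempt_number, model_id) pairs whose model is absent from the catalog."""
    return [(i, m) for i, m in enumerate(lineup, start=1) if m not in catalog]


missing = missing_models(PHASE2_LINEUP, AVAILABLE_MODELS)
failed_fraction = len(missing) / len(PHASE2_LINEUP)

print(missing)                                    # attempts 2 and 4
print(f"{failed_fraction:.0%} of attempts affected")
```

Against this snapshot, attempts 2 and 4 are flagged, which matches the 50% figure quoted above. The result depends entirely on how current the catalog snapshot is.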

Finding 3 — Other claude-sonnet-4.6 references

All 3 reviewers confirmed these are intentionally separate usages (agent models for different workflows), not stale references that should have been updated by this PR.

Internal Consistency ✅

The model config table (lines 50–55) and the Phase 2 launch checklist (lines 110–118) are consistent with each other after this change. No within-file discrepancies.

CI / Test Coverage

This PR modifies only a skill markdown file (no functional code). No CI tests are applicable or expected.

Generated by Expert Code Review for issue #35174 · ● 5.1M

github-actions · 2026-04-28T18:54:03Z

 |-------|-------|
 | 1 | `claude-opus-4.6` |
-| 2 | `claude-sonnet-4.6` |
+| 2 | `claude-opus-4.7` |


🔴 CRITICAL — Model not in available catalog (3/3 reviewers)

claude-opus-4.7 does not appear in the platform's current task-tool model catalog. The documented available models include claude-opus-4.6, claude-opus-4.6-1m, and claude-opus-4.5 — but not claude-opus-4.7.

If this model ID is not resolvable at runtime, Attempt 2 of every Phase 2 try-fix exploration will fail or be skipped, reducing fix diversity from 4 models to 3.

Recommendation: Confirm claude-opus-4.7 is a valid, deployed model identifier before merging. If not yet available, consider keeping claude-sonnet-4.6 or substituting with a confirmed model (e.g., claude-opus-4.6-1m or claude-opus-4.5).

github-actions · 2026-04-28T18:54:03Z

+| 2 | `claude-opus-4.7` |
 | 3 | `gpt-5.3-codex` |
-| 4 | `gemini-3-pro-preview` |
+| 4 | `gpt-5.5` |


🔴 CRITICAL — Model not in available catalog (3/3 reviewers)

gpt-5.5 does not appear in the platform's current task-tool model catalog. The documented available models include gpt-5.4, gpt-5.3-codex, gpt-5.2-codex, gpt-5.2, gpt-5.4-mini, gpt-5-mini, and gpt-4.1 — but not gpt-5.5.

If this model ID is not resolvable at runtime, Attempt 4 of every Phase 2 try-fix exploration will fail or be skipped, and the cross-pollination round will only cover 3 of 4 models.

Recommendation: Confirm gpt-5.5 is a valid, deployed model identifier before merging. If not yet available, consider keeping gemini-3-pro-preview or substituting with a confirmed model (e.g., gpt-5.4).

PureWeen

Expert Code Review — PR #35174

Methodology: 3 independent reviewers with adversarial consensus.

Verdict: Safe to merge (1 advisory note) ✅

This PR updates a 4-line documentation table in .github/skills/pr-review/SKILL.md. The change is correct, internally consistent, and CI is green.

Findings

No bugs. All 3 reviewers (3/3) confirmed:

✅ Both new model IDs (claude-opus-4.7, gpt-5.5) are present in the current task-tool model catalog.
✅ The model table (lines 50–55) and the Phase 2 launch checklist (lines 111–117) are in sync after this change.
✅ No stale references to gemini-3-pro-preview remain in the file. The remaining claude-sonnet-4.6 mentions in .github/workflows/shared/review-shared.md and .github/workflows/copilot-evaluate-tests.md/.lock.yml belong to separate workflows (adversarial reviewer, test evaluator) and are correctly out of scope for this PR.

📝 An earlier automated review on this PR flagged the new model IDs as unavailable. That assessment was based on a stale catalog snapshot and is no longer accurate — both IDs are in the current catalog.

Advisory

⚠️ Config Impact — Reduced model-family diversity in the Phase 2 lineup (3/3 reviewers)

| | Before | After |
|---|--------|-------|
| Position 1 | `claude-opus-4.6` (Anthropic Opus) | `claude-opus-4.6` (Anthropic Opus) |
| Position 2 | `claude-sonnet-4.6` (Anthropic Sonnet) | `claude-opus-4.7` (Anthropic Opus) |
| Position 3 | `gpt-5.3-codex` (OpenAI) | `gpt-5.3-codex` (OpenAI) |
| Position 4 | `gemini-3-pro-preview` (Google) | `gpt-5.5` (OpenAI) |
| Vendors | 3 (Anthropic / OpenAI / Google) | 2 (Anthropic / OpenAI) |
| Anthropic capability tiers | 2 (Opus + Sonnet) | 1 (Opus only) |

The Phase 2 multi-model exploration is designed to surface divergent fix ideas. Two successive Opus versions tend to share reasoning posture and may converge on the same approach more often than Opus + Sonnet (different capability tier) or Opus + Gemini (different vendor) would.

The Gemini removal is forced — there are no Gemini IDs in the current catalog. The position-2 swap (Sonnet → Opus 4.7) is a deliberate choice that prioritizes "newer/stronger" over "more diverse."

Not blocking. If preserving adversarial diversity matters, one option is to keep claude-sonnet-4.6 at position 2 and place claude-opus-4.7 elsewhere (e.g., as part of cross-pollination). Otherwise a one-line note in the SKILL.md table about the explicit "strength-over-diversity" trade-off would help future maintainers understand the choice.
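The vendor and tier counts in the advisory table can be reproduced with a short sketch. The vendor and tier labels are taken from the table; the `MODEL_INFO` mapping and the `diversity` helper are illustrative names, not part of the skill.

```python
# (vendor, Anthropic tier) per model, as labeled in the advisory table.
# Tier is None for non-Anthropic models, where the table gives no tier.
MODEL_INFO = {
    "claude-opus-4.6": ("Anthropic", "Opus"),
    "claude-sonnet-4.6": ("Anthropic", "Sonnet"),
    "claude-opus-4.7": ("Anthropic", "Opus"),
    "gpt-5.3-codex": ("OpenAI", None),
    "gpt-5.5": ("OpenAI", None),
    "gemini-3-pro-preview": ("Google", None),
}

BEFORE = ["claude-opus-4.6", "claude-sonnet-4.6", "gpt-5.3-codex", "gemini-3-pro-preview"]
AFTER = ["claude-opus-4.6", "claude-opus-4.7", "gpt-5.3-codex", "gpt-5.5"]


def diversity(lineup):
    """Return (distinct vendors, distinct Anthropic tiers) for a lineup."""
    vendors = {MODEL_INFO[m][0] for m in lineup}
    tiers = {MODEL_INFO[m][1] for m in lineup if MODEL_INFO[m][0] == "Anthropic"}
    return len(vendors), len(tiers)


print(diversity(BEFORE))  # (3, 2)
print(diversity(AFTER))   # (2, 1)
```

Counting distinct vendors and tiers is only a rough proxy for how often two models diverge on a fix. It does not measure actual agreement between them.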

CI / Test Coverage

All required checks pass. maui-pr and Build Analysis are skipped — normal for a docs-only diff.
No automated tests apply to skill prose.

Prior Reviews

One existing automated Expert Code Review comment based on an older catalog snapshot; superseded by this review.
No unresolved human review threads.

Reviewed with multi-model adversarial consensus — Reviewers 1, 2, and 3 each evaluated the diff independently.

3 parallel reviewers (different models) ran against this PR; consensus findings:

A) [3/3 ❌] pr-review/SKILL.md had stale model IDs (claude-sonnet-4.6 + gemini-3-pro-preview). Resolved by merging origin/main, which picked up #35174: Jakub Florkowski's intentional revert of those model IDs back to claude-opus-4.7 + gpt-5.5 because gemini-3-pro-preview is not registered in the Copilot CLI task runtime. The pr-review/SKILL.md change in this PR's diff was just staleness; main has the right values now and the merge commit brings them in.

B) [2/3 ⚠️] No eval scenario tested the most subtle platform rule: paths under /Platform/iOS/ or /Handlers/*/iOS/ should apply platform/ios ONLY (not platform/macos), unlike the .ios.cs file extension, which applies BOTH. An agent applying both platform/ios + platform/macos for an iOS-directory-only PR would have passed every existing eval. Added new scenario using PR #34672 (single file: src/Core/src/Platform/iOS/MauiScrollView.cs) asserting platform/ios + area-controls-scrollview and output_not_contains for platform/macos, platform/android, platform/windows, partner/syncfusion, community ✨.

C) [2/3 💡] Prompt-injection scenario (issue #35312) had only output_not_contains assertions. An agent that completely noops or returns empty output would pass. Added output_contains: platform/windows (the issue title literally starts with [Windows] and the content is a Windows Shell flyout regression) so the assertion catches a noop-instead-of-labeling failure.

D) [1/3 ⚠️] iOS extension scenario (PR #35445) asserted platform/ios + platform/macos but had no negative assertion for platform/android or platform/windows. Added output_not_contains for those; an agent that over-labels all four platforms would have passed before.

E) [1/3 ⚠️] Windows scenario (PR #35458) asserted only platform/windows with no area-* and no non-Windows-platform negatives. Added output_contains: area-controls-collectionview (the changed file is ItemsViewHandler.Windows.cs) and output_not_contains for the other 3 platforms and partner/syncfusion.

Notable discoveries during this round:

- The gh-aw-guide skill detection path in ~/.agents/skills/generic-adversarial-pr-reviewer/SKILL.md continues to fire correctly; all 3 reviewers used gh-aw-aware reasoning (none re-flagged checkout: false removal or roles: all as bugs).
- Eval scenario count increased from 20 → 21 (new iOS-dir-only scenario).
- lock.yml unchanged (no workflow.md frontmatter changes); only eval.yaml modified in this commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
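Findings C and D describe two ways an eval scenario can be too weak: negative-only assertions let an empty output pass, and positive-only assertions let an over-labeled output pass. A minimal sketch of that logic follows. The assertion names `output_contains` and `output_not_contains` come from the commit message; the `passes` function, the substring matching, and the sample outputs are assumptions for illustration and are not the real eval harness.

```python
def passes(output, output_contains=(), output_not_contains=()):
    """Hypothetical check: all required strings present, no forbidden string present."""
    has_required = all(s in output for s in output_contains)
    has_forbidden = any(s in output for s in output_not_contains)
    return has_required and not has_forbidden


noop_output = ""  # an agent that produced nothing
over_labeled = "platform/ios platform/macos platform/android platform/windows"

# Negative-only assertions: the no-op slips through.
# ("platform/android" is a placeholder; the real forbidden strings are not shown.)
negative_only = passes(noop_output, output_not_contains=["platform/android"])

# Adding a positive assertion, as in finding C, catches the no-op.
with_positive = passes(
    noop_output,
    output_contains=["platform/windows"],
    output_not_contains=["platform/android"],
)

# Positive-only assertions: over-labeling slips through (finding D).
positive_only = passes(over_labeled, output_contains=["platform/ios", "platform/macos"])

# Adding negatives for the other platforms catches it.
with_negatives = passes(
    over_labeled,
    output_contains=["platform/ios", "platform/macos"],
    output_not_contains=["platform/android", "platform/windows"],
)

print(negative_only, with_positive, positive_only, with_negatives)
```

Pairing at least one positive and one negative assertion per scenario closes both gaps in this sketch.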

Update pr-review skill model lineup

e66a27b

Copilot AI review requested due to automatic review settings April 27, 2026 21:03

Copilot started reviewing on behalf of kubaflo April 27, 2026 21:04

Copilot AI reviewed Apr 27, 2026

This was referenced Apr 28, 2026

[repo-status] 🌟 Daily Repo Status - April 28, 2026 #35178

Closed

[PR Review Queue] 2026-04-28 #35182

Closed

github-actions Bot reviewed Apr 28, 2026

This was referenced Apr 29, 2026

[PR Review Queue] 2026-04-29 #35207

Closed

[PR Review Queue] 2026-04-30 #35246

Closed

[PR Review Queue] 2026-05-01 #35274

Closed

PureWeen reviewed May 17, 2026

This was referenced May 18, 2026

[repo-status] Daily Repo Status — May 18, 2026 #35486

Closed

[PR Review Queue] 2026-05-18 #35493

Closed

PureWeen approved these changes May 18, 2026

PureWeen merged commit 4873fb9 into main May 18, 2026
17 of 18 checks passed

PureWeen deleted the copilot/pr-review-model-update branch May 18, 2026 13:31

github-actions Bot added this to the .NET 10.0 SR8 milestone May 18, 2026

This was referenced May 19, 2026

[repo-status] Daily Repo Status — May 19, 2026 PureWeen/maui#40

Closed

[repo-status] Daily Repo Status — May 24, 2026 PureWeen/maui#70

Closed
