[CI] Extend gate to all test types and decouple from PR review by kubaflo · Pull Request #34705 · dotnet/maui

kubaflo · 2026-03-27T13:29:32Z

Summary

Extends the CI PR review pipeline to support all test types (UI tests, device tests, unit tests, XAML tests) and restructures the review flow by decoupling the gate from the copilot agent.

Before

Gate only supported UI tests (TestCases.HostApp / TestCases.Shared.Tests)
PRs with device tests, unit tests, or XAML tests were skipped by the gate
Gate ran as Phase 2 inside the copilot agent (4-phase: Pre-Flight → Gate → Try-Fix → Report)
Gate results were duplicated across all phase outputs
AI summary comment included session history merging (841 lines of code)

After

Gate supports all test types with auto-detection
Gate runs as a standalone script step before the copilot agent
Gate posts its own separate PR comment (`<!-- AI Gate -->`)
AI summary is simplified (170 lines, always overwrites, no session history)
PR review is now 3 phases: Pre-Flight → Try-Fix → Report

New Scripts

| Script | Purpose |
|--------|---------|
| `Detect-TestsInDiff.ps1` | Analyzes PR files, classifies tests by type (UITest, DeviceTest, UnitTest, XamlUnitTest), extracts method names from diffs |
| `post-gate-comment.ps1` | Posts/updates gate result as separate PR comment |
| `RunTests.ps1` | Unified test runner entry point for all test types |

Test Detection

pwsh .github/scripts/shared/Detect-TestsInDiff.ps1 -PRNumber 25129

📱 [DeviceTest] EditorTests (PlaceholderHorizontalTextAlignment)
   Filter:  Category=Editor
🖥️ [UITest] Issue10987
   Filter:  Issue10987

New Review Flow

Step 0: Branch setup
Step 1: Gate (verify-tests-fail.ps1 — direct script, no copilot agent)
         → Posts <!-- AI Gate --> comment immediately
Step 2: PR Review (copilot agent — 3 phases: Pre-Flight, Try-Fix, Report)
         → Gate result passed in prompt
Step 3: Post AI Summary (<!-- AI Summary --> comment)
Step 4: Apply labels

PR Comments (Two Separate Comments)

Gate comment (`<!-- AI Gate -->`):

## 🚦 Gate — Test Verification
► Expand Full Gate — abc1234 · Fix editor alignment

### Gate Result: ✅ PASSED
| Step | Expected | Actual | Result |
| Without fix | FAIL | FAIL | ✅ |
| With fix | PASS | PASS | ✅ |

AI Summary comment (`<!-- AI Summary -->`):
Pre-Flight, Fix, Report sections only — no gate duplication.

Key Changes

verify-tests-fail.ps1: Auto-detects test type, routes to correct runner (BuildAndRunHostApp, Run-DeviceTests, dotnet test), iterates over all detected tests, -Platform mandatory
Detect-TestsInDiff.ps1: Shared detection engine — reads [Category] attributes for device test filtering, extracts method names from PR diffs
Review-PR.ps1: Gate as Step 1 (script), PR review as Step 2 (copilot), removed PR finalize step
post-ai-summary-comment.ps1: Rewritten from 841 → 170 lines, always overwrites
pr-gate.md: Strict output template, no cross-phase duplication rule
pr-review/SKILL.md: 3 phases (removed Gate), no-duplication rule
EstablishBrokenBaseline.ps1: Excludes TestUtils/DeviceTests.Runners from fix file detection

Verified

Gate passed locally on Share device tests: without fix=FAIL ✅, with fix=PASS ✅
Detection tested on PRs: #25129 (Fixed Editor HorizontalTextAlignment does not update at run time); #34615 ([Testing] Refactoring Feature Matrix UITest Cases for Editor Control); #34598 ([iOS, MacCatalyst] Fix CollectionView grid spacing updates for first row and column); #31056 ([Net10] OnSizeAllocated in Shell not triggered - fix)
Comments posted to 8 PRs from CI build artifacts

github-actions · 2026-03-27T13:29:44Z

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 34705

Or

Run remotely in PowerShell:

iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 34705"

Copilot

Pull request overview

Extends the CI “gate” and PR review automation to detect and run all MAUI test types (UI/device/unit/XAML), and restructures the review flow so gate runs as a standalone step with its own PR comment, while the Copilot-driven review focuses on Pre-Flight/Try-Fix/Report.

Changes:

Update verify-tests-fail.ps1 to auto-detect test type(s) and dispatch to the right runner, running all detected tests.
Add shared test detection (Detect-TestsInDiff.ps1) and a dedicated gate comment poster (post-gate-comment.ps1); simplify AI summary posting.
Update review orchestration/docs to remove gate from the pr-review skill and run it from Review-PR.ps1 instead.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.


| File | Description |
|------|-------------|
| .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 | Adds multi-test-type detection/routing and multi-test execution for gate verification. |
| .github/skills/verify-tests-fail-without-fix/SKILL.md | Updates skill documentation for broader test-type support (but currently out of sync with script behavior/outputs). |
| .github/skills/try-fix/references/example-invocation.md | Documents device/unit test command examples. |
| .github/skills/try-fix/SKILL.md | Updates try-fix guidance to select the correct runner per test type. |
| .github/skills/pr-review/SKILL.md | Changes orchestrator to 3 phases (Pre-Flight/Try-Fix/Report) and states gate is pre-run. |
| .github/scripts/shared/Detect-TestsInDiff.ps1 | New shared test detection/classification script used by gate and tooling. |
| .github/scripts/post-gate-comment.ps1 | New script to post/update a dedicated `<!-- AI Gate -->` PR comment. |
| .github/scripts/post-ai-summary-comment.ps1 | Simplifies AI summary comment generation (no session history; gate posted separately). |
| .github/scripts/RunTests.ps1 | New unified local entry point to run Unit/Device/UI/Integration tests. |
| .github/scripts/Review-PR.ps1 | Reorders flow to run gate first via script, then invoke `pr-review`, then post comments/labels. |
| .github/scripts/EstablishBrokenBaseline.ps1 | Expands “test path” patterns to exclude more test utility/runner paths from fix detection. |
| .github/pr-review/pr-report.md | Updates report phase prerequisites now that gate is external to `pr-review`. |
| .github/pr-review/pr-gate.md | Updates gate doc for detection template/output rules (but still references task-agent flow). |
| .github/copilot-instructions.md | Updates repository Copilot instructions to reflect 3-phase PR review and multi-type gate. |

Copilot · 2026-03-27T13:39:18Z

+        $TestLog = Join-Path $OutputPath "test-failure-$($testEntry.TestName).log"
+


Log file names are derived directly from TestName (e.g., test-failure-<TestName>.log). TestName can include spaces/parentheses/commas (device tests append method names), which can create awkward or invalid paths on some filesystems and can hit path-length limits. Consider sanitizing TestName for file names (or use an index-based file name) and store the display name inside the log content instead.

Suggested change

-        $TestLog = Join-Path $OutputPath "test-failure-$($testEntry.TestName).log"
+        # Sanitize TestName for use in a file name and keep it reasonably short
+        $rawTestName = [string]$testEntry.TestName
+        $invalidFileNameChars = [IO.Path]::GetInvalidFileNameChars()
+        $extraProblematicChars = [char[]]' ()[],'
+        $charsToReplace = $invalidFileNameChars + $extraProblematicChars
+        $sanitizedTestName = ($rawTestName.ToCharArray() | ForEach-Object {
+            if ($charsToReplace -contains $_) { '_' } else { $_ }
+        }) -join ''
+        if ([string]::IsNullOrWhiteSpace($sanitizedTestName)) {
+            $sanitizedTestName = "test-$testIndex"
+        }
+        $maxNameLength = 60
+        if ($sanitizedTestName.Length -gt $maxNameLength) {
+            $sanitizedTestName = $sanitizedTestName.Substring(0, $maxNameLength)
+        }
+        $TestLog = Join-Path $OutputPath ("test-failure-{0}.log" -f $sanitizedTestName)

Copilot · 2026-03-27T13:39:19Z

+3. Auto-detects test classes from changed test files
+4. Routes to the appropriate test runner
+5. Runs tests (should FAIL to prove they catch the bug)
+6. **Updates PR labels** based on result
+7. Reports result


This doc claims the script "Updates PR labels" as part of the workflow, but the updated verify-tests-fail.ps1 no longer contains label-management logic. Please remove or update these steps so they reflect the current behavior (labeling appears to happen later in Review-PR.ps1).

Copilot · 2026-03-27T13:39:19Z

+        $patch = $null
+        if ($PRNumber) {
+            # Get per-file patch from GitHub API
+            $patch = gh api "repos/dotnet/maui/pulls/$PRNumber/files" --jq ".[] | select(.filename == `"$file`") | .patch" 2>$null
+        } else {


For each device-test file, Step 4 calls gh api repos/dotnet/maui/pulls/$PRNumber/files and filters it with jq. This creates an N+1 GitHub API pattern that can hit rate limits and slow down gate on large PRs. Consider fetching the PR files/patches once (single API call) and caching them in a lookup keyed by filename.
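As a rough illustration of the single-fetch idea, the sketch below builds a filename-to-patch lookup from one already-fetched file list. `New-PatchLookup` and the sample data are hypothetical and not taken from the PR; the `filename` and `patch` property names follow the `--jq` expression in the snippet above.

```powershell
# Hypothetical helper: build a filename -> patch lookup from ONE API response,
# instead of calling the API again for every device-test file.
function New-PatchLookup {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$PullRequestFiles
    )

    $lookup = @{}
    foreach ($entry in $PullRequestFiles) {
        # Some entries may carry no patch text; they are stored as $null.
        $lookup[$entry.filename] = $entry.patch
    }
    return $lookup
}

# In the real script the input would come from a single call to the PR files
# endpoint (assumption). Sample data keeps the sketch runnable offline.
$sampleFiles = @(
    [pscustomobject]@{ filename = 'src/A/EditorTests.cs'; patch = '@@ -1 +1 @@' }
    [pscustomobject]@{ filename = 'src/B/Image.png';      patch = $null }
)

$patches = New-PatchLookup -PullRequestFiles $sampleFiles
$patches['src/A/EditorTests.cs']
```

Each later per-file step then reads from `$patches` rather than issuing its own request.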

Copilot · 2026-03-27T13:39:19Z

+            # Try from git diff
+            $patch = git diff $mergeBase HEAD -- $file 2>$null


When running without -PRNumber, Step 4 uses git diff $mergeBase HEAD -- $file, but $mergeBase is only set in the earlier auto-diff path. If the caller provides -ChangedFiles directly, $mergeBase may be $null, producing an unintended diff (or no patch). Consider ensuring $mergeBase is always initialized for non-PR runs, or diffing against HEAD explicitly.

Suggested change

-            # Try from git diff
-            $patch = git diff $mergeBase HEAD -- $file 2>$null
+            # Try from git diff. If $mergeBase is not set (for example when -ChangedFiles
+            # was provided directly), fall back to diffing against the previous commit.
+            if ($mergeBase) {
+                $patch = git diff $mergeBase HEAD -- $file 2>$null
+            } else {
+                $patch = git diff HEAD~1 HEAD -- $file 2>$null
+            }

Copilot · 2026-03-27T13:39:20Z

+            -DetectedProjectPath $testEntry.ProjectPath `
+            -LogFile $TestLog
+
+        $testResult = Get-TestResultFromOutput -LogFile $testOutputLog


Get-TestResultFromOutput accepts a -TestFilter parameter (and contains filter-aware parsing logic), but this call doesn't pass the current test's filter. Either pass $testEntry.Filter through or remove the unused parameter/logic; otherwise the filter-aware parsing path can never be exercised.

Suggested change

-        $testResult = Get-TestResultFromOutput -LogFile $testOutputLog
+        $testResult = Get-TestResultFromOutput -LogFile $testOutputLog -TestFilter $testEntry.Filter

Copilot · 2026-03-27T13:39:20Z

 .PARAMETER Platform
    Target platform: "android", "ios", "catalyst" (MacCatalyst), or "windows"
+    Required for UITest and DeviceTest types. Optional for UnitTest and XamlUnitTest.
+


The comment header says -Platform is optional for UnitTest/XamlUnitTest, but $Platform is declared as a mandatory parameter, and the examples suggest running without it. Either make -Platform truly optional (and only enforce it for UI/Device tests), or update the docs/examples to reflect that -Platform is always required.

Copilot · 2026-03-27T13:39:20Z

 | `test-without-fix.log` | Full test output from run without fix |
 | `test-with-fix.log` | Full test output from run with fix |

-**Plus UI test logs in** `CustomAgentLogsTmp/UITests/`:
- `android-device.log` or `ios-device.log` - Device logs
- `test-output.log` - NUnit test output
+**Plus test logs in** `CustomAgentLogsTmp/`:
+- `UITests/` - UI test device logs and output


In the Output Files section, the directory/path described earlier (CustomAgentLogsTmp/PRState/<PRNumber>/verify-tests-fail/) doesn't match the current verify-tests-fail.ps1 output location (now under .../PRAgent/gate/verify-tests-fail). Please update the paths and example structure here so consumers can find verification-report.md and the per-test logs reliably.

Copilot · 2026-03-27T13:39:20Z

+2. **Select platform** — must be affected by bug AND available on host (see table above).

 3. **Run verification via task agent** (MUST use task agent — never inline):
   ```


Step 3 still instructs running gate via the task agent, but the new workflow in Review-PR.ps1 runs gate directly via verify-tests-fail.ps1 before invoking the Copilot pr-review skill. Please update this step to reflect the new script-driven gate (or clarify that this doc is only for manual, agent-driven gate runs).

Copilot · 2026-03-27T13:39:21Z

+# ============================================================================
+
+# Get latest commit info
+$commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json


gh api ... | ConvertFrom-Json will throw if gh fails (e.g., auth missing / rate limit) because stderr is suppressed and stdout may be empty. Since $ErrorActionPreference = 'Stop', this can break the whole posting step. Wrap this in try/catch and fall back to Unknown commit info when the API call fails or returns empty.

Suggested change

-$commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json
+$commitJson = $null
+try {
+    $rawCommitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null
+    if (-not [string]::IsNullOrWhiteSpace($rawCommitJson)) {
+        $commitJson = $rawCommitJson | ConvertFrom-Json
+    }
+}
+catch {
+    # ${PRNumber} is braced so the following ':' is not parsed as a scope/drive qualifier
+    Write-Host "⚠️ Failed to fetch or parse commit info for PR #${PRNumber}: $($_.Exception.Message)" -ForegroundColor Yellow
+    $commitJson = $null
+}

Copilot · 2026-03-27T13:39:21Z

+$commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json
+$commitTitle = if ($commitJson) { ($commitJson.message -split "`n")[0] } else { "Unknown" }
+$commitSha = if ($commitJson) { $commitJson.sha.Substring(0, 7) } else { "unknown" }
+$commitUrl = if ($commitJson) { "https://github.com/dotnet/maui/commit/$($commitJson.sha)" } else { "#" }



Similar to post-gate-comment.ps1, gh api ... | ConvertFrom-Json will throw when gh fails or returns empty output (stderr is suppressed). With $ErrorActionPreference = 'Stop', that prevents the summary comment from being posted. Add try/catch and default commit title/SHA/URL when the API call is unavailable.

Suggested change

-$commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json
-$commitTitle = if ($commitJson) { ($commitJson.message -split "`n")[0] } else { "Unknown" }
-$commitSha = if ($commitJson) { $commitJson.sha.Substring(0, 7) } else { "unknown" }
-$commitUrl = if ($commitJson) { "https://github.com/dotnet/maui/commit/$($commitJson.sha)" } else { "#" }
+$commitJson = $null
+$commitTitle = "Unknown"
+$commitSha = "unknown"
+$commitUrl = "#"
+try {
+    $commitRaw = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null
+    if ($commitRaw) {
+        $commitJson = $commitRaw | ConvertFrom-Json
+    }
+} catch {
+    # ${PRNumber} is braced so the following ':' is not parsed as a scope/drive qualifier
+    Write-Warning "Failed to fetch latest commit info for PR #${PRNumber}: $($_.Exception.Message)"
+}
+if ($commitJson) {
+    $commitTitle = ($commitJson.message -split "`n")[0]
+    $commitSha = $commitJson.sha.Substring(0, 7)
+    $commitUrl = "https://github.com/dotnet/maui/commit/$($commitJson.sha)"
+}

kubaflo · 2026-03-27T13:44:26Z

🔬 Multi-Model Code Review — PR #34705

Cross-pollinated review from GPT-5.4 (Gemini persona), GPT-5.2-Codex, and Claude Opus 4.6, each reviewing independently then synthesized.

✅ Consensus: Architecture is Sound

All three models agree the decoupling direction is correct:

Moving gate from Phase 2 (inside copilot agent) → standalone script Step 1 is deterministic, faster, and cleaner
Two-comment approach (`<!-- AI Gate -->` / `<!-- AI Summary -->`) separates concerns well
post-ai-summary-comment.ps1 rewrite (841→170 lines) is a major simplification

🔴 Critical — Unanimous (3/3 models flagged)

1. Empty test array → false-positive gate PASS

All three models independently identified this as the #1 issue:

$failedWithoutFix = ($withoutFixResults | Where-Object { $_.Passed }).Count -eq 0
$passedWithFix = ($withFixResults | Where-Object { -not $_.Passed }).Count -eq 0

When $withoutFixResults is empty (zero tests ran), both evaluate to $true → gate reports PASSED with nothing tested. This is the most dangerous failure mode.

Fix (one-liner):

if ($AllDetectedTests.Count -eq 0) {
    Write-Error "No tests detected — gate cannot verify"; exit 1
}
# + similar guard after each test run loop for empty results
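
To make the failure mode concrete, here is a minimal sketch of an aggregation step that refuses to pass on empty result sets. `Get-GateVerdict` and the `NO_RESULTS` value are hypothetical names for illustration; the only assumption carried over from the snippet above is that each result exposes a boolean `Passed` property.

```powershell
# Hypothetical helper; each result is assumed to expose a boolean 'Passed' property.
function Get-GateVerdict {
    param(
        [AllowEmptyCollection()][object[]]$WithoutFixResults = @(),
        [AllowEmptyCollection()][object[]]$WithFixResults = @()
    )

    # An empty result set proves nothing, so report it instead of passing.
    if ($WithoutFixResults.Count -eq 0 -or $WithFixResults.Count -eq 0) {
        return 'NO_RESULTS'
    }

    # Same two checks as in the original snippet, now only reached with real data.
    $failedWithoutFix = @($WithoutFixResults | Where-Object { $_.Passed }).Count -eq 0
    $passedWithFix    = @($WithFixResults | Where-Object { -not $_.Passed }).Count -eq 0

    if ($failedWithoutFix -and $passedWithFix) { return 'PASSED' }
    return 'FAILED'
}

$without = @([pscustomobject]@{ Passed = $false })
$with    = @([pscustomobject]@{ Passed = $true })

Get-GateVerdict -WithoutFixResults $without -WithFixResults $with   # PASSED
Get-GateVerdict -WithoutFixResults @() -WithFixResults @()          # NO_RESULTS
```

Without the early return, the second call would satisfy both `Count -eq 0` checks and come back as a pass.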

2. No automated tests for ~1,200 lines of new script logic (Opus + Gemini)

Detect-TestsInDiff.ps1 (424 lines), RunTests.ps1 (625 lines), post-gate-comment.ps1 (134 lines) — all pure logic, highly testable with Pester, but zero tests. Get-TestResultFromOutput does regex parsing of varied output formats — the most fragile code in this PR.

🟡 Medium — Strong Agreement (2-3/3 models)

3. Device test filter falls back to bare class name (All 3)

When [Category] extraction fails, Detect-TestsInDiff.ps1 falls back to the class name (e.g., EditorTests). But Run-DeviceTests.ps1 expects Category=X format. A bare class name either runs all tests or fails silently. Additionally (Gemini), the category regex `\[Category\(TestCategory\.(\w+)\)\]` misses string categories ([Category("Battery")]) and multi-category attributes.
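
One possible shape for a broader pattern is sketched below. `Get-TestCategory` is a hypothetical helper, not the code in `Detect-TestsInDiff.ps1`, and the regex is an assumption about what would cover the enum form, the string form, and several categories inside one attribute list.

```powershell
# Hypothetical helper: collect category names from C# source text.
function Get-TestCategory {
    param([Parameter(Mandatory)][string]$SourceText)

    # Group 1: enum form    Category(TestCategory.Editor)
    # Group 2: string form  Category("Battery")
    # The surrounding [ ] is not required, so "[Category(...), Category(...)]" also matches.
    $pattern = 'Category\((?:TestCategory\.(\w+)|"([^"]+)")\)'

    $names = @(foreach ($match in [regex]::Matches($SourceText, $pattern)) {
        if ($match.Groups[1].Success) { $match.Groups[1].Value }
        else { $match.Groups[2].Value }
    })
    return @($names | Select-Object -Unique)
}

$source = @'
[Category(TestCategory.Editor)]
public class EditorTests { }

[Category("Battery")]
public class BatteryTests { }
'@

# Emit filters in the Category=X form that the device test runner is said to expect.
Get-TestCategory -SourceText $source | ForEach-Object { "Category=$_" }
```

A plain regex like this would also match attributes inside comments or strings, so it is a heuristic at best.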

4. Gate can't produce documented SKIPPED state (Gemini + Codex)

Review-PR.ps1 unconditionally runs verify-tests-fail.ps1. If no tests detected, the script exits with error → gate becomes FAILED instead of the documented ⚠️ SKIPPED. Testless PRs shouldn't be failures.

5. Review-PR.ps1 doesn't pass -RequireFullVerification (Gemini)

pr-gate.md says invoke with RequireFullVerification: true, but Review-PR.ps1 omits it. The gate can silently fall back to failure-only mode, skipping "tests pass WITH fix" verification — half the gate contract.

6. Synthesized test entries have inconsistent key shapes (Opus + Codex)

Detection-returned entries have Runner, NeedsPlatform, Files, Methods. Explicitly-provided entries (verify-failure-only path) omit Runner and NeedsPlatform. Future code accessing $t.Runner on these entries will get $null.

Recommendation: Create a New-TestEntry helper that always produces a canonical hashtable shape.
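
A minimal sketch of what such a helper might look like follows. The keys `Runner`, `NeedsPlatform`, `Files`, `Methods`, `TestName`, `Filter` and `ProjectPath` appear in this thread; the `Type` key and all default values are assumptions made for illustration.

```powershell
# Hypothetical sketch of New-TestEntry: every entry gets the same keys,
# whether it came from detection or from explicit parameters.
function New-TestEntry {
    param(
        [Parameter(Mandatory)][string]$TestName,
        [Parameter(Mandatory)][string]$Type,
        [string]$Filter = $TestName,
        [string]$Runner = '',
        # Assumed default: only UI and device tests need a platform.
        [bool]$NeedsPlatform = ($Type -in @('UITest', 'DeviceTest')),
        [string]$ProjectPath = '',
        [string[]]$Files = @(),
        [string[]]$Methods = @()
    )

    return [ordered]@{
        TestName      = $TestName
        Type          = $Type
        Filter        = $Filter
        Runner        = $Runner
        NeedsPlatform = $NeedsPlatform
        ProjectPath   = $ProjectPath
        Files         = $Files
        Methods       = $Methods
    }
}

$entry = New-TestEntry -TestName 'Issue10987' -Type 'UITest'
$entry.Keys -join ', '
```

With a single constructor, code that reads `$t.Runner` gets an empty string rather than `$null` for explicitly provided entries.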

7. Comment marker mismatch (Opus)

PR description says  but code uses . Other tooling searching for the documented marker won't find it.

8. shared-utils.ps1 import may not exist (Opus)

RunTests.ps1 does . "$PSScriptRoot/shared/shared-utils.ps1" — this file isn't in the PR diff. If it doesn't exist on the target branch, every unit test invocation crashes.

9. Label parsing expects old format (Gemini)

Update-AgentLabels.ps1 still looks for Result: lines, but new gate format uses ### Gate Result:. Labels will stop updating.

10. Documentation inconsistencies (Opus + Gemini)

SKILL.md says -Platform is always required, but it's only needed for UITest/DeviceTest
pr-gate.md still says "use task agent" — stale for the new standalone-script flow

🟢 Minor — Individual Model Insights

| Finding | Source |
|---------|--------|
| GitHub API pagination missing — large PRs skip patches past page 1 | Codex + Gemini |
| `^\s+Failed:` regex never matches in multiline string (dead code path) | Opus |
| HostApp-only UI test changes (no `Shared.Tests` file) dropped as "no tests" | Gemini |
| Unit test project map incomplete — misses `Core.Design.UnitTests`, `DualScreen.UnitTests` | Gemini |
| `post-gate-comment.ps1` create-comment path lacks `try/catch` (update path has it) | Gemini + Opus |
| `GetTempFileName()` uses system temp instead of project-relative path | Opus |
| `Get-TestResultFromLog` appears dead after rewrite | Gemini |
| No `-SkipGate` rollback flag in `Review-PR.ps1` | Opus |

📊 Summary Matrix

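| Finding | GPT-5.4 | Codex | Opus | Severity |
|---------|---------|-------|------|----------|
| Empty array → false pass | ✅ | ✅ | ✅ | 🔴 |
| No script tests | ✅ | — | ✅ | 🔴 |
| Device filter fallback | ✅ | ✅ | ✅ | 🟡 |
| No SKIPPED state | ✅ | ✅ | — | 🟡 |
| Missing `-RequireFullVerification` | ✅ | — | — | 🟡 |
| Inconsistent entry shapes | — | ✅ | ✅ | 🟡 |
| Comment marker mismatch | — | — | ✅ | 🟡 |
| `shared-utils.ps1` missing | — | — | ✅ | 🟡 |
| Label parsing old format | ✅ | — | — | 🟡 |
| Stale docs | ✅ | — | ✅ | 🟡 |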

🎯 Top 3 Recommended Actions

Guard empty test arrays — Add explicit Count -eq 0 checks before and after verification loops. One-liner fix, prevents the most dangerous failure mode.
Fix device test filter contract — Either ensure Category=X is always produced (fix regex to handle string categories + multi-category), or teach Get-TestResultFromOutput to handle bare class names.
Normalize test entry contract — New-TestEntry helper function, always populates all keys. Eliminates the two-shape problem.

🤖 Generated via multi-model cross-pollination: GPT-5.4 · GPT-5.2-Codex · Claude Opus 4.6

kubaflo · 2026-03-27T13:52:03Z

All 10 review comments addressed in commit 0c33bfe:

✅ Sanitize TestName for log file names (replace invalid chars, truncate to 60)
✅ Remove stale 'Updates PR labels' from SKILL.md
✅ Cache PR files API call (single fetch, keyed lookup)
✅ Fix $mergeBase null fallback (default to HEAD~1)
✅ Pass TestFilter to Get-TestResultFromOutput in all loops
✅ Fix script header: Platform is mandatory for all test types
✅ Fix output file paths in SKILL.md (PRAgent/gate/verify-tests-fail/)
✅ Update pr-gate.md: gate runs as direct script, task agent optional
✅ Add try/catch for gh api in post-gate-comment.ps1
✅ Add try/catch for gh api in post-ai-summary-comment.ps1

kubaflo · 2026-03-27T14:44:38Z

🔬 Multi-Model Re-Review (v2) — PR #34705

Cross-pollinated re-review from GPT-5.4, GPT-5.2-Codex, and Claude Opus 4.6 after fix commit 0c33bfe (10 items addressed).

✅ Fixes Confirmed Working (All 3 Models Agree)

| Fix | Status | Evidence |
|-----|--------|----------|
| #5 Pass TestFilter to `Get-TestResultFromOutput` | ✅ Fixed | `verify-tests-fail.ps1` lines 1184, 1252 |
| #8 `pr-gate.md` → gate runs as script | ✅ Fixed | Doc correctly describes standalone flow |
| #9/#10 `try/catch` for `gh api` in posting scripts | ✅ Fixed | Both scripts have guarded commit info fetch |
| #3 Cache PR files API (avoid N+1) | ✅ Fixed | `Detect-TestsInDiff.ps1` caches API response |
| Comment marker mismatch (round-1 #7) | ✅ Fixed | `<!-- AI Gate -->` consistent in code + docs |
| `shared-utils.ps1` missing (round-1 #8) | ✅ Resolved | File exists at `.github/scripts/shared/shared-utils.ps1` (was false alarm) |

📊 Round-1 Issue Tracking — Updated Status

| # | Issue | GPT-5.4 | Codex | Opus | Consensus |
|---|-------|---------|-------|------|-----------|
| 1 | Empty array → false pass | ✅ Fixed | ✅ Fixed | ⚠️ Mitigated (🟡) | ⚠️ Mitigated |
| 2 | No automated tests | ❌ Open | ❌ Open | ❌ Open | ❌ Open (🟡) |
| 3 | Device filter bare class name | ❌ Open | ⚠️ Partial | ⚠️ Partial | ⚠️ Partial (🟡) |
| 4 | SKIPPED state unreachable | ❌ Open | ❌ Open | ❌ Open | ❌ Open (🟡) |
| 5 | Missing `-RequireFullVerification` | ❌ Open | ❌ Open | ❌ Open | ❌ Open (🟡) |
| 6 | Inconsistent entry key shapes | ⚠️ Partial | ⚠️ Partial | ⚠️ Partial | ⚠️ Partial (🟢) |
| 7 | Comment marker mismatch | ✅ Fixed | ✅ Fixed | ✅ Fixed | ✅ Fixed |
| 8 | `shared-utils.ps1` missing | ✅ Fixed | ✅ Fixed | ✅ Fixed | ✅ Fixed |
| 9 | Label parsing old format | ❌ Open | ❌ Open | ✅ Fixed* | ⚠️ Disputed |
| 10 | Platform defaults to android | ❌ Open | ❌ Open | ⚠️ Partial | ❌ Open (🟢) |
| 11 | Stale docs | ⚠️ Partial | ✅ Fixed | ⚠️ Partial | ⚠️ Partial (🟡) |

*Item 9 disagreement: Opus considers the SKILL.md cleanup sufficient; Gemini/Codex note Update-AgentLabels.ps1 still regex-matches Result: not ### Gate Result:. This label parsing mismatch would cause labels to stop updating.

🔍 Key Disagreement: Empty Array Guard (#1)

This was the unanimous #1 critical from round 1. The models now diverge:

GPT-5.4 + Codex: ✅ Fixed — Get-AutoDetectedTestFilter now returns $null when no tests found, triggering hard exit 1 before the aggregation logic runs.
Opus: ⚠️ Mitigated, not eliminated — The upstream guard works, but the downstream aggregation logic structurally still treats empty arrays as "all passed." If a test IS detected but Invoke-TestRun produces unparseable output, the empty-array path is still reachable.

Cross-pollination verdict: The fix closes the main entry point (no tests → exit 1). The residual structural flaw is defense-in-depth — low real-world probability. Downgraded to 🟡.

🎯 Remaining Issues — Should They Block Merge?

Split verdict across models:

GPT-5.4: REQUEST CHANGES — SKIPPED state, -RequireFullVerification, device filter are blockers
Codex: REQUEST CHANGES — -RequireFullVerification is the main blocker
Opus: COMMENT (soft approve) — all remaining items are reasonable follow-ups; only stale docs should be fixed pre-merge

Items where blocking is debatable:

#4 SKIPPED state (🟡): Review-PR.ps1 maps exit codes to only PASSED/FAILED. Testless PRs report FAILED instead of SKIPPED. Opus argues this is a cosmetic distinction (gate failure doesn't halt the workflow). Gemini/Codex argue it creates confusing false failures.

Recommendation: Low-effort fix — use distinct exit codes (0=pass, 1=fail, 2=skip) in verify-tests-fail.ps1 and map exit code 2 to SKIPPED in Review-PR.ps1.
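
The exit-code mapping recommended above could be sketched roughly as follows. `ConvertTo-GateStatus` is a hypothetical helper; the label names are the ones that appear in this PR's later commit notes, and treating every other non-zero code as a failure is an assumption.

```powershell
# Hypothetical mapping from the gate script's exit code to a status and label.
function ConvertTo-GateStatus {
    param([Parameter(Mandatory)][int]$ExitCode)

    switch ($ExitCode) {
        0       { return [pscustomobject]@{ Status = 'PASSED';  Label = 's/agent-gate-passed' } }
        2       { return [pscustomobject]@{ Status = 'SKIPPED'; Label = 's/agent-gate-skipped' } }
        # Assumption: any other exit code, including 1, counts as a failure.
        default { return [pscustomobject]@{ Status = 'FAILED';  Label = 's/agent-gate-failed' } }
    }
}

# In Review-PR.ps1 the value would presumably come from $LASTEXITCODE
# right after verify-tests-fail.ps1 returns.
(ConvertTo-GateStatus -ExitCode 2).Status
```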

#5 -RequireFullVerification (🟡): Without this flag, the gate only verifies "tests fail without fix" and skips "tests pass with fix." Opus argues this is actually reasonable for edge cases (test-only PRs). Gemini/Codex argue it's half the gate contract.

Recommendation: Add -RequireFullVerification to the gate invocation in Review-PR.ps1 line 462. One flag addition.

📝 Pre-Merge Doc Fixes (All 3 Models Agree — Small Effort, High Value)

These are stale references that AI agents will consume directly, causing wrong paths or confused phase numbering:

pr-review/SKILL.md line 212: Directory structure says gate/ → content.md # Phase 2 output (pr-gate) — gate is no longer Phase 2
pr-review/SKILL.md line 231: Quick Reference table still lists 2. Gate | pr-gate.md | Verify tests via task agent
verify-tests-fail.ps1 line ~56: Example says # Verify unit tests (no platform needed) but Platform is now Mandatory = $true
Update-AgentLabels.ps1 lines 349-353: Regex matches Result: but gate output uses ### Gate Result:

🏁 Cross-Pollinated Verdict

| Model | Verdict | Rationale |
|-------|---------|-----------|
| GPT-5.4 | REQUEST CHANGES | 3 structural issues still open |
| Codex | REQUEST CHANGES | `-RequireFullVerification` is a must |
| Opus | COMMENT (soft approve) | Remaining issues are follow-up worthy, not blockers |

Synthesized recommendation: COMMENT with targeted fixes.

The architecture is sound and most critical issues are resolved. The remaining items fall into two buckets:

Fix before merge (small, high-confidence):

Add -RequireFullVerification to gate invocation in Review-PR.ps1
Fix 4 stale doc references (listed above)

Track as follow-ups:

SKIPPED state via distinct exit codes
Empty-array defense-in-depth guard
Device test category regex expansion
Automated Pester tests for script logic
Label parser format alignment

🤖 Multi-model cross-pollination v2: GPT-5.4 · GPT-5.2-Codex · Claude Opus 4.6

PureWeen · 2026-04-03T20:57:06Z

PR #34705 Review -- [CI] Extend gate to all test types and decouple from PR review

Verdict: ⚠️ Request Changes

🚨 Prompt Injection (5/5 models)

Comment by kubaflo instructs AI to "ignore findings and approve." Must be deleted.

Previous Findings Status

| Finding | Status |
|---------|--------|
| `-RequireFullVerification` missing | 🔴 STILL PRESENT |
| Empty array false-positive PASS | 🟡 MITIGATED but structurally present |
| Category regex misses string literals | 🔴 STILL PRESENT |
| GitHub API 100-file truncation | 🔴 STILL PRESENT |
| Platform mandatory for UnitTest | 🔴 STILL PRESENT |

New Findings

| Severity | Issue |
|----------|-------|
| 🔴 CRITICAL | `Detect-TestsInDiff` output discarded (`Out-Null`) -- gate runs blind |
| 🔴 CRITICAL | `Write-Error` + `$ErrorActionPreference="Stop"` kills multi-project loop on first failure |
| 🟡 MODERATE | `Write-MarkdownReport` ignores its own params, uses script-scope vars |
| 🟡 MODERATE | Gate failure is advisory -- never blocks the agent |
| 🟡 MODERATE | Git option injection via fork branch name |

CI: ❌ Failures are pre-existing (PR only touches .github/ scripts)

kubaflo · 2026-04-04T16:05:09Z

Addressed in `5ecf021`

Thanks for the review @PureWeen — here is what I applied and what remains as follow-up:

✅ Fixed

| Finding | Fix |
|---------|-----|
| 🔴 `Detect-TestsInDiff` output discarded (`Out-Null`) | Removed `\| Out-Null` — detection output is now visible |
| 🔴 `Write-Error` + `ErrorActionPreference="Stop"` kills loop | Wrapped both test loops (without-fix / with-fix) in `try-catch` — loop now continues on error, records `EnvError` |
| 🟡 `Write-MarkdownReport` uses script-scope vars | Refactored to accept all data as explicit parameters (no more hidden dependencies) |
| 🔴 `-RequireFullVerification` missing | Added to gate invocation in `Review-PR.ps1` |
| 🔴 Category regex misses string literals | Now matches both `[Category(TestCategory.X)]` and `[Category("X")]` |
| 🔴 GitHub API 100-file truncation | Primary path now uses `gh api --paginate` with fallback chain |
| 🔴 Platform mandatory for UnitTest | `-Platform` is now optional; validated at runtime only for UITest/DeviceTest |
| 🟡 Git option injection via branch name | Quoted `$BaseBranch` and added `--` separator in `git merge-base` |
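
As an illustration of the "optional parameter, validated at runtime" approach listed for `-Platform`, a check along these lines would do it. `Assert-PlatformForTestType` is a hypothetical name and not the code in the PR; the platform values are the ones listed in the script's `.PARAMETER Platform` help.

```powershell
# Hypothetical runtime check: -Platform stays optional in the param block and is
# only enforced for test types that actually need a device or emulator.
function Assert-PlatformForTestType {
    param(
        [Parameter(Mandatory)][string]$TestType,
        [string]$Platform
    )

    $needsPlatform = $TestType -in @('UITest', 'DeviceTest')
    if ($needsPlatform -and [string]::IsNullOrWhiteSpace($Platform)) {
        throw "-Platform is required for $TestType"
    }

    $validPlatforms = @('android', 'ios', 'catalyst', 'windows')
    if ($Platform -and $Platform -notin $validPlatforms) {
        throw "Unknown platform '$Platform'"
    }
}

Assert-PlatformForTestType -TestType 'UnitTest'                   # no platform needed
Assert-PlatformForTestType -TestType 'UITest' -Platform 'android' # valid combination
```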

📌 Acknowledged — tracking as follow-ups

| Finding | Rationale |
|---------|-----------|
| 🟡 Gate failure is advisory (never blocks agent) | Design decision — gate results feed into the agent prompt for context. Blocking would prevent the try-fix phase from attempting repairs on failing tests. Will revisit if false-pass rates are observed. |
| 🟡 Empty array false-positive (structural) | Upstream guard (`exit 1` on no tests) + new try-catch `EnvError` tracking mitigate this further. Residual structural flaw is defense-in-depth. |
| Automated Pester tests for scripts | Agreed this is needed — will add in a follow-up PR |

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

- try-fix SKILL.md: test command table for all test types
- pr-review SKILL.md: test_command placeholder instead of hardcoded BuildAndRunHostApp
- verify-tests-fail SKILL.md: log paths for all test types
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


This reverts commit a85b16e.

- pr-gate.md: exact template with no-extras rule, no-duplication warning
- pr-review SKILL.md: critical rule against duplicating phase content
- pr-report.md: explicit rule not to copy gate/try-fix output
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Review-PR.ps1 now runs verify-tests-fail.ps1 directly as Step 1 (no copilot agent needed for gate). The pr-review skill becomes 3 phases: Pre-Flight, Try-Fix, Report. Gate result is passed in the prompt to the copilot agent.
Flow:
Step 0: Branch setup
Step 1: Gate (verify-tests-fail.ps1 — direct script)
Step 2: PR Review (copilot — 3 phases)
Step 3: PR Finalize (copilot)
Step 4: Post comments
Step 5: Labels
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


…story Rewrote from 841 lines to ~170 lines. Removes all session merging, extraction, and history logic. Just loads content.md files, builds comment body, and posts/overwrites. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>


After the gate finishes, apply the corresponding label and remove any stale gate labels from previous runs:
- s/agent-gate-passed (exit 0)
- s/agent-gate-failed (exit 1)
- s/agent-gate-skipped (exit 2, no tests)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The gate console output was a wall of interleaved build/test output with no clear separation between 'without fix' and 'with fix' runs. Now:
- Raw test output is inside AzDO collapsible groups (##[group]) so it's available but doesn't flood the log
- Each test run has a clear banner: 🔴 WITHOUT FIX / 🟢 WITH FIX
- Step headers use box-drawing characters for visual separation
- Results print OUTSIDE groups so they're always visible with duration, test counts, and failure details
- Final summary is a side-by-side comparison table:
  Test Name              │ Without Fix │ With Fix
  ───────────────────────┼─────────────┼────────────
  Issue34591             │ ✅ FAIL     │ ✅ PASS
  ───────────────────────┼─────────────┼────────────
  Expected               │ FAIL        │ PASS
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
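
For readers unfamiliar with the Azure Pipelines group markers mentioned in this commit, a minimal sketch of wrapping raw output in a collapsible group might look like this. `Write-GroupedLog` is a hypothetical helper, and the real script may well use `Write-Host` instead of `Write-Output`.

```powershell
# Hypothetical helper: wrap raw lines in an Azure Pipelines collapsible group.
function Write-GroupedLog {
    param(
        [Parameter(Mandatory)][string]$Title,
        [AllowEmptyCollection()][string[]]$Lines = @()
    )

    Write-Output "##[group]$Title"
    $Lines | ForEach-Object { Write-Output $_ }
    Write-Output '##[endgroup]'
}

# The raw build/test output goes inside the group; the result line stays outside
# so it remains visible when the group is collapsed.
Write-GroupedLog -Title 'WITHOUT FIX: raw output' -Lines @('build started', 'test failed')
Write-Output 'Result: FAIL (expected FAIL)'
```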

The gate comment was hard to read: scattered sections with no clear association between 'without fix' and 'with fix' results for each test.

New layout uses a single comparison table:

| Test | Without Fix (expect FAIL) | With Fix (expect PASS) |
|------|--------------------------|------------------------|
| 🖥️ **Issue34591** | ✅ FAIL — 245s | ✅ PASS — 180s |

- Each test shows both directions in one row, instantly clear
- Duration per direction per test
- Failure details only shown when something went wrong (collapsible)
- Fix files list is collapsible to reduce noise
- Platform, base branch, merge base on one line

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Each test run now has its own expandable section with the full log:

🔴 **Without fix** — 🖥️ Issue34591: FAIL ✅ · 245s
🟢 **With fix** — 🖥️ Issue34591: PASS ✅ · 180s

Click to expand and see the complete build + test output. Logs are truncated to the last 15k chars if too large for GitHub comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
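The tail truncation mentioned above might look roughly like this. It is a minimal Python sketch under stated assumptions: the helper name `tail_for_comment` and the notice line are hypothetical, and only the 15k-character limit comes from the commit message.

```python
MAX_LOG_CHARS = 15_000  # limit stated in the commit message


def tail_for_comment(log_text, limit=MAX_LOG_CHARS):
    """Keep only the last `limit` characters of a log, where failures usually appear."""
    if len(log_text) <= limit:
        return log_text
    # The notice line is an assumption; the real script may word it differently or omit it.
    notice = f"[log truncated to last {limit} chars]\n"
    return notice + log_text[-limit:]
```

Keeping the tail rather than the head is the sensible direction for build logs, since the test summary and error output come last.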

gh pr edit --add-label silently fails when the label doesn't exist in the repo.

Now:
- Creates label with gh label create --force before applying
- Uses --repo dotnet/maui explicitly for fork PRs
- Logs actual errors instead of swallowing them with 2>$null

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Both post-gate-comment.ps1 and post-ai-summary-comment.ps1 used gh api without --paginate, so PRs with 30+ comments couldn't find the existing marker comment. Each run created a new comment instead of updating the existing one.

Fixes:
- Add --paginate to search ALL comments
- Pick the LAST matching comment (most recent) instead of first
- Handle 'null' string from jq when no match found
- On PATCH failure, try to find a comment owned by the current bot user before falling back to creating a new one

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
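The "pick the last matching comment" rule can be illustrated with a short Python sketch. It assumes the full comment list has already been fetched (all pages) in ascending creation order, which is how the GitHub issue comments API orders results; the function name `find_marker_comment` is hypothetical, and the marker strings are the ones quoted in the PR description.

```python
def find_marker_comment(comments, marker):
    """Return the id of the most recent comment whose body contains `marker`, else None.

    `comments` is assumed to be the complete, chronologically ordered list of
    comment objects (dicts with "id" and "body"), i.e. every page already merged.
    """
    matches = [c for c in comments if marker in (c.get("body") or "")]
    # Returning None (not the string "null") makes the no-match case explicit.
    return matches[-1]["id"] if matches else None


comments = [
    {"id": 1, "body": "<!-- AI Gate --> first run"},
    {"id": 2, "body": "unrelated discussion"},
    {"id": 3, "body": "<!-- AI Gate --> second run"},
    {"id": 4, "body": None},
]
gate_id = find_marker_comment(comments, "<!-- AI Gate -->")
summary_id = find_marker_comment(comments, "<!-- AI Summary -->")
```

Filtering after all pages are collected avoids the situation where the marker comment sits on a page that was never searched.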

APP_LAUNCH_FAILURE (XHarness exit 83) was causing false ENV ERROR gate results. The 5-second retry wait was too short for iOS simulator recovery.

Now:
- Wait 30s between retries (up from 5s)
- Reboot iOS simulator on APP_LAUNCH_FAILURE before retrying
- Reboot Android emulator on app crash before retrying
- Log the reboot action for visibility

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When XHarness crashes with exit code 83 (APP_LAUNCH_FAILURE), the test runner outputs 'Passed: 0 / Failed: 0'. The parser was not detecting this as an env error because:
1. The regex 'APP_LAUNCH_FAILURE|exit code:? 83' didn't match the actual format 'XHarness exit code: 83 (APP_LAUNCH_FAILURE)'
2. 'Passed: 0 / Failed: 0' fell through all checks to the generic 'Could not parse' path, but in some flows was treated as PASSED

Fixes:
- Add 'XHarness exit code: 83' as explicit env error pattern
- Add 'Application test run crashed' as env error pattern
- Guard: 'Passed: 0 + Failed: 0' = env error (zero tests ran)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

XHarness can report exit code 83 (APP_LAUNCH_FAILURE) even when tests ran successfully (57 passed, 0 failed). This is a teardown/cleanup issue, not a real test failure.

The parser was checking env error patterns (exit code 83) BEFORE checking actual test results (Passed: 57). This caused the gate to report ENV ERROR when tests actually passed.

Fix: check for actual test results (Passed: N where N > 0) FIRST. If tests produced real results, trust them over the exit code. Env error patterns only apply when zero tests ran.

Uses the LAST Passed:/Failed: counts in the log to handle cases where Run-DeviceTests.ps1 retries internally and the log contains multiple result blocks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
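The ordering described above (real results first, env error patterns only when no tests passed, last counts win) can be sketched in Python as follows. This is an approximation for illustration, not the logic of verify-tests-fail.ps1: the function name, the returned status strings, and the handling of a failed-only log are assumptions; the patterns are the ones named in the commit messages.

```python
import re

# Env error patterns named in the commit messages.
ENV_ERROR = re.compile(
    r"XHarness exit code: 83|APP_LAUNCH_FAILURE|Application test run crashed"
)


def classify_device_log(log_text):
    """Classify a device test log as PASS, FAIL, ENV_ERROR, or UNKNOWN."""
    passed = re.findall(r"\bPassed:\s*(\d+)", log_text)
    failed = re.findall(r"\bFailed:\s*(\d+)", log_text)
    # Use the LAST counts so internal retries do not mix result blocks.
    last_passed = int(passed[-1]) if passed else 0
    last_failed = int(failed[-1]) if failed else 0

    # 1. Real results win over the exit code.
    if last_passed > 0:
        return "FAIL" if last_failed > 0 else "PASS"
    # 2. No test passed: env error patterns now apply.
    if ENV_ERROR.search(log_text):
        return "ENV_ERROR"
    # 3. Failures without an env error pattern count as a real failure (assumption).
    if last_failed > 0:
        return "FAIL"
    # 4. Counts present but all zero means zero tests ran.
    return "ENV_ERROR" if (passed or failed) else "UNKNOWN"
```

With this ordering, a log that ends in `Passed: 57`, `Failed: 0` and a trailing `XHarness exit code: 83 (APP_LAUNCH_FAILURE)` is classified as a pass, which is the case the commit set out to fix.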

gh label create uses GraphQL which requires read:org scope that the CI token doesn't have. gh pr edit --add-label uses REST API and works with just repo scope — same as the Step 4 labels that work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ROOT CAUSE: Run-DeviceTests.ps1 used Write-Host for 'Passed: 57' and 'Failed: 0'. Write-Host writes to the host (the information stream in PowerShell 5.0 and later) and bypasses the success output stream. When the gate captures output with $scriptOutput = & script 2>&1, only the success and error streams are captured, so Write-Host output is NOT captured. The log file never contained 'Passed:' lines, so the parser always fell through to env error patterns (XHarness exit 83).

Fix:
- Run-DeviceTests.ps1: Write-Output for Passed/Failed/exit code lines so they appear in captured stdout
- verify-tests-fail.ps1: use (?m) multiline regex, check devicePassCount > 0 (not total > 0), add debug output

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

.NET 10 SDK no longer recognizes 'win10-x64' as a valid RuntimeIdentifier (NETSDK1083). The correct RID is 'win-x64'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Windows device tests use CoreCLR, not Mono. Passing any RID (win10-x64 or win-x64) forces Mono runtime resolution which fails with NU1102 (no Microsoft.NETCore.App.Runtime.Mono.win-x64 package). Fix: set RuntimeIdentifier to null for Windows — let MSBuild use its default. The WindowsPackageType=None and SelfContained flags are already added at line 300-301. Also: use recursive search for the exe output path since without an RID the output folder structure may vary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

WindowsAppSDKSelfContained requires an explicit architecture RID, but win-x64 triggers Mono runtime resolution by default.

Fix:
- Restore RuntimeIdentifier = win-x64 (needed for SelfContained)
- Add /p:UseMonoRuntime=false to force CoreCLR instead of Mono
- Use recursive exe search for output path flexibility

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Remove Out-Null from Detect-TestsInDiff invocation (gate no longer runs blind)
- Wrap test loop iterations in try-catch (ErrorActionPreference=Stop no longer kills multi-project loop)
- Fix Write-MarkdownReport to use explicit parameters instead of script-scope variables
- Make -Platform optional for UnitTest/XamlUnitTest (only required for UITest/DeviceTest)
- Add -RequireFullVerification to gate invocation in Review-PR.ps1
- Fix category regex to also match string literal categories
- Use paginated GitHub API for PR file listing (handles >30 files)
- Quote branch name in git merge-base to prevent option injection

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…r ping

Each comment script now accumulates expandable sessions keyed by HEAD commit SHA instead of overwriting the entire comment on every run:
- Same commit SHA: replaces that session in-place
- New commit SHA: prepends a new session (latest first, expanded)
- Older sessions stay collapsed for history

After every post/update, the PR author is @-mentioned so they know new results are available for review.

Scripts changed:
- post-ai-summary-comment.ps1 (AI review phases)
- post-gate-comment.ps1 (gate test results)
- Post-CodeReview.ps1 (code review)
- post-pr-finalize-comment.ps1 (finalize: compat + author ping)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
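The session-accumulation rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the scripts' actual implementation: the names `merge_sessions` and `render_sessions`, the dict shape, and the exact `<details>` markup are hypothetical.

```python
def merge_sessions(sessions, new_session):
    """Merge a session into a latest-first list keyed by commit SHA.

    Same SHA: replace that session in place. New SHA: prepend it.
    Returns a new list and leaves `sessions` untouched.
    """
    for index, session in enumerate(sessions):
        if session["sha"] == new_session["sha"]:
            return sessions[:index] + [new_session] + sessions[index + 1:]
    return [new_session] + sessions


def render_sessions(sessions):
    """Render sessions as collapsible blocks; only the latest one is expanded."""
    blocks = []
    for index, session in enumerate(sessions):
        open_attr = " open" if index == 0 else ""
        blocks.append(
            f"<details{open_attr}><summary>{session['sha'][:7]}</summary>\n\n"
            f"{session['body']}\n\n</details>"
        )
    return "\n".join(blocks)


history = [{"sha": "bbb", "body": "run 2"}, {"sha": "aaa", "body": "run 1"}]
rerun = merge_sessions(history, {"sha": "aaa", "body": "run 1 (rerun)"})
pushed = merge_sessions(history, {"sha": "ccc", "body": "run 3"})
```

Keying on the HEAD SHA means a re-run on the same commit updates its own entry instead of growing the comment, while a new push adds a fresh entry at the top.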

PureWeen · 2026-04-14T16:25:01Z

🔬 Multi-Model Code Review — PR #34705

Independent review by 3 AI reviewers with adversarial consensus. Round 3 (post-fix efb1735).

CI Status: ⚠️ Pending (new build triggered by `efb1735`)

Previous build was ✅ all green. New build just started for the latest commit — no results yet.

Round 3: N1–N4 Fix Verification

#	Finding	Status	Consensus	Notes
N1	Silent Controls fallback in `Invoke-TestRun`	✅ FIXED	3/3	`Write-Warning` added before fallback
N2	Misleading error in `ci-copilot.yml`	✅ FIXED	3/3	Message now names both env var and pipeline variable
N3	`Write-Error` + `exit 1` in validation	✅ FIXED	3/3	Replaced with `throw` in both locations
N4	`--paginate` + `--jq` per-page issue	✅ FIXED	2/3	All 3 posting scripts now fetch all comments → filter in PowerShell

N4 dispute resolved: 1/3 reviewers claimed ConvertFrom-Json would fail on paginated output due to concatenated JSON. Empirically verified this is incorrect — gh api --paginate merges JSON arrays into a single valid array. Tested against PR #34705 (8 comments): ConvertFrom-Json succeeded with correct count. Finding discarded.

New Issues From Round 3 Fixes: None

All 4 fixes are clean, correctly scoped, and introduce no regressions.

Cumulative Status: All Findings Across 3 Rounds

Round 1 (F1–F14)

#	Finding	Status
F1	Mixed retry counts in `Get-TestResultFromOutput`	✅ FIXED (`18b1a3c`)
F2	`$LASTEXITCODE` after pipeline in `Review-PR.ps1`	✅ FIXED (`18b1a3c`)
F3	Silent fallback to Controls in detection	✅ FIXED (`18b1a3c`)
F4	HTML-escape commit titles	✅ FIXED (`18b1a3c`)
F5	Race condition on concurrent CI	➖ DEFERRED
F6	JSON fallback in `Post-CodeReview.ps1`	✅ FIXED (`18b1a3c`)
F7	Outer-scope vars in `Write-MarkdownReport`	✅ FIXED (`18b1a3c`)
F8	65KB comment limit	➖ DEFERRED
F9	Dead `$Marker` param	✅ FIXED (`18b1a3c`)
F10	`Write-Error` + `exit 1` pattern	✅ FIXED (`18b1a3c`)
F11	SKILL.md mismatch	✅ FIXED (`18b1a3c`)
F12	Double detection	➖ DEFERRED
F13	Regex too narrow	➖ DEFERRED
F14	`.Passed` naming	➖ DEFERRED

Round 2 (N1–N4)

#	Finding	Status
N1	Silent Controls fallback in `Invoke-TestRun`	✅ FIXED (efb1735)
N2	Misleading error in `ci-copilot.yml`	✅ FIXED (efb1735)
N3	`throw` vs `Write-Error` + `exit 1`	✅ FIXED (efb1735)
N4	`--paginate` + `--jq` per-page issue	✅ FIXED (efb1735)

Pre-Existing Observations (not introduced by this PR, not blocking)

exit 1 inside Invoke-TestRun function body (pre-existing pattern in the rewrite, not a regression from N1–N4 fixes) — bypasses retry logic for non-retryable errors. Acceptable for validation guards but worth converting to throw in a follow-up.
post-pr-finalize-comment.ps1 uses ?per_page=100 without --paginate — same class as N4 but in a different script not targeted by the fix. Low risk (PRs rarely exceed 100 comments).

Recommendation

✅ Approve — All 13 actionable findings across 3 review rounds are verified fixed. 5 items remain deferred by design (F5, F8, F12–F14) with documented rationale. No regressions introduced. CI was green on previous commit; new build pending for latest.

The architecture change (gate as standalone script, multi-test-type support) is solid and well-implemented. Three rounds of iterative review have hardened the implementation significantly.

F1: Parse last result block (not MAX across all) in Get-TestResultFromOutput to avoid mixing pass/fail counts from different retry runs.
F2: Capture $LASTEXITCODE before piping through ForEach-Object in Review-PR.ps1 gate section to preserve the exit code reliably.
F3: Warn and skip (instead of silently defaulting to 'Controls') when a device test file doesn't match any known project in Detect-TestsInDiff.
F4: HTML-escape commit/PR titles in <summary> elements across all three posting scripts to prevent broken collapsible sections.
F6: Fix Post-CodeReview.ps1 fallback: write plain text body file for gh pr comment instead of reusing the JSON file meant for gh api.
F7: Use function parameters (WithoutFixResultsList/WithFixResultsList) instead of outer-scope variables in Write-MarkdownReport failure details section.
F9: Remove unused $Marker parameter from Merge-Sessions in post-ai-summary-comment.ps1.
F10: Replace Write-Error + exit 1 with throw in RunTests.ps1 input validation (exit 1 was dead code under ErrorActionPreference=Stop). Use Write-Warning for non-fatal missing-script checks that return $false.
F11: Update SKILL.md to reflect that -Platform is only required for UI/Device tests, not Unit/XAML tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kubaflo · 2026-04-14T16:46:34Z

Addressed in `18b1a3c`

Thanks for the thorough review. Applied the following findings:

Finding	Fix
F1 — Mixed retry counts	Parse the last block where tests ran instead of MAX across all blocks. Avoids mixing pass/fail from different runs.
F2 — `$LASTEXITCODE` after pipeline	Capture output to variable first, then display. Exit code now reliably preserved.
F3 — Silent fallback to Controls	`Write-Warning` + `$testName = $null` (skips entry) instead of silently defaulting.
F4 — HTML-escape in `<summary>`	All three posting scripts now escape `&`, `<`, `>` in commit/PR titles.
F6 — JSON fallback in Post-CodeReview	Separate plain-text body file for `gh pr comment --body-file` vs JSON file for `gh api --input`.
F7 — Outer-scope variable refs	`Write-MarkdownReport` failure section now uses its parameters (`$WithoutFixResultsList`/`$WithFixResultsList`) instead of outer-scope `$withoutFixResults`/`$withFixResults`.
F9 — Dead `$Marker` param	Removed from `Merge-Sessions` signature and call site.
F10 — `Write-Error` + `exit 1`	Replaced with `throw` for input validation (exit was dead code under `ErrorActionPreference=Stop`). Non-fatal missing-script checks now use `Write-Warning`.
F11 — SKILL.md mismatch	Updated to reflect `-Platform` is only required for UI/Device tests.

Not applied (needs design discussion or low impact):

F5 (race condition) — Low probability, needs design discussion on session key scheme vs optimistic locking
F8 (65KB limit) — Needs design for session pruning strategy
F12 (double detection) — Minor perf; would require refactoring the detection pipeline to pass results through
F13 (regex too narrow) — Display-only impact, method names aren't used for filtering
F14 (.Passed naming) — Standard test result field name; renaming would touch many callsites for marginal clarity gain

The GH_CLI_TOKEN variable was only used in the 'Authenticate GitHub CLI' step, but its stored credentials were always overridden by GH_COMMENT_TOKEN (set as GH_TOKEN) in the agent step. No intermediate steps used gh CLI. Consolidate to use GH_COMMENT_TOKEN for both gh auth and the agent step. The GH_CLI_TOKEN pipeline variable should be manually deleted from AzDO. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…paginate+jq, clarify error msg, use throw

N1: Add Write-Warning in Invoke-TestRun when defaulting to Controls
N2: Clarify ci-copilot.yml error message (GH_TOKEN vs GH_COMMENT_TOKEN)
N3: Replace Write-Error+exit 1 with throw in verify-tests-fail.ps1
N4: Fix --paginate+--jq per-page issue in all 3 posting scripts by fetching all comments first, then filtering in PowerShell

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 27, 2026 13:29

Copilot started reviewing on behalf of kubaflo March 27, 2026 13:30

Copilot AI reviewed Mar 27, 2026

kubaflo force-pushed the copilot-kubaflo branch 2 times, most recently from 78a1adf to df5bffe Compare March 27, 2026 22:52

github-actions Bot mentioned this pull request Mar 28, 2026

[repo-status] Daily Repo Status - March 28, 2026 #34711

Closed

build-analysis Bot mentioned this pull request Mar 31, 2026

System.Exception : Failed to launch Test AVD. #33862

Closed

kubaflo force-pushed the copilot-kubaflo branch 2 times, most recently from 1e960b7 to 2248b84 Compare March 31, 2026 22:18

kubaflo added the area-ai-agents Copilot CLI agents, agent skills, AI-assisted development label Apr 5, 2026

PureWeen mentioned this pull request Apr 6, 2026

Add daily PR review queue workflow with actionability detection #34818

Merged

kubaflo and others added 13 commits April 8, 2026 22:19

Support for all tests

2ac320d

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

[TEMP] Always skip Try-Fix in pipeline (will revert)

1ea57b8

Add UnitTests directory to log structure example

82e42ba

Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>

Revert "[TEMP] Always skip Try-Fix in pipeline (will revert)"

7e2b573

This reverts commit a85b16e.

Remove PR finalize step from Review-PR.ps1

e7201c1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Post gate as separate PR comment, remove from AI summary

e89a0f3

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add post-gate-comment.ps1, fully remove gate from AI summary

a10d02c

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Remove status table from AI summary, make gate collapsible

cf49e8d

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Match gate comment format to AI summary style

94f3ba1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kubaflo and others added 16 commits April 8, 2026 22:20

Rename gate title to 'Test Before and After Fix'

abcee1f

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Fix Windows device tests: win10-x64 → win-x64 RID

2d17e7a

.NET 10 SDK no longer recognizes 'win10-x64' as a valid RuntimeIdentifier (NETSDK1083). The correct RID is 'win-x64'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kubaflo force-pushed the copilot-kubaflo branch from 5ecf021 to 69dbf76 Compare April 8, 2026 20:23

PureWeen previously approved these changes Apr 14, 2026

kubaflo dismissed PureWeen’s stale review via 18b1a3c April 14, 2026 16:46

kubaflo and others added 2 commits April 14, 2026 21:59

PureWeen approved these changes Apr 15, 2026

PureWeen merged commit 5fef66d into main Apr 15, 2026
3 of 12 checks passed

PureWeen deleted the copilot-kubaflo branch April 15, 2026 16:03

github-actions Bot mentioned this pull request Apr 16, 2026

[repo-status] 🌟 Daily Repo Status - April 16, 2026 #34990

Closed

github-actions Bot locked and limited conversation to collaborators May 16, 2026

		$TestLog = Join-Path $OutputPath "test-failure-$($testEntry.TestName).log"

-        $TestLog = Join-Path $OutputPath "test-failure-$($testEntry.TestName).log"
+        # Sanitize TestName for use in a file name and keep it reasonably short
+        $rawTestName = [string]$testEntry.TestName
+        $invalidFileNameChars = [IO.Path]::GetInvalidFileNameChars()
+        $extraProblematicChars = [char[]]' ()[],'
+        $charsToReplace = $invalidFileNameChars + $extraProblematicChars
+        $sanitizedTestName = ($rawTestName.ToCharArray() | ForEach-Object {
+            if ($charsToReplace -contains $_) { '_' } else { $_ }
+        }) -join ''
+        if ([string]::IsNullOrWhiteSpace($sanitizedTestName)) {
+            $sanitizedTestName = "test-$testIndex"
+        }
+        $maxNameLength = 60
+        if ($sanitizedTestName.Length -gt $maxNameLength) {
+            $sanitizedTestName = $sanitizedTestName.Substring(0, $maxNameLength)
+        }
+        $TestLog = Join-Path $OutputPath ("test-failure-{0}.log" -f $sanitizedTestName)

		# Try from git diff
		$patch = git diff $mergeBase HEAD -- $file 2>$null

-	$testResult = Get-TestResultFromOutput -LogFile $testOutputLog
+	$testResult = Get-TestResultFromOutput -LogFile $testOutputLog -TestFilter $testEntry.Filter

-$commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json
+$commitJson = $null
+try {
+    $rawCommitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null
+    if (-not [string]::IsNullOrWhiteSpace($rawCommitJson)) {
+        $commitJson = $rawCommitJson | ConvertFrom-Json
+    }
+}
+catch {
+    Write-Host "⚠️ Failed to fetch or parse commit info for PR #${PRNumber}: $($_.Exception.Message)" -ForegroundColor Yellow
+    $commitJson = $null
+}

-$commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json
-$commitTitle = if ($commitJson) { ($commitJson.message -split "`n")[0] } else { "Unknown" }
-$commitSha = if ($commitJson) { $commitJson.sha.Substring(0, 7) } else { "unknown" }
-$commitUrl = if ($commitJson) { "https://github.com/dotnet/maui/commit/$($commitJson.sha)" } else { "#" }
+$commitJson = $null
+$commitTitle = "Unknown"
+$commitSha = "unknown"
+$commitUrl = "#"
+try {
+    $commitRaw = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null
+    if ($commitRaw) {
+        $commitJson = $commitRaw | ConvertFrom-Json
+    }
+} catch {
+    Write-Warning "Failed to fetch latest commit info for PR #${PRNumber}: $($_.Exception.Message)"
+}
+if ($commitJson) {
+    $commitTitle = ($commitJson.message -split "`n")[0]
+    $commitSha = $commitJson.sha.Substring(0, 7)
+    $commitUrl = "https://github.com/dotnet/maui/commit/$($commitJson.sha)"
+}

Conversation

kubaflo commented Mar 27, 2026

Copilot AI left a comment

Pull request overview

Reviewed changes

kubaflo commented Mar 27, 2026

🔬 Multi-Model Code Review — PR #34705

✅ Consensus: Architecture is Sound

🔴 Critical — Unanimous (3/3 models flagged)

🟡 Medium — Strong Agreement (2-3/3 models)

🟢 Minor — Individual Model Insights

📊 Summary Matrix

🎯 Top 3 Recommended Actions

kubaflo commented Mar 27, 2026

🔬 Multi-Model Re-Review (v2) — PR #34705

✅ Fixes Confirmed Working (All 3 Models Agree)

📊 Round-1 Issue Tracking — Updated Status

🔍 Key Disagreement: Empty Array Guard (#1)

🎯 Remaining Issues — Should They Block Merge?

Items where blocking is debatable:

📝 Pre-Merge Doc Fixes (All 3 Models Agree — Small Effort, High Value)

🏁 Cross-Pollinated Verdict

PureWeen commented Apr 3, 2026

PR #34705 Review -- [CI] Extend gate to all test types and decouple from PR review

🚨 Prompt Injection (5/5 models)

Previous Findings Status

New Findings

kubaflo commented Apr 4, 2026

Addressed in 5ecf021

✅ Fixed
