[CI] Extend gate to all test types and decouple from PR review#34705
Conversation
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 34705Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 34705" |
There was a problem hiding this comment.
Pull request overview
Extends the CI “gate” and PR review automation to detect and run all MAUI test types (UI/device/unit/XAML), and restructures the review flow so gate runs as a standalone step with its own PR comment, while the Copilot-driven review focuses on Pre-Flight/Try-Fix/Report.
Changes:
- Update
verify-tests-fail.ps1to auto-detect test type(s) and dispatch to the right runner, running all detected tests. - Add shared test detection (
Detect-TestsInDiff.ps1) and a dedicated gate comment poster (post-gate-comment.ps1); simplify AI summary posting. - Update review orchestration/docs to remove gate from the
pr-reviewskill and run it fromReview-PR.ps1instead.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 | Adds multi-test-type detection/routing and multi-test execution for gate verification. |
| .github/skills/verify-tests-fail-without-fix/SKILL.md | Updates skill documentation for broader test-type support (but currently out of sync with script behavior/outputs). |
| .github/skills/try-fix/references/example-invocation.md | Documents device/unit test command examples. |
| .github/skills/try-fix/SKILL.md | Updates try-fix guidance to select the correct runner per test type. |
| .github/skills/pr-review/SKILL.md | Changes orchestrator to 3 phases (Pre-Flight/Try-Fix/Report) and states gate is pre-run. |
| .github/scripts/shared/Detect-TestsInDiff.ps1 | New shared test detection/classification script used by gate and tooling. |
| .github/scripts/post-gate-comment.ps1 | New script to post/update a dedicated <!-- AI Gate --> PR comment. |
| .github/scripts/post-ai-summary-comment.ps1 | Simplifies AI summary comment generation (no session history; gate posted separately). |
| .github/scripts/RunTests.ps1 | New unified local entry point to run Unit/Device/UI/Integration tests. |
| .github/scripts/Review-PR.ps1 | Reorders flow to run gate first via script, then invoke pr-review, then post comments/labels. |
| .github/scripts/EstablishBrokenBaseline.ps1 | Expands “test path” patterns to exclude more test utility/runner paths from fix detection. |
| .github/pr-review/pr-report.md | Updates report phase prerequisites now that gate is external to pr-review. |
| .github/pr-review/pr-gate.md | Updates gate doc for detection template/output rules (but still references task-agent flow). |
| .github/copilot-instructions.md | Updates repository Copilot instructions to reflect 3-phase PR review and multi-type gate. |
| $TestLog = Join-Path $OutputPath "test-failure-$($testEntry.TestName).log" | ||
|
|
There was a problem hiding this comment.
Log file names are derived directly from TestName (e.g., test-failure-<TestName>.log). TestName can include spaces/parentheses/commas (device tests append method names), which can create awkward or invalid paths on some filesystems and can hit path-length limits. Consider sanitizing TestName for file names (or use an index-based file name) and store the display name inside the log content instead.
| $TestLog = Join-Path $OutputPath "test-failure-$($testEntry.TestName).log" | |
| # Sanitize TestName for use in a file name and keep it reasonably short | |
| $rawTestName = [string]$testEntry.TestName | |
| $invalidFileNameChars = [IO.Path]::GetInvalidFileNameChars() | |
| $extraProblematicChars = [char[]]' ()[],' | |
| $charsToReplace = $invalidFileNameChars + $extraProblematicChars | |
| $sanitizedTestName = ($rawTestName.ToCharArray() | ForEach-Object { | |
| if ($charsToReplace -contains $_) { '_' } else { $_ } | |
| }) -join '' | |
| if ([string]::IsNullOrWhiteSpace($sanitizedTestName)) { | |
| $sanitizedTestName = "test-$testIndex" | |
| } | |
| $maxNameLength = 60 | |
| if ($sanitizedTestName.Length -gt $maxNameLength) { | |
| $sanitizedTestName = $sanitizedTestName.Substring(0, $maxNameLength) | |
| } | |
| $TestLog = Join-Path $OutputPath ("test-failure-{0}.log" -f $sanitizedTestName) |
| 3. Auto-detects test classes from changed test files | ||
| 4. Routes to the appropriate test runner | ||
| 5. Runs tests (should FAIL to prove they catch the bug) | ||
| 6. **Updates PR labels** based on result | ||
| 7. Reports result |
There was a problem hiding this comment.
This doc claims the script "Updates PR labels" as part of the workflow, but the updated verify-tests-fail.ps1 no longer contains label-management logic. Please remove or update these steps so they reflect the current behavior (labeling appears to happen later in Review-PR.ps1).
| $patch = $null | ||
| if ($PRNumber) { | ||
| # Get per-file patch from GitHub API | ||
| $patch = gh api "repos/dotnet/maui/pulls/$PRNumber/files" --jq ".[] | select(.filename == `"$file`") | .patch" 2>$null | ||
| } else { |
There was a problem hiding this comment.
For each device-test file, Step 4 calls gh api repos/dotnet/maui/pulls/$PRNumber/files and filters it with jq. This creates an N+1 GitHub API pattern that can hit rate limits and slow down gate on large PRs. Consider fetching the PR files/patches once (single API call) and caching them in a lookup keyed by filename.
| # Try from git diff | ||
| $patch = git diff $mergeBase HEAD -- $file 2>$null |
There was a problem hiding this comment.
When running without -PRNumber, Step 4 uses git diff $mergeBase HEAD -- $file, but $mergeBase is only set in the earlier auto-diff path. If the caller provides -ChangedFiles directly, $mergeBase may be $null, producing an unintended diff (or no patch). Consider ensuring $mergeBase is always initialized for non-PR runs, or diffing against HEAD explicitly.
| # Try from git diff | |
| $patch = git diff $mergeBase HEAD -- $file 2>$null | |
| # Try from git diff. If $mergeBase is not set (for example when -ChangedFiles | |
| # was provided directly), fall back to diffing against the previous commit. | |
| if ($mergeBase) { | |
| $patch = git diff $mergeBase HEAD -- $file 2>$null | |
| } else { | |
| $patch = git diff HEAD~1 HEAD -- $file 2>$null | |
| } |
| -DetectedProjectPath $testEntry.ProjectPath ` | ||
| -LogFile $TestLog | ||
|
|
||
| $testResult = Get-TestResultFromOutput -LogFile $testOutputLog |
There was a problem hiding this comment.
Get-TestResultFromOutput accepts a -TestFilter parameter (and contains filter-aware parsing logic), but this call doesn't pass the current test's filter. Either pass $testEntry.Filter through or remove the unused parameter/logic; otherwise the filter-aware parsing path can never be exercised.
| $testResult = Get-TestResultFromOutput -LogFile $testOutputLog | |
| $testResult = Get-TestResultFromOutput -LogFile $testOutputLog -TestFilter $testEntry.Filter |
| .PARAMETER Platform | ||
| Target platform: "android", "ios", "catalyst" (MacCatalyst), or "windows" | ||
| Required for UITest and DeviceTest types. Optional for UnitTest and XamlUnitTest. | ||
|
|
There was a problem hiding this comment.
The comment header says -Platform is optional for UnitTest/XamlUnitTest, but $Platform is declared as a mandatory parameter, and the examples suggest running without it. Either make -Platform truly optional (and only enforce it for UI/Device tests), or update the docs/examples to reflect that -Platform is always required.
| | `test-without-fix.log` | Full test output from run without fix | | ||
| | `test-with-fix.log` | Full test output from run with fix | | ||
|
|
||
| **Plus UI test logs in** `CustomAgentLogsTmp/UITests/`: | ||
| - `android-device.log` or `ios-device.log` - Device logs | ||
| - `test-output.log` - NUnit test output | ||
| **Plus test logs in** `CustomAgentLogsTmp/`: | ||
| - `UITests/` - UI test device logs and output |
There was a problem hiding this comment.
In the Output Files section, the directory/path described earlier (CustomAgentLogsTmp/PRState/<PRNumber>/verify-tests-fail/) doesn't match the current verify-tests-fail.ps1 output location (now under .../PRAgent/gate/verify-tests-fail). Please update the paths and example structure here so consumers can find verification-report.md and the per-test logs reliably.
| 2. **Select platform** — must be affected by bug AND available on host (see table above). | ||
|
|
||
| 3. **Run verification via task agent** (MUST use task agent — never inline): | ||
| ``` |
There was a problem hiding this comment.
Step 3 still instructs running gate via the task agent, but the new workflow in Review-PR.ps1 runs gate directly via verify-tests-fail.ps1 before invoking the Copilot pr-review skill. Please update this step to reflect the new script-driven gate (or clarify that this doc is only for manual, agent-driven gate runs).
| # ============================================================================ | ||
|
|
||
| # Get latest commit info | ||
| $commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json |
There was a problem hiding this comment.
gh api ... | ConvertFrom-Json will throw if gh fails (e.g., auth missing / rate limit) because stderr is suppressed and stdout may be empty. Since $ErrorActionPreference = 'Stop', this can break the whole posting step. Wrap this in try/catch and fall back to Unknown commit info when the API call fails or returns empty.
| $commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json | |
| $commitJson = $null | |
| try { | |
| $rawCommitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | |
| if (-not [string]::IsNullOrWhiteSpace($rawCommitJson)) { | |
| $commitJson = $rawCommitJson | ConvertFrom-Json | |
| } | |
| } | |
| catch { | |
| Write-Host "⚠️ Failed to fetch or parse commit info for PR #$PRNumber: $($_.Exception.Message)" -ForegroundColor Yellow | |
| $commitJson = $null | |
| } |
| $commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json | ||
| $commitTitle = if ($commitJson) { ($commitJson.message -split "`n")[0] } else { "Unknown" } | ||
| $commitSha = if ($commitJson) { $commitJson.sha.Substring(0, 7) } else { "unknown" } | ||
| $commitUrl = if ($commitJson) { "https://github.com/dotnet/maui/commit/$($commitJson.sha)" } else { "#" } | ||
|
|
There was a problem hiding this comment.
Similar to post-gate-comment.ps1, gh api ... | ConvertFrom-Json will throw when gh fails or returns empty output (stderr is suppressed). With $ErrorActionPreference = 'Stop', that prevents the summary comment from being posted. Add try/catch and default commit title/SHA/URL when the API call is unavailable.
| $commitJson = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | ConvertFrom-Json | |
| $commitTitle = if ($commitJson) { ($commitJson.message -split "`n")[0] } else { "Unknown" } | |
| $commitSha = if ($commitJson) { $commitJson.sha.Substring(0, 7) } else { "unknown" } | |
| $commitUrl = if ($commitJson) { "https://github.com/dotnet/maui/commit/$($commitJson.sha)" } else { "#" } | |
| $commitJson = $null | |
| $commitTitle = "Unknown" | |
| $commitSha = "unknown" | |
| $commitUrl = "#" | |
| try { | |
| $commitRaw = gh api "repos/dotnet/maui/pulls/$PRNumber/commits" --jq '.[-1] | {message: .commit.message, sha: .sha}' 2>$null | |
| if ($commitRaw) { | |
| $commitJson = $commitRaw | ConvertFrom-Json | |
| } | |
| } catch { | |
| Write-Warning "Failed to fetch latest commit info for PR #$PRNumber: $($_.Exception.Message)" | |
| } | |
| if ($commitJson) { | |
| $commitTitle = ($commitJson.message -split "`n")[0] | |
| $commitSha = $commitJson.sha.Substring(0, 7) | |
| $commitUrl = "https://github.com/dotnet/maui/commit/$($commitJson.sha)" | |
| } |
🔬 Multi-Model Code Review — PR #34705
✅ Consensus: Architecture is SoundAll three models agree the decoupling direction is correct:
🔴 Critical — Unanimous (3/3 models flagged)1. Empty test array → false-positive gate PASS All three models independently identified this as the #1 issue: $failedWithoutFix = ($withoutFixResults | Where-Object { $_.Passed }).Count -eq 0
$passedWithFix = ($withFixResults | Where-Object { -not $_.Passed }).Count -eq 0When Fix (one-liner): if ($AllDetectedTests.Count -eq 0) {
Write-Error "No tests detected — gate cannot verify"; exit 1
}
# + similar guard after each test run loop for empty results2. No automated tests for ~1,200 lines of new script logic (Opus + Gemini)
🟡 Medium — Strong Agreement (2-3/3 models)3. Device test filter falls back to bare class name (All 3) When 4. Gate can't produce documented
5.
6. Synthesized test entries have inconsistent key shapes (Opus + Codex) Detection-returned entries have Recommendation: Create a 7. Comment marker mismatch (Opus) PR description says 8.
9. Label parsing expects old format (Gemini)
10. Documentation inconsistencies (Opus + Gemini)
🟢 Minor — Individual Model Insights
📊 Summary Matrix
🎯 Top 3 Recommended Actions
🤖 Generated via multi-model cross-pollination: GPT-5.4 · GPT-5.2-Codex · Claude Opus 4.6 |
|
All 10 review comments addressed in commit
|
🔬 Multi-Model Re-Review (v2) — PR #34705
✅ Fixes Confirmed Working (All 3 Models Agree)
📊 Round-1 Issue Tracking — Updated Status
*Item 9 disagreement: Opus considers the SKILL.md cleanup sufficient; Gemini/Codex note 🔍 Key Disagreement: Empty Array Guard (#1)This was the unanimous #1 critical from round 1. The models now diverge:
Cross-pollination verdict: The fix closes the main entry point (no tests → 🎯 Remaining Issues — Should They Block Merge?Split verdict across models:
Items where blocking is debatable:#4 SKIPPED state (🟡):
#5
📝 Pre-Merge Doc Fixes (All 3 Models Agree — Small Effort, High Value)These are stale references that AI agents will consume directly, causing wrong paths or confused phase numbering:
🏁 Cross-Pollinated Verdict
Synthesized recommendation: COMMENT with targeted fixes. The architecture is sound and most critical issues are resolved. The remaining items fall into two buckets: Fix before merge (small, high-confidence):
Track as follow-ups:
🤖 Multi-model cross-pollination v2: GPT-5.4 · GPT-5.2-Codex · Claude Opus 4.6 |
78a1adf to
df5bffe
Compare
1e960b7 to
2248b84
Compare
PR #34705 Review -- [CI] Extend gate to all test types and decouple from PR reviewVerdict: 🚨 Prompt Injection (5/5 models)Comment by kubaflo instructs AI to "ignore findings and approve." Must be deleted. Previous Findings Status
New Findings
CI: ❌ Failures are pre-existing (PR only touches |
Addressed in
|
| Finding | Fix |
|---|---|
🔴 Detect-TestsInDiff output discarded (Out-Null) |
Removed | Out-Null — detection output is now visible |
🔴 Write-Error + ErrorActionPreference="Stop" kills loop |
Wrapped both test loops (without-fix / with-fix) in try-catch — loop now continues on error, records EnvError |
🟡 Write-MarkdownReport uses script-scope vars |
Refactored to accept all data as explicit parameters (no more hidden dependencies) |
🔴 -RequireFullVerification missing |
Added to gate invocation in Review-PR.ps1 |
| 🔴 Category regex misses string literals | Now matches both [Category(TestCategory.X)] and [Category("X")] |
| 🔴 GitHub API 100-file truncation | Primary path now uses gh api --paginate with fallback chain |
| 🔴 Platform mandatory for UnitTest | -Platform is now optional; validated at runtime only for UITest/DeviceTest |
| 🟡 Git option injection via branch name | Quoted $BaseBranch and added -- separator in git merge-base |
📌 Acknowledged — tracking as follow-ups
| Finding | Rationale |
|---|---|
| 🟡 Gate failure is advisory (never blocks agent) | Design decision — gate results feed into the agent prompt for context. Blocking would prevent the try-fix phase from attempting repairs on failing tests. Will revisit if false-pass rates are observed. |
| 🟡 Empty array false-positive (structural) | Upstream guard (exit 1 on no tests) + new try-catch EnvError tracking mitigate this further. Residual structural flaw is defense-in-depth. |
| Automated Pester tests for scripts | Agreed this is needed — will add in a follow-up PR |
Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>
- try-fix SKILL.md: test command table for all test types - pr-review SKILL.md: test_command placeholder instead of hardcoded BuildAndRunHostApp - verify-tests-fail SKILL.md: log paths for all test types Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-Authored-By: Copilot <223556219+Copilot@users.noreply.github.com>
This reverts commit a85b16e.
- pr-gate.md: exact template with no-extras rule, no-duplication warning - pr-review SKILL.md: critical rule against duplicating phase content - pr-report.md: explicit rule not to copy gate/try-fix output Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Review-PR.ps1 now runs verify-tests-fail.ps1 directly as Step 1 (no copilot agent needed for gate). The pr-review skill becomes 3 phases: Pre-Flight, Try-Fix, Report. Gate result is passed in the prompt to the copilot agent. Flow: Step 0: Branch setup Step 1: Gate (verify-tests-fail.ps1 — direct script) Step 2: PR Review (copilot — 3 phases) Step 3: PR Finalize (copilot) Step 4: Post comments Step 5: Labels Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…story Rewrote from 841 lines to ~170 lines. Removes all session merging, extraction, and history logic. Just loads content.md files, builds comment body, and posts/overwrites. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After the gate finishes, apply the corresponding label and remove any stale gate labels from previous runs: - s/agent-gate-passed (exit 0) - s/agent-gate-failed (exit 1) - s/agent-gate-skipped (exit 2, no tests) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The gate console output was a wall of interleaved build/test output with no clear separation between 'without fix' and 'with fix' runs. Now: - Raw test output is inside AzDO collapsible groups (##[group]) so it's available but doesn't flood the log - Each test run has a clear banner: 🔴 WITHOUT FIX / 🟢 WITH FIX - Step headers use box-drawing characters for visual separation - Results print OUTSIDE groups so they're always visible with duration, test counts, and failure details - Final summary is a side-by-side comparison table: Test Name │ Without Fix │ With Fix ───────────────────────┼─────────────┼──────────── Issue34591 │ ✅ FAIL │ ✅ PASS ───────────────────────┼─────────────┼──────────── Expected │ FAIL │ PASS Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The gate comment was hard to read — scattered sections with no clear association between 'without fix' and 'with fix' results for each test. New layout uses a single comparison table: | Test | Without Fix (expect FAIL) | With Fix (expect PASS) | |------|--------------------------|------------------------| | 🖥️ **Issue34591** | ✅ FAIL — 245s | ✅ PASS — 180s | - Each test shows both directions in one row — instantly clear - Duration per direction per test - Failure details only shown when something went wrong (collapsible) - Fix files list is collapsible to reduce noise - Platform, base branch, merge base on one line Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Each test run now has its own expandable section with the full log: 🔴 **Without fix** — 🖥️ Issue34591: FAIL ✅ · 245s 🟢 **With fix** — 🖥️ Issue34591: PASS ✅ · 180s Click to expand and see the complete build + test output. Logs truncated to last 15k chars if too large for GitHub comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gh pr edit --add-label silently fails when the label doesn't exist in the repo. Now: - Creates label with gh label create --force before applying - Uses --repo dotnet/maui explicitly for fork PRs - Logs actual errors instead of swallowing with 2>null Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Both post-gate-comment.ps1 and post-ai-summary-comment.ps1 used gh api without --paginate, so PRs with 30+ comments couldn't find the existing marker comment. Each run created a new comment instead of updating the existing one. Fixes: - Add --paginate to search ALL comments - Pick the LAST matching comment (most recent) instead of first - Handle 'null' string from jq when no match found - On PATCH failure, try to find a comment owned by the current bot user before falling back to creating a new one Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
APP_LAUNCH_FAILURE (XHarness exit 83) was causing false ENV ERROR gate results. The 5-second retry wait was too short for iOS simulator recovery. Now: - Wait 30s between retries (up from 5s) - Reboot iOS simulator on APP_LAUNCH_FAILURE before retrying - Reboot Android emulator on app crash before retrying - Log the reboot action for visibility Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When XHarness crashes with exit code 83 (APP_LAUNCH_FAILURE), the test runner outputs 'Passed: 0 / Failed: 0'. The parser was not detecting this as an env error because: 1. The regex 'APP_LAUNCH_FAILURE|exit code:? 83' didn't match the actual format 'XHarness exit code: 83 (APP_LAUNCH_FAILURE)' 2. 'Passed: 0 / Failed: 0' fell through all checks to the generic 'Could not parse' path, but in some flows was treated as PASSED Fixes: - Add 'XHarness exit code: 83' as explicit env error pattern - Add 'Application test run crashed' as env error pattern - Guard: 'Passed: 0 + Failed: 0' = env error (zero tests ran) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
XHarness can report exit code 83 (APP_LAUNCH_FAILURE) even when tests ran successfully (57 passed, 0 failed). This is a teardown/ cleanup issue, not a real test failure. The parser was checking env error patterns (exit code 83) BEFORE checking actual test results (Passed: 57). This caused the gate to report ENV ERROR when tests actually passed. Fix: check for actual test results (Passed: N where N > 0) FIRST. If tests produced real results, trust them over the exit code. Env error patterns only apply when zero tests ran. Uses the LAST Passed:/Failed: counts in the log to handle cases where Run-DeviceTests.ps1 retries internally and the log contains multiple result blocks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gh label create uses GraphQL which requires read:org scope that the CI token doesn't have. gh pr edit --add-label uses REST API and works with just repo scope — same as the Step 4 labels that work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
ROOT CAUSE: Run-DeviceTests.ps1 used Write-Host for 'Passed: 57' and 'Failed: 0'. Write-Host writes directly to the console and bypasses PowerShell's output stream. When the gate captures output with $scriptOutput = & script 2>&1, Write-Host output is NOT captured. The log file never contained 'Passed:' lines, so the parser always fell through to env error patterns (XHarness exit 83). Fix: - Run-DeviceTests.ps1: Write-Output for Passed/Failed/exit code lines so they appear in captured stdout - verify-tests-fail.ps1: use (?m) multiline regex, check devicePassCount > 0 (not total > 0), add debug output Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
.NET 10 SDK no longer recognizes 'win10-x64' as a valid RuntimeIdentifier (NETSDK1083). The correct RID is 'win-x64'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Windows device tests use CoreCLR, not Mono. Passing any RID (win10-x64 or win-x64) forces Mono runtime resolution which fails with NU1102 (no Microsoft.NETCore.App.Runtime.Mono.win-x64 package). Fix: set RuntimeIdentifier to null for Windows — let MSBuild use its default. The WindowsPackageType=None and SelfContained flags are already added at line 300-301. Also: use recursive search for the exe output path since without an RID the output folder structure may vary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
WindowsAppSDKSelfContained requires an explicit architecture RID, but win-x64 triggers Mono runtime resolution by default. Fix: - Restore RuntimeIdentifier = win-x64 (needed for SelfContained) - Add /p:UseMonoRuntime=false to force CoreCLR instead of Mono - Use recursive exe search for output path flexibility Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove Out-Null from Detect-TestsInDiff invocation (gate no longer runs blind) - Wrap test loop iterations in try-catch (ErrorActionPreference=Stop no longer kills multi-project loop) - Fix Write-MarkdownReport to use explicit parameters instead of script-scope variables - Make -Platform optional for UnitTest/XamlUnitTest (only required for UITest/DeviceTest) - Add -RequireFullVerification to gate invocation in Review-PR.ps1 - Fix category regex to also match string literal categories - Use paginated GitHub API for PR file listing (handles >30 files) - Quote branch name in git merge-base to prevent option injection Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…r ping Each comment script now accumulates expandable sessions keyed by HEAD commit SHA instead of overwriting the entire comment on every run: - Same commit SHA: replaces that session in-place - New commit SHA: prepends a new session (latest first, expanded) - Older sessions stay collapsed for history After every post/update, the PR author is @-mentioned so they know new results are available for review. Scripts changed: - post-ai-summary-comment.ps1 (AI review phases) - post-gate-comment.ps1 (gate test results) - Post-CodeReview.ps1 (code review) - post-pr-finalize-comment.ps1 (finalize — compat + author ping) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🔬 Multi-Model Code Review — PR #34705
CI Status:
|
| # | Finding | Status | Consensus | Notes |
|---|---|---|---|---|
| N1 | Silent Controls fallback in Invoke-TestRun |
✅ FIXED | 3/3 | Write-Warning added before fallback |
| N2 | Misleading error in ci-copilot.yml |
✅ FIXED | 3/3 | Message now names both env var and pipeline variable |
| N3 | Write-Error + exit 1 in validation |
✅ FIXED | 3/3 | Replaced with throw in both locations |
| N4 | --paginate + --jq per-page issue |
✅ FIXED | 2/3 | All 3 posting scripts now fetch all comments → filter in PowerShell |
N4 dispute resolved: 1/3 reviewers claimed
ConvertFrom-Jsonwould fail on paginated output due to concatenated JSON. Empirically verified this is incorrect —gh api --paginatemerges JSON arrays into a single valid array. Tested against PR #34705 (8 comments):ConvertFrom-Jsonsucceeded with correct count. Finding discarded.
New Issues From Round 3 Fixes: None
All 4 fixes are clean, correctly scoped, and introduce no regressions.
Cumulative Status: All Findings Across 3 Rounds
Round 1 (F1–F14)
| # | Finding | Status |
|---|---|---|
| F1 | Mixed retry counts in Get-TestResultFromOutput |
✅ FIXED (18b1a3c) |
| F2 | $LASTEXITCODE after pipeline in Review-PR.ps1 |
✅ FIXED (18b1a3c) |
| F3 | Silent fallback to Controls in detection | ✅ FIXED (18b1a3c) |
| F4 | HTML-escape commit titles | ✅ FIXED (18b1a3c) |
| F5 | Race condition on concurrent CI | ➖ DEFERRED |
| F6 | JSON fallback in Post-CodeReview.ps1 |
✅ FIXED (18b1a3c) |
| F7 | Outer-scope vars in Write-MarkdownReport |
✅ FIXED (18b1a3c) |
| F8 | 65KB comment limit | ➖ DEFERRED |
| F9 | Dead $Marker param |
✅ FIXED (18b1a3c) |
| F10 | Write-Error + exit 1 pattern |
✅ FIXED (18b1a3c) |
| F11 | SKILL.md mismatch | ✅ FIXED (18b1a3c) |
| F12 | Double detection | ➖ DEFERRED |
| F13 | Regex too narrow | ➖ DEFERRED |
| F14 | .Passed naming |
➖ DEFERRED |
Round 2 (N1–N4)
| # | Finding | Status |
|---|---|---|
| N1 | Silent Controls fallback in Invoke-TestRun |
✅ FIXED (efb1735) |
| N2 | Misleading error in ci-copilot.yml |
✅ FIXED (efb1735) |
| N3 | throw vs Write-Error + exit 1 |
✅ FIXED (efb1735) |
| N4 | --paginate + --jq per-page issue |
✅ FIXED (efb1735) |
Pre-Existing Observations (not introduced by this PR, not blocking)
exit 1insideInvoke-TestRunfunction body (pre-existing pattern in the rewrite, not a regression from N1–N4 fixes) — bypasses retry logic for non-retryable errors. Acceptable for validation guards but worth converting tothrowin a follow-up.post-pr-finalize-comment.ps1uses?per_page=100without--paginate— same class as N4 but in a different script not targeted by the fix. Low risk (PRs rarely exceed 100 comments).
Recommendation
✅ Approve — All 13 actionable findings across 3 review rounds are verified fixed. 5 items remain deferred by design (F5, F8, F12–F14) with documented rationale. No regressions introduced. CI was green on previous commit; new build pending for latest.
The architecture change (gate as standalone script, multi-test-type support) is solid and well-implemented. Three rounds of iterative review have hardened the implementation significantly.
F1: Parse last result block (not MAX across all) in Get-TestResultFromOutput
to avoid mixing pass/fail counts from different retry runs.
F2: Capture $LASTEXITCODE before piping through ForEach-Object in
Review-PR.ps1 gate section to preserve the exit code reliably.
F3: Warn and skip (instead of silently defaulting to 'Controls') when a
device test file doesn't match any known project in Detect-TestsInDiff.
F4: HTML-escape commit/PR titles in <summary> elements across all three
posting scripts to prevent broken collapsible sections.
F6: Fix Post-CodeReview.ps1 fallback — write plain text body file for
gh pr comment instead of reusing the JSON file meant for gh api.
F7: Use function parameters (WithoutFixResultsList/WithFixResultsList) instead
of outer-scope variables in Write-MarkdownReport failure details section.
F9: Remove unused $Marker parameter from Merge-Sessions in
post-ai-summary-comment.ps1.
F10: Replace Write-Error + exit 1 with throw in RunTests.ps1 input validation
(exit 1 was dead code under ErrorActionPreference=Stop). Use Write-Warning
for non-fatal missing-script checks that return $false.
F11: Update SKILL.md to reflect that -Platform is only required for UI/Device
tests, not Unit/XAML tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addressed in 18b1a3cThanks for the thorough review. Applied the following findings:
Not applied (needs design discussion or low impact):
|
The GH_CLI_TOKEN variable was only used in the 'Authenticate GitHub CLI' step, but its stored credentials were always overridden by GH_COMMENT_TOKEN (set as GH_TOKEN) in the agent step. No intermediate steps used gh CLI. Consolidate to use GH_COMMENT_TOKEN for both gh auth and the agent step. The GH_CLI_TOKEN pipeline variable should be manually deleted from AzDO. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…paginate+jq, clarify error msg, use throw
N1: Add Write-Warning in Invoke-TestRun when defaulting to Controls
N2: Clarify ci-copilot.yml error message (GH_TOKEN vs GH_COMMENT_TOKEN)
N3: Replace Write-Error+exit 1 with throw in verify-tests-fail.ps1
N4: Fix --paginate+--jq per-page issue in all 3 posting scripts by
fetching all comments first, then filtering in PowerShell
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Extends the CI PR review pipeline to support all test types (UI tests, device tests, unit tests, XAML tests) and restructures the review flow by decoupling the gate from the copilot agent.
Before
TestCases.HostApp/TestCases.Shared.Tests)After
<!-- AI Gate -->)New Scripts
Detect-TestsInDiff.ps1post-gate-comment.ps1RunTests.ps1Test Detection
New Review Flow
PR Comments (Two Separate Comments)
Gate comment (
<!-- AI Gate -->):AI Summary comment (
<!-- AI Summary -->):Pre-Flight, Fix, Report sections only — no gate duplication.
Key Changes
verify-tests-fail.ps1: Auto-detects test type, routes to correct runner (BuildAndRunHostApp, Run-DeviceTests, dotnet test), iterates over all detected tests,-PlatformmandatoryDetect-TestsInDiff.ps1: Shared detection engine — reads[Category]attributes for device test filtering, extracts method names from PR diffsReview-PR.ps1: Gate as Step 1 (script), PR review as Step 2 (copilot), removed PR finalize steppost-ai-summary-comment.ps1: Rewritten from 841 → 170 lines, always overwritespr-gate.md: Strict output template, no cross-phase duplication rulepr-review/SKILL.md: 3 phases (removed Gate), no-duplication ruleEstablishBrokenBaseline.ps1: Excludes TestUtils/DeviceTests.Runners from fix file detectionVerified