From f8ce7c3a50c2596d423fd3bafa3da30eca531a02 Mon Sep 17 00:00:00 2001
From: Shane
Date: Fri, 30 Jan 2026 16:36:46 -0600
Subject: [PATCH 1/2] Improve PR agent Gate verification workflow

- Require Gate verification to run via task agent (prevents command substitution)
- Reference verify-tests-fail-without-fix skill by name instead of inline commands
- Add 'Common Gate Mistakes' section with explicit anti-patterns
- Fix ai-summary-comment regex to handle <details> tags with attributes

These changes prevent fabricating dual-direction test results from single runs.
---
 .github/agents/pr.md                          | 39 ++++++++++++++++++-
 .../scripts/post-ai-summary-comment.ps1       |  3 +-
 2 files changed, 39 insertions(+), 3 deletions(-)

diff --git a/.github/agents/pr.md b/.github/agents/pr.md
index 336fcc29310c..b7e60dd46396 100644
--- a/.github/agents/pr.md
+++ b/.github/agents/pr.md
@@ -441,10 +441,31 @@ Tests were already verified to FAIL in Phase 2. Gate is a confirmation step:
 
 **If starting from a PR (fix exists):** Use full verification mode - tests should FAIL without fix, PASS with fix.
 
-```bash
-pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 -Platform android -RequireFullVerification
+**🚨 MUST invoke as a task agent** to prevent command substitution:
+
+```markdown
+Invoke the `task` agent with agent_type: "task" and this prompt:
+
+"Invoke the verify-tests-fail-without-fix skill for this PR:
+- Platform: android (or ios)
+- TestFilter: 'IssueXXXXX'
+- RequireFullVerification: true
+
+Wait for FULL completion (5-10+ minutes). The skill does TWO test runs:
+1. Reverts fix → runs tests (should FAIL)
+2. Restores fix → runs tests (should PASS)
+
+Report back:
+1. Did tests FAIL without fix? (Yes/No)
+2. Did tests PASS with fix? (Yes/No)
+3. Final status: VERIFICATION PASSED or VERIFICATION FAILED
+4. Any errors or issues encountered"
 ```
 
+**Why task agent?** Running inline allows substituting commands and fabricating results. Task agent runs in isolation and reports exactly what happened.
+
+**⚠️ Do NOT run Gate verification inline - always use task agent with the skill.**
+
 ### Expected Output (PR with fix)
 
 ```
@@ -493,3 +514,17 @@ pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1
 - ❌ **Running tests during Pre-Flight** - That's Phase 3
 - ❌ **Not creating state file first** - ALWAYS create state file before gathering context
 - ❌ **Skipping to Phase 4** - Gate MUST pass first
+
+## Common Gate Mistakes
+
+- ❌ **Running Gate verification inline** - Use task agent to prevent command substitution
+- ❌ **Using `BuildAndRunHostApp.ps1` for Gate** - That only runs ONE direction; the skill does TWO runs
+- ❌ **Using manual `dotnet test` commands** - Doesn't revert/restore fix files automatically
+- ❌ **Claiming "fails both ways" from a single test run** - That's fabrication; you need the script's TWO runs
+- ❌ **Not waiting for task agent completion** - Script takes 5-10+ minutes; wait for task to return
+
+**🚨 The verify-tests-fail.ps1 script does TWO test runs automatically:**
+1. Reverts fix → runs tests (should FAIL)
+2. Restores fix → runs tests (should PASS)
+
+Never run Gate inline. Always invoke as task agent.
diff --git a/.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1 b/.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1
index d0748a5a245f..83badb5a3319 100644
--- a/.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1
+++ b/.github/skills/ai-summary-comment/scripts/post-ai-summary-comment.ps1
@@ -289,7 +289,8 @@ function Extract-AllSections {
     $sections = @{}
 
     # Pattern to find all <details><summary>TITLE</summary>...content...</details> blocks
-    $pattern = '(?s)<details>\s*<summary>([^<]+)</summary>(.*?)</details>'
+    # Note: [^>]* handles optional attributes like "open" in <details open>
+    $pattern = '(?s)<details[^>]*>\s*<summary>([^<]+)</summary>(.*?)</details>'
     $matches = [regex]::Matches($StateContent, $pattern)
 
     if ($Debug) {
From 6112c1f4d0358c40506e26cda940cb341cf9304a Mon Sep 17 00:00:00 2001
From: Shane
Date: Fri, 30 Jan 2026 16:45:43 -0600
Subject: [PATCH 2/2] Trim Gate documentation to reference skill instead of duplicating

---
 .github/agents/pr.md | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/.github/agents/pr.md b/.github/agents/pr.md
index b7e60dd46396..53e323d82c30 100644
--- a/.github/agents/pr.md
+++ b/.github/agents/pr.md
@@ -451,20 +451,12 @@ Invoke the `task` agent with agent_type: "task" and this prompt:
 - TestFilter: 'IssueXXXXX'
 - RequireFullVerification: true
 
-Wait for FULL completion (5-10+ minutes). The skill does TWO test runs:
-1. Reverts fix → runs tests (should FAIL)
-2. Restores fix → runs tests (should PASS)
-
-Report back:
-1. Did tests FAIL without fix? (Yes/No)
-2. Did tests PASS with fix? (Yes/No)
-3. Final status: VERIFICATION PASSED or VERIFICATION FAILED
-4. Any errors or issues encountered"
+Report back: Did tests FAIL without fix? Did tests PASS with fix? Final status?"
 ```
 
 **Why task agent?** Running inline allows substituting commands and fabricating results. Task agent runs in isolation and reports exactly what happened.
 
-**⚠️ Do NOT run Gate verification inline - always use task agent with the skill.**
+See `.github/skills/verify-tests-fail-without-fix/SKILL.md` for full skill documentation.
 
 ### Expected Output (PR with fix)
 
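
Review note on the regex change in `post-ai-summary-comment.ps1`: the sketch below compares the two patterns as this patch appears to define them (literal `<details>` before, `<details[^>]*>` after). It uses Python's `re` module as a stand-in for the .NET `[regex]` class the script actually calls, on the assumption that both engines treat this particular pattern the same way. The sample text and the `section_titles` helper are hypothetical and are not taken from the repository.

```python
import re

# Patterns as the patch appears to define them (old line vs. new line).
OLD_PATTERN = r"(?s)<details>\s*<summary>([^<]+)</summary>(.*?)</details>"
NEW_PATTERN = r"(?s)<details[^>]*>\s*<summary>([^<]+)</summary>(.*?)</details>"

# Hypothetical state-file content, invented for this illustration:
# one plain <details> block and one carrying an "open" attribute.
SAMPLE = (
    "<details>\n<summary>Pre-Flight</summary>\nplain block\n</details>\n"
    "<details open>\n<summary>Gate</summary>\nblock with attribute\n</details>\n"
)


def section_titles(pattern: str, text: str) -> list[str]:
    """Return the <summary> title of every block the pattern matches."""
    return [m.group(1).strip() for m in re.finditer(pattern, text)]


old_titles = section_titles(OLD_PATTERN, SAMPLE)
new_titles = section_titles(NEW_PATTERN, SAMPLE)

print("old pattern finds:", old_titles)
print("new pattern finds:", new_titles)
```

With this sample, the old pattern skips the `<details open>` block, which seems to be the gap the patch closes. One possible trade-off: `[^>]*` also accepts any tag name that merely starts with `details`, which is probably harmless for generated state files but may be worth keeping in mind.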