Style skill validation results comment by kubaflo · Pull Request #35713 · dotnet/maui

kubaflo · 2026-06-02T18:28:18Z

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Description of Change

Updates the Skill Validation Results PR comment to match the visual style used by AI Summary and Test Failure Review comments:

stable marker plus ## Skill Validation Results title
PR author/commit context
status badges for overall, static checks, LLM evaluation, skills, and agents
a single expandable session with commit metadata
existing static-check and LLM-evaluation details preserved inside the session

Issues Fixed

No issue filed.

Validation

Extracted and syntax-checked the actions/github-script post-comment JavaScript with node --check
git diff --check

Format the Skill Validation Results PR comment with the same status badge and session layout used by AI Summary and Test Failure Review comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-02T18:28:33Z

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35713

Or

Run remotely in PowerShell:

iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35713"

Render the Skill Validation Results session collapsed by default to match the requested comment behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Normalize Skill Validation Results comments so no details blocks render expanded by default. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kubaflo · 2026-06-07T21:44:02Z

/review -b feature/enhanced-reviewer -p android

MauiBot

Expert Review — 1 findings

See inline comments for details.

MauiBot

AI Review Summary

@kubaflo — new AI review results are available based on this last commit: 5e02847.
Collapse all skill validation details by default To request a fresh review after new comments or commits, comment /review rerun.

Review Sessions — click to expand

Gate — Test Before & After Fix

Gate Result: ⚠️ SKIPPED

No tests were detected in this PR.

Recommendation: Add tests to verify the fix using the write-tests-agent.

Pre-Flight — Context & Validation

Issue: N/A - No linked issue filed
PR: #35713 - Style skill validation results comment
Platforms Affected: CI workflow only (review requested for android, but diff has no Android/runtime code)
Files Changed: 1 implementation, 0 test

Key Findings

PR modifies .github/workflows/skill-validation.yml to restyle the skill-validation results comment with author/commit context, badges, and a collapsible session.
Gate was already skipped because no tests were detected in this PR; no gate/content.md was created or modified here.
Code review found a status aggregation bug: cancelled/unknown LLM evaluation can still produce an overall green Passed badge when static checks pass.

Code Review Summary

Verdict: NEEDS_CHANGES
Confidence: high
Errors: 1 | Warnings: 1 | Suggestions: 1

Key code review findings:

❌ .github/workflows/skill-validation.yml:1038 — overallStatus can show Passed when eval status is cancelled or Unknown.
⚠️ Failure investigation prompt is computed but still absent from the PR comment body.
💡 Badge-only summary depends on shields.io availability; a plain-text fallback would improve resilience.

Fix Candidates

#	Source	Approach	Test Result	Files Changed	Notes
PR	PR #35713	Restyle skill validation PR comment with PR metadata, shields.io badges, and a collapsible per-commit session while preserving existing result details	⚠️ SKIPPED (Gate: no tests detected)	`.github/workflows/skill-validation.yml`	Original PR

Code Review — Deep Analysis

Code Review — PR #35713 · `.github/workflows/skill-validation.yml`

GitHub CLI authentication was unavailable; analysis used public GitHub API metadata and the local squashed PR diff (HEAD^..HEAD).

Independent Assessment

What this changes: The workflow's skill-validation result comment is restyled from a flat Markdown body into a richer status report with PR author/commit context, shields.io badges, a single expandable session block, and per-commit session markers. It adds a github.rest.pulls.get() call to retrieve PR title, author, and head SHA.

Inferred motivation: Make skill validation comments match the visual style used by other AI-generated PR comments while preserving existing static-check and LLM-evaluation details inside the expanded session.

Reconciliation with PR Narrative

Author claims: PR #35713 says it styles the Skill Validation Results comment with a stable marker, author/commit context, status badges, a single expandable session with commit metadata, and preserved existing details. It reports validation via extracted actions/github-script syntax check and git diff --check.

Agreement/disagreement: The diff matches the narrative. One status aggregation edge case appears incorrect: overallStatus can report Passed when the evaluation job is cancelled or unknown as long as static checks passed.

Findings

❌ Error — `overallStatus` can show Passed when eval is cancelled or unknown

.github/workflows/skill-validation.yml:1038 computes overallStatus as Passed whenever staticStatus === 'Passed' and neither status is Failed. That means staticStatus='Passed' with evalStatus='cancelled' or evalStatus='Unknown' produces a green overall badge even though LLM evaluation did not complete. Expected behavior is Passed only when static passed and eval either passed or was intentionally skipped.

⚠️ Warning — Failure investigation prompt remains outside the PR comment

The workflow computes investigatePrompt for failed evals, but the new PR comment body still does not include it inside the result session. This is pre-existing behavior and not required by the styling goal, but it leaves the author-facing comment less actionable when LLM evaluation fails.

💡 Suggestion — Badge-only summary depends on shields.io availability

The badge row uses https://img.shields.io/badge/.... If shields.io is unavailable, GitHub will show broken images rather than readable badge text. A plain-text fallback line would improve resilience, though this mirrors the referenced styling pattern.

Devil's Advocate

The overallStatus behavior might be intentional if cancelled evals are considered non-blocking, but the label is "Overall" and a cancelled/unknown phase is materially different from a passed phase. The prompt and badge fallback concerns are lower severity because they are either pre-existing or consistent with the style being copied.

Verdict: NEEDS_CHANGES

Confidence: high
Summary: The style refactor is structurally sound and safe, but the derived overall badge can be misleading for cancelled or unknown eval outcomes. A small status aggregation fix should be applied before this change is considered complete.

Blast Radius

Workflow-only change. Affects skill-validation PR comments for all platforms; no MAUI runtime or Android app code is touched.

Failure-Mode Probes

Scenario	Expected overallStatus
static Passed, eval Passed	Passed
static Passed, eval Skipped	Passed
static Passed, eval Failed	Failed
static Passed, eval cancelled	Needs attention
static Passed, eval Unknown	Needs attention
static Failed, eval Passed	Failed
static Failed, eval Skipped	Failed
static Unknown, eval Passed	Needs attention
static Skipped, eval Skipped	Needs attention

Fix — Analysis & Comparison

Fix Candidates

#	Source	Approach	Test Result	Files Changed	Notes
1	code-review + maui-expert-reviewer	Tighten the inline `overallStatus` success condition to require static Passed and eval Passed/Skipped	✅ PASS	1 file	Smallest targeted fix; prevents false green Overall badge for cancelled/unknown eval
2	code-review + maui-expert-reviewer	Extract `getOverallStatus(staticStatus, evalStatus)` helper with explicit guard clauses	✅ PASS	1 file	More readable policy encoding, but larger than candidate 1
PR	PR #35713	Restyle skill validation comment with PR metadata, badges, and collapsible session	⚠️ SKIPPED (Gate: no tests detected)	1 file	Original PR; expert review found false green Overall badge edge case

Cross-Pollination

Model	Round	New Ideas?	Details
maui-expert-reviewer	1	Yes	Identified status aggregation flaw and suggested tightening pass eligibility for eval statuses
local test loop	1	Yes	Candidate 1 validated minimal inline condition against syntax, status matrix, and diff whitespace checks
local test loop	2	Yes	Candidate 2 validated helper-based policy encoding against the same checks
orchestrator	2	No	Further candidates would be trivial variations of the same boolean policy; no meaningfully different approach remains for this workflow-only bug

Exhausted: Yes — two meaningful implementation strategies passed; additional variants would only rename variables or rearrange equivalent conditions.
Selected Fix: Candidate #1 — It passes all targeted checks, fixes the expert-reviewed cancelled/unknown eval failure mode, and is simpler than Candidate #2 while changing fewer lines.

Report — Final Recommendation

Comparative Fix Report - PR #35713

Gate: Skipped; no tests were detected in this PR. Regression ranking below uses the recorded STEP 5a targeted checks and the expert-review sandbox validation. Candidates that failed regression tests would rank below passing candidates; none of the STEP 5a candidates failed.

Candidates Compared

Rank	Candidate	Result	Assessment
1	`pr-plus-reviewer`	Passed targeted status matrix	Best choice. It keeps the submitted PR's styling implementation and applies the expert reviewer's minimal correctness fix so non-completed LLM states no longer show a green overall badge.
2	`try-fix-1`	Passed targeted status matrix	Equivalent code change to `pr-plus-reviewer` and the smallest standalone alternative. Ranked behind `pr-plus-reviewer` only because `pr-plus-reviewer` preserves the PR fix plus applied review feedback as the integrated candidate.
3	`try-fix-2`	Passed targeted status matrix	Correct and clearer if the status policy grows, but it adds a helper for a single call site and is larger than necessary for this workflow-only bug.
4	`pr`	Gate skipped; expert review found a correctness error	The raw PR improves comment presentation but can falsely report `Overall: Passed` when static checks pass and LLM evaluation is `cancelled`, `Unknown`, or another non-success/non-skipped state.

Winner

Winning candidate: pr-plus-reviewer

pr-plus-reviewer is the single best candidate because it retains the PR's intended UI/styling behavior and incorporates the expert review's minimal status aggregation fix. It is functionally equivalent to the strongest try-fix candidate while avoiding the raw PR's false-green overall status.

Notes

The absent failure-investigation prompt in the PR comment body and the shields.io badge fallback concern were reviewed but not treated as blocking. They are either pre-existing behavior or resilience improvements rather than correctness failures in this PR's submitted fix.

Future Action — review latest findings

No alternative fix was selected for this run. Review the session findings and CI results before merging.

kubaflo · 2026-06-10T22:00:03Z

/review rerun

PureWeen

Adversarial review summary

3 independent reviewers analyzed this PR in parallel with adversarial consensus (disputed 1/3 findings were re-evaluated by the other two reviewers; only validated findings are included).

Findings

Severity	Finding	Consensus
❌ Logic	Overall badge shows `Passed` when LLM eval is `cancelled`/`Unknown` or any non-final value (line 1038)	3/3 reviewers
❌ Logic	Comment attributes results to the wrong commit when a push lands during a `/evaluate-skills` run (line 693)	3/3 after dispute
⚠️ Error Handling	Unguarded `github.rest.pulls.get()` (line 688) can sink the entire results comment on a transient API blip	3/3 after dispute

See inline comments for repro scenarios and concrete fixes.

Discarded findings

Two additional findings were investigated and discarded as non-issues after dispute:

HTML attribute injection in badge alt — skillCount/agentCount are wc -l integer output from a trusted setup step, not raw PR-controlled file content; all other badge messages come from a constrained string vocabulary; GitHub's comment HTML sanitizer also blocks the worst cases.
<details open> stripper at line 1076 collapses sub-report formatting — verified intentional per the commit history (5e028470 "Collapse all skill validation details by default"); no current generated section uses <details open>.

Verified safe (do not need changes)

Marker upsert (body.includes(marker)) — relocation of marker from lines[0] to the first element of body is detected correctly; existing comments are updated in place, no duplicates.
prTitle XSS — &/</> are escaped and the value sits in element text (<strong>...</strong>), not an attribute.
headSha7 in HTML comment markers — derived from hex-only pr.data.head.sha, can't contain -->.
lines = [] downstream indexing — no consumer indexes lines[0]; marker is correctly placed at top of body.
Optional chaining on pr.data.user?.login — falls through to the no-mention author line variant.

Prior review context

MauiBot's expert review (2026-06-07) independently flagged the overallStatus aggregation bug as [major]. That finding is still unresolved — this review confirms it with 3-reviewer consensus and proposes the same minimal fix.

Test coverage

The PR's node --check validation only catches JavaScript syntax errors. None of the findings above are syntax errors — they're semantic/logic issues that require executing the comment-rendering path against the matrix of evalResult/staticResult combinations. Consider adding a small JS unit test (or a workflow_dispatch dry-run) that exercises the status-derivation logic to prevent regressions.

Methodology: 3 independent reviewers from different model families; 1/3 findings re-evaluated by the other two with no anchoring on the originator's wording. Verdicts here reflect consensus, not single-reviewer opinion.

…tricter Overall status - Use the gated head_sha (needs.pr-gate/slash-gate) for the results comment instead of the live PR head, so a push during a slow /evaluate-skills run can't misattribute results. - Wrap the decorative pulls.get() in try/catch with fallbacks so a transient API failure (rate-limit/5xx) doesn't sink the whole badges/details comment. - Overall = Passed only when LLM eval is explicitly Passed/Skipped; cancelled/Unknown now reads 'Needs attention', consistent with report-status. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kubaflo · 2026-06-15T19:50:39Z

@PureWeen addressed all three findings in badeb78d1c (gated SHA for the results comment, guarded the decorative pulls.get, and tightened the Overall status so cancelled/Unknown eval no longer shows green). Ready for re-review — thanks!

PureWeen

Round 2 Adversarial Review — Multi-Model Code Review

Re: previous review #4498374195. Re-reviewed badeb78d1c with 3 independent reviewers from different model families (adversarial consensus).

✅ Round 1 fixes verified clean (3/3 reviewers)

All three findings from the previous round were addressed at root cause; no overcorrection or new regressions introduced by the fixes themselves:

Finding	Round-1 line	Fix in `badeb78d1c`	Verdict
`overallStatus = Passed` on cancelled/Unknown LLM eval	L1038	L1049 tightened to `staticStatus === 'Passed' && (evalStatus === 'Passed' \|\| evalStatus === 'Skipped')`	✅ 3/3
Comment attributes results to wrong commit during `/evaluate-skills` race	L693	New `HEAD_SHA` env at L680 sourced from `pr-gate`/`slash-gate` outputs; `process.env.HEAD_SHA` used for `headSha`/commit-link	✅ 3/3
Unguarded `pulls.get()` can sink entire comment	L688	Wrapped in `try/catch` with `prData = {}` fallback (L690–700); decorative metadata degrades gracefully	✅ 3/3

⚠️ One new finding (2/3 after dispute round)

One reviewer flagged a logic path that was not present in Round 1. After dispute, both other reviewers confirmed the trace is factually correct but argued for ⚠️ rather than ❌ because the underlying mis-categorization is mirrored in two pre-existing reporters (report-status shell at L1170+ and check-run github-script at L1219+) that this PR does not touch. See the inline comment at L1034 for the trace and a minimal in-scope guard suggestion.

Methodology

3 reviewers (different model families) → adversarial consensus
1/3 findings sent to a dispute round with the other 2 reviewers
Inline comments only on paths/lines validated against the actual PR diff
event: COMMENT — approval is a human decision

When the discover-eval job fails (transient runner / gh api error) for a contributor PR with static passing, evalStatus fell through to 'Skipped' and Overall showed green ✅. Guard on discoverResult !== 'success' → 'Needs attention', distinguishing a discovery infra failure from the legitimate 'no eval entries' case (report-status/check-run unchanged). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

kubaflo · 2026-06-16T10:47:20Z

@PureWeen addressed the round-2 finding in a5936c1c7c — a failed discover-eval job no longer renders as a green Overall (now Needs attention). Ready for re-review, thanks!

PureWeen

Round 3 Adversarial Review — Multi-Model Code Review

Re-reviewed a5936c1c7c with 3 independent reviewers from different model families (adversarial consensus). This round verifies the fix for the Round 2 finding (review #4501101962).

✅ Round 2 finding resolved (3/3 consensus)

The new guard in a5936c1c7c is verified correct by all three reviewers, with no regression:

} else if (discoverResult !== 'success') {
  evalStatus = 'Needs attention';   // discovery job failed → don't mask as Skipped/Passed
}

Branch ordering is the key detail, and it's right: the new branch sits after !evalRan && !isContributor, so:

Scenario	`evalStatus`	`Overall`
External (non-contributor) PR, discover-eval skipped	`Skipped`	✅ Passed (unchanged — intended)
Contributor PR, discover-eval failed/cancelled	Needs attention	⚠️ Needs attention (fixed — was ✅)
Contributor PR, discover-eval success, no entries	`Skipped`	✅ Passed (unchanged)
Contributor PR, eval ran with results	Passed/Failed	per results (unchanged)

The guard fires only when discovery didn't succeed and the actor is a contributor/slash-triggered — exactly the targeted case. External PRs are still caught first and map to Skipped, so there are no false "Needs attention" alarms.

✅ Round 1 fixes still hold (3/3)

All three earlier fixes re-verified correct and complete — gated HEAD_SHA used first (no wrong-commit attribution), pulls.get() guarded with graceful metadata fallback (no remaining unguarded deref), and overallStatus only green when staticStatus === 'Passed' && (evalStatus === 'Passed' || evalStatus === 'Skipped'). Security spot-checks pass: PR title / badge alt are HTML-escaped, badge label/message go through encodeURIComponent, and the SHA only flows into hex-only contexts.

💡 Optional follow-up (non-blocking, 1/3 reviewers — out of scope for this PR)

For the same contributor + discover-failure case, the new badge now correctly reads LLM: Needs attention, but the pre-existing detailed body section (and the sibling report-status / check-run reporters) still fall through to "⏭️ LLM Evaluation: Skipped". Those are pre-existing and outside this styling PR's diff — a sensible follow-up issue would add the same discoverResult !== 'success' guard to all reporters so the badge and body agree. Not a blocker for this PR.

Verdict: No blocking issues. All four findings from the prior two rounds are resolved with 3/3 consensus and no overcorrection. The only remaining item is the optional consistency follow-up above.

Methodology: 3 independent reviewers, adversarial consensus. Changed path is workflow YAML with embedded github-script; no automated test coverage exists for this comment-rendering logic (validated by the author with node --check). event: COMMENT — approval is a human decision.

kubaflo · 2026-06-16T19:37:29Z

Thanks for the round-3 verification, @PureWeen — glad the discover-eval guard holds up across all three reviewers with no overcorrection.

Agreed on the optional follow-up: the badge now reads LLM: Needs attention, but the detailed body section and the sibling report-status/check-run reporters still fall through to Skipped for the contributor + discover-failure case. As we settled in round 2, those reporters are pre-existing and out of scope for this styling PR, so I'll keep them untouched here and track the badge/body/reporter consistency as a separate follow-up rather than widen this diff. Appreciate the thorough multi-round review!

Style skill validation results comment

bb0f341

Format the Skill Validation Results PR comment with the same status badge and session layout used by AI Summary and Test Failure Review comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot added the area-infrastructure CI, Maestro / Coherency, upstream dependencies/versions label Jun 2, 2026

Copilot AI added 2 commits June 2, 2026 20:33

Collapse skill validation session by default

f9e61a5

Render the Skill Validation Results session collapsed by default to match the requested comment behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Collapse all skill validation details by default

5e02847

Normalize Skill Validation Results comments so no details blocks render expanded by default. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

This was referenced Jun 3, 2026

[PR Review Queue] 2026-06-03 #35732

Closed

[PR Review Queue] 2026-06-03 PureWeen/maui#105

Open

[PR Review Queue] 2026-06-04 #35763

Closed

[PR Review Queue] 2026-06-05 #35780

Closed

MauiBot reviewed Jun 7, 2026

View reviewed changes

Comment thread .github/workflows/skill-validation.yml Outdated

MauiBot added s/agent-fix-win AI found a better alternative fix than the PR s/agent-reviewed PR was reviewed by AI agent workflow (full 4-phase review) labels Jun 7, 2026

MauiBot reviewed Jun 7, 2026

View reviewed changes

This was referenced Jun 8, 2026

[PR Review Queue] 2026-06-08 #35801

Closed

[PR Review Queue] 2026-06-09 #35816

Closed

[PR Review Queue] 2026-06-10 #35842

Closed

PureWeen mentioned this pull request Jun 11, 2026

[Release Readiness] .NET 10 SR9 — candidate from main #35867

Open

This was referenced Jun 11, 2026

[PR Review Queue] 2026-06-11 #35871

Closed

[PR Review Queue] 2026-06-12 #35888

Closed

[PR Review Queue] 2026-06-15 #35919

Closed

PureWeen reviewed Jun 15, 2026

View reviewed changes

Comment thread .github/workflows/skill-validation.yml

Comment thread .github/workflows/skill-validation.yml Outdated

Comment thread .github/workflows/skill-validation.yml Outdated

PureWeen reviewed Jun 15, 2026

View reviewed changes

Comment thread .github/workflows/skill-validation.yml

github-actions Bot mentioned this pull request Jun 16, 2026

[PR Review Queue] 2026-06-16 #35941

Closed

PureWeen reviewed Jun 16, 2026

View reviewed changes

PureWeen approved these changes Jun 16, 2026

View reviewed changes

PureWeen merged commit 1c0c0eb into dotnet:main Jun 16, 2026
16 of 19 checks passed

github-actions Bot added this to the .NET 10 SR9 milestone Jun 16, 2026

github-actions Bot mentioned this pull request Jun 18, 2026

[repo-status] Daily Repo Status — June 18, 2026 #35982

Open

Conversation

kubaflo commented Jun 2, 2026

Description of Change

Issues Fixed

Validation

Uh oh!

github-actions Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kubaflo commented Jun 7, 2026

Uh oh!

MauiBot left a comment

Choose a reason for hiding this comment

Expert Review — 1 findings

Uh oh!

Uh oh!

MauiBot left a comment

Choose a reason for hiding this comment

AI Review Summary

Gate Result: ⚠️ SKIPPED

Key Findings

Code Review Summary

Fix Candidates

Code Review — PR #35713 · .github/workflows/skill-validation.yml

Independent Assessment

Reconciliation with PR Narrative

Findings

❌ Error — overallStatus can show Passed when eval is cancelled or unknown

⚠️ Warning — Failure investigation prompt remains outside the PR comment

💡 Suggestion — Badge-only summary depends on shields.io availability

Devil's Advocate

Verdict: NEEDS_CHANGES

Blast Radius

Failure-Mode Probes

Fix Candidates

Cross-Pollination

Comparative Fix Report - PR #35713

Candidates Compared

Winner

Notes

Uh oh!

kubaflo commented Jun 10, 2026

Uh oh!

PureWeen left a comment

Choose a reason for hiding this comment

Adversarial review summary

Findings

Discarded findings

Verified safe (do not need changes)

Prior review context

Test coverage

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kubaflo commented Jun 15, 2026

Uh oh!

PureWeen left a comment

Choose a reason for hiding this comment

Round 2 Adversarial Review — Multi-Model Code Review

✅ Round 1 fixes verified clean (3/3 reviewers)

⚠️ One new finding (2/3 after dispute round)

Methodology

Uh oh!

Uh oh!

kubaflo commented Jun 16, 2026

Uh oh!

PureWeen left a comment

Choose a reason for hiding this comment

Round 3 Adversarial Review — Multi-Model Code Review

✅ Round 2 finding resolved (3/3 consensus)

✅ Round 1 fixes still hold (3/3)

💡 Optional follow-up (non-blocking, 1/3 reviewers — out of scope for this PR)

Uh oh!

kubaflo commented Jun 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

github-actions Bot commented Jun 2, 2026 •

edited

Loading

Code Review — PR #35713 · `.github/workflows/skill-validation.yml`

❌ Error — `overallStatus` can show Passed when eval is cancelled or unknown