Skip to content

Style skill validation results comment#35713

Merged
PureWeen merged 5 commits into
dotnet:mainfrom
kubaflo:style-skill-validation-comments
Jun 16, 2026
Merged

Style skill validation results comment#35713
PureWeen merged 5 commits into
dotnet:mainfrom
kubaflo:style-skill-validation-comments

Conversation

@kubaflo

@kubaflo kubaflo commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Description of Change

Updates the Skill Validation Results PR comment to match the visual style used by AI Summary and Test Failure Review comments:

  • stable marker plus ## Skill Validation Results title
  • PR author/commit context
  • status badges for overall, static checks, LLM evaluation, skills, and agents
  • a single expandable session with commit metadata
  • existing static-check and LLM-evaluation details preserved inside the session

Issues Fixed

No issue filed.

Validation

  • Extracted and syntax-checked the actions/github-script post-comment JavaScript with node --check
  • git diff --check

Format the Skill Validation Results PR comment with the same status badge and session layout used by AI Summary and Test Failure Review comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

github-actions Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35713

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35713"

@github-actions github-actions Bot added the area-infrastructure CI, Maestro / Coherency, upstream dependencies/versions label Jun 2, 2026
Copilot AI added 2 commits June 2, 2026 20:33
Render the Skill Validation Results session collapsed by default to match the requested comment behavior.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Normalize Skill Validation Results comments so no details blocks render expanded by default.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo

kubaflo commented Jun 7, 2026

Copy link
Copy Markdown
Contributor Author

/review -b feature/enhanced-reviewer -p android

@MauiBot MauiBot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expert Review — 1 findings

See inline comments for details.

Comment thread .github/workflows/skill-validation.yml Outdated
@MauiBot MauiBot added s/agent-fix-win AI found a better alternative fix than the PR s/agent-reviewed PR was reviewed by AI agent workflow (full 4-phase review) labels Jun 7, 2026

@MauiBot MauiBot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Review Summary

@kubaflo — new AI review results are available based on this last commit: 5e02847.
Collapse all skill validation details by default To request a fresh review after new comments or commits, comment /review rerun.

Gate Skipped Code Review In Review Confidence High Platform Ci Workflow Only

Review Sessions — click to expand
Gate — Test Before & After Fix

Gate Result: ⚠️ SKIPPED

No tests were detected in this PR.

Recommendation: Add tests to verify the fix using the write-tests-agent.


Pre-Flight — Context & Validation

Issue: N/A - No linked issue filed
PR: #35713 - Style skill validation results comment
Platforms Affected: CI workflow only (review requested for android, but diff has no Android/runtime code)
Files Changed: 1 implementation, 0 test

Key Findings

  • PR modifies .github/workflows/skill-validation.yml to restyle the skill-validation results comment with author/commit context, badges, and a collapsible session.
  • Gate was already skipped because no tests were detected in this PR; no gate/content.md was created or modified here.
  • Code review found a status aggregation bug: cancelled/unknown LLM evaluation can still produce an overall green Passed badge when static checks pass.

Code Review Summary

Verdict: NEEDS_CHANGES
Confidence: high
Errors: 1 | Warnings: 1 | Suggestions: 1

Key code review findings:

  • .github/workflows/skill-validation.yml:1038overallStatus can show Passed when eval status is cancelled or Unknown.
  • ⚠️ Failure investigation prompt is computed but still absent from the PR comment body.
  • 💡 Badge-only summary depends on shields.io availability; a plain-text fallback would improve resilience.

Fix Candidates

# Source Approach Test Result Files Changed Notes
PR PR #35713 Restyle skill validation PR comment with PR metadata, shields.io badges, and a collapsible per-commit session while preserving existing result details ⚠️ SKIPPED (Gate: no tests detected) .github/workflows/skill-validation.yml Original PR

Code Review — Deep Analysis

Code Review — PR #35713 · .github/workflows/skill-validation.yml

GitHub CLI authentication was unavailable; analysis used public GitHub API metadata and the local squashed PR diff (HEAD^..HEAD).

Independent Assessment

What this changes: The workflow's skill-validation result comment is restyled from a flat Markdown body into a richer status report with PR author/commit context, shields.io badges, a single expandable session block, and per-commit session markers. It adds a github.rest.pulls.get() call to retrieve PR title, author, and head SHA.

Inferred motivation: Make skill validation comments match the visual style used by other AI-generated PR comments while preserving existing static-check and LLM-evaluation details inside the expanded session.

Reconciliation with PR Narrative

Author claims: PR #35713 says it styles the Skill Validation Results comment with a stable marker, author/commit context, status badges, a single expandable session with commit metadata, and preserved existing details. It reports validation via extracted actions/github-script syntax check and git diff --check.

Agreement/disagreement: The diff matches the narrative. One status aggregation edge case appears incorrect: overallStatus can report Passed when the evaluation job is cancelled or unknown as long as static checks passed.

Findings

❌ Error — overallStatus can show Passed when eval is cancelled or unknown

.github/workflows/skill-validation.yml:1038 computes overallStatus as Passed whenever staticStatus === 'Passed' and neither status is Failed. That means staticStatus='Passed' with evalStatus='cancelled' or evalStatus='Unknown' produces a green overall badge even though LLM evaluation did not complete. Expected behavior is Passed only when static passed and eval either passed or was intentionally skipped.

⚠️ Warning — Failure investigation prompt remains outside the PR comment

The workflow computes investigatePrompt for failed evals, but the new PR comment body still does not include it inside the result session. This is pre-existing behavior and not required by the styling goal, but it leaves the author-facing comment less actionable when LLM evaluation fails.

💡 Suggestion — Badge-only summary depends on shields.io availability

The badge row uses https://img.shields.io/badge/.... If shields.io is unavailable, GitHub will show broken images rather than readable badge text. A plain-text fallback line would improve resilience, though this mirrors the referenced styling pattern.

Devil's Advocate

The overallStatus behavior might be intentional if cancelled evals are considered non-blocking, but the label is "Overall" and a cancelled/unknown phase is materially different from a passed phase. The prompt and badge fallback concerns are lower severity because they are either pre-existing or consistent with the style being copied.

Verdict: NEEDS_CHANGES

Confidence: high
Summary: The style refactor is structurally sound and safe, but the derived overall badge can be misleading for cancelled or unknown eval outcomes. A small status aggregation fix should be applied before this change is considered complete.

Blast Radius

Workflow-only change. Affects skill-validation PR comments for all platforms; no MAUI runtime or Android app code is touched.

Failure-Mode Probes

Scenario Expected overallStatus
static Passed, eval Passed Passed
static Passed, eval Skipped Passed
static Passed, eval Failed Failed
static Passed, eval cancelled Needs attention
static Passed, eval Unknown Needs attention
static Failed, eval Passed Failed
static Failed, eval Skipped Failed
static Unknown, eval Passed Needs attention
static Skipped, eval Skipped Needs attention

Fix — Analysis & Comparison

Fix Candidates

# Source Approach Test Result Files Changed Notes
1 code-review + maui-expert-reviewer Tighten the inline overallStatus success condition to require static Passed and eval Passed/Skipped ✅ PASS 1 file Smallest targeted fix; prevents false green Overall badge for cancelled/unknown eval
2 code-review + maui-expert-reviewer Extract getOverallStatus(staticStatus, evalStatus) helper with explicit guard clauses ✅ PASS 1 file More readable policy encoding, but larger than candidate 1
PR PR #35713 Restyle skill validation comment with PR metadata, badges, and collapsible session ⚠️ SKIPPED (Gate: no tests detected) 1 file Original PR; expert review found false green Overall badge edge case

Cross-Pollination

Model Round New Ideas? Details
maui-expert-reviewer 1 Yes Identified status aggregation flaw and suggested tightening pass eligibility for eval statuses
local test loop 1 Yes Candidate 1 validated minimal inline condition against syntax, status matrix, and diff whitespace checks
local test loop 2 Yes Candidate 2 validated helper-based policy encoding against the same checks
orchestrator 2 No Further candidates would be trivial variations of the same boolean policy; no meaningfully different approach remains for this workflow-only bug

Exhausted: Yes — two meaningful implementation strategies passed; additional variants would only rename variables or rearrange equivalent conditions.
Selected Fix: Candidate #1 — It passes all targeted checks, fixes the expert-reviewed cancelled/unknown eval failure mode, and is simpler than Candidate #2 while changing fewer lines.


Report — Final Recommendation

Comparative Fix Report - PR #35713

Gate: Skipped; no tests were detected in this PR. Regression ranking below uses the recorded STEP 5a targeted checks and the expert-review sandbox validation. Candidates that failed regression tests would rank below passing candidates; none of the STEP 5a candidates failed.

Candidates Compared

Rank Candidate Result Assessment
1 pr-plus-reviewer Passed targeted status matrix Best choice. It keeps the submitted PR's styling implementation and applies the expert reviewer's minimal correctness fix so non-completed LLM states no longer show a green overall badge.
2 try-fix-1 Passed targeted status matrix Equivalent code change to pr-plus-reviewer and the smallest standalone alternative. Ranked behind pr-plus-reviewer only because pr-plus-reviewer preserves the PR fix plus applied review feedback as the integrated candidate.
3 try-fix-2 Passed targeted status matrix Correct and clearer if the status policy grows, but it adds a helper for a single call site and is larger than necessary for this workflow-only bug.
4 pr Gate skipped; expert review found a correctness error The raw PR improves comment presentation but can falsely report Overall: Passed when static checks pass and LLM evaluation is cancelled, Unknown, or another non-success/non-skipped state.

Winner

Winning candidate: pr-plus-reviewer

pr-plus-reviewer is the single best candidate because it retains the PR's intended UI/styling behavior and incorporates the expert review's minimal status aggregation fix. It is functionally equivalent to the strongest try-fix candidate while avoiding the raw PR's false-green overall status.

Notes

The absent failure-investigation prompt in the PR comment body and the shields.io badge fallback concern were reviewed but not treated as blocking. They are either pre-existing behavior or resilience improvements rather than correctness failures in this PR's submitted fix.


Future Action — review latest findings

No alternative fix was selected for this run. Review the session findings and CI results before merging.

@kubaflo

kubaflo commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

/review rerun

@PureWeen PureWeen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial review summary

3 independent reviewers analyzed this PR in parallel with adversarial consensus (disputed 1/3 findings were re-evaluated by the other two reviewers; only validated findings are included).

Findings

Severity Finding Consensus
❌ Logic Overall badge shows Passed when LLM eval is cancelled/Unknown or any non-final value (line 1038) 3/3 reviewers
❌ Logic Comment attributes results to the wrong commit when a push lands during a /evaluate-skills run (line 693) 3/3 after dispute
⚠️ Error Handling Unguarded github.rest.pulls.get() (line 688) can sink the entire results comment on a transient API blip 3/3 after dispute

See inline comments for repro scenarios and concrete fixes.

Discarded findings

Two additional findings were investigated and discarded as non-issues after dispute:

  • HTML attribute injection in badge altskillCount/agentCount are wc -l integer output from a trusted setup step, not raw PR-controlled file content; all other badge messages come from a constrained string vocabulary; GitHub's comment HTML sanitizer also blocks the worst cases.
  • <details open> stripper at line 1076 collapses sub-report formatting — verified intentional per the commit history (5e028470 "Collapse all skill validation details by default"); no current generated section uses <details open>.

Verified safe (do not need changes)

  • Marker upsert (body.includes(marker)) — relocation of marker from lines[0] to the first element of body is detected correctly; existing comments are updated in place, no duplicates.
  • prTitle XSS — &/</> are escaped and the value sits in element text (<strong>...</strong>), not an attribute.
  • headSha7 in HTML comment markers — derived from hex-only pr.data.head.sha, can't contain -->.
  • lines = [] downstream indexing — no consumer indexes lines[0]; marker is correctly placed at top of body.
  • Optional chaining on pr.data.user?.login — falls through to the no-mention author line variant.

Prior review context

MauiBot's expert review (2026-06-07) independently flagged the overallStatus aggregation bug as [major]. That finding is still unresolved — this review confirms it with 3-reviewer consensus and proposes the same minimal fix.

Test coverage

The PR's node --check validation only catches JavaScript syntax errors. None of the findings above are syntax errors — they're semantic/logic issues that require executing the comment-rendering path against the matrix of evalResult/staticResult combinations. Consider adding a small JS unit test (or a workflow_dispatch dry-run) that exercises the status-derivation logic to prevent regressions.


Methodology: 3 independent reviewers from different model families; 1/3 findings re-evaluated by the other two with no anchoring on the originator's wording. Verdicts here reflect consensus, not single-reviewer opinion.

Comment thread .github/workflows/skill-validation.yml
Comment thread .github/workflows/skill-validation.yml Outdated
Comment thread .github/workflows/skill-validation.yml Outdated
…tricter Overall status

- Use the gated head_sha (needs.pr-gate/slash-gate) for the results comment instead of the
  live PR head, so a push during a slow /evaluate-skills run can't misattribute results.
- Wrap the decorative pulls.get() in try/catch with fallbacks so a transient API failure
  (rate-limit/5xx) doesn't sink the whole badges/details comment.
- Overall = Passed only when LLM eval is explicitly Passed/Skipped; cancelled/Unknown now
  reads 'Needs attention', consistent with report-status.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo

kubaflo commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@PureWeen addressed all three findings in badeb78d1c (gated SHA for the results comment, guarded the decorative pulls.get, and tightened the Overall status so cancelled/Unknown eval no longer shows green). Ready for re-review — thanks!

@PureWeen PureWeen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 2 Adversarial Review — Multi-Model Code Review

Re: previous review #4498374195. Re-reviewed badeb78d1c with 3 independent reviewers from different model families (adversarial consensus).


✅ Round 1 fixes verified clean (3/3 reviewers)

All three findings from the previous round were addressed at root cause; no overcorrection or new regressions introduced by the fixes themselves:

Finding Round-1 line Fix in badeb78d1c Verdict
overallStatus = Passed on cancelled/Unknown LLM eval L1038 L1049 tightened to staticStatus === 'Passed' && (evalStatus === 'Passed' || evalStatus === 'Skipped') ✅ 3/3
Comment attributes results to wrong commit during /evaluate-skills race L693 New HEAD_SHA env at L680 sourced from pr-gate/slash-gate outputs; process.env.HEAD_SHA used for headSha/commit-link ✅ 3/3
Unguarded pulls.get() can sink entire comment L688 Wrapped in try/catch with prData = {} fallback (L690–700); decorative metadata degrades gracefully ✅ 3/3

⚠️ One new finding (2/3 after dispute round)

One reviewer flagged a logic path that was not present in Round 1. After dispute, both other reviewers confirmed the trace is factually correct but argued for ⚠️ rather than ❌ because the underlying mis-categorization is mirrored in two pre-existing reporters (report-status shell at L1170+ and check-run github-script at L1219+) that this PR does not touch. See the inline comment at L1034 for the trace and a minimal in-scope guard suggestion.


Methodology

  • 3 reviewers (different model families) → adversarial consensus
  • 1/3 findings sent to a dispute round with the other 2 reviewers
  • Inline comments only on paths/lines validated against the actual PR diff
  • event: COMMENT — approval is a human decision

Comment thread .github/workflows/skill-validation.yml
When the discover-eval job fails (transient runner / gh api error) for a contributor PR
with static passing, evalStatus fell through to 'Skipped' and Overall showed green ✅.
Guard on discoverResult !== 'success' → 'Needs attention', distinguishing a discovery
infra failure from the legitimate 'no eval entries' case (report-status/check-run unchanged).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo

kubaflo commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

@PureWeen addressed the round-2 finding in a5936c1c7c — a failed discover-eval job no longer renders as a green Overall (now Needs attention). Ready for re-review, thanks!

@PureWeen PureWeen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 3 Adversarial Review — Multi-Model Code Review

Re-reviewed a5936c1c7c with 3 independent reviewers from different model families (adversarial consensus). This round verifies the fix for the Round 2 finding (review #4501101962).

✅ Round 2 finding resolved (3/3 consensus)

The new guard in a5936c1c7c is verified correct by all three reviewers, with no regression:

} else if (discoverResult !== 'success') {
  evalStatus = 'Needs attention';   // discovery job failed → don't mask as Skipped/Passed
}

Branch ordering is the key detail, and it's right: the new branch sits after !evalRan && !isContributor, so:

Scenario evalStatus Overall
External (non-contributor) PR, discover-eval skipped Skipped ✅ Passed (unchanged — intended)
Contributor PR, discover-eval failed/cancelled Needs attention ⚠️ Needs attention (fixed — was ✅)
Contributor PR, discover-eval success, no entries Skipped ✅ Passed (unchanged)
Contributor PR, eval ran with results Passed/Failed per results (unchanged)

The guard fires only when discovery didn't succeed and the actor is a contributor/slash-triggered — exactly the targeted case. External PRs are still caught first and map to Skipped, so there are no false "Needs attention" alarms.

✅ Round 1 fixes still hold (3/3)

All three earlier fixes re-verified correct and complete — gated HEAD_SHA used first (no wrong-commit attribution), pulls.get() guarded with graceful metadata fallback (no remaining unguarded deref), and overallStatus only green when staticStatus === 'Passed' && (evalStatus === 'Passed' || evalStatus === 'Skipped'). Security spot-checks pass: PR title / badge alt are HTML-escaped, badge label/message go through encodeURIComponent, and the SHA only flows into hex-only contexts.

💡 Optional follow-up (non-blocking, 1/3 reviewers — out of scope for this PR)

For the same contributor + discover-failure case, the new badge now correctly reads LLM: Needs attention, but the pre-existing detailed body section (and the sibling report-status / check-run reporters) still fall through to "⏭️ LLM Evaluation: Skipped". Those are pre-existing and outside this styling PR's diff — a sensible follow-up issue would add the same discoverResult !== 'success' guard to all reporters so the badge and body agree. Not a blocker for this PR.


Verdict: No blocking issues. All four findings from the prior two rounds are resolved with 3/3 consensus and no overcorrection. The only remaining item is the optional consistency follow-up above.

Methodology: 3 independent reviewers, adversarial consensus. Changed path is workflow YAML with embedded github-script; no automated test coverage exists for this comment-rendering logic (validated by the author with node --check). event: COMMENT — approval is a human decision.

@kubaflo

kubaflo commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the round-3 verification, @PureWeen — glad the discover-eval guard holds up across all three reviewers with no overcorrection.

Agreed on the optional follow-up: the badge now reads LLM: Needs attention, but the detailed body section and the sibling report-status/check-run reporters still fall through to Skipped for the contributor + discover-failure case. As we settled in round 2, those reporters are pre-existing and out of scope for this styling PR, so I'll keep them untouched here and track the badge/body/reporter consistency as a separate follow-up rather than widen this diff. Appreciate the thorough multi-round review!

@PureWeen PureWeen merged commit 1c0c0eb into dotnet:main Jun 16, 2026
16 of 19 checks passed
@github-actions github-actions Bot added this to the .NET 10 SR9 milestone Jun 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-infrastructure CI, Maestro / Coherency, upstream dependencies/versions s/agent-fix-win AI found a better alternative fix than the PR s/agent-reviewed PR was reviewed by AI agent workflow (full 4-phase review)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants