Summary
OPENROUTER_API_KEY was not set, so no suite cases were executed.
Key Findings
- SKILL.md lacks engine-setup troubleshooting guidance: The surface skill documents CLI commands and engine options but does not explain what to do when an engine is misconfigured or an API key is absent. Agents guided solely by SKILL.md have no fallback heuristics for setup failures. Expected impact: Reduces confusion and support requests when engine setup is incomplete.
- Suite coverage gap, no eval case for MCP tool misconfiguration: .skill-optimizer/suite.yml covers ANSI escapes, deprecated output expressions, and requireCleanGit, but has no case that tests an agent's ability to diagnose missing or misconfigured MCP tools, a common real-world failure mode documented in the workflow health runbook. Expected impact: Adds a high-value eval that exercises the GitHub MCP skill and the runbook guidance, increasing coverage of the most frequent CI failures.
- SKILL.md omits the two-checkpoint validation strategy: AGENTS.md documents a mandatory two-checkpoint pre-commit/pre-PR validation pattern (make build && make fmt → make agent-report-progress), but SKILL.md contains no mention of it. Agents onboarded via SKILL.md skip these checkpoints and produce PRs that fail CI immediately. Expected impact: Prevents the recurring CI failures caused by unformatted or uncompiled code in agent-authored PRs (noted as causing 5 failures in a single day).
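The missing-key situation in the first finding could be caught before a run with a small pre-flight check. The following is only a sketch: the function name `engine_key_status` and its return shape are hypothetical and not part of gh-aw or the skill optimizer. The one name taken from this report is the OPENROUTER_API_KEY variable.

```python
import os


def engine_key_status(env=None, key_name="OPENROUTER_API_KEY"):
    """Return (ok, message) describing whether the engine key is usable.

    `env` defaults to the process environment; passing a plain dict makes
    the check easy to exercise without touching real environment variables.
    """
    env = os.environ if env is None else env
    value = env.get(key_name, "").strip()
    if value:
        return True, f"{key_name} is set; suite execution can proceed"
    # Hypothetical message wording, modelled on the run.log line in this report.
    return False, f"{key_name} not set; skipping suite execution"


if __name__ == "__main__":
    ok, message = engine_key_status()
    print(message)
```

A "Common Issues" entry in SKILL.md could describe the same decision in prose: check the key first, and treat an empty or whitespace-only value as absent.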
Evidence from Artifact
summary.json:
{
  "repository": "github/gh-aw",
  "run_mode": "dry-run",
  "run_status": 0
}
run.log:
dry-run: Docker available but OPENROUTER_API_KEY not set; skipping suite execution
.skill-optimizer/suite.yml, current eval cases:
- ansi-escape-prevention
- sanitized-outputs-migration
- skill-optimizer-clean-git
No case covers MCP tool misconfiguration, engine key errors, or pre-PR validation checkpoints.
SKILL.md: References AGENTS.md for conventions rather than embedding the critical two-checkpoint validation rule directly. Contains no troubleshooting section.
AGENTS.md (Critical Requirements section): States that failing to run make agent-report-progress before a PR is the "#1 cause of CI failures" and has caused "5 CI failures in a single day".
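The coverage gap described in the evidence above can be restated as a simple check of case IDs against topics. This is an illustrative sketch only: the keyword mapping is an assumption made for the example (including which case corresponds to which covered topic), and it does not reflect the real suite.yml schema, which this report does not show.

```python
# Case IDs exactly as listed for .skill-optimizer/suite.yml in this report.
CURRENT_CASES = [
    "ansi-escape-prevention",
    "sanitized-outputs-migration",
    "skill-optimizer-clean-git",
]

# Hypothetical topic -> keyword mapping, assumed for illustration only.
TOPIC_KEYWORDS = {
    "ANSI escapes": ["ansi"],
    "deprecated output expressions": ["sanitized-outputs"],
    "requireCleanGit": ["clean-git"],
    "MCP tool misconfiguration": ["mcp"],
    "engine key errors": ["engine-key"],
    "pre-PR validation checkpoints": ["checkpoint"],
}


def uncovered_topics(case_ids, topic_keywords):
    """Return topics for which no case ID contains any of the topic's keywords."""
    missing = []
    for topic, keywords in topic_keywords.items():
        covered = any(kw in case_id for case_id in case_ids for kw in keywords)
        if not covered:
            missing.append(topic)
    return missing


if __name__ == "__main__":
    for topic in uncovered_topics(CURRENT_CASES, TOPIC_KEYWORDS):
        print(f"no eval case for: {topic}")
```

With the three current case IDs, the check reports the same three uncovered topics named in the evidence: MCP tool misconfiguration, engine key errors, and pre-PR validation checkpoints.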
Recommendations
- Add engine-setup troubleshooting to SKILL.md: Add a short "Common Issues" section that explains what to do when an engine key (e.g., OPENROUTER_API_KEY) is absent and how to verify engine configuration before running a workflow. This makes SKILL.md self-sufficient for agents that do not have access to AGENTS.md.
- Add an MCP tool configuration eval case to .skill-optimizer/suite.yml: Create a case where the task describes a workflow failing because a required MCP toolset is not declared, and grade the answer against the runbook guidance in .github/aw/runbooks/workflow-health.md. This closes the largest coverage gap in the current suite.
- Embed the two-checkpoint validation rule in SKILL.md and add a matching eval case: Copy the checkpoint summary from AGENTS.md into SKILL.md so agents using only the skill surface know to run make build && make fmt after the first edit and make agent-report-progress before opening a PR. Add a corresponding suite.yml case that grades whether an agent correctly identifies the pre-PR command.
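The two-checkpoint rule in the last recommendation can be sketched as a small runner. The make targets are the ones quoted from AGENTS.md in this report; the checkpoint names, the `run_checkpoint` helper, and the injectable `runner` argument are hypothetical and exist only to keep the example self-contained.

```python
import subprocess

# make targets as quoted in this report; the dictionary keys are assumed labels.
CHECKPOINTS = {
    "after-first-edit": [["make", "build"], ["make", "fmt"]],
    "pre-pr": [["make", "agent-report-progress"]],
}


def _run_command(cmd):
    """Run a command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_checkpoint(name, runner=None):
    """Run a checkpoint's commands in order, stopping at the first failure.

    Returns a list of (command string, exit code) for the commands that ran.
    `runner` takes a command list and returns an exit code; it defaults to
    actually executing the command.
    """
    runner = _run_command if runner is None else runner
    results = []
    for cmd in CHECKPOINTS[name]:
        code = runner(cmd)
        results.append((" ".join(cmd), code))
        if code != 0:
            break  # same short-circuit behaviour as `make build && make fmt`
    return results


if __name__ == "__main__":
    for step, code in run_checkpoint("after-first-edit"):
        print(f"{step}: exit {code}")
```

Stopping at the first non-zero exit code keeps the behaviour aligned with the `&&` chaining in the documented command, so a failed build never proceeds to formatting.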
Generated by Daily Skill Optimizer Improvements · ● 4.2M