
[skill-optimizer] Daily Skill Optimizer Improvements - 2026-05-09 #31139

@github-actions

Summary

  • Run mode: dry-run
  • Status: ⚠️ Skipped — OPENROUTER_API_KEY not set; suite execution was bypassed

Key Findings

  1. SKILL.md lacks validation and test commands: the prompt surface (SKILL.md) describes only compilation and MCP commands and omits the mandatory pre-commit checks (make agent-finish, make build, make fmt). Agents that use SKILL.md as their primary entry point may skip these gates and trigger avoidable CI failures.

    • Expected impact: Fewer CI failures from agents that rely solely on SKILL.md for workflow guidance.
  2. The benchmark never runs because OPENROUTER_API_KEY is not set: the optimizer config targets Claude Sonnet 4.6 via OpenRouter, but the secret is absent from the repository. Every scheduled run therefore exits as a no-op dry-run, so the optimizer never produces the benchmark data that would drive real skill improvements.

    • Expected impact: Enabling the secret would unlock real pass/fail metrics and allow the optimizer to automatically iterate on SKILL.md.
  3. maxTasks: 20 with maxIterations: 3 may be too conservative for a large skill surface: SKILL.md covers five distinct capability areas (compile, engine config, MCP, safe-outputs, audit). Twenty tasks spread across five areas give only about 4 tasks per area, which is too thin to detect regressions reliably.

    • Expected impact: More granular benchmark coverage, making it easier to detect skill degradation in specific feature areas.
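The per-area arithmetic in Finding 3, and the way a thin split lets one area's regression hide inside the aggregate, can be checked with a short sketch. It assumes tasks are divided evenly across the five areas and that the aggregate score is a plain pass rate; neither assumption is confirmed by the config, and `tasks_per_area` and `aggregate_pass_rate` are hypothetical helpers, not part of the optimizer.

```python
# The five capability areas named in Finding 3.
AREAS = ("compile", "engine config", "MCP", "safe-outputs", "audit")


def tasks_per_area(max_tasks: int) -> dict[str, int]:
    """Split max_tasks as evenly as possible (assumption); the first `extra` areas get one more."""
    base, extra = divmod(max_tasks, len(AREAS))
    return {area: base + (1 if i < extra else 0) for i, area in enumerate(AREAS)}


def aggregate_pass_rate(passed: dict[str, int], totals: dict[str, int]) -> float:
    """Overall pass rate across all areas (assumed to be what the target is compared against)."""
    return sum(passed.values()) / sum(totals.values())


totals = tasks_per_area(20)
print(totals["compile"])  # prints: 4

# Scenario: every MCP task fails while the other four areas pass.
passed = {area: (0 if area == "MCP" else n) for area, n in totals.items()}
print(aggregate_pass_rate(passed, totals))  # prints: 0.8
```

Under these assumptions, a complete MCP regression still yields an aggregate of 0.8, which reaches the targetWeightedAverage: 0.8 set in the benchmark config, while a per-area score would show MCP at 0.0. Raising maxTasks to 40 gives 8 tasks per area and finer per-area resolution, but the same total failure of one area still averages to 32/40 = 0.8; isolating it needs per-area scoring.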
Evidence from Artifacts

summary.json

{
  "repository": "github/gh-aw",
  "run_mode": "dry-run",
  "run_status": 0,
  "run_url": "https://github.com/github/gh-aw/actions/runs/25590945527"
}
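Note that run_status is 0 even though nothing was benchmarked, so a zero status alone is not evidence of a useful run. A minimal sketch of a stricter check, assuming run_mode is the field that marks a dry-run and that a real run leaves entries in the suite-results/ directory named in Recommendation 2 (its location inside the artifact is an assumption; `produced_benchmark_data` is a hypothetical helper):

```python
import json
from pathlib import Path

# summary.json exactly as captured in the artifact above.
SUMMARY_JSON = """
{
  "repository": "github/gh-aw",
  "run_mode": "dry-run",
  "run_status": 0,
  "run_url": "https://github.com/github/gh-aw/actions/runs/25590945527"
}
"""


def produced_benchmark_data(summary: dict, results_dir: Path) -> bool:
    """Count a run only if it was not a dry-run and it left results behind.

    run_status is deliberately ignored: the dry-run above still reports 0.
    """
    if summary.get("run_mode") == "dry-run":
        return False
    # A real run is assumed to write at least one entry into suite-results/.
    return results_dir.is_dir() and any(results_dir.iterdir())


summary = json.loads(SUMMARY_JSON)
print(summary["run_status"])  # prints: 0
print(produced_benchmark_data(summary, Path("suite-results")))  # prints: False
```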

run.log

dry-run: Docker available but OPENROUTER_API_KEY not set; skipping suite execution
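The log message implies a two-condition gate: Docker must be available and the API key must be present. The following is a hypothetical reconstruction of that decision, not the optimizer's actual code; the "benchmark" mode name and probing for Docker with `shutil.which` are assumptions. It treats an empty key as missing, which matters in GitHub Actions because an expression that references an unset secret evaluates to an empty string.

```python
import os
import shutil
from typing import Mapping, Optional


def select_run_mode(env: Optional[Mapping[str, str]] = None,
                    docker_available: Optional[bool] = None) -> str:
    """Return "benchmark" only when Docker and a non-empty OPENROUTER_API_KEY are both present."""
    env = os.environ if env is None else env
    if docker_available is None:
        # Assumption: a `docker` binary on PATH counts as "Docker available".
        docker_available = shutil.which("docker") is not None
    # An unset secret reaches the job as "", so test truthiness, not presence.
    if docker_available and env.get("OPENROUTER_API_KEY"):
        return "benchmark"
    return "dry-run"


print(select_run_mode(env={}, docker_available=True))  # prints: dry-run
```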

.skill-optimizer/skill-optimizer.json — benchmark config references openrouter/anthropic/claude-sonnet-4.6 and sets maxTasks: 20, maxIterations: 3, perModelFloor: 0.6, targetWeightedAverage: 0.8.
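The two thresholds in the config suggest a pass criterion along the following lines. This sketch assumes, without confirmation from the config, that perModelFloor is a minimum score every model must reach and that targetWeightedAverage applies to the weighted mean across models. The `meets_targets` helper is hypothetical, and the optimizer's real aggregation may differ.

```python
from typing import Mapping

# Threshold values quoted from .skill-optimizer/skill-optimizer.json.
PER_MODEL_FLOOR = 0.6
TARGET_WEIGHTED_AVERAGE = 0.8


def meets_targets(scores: Mapping[str, float], weights: Mapping[str, float]) -> bool:
    """Hypothetical gate: every model clears the floor and the weighted mean clears the target.

    Assumes scores are pass rates in [0, 1] keyed by model id and weights are positive.
    """
    if not scores:
        return False  # a dry-run yields no scores, so nothing can pass
    if any(score < PER_MODEL_FLOOR for score in scores.values()):
        return False
    total_weight = sum(weights[m] for m in scores)
    weighted_mean = sum(scores[m] * weights[m] for m in scores) / total_weight
    return weighted_mean >= TARGET_WEIGHTED_AVERAGE


model = "openrouter/anthropic/claude-sonnet-4.6"
print(meets_targets({model: 0.85}, {model: 1.0}))  # prints: True
print(meets_targets({model: 0.70}, {model: 1.0}))  # prints: False
```

With a single configured model, the floor and the target collapse into one effective bar of 0.8; the floor only becomes independent once several models are benchmarked.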

SKILL.md — prompt surface lists four gh aw commands and points to AGENTS.md/skills; it contains no mention of make agent-finish, make build, or make fmt.
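A gap like this is easy to catch mechanically. The following is a minimal sketch that reports which of the commands named in Finding 1 never appear in SKILL.md. It assumes a plain substring match is good enough and that SKILL.md sits at the repository root. Both helpers are hypothetical, and the sample text stands in for the real file.

```python
from pathlib import Path

# Commands that Finding 1 names as mandatory pre-commit checks.
REQUIRED_COMMANDS = ("make agent-finish", "make build", "make fmt")


def missing_commands(skill_text: str) -> list[str]:
    """Return the required commands that never appear in the given text."""
    return [cmd for cmd in REQUIRED_COMMANDS if cmd not in skill_text]


def check_skill_file(path: str = "SKILL.md") -> list[str]:
    """Read SKILL.md (assumed to be at the repo root) and report the missing commands."""
    return missing_commands(Path(path).read_text(encoding="utf-8"))


# Hypothetical excerpt: compile guidance only, with no validation gates.
sample = "Use gh aw compile to build workflows. See AGENTS.md for details."
print(missing_commands(sample))
```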

Recommendations

  1. Add mandatory validation commands to SKILL.md — Append a "Validation" section listing make build && make fmt (Checkpoint 1) and make agent-report-progress (Checkpoint 2) so agents starting from SKILL.md know how to gate their changes before opening a PR.

  2. Add OPENROUTER_API_KEY as a repository secret — Without this secret the optimizer can never run in benchmark mode. Add the secret to the repo (or org) and verify the next scheduled run produces a non-empty suite-results/ directory.

  3. Increase maxTasks or split into per-area suites — Raise maxTasks to at least 40 in .skill-optimizer/skill-optimizer.json, or create separate benchmark suites for each major capability area (compile, engine, MCP, safe-outputs, audit) to prevent regressions in one area from being diluted by passing tasks in others.
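The checkpoints in Recommendation 1 amount to an ordered, fail-fast gate. The following is a minimal Python sketch of that gate. It assumes the Makefile targets named in the recommendation exist and are run from the repository root. `run_checkpoints` is a hypothetical helper, and the command runner is injectable so the ordering can be exercised without invoking make.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Commands and checkpoint grouping as listed in Recommendation 1.
CHECKPOINTS: list[tuple[str, list[list[str]]]] = [
    ("Checkpoint 1", [["make", "build"], ["make", "fmt"]]),
    ("Checkpoint 2", [["make", "agent-report-progress"]]),
]


def default_runner(cmd: Sequence[str]) -> int:
    """Run a command in the current directory and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_checkpoints(run: Callable[[Sequence[str]], int] = default_runner) -> Optional[str]:
    """Run every checkpoint in order, stopping at the first non-zero exit code.

    Returns "<checkpoint>: <command>" for the first failure, or None if all pass.
    """
    for name, commands in CHECKPOINTS:
        for cmd in commands:
            if run(cmd) != 0:
                # Fail fast, mirroring `make build && make fmt`.
                return f"{name}: {' '.join(cmd)}"
    return None
```

Calling `run_checkpoints()` with no arguments shells out to make for each command in turn. Passing a stub such as `lambda cmd: 0` checks only the sequencing.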

Generated by Daily Skill Optimizer Improvements · ● 3.2M ·

  • expires on May 16, 2026, 3:56 AM UTC

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
