[skill-optimizer] Daily Skill Optimizer Improvements - 2026-05-09 #31139

### Summary
- Run mode: dry-run
- Status: ⚠️ Skipped — `OPENROUTER_API_KEY` not set; suite execution was bypassed

### Key Findings

1. **SKILL.md lacks validation/test commands** — The prompt surface (`SKILL.md`) describes only compilation and MCP commands but omits mandatory pre-commit checks (`make agent-finish`, `make build`, `make fmt`). Agents using `SKILL.md` as their primary entry point may skip these gates, causing CI failures.
   - *Expected impact*: Fewer CI failures from agents that rely solely on `SKILL.md` for workflow guidance.

2. **Benchmark is unreachable because `OPENROUTER_API_KEY` is never set** — The optimizer config targets Claude Sonnet 4.6 via OpenRouter, but the secret is absent from the repo. Every scheduled run exits as a no-op dry-run, so the optimizer never produces benchmark data that would drive real skill improvements.
   - *Expected impact*: Enabling the secret would unlock real pass/fail metrics and allow the optimizer to automatically iterate on `SKILL.md`.

3. **`maxTasks: 20` with `maxIterations: 3` may be too conservative for a large skill surface** — `SKILL.md` covers five distinct capability areas (compile, engine config, MCP, safe-outputs, audit). Twenty tasks spread across five areas give only ~4 tasks per area, which is too thin to detect regressions reliably.
   - *Expected impact*: More granular benchmark coverage, making it easier to detect skill degradation in specific feature areas.
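
The per-area arithmetic behind finding 3 can be sketched as follows. This is a minimal illustration that assumes tasks are split evenly across the five areas, which the real benchmark may not guarantee; the helper name `per_area_resolution` is hypothetical and not part of the optimizer.

```python
# Capability areas listed in finding 3.
AREAS = ["compile", "engine config", "MCP", "safe-outputs", "audit"]


def per_area_resolution(max_tasks, areas=AREAS):
    """Return (tasks per area, pass-rate change caused by one failing task).

    Assumption: tasks are distributed evenly across areas.
    """
    tasks_per_area = max_tasks // len(areas)
    step = 1 / tasks_per_area
    return tasks_per_area, step


for max_tasks in (20, 40):
    tasks, step = per_area_resolution(max_tasks)
    print(
        f"maxTasks={max_tasks}: {tasks} tasks per area, "
        f"one failure shifts the area pass rate by {step:.1%}"
    )
```

With four tasks per area, a single flaky task moves that area's pass rate by 25 percentage points, so a real regression is hard to tell apart from noise.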

<details>
<summary><b>Evidence from Artifact</b></summary>

**`summary.json`**
```json
{
  "repository": "github/gh-aw",
  "run_mode": "dry-run",
  "run_status": 0,
  "run_url": "https://github.com/github/gh-aw/actions/runs/25590945527"
}
```

**`run.log`**
```
dry-run: Docker available but OPENROUTER_API_KEY not set; skipping suite execution
```
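
The gate implied by this log line might look roughly like the sketch below. The workflow's actual logic is not part of the artifact, so the function name `choose_run_mode` and the mode label `"benchmark"` are assumptions for illustration only.

```python
import os


def choose_run_mode(env=None):
    """Return "benchmark" when OPENROUTER_API_KEY is non-empty, else "dry-run".

    Hypothetical helper; the real optimizer may decide differently.
    """
    env = os.environ if env is None else env
    has_key = bool(env.get("OPENROUTER_API_KEY", "").strip())
    return "benchmark" if has_key else "dry-run"


if __name__ == "__main__":
    mode = choose_run_mode()
    if mode == "dry-run":
        print("dry-run: OPENROUTER_API_KEY not set; skipping suite execution")
    else:
        print("benchmark: OPENROUTER_API_KEY found; suite execution can proceed")
```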

**`.skill-optimizer/skill-optimizer.json`** — benchmark config references `openrouter/anthropic/claude-sonnet-4.6` and sets `maxTasks: 20`, `maxIterations: 3`, `perModelFloor: 0.6`, `targetWeightedAverage: 0.8`.

**`SKILL.md`** — prompt surface lists four `gh aw` commands and points to `AGENTS.md`/skills; it contains no mention of `make agent-finish`, `make build`, or `make fmt`.
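
A check like the following could confirm this observation locally. It is a sketch under stated assumptions: the file path `SKILL.md` relative to the working directory and the helper `missing_commands` are hypothetical, and a plain substring match is used rather than any parsing of the Markdown.

```python
from pathlib import Path

# Commands named in finding 1 as mandatory pre-commit checks.
REQUIRED_COMMANDS = ("make agent-finish", "make build", "make fmt")


def missing_commands(text, required=REQUIRED_COMMANDS):
    """Return the required commands that never appear in the given text."""
    return [cmd for cmd in required if cmd not in text]


if __name__ == "__main__":
    skill_path = Path("SKILL.md")  # assumed location; adjust as needed
    if skill_path.exists():
        missing = missing_commands(skill_path.read_text(encoding="utf-8"))
        print("missing:", ", ".join(missing) if missing else "none")
    else:
        print(f"{skill_path} not found; run from the directory that contains it")
```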

</details>

### Recommendations

1. **Add mandatory validation commands to `SKILL.md`** — Append a "Validation" section listing `make build && make fmt` (Checkpoint 1) and `make agent-report-progress` (Checkpoint 2) so agents starting from `SKILL.md` know how to gate their changes before opening a PR.

2. **Add `OPENROUTER_API_KEY` as a repository secret** — Without this secret the optimizer can never run in benchmark mode. Add the secret to the repo (or org) and verify the next scheduled run produces a non-empty `suite-results/` directory.

3. **Increase `maxTasks` or split into per-area suites** — Raise `maxTasks` to at least 40 in `.skill-optimizer/skill-optimizer.json`, or create separate benchmark suites for each major capability area (compile, engine, MCP, safe-outputs, audit) to prevent regressions in one area from being diluted by passing tasks in others.
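
The dilution effect mentioned in recommendation 3 can be illustrated with made-up numbers. Everything in this sketch is hypothetical: the per-area results are invented, the pooled pass rate is a simple unweighted ratio (how the optimizer actually computes its weighted average is not shown in the artifact), and `targetWeightedAverage` is reused as a per-area threshold purely for illustration.

```python
TARGET = 0.8  # targetWeightedAverage from the config, reused as a threshold (assumption)

# Invented results as (passed, total), with 8 tasks per area (maxTasks = 40).
results = {
    "compile": (8, 8),
    "engine": (8, 8),
    "MCP": (8, 8),
    "safe-outputs": (8, 8),
    "audit": (3, 8),  # a regression confined to one area
}


def overall_pass_rate(results):
    """Pooled pass rate across all areas."""
    passed = sum(p for p, _ in results.values())
    total = sum(t for _, t in results.values())
    return passed / total


def areas_below(results, threshold):
    """Areas whose own pass rate falls below the threshold."""
    return [area for area, (p, t) in results.items() if p / t < threshold]


print(f"overall pass rate: {overall_pass_rate(results):.3f}")
print("areas below target:", areas_below(results, TARGET))
```

In this example the pooled rate stays above the target even though one area passes fewer than half of its tasks, which is the case that per-area suites or per-area reporting would surface.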

> Generated by [Daily Skill Optimizer Improvements](https://github.com/github/gh-aw/actions/runs/25590945527/agentic_workflow) · ● 3.2M · [◷](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fdaily-skill-optimizer%22&type=issues)
> - [x] expires on May 16, 2026, 3:56 AM UTC
