Summary
Hermes injects skill content as user-role messages rather than appending it to the system prompt. The CrowClaw harness currently embeds matched skills as <skill> XML blocks inside the system prompt (packages/core/src/prompt-builder.ts:47), which means the system prompt mutates per turn whenever the skill match set changes. This invalidates Anthropic prompt-cache and OpenAI prefix-cache hits on the (longer, otherwise stable) system message.
Adopt the Hermes pattern verbatim — keep accuracy, do not work around it.
Why this matters (no shortcut)
- Anthropic cache_control: ephemeral caches a prefix for 5 minutes and bills cache hits at a 90% input-token discount, but only if the prefix is byte-identical. Today, matchSkillManifests() runs per turn and the resulting <skill> blocks change → system prompt changes → cache miss.
- For a 12-iteration session with skills attaching and detaching, each miss bills the system prompt at the full input rate, roughly 10× the cached rate, so the current behavior forfeits the prompt-caching savings the design intended to deliver.
- Workarounds (e.g., "freeze skills per session") sacrifice the per-turn relevance score Hermes specifically values. Don't take them.
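The failure mode above can be sketched in a few lines. This is a simplified illustration, not the code in prompt-builder.ts: the `Skill` shape, the `BOOTSTRAP` text, and both builder functions are hypothetical stand-ins. It only shows that embedding skills makes the cached prefix depend on the match set.

```typescript
// Hypothetical, simplified skill shape (not the real MatchedSkill type).
interface Skill {
  name: string;
  instructions: string;
}

// Hypothetical stable bootstrap text.
const BOOTSTRAP = "You are CrowClaw. Follow the persona and tool rules below.";

// Current behavior (simplified): skills are embedded in the system prompt.
function systemPromptWithSkills(skills: Skill[]): string {
  const blocks = skills.map(
    (s) => `<skill name="${s.name}">${s.instructions}</skill>`,
  );
  return [BOOTSTRAP, ...blocks].join("\n");
}

// Proposed behavior (simplified): the system prompt ignores matched skills.
function stableSystemPrompt(_skills: Skill[]): string {
  return BOOTSTRAP;
}

const turn1: Skill[] = [{ name: "git", instructions: "Prefer rebase." }];
const turn2: Skill[] = [{ name: "sql", instructions: "Use parameters." }];

// The embedded variant changes when the match set changes, so a provider
// that caches on a byte-identical prefix sees a new prefix on that turn.
const embeddedChanged =
  systemPromptWithSkills(turn1) !== systemPromptWithSkills(turn2);

// The stable variant is identical across turns regardless of the match set.
const stableUnchanged = stableSystemPrompt(turn1) === stableSystemPrompt(turn2);

console.log({ embeddedChanged, stableUnchanged });
```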
Source (Hermes)
NousResearch/hermes-agent — AGENTS.md documents skills-as-user-messages caching rationale: skill content is rendered into a user-role turn at invocation time, separate from the system prompt's stable bootstrap.
agentskills.io standard: skill payloads carry their own envelope and are caller-injected, not provider-system-prompted.
Scope
Files to change
| File | Change |
| --- | --- |
| packages/core/src/prompt-builder.ts | Remove the <skill> block emission from buildSystemPrompt(). Keep matchedSkills parameter accepted but ignored in the system-prompt path. |
| packages/core/src/index.ts | In both run() and runStreaming(), after recall + skill match, prepend a synthetic user-role message <skill name="..." tools="..."><description>...</description><instructions>...</instructions></skill> per matched skill, immediately before the actual user message. Wrap all matched-skill messages in a single user turn so they share one cache key. |
| packages/core/src/index.ts | Track injected skill turn so recordTurn() does not persist the skill payload to session state (these are ephemeral injection artifacts, like memory recall). |
| tests/agent-loop.test.ts (or similar) | New test: when the same user message is sent twice, the system prompt is byte-identical across calls (verify via SHA-256 hash). Today this test would fail when skills match differently. |
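
The index.ts changes listed above could look roughly like the following. This is a sketch under stated assumptions, not the harness code: the `MatchedSkill` and `ChatMessage` shapes, the `ephemeral` flag, and the `buildMessages` and `persistable` helpers are all hypothetical names introduced for illustration. Escaping the skill text is also a choice made here, not something the issue specifies.

```typescript
// Hypothetical shapes for illustration; the real MatchedSkill type may differ.
interface MatchedSkill {
  name: string;
  tools: string[];
  description: string;
  instructions: string;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
  // Hypothetical flag marking injection artifacts that must not be persisted.
  ephemeral?: boolean;
}

// Escape characters that would break the XML envelope (an assumption of this sketch).
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one skill in the envelope shape described in the table.
function renderSkill(skill: MatchedSkill): string {
  return (
    `<skill name="${escapeXml(skill.name)}" tools="${escapeXml(skill.tools.join(","))}">` +
    `<description>${escapeXml(skill.description)}</description>` +
    `<instructions>${escapeXml(skill.instructions)}</instructions>` +
    `</skill>`
  );
}

// Build the provider message list: history, then a single synthetic user turn
// holding every matched skill, then the actual user message.
function buildMessages(
  history: ChatMessage[],
  skills: MatchedSkill[],
  userText: string,
): ChatMessage[] {
  const skillTurn: ChatMessage[] =
    skills.length > 0
      ? [{ role: "user", content: skills.map(renderSkill).join("\n"), ephemeral: true }]
      : [];
  return [...history, ...skillTurn, { role: "user", content: userText }];
}

// Drop injection artifacts before anything is written to session state.
function persistable(messages: ChatMessage[]): ChatMessage[] {
  return messages.filter((m) => !m.ephemeral);
}

const demoSkill: MatchedSkill = {
  name: "git",
  tools: ["shell"],
  description: "Git help",
  instructions: "Prefer rebase.",
};
const outgoing = buildMessages([], [demoSkill], "Squash my last two commits");
console.log(outgoing.length, persistable(outgoing).length);
```

Marking the injected turn rather than re-deriving it later keeps the persistence filter independent of how the skill text is rendered.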
Backend contract changes
- The shape of provider request bodies changes: messages[] now begins with optional skill-injection user turns. No public route signature changes.
- The MatchedSkill[] array remains exported from core for plugin/observability consumers.
What must NOT change
- The matchSkillManifests() scoring algorithm and the 3-skill cap.
- The skillTokenBudget (default 16k), applied to the new user-turn block.
- The on-disk skill format (SKILL.md frontmatter).
- Persona prompt placement (still at the top of the system prompt).
- Memory recall placement (still as a <recalled-context> system message).
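Acceptance criteria
- The cache_control marker stays effective on the system prompt across skill-match changes.
- Skill payloads are not persisted to SessionState.messages (verified by inspecting the message log after a turn).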
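The byte-identity check called for in the test row can be sketched with Node's built-in crypto module. The `buildSystemPrompt` below is a hypothetical stand-in with an assumed signature; a real test would import the function from packages/core and run inside the project's test runner.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in for buildSystemPrompt(); the real signature may differ.
function buildSystemPrompt(persona: string, _matchedSkills: string[]): string {
  // matchedSkills is accepted but ignored in the system-prompt path.
  return `${persona}\nYou are running inside the CrowClaw harness.`;
}

// Hex-encoded SHA-256 of a UTF-8 string.
function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Same persona, different skill match sets on two calls.
const first = sha256(buildSystemPrompt("Be concise.", ["git"]));
const second = sha256(buildSystemPrompt("Be concise.", ["sql", "docker"]));

if (first !== second) {
  throw new Error("system prompt changed between calls; cache prefix is unstable");
}
console.log("system prompt hash is stable:", first === second);
```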
Performance target
- Anthropic prompt-cache hit rate on system prompt: ≥ 90% across a 10-turn session where skill matches vary. Measured from the Anthropic API response as cache_read_input_tokens / total_input_tokens.
- Average input-token cost per turn after the second turn drops to ≤ 20% of an uncached call (90% discount × ~95% prefix-cacheable).
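The two targets above reduce to simple arithmetic, sketched below. The usage field names are the ones Anthropic reports on Messages API responses; treating total_input_tokens as the sum of the three input counters is an assumption of this sketch, as are the helper names. The cost model ignores any surcharge for cache writes.

```typescript
// Input-side usage counters as reported on Anthropic Messages API responses.
interface Usage {
  input_tokens: number; // input tokens not served from or written to the cache
  cache_creation_input_tokens: number; // tokens written to the cache
  cache_read_input_tokens: number; // tokens served from the cache
}

// Share of all input tokens that were served from the cache.
// Assumption: total input = sum of the three counters above.
function cacheHitRate(u: Usage): number {
  const total =
    u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens;
  return total === 0 ? 0 : u.cache_read_input_tokens / total;
}

// Input cost of a turn relative to an uncached call, where cacheableShare of
// the tokens are billed at (1 - discount) and the rest at the full rate.
function relativeCost(cacheableShare: number, discount: number): number {
  return cacheableShare * (1 - discount) + (1 - cacheableShare);
}

const rate = cacheHitRate({
  input_tokens: 500,
  cache_creation_input_tokens: 0,
  cache_read_input_tokens: 9500,
});
const cost = relativeCost(0.95, 0.9);

console.log(rate.toFixed(3), cost.toFixed(3)); // prints "0.950 0.145"
```

With a 90% discount and about 95% of the prefix cacheable, the relative cost works out to 0.95 × 0.10 + 0.05 = 0.145, which sits inside the ≤ 20% target.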
Out of scope
- Implementing skills-as-user-messages for non-Anthropic providers' caching (OpenAI prefix-cache works with the same mechanism but isn't measured here).
- Adding new skill formats or extending MatchedSkill shape.
- Skill ranking changes.
Labels: enhancement, priority/critical, perf, source/hermes