feat(harness): inject skills as user-role messages instead of system-prompt blocks (Hermes parity) #230

Description

@subinium

Summary

Hermes injects skill content as user-role messages rather than appending it to the system prompt. The CrowClaw harness currently embeds matched skills as <skill> XML blocks inside the system prompt (packages/core/src/prompt-builder.ts:47), so the system prompt changes on every turn where the skill match set changes. Each change invalidates Anthropic prompt-cache and OpenAI prefix-cache hits on the system message, which is the longest and otherwise stable part of the request.

Adopt the Hermes pattern verbatim: keep per-turn skill accuracy and do not work around it.

Why this matters (no shortcut)

  • Anthropic's cache_control: ephemeral marker gives a 5-minute cache whose hits are billed at a 90% input-token discount, but only if the prefix is byte-identical. Today, matchSkillManifests() runs per turn and the resulting <skill> blocks change → system prompt changes → cache miss.
  • For a 12-iteration session with skills attaching/detaching, every turn whose match set changes re-bills the system prompt at full price, roughly 10× the cached rate, which forfeits the savings prompt caching was meant to deliver.
  • Workarounds (e.g., "freeze skills per session") sacrifice the per-turn relevance score Hermes specifically values. Don't take them.

Source (Hermes)

  • NousResearch/hermes-agent AGENTS.md documents the skills-as-user-messages caching rationale: skill content is rendered into a user-role turn at invocation time, separate from the system prompt's stable bootstrap.
  • agentskills.io standard: skill payloads carry their own envelope and are caller-injected, not provider-system-prompted.

Scope

Files to change

| File | Change |
| --- | --- |
| packages/core/src/prompt-builder.ts | Remove the <skill> block emission from buildSystemPrompt(). Keep the matchedSkills parameter accepted but ignored in the system-prompt path. |
| packages/core/src/index.ts | In both run() and runStreaming(), after recall + skill match, prepend one synthetic user-role message immediately before the actual user message. It contains one <skill name="..." tools="..."><description>...</description><instructions>...</instructions></skill> element per matched skill. All matched skills go into this single user turn so they share one cache key. |
| packages/core/src/index.ts | Track the injected skill turn so recordTurn() does not persist the skill payload to session state (these are ephemeral injection artifacts, like memory recall). |
| tests/agent-loop.test.ts (or similar) | New test: when the same user message is sent twice, the system prompt is byte-identical across calls (verify via SHA-256 hash). Today this test would fail when skills match differently. |
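
A minimal sketch of the index.ts change described above, assuming a simplified message shape. Every name here (ChatMessage, the ephemeral flag, renderSkillTurn, buildTurnMessages, persistable) and the MatchedSkill fields are hypothetical stand-ins rather than the actual CrowClaw types. Leaving the description and instructions bodies unescaped is also an assumption.

```typescript
// Illustrative only: these types and helpers are assumptions, not the real CrowClaw API.
interface MatchedSkill {
  name: string;
  tools: string[];
  description: string;
  instructions: string;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  ephemeral?: boolean; // hypothetical marker for injected artifacts that must not be persisted
}

// Escape attribute values so a skill name or tool list cannot break the envelope.
function escapeAttr(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render all matched skills into ONE user-role message (one <skill> element each).
function renderSkillTurn(skills: MatchedSkill[]): ChatMessage | null {
  if (skills.length === 0) return null;
  const blocks = skills.map(
    (s) =>
      `<skill name="${escapeAttr(s.name)}" tools="${escapeAttr(s.tools.join(","))}">` +
      `<description>${s.description}</description>` +
      `<instructions>${s.instructions}</instructions>` +
      `</skill>`,
  );
  return { role: "user", content: blocks.join("\n"), ephemeral: true };
}

// Place the skill turn immediately before the actual user message.
function buildTurnMessages(
  history: ChatMessage[],
  skills: MatchedSkill[],
  userText: string,
): ChatMessage[] {
  const skillTurn = renderSkillTurn(skills);
  const userTurn: ChatMessage = { role: "user", content: userText };
  return skillTurn ? [...history, skillTurn, userTurn] : [...history, userTurn];
}

// What a recordTurn()-style function would keep: everything except injected artifacts.
function persistable(messages: ChatMessage[]): ChatMessage[] {
  return messages.filter((m) => !m.ephemeral);
}
```

Marking the injected turn with a flag, rather than tracking its array index, keeps the persistence filter independent of where the turn sits in messages[]. The real implementation may track it differently.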

Backend contract changes

  • The shape of provider request bodies changes: messages[] now begins with optional skill-injection user turns. No public route signature changes.
  • The MatchedSkill[] array remains exported from core for plugin/observability consumers.
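
For illustration, the changed request-body shape could look like the following on the Anthropic Messages API, where system may be an array of text blocks and a block may carry cache_control. buildRequestBody, the placeholder model id, and the max_tokens value are assumptions made for this sketch.

```typescript
// Sketch of the provider request body after the change. Names are hypothetical.
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

interface Message {
  role: "user" | "assistant";
  content: string;
}

interface RequestBody {
  model: string;
  max_tokens: number;
  system: TextBlock[];
  messages: Message[];
}

function buildRequestBody(
  systemPrompt: string,
  skillTurn: string | null,
  userText: string,
): RequestBody {
  const messages: Message[] = [];
  // Optional skill-injection user turn comes first.
  if (skillTurn !== null) messages.push({ role: "user", content: skillTurn });
  messages.push({ role: "user", content: userText });
  return {
    model: "MODEL_ID_PLACEHOLDER", // assumption: substitute the configured model id
    max_tokens: 1024, // assumption: illustrative value
    // The system block no longer depends on the skill match set, so its cache marker stays valid.
    system: [{ type: "text", text: systemPrompt, cache_control: { type: "ephemeral" } }],
    messages,
  };
}
```

The system array is the same whether or not a skill turn is present, which is the property the cache relies on.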

What must NOT change

  • matchSkillManifests() scoring algorithm and the 3-skill cap.
  • The skillTokenBudget (default 16k) — applied to the new user-turn block.
  • The on-disk skill format (SKILL.md frontmatter).
  • Persona prompt placement (still at the top of the system prompt).
  • Memory recall placement (still as a <recalled-context> system message).

Acceptance criteria

  • When the same user message is sent on two consecutive turns and the same skill is matched, the system prompt sent to the provider is byte-identical (SHA-256 hash equality).
  • When skill matches change between turns, the system prompt stays byte-identical and the skill payload appears in a user-role turn instead.
  • Anthropic provider's cache_control marker stays effective on the system prompt across skill-match changes.
  • No skill content is persisted to SessionState.messages (verified by inspecting the message log after a turn).
  • Existing skill-driven behavior (skill instructions actually shape the agent's reply) continues to pass current tests after the new test above is added.
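
The byte-identity criteria could be checked along these lines, using Node's built-in crypto module. buildSystemPromptSketch is a hypothetical stand-in for the real buildSystemPrompt() and only models the intended post-change behavior, in which matched skills are accepted but ignored.

```typescript
import { createHash } from "node:crypto";

// Hypothetical stand-in: accepts matched skill names but ignores them, as the change intends.
function buildSystemPromptSketch(persona: string, _matchedSkills: string[]): string {
  return `${persona}\n\nYou are an agent running inside the harness.`;
}

// Hex-encoded SHA-256 of the UTF-8 bytes of the prompt.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Two turns with different skill match sets.
const hashTurn1 = sha256Hex(buildSystemPromptSketch("persona text", ["git-workflow"]));
const hashTurn2 = sha256Hex(buildSystemPromptSketch("persona text", ["sql-tuning", "git-workflow"]));

// The acceptance criterion: the hashes match even though the match set changed.
const systemPromptStable = hashTurn1 === hashTurn2;
```

In the real test the two prompts would come from the request bodies captured from the provider call on each turn, not from a stub.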

Performance target

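  • Anthropic prompt-cache hit rate on system prompt: ≥ 90% across a 10-turn session where skill matches vary. Measured from the usage object in the Anthropic API response as cache_read_input_tokens / total input tokens, where total input tokens = input_tokens + cache_creation_input_tokens + cache_read_input_tokens.
  • Average input-token cost per turn after the second turn drops to ≤ 20% of an uncached call (cache reads billed at 10% of the base rate, with ~95% of the prompt in the cacheable prefix: 0.95 × 0.10 + 0.05 ≈ 0.145).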
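
Both targets can be computed as in this sketch. The Usage interface is a simplified assumption that covers only the three input-token counters from the Anthropic usage object. cacheHitRate and relativeInputCost are hypothetical helper names, and the 0.1 multiplier encodes the 90% discount quoted above.

```typescript
// Simplified view of the usage counters in an Anthropic Messages API response.
interface Usage {
  input_tokens: number; // uncached input tokens
  cache_creation_input_tokens: number; // tokens written to the cache this call
  cache_read_input_tokens: number; // tokens served from the cache
}

// Fraction of all input tokens that were served from the cache.
function cacheHitRate(u: Usage): number {
  const total = u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens;
  return total === 0 ? 0 : u.cache_read_input_tokens / total;
}

// Input cost of a cached turn relative to a fully uncached call.
// cacheableFraction: share of the prompt inside the cached prefix.
// cachedReadMultiplier: price of a cached token relative to the base rate (0.1 = 90% discount).
function relativeInputCost(cacheableFraction: number, cachedReadMultiplier = 0.1): number {
  return cacheableFraction * cachedReadMultiplier + (1 - cacheableFraction);
}

// With ~95% of the prompt cacheable, the relative cost is about 0.145, under the 20% target.
const targetCost = relativeInputCost(0.95);
```

The cost figure ignores the one-time premium for cache writes, which is consistent with the target only applying after the second turn.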

Out of scope

  • Implementing skills-as-user-messages for non-Anthropic providers' caching (OpenAI prefix-cache works with the same mechanism but isn't measured here).
  • Adding new skill formats or extending MatchedSkill shape.
  • Skill ranking changes.

Labels: enhancement, priority/critical, perf, source/hermes
