Merged
31 commits
fcb3e6b fix(release): make v0.8.2 safe to ship across runtime surfaces (subinium, May 2, 2026)
f712fbc fix(docker): bind container smoke server explicitly (subinium, May 2, 2026)
3a9a515 fix(docker): force package artifacts during image build (subinium, May 2, 2026)
b763866 ci(docker): check running container by inspect (subinium, May 2, 2026)
6239f09 feat(deploy): add self-host deployment paths (subinium, May 2, 2026)
cda8621 fix(security): formalize auth and delegation guards (subinium, May 2, 2026)
3b197b3 feat(providers): harden OpenAI and web tool ergonomics (subinium, May 2, 2026)
bf69bf9 feat(memory): make recall and imports portable across installs (subinium, May 2, 2026)
3e5997c feat(tools): add fallback adapters and rollout evaluation (subinium, May 2, 2026)
9d38046 feat(security): add tailnet and secret hardening (subinium, May 2, 2026)
375a99b chore(ts): enforce checked indexed access (subinium, May 2, 2026)
93dcdb7 feat(plugins): make extension surfaces discoverable (subinium, May 2, 2026)
3baeb8f feat(runtime): gate gateways and catalog installs (subinium, May 2, 2026)
234a23d feat(web): complete operator dashboard workflows (subinium, May 2, 2026)
a5d1720 refactor(runtime-node): isolate route handling for release maintenance (subinium, May 2, 2026)
5f6e92c feat(i18n): carry operator locale into prompts (subinium, May 2, 2026)
6643dcb feat(deploy): close Cloudflare and self-host release gaps (subinium, May 2, 2026)
75b7ae2 feat(tools): harden provider fallbacks and terminal adapters (subinium, May 2, 2026)
339307c refactor(runtime-node): finish release issue decomposition (subinium, May 2, 2026)
ddd517c feat(runtime): close final release issue gaps (subinium, May 2, 2026)
772e907 docs(changelog): record local 0.8.1 issue sweep (subinium, May 2, 2026)
94d1596 docs(release): add live 0.8.1 worklog (subinium, May 2, 2026)
0f1d7df docs(release): codify 0.8.1 checkpoint discipline (subinium, May 2, 2026)
8a9e0de docs(release): start open issue coverage audit (subinium, May 2, 2026)
776ee6f docs(release): record initial coverage audit findings (subinium, May 2, 2026)
858e08f Track unresolved release verifier gaps (subinium, May 2, 2026)
1b098d6 feat(runtime): complete remaining release contracts (subinium, May 2, 2026)
f5ae7e0 feat(dashboard): finish release polish gaps (subinium, May 2, 2026)
feda73b docs(release): record verified issue sweep completion (subinium, May 2, 2026)
f02e7b8 chore(release): consolidate v0.8.2 release notes (subinium, May 2, 2026)
60f4fec docs(changelog): correct v0.8.2 test count (2,982 / 2,982, no skips) (subinium, May 2, 2026)
30 changes: 10 additions & 20 deletions .dockerignore
@@ -1,20 +1,10 @@
-node_modules/
-.git/
-.omx/
-.env
-.env.*
-!.env.example
-.wrangler/
-.dev.vars
-dist/
-coverage/
-tests/
-docs/
-.github/
-*.test.ts
-*.tsbuildinfo
-*.log
-.DS_Store
-.claude/
-tmp/
-.tmp/
+node_modules
+packages/*/dist
+**/*.tsbuildinfo
+coverage
+.git
+.github
+.codex
+.omx
+test-output.txt
+npm-debug.log*
2 changes: 2 additions & 0 deletions .env.example
@@ -25,6 +25,8 @@ CROWCLAW_MODEL=gpt-4o
 # --- Dashboard ---
 # Optional token to protect the dashboard UI. If unset, access is unrestricted.
 CROWCLAW_DASHBOARD_TOKEN=
+# Public URL used by reverse proxies and deployment docs.
+CROWCLAW_PUBLIC_URL=

 # --- Skills & Personas ---
 # Directory to load local SKILL.md files from
24 changes: 24 additions & 0 deletions .github/workflows/ci.yml
@@ -27,9 +27,33 @@ jobs:
       - name: Typecheck
         run: npm run typecheck

+      - name: Cloudflare route parity
+        run: node scripts/audit-routes.mjs --check
+
       - name: Test
         run: npm test -- --reporter=verbose 2>&1 | tee test-output.txt

+      - name: Docker smoke
+        run: |
+          docker build -t crowclaw-ci .
+          cid=$(docker run -d -p 8787:8787 --name crowclaw-ci -e CROWCLAW_DASHBOARD_TOKEN=ci-smoke-token crowclaw-ci)
+          trap 'docker rm -f "$cid" >/dev/null 2>&1 || true' EXIT
+          for i in $(seq 1 20); do
+            if [ "$(docker inspect -f '{{.State.Running}}' "$cid" 2>/dev/null || echo false)" != "true" ]; then
+              docker logs "$cid" || true
+              docker ps -a
+              exit 1
+            fi
+            if curl -fsS http://127.0.0.1:8787/healthz; then
+              docker stop --time 10 "$cid"
+              exit 0
+            fi
+            sleep 2
+          done
+          docker logs "$cid" || true
+          docker ps -a
+          exit 1
+
       - name: Test Summary
         if: always()
         run: |
198 changes: 198 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,204 @@ All notable changes to CrowClaw will be documented in this file.
> Releases v0.2.0 through v0.3.4 were tracked in GitHub Releases. See
> https://github.com/subinium/hermes-agent-typescript/releases for details.

## [0.8.2] — 2026-05-03 — Audit + parity sweep: 53-issue release

This release lands two parallel investigations against the post-v0.8.1
codebase: the runtime/security hardening sweep that opened immediately after
v0.8.1 (Docker boot, Cloudflare deployment drift, OpenAI-compatible request
shapes, vision SSRF validation, persistent audit logs, optional OpenTelemetry
hooks), plus the v0.6/v0.7 audit-debt cleanup that had been carried open
across earlier releases. Both ship together because verifier passes on the
audit-debt items showed that the implementation contracts the audits had
specified were not yet complete on `main`; closing those issues required
finishing the contracts. 30 commits across 8 parallel sub-agents with strict
file ownership; ~203 files changed, +20.2k / -8.1k lines.

### Critical
- **#253** Docker now starts the built CLI server entrypoint instead of
loading a runtime module that never called `listen()`.
- **#256** Security audit events persist to JSONL under
`CROWCLAW_DATA_DIR/audit` by default, with file permissions, retention,
and graceful shutdown flushing.
- **#258** Security events now carry optional provenance (`agentId`,
`model`, `provider`, `presetId`), and audit logs expose `flush()` for
tests and consumers that need to drain pending events.
- **#261** `vision.analyze` validates HTTP(S) image URLs with DNS-aware
SSRF preflight before fetch or provider handoff.
- **#262** Docker image is multi-stage, non-root, `tini`-managed,
healthchecked, and volume-backed via `CROWCLAW_DATA_DIR=/data`.
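
The JSONL persistence and `flush()` behaviour described in #256 and #258 can be pictured with a minimal sketch. The class name `JsonlAuditLog`, the event `type` values, and the synchronous write are assumptions made for illustration; only the four provenance field names and the existence of `flush()` come from the entries above, and the shipped `SecurityAuditLog` may differ (for example by writing asynchronously and applying retention).

```typescript
import { appendFileSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical event shape; only the provenance field names come from the changelog.
interface SecurityEvent {
  type: string;
  timestamp: string;
  agentId?: string;
  model?: string;
  provider?: string;
  presetId?: string;
}

class JsonlAuditLog {
  private readonly filePath: string;
  private pending: SecurityEvent[] = [];

  constructor(filePath: string) {
    this.filePath = filePath;
  }

  // Buffer an event in memory; nothing touches disk until flush().
  record(event: SecurityEvent): void {
    this.pending.push(event);
  }

  // Append buffered events, one JSON object per line, and return how many were written.
  flush(): number {
    if (this.pending.length === 0) return 0;
    const lines = this.pending.map((event) => JSON.stringify(event)).join("\n") + "\n";
    // The mode only applies when the file is created: owner read/write.
    appendFileSync(this.filePath, lines, { mode: 0o600 });
    const written = this.pending.length;
    this.pending = [];
    return written;
  }
}

// Usage: write two events to a temporary directory, then read them back.
const auditDir = mkdtempSync(join(tmpdir(), "audit-sketch-"));
const auditFile = join(auditDir, "events.jsonl");
const auditLog = new JsonlAuditLog(auditFile);
auditLog.record({ type: "auth.denied", timestamp: new Date().toISOString(), agentId: "agent-1" });
auditLog.record({ type: "ssrf.blocked", timestamp: new Date().toISOString(), provider: "openai" });
const flushedCount = auditLog.flush();
const storedLines = readFileSync(auditFile, "utf8").trim().split("\n");
```

Exposing `flush()` lets a test or a shutdown handler drain pending events deterministically instead of waiting on a timer.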

### Provider / runtime correctness
- **#254** Wrangler release config is version-synchronized, and the D1
binding placeholder now points operators to the `wrangler d1 create`
replacement flow.
- **#257** Optional OpenTelemetry hooks record session, iteration, and
tool spans without requiring `@opentelemetry/api` at runtime.
- **#259** OpenAI-compatible providers send the right token and sampling
fields for chat-completions vs. Responses API and strip unsupported
temperature fields from reasoning models.
- **#260** Native structured output now supports Responses API
`text.format` JSON-schema requests and respects `requireStream` by
staying on the streaming path.
- **#287** Codex/OpenAI ChatGPT provider docs, defaults, and
structured-output tests now match the actual `gpt-5.5` and
`requireStream` behavior; the gpt-5 family is added to the native
`json_schema` guard and the codex JSDoc is corrected.
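
As a rough illustration of #259, the sketch below picks the token-limit field per API style and drops `temperature` for reasoning models. The field names `max_completion_tokens` and `max_output_tokens` are the ones OpenAI documents for Chat Completions and the Responses API; the function names and the prefix-based `isReasoningModel` check are assumptions, not the provider package's actual logic, and some OpenAI-compatible servers may still expect the older `max_tokens`.

```typescript
type ApiStyle = "chat-completions" | "responses";

interface GenerationOptions {
  model: string;
  maxTokens?: number;
  temperature?: number;
}

// Assumption: treat o-series model ids (o1, o3, ...) as reasoning models.
function isReasoningModel(model: string): boolean {
  return /^o\d/.test(model);
}

// Build only the token and sampling fields of a request body.
function buildSamplingFields(api: ApiStyle, opts: GenerationOptions): Record<string, number> {
  const fields: Record<string, number> = {};
  if (opts.maxTokens !== undefined) {
    const key = api === "responses" ? "max_output_tokens" : "max_completion_tokens";
    fields[key] = opts.maxTokens;
  }
  // Reasoning models reject custom temperature values, so the field is omitted.
  if (opts.temperature !== undefined && !isReasoningModel(opts.model)) {
    fields["temperature"] = opts.temperature;
  }
  return fields;
}

const responsesFields = buildSamplingFields("responses", {
  model: "gpt-4o",
  maxTokens: 256,
  temperature: 0.2,
});
const reasoningChatFields = buildSamplingFields("chat-completions", {
  model: "o3-mini",
  maxTokens: 256,
  temperature: 0.2,
});
```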

### Runtime, gateway, observability (v0.6 audit-debt cleanup)
- **#73** Gateway endpoint policy is now configurable through persisted
gateway config and schema fields (`policyTier`, `allowedEndpoints`),
applied to Discord outbound routes/delivery, and surfaced through
`gateway:policy_denied` events.
- **#74** Gateway token rotation, revocation, webhook mutation, and
pairing revocation enforce caller-scope containment before mutating
owner-scoped gateway secrets.
- **#82** Prometheus metrics moved to gated `/api/metrics`; OpenTelemetry
opts into `gen_ai_latest_experimental` semantic conventions and emits
stable GenAI span names for harness runs, tool loops, exec calls,
context assembly, and outbound delivery.
- **#96** Runtime startup restores latest `in_progress` checkpoints
across sessions, emits `session:resumed`, and the CLI exposes
`--no-resume` as an operator override.
- **#155** `runtime-node` entrypoint responsibility was split into
focused route, bootstrap, lifecycle, scheduler, plugin, startup,
support, and gateway modules while keeping `index.ts` as the
assembler.
- **#160** Terminal background process tracking is no longer
module-global; terminal state is owned by injected
per-runtime/per-registry sessions.
- **#163** `noUncheckedIndexedAccess` remains enabled across the
TypeScript base config.
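
The move away from module-global terminal state in #160 can be sketched as plain dependency injection. `TerminalSession` and `createTerminalTool` are hypothetical names chosen for this example; the point is only that each runtime owns its own process table.

```typescript
interface BackgroundProcess {
  id: string;
  command: string;
}

// One instance per runtime or registry; no module-level Map is shared.
class TerminalSession {
  private readonly processes = new Map<string, BackgroundProcess>();

  track(proc: BackgroundProcess): void {
    this.processes.set(proc.id, proc);
  }

  list(): BackgroundProcess[] {
    return Array.from(this.processes.values());
  }
}

// The tool receives its session instead of importing shared state.
function createTerminalTool(session: TerminalSession) {
  return {
    startBackground(command: string): string {
      const id = `proc-${session.list().length + 1}`;
      session.track({ id, command });
      return id;
    },
  };
}

const sessionA = new TerminalSession();
const sessionB = new TerminalSession();
createTerminalTool(sessionA).startBackground("npm run dev");
```

Because `sessionB` never sees the process started through `sessionA`, two runtimes in one Node process (or two tests in one file) cannot leak background-process state into each other.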

### Memory, skills, embedded protocol surfaces (v0.7 audit-debt + parity)
- **#90** Memory backends now have a plugin contract, runtime provider
selection, and a Honcho-compatible reference example.
- **#184** Memory edit/delete UX warns about sensitive data and
redaction, requires typed delete confirmation, and keeps preview/edit
affordances explicit.
- **#187** Memory records carry size/token metadata, and per-session
summaries include estimated memory cost.
- **#188** The Skills settings UI can preview installed and imported
skills through the existing `skill.preview` tool path.
- **#202**, **#203** Embedded MCP and ACP protocol servers receive the
live runtime session store and tool registry instead of disconnected
stubs.
- **#270** Cross-session memory recall now flows through an
`onSessionEnd` hook with optional LLM summarisation (Hermes parity).
- **#271** SkillManifest carries an sha256 content-hash that is verified
on load (NemoClaw parity).
- **#281** Local `memory.search` fallback uses deterministic
semantic-style sparse ranking instead of relying only on substring
matches.
- **#282** Delegate depth is typed, validated, and propagated through
core run inputs instead of legacy `__delegateDepth` casts.
- **#286** Episodic, semantic, and workspace memory now register as
distinct provider tiers in the memory contract (NeMo parity).
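
A minimal sketch of the load-time check in #271, assuming the manifest stores a hex-encoded SHA-256 digest of the skill content. The `contentHash` field name and the `verifySkill` helper are hypothetical; the real manifest layout is not shown in this changelog.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Hypothetical manifest shape for illustration only.
interface SkillManifestSketch {
  name: string;
  contentHash: string; // hex-encoded SHA-256 of the skill content
}

function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Returns true only when the recorded hash matches the content being loaded.
function verifySkill(manifest: SkillManifestSketch, content: string): boolean {
  const expected = Buffer.from(manifest.contentHash, "hex");
  const actual = Buffer.from(sha256Hex(content), "hex");
  return expected.length === actual.length && timingSafeEqual(expected, actual);
}

const skillBody = "# Example skill\nSummarise the open pull requests.\n";
const skillManifest: SkillManifestSketch = {
  name: "example-skill",
  contentHash: sha256Hex(skillBody),
};
const untouchedOk = verifySkill(skillManifest, skillBody);
const tamperedOk = verifySkill(skillManifest, skillBody + "rm -rf /\n");
```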

### Tools (provider, fetch, voice, image, retry)
- **#268** New `voice.stt` transcription tool (Hermes parity).
- **#269** Atropos RL environment adapter for trajectory rollout
(Hermes parity).
- **#272** Batch-runner gains expected-output assertions and accuracy
scoring (NeMo parity).
- **#273** Docker and SSH terminal execution modes are activated for
the sandbox executor (NemoClaw parity).
- **#274** Token counting replaces the `chars / 4` heuristic with
per-model encoding (`cl100k` / `o200k`).
- **#275** OpenAI requests structure system + tools as a stable prefix
for automatic prompt caching.
- **#277** OpenAI requests retry with exponential backoff on 429/5xx.
- **#278** `web.fetch` adds reader-mode markdown conversion plus a byte
cap.
- **#279** `web.search` replaces the DDG HTML scrape with a structured
provider API.
- **#288** `image.generate` and `vision.analyze` add multi-provider
fallback (Replicate, Gemini).
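
The retry behaviour in #277 might look roughly like the following. The base delay, cap, retry count, and helper names are assumptions; the changelog only states that 429 and 5xx responses are retried with exponential backoff. A production version would usually add jitter and honour `Retry-After`.

```typescript
// 429 (rate limited) and any 5xx are treated as transient.
function isRetryableStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Delay doubles with each attempt and is capped: 500, 1000, 2000, 4000, 8000, 8000, ...
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

const realSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Re-issue the request until it succeeds, fails permanently, or retries run out.
async function withRetry<T extends { status: number }>(
  request: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = realSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const response = await request();
    if (!isRetryableStatus(response.status) || attempt >= maxRetries) {
      return response;
    }
    await sleep(backoffDelayMs(attempt));
  }
}
```

Passing `sleep` as a parameter keeps the helper testable: a test can inject a no-op sleep and assert on the number of attempts without waiting for real delays.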

### Security and access
- **#265** Tailscale-aware bind plus opt-in tailnet allowlist for SSRF.
- **#266** Webhook and chat routes are rate-limited against credit-burn
DoS.
- **#267** Secret loading now includes a SOPS CLI-backed reference
source in addition to env, files, systemd credentials, and 1Password
references.
- **#276** `auth.json` schema is validated, and the runtime warns when
the file is world-readable.
- **#280** SSRF blocklist now covers `192.0.0.0/24` and the 6to4 / Teredo
IPv6 ranges.
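
For the ranges named in #280, a small sketch using Node's built-in `net.BlockList` shows what the added coverage means in practice. It assumes `2002::/16` for 6to4 and `2001::/32` for Teredo (their standard allocations), lists only those three ranges rather than a full SSRF blocklist, and checks IP literals only; a DNS-aware preflight like the one in #261 would run the same check on every resolved address.

```typescript
import { BlockList, isIP } from "node:net";

const ssrfBlockList = new BlockList();
ssrfBlockList.addSubnet("192.0.0.0", 24, "ipv4"); // IETF protocol assignments
ssrfBlockList.addSubnet("2002::", 16, "ipv6"); // 6to4
ssrfBlockList.addSubnet("2001::", 32, "ipv6"); // Teredo

// Fail closed: anything that is not a valid IP literal counts as blocked here.
function isBlockedAddress(address: string): boolean {
  const family = isIP(address);
  if (family === 0) return true;
  return ssrfBlockList.check(address, family === 4 ? "ipv4" : "ipv6");
}

const sampleResults = {
  protocolAssignment: isBlockedAddress("192.0.0.8"),
  publicV4: isBlockedAddress("8.8.8.8"),
  sixToFour: isBlockedAddress("2002:c000:204::1"),
  teredo: isBlockedAddress("2001:0:4136:e378:8000:63bf:3fff:fdd2"),
  publicV6: isBlockedAddress("2606:4700:4700::1111"),
};
```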

### Deployment (Docker, Cloudflare, self-host)
- **#263** `docker-compose.yml` and a Caddy template for VPS deployments.
- **#264** launchd plist plus Mac Mini self-host runbook (pmset,
caffeinate, Tailscale).
- **#283** WhatsApp and Signal channel adapters (Hermes parity).
- **#284** `crowclaw migrate import` CLI command for
settings/memories/skills (Hermes parity).
- **#285** Singularity / Apptainer HPC container backend for the sandbox
executor (Hermes parity).

### Dashboard polish (v0.8.1 verifier-gap follow-ups)
- **#243** Dashboard markdown rendering keeps `marked` + `dompurify` and
drops the eager highlight.js CDN load — highlight.js is now strictly
lazy-loaded only when the first code block renders.
- **#245** Visual reset removes legacy `--glass-*` dashboard tokens from
both UI source and the generated HTML; only modal overlays still use
`backdrop-filter`.
- **#249** A11y baseline adds toast live-region and reduced-motion test
coverage on top of the v0.8.1 contrast and skip-link work.
- **#250** Chat history renders a bounded incremental window so long
sessions stay responsive without forcing virtualization on every list.
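
One way to read the "bounded incremental window" in #250 is sketched below: render only the newest messages and grow the window in fixed steps when the user asks for older history. The function name, window size, and paging model are assumptions; the dashboard's actual implementation is not shown in this diff.

```typescript
// Return the newest `windowSize * (olderPagesLoaded + 1)` items, oldest first.
function visibleWindow<T>(messages: readonly T[], windowSize: number, olderPagesLoaded = 0): T[] {
  const count = Math.min(messages.length, windowSize * (olderPagesLoaded + 1));
  return messages.slice(messages.length - count);
}

const chatHistory = Array.from({ length: 10 }, (_, index) => `message ${index + 1}`);
const initialView = visibleWindow(chatHistory, 3);
const afterLoadOlder = visibleWindow(chatHistory, 3, 1);
const fullyExpanded = visibleWindow(chatHistory, 3, 10);
```

The rendered list stays bounded by default, and short sessions never pay for virtualization machinery they do not need.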

### Localisation
- **#204** Korean locale selection now carries into prompt-facing
runtime context, not only dashboard chrome.

### Cross-package contracts added
- `MemoryProvider` plugin contract — `@crowclaw/memory`
- `gateway:policy_denied`, `session:resumed` event-bus types —
`@crowclaw/core`
- `flush()` on `SecurityAuditLog` — `@crowclaw/security`
- `parseReasoningBlocks` / `requireStream` provider hooks extended for
Codex / Responses API — `@crowclaw/providers`
- `policyTier`, `allowedEndpoints` schema fields on persisted gateway
config — `@crowclaw/runtime-node`
- `voice.stt`, expanded `web.fetch` / `web.search`, multi-provider
`image.generate` / `vision.analyze` — `@crowclaw/tools`
- SOPS reference source for secret loading — `@crowclaw/security`
- `unsupported_on_workers` 501 envelope — `@crowclaw/runtime-cloudflare`

### New dependencies
- `tini` (Docker runtime) — PID 1 reaper for the hardened image.
- SOPS (optional, host-installed) — CLI-backed secret reference source.

### Verification
- `npm run build -- --pretty false` — clean
- `npm run typecheck` — clean
- `npm test` — **2,982 / 2,982** (238 files, no skips)
- `npm audit --audit-level=moderate` — 0 vulnerabilities
- Focused unresolved-gap tests — 132 passed
- Dashboard a11y/polish tests — 41 passed
- `npm run build:ui --workspace @crowclaw/web` — clean
- `npm run build:html --workspace @crowclaw/web` — clean
- `node scripts/audit-routes.mjs --check` — clean
- `rg` checks for legacy dashboard glass/highlight.js tokens — clean
- `git diff --check` — clean

### Caveats
- **#255** Cloudflare route parity is intentionally bounded. This sweep
adds a generated parity inventory (`docs/cloudflare-route-parity.md`)
and explicit `501 unsupported_on_workers` responses for Node-only
terminal/code bridge routes — not full Cloudflare parity. CI guards
against new `missing` rows.
- **#267** SOPS support is CLI-backed (the `sops` binary must be
installed on the host); native bindings are out of scope.
- Local Docker image smoke was not run because the Docker daemon was
not available in this workspace; CI now runs Docker build + `/healthz`
smoke on every PR.
- The release ledger (`docs/release-v0.8.2-worklog.md`) is retained at
branch-merge time to preserve the audit trail; it is not consumed at
runtime.
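
The explicit `501 unsupported_on_workers` response mentioned under #255 could take a shape like the one below. The body fields other than the `unsupported_on_workers` code, the route prefixes, and the helper names are all hypothetical; on Workers the result would typically be wrapped in `new Response(JSON.stringify(body), { status })`.

```typescript
interface UnsupportedOnWorkersBody {
  error: "unsupported_on_workers";
  route: string;
  message: string;
}

interface EnvelopeResult {
  status: 501;
  body: UnsupportedOnWorkersBody;
}

// Hypothetical prefixes for Node-only terminal and code bridge routes.
const NODE_ONLY_PREFIXES: readonly string[] = ["/api/terminal", "/api/code-bridge"];

function unsupportedOnWorkers(route: string): EnvelopeResult {
  return {
    status: 501,
    body: {
      error: "unsupported_on_workers",
      route,
      message: "This route requires the Node runtime and is not available on Cloudflare Workers.",
    },
  };
}

// Returns the 501 envelope for Node-only routes, or null to continue normal routing.
function routeOnWorkers(pathname: string): EnvelopeResult | null {
  const nodeOnly = NODE_ONLY_PREFIXES.some(
    (prefix) => pathname === prefix || pathname.startsWith(prefix + "/"),
  );
  return nodeOnly ? unsupportedOnWorkers(pathname) : null;
}

const terminalResult = routeOnWorkers("/api/terminal/exec");
const healthResult = routeOnWorkers("/healthz");
```

Answering with an explicit 501 rather than a generic 404 tells operators that the route exists in the Node runtime and is deliberately unavailable on Workers.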

## [0.8.1] — 2026-05-?? — Dashboard overhaul: 10-issue sweep against the v0.7.1 audit findings

The v0.7.1 release closed 18 wiring/correctness gaps in the dashboard. The
20 changes: 20 additions & 0 deletions Caddyfile
@@ -0,0 +1,20 @@
+{
+    email {$CROWCLAW_ACME_EMAIL}
+}
+
+{$CROWCLAW_DOMAIN} {
+    encode zstd gzip
+
+    header {
+        Strict-Transport-Security "max-age=31536000; includeSubDomains"
+        X-Content-Type-Options "nosniff"
+        X-Frame-Options "DENY"
+        Referrer-Policy "strict-origin-when-cross-origin"
+    }
+
+    reverse_proxy crowclaw:8787 {
+        header_up X-Forwarded-Host {host}
+        header_up X-Forwarded-Proto {scheme}
+        header_up X-Forwarded-For {remote_host}
+    }
+}
41 changes: 33 additions & 8 deletions Dockerfile
@@ -1,14 +1,39 @@
-FROM node:22-slim
+FROM node:22-slim AS builder
 WORKDIR /app

-COPY package.json package-lock.json* ./
-RUN npm install
+COPY package.json package-lock.json ./
+COPY packages ./packages
+COPY scripts ./scripts
+COPY tsconfig.json tsconfig.base.json vitest.config.ts ./
+RUN npm ci --no-audit

 COPY . .
-RUN npm run build
+RUN npm run build -- --force && npm prune --omit=dev

-# CrowClaw HTTP server listens on port 8787 by default (configurable via PORT env).
-# Starts the Node.js runtime server, not the CLI REPL.
+FROM node:22-slim AS runtime
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends tini \
+    && rm -rf /var/lib/apt/lists/* \
+    && groupadd -r crowclaw \
+    && useradd -r -g crowclaw -u 10001 -m -d /home/crowclaw crowclaw \
+    && mkdir -p /data \
+    && chown crowclaw:crowclaw /data
+WORKDIR /app
+
+COPY --from=builder --chown=crowclaw:crowclaw /app/package.json ./package.json
+COPY --from=builder --chown=crowclaw:crowclaw /app/package-lock.json ./package-lock.json
+COPY --from=builder --chown=crowclaw:crowclaw /app/node_modules ./node_modules
+COPY --from=builder --chown=crowclaw:crowclaw /app/packages ./packages
+COPY --from=builder --chown=crowclaw:crowclaw /app/scripts ./scripts
+
+USER crowclaw
+ENV CROWCLAW_DATA_DIR=/data \
+    NODE_ENV=production \
+    PORT=8787
+VOLUME ["/data"]
 EXPOSE 8787
-ENTRYPOINT ["node"]
-CMD ["packages/runtime-node/dist/index.js"]
+HEALTHCHECK --interval=30s --timeout=5s --start-period=20s --retries=3 \
+    CMD node -e "fetch('http://127.0.0.1:8787/healthz').then((r)=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))"
+# CrowClaw HTTP server: bind explicitly to 0.0.0.0 for container port publishing.
+ENTRYPOINT ["/usr/bin/tini", "--", "node"]
+CMD ["scripts/docker-serve.mjs"]
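
The contents of `scripts/docker-serve.mjs` are not part of this diff, so the following is only a sketch of the idea in the Dockerfile comment: bind to `0.0.0.0` so that a published container port reaches the server, and answer `/healthz` for the `HEALTHCHECK`. Both function names and the response body are assumptions; the default port and the `PORT` variable come from the Dockerfile.

```typescript
import { createServer, Server } from "node:http";

// Resolve the bind address and port; falls back to 8787 when PORT is unset or invalid.
function resolveListenOptions(env: Record<string, string | undefined>): { host: string; port: number } {
  const parsed = Number.parseInt(env["PORT"] ?? "", 10);
  const port = Number.isInteger(parsed) && parsed > 0 && parsed < 65536 ? parsed : 8787;
  // 127.0.0.1 would be unreachable through `docker run -p`, hence the wildcard address.
  return { host: "0.0.0.0", port };
}

// Minimal server that only answers the health probe; it does not listen until asked to.
function createHealthServer(): Server {
  return createServer((req, res) => {
    if (req.url === "/healthz") {
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify({ ok: true }));
      return;
    }
    res.writeHead(404);
    res.end();
  });
}

const defaultListen = resolveListenOptions({});
const customListen = resolveListenOptions({ PORT: "9000" });
```

A container entrypoint would then call `createHealthServer().listen(port, host)` with the resolved options.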