fix(container): APP_HOST defaults to 127.0.0.1 (HCG tier-2 E1 prereq) #132
CI auto-trigger anomaly: manual dispatch worked

PR-triggered workflows did not fire on this branch (no

Hypothesis: estate-wide CI concurrency-pool saturation from the active sweep122 campaign ([memory:

Manual dispatch result for Governance (the most relevant check given the file types in this PR: TOML + shell + YAML): all 6 jobs green at https://github.com/hyperpolymath/boj-server/actions/runs/26156354986

Other workflows (Dogfood Gate, CodeQL, Secret Scanner) don't support

Leaving the PR DRAFT until checks land.

🤖 Generated with Claude Code
Tightens three sites that feed the Zig adapter binary's `--host` flag in production deployments, materialising the ADR-0004 §1 invariant that BoJ's back-side bind is not externally routable when fronted by http-capability-gateway (HCG tier-2).

This is action item #7 from the Phase E consumer-side audit on hyperpolymath/standards#100, companion to actions #6 (boj-server#130, Cowboy bind in the Elixir path) and #8 (boj-server#131, k8s Service ClusterIP). Together the three layers give defence in depth: Elixir Cowboy binds loopback, Zig adapter binds loopback, k8s Service is internal-only.

Changes:

- `stapeln.toml` [targets.production]: `APP_HOST = "[::]"` -> `APP_HOST = "127.0.0.1"`. Adds a comment block explaining the Phase E posture and the override path for legacy deployments.
- `container/entrypoint.sh` lines 40 + 140: `${APP_HOST:-[::]}` -> `${APP_HOST:-127.0.0.1}`. Adds a comment at the exec line pointing to ADR-0004 + the runbook.
- `container/compose.prod.yaml` services.boj-rest.environment: `APP_HOST: "[::]"` -> `APP_HOST: "127.0.0.1"`. Adds an inline comment block.

Audit-residue follow-ups (deliberately NOT in this PR):

- `container/Containerfile` line 125: `ENV PHX_HOST=0.0.0.0` is vestigial; nothing in the codebase reads PHX_HOST (verified via grep). Leave alone or remove in a hygiene PR; not load-bearing.
- Unifying APP_HOST (Zig adapter) and BOJ_BIND_IP (Elixir Cowboy) into one envelope is broader scope; file as a separate issue if the divergence proves annoying in operation.

Override path for legacy/standalone deployments without HCG in front: set APP_HOST=0.0.0.0 (IPv4 all-interfaces) or APP_HOST=:: (IPv6 all-interfaces) in the deployment config. The in-repo defaults remain loopback.

Refs hyperpolymath/standards#100
Refs hyperpolymath/standards#91
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
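The default-and-override behaviour in the change list can be sketched in plain POSIX shell. This is a minimal illustration, not the contents of `container/entrypoint.sh`: the function name `resolve_host` is hypothetical, and only the `${APP_HOST:-127.0.0.1}` expansion is taken from the commit message.

```shell
#!/bin/sh
# Hypothetical helper (not from the repo): resolve the bind address the same
# way the ${APP_HOST:-127.0.0.1} expansion does. With the ":-" form, both an
# unset and an empty APP_HOST fall back to loopback.
resolve_host() {
    printf '%s\n' "${APP_HOST:-127.0.0.1}"
}

# No override in the environment: the loopback default applies.
unset APP_HOST
echo "default:  $(resolve_host)"

# Legacy/standalone deployment without HCG in front: explicit override.
APP_HOST=0.0.0.0
echo "override: $(resolve_host)"

# Leave the environment as we found it.
unset APP_HOST
```

Because the expansion uses `:-` rather than `-`, an operator who exports `APP_HOST=""` by accident still gets the loopback bind instead of an empty `--host` argument.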
e44b279 to 5830a46
…ds#100/#91) (#138)

## Summary

Adds a second 2026-05-20 entry to `.machine_readable/6a2/STATE.a2ml` `[session-history]` documenting the afternoon HCG Phase E first-session output. The morning Tier C entry (already in main) stays in place; this new entry sits **above** it per the newest-first convention.

`Refs hyperpolymath/standards#100` (Phase E), `Refs hyperpolymath/standards#91` (HCG tier-2 channel parent). **NOT Closes**.

## What's in

A single new TOML entry in `[session-history] entries = [ ... ]` summarising the afternoon's deliverables:

- PR `#128` (MERGED): `docs/integration/hcg-tier2-rollout-runbook.md` (E5 rollout-and-rollback runbook, 308 lines, `!OWNER:` markers in §1.3 + §4)
- PR `#130` (MERGED): Cowboy bind `127.0.0.1` default + `BOJ_BIND_IP` env override (audit #6)
- PR `#131` (MERGED): k8s Service `LoadBalancer → ClusterIP` (audit #8)
- PR `#132` (MERGED): container `APP_HOST` defaults across `stapeln.toml` + `entrypoint.sh` + `compose.prod.yaml` (audit #7)
- Issue `#135` (filed): k8s NetworkPolicy follow-up (Low priority, Phase E acceptance non-critical)
- Defence in depth: 3 independent loopback layers (Elixir Cowboy + Zig adapter + k8s Service)
- Phase C §3 invariant 3 correction: confirmed via `git log` that the deny clause landed in `boj-server#106 (40e46f6)`; the channel-status comment claiming it was owner-gated was stale.

The entry also records the **Phase E gating posture**: E1/E2/E3/E4 wiring + Trustfile `PENDING → DEPLOYED` flip are all explicitly gated on Phase D-3 (regression alert armed) + D-4 (real baseline numbers populated), per the runbook §1.1. The afternoon session shipped only the Phase-D-independent artefacts.

## Why a separate PR (not amended into another)

All four code PRs (#128/#130/#131/#132) are already merged. The STATE.a2ml entry parallels the morning Tier C entry (already in main from the morning session), and the convention is per-session per-entry. Keeping this as its own doc PR is the cleanest record.

## Verification

- TOML syntax: valid (single new `{ date = "...", description = "..." }` entry prepended).
- Linting: `validate-a2ml` action will run on PR.

## Risk

**Negligible.** Doc-only; no code or workflow changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔍 Hypatia Security Scan

Findings: 30 issues detected

[
{
"reason": "Stale AI session file -- delete",
"type": "stale",
"file": "GEMINI.md",
"action": "delete",
"rule_module": "root_hygiene",
"severity": "medium"
},
{
"reason": "Issue in quality.yml",
"type": "missing_workflow",
"file": "quality.yml",
"action": "create",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Issue in security-policy.yml",
"type": "missing_workflow",
"file": "security-policy.yml",
"action": "create",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Action hyperpolymath/standards/.github/workflows/governance-reusable.yml@main needs attention",
"type": "unpinned_action",
"file": "governance.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/sanctify-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/academic-workflow-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/fireflag-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/ephapax-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/bofig-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/hesiod-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
}
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence
)

## Summary

- New `k8s/networkpolicy.yaml` restricting pod-network ingress to BoJ to only pods labelled `app: http-capability-gateway`.
- Fourth, finest-grained layer on top of the existing three (Cowboy loopback bind #130, Zig adapter APP_HOST #132, ClusterIP Service #131).
- Header documents CNI requirement, override pattern for non-HCG-fronted deployments, and the kubelet health-probe caveat to verify in staging.
- CHANGELOG entry under `### Added`.

## Why

Phase E acceptance is already satisfied by the three loopback layers above. ClusterIP makes BoJ unreachable from outside the cluster, but does NOT prevent a compromised neighbour pod that knows the ClusterIP from talking to BoJ. NetworkPolicy closes that gap. It also acts as a safety net if a future overlay re-introduces `type: NodePort` or `type: LoadBalancer`; the pod-network restriction still holds.

Per ADR-0004 §1 invariant 4 ("not externally routable"), three independent layers must now be violated before BoJ's back-side surface is reachable from anywhere other than HCG.

## Test plan

- [ ] YAML parses (`yaml.safe_load_all` confirmed; 1 `NetworkPolicy` doc, name `boj-server-ingress`).
- [ ] `kubectl apply --dry-run=client -f k8s/networkpolicy.yaml` lints clean (CI runner; local lacked kubectl).
- [ ] Staging smoke test (post-merge, before relying on this layer):
  - With the policy applied, `curl` from another pod (not labelled `app: http-capability-gateway`) to BoJ's ClusterIP times out.
  - From an HCG-labelled pod, `curl` succeeds.
- [ ] Verify CNI plugin enforces NetworkPolicy (Calico/Cilium/Weave-NetPol). Flannel-no-VXLAN is silent no-op (documented in header).

Closes #135

Refs hyperpolymath/standards#100, hyperpolymath/standards#91.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

Aligns `tests/e2e_full.sh` with the 6-route Elixir router (`elixir/lib/boj_rest/router.ex`). Implements option A from #151 (test follows the router as source of truth).

- Drop step 6a `POST /cartridges/feedback-mcp/load`: cartridges auto-load via `BojRest.Catalog` at boot; there is no `/load` route.
- Singularise `/cartridges/feedback-mcp/invoke` → `/cartridge/feedback-mcp/invoke` everywhere (8 sites).
- Drop step 7 `POST /order`: no `/order` endpoint exists. The order-ticket flow is covered at the Zig FFI layer by `tests/order_ticket_e2e.sh`, so coverage is preserved.
- Replace the step 9 "invalid order returns error" check with the analogous "invoke against unknown cartridge returns 404" check, which exercises the same negative-path semantics on a route that exists.
- Docstring updated; `# See #151` comments left as audit trail at the three removed-step sites.

## Why option A

The router (`router.ex:1–180`) was hardened in PRs #130–#132 to its current 6-route shape with singular `:name` and unified dispatch through `POST /cartridge/:name/invoke`. Adding stale `/load` and `/order` aliases (option B) would re-introduce coupling to a previous-generation API and need a fresh ADR; the test was the artefact lagging the spec, not the other way round.

## Test plan

- [ ] `bash -n tests/e2e_full.sh`: syntax check passes (verified locally).
- [ ] Greppable invariant: `grep -E '/cartridges/[^/]+/(load|invoke)|/order' tests/e2e_full.sh` returns only documentation comments (4 hits, all `# …`).
- [ ] CI E2E job (`e2e.yml`) runs the script against the Elixir router and exits 0 (or with only the pre-existing 3 MCP failures, if any survive #150).
- [ ] Manual: `bash tests/e2e_full.sh` against a local server prints "E2E Full: ALL PASS".

Closes #151

Refs #150

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
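The greppable invariant from the test plan can be tried against a throwaway file. This is a sketch under stated assumptions: the sample lines and the helper name `count_live_stale_routes` are illustrative and are not taken from `tests/e2e_full.sh`; only the `grep -E` pattern comes from the test plan.

```shell
#!/bin/sh
# Pattern copied from the test plan: the plural /cartridges/<name>/(load|invoke)
# form, or any /order reference.
STALE_PATTERN='/cartridges/[^/]+/(load|invoke)|/order'

# Hypothetical helper: count pattern hits that are NOT comment lines.
# "|| true" keeps the exit status at 0 when the count is zero, since
# grep -c exits non-zero when nothing is counted.
count_live_stale_routes() {
    grep -E "$STALE_PATTERN" "$1" | grep -vcE '^[[:space:]]*#' || true
}

# Illustrative stand-in for the test script: two audit-trail comments and one
# call that already uses the singular route.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# See #151: step 6a POST /cartridges/feedback-mcp/load removed
curl -s -X POST "$BASE/cartridge/feedback-mcp/invoke"
# See #151: step 7 POST /order removed
EOF

echo "live stale references: $(count_live_stale_routes "$sample")"
rm -f "$sample"
```

Filtering out comment lines turns the "returns only documentation comments" wording into a number that a CI step could compare against zero, instead of relying on a reader to inspect the hits.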
…ards#91 / #100)

Refreshes docs/integration/hcg-tier2-rollout-runbook.md from v0.1 (draft, 2026-05-20, pre Phase-D) to v0.2 reflecting the current state of the single-lane channel rooted at standards#91:

- §1.1 Phase D deliverables: tick D-1..D-3 + D-4 bootstrap with http-capability-gateway PR refs (#12 / #14 / #22 / #26 / #30) and the boj-server D-1 load-profile (#168) that joint-closed standards#99 on 2026-06-01. The one remaining open item is the owner-driven perf-rebaseline workflow dispatch + `_status: scaffold-placeholder -> active` flip; called out explicitly rather than left as a stale unchecked checkbox.
- §1.4 BoJ-side prereqs: tick the three loopback-bind layers (#130 / #131 / #132), the Phase C TrustPolicy clause (#106), the NetworkPolicy (#173), and the SSE-route policy coverage (#165). The Trustfile `tier_2_gateway.status: PENDING` line stays intentionally unchecked - it's the §6.4 last-action target.
- §1.5 Gateway-side prereqs: tick the new `container/gateway-deploy.k9.ncl` from http-capability-gateway#38 (2026-06-03), record what stays PLACEHOLDER until cerro-torre signing runs, and expand the smoke-test entry with the concrete allow/deny sequence boj-server#165 deferred.
- Header banner: replace the stale "Phase D has merged the scaffold only" Phase-D-dependency note with a current-state summary, bump version 0.1 -> 0.2, date 2026-05-20 -> 2026-06-08.
- CHANGELOG.md: Documentation entry under [Unreleased] summarising the refresh.

No code, infrastructure, or runtime behaviour changes. The runbook is the operator-facing source of truth for what's gating the next Phase E owner action; the drift it had was making "what's still open" harder to read at a glance.

Refs hyperpolymath/standards#91
Refs hyperpolymath/standards#100
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
)

## Summary

Lands `config/gateway-policy-boj.yaml`, the **live** Verb Governance Spec the HCG tier-2 gateway loads via `POLICY_PATH` in staging (§2.1) and production (§3.1) per the rollout runbook. The Phase A worked example (`config/gateway-policy-boj-example.yaml`) is retained as the documentation artefact; the live file is now the operational one. Closes the example→live promotion item on the Phase E §1.5 checklist.

Single-lane HCG tier-2 channel (`standards#91`). Phase A (#96), B (#97), C (#98), D (#99) are joint-closed; Phase E (`standards#100`) is the active phase, with multiple artefacts gating closure (§6.4 Trustfile flip is the last). This PR lands one tractable artefact; staging soak (§2), production traffic split (§3) and the §6.4 flip remain owner-driven.

## What this PR lands

- **`config/gateway-policy-boj.yaml`**: live policy file. Content-identical to `gateway-policy-boj-example.yaml` at promotion time. Header rewritten to reflect its live-file role (operational artefact, not pedagogical), with `DEFAULT-DENY INVARIANT` reframed from "Phase A check" to "permanent invariant - must hold for every future gateway release". DSL v1 conformance preserved; all 28 routes (`global_verbs: [GET, POST]`; per-route `verbs`, `exposure`, `name`, `narrative`; `stealth_profile` on internal routes; top-level `stealth: { enabled: true, status_code: 404 }`) carried forward unchanged.
- **Runbook §1.5**: flips the trailing "still to be promoted from this example before §3.1" note (on the existing `[x]` example-in-place line) to a discrete `[x]` item recording the live file's existence and the divergence policy ("future BoJ-surface evolution lands in the live file; the example remains as the worked-example artefact").
- **Runbook §2.1 step 2**: switches staging `POLICY_PATH` from the example to the live file so staging exercises the same artefact that production will. Production §3.1 (which inherits §2.1's environment with the traffic-shift mechanism overlaid) needs no change.
- **Runbook header**: version 0.2 → 0.3; status line updated to acknowledge the live-policy promotion.

## What this PR deliberately does NOT do

- **Close `standards#100`.** Per runbook §6.5 the joint-close happens after the §6.4 Trustfile flip (`tier_2_gateway.status: PENDING → DEPLOYED`), which itself follows the §3.3 100% production-soak window. Using `Refs` not `Closes` to match the established Phase E pattern (PRs #38, and Phase D PRs #14, #22, #26, #30, all of which `Refs`'d their phase issue and the owner joint-closed the issue once the final artefact landed). This deliberately diverges from the dispatch brief's literal "Closes hyperpolymath/standards#<phase-issue-number>" line in favour of the canonical runbook §6.5 close-out discipline that the brief itself points to as the source of truth ("using the canonical sources"). The owner remains the sole closer of `standards#100`.
- **Touch the HCG deploy spec.** `container/gateway-deploy.k9.ncl` in `hyperpolymath/http-capability-gateway` (PR #38) reads `POLICY_PATH` at deploy time from the env, so the live-file cut-over is a runbook + config artefact change on the BoJ side, not a deploy-spec change on the gateway side. No companion PR on the gateway repo.
- **Diverge the live file from the example.** At promotion the two files are content-identical. Future divergence is intentional and the live file is authoritative; the example may be intentionally simpler.
- **Trigger any deploy.** No traffic shift, no staging cut-over, no §6.4 flip happens at merge time. This is a static artefact landing.
- **Update the deploy spec's `POLICY_PATH` default.** The deploy spec carries env-var declarations; the live-file path is operator-supplied at deploy time.

## Verification

- [x] DSL v1 conformance: `dsl_version: "1"`; `governance.global_verbs` is `[GET, POST]`; every route has a non-empty `verbs`; `exposure ∈ {public, authenticated, internal}`; `stealth.enabled` boolean, `stealth.status_code: 404` in 100..599.
- [x] All 28 example routes preserved unchanged in the live file (route count, `name`s, paths, verbs, exposures, narratives).
- [x] SPDX header `MPL-2.0` matches repo convention (config/, docs/).
- [x] Runbook §1.5 and §2.1 cross-references to `gateway-policy-boj.yaml` and `gateway-policy-boj-example.yaml` resolve.
- [ ] Manual: `mix gateway.validate config/gateway-policy-boj.yaml` (gateway-side; can be run by the operator before §2.1 stand-up; see runbook §1.5 last open item, smoke-test).

## Channel position

```
standards#91 (parent, open)
├── #96 Phase A - closed (boj-server: contract + policy-authoring + example; gateway: -)
├── #97 Phase B - closed (gateway#10: mTLS primary path)
├── #98 Phase C - closed (gateway#11: strip; boj-server#106: TrustPolicy clause)
├── #99 Phase D - closed (boj-server#168 on 2026-06-01; gateway#12/#14/#22/#26/#30)
└── #100 Phase E - IN PROGRESS
    ├── E5 runbook draft - boj-server#128 (landed; rehearsal pending)
    ├── E1 loopback prereqs - boj-server#130/#131/#132/#165/#173 (landed)
    ├── E1 deploy spec - http-capability-gateway#38 (landed)
    ├── E1 live policy promotion - THIS PR (in review)
    ├── E1 .ctp signing - owner follow-up
    ├── E2 staging cut-over - owner follow-up
    ├── E3 telemetry verification - owner follow-up
    ├── E4 production rollout - owner follow-up
    └── §6.4 Trustfile flip + §6.5 joint-close - owner-only
```

Refs hyperpolymath/standards#91

Refs hyperpolymath/standards#100

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---

_Generated by [Claude Code](https://claude.ai/code/session_012FiVM8R8FWBgBsUGpnXTZM)_

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
## Summary

Tightens three sites that feed the Zig adapter binary's `--host` flag in production deployments, materialising the ADR-0004 §1 invariant that BoJ's back-side bind is not externally routable when fronted by `http-capability-gateway` (HCG tier-2).

This is action item #7 from the Phase E consumer-side audit. The scope expanded during implementation from one site (the audit named `stapeln.toml`) to three sites: `entrypoint.sh` and `compose.prod.yaml` had the same `[::]` default that the audit missed. All three sites feed into the same Zig-adapter `--host` flag, so they need to flip together for the change to actually take effect at runtime.

Companion PR to #130 (Cowboy bind tightening in the Elixir path) and #131 (k8s Service ClusterIP). Together the three give defence in depth: Elixir Cowboy binds loopback AND Zig adapter binds loopback AND k8s Service is internal-only.

`Refs hyperpolymath/standards#100` (NOT Closes; joint-close is owner-only). `Refs hyperpolymath/standards#91`.

## What's in

- `stapeln.toml` `[targets.production]`: `APP_HOST = "[::]"` → `APP_HOST = "127.0.0.1"` + comment block.
- `container/entrypoint.sh` line 40 (log) + line 140 (`exec` invocation): `${APP_HOST:-[::]}` → `${APP_HOST:-127.0.0.1}` + comment at exec line.
- `container/compose.prod.yaml` `services.boj-rest.environment`: `APP_HOST: "[::]"` → `APP_HOST: "127.0.0.1"` + comment block.
- `CHANGELOG.md`: `### Changed` entry under `[Unreleased]`.

## Override path for legacy/standalone use

Deployments without HCG in front: set `APP_HOST=0.0.0.0` (IPv4 all-interfaces) or `APP_HOST=::` (IPv6 all-interfaces) in your deployment config. The in-repo defaults remain loopback.

## Audit-residue follow-ups deliberately NOT in this PR

- `container/Containerfile` line 125: `ENV PHX_HOST=0.0.0.0` is vestigial. Nothing in the codebase reads `PHX_HOST` (verified by `grep -rn "PHX_HOST\|phx_host" --include="*.ex" ...` returning empty). Leftover from a former Phoenix incarnation. Safe to leave alone; can be removed in a hygiene PR if desired.
- Unifying `APP_HOST` (Zig adapter) and `BOJ_BIND_IP` (Elixir Cowboy, from #130) into one envelope is broader scope. The divergence exists because they feed different binaries built by different toolchains. If it proves annoying in operation, file a separate issue.

## Why DRAFT

Same reason as #131: this is a behaviour change for anyone running the stapeln-built production container or `compose.prod.yaml` as-is and relying on the default `[::]` for external access. Owner gates merge on confirming no such reliance, or on coordinating with anyone who needs the migration path (HCG-in-front, or explicit `APP_HOST=0.0.0.0` override).

## Test plan

- `stapeln.toml` parses as valid TOML (syntax preserved).
- `container/entrypoint.sh` runs through `sh -n` without syntax error (no syntax change, just literal substitution).
- `container/compose.prod.yaml` parses as valid YAML.
- `[::]` default; flip from DRAFT to ready.

## Risk

Low for the codebase, medium for ops. No Elixir / Zig / Idris2 / cartridge logic touched; CI should not show any regressions. Ops risk: anyone whose runbook assumes the container exposes BoJ on all interfaces by default will need to set `APP_HOST=0.0.0.0` explicitly. Reasonable default for the Phase E posture; documented override path.

🤖 Generated with Claude Code
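Because the three sites only take effect when they flip together, a quick consistency check is one way to catch a stray `[::]` default. The sketch below is illustrative only: the helper name `find_wildcard_defaults`, the `boj-adapter` binary name, and the one-line sample files are all hypothetical stand-ins, not the real contents of `stapeln.toml`, `entrypoint.sh`, or `compose.prod.yaml`.

```shell
#!/bin/sh
# Hypothetical helper: print every line in the given files that still contains
# the literal "[::]". -F treats the pattern as a fixed string, so the brackets
# are not read as a character class. The extra /dev/null argument makes grep
# print file names even when only one real file is passed.
find_wildcard_defaults() {
    grep -nF -- '[::]' "$@" /dev/null || true
}

# Illustrative stand-ins for the three sites; the compose file is left stale
# on purpose to show what a missed site looks like.
workdir=$(mktemp -d)
printf '%s\n' 'APP_HOST = "127.0.0.1"' > "$workdir/stapeln.toml"
printf '%s\n' 'exec boj-adapter --host "${APP_HOST:-127.0.0.1}"' > "$workdir/entrypoint.sh"
printf '%s\n' '      APP_HOST: "[::]"' > "$workdir/compose.prod.yaml"

# Any output here means at least one site still carries the old default.
find_wildcard_defaults "$workdir"/*
rm -rf "$workdir"
```

A plain fixed-string search also matches comments that mention `[::]` (for example an override note), so any hits still need a quick read before being treated as a missed site.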