compiler: string-wall slice 8b — type-directed string ++ lowering#578

Merged
hyperpolymath merged 1 commit into main from claude/cool-keller-gr5sl on Jun 13, 2026

@hyperpolymath

Owner

Phase F slice 8b — type-directed string ++ lowering

This is the full fix that the slice-8a guard (#575) stood in for. String ++ now lowers correctly and completely to wasm, including pure variable-to-variable a ++ b, which the syntactic guard could not reach.

The channel (type-directed elaboration)

  • ast.ml: new ExprStringConcat of expr * expr (never produced by the parser).
  • typecheck.ml: synth records each ++ node it types as String concat by physical identity (string_concat_sites); elaborate_string_concat rewrites exactly those nodes to ExprStringConcat. Physical-identity keying is sound because typecheck and codegen run over the same prog object (parse_with_face's lowered prog, shared by resolve/typecheck/codegen); ExprBinary carries no span and same-text ++ occurrences are value-equal, so == is the correct key.
  • bin/main.ml: the wasm path runs elaborate_string_concat after typecheck and before Opt.fold_constants_program. The interpreter and non-wasm backends keep the original prog (String ++ remains ExprBinary _ OpConcat _), so the oracle is unchanged and only the wasm backend sees the new node.
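
To make the physical-identity keying concrete, here is a minimal TypeScript sketch of the same idea. It is not the compiler's OCaml code: the node shapes, `recordIfString`, and `elaborate` are hypothetical stand-ins, and a `Set` of objects (which compares by reference) plays the role of OCaml's `==`.

```typescript
// Hypothetical mini-AST; node names echo the PR, the shapes are assumptions.
type Expr =
  | { kind: "Var"; name: string }
  | { kind: "Binary"; op: "++"; lhs: Expr; rhs: Expr }
  | { kind: "StringConcat"; lhs: Expr; rhs: Expr };

// A Set of objects compares by reference, the analogue of OCaml's ==.
const stringConcatSites = new Set<Expr>();

// Stand-in for the typechecker: records a ++ node only when it was typed as String.
function recordIfString(node: Expr, isString: boolean): void {
  if (node.kind === "Binary" && isString) stringConcatSites.add(node);
}

// Rewrites exactly the recorded nodes; every other node is rebuilt unchanged.
function elaborate(e: Expr): Expr {
  if (e.kind === "Var") return e;
  const lhs = elaborate(e.lhs);
  const rhs = elaborate(e.rhs);
  if (e.kind === "StringConcat") return { kind: "StringConcat", lhs, rhs };
  if (stringConcatSites.has(e)) return { kind: "StringConcat", lhs, rhs };
  return { kind: "Binary", op: "++", lhs, rhs };
}

// Two structurally identical `a ++ b` nodes: one typed as String, one as a list.
const strCat: Expr = {
  kind: "Binary", op: "++",
  lhs: { kind: "Var", name: "a" }, rhs: { kind: "Var", name: "b" },
};
const listCat: Expr = {
  kind: "Binary", op: "++",
  lhs: { kind: "Var", name: "a" }, rhs: { kind: "Var", name: "b" },
};
recordIfString(strCat, true);
recordIfString(listCat, false);

const elaboratedStr = elaborate(strCat);
const elaboratedList = elaborate(listCat);
```

Because the two nodes are value-equal, a structural key could not tell them apart; only the reference distinguishes the String site from the list site. This only works if the recording pass and the rewriting pass see the same objects, which is the soundness condition stated above.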

The lowering (codegen.ml)

Byte concat: with la and lb the byte lengths of a and b, allocate 4 + la + lb bytes, write the length word, then copy a's bytes followed by b's. This mirrors the list-concat handler but uses 1-byte elements and a single length word instead of 4-byte i32 elements. Routing strings through that i32-element copy was exactly the bug: the list path copied a string's [len][utf8] as i32 elements, so "ab" ++ "cd" read byte 2 as the length word of "cd" (= 2) instead of 'c' (= 99).
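
The layout and the failure mode can be modelled outside the compiler. The TypeScript sketch below simulates linear memory with a `Uint8Array`; the bump allocator, `storeAscii`, and both concat functions are hypothetical, and the reproduction of the old symptom assumes "cd" is allocated directly after "ab" with no padding.

```typescript
// Linear-memory model: a string is [len: i32 little-endian][utf8 bytes].
const mem = new Uint8Array(256);
const view = new DataView(mem.buffer);
let heapTop = 0; // hypothetical bump allocator, no alignment padding

function alloc(bytes: number): number {
  const ptr = heapTop;
  heapTop += bytes;
  return ptr;
}

// Stores an ASCII literal in the [len][bytes] layout and returns its pointer.
function storeAscii(s: string): number {
  const ptr = alloc(4 + s.length);
  view.setInt32(ptr, s.length, true);
  for (let i = 0; i < s.length; i++) mem[ptr + 4 + i] = s.charCodeAt(i);
  return ptr;
}

// Byte concat: allocate 4 + la + lb, write the length word, copy a's then b's bytes.
function concatBytes(a: number, b: number): number {
  const la = view.getInt32(a, true);
  const lb = view.getInt32(b, true);
  const out = alloc(4 + la + lb);
  view.setInt32(out, la + lb, true);
  mem.copyWithin(out + 4, a + 4, a + 4 + la);
  mem.copyWithin(out + 4 + la, b + 4, b + 4 + lb);
  return out;
}

// List-style concat misapplied to strings: every element is treated as 4 bytes wide.
function concatAsI32List(a: number, b: number): number {
  const la = view.getInt32(a, true);
  const lb = view.getInt32(b, true);
  const out = alloc(4 + (la + lb) * 4);
  view.setInt32(out, la + lb, true);
  mem.copyWithin(out + 4, a + 4, a + 4 + la * 4);
  mem.copyWithin(out + 4 + la * 4, b + 4, b + 4 + lb * 4);
  return out;
}

const ab = storeAscii("ab");
const cd = storeAscii("cd");
const good = concatBytes(ab, cd);
const bad = concatAsI32List(ab, cd);

// Payload byte 2 sits at offset 4 (length word) + 2.
const goodByte2 = mem[good + 6]; // 99, the code of 'c'
const badByte2 = mem[bad + 6];   // 2, the low byte of the length word of "cd"
```

In this model the i32-wide copy of "ab" runs past its two payload bytes into the header of "cd", which is how payload byte 2 ends up holding 2.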

Effect-ordinal parity (effect_sites.ml)

ExprStringConcat recurses like ExprBinary and is not counted as an ExprApp call site, so effect-ordinals stay identical between interp (sees ExprBinary) and wasm (sees ExprStringConcat) — avoiding a #555-class desync. An intrinsic-call encoding (ExprApp "__string_concat") would have shifted the ordinals; the dedicated node avoids that. opt.ml folds sub-expressions; interp.ml handles it defensively.
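
A small model shows why the dedicated node keeps ordinals aligned while an intrinsic call would not. This TypeScript sketch is an assumption-laden illustration: the `Expr` shapes, the pre-order numbering in `callSites`, and the callee names `read` and `emit` are all hypothetical, not taken from effect_sites.ml.

```typescript
// Hypothetical AST with just enough structure to count call sites.
type Expr =
  | { kind: "Var"; name: string }
  | { kind: "App"; fn: string; args: Expr[] }
  | { kind: "Binary"; lhs: Expr; rhs: Expr }
  | { kind: "StringConcat"; lhs: Expr; rhs: Expr };

// Returns callee names indexed by ordinal; only App nodes consume an ordinal.
function callSites(e: Expr, out: string[] = []): string[] {
  switch (e.kind) {
    case "Var":
      break;
    case "App":
      out.push(e.fn); // assumed pre-order numbering
      e.args.forEach((a) => callSites(a, out));
      break;
    case "Binary":
    case "StringConcat":
      // Both node kinds recurse identically and neither is counted.
      callSites(e.lhs, out);
      callSites(e.rhs, out);
      break;
  }
  return out;
}

const read: Expr = { kind: "App", fn: "read", args: [] };
const emit: Expr = { kind: "App", fn: "emit", args: [] };

// What the interpreter sees, what wasm sees, and the rejected intrinsic encoding.
const interpView: Expr = { kind: "Binary", lhs: read, rhs: emit };
const wasmView: Expr = { kind: "StringConcat", lhs: read, rhs: emit };
const intrinsicView: Expr = { kind: "App", fn: "__string_concat", args: [read, emit] };

const interpOrdinals = callSites(interpView);
const wasmOrdinals = callSites(wasmView);
const intrinsicOrdinals = callSites(intrinsicView);
```

Under these assumptions the first two traversals assign `read` ordinal 0 and `emit` ordinal 1, while the intrinsic encoding takes ordinal 0 for itself and pushes both real call sites up by one.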

The 8a guard is retained as a backstop: any String ++ reaching codegen un-elaborated still errors loudly rather than emitting garbage.

Tests / verification

  • tests/codegen/string_concat.{affine,mjs} — executable wasm parity, byte-exact via the slice-1 reader: the "ab" ++ "cd" byte-2 = 99 regression (was 2), the var-var case the guard could not catch, chained a ++ b ++ c, and empty operands (oracle 6513269).
  • test/test_e2e.ml "E2E String-wall slice 8 guard" gains a lowers-after-elaboration case.
  • Full run_codegen_wasm_tests.sh green incl. list_concat + slices 1-7 + effect tests; string ++ verified correct in if/match/fn/nested contexts. (dune runtest not runnable in-sandbox — no alcotest; the codegen .mjs parity goes through the real CLI pipeline.)

Migration impact

This closes the string wall's last op: every name-dispatched string builtin (slices 1-7) + concatenation (8) now lower to wasm. The next compiler half is the effect wall (≈111 effect-gated corpus files).

Builds on #575 (guard, merged) and #574 (design, merged).

https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s


Generated by Claude Code

The full fix the slice-8a guard (#575) stood in for. String ++ now lowers
correctly AND completely to wasm (incl. pure variable-to-variable, which
the syntactic guard could not reach).

Channel (type-directed elaboration):
- ast.ml: new ExprStringConcat of expr * expr (not produced by the parser).
- typecheck.ml: synth records each ++ node it types as String concat, by
  physical identity (string_concat_sites); elaborate_string_concat rewrites
  exactly those nodes to ExprStringConcat. Physical-identity keying is sound
  because typecheck and codegen run over the same prog object
  (parse_with_face's lowered prog, shared by resolve/typecheck/codegen);
  ExprBinary carries no span and same-text ++ occurrences are value-equal,
  so == is the correct key.
- bin/main.ml: the wasm path runs elaborate_string_concat after typecheck,
  before Opt.fold_constants_program. The interpreter and non-wasm backends
  keep the original prog (String ++ = ExprBinary _ OpConcat _), so the
  oracle is unchanged and only the wasm backend sees the new node.

Lowering (codegen.ml): byte concat — allocate 4 + la + lb, write the
length word, copy a's then b's bytes — mirroring the list-concat handler
but with 1-byte elements and a single length word (instead of 4-byte i32
elements, which was exactly the bug: the list path copied a string's
[len][utf8] as i32 elements, so "ab" ++ "cd" read byte 2 as the length
word of "cd" = 2 instead of 'c' = 99).

Effect parity (effect_sites.ml): ExprStringConcat recurses like ExprBinary
and is NOT counted as an ExprApp call site, so effect-ordinals stay
identical between the interpreter (which sees ExprBinary) and the wasm
backend (which sees ExprStringConcat) — avoiding a #555-class desync. An
intrinsic-call encoding (ExprApp "__string_concat") would have shifted the
ordinals; the dedicated node avoids that. opt.ml folds its sub-expressions;
interp.ml handles it defensively as ordinary String ++.

The 8a guard is retained as a backstop: any String ++ reaching codegen
un-elaborated still errors loudly rather than emitting garbage.

Tests: tests/codegen/string_concat.{affine,mjs} — executable wasm parity,
byte-exact via the slice-1 reader: the "ab" ++ "cd" byte-2 = 99 regression
(was 2), the var-var case the guard could not catch, chained a ++ b ++ c,
and empty operands; oracle 6513269. test/test_e2e.ml "E2E String-wall
slice 8 guard" gains a lowers-after-elaboration case. Verified: full
run_codegen_wasm_tests.sh green incl. list_concat + slices 1-7 + effect
tests; string ++ correct in if/match/fn/nested contexts.

Design + ledger: proposals/DESIGN-string-concat.adoc (8b LANDED),
proposals/MIGRATION-PLAN.adoc.

https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s
@github-actions


🔍 Hypatia Security Scan

Findings: 47 issues detected

| Severity | Count |
|---|---|
| 🔴 Critical | 2 |
| 🟠 High | 24 |
| 🟡 Medium | 21 |

⚠️ Action Required: Critical security issues found!

View findings
[
  {
    "reason": "Action actions/add-to-project@v1.0.2 needs attention",
    "type": "unpinned_action",
    "file": "add-to-roadmap.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Action denoland/setup-deno@v2 needs attention",
    "type": "unpinned_action",
    "file": "publish-jsr.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Action trufflesecurity/trufflehog@main needs attention",
    "type": "unpinned_action",
    "file": "secret-scanner.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Issue in add-to-roadmap.yml",
    "type": "missing_timeout_minutes",
    "file": "add-to-roadmap.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in scorecard-enforcer.yml",
    "type": "scorecard_publish_with_run_step",
    "file": "scorecard-enforcer.yml",
    "action": "split_scorecard_publish_job",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Issue in instant-sync.yml",
    "type": "secret_action_without_presence_gate",
    "file": "instant-sync.yml",
    "action": "peter-evans/repository-dispatch",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/packages/affinescript-cli/mod.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (2 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/packages/affine-vscode/mod.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
    "type": "js_exec_sync",
    "file": "/home/runner/work/affinescript/affinescript/affinescript-vite/src/affine-plugin-improved.js",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "high"
  },
  {
    "reason": "expect() in hot path (32 occurrences, CWE-754)",
    "type": "expect_in_hot_path",
    "file": "/home/runner/work/affinescript/affinescript/affinescriptiser/src/codegen/wasm_gen.rs",
    "action": "flag",
    "rule_module": "code_safety",
    "severity": "medium"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

@hyperpolymath hyperpolymath marked this pull request as ready for review June 13, 2026 17:42
@hyperpolymath hyperpolymath merged commit 1f6ba66 into main Jun 13, 2026
26 of 29 checks passed
@hyperpolymath hyperpolymath deleted the claude/cool-keller-gr5sl branch June 13, 2026 17:43
hyperpolymath added a commit that referenced this pull request Jun 14, 2026
…584)

## Migration wave: 7 integer-brain kernels from string-gated idaptik
modules

First applied wave of the now-unblocked **string-gated corpus**. Phase B
classified 71 string-gated files; closing the string wall (slices 1–8)
plus the `len()` lowering (#583) opened the integer-brain extraction
path. Seven kernels re-decomposed from idaptik `.res` modules into
AffineScript brains under `proposals/idaptik/migrated/`, fanned out
across 6 parallel agents and **re-verified by me before commit**.

Each is a four-gate deliverable — G1 compile, G2 independent-oracle
parity sweep, G4 assail. Strings / floats / promises / mutable state
stay **host-side** per the C1–C12 recipe; only the pure-integer decision
core crosses to wasm.

| Kernel | Exports | Parity | Assail |
|---|---|---|---|
| PortScanner | 4 | 44/44 | clean |
| PasswordCracker | 7 | 215/215 | clean |
| FirewallDevice | 12 | 164/164 | clean |
| Inventory | 9 | 2840/2840 | clean |
| Drone | 32 | 1192/1192 | clean |
| SecurityDog | 29 | 31533/31533 | clean |
| GuardNPC | 19 | 359/359 | clean |

**Re-decompositions, not transliterations** — e.g. PasswordCracker
inverts the djb2 string-loop so the host walks the string and the brain
does i32 math (`Math.imul`/`|0` modelled in the oracle); Inventory packs
slot-state into a base-3 Int instead of a mutable array; FirewallDevice
keeps CIDR/protocol *string* parsing host-side and decides over integer
flags. Floats cross as floored milli-units; out-of-band inputs return
guarded `-1` sentinels (assail-clean, no in-band collapse). Each oracle
is an independent JS reimplementation from the `.res` semantics, not
copied from the `.affine`.
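
As a rough illustration of the re-decomposition style described above, here is a TypeScript sketch of an inverted djb2 loop, base-3 slot packing, and the milli-unit crossing. It is not the migrated code: the function names, the `SLOTS` bound, and the choice of the additive djb2 variant (seed 5381, multiplier 33) are assumptions, since the `.res` sources are not shown here.

```typescript
// Integer core: one djb2 step in wrapped i32 arithmetic (h * 33 + c, truncated).
function djb2Step(hash: number, charCode: number): number {
  return (Math.imul(hash, 33) + charCode) | 0;
}

// Host side: walks the string and feeds one code unit at a time to the core.
function djb2(s: string): number {
  let hash = 5381;
  for (let i = 0; i < s.length; i++) hash = djb2Step(hash, s.charCodeAt(i));
  return hash;
}

// Base-3 packing: slot i stores a state in {0, 1, 2} as the i-th base-3 digit.
const SLOTS = 9; // hypothetical slot count; 3^9 fits comfortably in an i32

function getSlot(packed: number, slot: number): number {
  if (packed < 0 || slot < 0 || slot >= SLOTS) return -1; // guarded sentinel
  return Math.floor(packed / Math.pow(3, slot)) % 3;
}

function setSlot(packed: number, slot: number, state: number): number {
  if (packed < 0 || slot < 0 || slot >= SLOTS) return -1; // guarded sentinel
  if (state < 0 || state > 2) return -1;                  // guarded sentinel
  const weight = Math.pow(3, slot);
  return packed + (state - getSlot(packed, slot)) * weight;
}

// Floats cross the boundary as floored milli-units.
function toMilli(x: number): number {
  return Math.floor(x * 1000);
}
```

For example, the slot states [2, 0, 1] pack to 2 + 0·3 + 1·9 = 11, and setting slot 1 to state 2 yields 17. An out-of-range slot returns -1, which cannot be confused with a valid state or a valid packed value.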

**Deduped:** `SecurityAI` dropped — already tracked as
`migrated/securityai/` (with a boundary proof) from an earlier wave;
`GlobalNetworkData` likewise pre-existed and was left untouched.

**Two compiler quirks surfaced** (flagged for the playbook, not fixed
here): `total` is a reserved keyword (parse error as an identifier); and
an `if { … }` block immediately followed by a parenthesized expression
parses as a function application. Both have trivial source-side
workarounds.

Builds on #583 (`len`, merged) and the string-wall slices
(#574/#575/#578).

https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s

---
_Generated by [Claude
Code](https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s)_

Co-authored-by: Claude <noreply@anthropic.com>

2 participants