Skip to content

design: lowering string ++ (and slice) to wasm — proposal for sign-off#574

Merged
hyperpolymath merged 1 commit into
mainfrom
claude/cool-keller-gr5sl
Jun 13, 2026
Merged

design: lowering string ++ (and slice) to wasm — proposal for sign-off#574
hyperpolymath merged 1 commit into
mainfrom
claude/cool-keller-gr5sl

Conversation

@hyperpolymath

Copy link
Copy Markdown
Owner

Design proposal — lowering string ++ (and slice) to wasm

Design-only, awaiting your sign-off before implementation. This is the write-up you asked for on the last remaining string-wall items (concat / slice), now grounded in investigation.

The headline finding: string ++ silently miscompiles to wasm

++ is type-dispatched in typecheck (string vs array), so string ++ typechecks and the interpreter is correct — but codegen lowers ++ only as list-concat (4-byte element stride), copying a string's [len][bytes] as if they were i32 elements. Measured for "ab" ++ "cd" (should be "abcd"):

byte interp wasm
0–1 97, 98 97, 98 (coincidental — "ab" packs into one i32)
2 99 ('c') 2 ← the length word of "cd" leaking through
3 100 ('d') 0

A #555-class silent mis-lowering — so this is a correctness fix, not a feature.

Why "add a string_concat builtin" is ruled out (by evidence)

So the only acceptable fix makes the existing ++ lower correctly.

The blocker + recommendation

Codegen is type-blind (ExprBinary carries no type; AST has no spans/ids; Opt.fold rebuilds the AST). The typechecker already decides string-vs-array per ++ node — the work is threading that to codegen.

Recommended (A1): a typecheck-time elaboration that rewrites string ++ into a dedicated internal ExprStringConcat node, which codegen lowers via the established byte-copy idiom (alloc 4 + la + lb, two I32Load8U/I32Store8 loops — the list-++ pattern with a 1-byte stride). ++ stays idiomatic, the stdlib keeps compiling, codegen needs no type logic. slice-on-string is mostly already covered by the shipped string_sub (slice 3); defer its negative-index residual.

Full analysis, alternatives (A2 annotation field; B rejected), implementation plan, and risks are in proposals/DESIGN-string-concat.adoc.

Decision requested

Sign off to proceed with A1 as Phase F slice 8, or redirect. I'll implement only after your go-ahead.

https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s


Generated by Claude Code

Phase F slice-8 design proposal (awaiting owner sign-off before
implementation). proposals/DESIGN-string-concat.adoc.

Key findings from investigation:
  * String ++ SILENTLY MISCOMPILES to wasm today (confirmed): it
    typechecks and the interp is correct, but codegen lowers ++ only as
    list-concat (4-byte element stride), so a string's [len][bytes] get
    copied as i32 elements. Measured "ab" ++ "cd": byte 2 = 2 (the length
    word of "cd") instead of 'c' = 99. A #555-class silent mis-lowering.
  * The "add a string_concat builtin / ban string ++" approach is RULED
    OUT by the codebase: ++ is the canonical surface, string_concat was
    deliberately removed (#330), the stdlib uses string ++ pervasively
    (encoding/json/io/testing/...), and there are string-concat laws +
    conformance tests on the ++ semantics. So ++ must be made to lower
    correctly, not replaced.
  * Blocker: codegen is type-blind (ExprBinary carries no type; AST has no
    spans/ids; Opt.fold rebuilds the AST). The typechecker already decides
    string-vs-array per ++ node; the work is threading that to codegen.

Recommendation: A1 — a typecheck-time elaboration that rewrites string ++
into a dedicated internal ExprStringConcat node, lowered by codegen via
the established byte-copy idiom (alloc 4 + la + lb, two I32Load8U/
I32Store8 loops). slice-on-string is mostly covered by the shipped
string_sub; defer its negative-index residual.

No implementation in this commit — design only, for sign-off.

https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s
@github-actions

Copy link
Copy Markdown

🔍 Hypatia Security Scan

Findings: 54 issues detected

Severity Count
🔴 Critical 2
🟠 High 24
🟡 Medium 28

⚠️ Action Required: Critical security issues found!

View findings
[
  {
    "reason": "Action denoland/setup-deno@v2 needs attention",
    "type": "unpinned_action",
    "file": "publish-jsr.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Action trufflesecurity/trufflehog@main needs attention",
    "type": "unpinned_action",
    "file": "secret-scanner.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Issue in governance.yml",
    "type": "missing_timeout_minutes",
    "file": "governance.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in hypatia-scan.yml",
    "type": "missing_timeout_minutes",
    "file": "hypatia-scan.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in mirror.yml",
    "type": "missing_timeout_minutes",
    "file": "mirror.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in scorecard.yml",
    "type": "missing_timeout_minutes",
    "file": "scorecard.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in secret-scanner.yml",
    "type": "missing_timeout_minutes",
    "file": "secret-scanner.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in spark-theatre-gate.yml",
    "type": "missing_timeout_minutes",
    "file": "spark-theatre-gate.yml",
    "action": "flag",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Issue in scorecard-enforcer.yml",
    "type": "scorecard_publish_with_run_step",
    "file": "scorecard-enforcer.yml",
    "action": "split_scorecard_publish_job",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Issue in instant-sync.yml",
    "type": "secret_action_without_presence_gate",
    "file": "instant-sync.yml",
    "action": "peter-evans/repository-dispatch",
    "rule_module": "workflow_audit",
    "severity": "high"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

@hyperpolymath hyperpolymath marked this pull request as ready for review June 13, 2026 12:14
@hyperpolymath hyperpolymath merged commit 9796fb6 into main Jun 13, 2026
26 of 28 checks passed
@hyperpolymath hyperpolymath deleted the claude/cool-keller-gr5sl branch June 13, 2026 12:14
hyperpolymath added a commit that referenced this pull request Jun 13, 2026
…mpile (#575)

## Phase F slice 8a — guard string `++` against silent miscompile

The "guard first" half of the owner-approved string-concat plan
(`proposals/DESIGN-string-concat.adoc`, merged #574). Depends on: #574
(design, merged).

### The bug this guards

String `++` **silently miscompiles** to wasm: it typechecks (`++` is
type-dispatched in `typecheck.ml`), the interpreter is correct, but
codegen lowers `++` only as *list* concat (4-byte element stride).
Applied to a string's `[len][utf8]` layout it copies the length word +
bytes as i32 elements — `"ab" ++ "cd"` yields byte 2 = `2` (the length
word of `"cd"`) instead of `'c'`. A #555-class silent mis-lowering.

The "add a `string_concat` builtin / ban `++`" route is **ruled out**
(`++` is the canonical, stdlib-pervasive surface; `string_concat` was
deliberately removed in #330), so `++` must be made to lower correctly —
that's **slice 8b** (type-directed lowering). This PR is the interim
**safety guard**.

### This PR (8a)

The wasm `OpConcat` handler now rejects the syntactically-obvious string
`++` with a loud `UnsupportedFeature` error instead of emitting garbage.
`is_string_concat_operand` is a **sound-rejection** heuristic:

- **Fires** only on operands that are unambiguously strings — a string
literal, a String-returning builtin call (`int_to_string`, `trim`,
`to_lowercase`, …), or a nested string `++`.
- **Never** fires on list `++` (arrays / array literals / bare
variables), so genuine list concatenation is unaffected.
- Deliberately **incomplete**: a pure variable-to-variable string `++`
still slips through — closing that gap needs the type channel and is
**slice 8b**. The footgun is currently unreachable in practice (no
migrated kernel compiles string `++` to wasm), so the partial guard is a
safe interim.

### Tests / verification

- `test/test_e2e.ml` group **"E2E String-wall slice 8 guard"** — string
`++` is rejected at wasm codegen (error names "concatenation"); list
`++` still compiles (no false positive).
- Full `run_codegen_wasm_tests.sh` green, **including `list_concat`**
and all 7 string fixtures (slices 1–7 still compile). The list-concat
lowering is untouched.

`dune runtest` couldn't run in-sandbox (no `alcotest`), but the guard
logic + the `wasm_codegen`/`Parse_driver` test pattern were validated
against the real `affinescript` library (string `++` → rejected with
"concatenation"; list `++` → compiles).

### Next

**Slice 8b** — the type-directed string-`++` lowering
(`ExprStringConcat` elaboration + byte two-copy-loop) makes `++`
*correct and complete* (incl. var-var), replacing this guard. Plan in
`proposals/DESIGN-string-concat.adoc` §"Implementation plan".

https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s

---
_Generated by [Claude
Code](https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s)_

Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Jun 13, 2026
## Phase F slice 8b — type-directed string `++` lowering

The **full fix** the slice-8a guard (#575) stood in for. String `++` now
lowers **correctly and completely** to wasm — including pure
variable-to-variable `a ++ b`, which the syntactic guard could not
reach.

### The channel (type-directed elaboration)
- **`ast.ml`**: new `ExprStringConcat of expr * expr` (never produced by
the parser).
- **`typecheck.ml`**: `synth` records each `++` node it types as String
concat **by physical identity** (`string_concat_sites`);
`elaborate_string_concat` rewrites exactly those nodes to
`ExprStringConcat`. Physical-identity keying is sound because typecheck
and codegen run over the **same** `prog` object (`parse_with_face`'s
lowered prog, shared by resolve/typecheck/codegen); `ExprBinary` carries
no span and same-text `++` occurrences are value-equal, so `==` is the
correct key.
- **`bin/main.ml`**: the wasm path runs `elaborate_string_concat` after
typecheck, before `Opt.fold`. The **interpreter and non-wasm backends
keep the original `prog`** (`ExprBinary _ OpConcat _`), so the oracle is
unchanged and only the wasm backend sees the new node.

### The lowering (`codegen.ml`)
Byte concat — allocate `4 + la + lb`, write the length word, copy a's
then b's bytes — mirroring the list-concat handler but with **1-byte
elements + a single length word** instead of 4-byte i32 elements. That
i32-element copy was exactly the bug: a string's `[len][utf8]` was
copied as i32 elements, so `"ab" ++ "cd"` read byte 2 as the length word
of `"cd"` (= 2) instead of `'c'` (= 99).

### Effect-ordinal parity (`effect_sites.ml`)
`ExprStringConcat` recurses like `ExprBinary` and is **not** counted as
an `ExprApp` call site, so effect-ordinals stay identical between interp
(sees `ExprBinary`) and wasm (sees `ExprStringConcat`) — avoiding a
#555-class desync. An intrinsic-call encoding (`ExprApp
"__string_concat"`) would have shifted the ordinals; the dedicated node
avoids that. `opt.ml` folds sub-expressions; `interp.ml` handles it
defensively.

The **8a guard is retained as a backstop**: any String `++` reaching
codegen un-elaborated still errors loudly rather than emitting garbage.

### Tests / verification
- `tests/codegen/string_concat.{affine,mjs}` — executable wasm parity,
byte-exact via the slice-1 reader: the **`"ab" ++ "cd"` byte-2 = 99
regression** (was 2), the **var-var** case the guard could not catch,
**chained** `a ++ b ++ c`, and **empty** operands (oracle 6513269).
- `test/test_e2e.ml` "E2E String-wall slice 8 guard" gains a
*lowers-after-elaboration* case.
- Full `run_codegen_wasm_tests.sh` green incl. `list_concat` + slices
1-7 + effect tests; string `++` verified correct in if/match/fn/nested
contexts. (`dune runtest` not runnable in-sandbox — no `alcotest`; the
codegen `.mjs` parity goes through the real CLI pipeline.)

### Migration impact
This closes the **string wall**'s last op: every name-dispatched string
builtin (slices 1-7) + concatenation (8) now lower to wasm. The next
compiler half is the **effect wall** (≈111 effect-gated corpus files).

Builds on #575 (guard, merged) and #574 (design, merged).

https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s

---
_Generated by [Claude
Code](https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s)_

Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Jun 14, 2026
…584)

## Migration wave: 7 integer-brain kernels from string-gated idaptik
modules

First applied wave of the now-unblocked **string-gated corpus**. Phase B
classified 71 string-gated files; closing the string wall (slices 1–8)
plus the `len()` lowering (#583) opened the integer-brain extraction
path. Seven kernels re-decomposed from idaptik `.res` modules into
AffineScript brains under `proposals/idaptik/migrated/`, fanned out
across 6 parallel agents and **re-verified by me before commit**.

Each is a four-gate deliverable — G1 compile, G2 independent-oracle
parity sweep, G4 assail. Strings / floats / promises / mutable state
stay **host-side** per the C1–C12 recipe; only the pure-integer decision
core crosses to wasm.

| Kernel | Exports | Parity | Assail |
|---|---|---|---|
| PortScanner | 4 | 44/44 | clean |
| PasswordCracker | 7 | 215/215 | clean |
| FirewallDevice | 12 | 164/164 | clean |
| Inventory | 9 | 2840/2840 | clean |
| Drone | 32 | 1192/1192 | clean |
| SecurityDog | 29 | 31533/31533 | clean |
| GuardNPC | 19 | 359/359 | clean |

**Re-decompositions, not transliterations** — e.g. PasswordCracker
inverts the djb2 string-loop so the host walks the string and the brain
does i32 math (`Math.imul`/`|0` modelled in the oracle); Inventory packs
slot-state into a base-3 Int instead of a mutable array; FirewallDevice
keeps CIDR/protocol *string* parsing host-side and decides over integer
flags. Floats cross as floored milli-units; out-of-band inputs return
guarded `-1` sentinels (assail-clean, no in-band collapse). Each oracle
is an independent JS reimplementation from the `.res` semantics, not
copied from the `.affine`.

**Deduped:** `SecurityAI` dropped — already tracked as
`migrated/securityai/` (with a boundary proof) from an earlier wave;
`GlobalNetworkData` likewise pre-existed and was left untouched.

**Two compiler quirks surfaced** (flagged for the playbook, not fixed
here): `total` is a reserved keyword (parse error as an identifier); and
an `if { … }` block immediately followed by a parenthesized expression
parses as a function application. Both have trivial source-side
workarounds.

Builds on #583 (`len`, merged) and the string-wall slices
(#574/#575/#578).

https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s

---
_Generated by [Claude
Code](https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s)_

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants