design: lowering string ++ (and slice) to wasm — proposal for sign-off#574
Merged
Conversation
Phase F slice-8 design proposal (awaiting owner sign-off before
implementation). proposals/DESIGN-string-concat.adoc.
Key findings from investigation:
* String ++ SILENTLY MISCOMPILES to wasm today (confirmed): it
typechecks and the interp is correct, but codegen lowers ++ only as
list-concat (4-byte element stride), so a string's [len][bytes] get
copied as i32 elements. Measured "ab" ++ "cd": byte 2 = 2 (the length
word of "cd") instead of 'c' = 99. A #555-class silent mis-lowering.
* The "add a string_concat builtin / ban string ++" approach is RULED
OUT by the codebase: ++ is the canonical surface, string_concat was
deliberately removed (#330), the stdlib uses string ++ pervasively
(encoding/json/io/testing/...), and there are string-concat laws +
conformance tests on the ++ semantics. So ++ must be made to lower
correctly, not replaced.
* Blocker: codegen is type-blind (ExprBinary carries no type; AST has no
spans/ids; Opt.fold rebuilds the AST). The typechecker already decides
string-vs-array per ++ node; the work is threading that to codegen.
Recommendation: A1 — a typecheck-time elaboration that rewrites string ++
into a dedicated internal ExprStringConcat node, lowered by codegen via
the established byte-copy idiom (alloc 4 + la + lb, two I32Load8U/
I32Store8 loops). slice-on-string is mostly covered by the shipped
string_sub; defer its negative-index residual.
No implementation in this commit — design only, for sign-off.
https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s
🔍 Hypatia Security ScanFindings: 54 issues detected
View findings[
{
"reason": "Action denoland/setup-deno@v2 needs attention",
"type": "unpinned_action",
"file": "publish-jsr.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Action trufflesecurity/trufflehog@main needs attention",
"type": "unpinned_action",
"file": "secret-scanner.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Issue in governance.yml",
"type": "missing_timeout_minutes",
"file": "governance.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in hypatia-scan.yml",
"type": "missing_timeout_minutes",
"file": "hypatia-scan.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in mirror.yml",
"type": "missing_timeout_minutes",
"file": "mirror.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in scorecard.yml",
"type": "missing_timeout_minutes",
"file": "scorecard.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in secret-scanner.yml",
"type": "missing_timeout_minutes",
"file": "secret-scanner.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in spark-theatre-gate.yml",
"type": "missing_timeout_minutes",
"file": "spark-theatre-gate.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in scorecard-enforcer.yml",
"type": "scorecard_publish_with_run_step",
"file": "scorecard-enforcer.yml",
"action": "split_scorecard_publish_job",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Issue in instant-sync.yml",
"type": "secret_action_without_presence_gate",
"file": "instant-sync.yml",
"action": "peter-evans/repository-dispatch",
"rule_module": "workflow_audit",
"severity": "high"
}
]Powered by Hypatia Neurosymbolic CI/CD Intelligence |
hyperpolymath
added a commit
that referenced
this pull request
Jun 13, 2026
…mpile (#575) ## Phase F slice 8a — guard string `++` against silent miscompile The "guard first" half of the owner-approved string-concat plan (`proposals/DESIGN-string-concat.adoc`, merged #574). Depends on: #574 (design, merged). ### The bug this guards String `++` **silently miscompiles** to wasm: it typechecks (`++` is type-dispatched in `typecheck.ml`), the interpreter is correct, but codegen lowers `++` only as *list* concat (4-byte element stride). Applied to a string's `[len][utf8]` layout it copies the length word + bytes as i32 elements — `"ab" ++ "cd"` yields byte 2 = `2` (the length word of `"cd"`) instead of `'c'`. A #555-class silent mis-lowering. The "add a `string_concat` builtin / ban `++`" route is **ruled out** (`++` is the canonical, stdlib-pervasive surface; `string_concat` was deliberately removed in #330), so `++` must be made to lower correctly — that's **slice 8b** (type-directed lowering). This PR is the interim **safety guard**. ### This PR (8a) The wasm `OpConcat` handler now rejects the syntactically-obvious string `++` with a loud `UnsupportedFeature` error instead of emitting garbage. `is_string_concat_operand` is a **sound-rejection** heuristic: - **Fires** only on operands that are unambiguously strings — a string literal, a String-returning builtin call (`int_to_string`, `trim`, `to_lowercase`, …), or a nested string `++`. - **Never** fires on list `++` (arrays / array literals / bare variables), so genuine list concatenation is unaffected. - Deliberately **incomplete**: a pure variable-to-variable string `++` still slips through — closing that gap needs the type channel and is **slice 8b**. The footgun is currently unreachable in practice (no migrated kernel compiles string `++` to wasm), so the partial guard is a safe interim. ### Tests / verification - `test/test_e2e.ml` group **"E2E String-wall slice 8 guard"** — string `++` is rejected at wasm codegen (error names "concatenation"); list `++` still compiles (no false positive). - Full `run_codegen_wasm_tests.sh` green, **including `list_concat`** and all 7 string fixtures (slices 1–7 still compile). The list-concat lowering is untouched. `dune runtest` couldn't run in-sandbox (no `alcotest`), but the guard logic + the `wasm_codegen`/`Parse_driver` test pattern were validated against the real `affinescript` library (string `++` → rejected with "concatenation"; list `++` → compiles). ### Next **Slice 8b** — the type-directed string-`++` lowering (`ExprStringConcat` elaboration + byte two-copy-loop) makes `++` *correct and complete* (incl. var-var), replacing this guard. Plan in `proposals/DESIGN-string-concat.adoc` §"Implementation plan". https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s --- _Generated by [Claude Code](https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s)_ Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath
added a commit
that referenced
this pull request
Jun 13, 2026
## Phase F slice 8b — type-directed string `++` lowering The **full fix** the slice-8a guard (#575) stood in for. String `++` now lowers **correctly and completely** to wasm — including pure variable-to-variable `a ++ b`, which the syntactic guard could not reach. ### The channel (type-directed elaboration) - **`ast.ml`**: new `ExprStringConcat of expr * expr` (never produced by the parser). - **`typecheck.ml`**: `synth` records each `++` node it types as String concat **by physical identity** (`string_concat_sites`); `elaborate_string_concat` rewrites exactly those nodes to `ExprStringConcat`. Physical-identity keying is sound because typecheck and codegen run over the **same** `prog` object (`parse_with_face`'s lowered prog, shared by resolve/typecheck/codegen); `ExprBinary` carries no span and same-text `++` occurrences are value-equal, so `==` is the correct key. - **`bin/main.ml`**: the wasm path runs `elaborate_string_concat` after typecheck, before `Opt.fold`. The **interpreter and non-wasm backends keep the original `prog`** (`ExprBinary _ OpConcat _`), so the oracle is unchanged and only the wasm backend sees the new node. ### The lowering (`codegen.ml`) Byte concat — allocate `4 + la + lb`, write the length word, copy a's then b's bytes — mirroring the list-concat handler but with **1-byte elements + a single length word** instead of 4-byte i32 elements. That i32-element copy was exactly the bug: a string's `[len][utf8]` was copied as i32 elements, so `"ab" ++ "cd"` read byte 2 as the length word of `"cd"` (= 2) instead of `'c'` (= 99). ### Effect-ordinal parity (`effect_sites.ml`) `ExprStringConcat` recurses like `ExprBinary` and is **not** counted as an `ExprApp` call site, so effect-ordinals stay identical between interp (sees `ExprBinary`) and wasm (sees `ExprStringConcat`) — avoiding a #555-class desync. An intrinsic-call encoding (`ExprApp "__string_concat"`) would have shifted the ordinals; the dedicated node avoids that. `opt.ml` folds sub-expressions; `interp.ml` handles it defensively. The **8a guard is retained as a backstop**: any String `++` reaching codegen un-elaborated still errors loudly rather than emitting garbage. ### Tests / verification - `tests/codegen/string_concat.{affine,mjs}` — executable wasm parity, byte-exact via the slice-1 reader: the **`"ab" ++ "cd"` byte-2 = 99 regression** (was 2), the **var-var** case the guard could not catch, **chained** `a ++ b ++ c`, and **empty** operands (oracle 6513269). - `test/test_e2e.ml` "E2E String-wall slice 8 guard" gains a *lowers-after-elaboration* case. - Full `run_codegen_wasm_tests.sh` green incl. `list_concat` + slices 1-7 + effect tests; string `++` verified correct in if/match/fn/nested contexts. (`dune runtest` not runnable in-sandbox — no `alcotest`; the codegen `.mjs` parity goes through the real CLI pipeline.) ### Migration impact This closes the **string wall**'s last op: every name-dispatched string builtin (slices 1-7) + concatenation (8) now lower to wasm. The next compiler half is the **effect wall** (≈111 effect-gated corpus files). Builds on #575 (guard, merged) and #574 (design, merged). https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s --- _Generated by [Claude Code](https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s)_ Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath
added a commit
that referenced
this pull request
Jun 14, 2026
…584) ## Migration wave: 7 integer-brain kernels from string-gated idaptik modules First applied wave of the now-unblocked **string-gated corpus**. Phase B classified 71 string-gated files; closing the string wall (slices 1–8) plus the `len()` lowering (#583) opened the integer-brain extraction path. Seven kernels re-decomposed from idaptik `.res` modules into AffineScript brains under `proposals/idaptik/migrated/`, fanned out across 6 parallel agents and **re-verified by me before commit**. Each is a four-gate deliverable — G1 compile, G2 independent-oracle parity sweep, G4 assail. Strings / floats / promises / mutable state stay **host-side** per the C1–C12 recipe; only the pure-integer decision core crosses to wasm. | Kernel | Exports | Parity | Assail | |---|---|---|---| | PortScanner | 4 | 44/44 | clean | | PasswordCracker | 7 | 215/215 | clean | | FirewallDevice | 12 | 164/164 | clean | | Inventory | 9 | 2840/2840 | clean | | Drone | 32 | 1192/1192 | clean | | SecurityDog | 29 | 31533/31533 | clean | | GuardNPC | 19 | 359/359 | clean | **Re-decompositions, not transliterations** — e.g. PasswordCracker inverts the djb2 string-loop so the host walks the string and the brain does i32 math (`Math.imul`/`|0` modelled in the oracle); Inventory packs slot-state into a base-3 Int instead of a mutable array; FirewallDevice keeps CIDR/protocol *string* parsing host-side and decides over integer flags. Floats cross as floored milli-units; out-of-band inputs return guarded `-1` sentinels (assail-clean, no in-band collapse). Each oracle is an independent JS reimplementation from the `.res` semantics, not copied from the `.affine`. **Deduped:** `SecurityAI` dropped — already tracked as `migrated/securityai/` (with a boundary proof) from an earlier wave; `GlobalNetworkData` likewise pre-existed and was left untouched. **Two compiler quirks surfaced** (flagged for the playbook, not fixed here): `total` is a reserved keyword (parse error as an identifier); and an `if { … }` block immediately followed by a parenthesized expression parses as a function application. Both have trivial source-side workarounds. Builds on #583 (`len`, merged) and the string-wall slices (#574/#575/#578). https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s --- _Generated by [Claude Code](https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s)_ Co-authored-by: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Design proposal — lowering string
++(andslice) to wasmDesign-only, awaiting your sign-off before implementation. This is the write-up you asked for on the last remaining string-wall items (concat /
slice), now grounded in investigation.The headline finding: string
++silently miscompiles to wasm++is type-dispatched in typecheck (string vs array), so string++typechecks and the interpreter is correct — but codegen lowers++only as list-concat (4-byte element stride), copying a string's[len][bytes]as if they were i32 elements. Measured for"ab" ++ "cd"(should be "abcd"):'c')'d')A #555-class silent mis-lowering — so this is a correctness fix, not a feature.
Why "add a
string_concatbuiltin" is ruled out (by evidence)++is the canonical string-concat surface;string_concatwas deliberately removed in STDLIB-04c:string_concatextern → real impl or remove (operator-only) #330 (stdlib/effects.affine:42).++pervasively (encoding/json/io/testing/…) — rejecting it breaks the stdlib.++(test_stdlib_laws.mlconcat laws + a conformance suite).So the only acceptable fix makes the existing
++lower correctly.The blocker + recommendation
Codegen is type-blind (
ExprBinarycarries no type; AST has no spans/ids;Opt.foldrebuilds the AST). The typechecker already decides string-vs-array per++node — the work is threading that to codegen.Recommended (A1): a typecheck-time elaboration that rewrites string
++into a dedicated internalExprStringConcatnode, which codegen lowers via the established byte-copy idiom (alloc4 + la + lb, twoI32Load8U/I32Store8loops — the list-++pattern with a 1-byte stride).++stays idiomatic, the stdlib keeps compiling, codegen needs no type logic.slice-on-string is mostly already covered by the shippedstring_sub(slice 3); defer its negative-index residual.Full analysis, alternatives (A2 annotation field; B rejected), implementation plan, and risks are in
proposals/DESIGN-string-concat.adoc.Decision requested
Sign off to proceed with A1 as Phase F slice 8, or redirect. I'll implement only after your go-ahead.
https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s
Generated by Claude Code