compiler: string-wall slice 8b - type-directed string ++ lowering #578
Merged
The full fix the slice-8a guard (#575) stood in for. String ++ now lowers correctly and completely to wasm, including the pure variable-to-variable case, which the syntactic guard could not reach.

Channel (type-directed elaboration):
- ast.ml: new ExprStringConcat of expr * expr (not produced by the parser).
- typecheck.ml: synth records each ++ node it types as String concat, by physical identity (string_concat_sites); elaborate_string_concat rewrites exactly those nodes to ExprStringConcat. Physical-identity keying is sound because typecheck and codegen run over the same prog object (parse_with_face's lowered prog, shared by resolve/typecheck/codegen); ExprBinary carries no span and same-text ++ occurrences are value-equal, so == is the correct key.
- bin/main.ml: the wasm path runs elaborate_string_concat after typecheck, before Opt.fold_constants_program. The interpreter and non-wasm backends keep the original prog (String ++ = ExprBinary _ OpConcat _), so the oracle is unchanged and only the wasm backend sees the new node.

Lowering (codegen.ml): byte concat. Allocate 4 + la + lb, write the length word, copy a's then b's bytes, mirroring the list-concat handler but with 1-byte elements and a single length word instead of 4-byte i32 elements. That i32-element copy was exactly the bug: the list path copied a string's [len][utf8] as i32 elements, so "ab" ++ "cd" read byte 2 as the length word of "cd" (= 2) instead of 'c' (= 99).

Effect parity (effect_sites.ml): ExprStringConcat recurses like ExprBinary and is NOT counted as an ExprApp call site, so effect-ordinals stay identical between the interpreter (which sees ExprBinary) and the wasm backend (which sees ExprStringConcat), avoiding a #555-class desync. An intrinsic-call encoding (ExprApp "__string_concat") would have shifted the ordinals; the dedicated node avoids that. opt.ml folds its sub-expressions; interp.ml handles it defensively as ordinary String ++.
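The elaboration channel described above can be pictured with a small model. This is a minimal sketch in JavaScript (the language of the `.mjs` parity tests), not the compiler's OCaml: the node shapes, the simplified typing rule in `synth`, and the `env` maps are assumptions for illustration, and a `WeakSet` stands in for `string_concat_sites` because it keys on object identity, the closest analogue of OCaml's `==`.

```javascript
// Hypothetical AST model: plain objects stand in for the OCaml constructors.
const lit = (value) => ({ tag: "ExprLit", value });
const variable = (name) => ({ tag: "ExprVar", name });
const concat = (lhs, rhs) => ({ tag: "ExprBinary", op: "OpConcat", lhs, rhs });

// Stand-in for string_concat_sites: membership is by object identity.
const stringConcatSites = new WeakSet();

// Toy type synthesis: returns a type name and records String ++ nodes.
function synth(expr, env) {
  switch (expr.tag) {
    case "ExprLit":
      return "String"; // this toy model only has string literals
    case "ExprVar":
      return env.get(expr.name);
    case "ExprBinary": {
      const lhsType = synth(expr.lhs, env);
      synth(expr.rhs, env);
      if (expr.op === "OpConcat" && lhsType === "String") {
        stringConcatSites.add(expr); // keyed on the node itself, not its contents
      }
      return lhsType;
    }
    default:
      throw new Error(`unknown node ${expr.tag}`);
  }
}

// Rewrites exactly the recorded nodes; every other node is left as it was.
function elaborateStringConcat(expr) {
  if (expr.tag !== "ExprBinary") return expr;
  const lhs = elaborateStringConcat(expr.lhs);
  const rhs = elaborateStringConcat(expr.rhs);
  return stringConcatSites.has(expr)
    ? { tag: "ExprStringConcat", lhs, rhs }
    : { ...expr, lhs, rhs };
}

// Two ++ nodes with identical text (a ++ b) but different operand types.
const stringEnv = new Map([["a", "String"], ["b", "String"]]);
const listEnv = new Map([["a", "List"], ["b", "List"]]);
const stringSite = concat(variable("a"), variable("b"));
const listSite = concat(variable("a"), variable("b"));
synth(stringSite, stringEnv);
synth(listSite, listEnv);

const elaboratedString = elaborateStringConcat(stringSite);
const elaboratedList = elaborateStringConcat(listSite);
console.log(elaboratedString.tag, elaboratedList.tag);
```

The two sites are structurally equal, so a value-keyed table could not tell them apart; only the identity key does. In this model a structural copy of `stringSite` is not found in the set, which mirrors why the description insists that typecheck and codegen share the same `prog` object.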
The 8a guard is retained as a backstop: any String ++ reaching codegen un-elaborated still errors loudly rather than emitting garbage.

Tests:
- tests/codegen/string_concat.{affine,mjs}: executable wasm parity, byte-exact via the slice-1 reader: the "ab" ++ "cd" byte-2 = 99 regression (was 2), the var-var case the guard could not catch, chained a ++ b ++ c, and empty operands; oracle 6513269.
- test/test_e2e.ml "E2E String-wall slice 8 guard" gains a lowers-after-elaboration case.

Verified: full run_codegen_wasm_tests.sh green incl. list_concat + slices 1-7 + effect tests; string ++ correct in if/match/fn/nested contexts.

Design + ledger: proposals/DESIGN-string-concat.adoc (8b LANDED), proposals/MIGRATION-PLAN.adoc.

https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s
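The byte-exact cases named in the tests can be reproduced against a toy linear memory. This is a minimal sketch, assuming a bump allocator, a little-endian 4-byte length word followed by UTF-8 bytes, and contiguous allocation of the two operands. `alloc`, `storeString`, `concatBytes`, `concatAsI32List`, and `readString` are hypothetical names; `concatAsI32List` only imitates the old list-path behavior as the description characterizes it and is not the actual list-concat handler.

```javascript
// Toy linear memory with a bump allocator (an assumption; the real runtime
// allocator is not shown in the source).
const memory = new Uint8Array(1024);
const view = new DataView(memory.buffer);
let heapTop = 0;

function alloc(size) {
  const ptr = heapTop;
  heapTop += size;
  return ptr;
}

// Store a string as [len: u32 little-endian][utf8 bytes].
function storeString(text) {
  const bytes = new TextEncoder().encode(text);
  const ptr = alloc(4 + bytes.length);
  view.setUint32(ptr, bytes.length, true);
  memory.set(bytes, ptr + 4);
  return ptr;
}

// Byte concat: allocate 4 + la + lb, write the length word,
// then copy a's bytes followed by b's bytes.
function concatBytes(a, b) {
  const la = view.getUint32(a, true);
  const lb = view.getUint32(b, true);
  const out = alloc(4 + la + lb);
  view.setUint32(out, la + lb, true);
  memory.copyWithin(out + 4, a + 4, a + 4 + la);
  memory.copyWithin(out + 4 + la, b + 4, b + 4 + lb);
  return out;
}

// Imitation of the old behavior: each "element" is copied as a 4-byte i32,
// so the copy reads past a's two bytes into whatever follows in memory.
function concatAsI32List(a, b) {
  const la = view.getUint32(a, true);
  const lb = view.getUint32(b, true);
  const out = alloc(4 + 4 * (la + lb));
  view.setUint32(out, la + lb, true);
  memory.copyWithin(out + 4, a + 4, a + 4 + 4 * la);
  memory.copyWithin(out + 4 + 4 * la, b + 4, b + 4 + 4 * lb);
  return out;
}

// Decode the payload of a stored string back to text.
const readString = (ptr) =>
  new TextDecoder().decode(
    memory.subarray(ptr + 4, ptr + 4 + view.getUint32(ptr, true))
  );

const ab = storeString("ab");
const cd = storeString("cd"); // allocated directly after "ab"
const good = concatBytes(ab, cd);
const bad = concatAsI32List(ab, cd);
console.log(readString(good), memory[good + 4 + 2], memory[bad + 4 + 2]);
// prints: abcd 99 2
```

Under these layout assumptions, payload byte 2 is `'c'` (99) on the byte path, while the i32-element imitation picks up the low byte of `"cd"`'s length word (2), matching the regression the tests pin down.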
🔍 Hypatia Security Scan
Findings: 47 issues detected
[
{
"reason": "Action actions/add-to-project@v1.0.2 needs attention",
"type": "unpinned_action",
"file": "add-to-roadmap.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Action denoland/setup-deno@v2 needs attention",
"type": "unpinned_action",
"file": "publish-jsr.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Action trufflesecurity/trufflehog@main needs attention",
"type": "unpinned_action",
"file": "secret-scanner.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Issue in add-to-roadmap.yml",
"type": "missing_timeout_minutes",
"file": "add-to-roadmap.yml",
"action": "flag",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Issue in scorecard-enforcer.yml",
"type": "scorecard_publish_with_run_step",
"file": "scorecard-enforcer.yml",
"action": "split_scorecard_publish_job",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Issue in instant-sync.yml",
"type": "secret_action_without_presence_gate",
"file": "instant-sync.yml",
"action": "peter-evans/repository-dispatch",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
"type": "js_exec_sync",
"file": "/home/runner/work/affinescript/affinescript/packages/affinescript-cli/mod.js",
"action": "flag",
"rule_module": "code_safety",
"severity": "high"
},
{
"reason": "Shell execution -- validate input before passing to shell (2 occurrences, CWE-78)",
"type": "js_exec_sync",
"file": "/home/runner/work/affinescript/affinescript/packages/affine-vscode/mod.js",
"action": "flag",
"rule_module": "code_safety",
"severity": "high"
},
{
"reason": "Shell execution -- validate input before passing to shell (1 occurrences, CWE-78)",
"type": "js_exec_sync",
"file": "/home/runner/work/affinescript/affinescript/affinescript-vite/src/affine-plugin-improved.js",
"action": "flag",
"rule_module": "code_safety",
"severity": "high"
},
{
"reason": "expect() in hot path (32 occurrences, CWE-754)",
"type": "expect_in_hot_path",
"file": "/home/runner/work/affinescript/affinescript/affinescriptiser/src/codegen/wasm_gen.rs",
"action": "flag",
"rule_module": "code_safety",
"severity": "medium"
}
]
Powered by Hypatia Neurosymbolic CI/CD Intelligence
hyperpolymath added a commit that referenced this pull request on Jun 14, 2026
…584)

## Migration wave: 7 integer-brain kernels from string-gated idaptik modules

First applied wave of the now-unblocked **string-gated corpus**. Phase B classified 71 string-gated files; closing the string wall (slices 1–8) plus the `len()` lowering (#583) opened the integer-brain extraction path.

Seven kernels re-decomposed from idaptik `.res` modules into AffineScript brains under `proposals/idaptik/migrated/`, fanned out across 6 parallel agents and **re-verified by me before commit**. Each is a four-gate deliverable: G1 compile, G2 independent-oracle parity sweep, G4 assail. Strings / floats / promises / mutable state stay **host-side** per the C1–C12 recipe; only the pure-integer decision core crosses to wasm.

| Kernel | Exports | Parity | Assail |
|---|---|---|---|
| PortScanner | 4 | 44/44 | clean |
| PasswordCracker | 7 | 215/215 | clean |
| FirewallDevice | 12 | 164/164 | clean |
| Inventory | 9 | 2840/2840 | clean |
| Drone | 32 | 1192/1192 | clean |
| SecurityDog | 29 | 31533/31533 | clean |
| GuardNPC | 19 | 359/359 | clean |

**Re-decompositions, not transliterations**: e.g. PasswordCracker inverts the djb2 string-loop so the host walks the string and the brain does i32 math (`Math.imul`/`|0` modelled in the oracle); Inventory packs slot-state into a base-3 Int instead of a mutable array; FirewallDevice keeps CIDR/protocol *string* parsing host-side and decides over integer flags. Floats cross as floored milli-units; out-of-band inputs return guarded `-1` sentinels (assail-clean, no in-band collapse). Each oracle is an independent JS reimplementation from the `.res` semantics, not copied from the `.affine`.

**Deduped:** `SecurityAI` dropped, as it is already tracked as `migrated/securityai/` (with a boundary proof) from an earlier wave; `GlobalNetworkData` likewise pre-existed and was left untouched.

**Two compiler quirks surfaced** (flagged for the playbook, not fixed here): `total` is a reserved keyword (parse error as an identifier); and an `if { … }` block immediately followed by a parenthesized expression parses as a function application. Both have trivial source-side workarounds.

Builds on #583 (`len`, merged) and the string-wall slices (#574/#575/#578).

https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s

---

_Generated by [Claude Code](https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s)_

Co-authored-by: Claude <noreply@anthropic.com>
## Phase F slice 8b: type-directed string `++` lowering

The full fix the slice-8a guard (#575) stood in for. String `++` now lowers correctly and completely to wasm, including pure variable-to-variable `a ++ b`, which the syntactic guard could not reach.

### The channel (type-directed elaboration)

- `ast.ml`: new `ExprStringConcat of expr * expr` (never produced by the parser).
- `typecheck.ml`: `synth` records each `++` node it types as String concat by physical identity (`string_concat_sites`); `elaborate_string_concat` rewrites exactly those nodes to `ExprStringConcat`. Physical-identity keying is sound because typecheck and codegen run over the same `prog` object (`parse_with_face`'s lowered prog, shared by resolve/typecheck/codegen); `ExprBinary` carries no span and same-text `++` occurrences are value-equal, so `==` is the correct key.
- `bin/main.ml`: the wasm path runs `elaborate_string_concat` after typecheck, before `Opt.fold`. The interpreter and non-wasm backends keep the original `prog` (`ExprBinary _ OpConcat _`), so the oracle is unchanged and only the wasm backend sees the new node.

### The lowering (`codegen.ml`)

Byte concat: allocate `4 + la + lb`, write the length word, copy a's then b's bytes, mirroring the list-concat handler but with 1-byte elements + a single length word instead of 4-byte i32 elements. That i32-element copy was exactly the bug: a string's `[len][utf8]` was copied as i32 elements, so `"ab" ++ "cd"` read byte 2 as the length word of `"cd"` (= 2) instead of `'c'` (= 99).

### Effect-ordinal parity (`effect_sites.ml`)

`ExprStringConcat` recurses like `ExprBinary` and is not counted as an `ExprApp` call site, so effect-ordinals stay identical between interp (sees `ExprBinary`) and wasm (sees `ExprStringConcat`), avoiding a #555-class desync. An intrinsic-call encoding (`ExprApp "__string_concat"`) would have shifted the ordinals; the dedicated node avoids that.

`opt.ml` folds sub-expressions; `interp.ml` handles it defensively.

The 8a guard is retained as a backstop: any String `++` reaching codegen un-elaborated still errors loudly rather than emitting garbage.

### Tests / verification

- `tests/codegen/string_concat.{affine,mjs}`: executable wasm parity, byte-exact via the slice-1 reader: the `"ab" ++ "cd"` byte-2 = 99 regression (was 2), the var-var case the guard could not catch, chained `a ++ b ++ c`, and empty operands (oracle 6513269).
- `test/test_e2e.ml` "E2E String-wall slice 8 guard" gains a lowers-after-elaboration case.
- `run_codegen_wasm_tests.sh` green incl. `list_concat` + slices 1-7 + effect tests; string `++` verified correct in if/match/fn/nested contexts. (`dune runtest` not runnable in-sandbox, no `alcotest`; the codegen `.mjs` parity goes through the real CLI pipeline.)

### Migration impact

This closes the string wall's last op: every name-dispatched string builtin (slices 1-7) + concatenation (8) now lower to wasm. The next compiler half is the effect wall (≈111 effect-gated corpus files).

Builds on #575 (guard, merged) and #574 (design, merged).
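
The effect-ordinal argument in this description can be illustrated with a toy call-site counter. This is a minimal sketch under assumed node shapes, where ordinals are handed to `ExprApp` nodes in post-order (arguments before the call); `assignOrdinals`, `ordinalOf`, and the `emit` callee are hypothetical, and the traversal order in the real `effect_sites.ml` may differ.

```javascript
// Hypothetical node constructors for the three encodings being compared.
const app = (fn, args) => ({ tag: "ExprApp", fn, args });
const str = (value) => ({ tag: "ExprLit", value });
const binary = (lhs, rhs) => ({ tag: "ExprBinary", op: "OpConcat", lhs, rhs });
const stringConcat = (lhs, rhs) => ({ tag: "ExprStringConcat", lhs, rhs });

// Assign ordinals to ExprApp call sites in post-order (arguments first).
// ExprBinary and ExprStringConcat only recurse; they never take an ordinal.
function assignOrdinals(expr, ordinals = new Map()) {
  switch (expr.tag) {
    case "ExprApp":
      expr.args.forEach((arg) => assignOrdinals(arg, ordinals));
      ordinals.set(expr, ordinals.size);
      break;
    case "ExprBinary":
    case "ExprStringConcat":
      assignOrdinals(expr.lhs, ordinals);
      assignOrdinals(expr.rhs, ordinals);
      break;
    default:
      break; // literals carry no call sites
  }
  return ordinals;
}

// Ordinal of the outer call in emit("a" ++ "b") under a given ++ encoding.
// `emit` is a made-up effectful callee.
const ordinalOf = (makeConcat) => {
  const call = app("emit", [makeConcat(str("a"), str("b"))]);
  return assignOrdinals(call).get(call);
};

const interpOrdinal = ordinalOf(binary); // what the interpreter sees
const wasmOrdinal = ordinalOf(stringConcat); // what the wasm backend sees
const intrinsicOrdinal = ordinalOf((l, r) => app("__string_concat", [l, r]));
console.log(interpOrdinal, wasmOrdinal, intrinsicOrdinal);
// prints: 0 0 1
```

In this model the dedicated node leaves the `emit` site at ordinal 0 on both sides, whereas encoding the concat as an extra `ExprApp` pushes it to 1, which is the kind of shift the dedicated node is said to avoid.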
https://claude.ai/code/session_01WoKhFQePiRsAj7aqnxbG8s
Generated by Claude Code