term: self-normalize exponent encoding (§4.1) so policy identity is libc-independent #16
acastellana wants to merge 1 commit into
…t (§4.1)

num_enc now rewrites the %.17g exponent to the canonical e±dd form (lowercase e, explicit sign, >=2 digits) instead of trusting platform printf. encode / fingerprint / policy identity are now byte-identical on every libc. Previously, a libc that pads exponents differently (e.g. MSVC's "e-005") forked the identity of any policy carrying an exponent-form number, which broke the cache key, replay/audit, and on-chain commitment.

Stays sigma-pol/v1: this rewrites 0 of the committed golden vectors (all are already canonical), so no existing identity rotates. Only previously non-conformant platforms change.

Adds a direct conformance test for the normalization (tests/unit/ir_term.lua). The golden's exponent vectors are already canonical, and a CI whose printf emits two-digit exponents already gets canonical form from %.17g, so the golden alone can't catch a regression of this logic.

Full suite: 571 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
📝 Walkthrough

Adds a

Changes: Float exponent canonicalization
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
🧹 Nitpick comments (1)
tests/unit/ir_term.lua (1)
220-225: ⚡ Quick win: Simulate non-canonical printf output in the encode-path test.
As written, this test passes on two-digit-printf hosts even if `num_enc` stops calling `normalize_exp`, because `1e-5` already formats canonically there. Temporarily stubbing `string.format("%.17g", 1e-5)` to return an MSVC-style exponent would make this integration test platform-independent.

Test hardening sketch

```diff
 t.test("encode: exponent-range number params encode canonically", function()
+  local old_format = string.format
+  string.format = function(fmt, v, ...)
+    if fmt == "%.17g" and v == 1e-5 then
+      return "1.0000000000000001e-005"
+    end
+    return old_format(fmt, v, ...)
+  end
+
   local pol = { "policy", { "ev_zero" },
     { "and", { "meets_req" }, { "cmp", "price_out", "le", 1e-5 } },
     { "neg", { "normalize", { "field", "price_out" } } },
     { "argmax" }, { "id" }, { "always", { action = "next_candidate" } } }
-  t.contains(enc(pol), "1.0000000000000001e-05", "tiny price ceiling encodes canonically")
+  local ok, out = pcall(enc, pol)
+  string.format = old_format
+  assert(ok, out)
+  t.contains(out, "1.0000000000000001e-05", "tiny price ceiling encodes canonically")
 end)
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/ir_term.lua` around lines 220-225, the test "encode: exponent-range number params encode canonically" needs to be hardened so it is platform-independent. It currently passes on platforms where 1e-5 already formats canonically, even if `num_enc` stops calling `normalize_exp`. Add a stub for `string.format` that returns an MSVC-style three-digit exponent (`1.0000000000000001e-005` instead of `1.0000000000000001e-05`) when called with `"%.17g"` and `1e-5`, so that the test actually validates the normalization instead of relying on platform-specific `printf` behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9ec0db80-c787-4b57-a806-3fa2dbfce173
📒 Files selected for processing (2)
llm_policy/term.lua
tests/unit/ir_term.lua
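Whether the reviewer's stub actually intercepts anything depends on how `term.lua` reaches `string.format`, and the diff does not show that. The standalone sketch below uses two hypothetical encoders (`enc_dynamic`, `enc_cached`) in place of the real `num_enc`, plus a hypothetical `with_msvc_stub` helper. Reassigning `string.format` is visible to code that looks the field up at call time. Method calls such as `("%.17g"):format(v)` also resolve through the `string` table via the string metatable. Code that cached the function in a local when its chunk was loaded does not see the reassignment.

```lua
-- Hypothetical encoder that looks up string.format on every call,
-- so reassigning the field is visible to it.
local function enc_dynamic(v)
  return string.format("%.17g", v)
end

-- Hypothetical encoder that cached string.format in a local at load time;
-- a later reassignment of string.format does not reach it.
local cached_format = string.format
local function enc_cached(v)
  return cached_format("%.17g", v)
end

-- Runs fn(...) with string.format replaced by an MSVC-style stub for 1e-5,
-- restoring the real function even if fn raises an error.
local function with_msvc_stub(fn, ...)
  local real_format = string.format
  string.format = function(fmt, v, ...)
    if fmt == "%.17g" and v == 1e-5 then
      return "1.0000000000000001e-005"
    end
    return real_format(fmt, v, ...)
  end
  local ok, result = pcall(fn, ...)
  string.format = real_format
  if not ok then
    error(result, 0)
  end
  return result
end

print(with_msvc_stub(enc_dynamic, 1e-5)) -- 1.0000000000000001e-005 (stubbed)
print(with_msvc_stub(enc_cached, 1e-5))  -- host printf output, not the stub
```

If `term.lua` follows the second pattern, the encode-path test would still exercise the host's `printf` with the stub in place, so check which pattern the module uses before relying on the stub.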
term: self-normalize exponent encoding (§4.1) so identity is libc-independent
What
`term.num_enc` now rewrites the `%.17g` exponent to the canonical `e±dd` form (lowercase `e`, explicit sign, ≥2 digits) instead of trusting platform `printf`. `term.encode` / `fingerprint` / policy identity are now byte-identical on every libc.

Why
Number rendering is part of the SIGMA-POL §4.1 encoding spec: every conformant encoder must render a given number to identical bytes. Delegating the exponent to `printf` violated that on libcs that pad differently (e.g. MSVC's three-digit `e-005` vs glibc's `e-05`). This silently forks the identity space for any policy carrying an exponent-form number (magnitude below ~1e-4 or at least ~1e17 under `%.17g`), which breaks the cache key, "request X ran policy H" replay/audit, and on-chain commitment.
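The affected range follows from C's `%g` rule. With precision 17, `%.17g` switches to exponent form when the decimal exponent is below -4 or at least 17. The sketch below checks a few boundary values against the host's `string.format`. It uses `uses_exponent`, a hypothetical helper that is not part of `term.lua`.

```lua
-- Hypothetical helper (not part of llm_policy/term.lua): true when the host's
-- %.17g rendering of v uses exponent form, i.e. contains an "e" or "E".
local function uses_exponent(v)
  return string.format("%.17g", v):find("[eE]") ~= nil
end

-- Values on either side of the two thresholds of the %g rule at precision 17.
local samples = { 1e-3, 1e-4, 1e-5, 1e16, 1e17, 1e100 }
for _, v in ipairs(samples) do
  print(string.format("%.17g", v), uses_exponent(v))
end
```

Only `1e-5`, `1e17` and `1e100` reach exponent form here, so only numbers like these are exposed to the exponent-padding differences between libcs.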
Changes (scoped to the fix only)
- `llm_policy/term.lua`: extract `normalize_exp`; `num_enc` returns `normalize_exp(string.format("%.17g", v))`. Expose `T._normalize_exp` for tests.
- `tests/unit/ir_term.lua`: a direct normalizer test (non-canonical → canonical) plus an encode-path integration test.
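Below is a rough Lua sketch of the rule (lowercase `e`, explicit sign, at least two digits). The PR states only the `num_enc` composition and the `T._normalize_exp` export. The body of `normalize_exp` shown here is an assumption, since this page does not include the actual implementation.

```lua
-- Hypothetical sketch of the exponent normalizer; the real body in
-- llm_policy/term.lua is not shown on this page.
local function normalize_exp(s)
  -- Split "<mantissa>[eE][+-]<digits>"; strings without an exponent
  -- (e.g. "0.001", "inf", "nan") are returned unchanged.
  local mant, sign, digits = s:match("^([^eE]+)[eE]([%+%-]?)(%d+)$")
  if not mant then
    return s
  end
  -- Canonical form always carries an explicit sign.
  if sign == "" then
    sign = "+"
  end
  -- Strip leading zeros (MSVC-style "005"), then left-pad to two digits.
  digits = digits:gsub("^0+", "")
  if #digits < 2 then
    digits = string.rep("0", 2 - #digits) .. digits
  end
  return mant .. "e" .. sign .. digits
end

-- num_enc as described in the PR: normalize whatever %.17g produced.
local function num_enc(v)
  return normalize_exp(string.format("%.17g", v))
end

-- Module table; exposing the helper lets tests call it directly.
local T = { num_enc = num_enc, _normalize_exp = normalize_exp }

print(T.num_enc(1e-5))
print(T._normalize_exp("1e+017")) -- 1e+17
```

The helper strips leading zeros before re-padding. A three-digit `e-005` therefore collapses to `e-05`, while an exponent that genuinely needs three digits, such as `e+100`, is left alone.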
`lua tests/run_lua.lua` → 571 passed, 0 failed.

Not a version bump (stays `sigma-pol/v1`)

Verified that the change rewrites 0 of the committed golden vectors. All three exponent vectors (`1.0000000000000001e-05`, `1e+100`, `2.5000000000000002e-10`) are already canonical, so regenerating the golden produces a byte-identical file and no existing identity rotates. The fix only changes output on platforms that were previously non-conformant (i.e. already wrong).
Closes a real test gap
The golden's exponent vectors are already canonical, and a CI with two-digit `printf` (glibc/macOS) emits canonical form straight from `%.17g`. The golden conformance test therefore passes there with or without the normalization and can't catch a regression of this logic. The new direct unit test on synthetic non-canonical inputs (`1e-005` → `1e-05`, `1E-05` → `1e-05`, `1e-5` → `1e-05`, `1e+017` → `1e+17`, `1e5` → `1e+05`) closes that gap.

Risk
Minimal / latent. No live policy or config uses exponent-range numbers today, and on
glibc/macOS the output already matched the golden; this hardens correctness for
arbitrary user policies and non-glibc hosts.
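Two properties are relied on above: the three golden exponent vectors must come back byte-identical, and the synthetic non-canonical inputs must be rewritten. Both fit in one table-driven check. This standalone sketch re-declares a hypothetical `normalize_exp` with the same rules (the real helper is reached as `T._normalize_exp`). It prints a failure count instead of using the repository's `t.test` harness, whose API is not shown here.

```lua
-- Hypothetical stand-in for T._normalize_exp (lowercase e, explicit sign,
-- at least two exponent digits).
local function normalize_exp(s)
  local mant, sign, digits = s:match("^([^eE]+)[eE]([%+%-]?)(%d+)$")
  if not mant then
    return s
  end
  if sign == "" then sign = "+" end
  digits = digits:gsub("^0+", "")
  if #digits < 2 then
    digits = string.rep("0", 2 - #digits) .. digits
  end
  return mant .. "e" .. sign .. digits
end

-- Synthetic non-canonical printf outputs and their canonical forms.
local to_canonical = {
  { "1e-005", "1e-05" },
  { "1E-05",  "1e-05" },
  { "1e-5",   "1e-05" },
  { "1e+017", "1e+17" },
  { "1e5",    "1e+05" },
}

-- Golden exponent vectors: already canonical, so they must be fixed points.
local golden = { "1.0000000000000001e-05", "1e+100", "2.5000000000000002e-10" }

local failures = 0
for _, case in ipairs(to_canonical) do
  local got = normalize_exp(case[1])
  if got ~= case[2] then
    failures = failures + 1
    print(("FAIL %s -> %s (want %s)"):format(case[1], got, case[2]))
  end
end
for _, s in ipairs(golden) do
  local got = normalize_exp(s)
  if got ~= s then
    failures = failures + 1
    print(("FAIL golden %s changed to %s"):format(s, got))
  end
end
print(("%d failures"):format(failures)) -- prints "0 failures"
```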
Summary by CodeRabbit
Bug Fixes
Tests