term: self-normalize exponent encoding (§4.1) so policy identity is libc-independent by acastellana · Pull Request #16 · genlayerlabs/unhardcoded-engine

acastellana · 2026-06-19T12:28:16Z

term: self-normalize exponent encoding (§4.1) so identity is libc-independent

What

term.num_enc now rewrites the %.17g exponent to the canonical e±dd form
(lowercase e, explicit sign, ≥2 digits) instead of trusting platform printf.
term.encode / fingerprint / policy identity are now byte-identical on every libc.

Why

Number rendering is part of the SIGMA-POL §4.1 encoding spec — every conformant
encoder must render a given number to identical bytes. Delegating the exponent to
printf violated that on libcs that pad differently (e.g. MSVC's three-digit
e-005 vs glibc's e-05), which silently forks the identity space for any
policy carrying an exponent-form number (magnitude < ~1e-4 or very large) — breaking
the cache key, "request X ran policy H" replay/audit, and on-chain commitment.
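
To make the fork concrete, here is a minimal sketch. `toy_hash` is a hypothetical stand-in for the real fingerprint function, which this PR does not show; any byte-sensitive hash behaves the same way. The two inputs are the glibc-style and MSVC-style renderings of the same exponent.

```lua
-- Hypothetical stand-in for the policy fingerprint: a simple polynomial
-- hash over bytes. The real fingerprint function is not shown in this PR.
local function toy_hash(s)
    local h = 0
    for i = 1, #s do
        h = (h * 31 + s:byte(i)) % 4294967296  -- keep the value within 32 bits
    end
    return h
end

local glibc_style = "1e-05"   -- two-digit exponent
local msvc_style  = "1e-005"  -- three-digit exponent, same number

-- Same value, different bytes, so the hashes disagree.
print(toy_hash(glibc_style) == toy_hash(msvc_style))  -- false
```

Because the identity is derived from the encoded bytes, one extra padding zero is enough to give the same policy two different identities.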

Changes (scoped to the fix only)

- llm_policy/term.lua: extract normalize_exp; num_enc returns
  normalize_exp(string.format("%.17g", v)). Expose T._normalize_exp for tests.
- tests/unit/ir_term.lua: a direct normalizer test (non-canonical → canonical) plus
  an encode-path integration test.

lua tests/run_lua.lua → 571 passed, 0 failed.
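
A minimal sketch of what such a normalizer could look like, assuming the rules stated in this description (lowercase e, explicit sign, at least two exponent digits). The function bodies here are illustrative; the actual code in llm_policy/term.lua may differ, and the `num_enc` shown covers only the float path.

```lua
-- Sketch only: the actual normalize_exp in llm_policy/term.lua may differ.
local function normalize_exp(s)
    -- Split "<mantissa>e<sign><digits>"; accept e or E and an optional sign.
    local mant, sign, digits = s:match("^(.-)[eE]([+-]?)(%d+)$")
    if not mant then
        return s                      -- no exponent part, e.g. "0.5"
    end
    if sign == "" then sign = "+" end -- make the sign explicit
    digits = digits:gsub("^0+", "")   -- drop leading zeros ("005" -> "5")
    if #digits < 2 then               -- pad back to at least two digits
        digits = string.rep("0", 2 - #digits) .. digits
    end
    return mant .. "e" .. sign .. digits
end

-- Assumed shape of the float path in num_enc; integers and non-finite
-- values are presumably handled before this point.
local function num_enc(v)
    return normalize_exp(string.format("%.17g", v))
end

print(normalize_exp("1.0000000000000001e-005"))  -- 1.0000000000000001e-05
print(normalize_exp("1e+100"))                   -- 1e+100
```

Stripping leading zeros before padding means a three-digit exponent such as `e+100` is left alone while `e-005` collapses to `e-05`.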

The companion §4.1 prose update to docs/SIGMA-POL.md ("the reference encoder
self-enforces the canonical exponent") is intentionally left out of this PR to keep
it to the code + its guard. The executable spec already matches: the committed
golden vectors carry canonical exponents, and this change reproduces them
byte-for-byte.

Not a version bump (stays `sigma-pol/v1`)

Verified the change rewrites 0 of the committed golden vectors — all three
exponent vectors (1.0000000000000001e-05, 1e+100, 2.5000000000000002e-10) are
already canonical, so regenerating the golden produces a byte-identical file and no
existing identity rotates. The fix only changes output on platforms that were
previously non-conformant (i.e. already wrong).

Closes a real test gap

The golden's exponent vectors are already canonical, and a 2-digit-printf CI
(glibc/macOS) emits canonical form straight from %.17g — so the golden conformance
test passes with or without the normalization there and can't catch a regression of
this logic. The new direct unit test on synthetic non-canonical inputs
(1e-005→1e-05, 1E-05→1e-05, 1e-5→1e-05, 1e+017→1e+17, 1e5→1e+05) closes it.
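
A table-driven sketch of that kind of direct test, using the five synthetic inputs listed above plus the three golden exponent vectors as no-op cases. The `normalize_exp` defined here is a compact hypothetical stand-in so the snippet runs on its own; the real test targets T._normalize_exp and uses the project's own test harness rather than bare `assert`.

```lua
-- Compact stand-in for the normalizer under test (hypothetical; the real
-- one is exposed as T._normalize_exp in llm_policy/term.lua).
local function normalize_exp(s)
    return (s:gsub("[eE]([+-]?)0*(%d+)$", function(sign, digits)
        if sign == "" then sign = "+" end
        if #digits < 2 then digits = "0" .. digits end
        return "e" .. sign .. digits
    end))
end

local cases = {
    -- synthetic non-canonical inputs -> canonical form
    { "1e-005", "1e-05" },  -- MSVC-style three-digit exponent
    { "1E-05",  "1e-05" },  -- uppercase E
    { "1e-5",   "1e-05" },  -- missing zero padding
    { "1e+017", "1e+17" },  -- three-digit positive exponent
    { "1e5",    "1e+05" },  -- missing sign and padding
    -- golden exponent vectors: already canonical, must pass through unchanged
    { "1.0000000000000001e-05", "1.0000000000000001e-05" },
    { "1e+100",                 "1e+100" },
    { "2.5000000000000002e-10", "2.5000000000000002e-10" },
}

for _, c in ipairs(cases) do
    local got = normalize_exp(c[1])
    assert(got == c[2], c[1] .. " -> " .. got .. ", expected " .. c[2])
    assert(normalize_exp(got) == got, "not idempotent for " .. c[1])
end
print("all cases pass")
```

Feeding literal strings to the normalizer keeps the test independent of the host's printf, which is what lets it fail on a glibc CI if the normalization regresses.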

Risk

Minimal / latent. No live policy or config uses exponent-range numbers today, and on
glibc/macOS the output already matched the golden; this hardens correctness for
arbitrary user policies and non-glibc hosts.

Note: this PR is only the exponent fix. The other working-tree changes
(config.example.lua extends docs, the cost-scorer + greybox hunks of
SIGMA-POL.md, the other doc edits) are unrelated and belong in separate PRs.

Summary by CodeRabbit

Bug Fixes
- Improved numeric encoding to normalize floating-point numbers with scientific notation into a standardized exponent format, ensuring consistent representation across the system.
Tests
- Added unit tests to verify correct normalization of numeric exponent formats and encoding behavior.

…t (§4.1)

num_enc now rewrites the %.17g exponent to the canonical e±dd form (lowercase e, explicit sign, >=2 digits) instead of trusting platform printf. encode / fingerprint / policy identity are now byte-identical on every libc. Previously, a libc that pads exponents differently (e.g. MSVC's "e-005") forked the identity of any policy carrying an exponent-form number, breaking the cache key, replay/audit, and on-chain commitment.

Stays sigma-pol/v1: this rewrites 0 of the committed golden vectors (all already canonical), so no existing identity rotates; only previously non-conformant platforms change.

Adds a direct conformance test for the normalization (tests/unit/ir_term.lua). The golden's exponent vectors are already canonical and a 2-digit-printf CI emits canonical form from %.17g, so the golden alone can't catch a regression of this logic.

Full suite: 571 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-06-19T12:28:31Z

Walkthrough

Adds a normalize_exp helper to llm_policy/term.lua that rewrites %.17g scientific-notation exponents into a canonical e±dd form (lowercase e, explicit sign, two-digit minimum exponent). num_enc now routes non-integer finite numbers through this helper. Two new tests in tests/unit/ir_term.lua cover direct normalization cases and the full encode() path.

Changes

Float exponent canonicalization

| Layer / File(s) | Summary |
|---|---|
| `normalize_exp` helper and `num_enc` wiring (`llm_policy/term.lua`) | Adds `normalize_exp` that rewrites `e`/`E` notation to lowercase `e` with explicit sign, strips leading zeros, and zero-pads to two digits; exports it as `T._normalize_exp`. Updates `num_enc` to apply this normalization for non-integer finite numbers instead of returning the raw `%.17g` string. |
| Unit and integration tests (`tests/unit/ir_term.lua`) | Adds a direct test for `T._normalize_exp` covering MSVC 3-digit exponents, uppercase `E`, missing sign, padding, and idempotence. Adds an integration test asserting exponent-range numerics encode to canonical bytes via the full `encode()` path. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A rabbit hops through digits grand,
e+07 lands just as planned—
No three-digit exponent shall sneak by,
Uppercase E? We normalize with a sigh.
Each float now wears a canonical tie! 🐇✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding self-normalization of exponent encoding in the term module to ensure platform-independent policy identity, which directly aligns with the primary purpose of the PR. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

🧹 Nitpick comments (1)

tests/unit/ir_term.lua (1)

220-225: ⚡ Quick win

Simulate non-canonical printf output in the encode-path test.

As written, this passes on two-digit-printf hosts even if num_enc stops calling normalize_exp, because 1e-5 already formats canonically there. Temporarily stubbing string.format("%.17g", 1e-5) to return an MSVC-style exponent would make this integration test platform-independent.

Test hardening sketch

 t.test("encode: exponent-range number params encode canonically", function()
+    local old_format = string.format
+    string.format = function(fmt, v, ...)
+        if fmt == "%.17g" and v == 1e-5 then
+            return "1.0000000000000001e-005"
+        end
+        return old_format(fmt, v, ...)
+    end
+
     local pol = { "policy", { "ev_zero" },
         { "and", { "meets_req" }, { "cmp", "price_out", "le", 1e-5 } },
         { "neg", { "normalize", { "field", "price_out" } } },
         { "argmax" }, { "id" }, { "always", { action = "next_candidate" } } }
-    t.contains(enc(pol), "1.0000000000000001e-05", "tiny price ceiling encodes canonically")
+    local ok, out = pcall(enc, pol)
+    string.format = old_format
+    assert(ok, out)
+    t.contains(out, "1.0000000000000001e-05", "tiny price ceiling encodes canonically")
 end)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/ir_term.lua` around lines 220 - 225, The test "encode:
exponent-range number params encode canonically" needs to be hardened to be
platform-independent. Currently it passes on platforms where 1e-5 already
formats canonically, even if num_enc no longer calls normalize_exp. Add a
stub for string.format that returns an MSVC-style exponent format (with a
three-digit exponent like 1.0000000000000001e-005 instead of
1.0000000000000001e-05) when called with "%.17g" and 1e-5, and restore the
original string.format afterwards, so that the test actually validates the
normalization behavior instead of relying on platform-specific printf behavior.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9ec0db80-c787-4b57-a806-3fa2dbfce173

📥 Commits

Reviewing files that changed from the base of the PR and between f80ccca and 28a4eb3.

📒 Files selected for processing (2)

llm_policy/term.lua
tests/unit/ir_term.lua

coderabbitai Bot reviewed Jun 19, 2026

acastellana wants to merge 1 commit into main from fix/sigma-pol-exponent-canonicalization
