Skip to content

perf: capture parse Position without boxing the offset Int#876

Merged
stephenamar-db merged 1 commit into
databricks:masterfrom
He-Pin:perf/parser-pos-no-boxing
Jun 2, 2026
Merged

perf: capture parse Position without boxing the offset Int#876
stephenamar-db merged 1 commit into
databricks:masterfrom
He-Pin:perf/parser-pos-no-boxing

Conversation

@He-Pin

@He-Pin He-Pin commented May 30, 2026

Copy link
Copy Markdown
Contributor

Motivation

Parser.Pos is invoked for nearly every AST node to capture the source offset. It was Index.map(off => new Position(...)): fastparse's Index stores the offset as an Int in its successValue: Any field (boxing it), and the .map then unboxes it and allocates a closure — per node. boxToInteger via SharedPackageDefs.Index was a top self-frame in the parse flamegraph on kube-prometheus.

Modification

  • Write the Position object straight into successValue via ctx.freshSuccess(new Position(fileScope, ctx.index)), skipping the Int box/unbox and the map closure. Parse output (positions / error offsets) is unchanged.

Result

Scala Native --debug-stats parse_time on kube-prometheus (148 files), interleaved and cooled:

min mean
master 92.6 ms 104.2 ms
this PR 86.9 ms 96.0 ms

→ ~6–8% faster parse; output byte-identical. (JMH ParserBenchmark over the test suite showed +5.4% on JVM.)

Test plan

  • ./mill __.reformat
  • ./mill 'sjsonnet.jvm[3.3.7]'.test — 518/518 pass

Motivation:
Parser.Pos is invoked for nearly every AST node. It was `Index.map(off => new
Position(...))`: fastparse's `Index` stores the offset as an Int in its
`successValue: Any` field (boxing it), and the `.map` then unboxes it and
allocates a closure — per node. boxToInteger via SharedPackageDefs.Index was a
top self-frame in the parse flamegraph on kube-prometheus.

Modification:
- Rewrite Pos to write the Position object straight into successValue via
  ctx.freshSuccess(new Position(fileScope, ctx.index)), skipping the Int
  box/unbox and the map closure. Parse output (positions/errors) is unchanged.

Result:
JMH ParserBenchmark (parse-only, all test-suite files): 1.669 -> 1.579 ms/op
(+5.4%, non-overlapping bands). Native parse_time on kube-prometheus:
~105.6 -> ~100.9 ms (+4.5%, consistent). Output byte-identical. 450/450 tests pass.
@He-Pin He-Pin force-pushed the perf/parser-pos-no-boxing branch from b8a525f to c231f70 Compare May 30, 2026 12:42
@stephenamar-db stephenamar-db merged commit 74d466f into databricks:master Jun 2, 2026
5 checks passed
stephenamar-db pushed a commit that referenced this pull request Jun 2, 2026
## Motivation

`std.manifestJson`, `std.manifestJsonMinified`, and `std.manifestJsonEx`
routed through `java.io.StringWriter`, paying `StringBuffer`
synchronization per `write`/`flush` on the hot manifestation path.
Source-built jrsonnet comparisons showed sjsonnet trailing on
object-heavy manifest workloads.

## Modification

- Add `StringBuilderWriter`: an unsynchronized `Writer` over a
`StringBuilder`.
- Add package-private `FastMaterializeJsonRenderer` backed by
`StringBuilderWriter`; route the three `std.manifestJson*` builtins
through it. Public `MaterializeJsonRenderer` ABI/shape unchanged.
- Use an in-place codepoint sort for `sortedVisibleKeyNames` /
`maybeSortKeys` (avoids `.sorted` boxing).
- Fix codepoint comparison for raw surrogate prefixes;
`UnicodeHandlingTests` extended.

## Result

Scala Native hyperfine on kube-prometheus, jrsonnet HEAD `2d7eed05`:

| Workload (native) | Before | After | Δ |
|---|---:|---:|---:|
| kube-prometheus, sjsonnet | 158.4 ± 16.8 ms | 143.7 ± 3.2 ms |
**−9.3%** |
| `manifestJsonEx`, sjsonnet | — | 5.09 ± 1.01 ms | new |

## Test plan

- [x] `./mill __.reformat`
- [x] `./mill 'sjsonnet.jvm[3.3.7]'.test` — 518/518 pass

This PR is the base for the stacked follow-ups #875 (TomlRenderer reuses
`StringBuilderWriter`), and the independent #876/#877/#878.
stephenamar-db pushed a commit that referenced this pull request Jun 22, 2026
…ontexts (#1011)

## Motivation

go-jsonnet issue #873 reports that `(function(x, x, x) x)(null, 3, 2)`
silently returns `2` instead of rejecting duplicate parameter names.
sjsonnet already rejected duplicates in function expressions, but the
error message was misleading in local bindings and object methods
(e.g. "Expected \")\"" or "Expected \":\"").

## Modification

The `params` parser correctly detects duplicates via `flatMapX` +
`Fail`. However, in `bind` (`local f(x,x)=...`) and `field`
(`{f(x,x):...}`), the optional group `.?` around `params` swallowed
the "no duplicate parameter" error and backtracked, producing
misleading errors.

Two changes fix this:

1. In `field()`: added `~/` (cut) after `"("` to commit to the params
   path once `"("` is matched: `"(" ~/ params(...) ~ ")"`.
2. In `params()`: set `ctx.cut = true` programmatically when a
   duplicate is detected, preventing outer `.?` operators from
   swallowing the error. Also captures `Pos` for each parameter to
   position the error at the duplicate parameter name (not at the
   closing paren).

## Result

All duplicate parameter contexts now produce the correct error
`"Expected no duplicate parameter: x"` with accurate position info:

- `(function(x, x, x) x)` -> ParseError (already worked)
- `local f(x, x) = x` -> ParseError (was `"Expected )"`)
- `{ f(x, x): x }` -> ParseError (was `"Expected :"`)
- `{ local f(x, x) = x }` -> ParseError (was `"Expected )"`)
- Nested bindings also fixed as a bonus.

Tests pass on Scala 3.3.7, 2.13.18, 2.12.21.

## References

- google/go-jsonnet#873


## Cross-implementation comparison: Duplicate function parameters

| Implementation | Detection Phase | All Contexts Covered | Error
Message Example |
|---|---|---|---|
| **sjsonnet (this fix)** | Parse time | ✅ All 4 contexts | `Expected no
duplicate parameter: x` |
| **cpp-jsonnet** | Static analysis (post-parse) | ✅ All 4 (via
desugaring) | `Duplicate function parameter: x` |
| **go-jsonnet** (master) | ❌ None | ❌ Bug — silently accepts | N/A —
last argument wins silently |
| **go-jsonnet** ([PR
#876](google/go-jsonnet#876), unmerged) | Parse
time | ✅ All 4 | `Duplicate function parameter: a` / `Duplicate method
parameter: x` |
| **jrsonnet** | Static analysis (post-parse) | ✅ All 4 (via desugaring)
| `local is already defined in the current frame: x` |

### Behavior by test case

| Test Case | sjsonnet (fix) | cpp-jsonnet | go-jsonnet | jrsonnet |
|---|---|---|---|---|
| `(function(x, x, x) x)(null, 3, 2)` | ✅ ParseError | ✅ StaticError | ❌
Returns `2` | ✅ StaticError |
| `local f(x, x) = x; f(1, 2)` | ✅ ParseError | ✅ StaticError | ❌
Accepted | ✅ StaticError |
| `{ f(x, x): x }.f(1, 2)` | ✅ ParseError | ✅ StaticError | ❌ Accepted |
✅ StaticError |
| `{ local f(x, x) = x, a: f(1, 2) }` | ✅ ParseError | ✅ StaticError | ❌
Accepted | ✅ StaticError |

### Key differences

- **Detection timing**: sjsonnet and go-jsonnet (PR #876) detect at
parse time; cpp-jsonnet and jrsonnet detect during a separate static
analysis pass after parsing.
- **Error message specificity**: go-jsonnet PR #876 distinguishes
"function parameter" vs "method parameter"; cpp-jsonnet uses "function
parameter" uniformly (methods desugar to functions before the check).
- **Position accuracy**: sjsonnet points to the duplicate parameter
name; cpp-jsonnet points to the entire function AST node.
- **go-jsonnet master** has a known bug ([issue
#873](google/go-jsonnet#873)) where duplicate
parameters are silently accepted with "last argument wins" semantics.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants