Skip to content

fix: parser & graph-walk DoS/overflow hardening (SEC-1, SEC-2, MEM-1)#20

Merged
delta9000 merged 5 commits into
mainfrom
fix/parser-dos-hardening
Jun 27, 2026
Merged

fix: parser & graph-walk DoS/overflow hardening (SEC-1, SEC-2, MEM-1)#20
delta9000 merged 5 commits into
mainfrom
fix/parser-dos-hardening

Conversation

@delta9000

Copy link
Copy Markdown
Owner

Parser & graph-walk DoS/overflow hardening

Closes #11
Closes #12
Closes #13

Three reproduced security/robustness findings from the scoped engineering review,
fixed together because they share one theme — bounding recursion/size over
attacker-controlled input — and one shared mechanism (runtime/RecursionLimits.hpp).

SEC-1 (#11) — parser deep-nesting stack overflow (DoS)

A few-thousand-deep document SIGSEGV'd in the recursive-descent readers
(XmlLite parseElementparseContent, JsonLite parseValueparseArray/
parseObject, ClassicVRML/VRML97 parseNodeparseNodeBody). Added one shared
cap (x3d::kMaxNestingDepth = 1000) and an RAII DepthGuard in each reader that
throws std::runtime_error past the cap — joining each reader's existing
soft-failure path. The guard restores depth on return, so wide-but-shallow
documents are unaffected; legitimate scenes nest well under 100 levels.

SEC-2 (#12) — SFImage signed-overflow + ~2GB allocation

<PixelTexture image="1 1 2000000000 0xFF"/> drove packed >> (8*b) with an
attacker-controlled count (signed-int overflow, UB) and pushed numComponents
bytes/pixel (multi-GB OOM from one token). parseImageFrom now clamps
numComponents to the X3D §5.3.6 range [0,4] before the unpack loop and shifts
in unsigned. Width/height each consume one pixel token, so the pixel loop is
already bounded by the token stream.

MEM-1 (#13) — extractor/pick walks lacked the cycle+depth guard

BoundsSystem/TransformSystem already break a containment cycle, but the
extractor-side walks (SceneExtractor::walk, LightSystem::collect) and pick
walks (PickSystem::pickNode/worldOfRec) carried neither guard, so a USE-cyclic
or pathologically deep scene driven through these public APIs without
buildSceneGraph's up-front breakContainmentCycles() SIGSEGV'd. walk/
pickNode carry the live root→node PathKey and gain a hard depth cap plus a
path-membership test before descending; worldOfRec/LightSystem::walk (no path
accumulator) gain a depth cap. The normal pipeline still severs cycles at build
time — this makes the walkers self-safe when an embedder drives them directly
(mirrors BoundsSystem's existing computing_ defense-in-depth; see ADR-0016).

Definition of Done

  • TDD: each fix has a regression test whose RED state is the real failure
    (SIGSEGV / 2GB alloc) and whose GREEN state is the bounded/throwing behavior.
  • mise run ci green — build 45/45 ctests, golden byte-identical,
    conformance-gate OK, coverage-gate OK, doc-ctest-gate OK, cli-gate-regression
    PASS (Tier-1 200/200, no regressions; 1 previously-failing corpus item now
    passes
    ).
  • Per-header self-containment verified for the new + touched headers.
  • Living docs updated in the same diff (anti-drift): parse-readers.md
    input-hardening note + new tests; ADR-0016 follow-up on the self-guarding
    walkers.
  • No new third-party dependency → NOTICE unchanged.

Code review

A fresh-context adversarial review traced every recursion path and confirmed each
reader funnels through one guarded function (VRML97 dialect included, brace-skippers
are iterative), SEC-2's loop is genuinely token-bounded with a now-safe shift, and
MEM-1's path-membership cut preserves legitimate USE/DAG sharing with balanced
push/pop. Verdict: ready to merge. Two Minor follow-ups were applied in the last
commit: a counter-leak on the throwing guard (a reused reader could accumulate +1
per rejected parse) is fixed by checking before incrementing, and the three copied
guards are hoisted into one shared x3d::NestingGuard.

Known follow-up (out of scope)

The guards make the walkers crash-proof, not DoS-proof against a legitimate
acyclic
"doubling DAG" (G0→[G1,G1], … ~30 deep): no back-edge, under the depth
cap, yet 2³⁰ distinct paths → exponential per-path emission. That is a separate
CPU/memory-DoS class (per-path emission is intentional for USE-instancing) needing a
different mechanism (emission budget / memoization), not the cycle+depth guard.
Recommend tracking as its own card — not filed here.

New tests

  • runtime/codecs/tests/xml_depth_guard_test.cpp — XML deep-nesting cap (SEC-1)
  • runtime/codecs/tests/sfimage_overflow_test.cpp — SFImage clamp (SEC-2)
  • runtime/parse/tests/parser_depth_guard_test.cpp — JSON + ClassicVRML caps (SEC-1)
  • runtime/scene/tests/walker_cycle_guard_test.cpp — extractor/pick/worldOf cycle termination (MEM-1)

)

A hostile PixelTexture/SFImage token could set numComponents to a huge or
negative value. parseImageFrom() then drove an inner shift `packed >> (8*b)`
whose count overflowed a signed int (UB) and pushed numComponents bytes per
pixel, allocating multiple GB from a single one-pixel token (e.g.
<PixelTexture image="1 1 2000000000 0xFF"/>).

Clamp numComponents to the X3D §5.3.6 range [0,4] before the unpack loop and
shift in unsigned. Width/height each consume one pixel token, so the pixel
loop is already bounded by the token stream.

Closes #12
A few-thousand-deep document overran the native stack and SIGSEGV'd with no
diagnostic: XmlLite's parseElement<->parseContent, JsonLite's parseValue<->
parseArray/parseObject, and the ClassicVRML parseNode<->parseNodeBody<->
applyNodeField recursions all descended attacker-controlled nesting unbounded.

Add one shared cap (runtime/RecursionLimits.hpp, kMaxNestingDepth=1000) and an
RAII DepthGuard in each reader that throws std::runtime_error past the cap —
joining each reader's existing soft-failure path. The guard restores depth on
return so sibling elements/values/nodes do not accumulate depth; legitimate
nesting (well under 100 levels) is unaffected.

Closes #11
…s (MEM-1, #13)

BoundsSystem/TransformSystem already break a containment cycle, but the
extractor-side walks (SceneExtractor::walk, LightSystem::collect) and the pick
walks (PickSystem::pickNode/worldOfRec) carried neither a cycle nor a depth
guard. A USE-cyclic or pathologically deep scene driven through these public
APIs without buildSceneGraph's up-front breakContainmentCycles() SIGSEGV'd at
extraction/pick time (the runtime twin of SEC-1).

walk()/pickNode() carry the live root->n PathKey, so they gain a hard depth cap
plus a path-membership test before descending (breaks back-edges). worldOfRec()
and LightSystem::walk() have no path accumulator and gain a depth cap. The
normal pipeline still severs cycles in buildSceneGraph; this makes the walkers
self-safe when an embedder drives them directly.

Closes #13
…M-1)

Living-doc updates in the same change as the hardening (anti-drift):
- parse-readers.md: new input-hardening note (shared kMaxNestingDepth cap on
  XML/JSON/VRML; SFImage numComponents clamp) + the new regression tests.
- ADR-0016 (cycle-breaker): follow-up note that SceneExtractor/LightSystem/
  PickSystem walks now self-guard (depth cap + path-membership) so they are safe
  when driven without buildSceneGraph's breakContainmentCycles.

Refs #11, #12, #13
…row (SEC-1)

Adversarial review follow-ups (no scope change):
- The three readers each carried an identical local DepthGuard. Hoist a single
  x3d::NestingGuard into RecursionLimits.hpp (parameterized by a message tag).
- Fix a counter leak: the old guard did `if (++d > cap) throw`, so a guard that
  threw in its constructor incremented but never ran its destructor, leaving
  depth_ one too high. Harmless with a fresh reader per parse (the front door),
  but a reused ClassicVrmlReader accumulated +1 per rejected parse until it
  rejected valid documents. NestingGuard checks BEFORE incrementing, so a
  rejected parse leaves the counter untouched. Same cap boundary (<=1000 levels).
- Record the cap-is-a-heuristic / worker-thread-stack caveat in the header.

New regression (also pins the cap boundary): a reused reader that rejects a
too-deep document still accepts a subsequent at-the-cap (1000-deep) document
(vrml_reader_depth_counter_resets_after_a_rejected_parse).

Refs #11
@delta9000 delta9000 merged commit 506db8e into main Jun 27, 2026
12 checks passed
@delta9000 delta9000 deleted the fix/parser-dos-hardening branch June 27, 2026 01:27
delta9000 added a commit that referenced this pull request Jun 27, 2026
* build(ci): add ci preset mirroring GitHub Actions gate (BLD-3)

- CMakePresets.json: add a 'ci' preset that retains the default
  X3D_CPP_PER_HEADER_CHECKS=ON, in a separate build-ci/ directory
  so the dev preset's build/ cache stays warm for fast iteration.
- mise.toml: add [tasks.build-ci] (uses --preset ci), and switch
  the [tasks.ci] dependency from build -> build-ci so the local
  pipeline matches the merge gate.
- docs/wiki/guides/build-and-mise.md: document the ci preset and
  build-ci task alongside the existing dev flow.

Closes #16

* build(deps): add vcpkg.json manifest pinning backend deps (BLD-5)

- vcpkg.json: declare the three vcpkg-sourced backend deps (curl,
  aws-sdk-cpp[core,s3], freetype) with a pinned builtin-baseline
  (commit SHA 01f6021... = vcpkg 2024.05.24 release) so the port
  versions are reproducible across CI runs.
- .github/workflows/ci.yml: switch both the assetresolver-swap and
  fontmetrics-swap vcpkg install steps to manifest mode
  ('vcpkg install --triplet x64-linux') so the manifest is the
  single source of truth. Bump the binary-cache key to encode the
  baseline SHA so a baseline bump invalidates the cache.
- docs/wiki/guides/build-and-mise.md: add a vcpkg manifest section
  documenting the dep set + baseline.

Closes #18

* build(ci): widen swap-test path-gates + unconditional texture-swap (BLD-4)

- .github/workflows/ci.yml:
  - changes job: each swap-test regex now also covers its CONSUMING
    subsystems (e.g. assetresolver-swap triggers on parse/InlineExpand/
    ext/ExtResolver/TextureExtract changes; fontmetrics-swap on
    TextLayout/TextExtract/MeshBuilder/RenderItem) so a refactor of
    a shared extractor/parser that uses a seam can't merge green
    without re-proving parity.
  - texture-swap: drop the needs: changes / if: gate. Both decoders are
    vendored + hermetic and the gate runs <2 min, so byte-equal decode
    parity now runs on every PR — the closest thing the SDK has to a
    render-correctness guarantee.
  - Drop the texture output from the changes job (always-run, no gate).

Closes #17

* build(warnings): -Wall -Wextra on project targets, -Werror on the ci gate (BLD-1)

CMakeLists.txt / CMakePresets.json / .github/workflows/ci.yml:
- CMakeLists.txt: add_compile_options(-Wall -Wextra) globally for every
  project-owned target (vendored TUs Jolt / quickjs-ng / stb* / wuffs
  suppress with their own PRIVATE -w). New X3D_CPP_WERROR option
  (default OFF) promotes warnings to errors.
- CMakePresets.json: the 'ci' preset now sets X3D_CPP_WERROR=ON, so
  'mise run build-ci' (and the dev's local mirror of the gate) fails
  on any new warning.
- ci.yml: pass -DX3D_CPP_WERROR=ON to the cpp fast gate and the
  cpp-matrix baseline, so all PR pushes and the manual release
  matrix share the same warning-as-error gate.

Source files (14 sites, 6 unused functions/vars + 1 misleading-indent
+ 7 dead-code helpers + 2 unused parameters + 1 deprecated call):

- runtime/codecs/JsonWriter.hpp: mark unused depth param [[maybe_unused]].
- runtime/parse/ClassicVrmlReader.hpp: mark unused scene param.
- runtime/codecs/tests/codec_roundtrip_audit_test.cpp: delete unused
  canonXml helper.
- runtime/extract/tests/mesh_builder_b7_test.cpp: delete unused dot helper.
- runtime/events/tests/m2d_tick_test.cpp: delete unused addChild helper.
- runtime/events/tests/node_lifecycle_audit_test.cpp: delete unused kids helper.
- runtime/events/tests/viewpoint_offset_test.cpp: delete unused setF helper.
- runtime/events/tests/animation_test.cpp: pragma-silence the
  addActiveNode deprecation (tracked for migration).
- runtime/events/tests/scene_bridge_test.cpp: same pragma-silence.
- runtime/events/tests/follower_conformance_test.cpp: add braces to the
  misleadingly-indented one-line for loops.
- runtime/events/tests/mem_safety_audit_test.cpp: mark unused oldAddr
  variable [[maybe_unused]].
- runtime/sound/tests/sound_system_test.cpp: add Panner case to the
  switch (no-op, closes the enum-coverage warning).
- tools/x3d-cli/cli_gate.cpp: delete unused readFile + loadSubset helpers;
  mark unused verdicts param.
- tools/x3d-cli/canon_gate.cpp: delete unused writeFile + t2fail_count.
- tools/x3d-cli/canonicalize_unit_test.cpp: delete unused writeTempFile.
- tools/x3d_cli.cpp: delete unused profileExceedances + unused local
  'table' ref; mark unused 'scene' param.

Verified clean under -Wall -Wextra (-Werror) for all project targets:
449/449 ctests green under the ci preset.

Closes #14

* build(ci): ASan/UBSan + libFuzzer gates, fix the memory bugs they caught (BLD-2)

Adds two memory-safety build modes — both separate from `mise run ci`, each
its own GitHub Actions job on every PR — and fixes the memory bugs the
sanitizer gate surfaced in the existing test suite.

Gates:
- CMakePresets.json: new `san` preset (X3D_CPP_SAN=ON -> ASan + UBSan,
  -fno-sanitize-recover=all, RelWithDebInfo) and `fuzz` preset (Clang-only,
  X3D_CPP_FUZZ=ON), plus their build/test presets.
- CMakeLists.txt: X3D_CPP_SAN adds -fsanitize=address,undefined globally;
  X3D_CPP_FUZZ builds the x3d_parse_fuzz harness. The harness now carries
  COMPILE-time -fsanitize=fuzzer,address,undefined: `fuzzer` at compile time
  is what inserts SanitizerCoverage (the edge feedback libFuzzer mutates
  against) and ASan/UBSan instrumentation into the header-only parse path.
  Without it the fuzzer ran blind (no coverage) and ASan saw nothing — it
  only linked the runtimes. Verified: corpus now grows from 1 -> 1000+ units.
- runtime/parse/tests/parse_fuzz.cpp: new libFuzzer harness driving
  sdk::parseDocument across all four encodings; asserts the "never panic"
  contract (no crash/leak/UB on any input).
- mise.toml: `build-san` (ctest under sanitizers) + `build-fuzz` tasks.
- ci.yml: `cpp-san` (build + every ctest under ASan/UBSan) and `cpp-fuzz`
  (bounded 60s libFuzzer smoke) jobs, unconditional on every PR.

Bugs the gate caught (all in tests; the runtime APIs are sound):
- json_script_field_test.cpp: heap-use-after-free. The `effectiveByName`
  helper returned a pointer into a single `static FieldTable` that the next
  call reassigned (freeing the prior buffer); roundTrips() held the
  `fraction` and `scale` pointers across the second call, then read freed
  memory. Replaced with a `byName(table, name)` search over a caller-owned
  FieldTable whose lifetime spans all pointer uses.
- shared_ptr containment-cycle leaks in tests that deliberately keep a cycle
  intact through their cycle-safety assertions: bounds_cycle_test.cpp,
  walker_cycle_guard_test.cpp, range_validate_audit_test.cpp,
  range_warnings_test.cpp. Each now severs the back-edge afterward (teardown
  hygiene — via breakContainmentCycles or by clearing the children field) so
  LeakSanitizer is clean. walker_cycle_guard's leak only surfaced after the
  rebase onto the MEM-1 work (#20), which the new san gate had never run.
- inline_containment_cycle_test.cpp: the raw parse intentionally returns the
  cyclic graph (sanitization is a downstream X3DExecutionContext step), so
  the parse-layer test now calls breakContainmentCycles(doc.scene) before
  scope exit to collect the self-cycle.

Docs (anti-drift, in-diff):
- build-and-mise.md: new "Sanitizer and fuzz gates" section + build-san/
  build-fuzz task rows; also fixed a pre-existing broken relative link to
  ci.yml (strict docs-build was red on the branch).
- parse-readers.md: note the libFuzzer harness in "How it is tested".

Gate status: `cmake --preset san` build + 450/450 ctests green under
ASan/UBSan; `cpp-fuzz` 60s smoke clean; `mise run docs-build` strict green;
`mise run ci` green (cli-gate-regression needs the local corpus).

* fix(ci): quote step names with a colon so the workflow YAML parses

Two 'Install backend deps (vcpkg.json manifest: curl + aws-sdk-cpp)' /
'(... manifest: freetype)' step names carried an unquoted colon-space, which
YAML reads as a nested mapping ('mapping values are not allowed here') — the
whole workflow failed to parse and every CI run aborted in 0s with no checks
reported. Quote both names. Introduced with the vcpkg manifest jobs (BLD-4/5).

* fix(deps): drop "core" from aws-sdk-cpp features in vcpkg.json

vcpkg rejects the manifest outright: \`the feature "core" cannot be in a
dependency's feature list\` — "core" is the implicit base, not a listable
feature (s3 pulls it in transitively). The bad manifest failed BOTH vcpkg
swap-test jobs at \`vcpkg install\` (manifest mode validates every declared dep,
so even the freetype-only FontMetrics job died on it). Surfaced for the first
time now that the workflow YAML parses and the jobs actually run. Introduced
with the manifest (BLD-5).

---------

Co-authored-by: claude <claude@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant