fix: parser & graph-walk DoS/overflow hardening (SEC-1, SEC-2, MEM-1) by delta9000 · Pull Request #20 · delta9000/x3d-cpp

delta9000 · 2026-06-27T01:17:44Z

Parser & graph-walk DoS/overflow hardening

Closes #11
Closes #12
Closes #13

Three reproduced security/robustness findings from the scoped engineering review,
fixed together because they share one theme — bounding recursion/size over
attacker-controlled input — and one shared mechanism (runtime/RecursionLimits.hpp).

SEC-1 (#11) — parser deep-nesting stack overflow (DoS)

A few-thousand-deep document SIGSEGV'd in the recursive-descent readers
(XmlLite parseElement↔parseContent, JsonLite parseValue↔parseArray/
parseObject, ClassicVRML/VRML97 parseNode↔parseNodeBody). Added one shared
cap (x3d::kMaxNestingDepth = 1000) and an RAII DepthGuard in each reader that
throws std::runtime_error past the cap — joining each reader's existing
soft-failure path. The guard restores depth on return, so wide-but-shallow
documents are unaffected; legitimate scenes nest well under 100 levels.

SEC-2 (#12) — SFImage signed-overflow + ~2GB allocation

<PixelTexture image="1 1 2000000000 0xFF"/> drove packed >> (8*b) with an
attacker-controlled count (signed-int overflow, UB) and pushed numComponents
bytes/pixel (multi-GB OOM from one token). parseImageFrom now clamps
numComponents to the X3D §5.3.6 range [0,4] before the unpack loop and shifts
in unsigned. Width/height each consume one pixel token, so the pixel loop is
already bounded by the token stream.

MEM-1 (#13) — extractor/pick walks lacked the cycle+depth guard

BoundsSystem/TransformSystem already break a containment cycle, but the
extractor-side walks (SceneExtractor::walk, LightSystem::collect) and pick
walks (PickSystem::pickNode/worldOfRec) carried neither guard, so a USE-cyclic
or pathologically deep scene driven through these public APIs without
buildSceneGraph's up-front breakContainmentCycles() SIGSEGV'd. walk/
pickNode carry the live root→node PathKey and gain a hard depth cap plus a
path-membership test before descending; worldOfRec/LightSystem::walk (no path
accumulator) gain a depth cap. The normal pipeline still severs cycles at build
time — this makes the walkers self-safe when an embedder drives them directly
(mirrors BoundsSystem's existing computing_ defense-in-depth; see ADR-0016).

Definition of Done

TDD: each fix has a regression test whose RED state is the real failure
(SIGSEGV / 2GB alloc) and whose GREEN state is the bounded/throwing behavior.
mise run ci green — build 45/45 ctests, golden byte-identical,
conformance-gate OK, coverage-gate OK, doc-ctest-gate OK, cli-gate-regression
PASS (Tier-1 200/200, no regressions; 1 previously-failing corpus item now
passes).
Per-header self-containment verified for the new + touched headers.
Living docs updated in the same diff (anti-drift): parse-readers.md
input-hardening note + new tests; ADR-0016 follow-up on the self-guarding
walkers.
No new third-party dependency → NOTICE unchanged.

Code review

A fresh-context adversarial review traced every recursion path and confirmed each
reader funnels through one guarded function (VRML97 dialect included, brace-skippers
are iterative), SEC-2's loop is genuinely token-bounded with a now-safe shift, and
MEM-1's path-membership cut preserves legitimate USE/DAG sharing with balanced
push/pop. Verdict: ready to merge. Two Minor follow-ups were applied in the last
commit: a counter-leak on the throwing guard (a reused reader could accumulate +1
per rejected parse) is fixed by checking before incrementing, and the three copied
guards are hoisted into one shared x3d::NestingGuard.

Known follow-up (out of scope)

The guards make the walkers crash-proof, not DoS-proof against a legitimate
acyclic "doubling DAG" (G0→[G1,G1], … ~30 deep): no back-edge, under the depth
cap, yet 2³⁰ distinct paths → exponential per-path emission. That is a separate
CPU/memory-DoS class (per-path emission is intentional for USE-instancing) needing a
different mechanism (emission budget / memoization), not the cycle+depth guard.
Recommend tracking as its own card — not filed here.

New tests

runtime/codecs/tests/xml_depth_guard_test.cpp — XML deep-nesting cap (SEC-1)
runtime/codecs/tests/sfimage_overflow_test.cpp — SFImage clamp (SEC-2)
runtime/parse/tests/parser_depth_guard_test.cpp — JSON + ClassicVRML caps (SEC-1)
runtime/scene/tests/walker_cycle_guard_test.cpp — extractor/pick/worldOf cycle termination (MEM-1)

) A hostile PixelTexture/SFImage token could set numComponents to a huge or negative value. parseImageFrom() then drove an inner shift `packed >> (8*b)` whose count overflowed a signed int (UB) and pushed numComponents bytes per pixel, allocating multiple GB from a single one-pixel token (e.g. <PixelTexture image="1 1 2000000000 0xFF"/>). Clamp numComponents to the X3D §5.3.6 range [0,4] before the unpack loop and shift in unsigned. Width/height each consume one pixel token, so the pixel loop is already bounded by the token stream. Closes #12

A few-thousand-deep document overran the native stack and SIGSEGV'd with no diagnostic: XmlLite's parseElement<->parseContent, JsonLite's parseValue<-> parseArray/parseObject, and the ClassicVRML parseNode<->parseNodeBody<-> applyNodeField recursions all descended attacker-controlled nesting unbounded. Add one shared cap (runtime/RecursionLimits.hpp, kMaxNestingDepth=1000) and an RAII DepthGuard in each reader that throws std::runtime_error past the cap — joining each reader's existing soft-failure path. The guard restores depth on return so sibling elements/values/nodes do not accumulate depth; legitimate nesting (well under 100 levels) is unaffected. Closes #11

…s (MEM-1, #13) BoundsSystem/TransformSystem already break a containment cycle, but the extractor-side walks (SceneExtractor::walk, LightSystem::collect) and the pick walks (PickSystem::pickNode/worldOfRec) carried neither a cycle nor a depth guard. A USE-cyclic or pathologically deep scene driven through these public APIs without buildSceneGraph's up-front breakContainmentCycles() SIGSEGV'd at extraction/pick time (the runtime twin of SEC-1). walk()/pickNode() carry the live root->n PathKey, so they gain a hard depth cap plus a path-membership test before descending (breaks back-edges). worldOfRec() and LightSystem::walk() have no path accumulator and gain a depth cap. The normal pipeline still severs cycles in buildSceneGraph; this makes the walkers self-safe when an embedder drives them directly. Closes #13

…M-1) Living-doc updates in the same change as the hardening (anti-drift): - parse-readers.md: new input-hardening note (shared kMaxNestingDepth cap on XML/JSON/VRML; SFImage numComponents clamp) + the new regression tests. - ADR-0016 (cycle-breaker): follow-up note that SceneExtractor/LightSystem/ PickSystem walks now self-guard (depth cap + path-membership) so they are safe when driven without buildSceneGraph's breakContainmentCycles. Refs #11, #12, #13

…row (SEC-1) Adversarial review follow-ups (no scope change): - The three readers each carried an identical local DepthGuard. Hoist a single x3d::NestingGuard into RecursionLimits.hpp (parameterized by a message tag). - Fix a counter leak: the old guard did `if (++d > cap) throw`, so a guard that threw in its constructor incremented but never ran its destructor, leaving depth_ one too high. Harmless with a fresh reader per parse (the front door), but a reused ClassicVrmlReader accumulated +1 per rejected parse until it rejected valid documents. NestingGuard checks BEFORE incrementing, so a rejected parse leaves the counter untouched. Same cap boundary (<=1000 levels). - Record the cap-is-a-heuristic / worker-thread-stack caveat in the header. New regression (also pins the cap boundary): a reused reader that rejects a too-deep document still accepts a subsequent at-the-cap (1000-deep) document (vrml_reader_depth_counter_resets_after_a_rejected_parse). Refs #11

* build(ci): add ci preset mirroring GitHub Actions gate (BLD-3) - CMakePresets.json: add a 'ci' preset that retains the default X3D_CPP_PER_HEADER_CHECKS=ON, in a separate build-ci/ directory so the dev preset's build/ cache stays warm for fast iteration. - mise.toml: add [tasks.build-ci] (uses --preset ci), and switch the [tasks.ci] dependency from build -> build-ci so the local pipeline matches the merge gate. - docs/wiki/guides/build-and-mise.md: document the ci preset and build-ci task alongside the existing dev flow. Closes #16 * build(deps): add vcpkg.json manifest pinning backend deps (BLD-5) - vcpkg.json: declare the three vcpkg-sourced backend deps (curl, aws-sdk-cpp[core,s3], freetype) with a pinned builtin-baseline (commit SHA 01f6021... = vcpkg 2024.05.24 release) so the port versions are reproducible across CI runs. - .github/workflows/ci.yml: switch both the assetresolver-swap and fontmetrics-swap vcpkg install steps to manifest mode ('vcpkg install --triplet x64-linux') so the manifest is the single source of truth. Bump the binary-cache key to encode the baseline SHA so a baseline bump invalidates the cache. - docs/wiki/guides/build-and-mise.md: add a vcpkg manifest section documenting the dep set + baseline. Closes #18 * build(ci): widen swap-test path-gates + unconditional texture-swap (BLD-4) - .github/workflows/ci.yml: - changes job: each swap-test regex now also covers its CONSUMING subsystems (e.g. assetresolver-swap triggers on parse/InlineExpand/ ext/ExtResolver/TextureExtract changes; fontmetrics-swap on TextLayout/TextExtract/MeshBuilder/RenderItem) so a refactor of a shared extractor/parser that uses a seam can't merge green without re-proving parity. - texture-swap: drop the needs: changes / if: gate. Both decoders are vendored + hermetic and the gate runs <2 min, so byte-equal decode parity now runs on every PR — the closest thing the SDK has to a render-correctness guarantee. - Drop the texture output from the changes job (always-run, no gate). Closes #17 * build(warnings): -Wall -Wextra on project targets, -Werror on the ci gate (BLD-1) CMakeLists.txt / CMakePresets.json / .github/workflows/ci.yml: - CMakeLists.txt: add_compile_options(-Wall -Wextra) globally for every project-owned target (vendored TUs Jolt / quickjs-ng / stb* / wuffs suppress with their own PRIVATE -w). New X3D_CPP_WERROR option (default OFF) promotes warnings to errors. - CMakePresets.json: the 'ci' preset now sets X3D_CPP_WERROR=ON, so 'mise run build-ci' (and the dev's local mirror of the gate) fails on any new warning. - ci.yml: pass -DX3D_CPP_WERROR=ON to the cpp fast gate and the cpp-matrix baseline, so all PR pushes and the manual release matrix share the same warning-as-error gate. Source files (14 sites, 6 unused functions/vars + 1 misleading-indent + 7 dead-code helpers + 2 unused parameters + 1 deprecated call): - runtime/codecs/JsonWriter.hpp: mark unused depth param [[maybe_unused]]. - runtime/parse/ClassicVrmlReader.hpp: mark unused scene param. - runtime/codecs/tests/codec_roundtrip_audit_test.cpp: delete unused canonXml helper. - runtime/extract/tests/mesh_builder_b7_test.cpp: delete unused dot helper. - runtime/events/tests/m2d_tick_test.cpp: delete unused addChild helper. - runtime/events/tests/node_lifecycle_audit_test.cpp: delete unused kids helper. - runtime/events/tests/viewpoint_offset_test.cpp: delete unused setF helper. - runtime/events/tests/animation_test.cpp: pragma-silence the addActiveNode deprecation (tracked for migration). - runtime/events/tests/scene_bridge_test.cpp: same pragma-silence. - runtime/events/tests/follower_conformance_test.cpp: add braces to the misleadingly-indented one-line for loops. - runtime/events/tests/mem_safety_audit_test.cpp: mark unused oldAddr variable [[maybe_unused]]. - runtime/sound/tests/sound_system_test.cpp: add Panner case to the switch (no-op, closes the enum-coverage warning). - tools/x3d-cli/cli_gate.cpp: delete unused readFile + loadSubset helpers; mark unused verdicts param. - tools/x3d-cli/canon_gate.cpp: delete unused writeFile + t2fail_count. - tools/x3d-cli/canonicalize_unit_test.cpp: delete unused writeTempFile. - tools/x3d_cli.cpp: delete unused profileExceedances + unused local 'table' ref; mark unused 'scene' param. Verified clean under -Wall -Wextra (-Werror) for all project targets: 449/449 ctests green under the ci preset. Closes #14 * build(ci): ASan/UBSan + libFuzzer gates, fix the memory bugs they caught (BLD-2) Adds two memory-safety build modes — both separate from `mise run ci`, each its own GitHub Actions job on every PR — and fixes the memory bugs the sanitizer gate surfaced in the existing test suite. Gates: - CMakePresets.json: new `san` preset (X3D_CPP_SAN=ON -> ASan + UBSan, -fno-sanitize-recover=all, RelWithDebInfo) and `fuzz` preset (Clang-only, X3D_CPP_FUZZ=ON), plus their build/test presets. - CMakeLists.txt: X3D_CPP_SAN adds -fsanitize=address,undefined globally; X3D_CPP_FUZZ builds the x3d_parse_fuzz harness. The harness now carries COMPILE-time -fsanitize=fuzzer,address,undefined: `fuzzer` at compile time is what inserts SanitizerCoverage (the edge feedback libFuzzer mutates against) and ASan/UBSan instrumentation into the header-only parse path. Without it the fuzzer ran blind (no coverage) and ASan saw nothing — it only linked the runtimes. Verified: corpus now grows from 1 -> 1000+ units. - runtime/parse/tests/parse_fuzz.cpp: new libFuzzer harness driving sdk::parseDocument across all four encodings; asserts the "never panic" contract (no crash/leak/UB on any input). - mise.toml: `build-san` (ctest under sanitizers) + `build-fuzz` tasks. - ci.yml: `cpp-san` (build + every ctest under ASan/UBSan) and `cpp-fuzz` (bounded 60s libFuzzer smoke) jobs, unconditional on every PR. Bugs the gate caught (all in tests; the runtime APIs are sound): - json_script_field_test.cpp: heap-use-after-free. The `effectiveByName` helper returned a pointer into a single `static FieldTable` that the next call reassigned (freeing the prior buffer); roundTrips() held the `fraction` and `scale` pointers across the second call, then read freed memory. Replaced with a `byName(table, name)` search over a caller-owned FieldTable whose lifetime spans all pointer uses. - shared_ptr containment-cycle leaks in tests that deliberately keep a cycle intact through their cycle-safety assertions: bounds_cycle_test.cpp, walker_cycle_guard_test.cpp, range_validate_audit_test.cpp, range_warnings_test.cpp. Each now severs the back-edge afterward (teardown hygiene — via breakContainmentCycles or by clearing the children field) so LeakSanitizer is clean. walker_cycle_guard's leak only surfaced after the rebase onto the MEM-1 work (#20), which the new san gate had never run. - inline_containment_cycle_test.cpp: the raw parse intentionally returns the cyclic graph (sanitization is a downstream X3DExecutionContext step), so the parse-layer test now calls breakContainmentCycles(doc.scene) before scope exit to collect the self-cycle. Docs (anti-drift, in-diff): - build-and-mise.md: new "Sanitizer and fuzz gates" section + build-san/ build-fuzz task rows; also fixed a pre-existing broken relative link to ci.yml (strict docs-build was red on the branch). - parse-readers.md: note the libFuzzer harness in "How it is tested". Gate status: `cmake --preset san` build + 450/450 ctests green under ASan/UBSan; `cpp-fuzz` 60s smoke clean; `mise run docs-build` strict green; `mise run ci` green (cli-gate-regression needs the local corpus). * fix(ci): quote step names with a colon so the workflow YAML parses Two 'Install backend deps (vcpkg.json manifest: curl + aws-sdk-cpp)' / '(... manifest: freetype)' step names carried an unquoted colon-space, which YAML reads as a nested mapping ('mapping values are not allowed here') — the whole workflow failed to parse and every CI run aborted in 0s with no checks reported. Quote both names. Introduced with the vcpkg manifest jobs (BLD-4/5). * fix(deps): drop "core" from aws-sdk-cpp features in vcpkg.json vcpkg rejects the manifest outright: \`the feature "core" cannot be in a dependency's feature list\` — "core" is the implicit base, not a listable feature (s3 pulls it in transitively). The bad manifest failed BOTH vcpkg swap-test jobs at \`vcpkg install\` (manifest mode validates every declared dep, so even the freetype-only FontMetrics job died on it). Surfaced for the first time now that the workflow YAML parses and the jobs actually run. Introduced with the manifest (BLD-5). --------- Co-authored-by: claude <claude@anthropic.com>

delta9000 added 5 commits June 26, 2026 20:46

delta9000 merged commit 506db8e into main Jun 27, 2026
12 checks passed

delta9000 mentioned this pull request Jun 27, 2026

PERF/SEC — acyclic 'doubling DAG' triggers exponential per-path emission in extractor/pick walks (CPU/mem DoS) #21

Closed

delta9000 deleted the fix/parser-dos-hardening branch June 27, 2026 01:27

This was referenced Jun 27, 2026

fix: bound graph-walk fan-out against the doubling-DAG DoS (#21) #23

Merged

build: build & CI hygiene batch (BLD-1..5) #24

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: parser & graph-walk DoS/overflow hardening (SEC-1, SEC-2, MEM-1)#20

fix: parser & graph-walk DoS/overflow hardening (SEC-1, SEC-2, MEM-1)#20
delta9000 merged 5 commits into
mainfrom
fix/parser-dos-hardening

delta9000 commented Jun 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

delta9000 commented Jun 27, 2026

Parser & graph-walk DoS/overflow hardening

SEC-1 (#11) — parser deep-nesting stack overflow (DoS)

SEC-2 (#12) — SFImage signed-overflow + ~2GB allocation

MEM-1 (#13) — extractor/pick walks lacked the cycle+depth guard

Definition of Done

Code review

Known follow-up (out of scope)

New tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant