Skip to content

fix: bound graph-walk fan-out against the doubling-DAG DoS (#21)#23

Merged
delta9000 merged 4 commits into
mainfrom
fix/doubling-dag-emission-dos
Jun 27, 2026
Merged

fix: bound graph-walk fan-out against the doubling-DAG DoS (#21)#23
delta9000 merged 4 commits into
mainfrom
fix/doubling-dag-emission-dos

Conversation

@delta9000

Copy link
Copy Markdown
Owner

Bound graph-walk fan-out against the doubling-DAG DoS

Closes #21

Follow-up to the MEM-1 hardening (#20): the cycle+depth guards made the
extractor/pick/light walks crash-proof, but not DoS-proof against a
legitimate acyclic graph that fans out exponentially.

A "doubling DAG" — G0 children [G1,G1], G1 children [G2,G2], … ~30 deep, a
Shape at the leaf — has no containment back-edge (so breakContainmentCycles
leaves it intact) and stays far under the depth cap, yet the leaf is reachable
via 2^30 distinct paths. Per-path traversal then explodes into a CPU/memory DoS.

The fix differs by walker semantics

Per-path emission is intentional for USE-instancing (a node USE'd under N
placements legitimately yields N RenderItems/LightDescs/pick candidates, each in
its own world frame), so the emitters cannot be memoized — they must be
bounded. worldOf is the exception: a first-found search, fixed by
memoization.

Walker Mechanism
SceneExtractor::walk WalkBudget node-visit cap → budgetExceeded()
LightSystem::walk shares the snapshot's budget → same budgetExceeded()
PickSystem::pickNode WalkBudget (cap via pickClosest(..., maxVisits)) → PickResult.budgetExceeded
PickSystem::worldOfRec visited-set memoization — O(paths)→O(nodes), result unchanged

WalkBudget + kMaxGraphWalkVisits (default 1'000'000, overridable via
MeshBuildOptions.maxWalkVisits) live in runtime/RecursionLimits.hpp. Overflow
is a graceful, non-throwing signal (the extractor/pick are read paths), so a
capped walk returns a partial result plus the flag for an embedder to react.

Definition of Done

  • TDD: worldOf(absent) on a 2⁴⁵ DAG is a runtime RED (non-terminating
    pre-fix; instant post-memoization). The budget cases assert budgetExceeded +
    bounded itemCount(); a small scene never trips the cap.
  • mise run ci green (build/golden/conformance/coverage/cli-gate).
  • Codegen-free → golden binding hash byte-identical.
  • Per-header self-containment verified for the touched headers.
  • Living docs in-diff: ADR-0037 (binding embedder-visible contract;
    extends ADR-0016 from crash-proof to DoS-bounded), coverage.md row + count,
    mkdocs.yml nav, extract.md interface note. Design spec under
    docs/superpowers/specs/.
  • No new third-party dependency → NOTICE unchanged.

Code review

A fresh-context adversarial review verified the budget machinery (latch/off-by-one,
reset placement, shared-budget threading, MEM-1 ordering) and caught one real
correctness bug, now fixed: worldOfRec marked a node visited at entry, but
a node first reached near the MEM-1 depth cap has its subtree truncated — marking
it then wrongly pruned a shallower path on which the target was findable, returning
identity. The fix memoizes a node only when its subtree was fully explored (an
incomplete flag propagates depth-truncation up), pinned by a new regression
(worldof_depth_truncated_subtree_is_not_falsely_memoized). Two review minors also
applied: the extractor test's decorative itemCount assertion was corrected (the
doubling DAG reuses node pointers → one interned PathKey; budgetExceeded() is
the load-bearing signal), and the cap comment now states it counts node-visits
(shared across light + geometry), not RenderItems.

New test

runtime/scene/tests/walker_budget_test.cpp — worldOf memoization (absent
terminates / present correct / depth-truncated subtree not falsely memoized),
extractor + light + pick budget trips on a doubling DAG, and a no-trip guard for
a legitimate small scene.

Approved design for the doubling-DAG DoS follow-up: a WalkBudget node-visit cap
on the per-path emitters (extractor/pick/light) with a graceful budgetExceeded
signal, plus visited-set memoization for the worldOfRec search.

Refs #21
… DoS (#21)

The MEM-1 cycle+depth guards are crash-proof but not DoS-proof: an acyclic
'doubling DAG' (G0->[G1,G1], G1->[G2,G2], ...) has no back-edge and stays
shallow, yet its leaf is reachable via 2^depth distinct paths, so the per-path
graph walks fan out exponentially (CPU/memory DoS).

The fix differs by walker semantics:
- worldOf is a first-found SEARCH -> visited-set memoization. Pruning
  already-explored subtrees never changes the first-found result but drops
  O(paths) to O(nodes), so it SUCCEEDS on a legitimately wide DAG.
- SceneExtractor::walk, LightSystem::walk and PickSystem::pickNode emit/test
  PER PATH (intentional for USE-instancing), so they cannot memoize. Add a
  WalkBudget node-visit cap (RecursionLimits.hpp, default kMaxGraphWalkVisits =
  1e6, overridable via MeshBuildOptions.maxWalkVisits). On exhaustion the walk
  stops and latches a graceful, non-throwing signal: SceneExtractor::
  budgetExceeded() (one budget shared across the snapshot's light collection +
  geometry walk) and PickResult.budgetExceeded.

Tests (walker_budget_test.cpp): a 2^45 DAG makes worldOf(absent) terminate
instantly (vs non-terminating pre-fix) and worldOf(present) still returns the
correct transform; a 2^22 DAG trips the extractor/light/pick budgets and bounds
the emitted item count; a small scene never trips it.

Codegen-free; golden unaffected.

Refs #21
Living docs in the same change as the #21 fix (anti-drift):
- ADR-0037: the traversal-budget decision (WalkBudget cap on the per-path
  emitters + worldOf memoization), extending ADR-0016 from crash-proof to
  DoS-bounded. Embedder-visible contract -> binding ADR.
- coverage.md: ADR-0037 row + count 36->37 (coverage-gate green) + narrative.
- mkdocs.yml: ADR-0037 nav entry (docs-build --strict green).
- extract.md: budgetExceeded()/maxWalkVisits in the exposed interface + ADR link.

Refs #21
…view)

Adversarial review caught a real correctness bug: worldOfRec marked a node
visited at entry, but a node first reached near the MEM-1 depth cap has its
subtree truncated (children at depth>=1000 return without exploring). Marking it
then wrongly pruned a SHALLOWER path on which the target IS findable -> worldOf
returned identity instead of the true transform (a bounded-wrong camera/viewpoint
frame on an adversarial ~1000-deep graph; the prior "provably not under it"
comment was false).

Fix: propagate an "incomplete" flag and memoize a node ONLY when its subtree was
fully explored (no descendant hit the depth cap). The wide-DAG fast path is
unchanged (a shallow doubling DAG never truncates). New regression
worldof_depth_truncated_subtree_is_not_falsely_memoized pins it.

Also from review (minor): correct the extractor-budget test's decorative
itemCount assertion -- the doubling DAG reuses node pointers so all paths share
one PathKey and emit() interns to <=1 item (light collection, sharing the budget,
can consume it first); budgetExceeded() is the load-bearing signal. Clarify the
kMaxGraphWalkVisits comment (counts node-visits, shared across light+geometry,
not RenderItems).

Refs #21
@delta9000 delta9000 merged commit 763b6e9 into main Jun 27, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PERF/SEC — acyclic 'doubling DAG' triggers exponential per-path emission in extractor/pick walks (CPU/mem DoS)

1 participant