
Audit large input limits for worker JSON validation, user regex search, and sentinel bounds #4058

Description

@Widthdom

Summary

Several large-input and resource-boundary paths should be reviewed together:

  • worker JSON validation parses a full JsonDocument before recursive property/string validation;
  • user-provided regex matching in file/status search uses a timeout but still relies on the classic regex engine for feature compatibility;
  • MaxValue/sentinel limits appear in multiple query, extraction, database, and MCP paths and should be checked for practical caps before allocation, traversal, or query expansion.
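One way to picture the first item is a byte and depth check that runs before the DOM is built. The following is only a minimal sketch: `BoundedJson`, `MaxPayloadBytes`, and `MaxDepth` are hypothetical names and values chosen for illustration, not the behavior of `WorkerProtocolJsonValidator`.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical helper for illustration; not part of the project.
public static class BoundedJson
{
    // Illustrative limits only. The real contract is what this issue asks to define.
    public const int MaxPayloadBytes = 1024 * 1024;
    public const int MaxDepth = 32;

    public static bool TryParse(ReadOnlyMemory<byte> utf8Payload, out JsonDocument? document)
    {
        document = null;

        // Reject oversized frames before any DOM allocation happens.
        if (utf8Payload.Length > MaxPayloadBytes)
        {
            return false;
        }

        var options = new JsonDocumentOptions
        {
            MaxDepth = MaxDepth,
            AllowTrailingCommas = false,
            CommentHandling = JsonCommentHandling.Disallow,
        };

        try
        {
            // The caller owns the returned document and must dispose it.
            document = JsonDocument.Parse(utf8Payload, options);
            return true;
        }
        catch (JsonException)
        {
            // Malformed JSON, or nesting deeper than MaxDepth.
            return false;
        }
    }
}
```

Property-count and string-length checks would still run after this step; the guard only caps how much input the parser is allowed to materialize in the first place.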

Parts of these paths already appear to be intentionally bounded, but the bounds are implicit. They are worth hardening into explicit, documented resource-boundary contracts.

Evidence

Dogfood search/audit findings:

  • json-parse-apis surfaced WorkerProtocolJsonValidator.TryValidate as a full DOM parse path.
  • dotnet-risk-patterns surfaced regex construction in DbReader.FilesStatus.CreateFindRegexMatcher.
  • risky-code/unbounded-json-parse found JsonDocument.Parse in 17 production files, including UpdateChecker, ActiveWorkspace, CdidxConfigFile, ConsoleUi, IndexWatchRunner, ProgramRunner, QueryCommandRunner.Batch, WorkspaceManifest, DiagnosticRedactor, DependencyPackageExtractor, structured-data and path-alias extractors, LspServer, HttpMcpTransport, McpIndexRunLock, and WorkerProtocolJsonValidator.
  • dogfood-risk-patterns/max-value-sentinel found 60 MaxValue hits in 30 production files. Top files include QueryCommandRunner, C# reference extraction support, ReferenceExtractor.TypeReferences, DbReader.CSharpResolution, DbSearchReader, SearchSnippetFormatter, DbReader.FilesStatus, Mcp.RateLimiter, JsonEnvelopeWrapper, DbWriter, SqliteCommandPolicy, and McpToolHandlers.

Questions to resolve

  • Is there a strict upstream byte/frame limit before worker protocol JSON reaches JsonDocument.Parse?
  • Is DefaultMaxJsonProperties intentionally as high as it is for untrusted worker output?
  • Should user regex search expose a lower default budget, a clearer timeout error, or a safe/non-backtracking mode where compatible?
  • Which MaxValue uses are pure sentinels, and which can flow into allocation, traversal, SQL limit, pagination, timeout, or payload-size behavior?
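For the regex question, one possible shape is to try the non-backtracking engine first and fall back to the classic engine with a timeout when the pattern uses features it does not support. This sketch assumes .NET 7 or later (where `RegexOptions.NonBacktracking` exists); `UserRegex`, `MatchTimeout`, and the 250 ms budget are hypothetical and do not describe `CreateFindRegexMatcher`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the project.
public static class UserRegex
{
    // Illustrative budget only.
    public static readonly TimeSpan MatchTimeout = TimeSpan.FromMilliseconds(250);

    // An invalid pattern still throws ArgumentException to the caller.
    public static Regex Create(string pattern, out bool nonBacktracking)
    {
        try
        {
            // Linear-time engine. It rejects constructs such as
            // backreferences and lookarounds with NotSupportedException.
            var regex = new Regex(
                pattern,
                RegexOptions.NonBacktracking | RegexOptions.CultureInvariant,
                MatchTimeout);
            nonBacktracking = true;
            return regex;
        }
        catch (NotSupportedException)
        {
            // Fall back to the classic engine for feature compatibility,
            // still bounded by the match timeout.
            nonBacktracking = false;
            return new Regex(pattern, RegexOptions.CultureInvariant, MatchTimeout);
        }
    }

    public static bool TryIsMatch(Regex regex, string input, out bool timedOut)
    {
        try
        {
            timedOut = false;
            return regex.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // Surface the timeout so the caller can report a clear error.
            timedOut = true;
            return false;
        }
    }
}
```

Returning `timedOut` separately from the match result lets the search command distinguish "no match" from "budget exceeded", which is the distinction a clearer timeout error would need.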

Acceptance criteria

  • Document the intended byte, depth, property-count, timeout, and sentinel-limit contracts for these paths.
  • Add tests that exercise over-limit JSON and pathological regex inputs without excessive runtime.
  • Audit MaxValue hits and clamp or document any user-influenced paths.
  • Tighten limits or add streaming/pre-parse guards where the current bounds are insufficient.
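As a rough illustration of the "clamp or document" criterion, a user-influenced limit can be normalized at the boundary so that a `MaxValue` sentinel never reaches allocation, SQL `LIMIT`, or pagination code. The names and numbers below (`LimitPolicy`, `DefaultPageSize`, `MaxPageSize`) are hypothetical and not taken from the codebase.

```csharp
using System;

// Hypothetical policy helper for illustration; not part of the project.
public static class LimitPolicy
{
    // Illustrative values only.
    public const int DefaultPageSize = 100;
    public const int MaxPageSize = 1_000;

    // Maps a caller-supplied limit to a practical cap:
    // null means "use the default"; anything else, including
    // int.MaxValue used as an "unlimited" sentinel, is clamped.
    public static int ClampLimit(int? requested)
    {
        if (requested is null)
        {
            return DefaultPageSize;
        }

        return Math.Clamp(requested.Value, 1, MaxPageSize);
    }
}
```

Under these assumptions, `LimitPolicy.ClampLimit(int.MaxValue)` yields `1000` and `LimitPolicy.ClampLimit(null)` yields `100`. Pure sentinels that never leave a comparison would not need this treatment and could simply be documented as such.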
