Summary
Dogfood review found 397 static Regex API hits (calls of the form Regex.Method(...)) across 90 production files. Many are likely calls to the repository's bounded wrapper (CodeIndex.Indexer.BoundedRegex) or to generated/precompiled patterns. The count is still high enough to justify auditing whether any static or raw BCL regex path bypasses the timeout policy.
Evidence
Dogfood command:
dotnet ./src/CodeIndex/bin/Debug/net8.0/cdidx.dll search --recipe dogfood-risk-patterns/static-regex-api --path src/ --exclude-tests --count-by file --limit 140
Top files:
- SqlReferenceExtractor: 34
- LanguageReferenceExtractionSupport: 33
- SymbolExtractor: 25
- PythonReferenceExtractor: 18
- RReferenceExtractor: 16
- SymbolExtractor.JavaScriptTypeScriptSupport: 16
- SymbolExtractor.CSharpScanner: 13
- PhpReferenceExtractor: 12
- RustReferenceExtractor: 11
- SymbolExtractor.Markup: 10
- ReferenceExtractor.Core: 9
- CSharpReferenceExtractor.Support: 8
- ReferenceExtractor, SymbolExtractor.Go, SymbolExtractor.Php: 7 each
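As a rough, independent cross-check of the recipe output, plain grep can recount hits per file and flag whether each file declares the alias using Regex = CodeIndex.Indexer.BoundedRegex. This is a sketch under stated assumptions: the function name regex_hits_by_file is hypothetical, the hit pattern only approximates whatever the static-regex-api recipe matches, grep -c counts matching lines rather than individual calls, and nothing here replicates --exclude-tests. Expect the numbers to differ somewhat from the counts above.

```shell
# Hypothetical helper (not part of cdidx): recount static Regex.Method( hits
# per C# file under a root directory and flag whether each file declares the
# BoundedRegex alias. Heuristics and limits:
#   - the hit pattern is an assumption, not the recipe's real pattern;
#   - grep -c counts matching lines, so two calls on one line count once;
#   - test files are NOT excluded;
#   - an "alias" file can still contain fully qualified
#     System.Text.RegularExpressions.Regex calls that bypass the wrapper.
regex_hits_by_file() {
  local root="${1:-src}"
  local alias_line='using Regex = CodeIndex.Indexer.BoundedRegex'
  # "Regex." not preceded by an identifier character, then a capitalized
  # member name and an opening parenthesis (so BoundedRegex.X( is skipped).
  local hit='(^|[^[:alnum:]_])Regex\.[A-Z][[:alnum:]_]*\('
  grep -rc --include='*.cs' -E "$hit" "$root" |
    grep -v ':0$' |
    sort -t: -k2,2nr |
    while IFS=: read -r file count; do
      if grep -qF "$alias_line" "$file"; then
        printf '%s\t%s\talias\n' "$count" "$file"
      else
        printf '%s\t%s\tno-alias\n' "$count" "$file"
      fi
    done
}
```

From the repository root, regex_hits_by_file src prints count, path, and alias status, highest count first. Files tagged no-alias are the natural starting point for finding raw BCL calls.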
Related positive evidence:
- dotnet-risk-patterns reported 78 files declaring using Regex = CodeIndex.Indexer.BoundedRegex, which suggests many of the hits are wrapper-backed.
- Direct new Regex( construction is much less common and is already covered by #4058 (Audit large input limits for worker JSON validation, user regex search, and sentinel bounds) for large-input and user-regex behavior.
Audit goals
- Distinguish wrapper-backed CodeIndex.Indexer.BoundedRegex calls from raw BCL static calls.
- Confirm that all regex matching on untrusted or large input uses an explicit timeout or the shared bounded policy.
- Confirm that generated/precompiled patterns, and fixed patterns applied only to small inputs, are documented as safe.
- Add an audit command or analyzer-friendly convention if the alias makes audits ambiguous (with the alias in place, a Regex.IsMatch call site reads the same whether it resolves to the wrapper or to the BCL type).
Acceptance criteria
- Every static regex hit is classified as bounded-wrapper, raw BCL with timeout, generated/precompiled, trusted small input, or fix-needed.
- Fix-needed raw BCL calls are moved to bounded helpers or to explicit timeout overloads.
- If code changes are needed, tests cover at least one timeout or large-input behavior path.
- Documentation or code comments clarify the intended bounded-regex convention.
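A first-pass triage can pre-sort hits into the classification buckets before manual review. The sketch below is hypothetical: triage_regex_hits is an illustrative name and its bucket rules are assumptions. It marks a hit bounded-wrapper only when the file declares the alias and the call is not fully qualified. It marks raw-bcl-timeout when TimeSpan appears on the same line, which is a weak signal because timeout arguments can span lines. Everything else is needs-review. Generated/precompiled, trusted small input, and fix-needed cannot be told apart by text matching, so those still need a human decision.

```shell
# Hypothetical first-pass triage of static Regex.Method( hits. Only two
# buckets are guessed mechanically; the rest are left as needs-review for a
# human to mark generated/precompiled, trusted small input, or fix-needed.
triage_regex_hits() {
  local root="${1:-src}"
  local alias_line='using Regex = CodeIndex.Indexer.BoundedRegex'
  local hit='(^|[^[:alnum:]_])Regex\.[A-Z][[:alnum:]_]*\('
  local bucket
  # grep -rn emits path:line:text; read splits off path and line number and
  # leaves the rest (colons included) in text.
  grep -rn --include='*.cs' -E "$hit" "$root" |
    while IFS=: read -r file line text; do
      # Fully qualified BCL calls bypass the alias even in alias files.
      if ! printf '%s' "$text" | grep -qF 'RegularExpressions.Regex.' \
         && grep -qF "$alias_line" "$file"; then
        bucket='bounded-wrapper'
      elif printf '%s' "$text" | grep -qF 'TimeSpan'; then
        # Heuristic only: TimeSpan on the same line as the call.
        bucket='raw-bcl-timeout'
      else
        bucket='needs-review'
      fi
      printf '%s\t%s:%s\n' "$bucket" "$file" "$line"
    done
}
```

Piping triage_regex_hits src through sort gives a reviewable checklist grouped by bucket, where only the needs-review rows require a manual classification.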