[linter-miner] feat(linters): add regexp-compile-in-function linter#33608
Conversation
Implements a new static analysis linter that detects regexp.MustCompile() and regexp.Compile() calls inside function bodies. These should be moved to package-level variables for performance. Evidence from mining: - Found 20+ instances in pkg/ directory - Issue #33459 identified performance hotspots - Real-world impact in sanitization and parsing code The linter: - Detects regexp compilation inside functions, methods, and closures - Suggests moving patterns to package-level variables - Reduces repeated compilation overhead in hot paths Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
🧪 Test Quality Sentinel completed test quality analysis. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. |
|
✅ PR Code Quality Reviewer completed the code quality review. |
There was a problem hiding this comment.
Review Summary
This linter addresses a real performance issue and follows the established analyzer patterns well. The implementation is clean and the test coverage demonstrates the intended behavior.
However, there are two important correctness issues:
- Missing import qualification - The identifier check doesn't verify "regexp" actually refers to the standard library
- False positives - Legitimate dynamic regexp patterns (user input, config-driven) will be incorrectly flagged
Key Issues
- ❌ Import validation missing:
regexpidentifier may be shadowed or aliased - ❌ Dynamic patterns flagged: Legitimate runtime compilation will trigger false positives
Recommendations
- Add type checking to verify the package is
regexpfrom standard library (see inline comment) - Only flag literal string patterns that could be moved to package-level (see inline comment)
What works well
- ✅ Clean implementation following Go analysis conventions
- ✅ Proper use of
inspector.WithStackfor context tracking - ✅ Test file exclusion using
filecheck.IsTestFile - ✅ Comprehensive test fixtures with positive/negative cases
- ✅ Good integration with existing linter infrastructure
These changes will significantly reduce false positives and make the linter more robust.
🔎 Code quality review by PR Code Quality Reviewer · ● 2.5M
| if !ok { | ||
| return false | ||
| } | ||
| return ident.Name == "regexp" && (sel.Sel.Name == "MustCompile" || sel.Sel.Name == "Compile") |
There was a problem hiding this comment.
The isRegexpCompileCall function only checks the identifier name "regexp" but doesn't verify it actually refers to the standard library regexp package. This can cause false positives if code shadows the package name or uses a different package with the same name.
Why this matters:
- A local variable/parameter named
regexpwould trigger false positives - An aliased import (
import rx "regexp") would be missed - A custom package named
regexpwould be incorrectly flagged
Suggested fix:
Use type checking to verify the selector refers to the actual standard library package. You'll need to access pass.TypesInfo which requires the types package in the analyzer requirements. Other linters in the codebase use this pattern to avoid false positives from shadowed names.
Example approach:
func isRegexpCompileCall(call *ast.CallExpr, pass *analysis.Pass) bool {
sel, ok := call.Fun.(*ast.SelectorExpr)
if !ok {
return false
}
ident, ok := sel.X.(*ast.Ident)
if !ok {
return false
}
// Verify ident refers to the regexp package from standard library
obj := pass.TypesInfo.Uses[ident]
if obj == nil {
return false
}
pkg, ok := obj.(*types.PkgName)
if !ok || pkg.Imported().Path() != "regexp" {
return false
}
return sel.Sel.Name == "MustCompile" || sel.Sel.Name == "Compile"
}| } | ||
|
|
||
| // Check if we're inside a function (not package-level) | ||
| if isInsideFunction(stack) { |
There was a problem hiding this comment.
This linter will flag all regexp compilation in functions, including legitimate cases where dynamic patterns are required.
False positive scenarios:
- User input patterns:
regexp.Compile(userProvidedPattern)- pattern comes from config/API - Conditional patterns: Pattern varies based on runtime state
- One-time initialization: Functions called once during startup
Why this matters:
While the general advice (move to package-level) is sound for static patterns, dynamic patterns must be compiled at runtime. Flagging these creates noise and trains developers to ignore the linter.
Suggested enhancement:
Only flag cases where the pattern is a literal string:
if isInsideFunction(stack) && isLiteralPattern(call) {
pass.Report(analysis.Diagnostic{
Pos: call.Pos(),
End: call.End(),
Message: "regexp compilation with literal pattern inside function should be moved to package-level variable",
})
}Add helper:
func isLiteralPattern(call *ast.CallExpr) bool {
if len(call.Args) == 0 {
return false
}
_, ok := call.Args[0].(*ast.BasicLit)
return ok
}This way regexp.MustCompile( + "^[a-z]+$" + ) is flagged, but regexp.Compile(userPattern) is not.
There was a problem hiding this comment.
Pull request overview
Adds a new custom go/analysis linter to detect regexp.MustCompile / regexp.Compile calls inside function bodies, with analysistest fixtures and registration in the custom linter runner.
Changes:
- Introduce
pkg/linters/regexpcompileinfunctionanalyzer implementing the regexp-in-function check. - Add analysistest coverage via
testdata/fixtures and a unit test. - Register the new analyzer in
cmd/linters/main.goso it runs in the multichecker binary.
Show a summary per file
| File | Description |
|---|---|
pkg/linters/regexpcompileinfunction/regexpcompileinfunction.go |
Implements the analyzer logic using inspector.WithStack to detect regexp compilation inside functions. |
pkg/linters/regexpcompileinfunction/regexpcompileinfunction_test.go |
Adds analysistest harness for the new analyzer. |
pkg/linters/regexpcompileinfunction/testdata/src/regexpcompileinfunction/regexpcompileinfunction.go |
Provides positive/negative fixtures with // want diagnostics. |
cmd/linters/main.go |
Wires the analyzer into the multichecker runner. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 4/4 changed files
- Comments generated: 1
| // Check if we're inside a function (not package-level) | ||
| if isInsideFunction(stack) { | ||
| pass.Report(analysis.Diagnostic{ | ||
| Pos: call.Pos(), | ||
| End: call.End(), | ||
| Message: "regexp compilation inside function should be moved to package-level variable", | ||
| }) | ||
| } |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd (test-driven development) and /zoom-out (architectural consistency) based on this being a new linter feature with test coverage.
Key Themes
1. Test coverage gaps (/tdd)
- Missing edge cases: aliased imports (
import rx "regexp"), dot imports, init() functions - No test demonstrating that test file exclusion works
- Current testdata only covers the happy path
2. String-based detection is fragile (/tdd)
- Using
ident.Name == "regexp"won't catchimport rx "regexp" - Should use type information (
pass.TypesInfo) likectxbackgrounddoes - This will cause false negatives in real-world code
3. Inconsistent with sibling linters (/zoom-out)
ctxbackgroundprovidesSuggestedFixesfor editor quick-fixes- This linter only reports diagnostics without suggested fixes
- Adding fixes would improve developer experience and maintain architectural consistency
Positive Highlights ✅
- Clean structure: Follows established patterns in
pkg/linters/perfectly - Good motivation: PR description clearly explains the performance rationale with evidence
- Comprehensive testdata: Covers functions, methods, loops, closures, and valid cases
- Proper registration: Correctly integrated into
cmd/linters/main.go
Verdict
Requesting changes on:
- Type-based detection (not string matching) to handle import aliases
- Additional test cases for edge cases mentioned above
The SuggestedFixes improvement is nice-to-have but not blocking.
💡 Learn more about these skills:
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 5.8M
| ident, ok := sel.X.(*ast.Ident) | ||
| if !ok { | ||
| return false | ||
| } |
There was a problem hiding this comment.
[/tdd] This string-based check will miss aliased imports:
import rx "regexp"
func bad() {
re := rx.MustCompile("...") // Won't be caught!
}Recommendation: Use type information instead of string matching. See how ctxbackground uses types.Identical() for robust detection:
func isRegexpCompileCall(pass *analysis.Pass, call *ast.CallExpr) bool {
sel, ok := call.Fun.(*ast.SelectorExpr)
if !ok {
return false
}
if sel.Sel.Name != "MustCompile" && sel.Sel.Name != "Compile" {
return false
}
// Check the type of sel.X is *regexp.Regexp
t := pass.TypesInfo.TypeOf(sel.X)
return t != nil && t.String() == "regexp"
}Add a testdata case with aliased imports to verify this works.
|
|
||
| func (v *Validator) Validate(s string) bool { | ||
| return packageLevelRegexp.MatchString(s) | ||
| } |
There was a problem hiding this comment.
[/tdd] Missing critical test cases:
- Test file exclusion - No test demonstrating that
_test.gofiles are correctly excluded - Init functions - Should
init()functions be flagged? The implementation will flag them, but is that desired? - Aliased imports - No coverage for
import rx "regexp" - Dot imports - What about
import . "regexp"?
Add testdata cases:
// Test file should be excluded
// (in a separate regexpcompileinfunction_test.go file)
func TestHelper(t *testing.T) {
re := regexp.MustCompile(`test`) // Should NOT be flagged
...
}
// Init function edge case
func init() {
re := regexp.MustCompile(`pattern`) // Unclear if this should be flagged
}
// Aliased import
import rx "regexp"
func processAliased(s string) {
re := rx.MustCompile(`pattern`) // Should be flagged
...
}These edge cases ensure the linter behaves correctly in real codebases.
| pass.Report(analysis.Diagnostic{ | ||
| Pos: call.Pos(), | ||
| End: call.End(), | ||
| Message: "regexp compilation inside function should be moved to package-level variable", |
There was a problem hiding this comment.
[/zoom-out] Missing SuggestedFixes that sibling linters provide.
Compare to ctxbackground (lines 63-75), which offers automatic fixes. This linter could provide a similar fix by suggesting the variable name:
pass.Report(analysis.Diagnostic{
Pos: call.Pos(),
End: call.End(),
Message: "regexp compilation inside function should be moved to package-level variable",
SuggestedFixes: []analysis.SuggestedFix{
{
Message: "Move to package-level variable",
TextEdits: []analysis.TextEdit{
{
Pos: call.Pos(),
End: call.End(),
NewText: []byte("/* TODO: define package-level var */ " + extractedVarName),
},
},
},
},
})This maintains consistency with the codebase's linter architecture and improves developer experience with quick-fix support in editors.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
🧪 Test Quality Sentinel completed test quality analysis. |
✅ Design Decision Gate — ADR VerifiedThe implementation in this PR aligns with the linked Architecture Decision Record. ADR reviewed: ADR-33608: Add regexp-compile-in-function Linter (Status: Verification SummaryEvery normative requirement in the ADR's RFC 2119 specification is satisfied by the code changes in this PR:
Notes for the PR author before flipping the ADR from Draft → Accepted
The design decision has been recorded and the implementation follows it. 🏗️ References: §26185856962
|
|
✅ PR Code Quality Reviewer completed the code quality review. |
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd and /zoom-out based on the new linter feature addition.
Summary
The implementation is architecturally sound and follows project conventions excellently. The main issue is incomplete test coverage — the test fixture validates happy paths but misses critical edge cases and negative cases that document what the linter doesn't catch.
Key Findings
From /tdd perspective:
- ✅ Test uses correct
analysistestframework - ✅ Has
// wantcomments for positive cases - ❌ Missing negative cases (package-level usage that should NOT trigger)
- ❌ Missing edge cases (aliased imports, nested functions, test file exclusion)
- ❌ No integration test against real codebase examples mentioned in ADR
From /zoom-out perspective:
- ✅ Fits perfectly with existing linter architecture (
largefunc,osexitinlibrary, etc.) - ✅ Uses correct AST inspector pattern (
WithStack) - ✅ Follows package layout conventions
- ✅ ADR is exemplary — comprehensive alternatives and consequences
What's Great
- Excellent ADR — one of the best I've seen; thoroughly documents context, alternatives, and trade-offs
- Consistent conventions — follows existing linter patterns perfectly
- Clear diagnostic message — actionable and matches the RFC 2119 specification
- Good AST usage —
WithStackis the right tool for context tracking
Required Changes
Before merge, strengthen the test suite:
- Add negative test cases showing package-level usage that should NOT trigger
- Add edge case tests for aliased imports (documenting the known limitation)
- Add nested function literal test to validate stack traversal
- Add test file exclusion test to verify
IsTestFileworks - Consider adding a comment in
isRegexpCompileCallexplaining the intentional limitation
These aren't about fixing bugs — they're about documenting expected behavior so future maintainers understand what works, what doesn't, and why.
Verdict
Requesting changes for test coverage improvements. The core implementation is production-ready; we just need tests that document edge cases and limitations.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 2.9M
| ) | ||
|
|
||
| // Package-level regexp compilation is allowed and recommended. | ||
| var packageLevelRegexp = regexp.MustCompile(`^[a-z]+$`) |
There was a problem hiding this comment.
[/tdd] Missing negative test case: the fixture should include explicit examples of package-level regexp.MustCompile usage that should not trigger the linter.
Add a clear section showing valid patterns:
// VALID: Package-level compilation (should NOT be flagged)
var validPattern1 = regexp.MustCompile(`^[a-z]+$`)
var validPattern2, _ = regexp.Compile(`\d+`)
func init() {
// VALID: init() is special but still a function
// Should this be flagged? Document the expected behavior
initRegexp := regexp.MustCompile(`init`)
}This documents what the linter doesn't catch, which is as important as what it does catch.
| "github.com/github/gh-aw/pkg/linters/regexpcompileinfunction" | ||
| ) | ||
|
|
||
| func TestAnalyzer(t *testing.T) { |
There was a problem hiding this comment.
[/tdd] Missing test: verify that test files are correctly excluded by the filecheck.IsTestFile check. Add a test fixture in a *_test.go file:
// Create testdata/src/regexpcompileinfunction/excluded_test.go
package regexpcompileinfunction
import (
"regexp"
"testing"
)
// NOT FLAGGED: test files are excluded
func TestSomething(t *testing.T) {
re := regexp.MustCompile(`test`) // Should NOT trigger linter
// ...
}Without this test, there's no validation that the IsTestFile check actually works.
| return nil, nil | ||
| } | ||
|
|
||
| // isRegexpCompileCall checks if the call is to regexp.MustCompile or regexp.Compile. |
There was a problem hiding this comment.
[/tdd] The logic correctly checks if we're inside a function, but consider adding a defensive check for the case where ident.Name could be from a different package with a similar structure:
func isRegexpCompileCall(call *ast.CallExpr) bool {
sel, ok := call.Fun.(*ast.SelectorExpr)
if !ok {
return false
}
ident, ok := sel.X.(*ast.Ident)
if !ok {
return false
}
// This is intentionally simple - only catches `regexp.X` calls
// Aliased imports (import re "regexp") are not detected (documented in ADR)
return ident.Name == "regexp" && (sel.Sel.Name == "MustCompile" || sel.Sel.Name == "Compile")
}Adding a comment here that explains the intentional limitation helps future maintainers understand why it's simple.
| @@ -0,0 +1,58 @@ | |||
| package regexpcompileinfunction | |||
|
|
|||
| import ( | |||
There was a problem hiding this comment.
[/tdd] Missing edge case: aliased imports. The ADR acknowledges this won't be caught, but the test fixture should validate this limitation explicitly:
import re "regexp"
// NOT FLAGGED (known limitation): aliased import
func AliasedImport(s string) bool {
pattern := re.MustCompile(`test`) // This won't trigger the linter
return pattern.MatchString(s)
}Documenting known limitations in tests prevents future contributors from trying to "fix" behavior that's intentionally out of scope.
| return results | ||
| } | ||
|
|
||
| // flagged: regexp.MustCompile in function literal |
There was a problem hiding this comment.
[/tdd] Missing edge case: deeply nested function literals. Add a test for nested closures to ensure the stack traversal works correctly:
// flagged: nested function literal (should still be caught)
func OuterFunc() func() func(string) bool {
return func() func(string) bool {
return func(s string) bool {
re := regexp.MustCompile(`nested`) // want `regexp compilation inside function`
return re.MatchString(s)
}
}
}This validates that isInsideFunction correctly walks the entire stack, not just the immediate parent.
🧪 Test Quality Sentinel ReportTest Quality Score: 100/100✅ Excellent test quality
Test Classification Details
Analysis SummaryThis PR adds a single Go analyzer test using the standard ✅ Behavioral Contract: The test verifies the linter's observable behavior by validating that it produces the correct diagnostics for regexp compilation patterns. ✅ Comprehensive Coverage: The testdata file includes:
✅ Edge Cases: Covers both ✅ No Test Inflation: Test file is only 16 lines compared to 84 lines of production code (0.19:1 ratio, well below the 2:1 threshold). The testdata file (58 lines) contains the actual test scenarios. ✅ Coding Guidelines: Has required build tag ( Language SupportTests analyzed:
Verdict
📖 Understanding Test ClassificationsDesign Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 100/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). This is a textbook example of a high-quality Go analyzer test that enforces behavioral contracts and provides comprehensive coverage.
There was a problem hiding this comment.
Code Review Summary
Verdict: The linter implementation follows repository conventions well, but existing review comments identify important improvements needed before merging.
Key Strengths
✅ Architecture - Follows established pkg/linters/<name>/ pattern consistently
✅ Registration - Properly integrated in cmd/linters/main.go
✅ Test framework - Uses analysistest correctly
✅ Documentation - Comprehensive ADR with RFC 2119 spec
Critical Issues (Already Raised by Other Reviewers)
The 11 existing review comments comprehensively cover the main concerns:
1. Type Safety - Use type checking instead of string matching
Multiple reviewers noted that isRegexpCompileCall() uses simple identifier matching (ident.Name == "regexp"), which:
- Misses aliased imports (
import rx "regexp") - Can produce false positives if a local variable shadows the package name
- Should use
pass.TypesInfoto verify the actual package path
This is the most significant correctness issue.
2. False Positives - Only flag literal patterns
The linter currently flags ALL regexp compilation in functions, including:
- User input patterns:
regexp.Compile(userProvidedPattern) - Dynamic patterns from configuration
- One-time initialization patterns
Recommendation: Add isLiteralPattern() check to only flag cases with compile-time constant patterns.
3. Test Coverage - Missing critical edge cases
Test fixtures should include:
- Aliased imports (
import rx "regexp") - Test file exclusion verification (
*_test.go) - Init function behavior
- Nested closures
- Dot imports
Without these, the linter behavior is undocumented for common real-world scenarios.
Recommendation
The existing review feedback is thorough and actionable. Address those comments before merging to ensure:
- Correctness (type checking prevents false positives/negatives)
- Usability (literal pattern filtering reduces noise)
- Maintainability (edge case tests document behavior)
No additional comments needed - the existing reviewers have covered the critical issues comprehensively.
🔎 Code quality review by PR Code Quality Reviewer · ● 2.2M
|
@copilot review all comments and reviews |
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Addressed. I reviewed the feedback and updated the analyzer to report only when the regexp pattern is a compile-time constant string (literals/consts), so runtime-provided patterns are no longer flagged. I also added fixture coverage for a dynamic-pattern non-flag case and a const-identifier flagged case. Commits: |
Add regexp-compile-in-function linter
Summary
Adds a new static analysis linter that detects
regexp.Compileandregexp.MustCompilecalls inside function bodies and recommends moving them to package-level variables to avoid repeated compilation overhead.What Changed
Core Implementation
Registration & Integration
regexpcompileinfunctionanalyzer in the linter registry to make it available to the analysis pipeline.Documentation
Testing
analysistestframeworkImpact Assessment
Testing Evidence
analysistestframework for deterministic validationRelated Context
Commit history shows iterative refinement:
33a3b778a)e70a33825)2dd321d6e)fe0e986c2)c344fc3bb)Recommendation: ✅ Ready to merge - Well-structured linter addition with proper testing, documentation, and incremental refinement based on edge case discovery.