diff --git a/AGENTS.md b/AGENTS.md index 080c1128d..d306eeb1d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1001,7 +1001,7 @@ mock.module("./some-module", () => ({ * **DSN cache invalidation uses two-level mtime tracking (sourceMtimes + dirMtimes)**: \*\*DSN cache invalidation uses two-level mtime tracking\*\*: \`sourceMtimes\` (DSN-bearing files, catches in-place edits) + \`dirMtimes\` (every walked dir, catches new files) + root mtime fast-path + 24h TTL. Dropping either map is a correctness regression. Walker emits mtimes via \`onDirectoryVisit\` hook + \`recordMtimes\` option; DSN scanner uses \`grepFiles({pattern: DSN\_PATTERN, recordMtimes: true, onDirectoryVisit})\` as full-scan pipeline (~20% faster than old walkFiles path). \`scanCodeForFirstDsn\` stays on direct walker loop — worker-pool init cost (~20ms) dominates for single-DSN. \*\*sourceMtimes invariant\*\*: \`processMatch\` must record mtime for EVERY file containing a host-validated DSN, not just deduplicated new ones — track via \`fileHadValidDsn\` flag independent of \`seen.has(raw)\`. \*\*Error-path invariant\*\*: \`scanDirectory\` catch MUST return empty \`dirMtimes: {}\`, NOT partial map from callback (would silently bless unvisited dirs). \`ConfigError\` re-throws instead. -* **Grep worker pool: binary-transferable matches + streaming dispatch in src/lib/scan/**: \*\*Grep worker pool\*\* (\`src/lib/scan/worker-pool.ts\` + \`grep-worker.js\`): lazy singleton pool (size \`min(8, max(2, availableParallelism()))\`). Matches encoded as \`Uint32Array\` quads \`\[pathIdx, lineNum, lineOffset, lineLength]\` + single \`linePool\` string, transferred via \`postMessage(msg, \[ints.buffer])\` — cut grep pipeline ~40% by dropping structuredClone. Worker source lives in real \`grep-worker.js\` (plain JS, lintable), imported via \`with { type: "text" }\` → fed to \`Blob\` + \`URL.createObjectURL\`. 
\`new Worker(new URL('./w.ts', import.meta.url))\` HANGS in \`bun build --compile\` binaries (\`import.meta.url\` → \`/$bunfs/root/binary\`); string paths are CWD-relative and brittle. Worker source must be self-contained (no user imports); \`require('node:fs')\` works inside. \*\*FIFO handler queue per worker\*\* — fresh \`addEventListener\` per dispatch fires on every message, resolving wrong requests and causing hangs. \*\*Dead-worker guard\*\*: \`PooledWorker.alive\` flipped on \`error\`. Disable via \`SENTRY\_SCAN\_DISABLE\_WORKERS=1\`. +* **Grep worker pool: binary-transferable matches + streaming dispatch in src/lib/scan/**: \*\*Grep worker pool\*\* (\`src/lib/scan/worker-pool.ts\` + \`grep-worker.js\`): lazy singleton, size \`min(8, max(2, availableParallelism()))\`. Matches encoded as \`Uint32Array\` quads \`\[pathIdx, lineNum, lineOffset, lineLength]\` + \`linePool\` string, transferred via \`postMessage(msg, \[ints.buffer])\` — ~40% faster than structuredClone. Worker imported via \`with { type: "text" }\` → \`Blob\` + \`URL.createObjectURL\`; \`new Worker(new URL(...))\` HANGS in \`bun build --compile\` binaries. \*\*FIFO \`pending\` queue per worker\*\* — per-dispatch \`addEventListener\` causes wrong-request resolution. \*\*\`ref()\`/\`unref()\` are idempotent booleans, NOT refcounted\*\* — only unref when \`inflight\` drops to 0; workers spawn unref'd so idle pool doesn't block CLI exit. Readiness-failure handler must close over its own slot, not \`pop()\`. Disable via \`SENTRY\_SCAN\_DISABLE\_WORKERS=1\`. Surface pipeline failures: track \`dispatchedBatches\`/\`failedBatches\`, \`await Promise.allSettled(dispatchPromises)\` in \`finally\`; throw if all batches failed so DSN cache doesn't persist false-negatives. 
* **Input validation layer: src/lib/input-validation.ts guards CLI arg parsing**: CLI arg input validation (\`src/lib/input-validation.ts\`): Four validators — \`rejectControlChars\` (ASCII < 0x20), \`rejectPreEncoded\` (%XX), \`validateResourceId\` (rejects ?, #, %, whitespace), \`validateEndpoint\` (rejects \`..\` traversal). Applied in \`parseSlashOrgProject\`, \`parseOrgProjectArg\`, \`parseIssueArg\`, \`normalizeEndpoint\`. NOT applied in \`parseSlashSeparatedArg\` for plain IDs (may contain structural separators). Env vars and DB cache values are trusted. @@ -1013,7 +1013,7 @@ mock.module("./some-module", () => ({ * **Sentry SDK uses @sentry/node-core/light instead of @sentry/bun to avoid OTel overhead**: Sentry SDK uses \`@sentry/node-core/light\` instead of \`@sentry/bun\` to avoid OpenTelemetry overhead (~150ms, 24MB). \`@sentry/core\` barrel patched via \`bun patch\` to remove ~32 unused exports. Gotcha: \`LightNodeClient\` hardcodes \`runtime: { name: 'node' }\` AFTER spreading options — fix by patching \`client.getOptions().runtime\` post-init (mutable ref). Transport uses Node \`http\` instead of native \`fetch\`. Upstream: getsentry/sentry-javascript#19885, #19886. -* **src/lib/scan/ module: policy-free file walker with IgnoreStack**: \`src/lib/scan/\` — policy-free walker + grep/glob engine. Files: \`walker.ts\` (DFS async gen; \`minDepth\`/\`maxDepth\` cap DESCENT not yield; \`timeBudgetMs\`, \`AbortSignal\`, \`onDirectoryVisit\`, \`recordMtimes\`), \`ignore.ts\` (IgnoreStack two-tier \`#rootIg\`+\`#nestedByRelDir\`), \`binary.ts\` (8KB NUL sniff + ext fast-path), \`regex.ts\` (\`(?i)\`/\`(?im)\`/\`(?U)\` → JS flags), \`grep.ts\`/\`glob.ts\` (picomatch \`{dot:true}\`). DSN policy lives in \`src/lib/dsn/scan-options.ts\`. \`GrepOptions\` supports \`recordMtimes\` + \`onDirectoryVisit\` passthroughs — each \`GrepMatch\` gets optional \`mtime\` from walker's \`entry.mtime\`. Gotchas: (1) \`classifyFile\` short-circuits when \`cfg.extensions\` set. 
(2) \`buildWalkOptions\` MUST forward \`descentHook\`+\`alwaysSkipDirs\`+\`followSymlinks\`+\`recordMtimes\`+\`clock\`. (3) NEVER \`pLimit.clearQueue()\` (hangs); use \`onResult\` returning \`{done:true}\`. (4) \`path.join\`/\`relative\`/\`extname\` ~10× slower than manual string ops in hot loop — use \`abs.slice(cwdPrefixLen)\` and \`lastIndexOf('.')+slice+toLowerCase\`. (5) Scan module trusts \`opts.path\`; sandbox enforcement is caller's responsibility (e.g. init-wizard adapters must call \`safePath\` from \`init/tools/shared.ts\`). +* **src/lib/scan/ module: policy-free file walker with IgnoreStack**: \`src/lib/scan/\` — policy-free walker + grep/glob engine. Files: \`walker.ts\` (DFS async gen; \`minDepth\`/\`maxDepth\` cap DESCENT not yield; \`timeBudgetMs\`, \`AbortSignal\`, \`onDirectoryVisit\`, \`recordMtimes\`), \`ignore.ts\` (IgnoreStack two-tier \`#rootIg\`+\`#nestedByRelDir\`), \`binary.ts\` (8KB NUL sniff + ext fast-path), \`regex.ts\` (\`(?i)\`/\`(?im)\`/\`(?U)\` → JS flags), \`grep.ts\`/\`glob.ts\` (picomatch \`{dot:true}\`). DSN policy lives in \`src/lib/dsn/scan-options.ts\`. \*\*Unification ceiling\*\*: downward tree-walker only — NOT for upward walks (\`walk-up.ts\`), single-file reads, cache sweeps, or writes. Gotchas: \`classifyFile\` short-circuits when \`cfg.extensions\` set; \`buildWalkOptions\` MUST forward \`descentHook\`+\`alwaysSkipDirs\`+\`followSymlinks\`+\`recordMtimes\`+\`clock\`; NEVER \`pLimit.clearQueue()\` (hangs) — use \`onResult\` returning \`{done:true}\`; \`path.join\`/\`relative\`/\`extname\` ~10× slower than manual string ops in hot loop; scan module trusts \`opts.path\`, caller enforces sandbox. * **Telemetry opt-out is env-var-only — no persistent preference or DO\_NOT\_TRACK**: Telemetry opt-out priority: (1) \`SENTRY\_CLI\_NO\_TELEMETRY=1\`, (2) \`DO\_NOT\_TRACK=1\`, (3) \`metadata.defaults.telemetry\`, (4) default on. DB read try/catch wrapped (runs before DB init). 
Schema v13 merged \`defaults\` table into \`metadata\` KV with keys \`defaults.{org,project,telemetry,url}\`; getters/setters in \`src/lib/db/defaults.ts\`. \`sentry cli defaults\` uses variadic \`\[key, value?]\`: no args → show all; 1 arg → show key; 2 args → set; \`--clear\` without args → clear all (guarded); \`--clear key\` → clear specific. \`computeTelemetryEffective()\` returns resolved source for display. @@ -1034,11 +1034,11 @@ mock.module("./some-module", () => ({ * **AuthError constructor takes reason first, message second**: \`AuthError(reason: AuthErrorReason, message?: string)\` where \`AuthErrorReason\` is \`"not\_authenticated" | "expired" | "invalid"\`. Easy to accidentally swap args as \`new AuthError("Token expired", "expired")\` — the string \`"Token expired"\` gets assigned as \`reason\` (invalid enum value). Tests aren't type-checked (tsconfig excludes them), so TypeScript won't catch this. Correct: \`new AuthError("expired", "Token expired")\`. Default messages exist for each reason, so the second arg is often unnecessary. - -* **esbuild doesn't support \`with { type: "text" }\` — needs a plugin**: esbuild errors on Bun's \`import x from "./f.js" with { type: "text" }\` attribute ("Importing with a type attribute of \\"text\\" is not supported"). Our build pipeline is esbuild → bin.js → Bun compile, so esbuild sees the import first. Fix: \`script/text-import-plugin.ts\` — tiny esbuild plugin that intercepts imports with \`pluginData.with?.type === "text"\` (via \`onResolve\`/\`onLoad\`), reads the file, and emits \`export default "\"\`. Shared between \`script/build.ts\` (CLI binary) and \`script/bundle.ts\` (npm library). Bun runtime (dev, \`bun test\`) handles the attribute natively, so no plugin needed there. TypeScript also doesn't model the attribute — cast \`import x from "./f.js" with {...}\` through \`as unknown as string\`. 
+ +* **DEFAULT\_SKIP\_DIRS prunes dist/build/.next — narrow for build-output scans**: \*\*scan module walker gotchas\*\*: (1) \`DEFAULT\_SKIP\_DIRS\` in \`src/lib/scan/ignore.ts\` prunes broadly: \`node\_modules\`, VCS, plus \`dist\`, \`build\`, \`target\`, \`.next\`, \`.nuxt\`, \`.output\`, \`vendor\`, \`.gradle\`, \`.bundle\`, \`coverage\`, \`.cache\`, \`.turbo\`. When walker starts AT \`dist/\`, skip list doesn't apply to cwd itself; recursing from parent prunes. For \`sourcemap/inject.ts\`, pass narrowed \`alwaysSkipDirs: \["node\_modules"]\` + \`respectGitignore: false\` to scan build outputs. (2) When \`extensions\` is set, walker skips binary NUL-sniff entirely. (3) walkFiles migration checklist — verify 5 knobs: \`respectGitignore\` (default true), \`hidden\` (default true, include), \`alwaysSkipDirs\` (default broad), \`extensions\` (skips NUL-sniff), \`followSymlinks\` (default false). DFS order differs from files-first-then-dirs — verify no ordering contract at call sites. -* **mapFilesConcurrent skips null but not empty arrays — callers must return null for no-op**: Scan/formatter off-by-ones: (1) \`mapFilesConcurrent\` in \`src/lib/scan/concurrent.ts\` filters \`null\` but NOT empty arrays — fires \`onResult\` per empty result. Callers like \`processEntry\` in \`dsn/code-scanner.ts\` must return \`null\` (not \`\[]\`) for no-op files; on 10k-file walks ~99% yield empties. Stream variant filters both. (2) \`collectGlob\`/\`collectGrep\` must NOT forward \`maxResults\` to the underlying iterator — collector drains uncapped, sets \`truncated=true\` on overshoot. Forwarding makes iterator stop at exactly N without \`truncated\`. (3) \`filterFields\` in \`formatters/json.ts\` uses dot-notation — property tests must use \`\[a-zA-Z0-9\_]\` charset to avoid ambiguous keys with dots. 
+* **mapFilesConcurrent skips null but not empty arrays — callers must return null for no-op**: Scan/formatter off-by-ones: (1) \`mapFilesConcurrent\` in \`src/lib/scan/concurrent.ts\` filters \`null\` but NOT empty arrays — fires \`onResult\` per empty result. Callers (e.g. \`processEntry\` in \`dsn/code-scanner.ts\`) must return \`null\` (not \`\[]\`) for no-op files (~99% of 10k-file walks). Stream variant filters both. (2) \`collectGlob\`/\`collectGrep\` must NOT forward \`maxResults\` to underlying iterator — collector drains uncapped, sets \`truncated=true\` on overshoot; forwarding stops iterator at exactly N without \`truncated\`. (3) \`filterFields\` in \`formatters/json.ts\` uses dot-notation — property tests must use \`\[a-zA-Z0-9\_]\` charset to avoid ambiguous keys with dots. * **process.stdin.isTTY unreliable in Bun — use isatty(0) and backfill for clack**: \`process.stdin.isTTY\` unreliable in Bun — use \`isatty(0)\` from \`node:tty\`. Bun's single-file binary can leave \`process.stdin.isTTY === undefined\` on TTY fds inherited via redirects like \`exec … \ ({ * **test:unit glob only picks up test/lib, test/commands, test/types, test/isolated**: \`bun test\` script globs in \`package.json\` are narrow: \`test:unit\` = \`test/lib test/commands test/types\`, \`test:isolated\` = \`test/isolated\`, \`test:e2e\` = \`test/e2e\`. Tests placed under \`test/fixtures/\`, \`test/scripts/\`, or \`test/script/\` are NOT picked up by any standard CI script despite being valid Bun test files. Place new tests under \`test/lib/\/\` to inherit coverage. Note: \`test/script/\` (singular) exists with script tests but is also outside the globs. -* **Whole-buffer matchAll slower than split+test when aggregated over many files**: grep perf traps in \`src/lib/scan/grep.ts::readAndGrep\`: (1) Whole-buffer \`regex.exec\` is 12× faster per-file but ~1.6× SLOWER than \`split('\n')+regex.test\` aggregated over 10k files — match-emission dominates. 
Early-exit at \`maxResults\` via \`mapFilesConcurrent.onResult\` is what wins. (2) Literal prefilter is FILE-LEVEL gate only (\`indexOf\` → skip), then whole-buffer \`regex.exec\`. Per-line-verify broke cross-newline patterns and Unicode length-changing \`toLowerCase\` (Turkish \`İ\`→\`i̇\`). Extractor handles multi-char escapes via \`escapeSequenceLength\`; \`hasTopLevelAlternation\`+\`skipGroup\` must call \`skipCharacterClass\` (PCRE \`\[]abc]\` ≠ JS). (3) \`GrepOptions.multiline\` defaults to \`true\` (rg/git-grep); \`compilePattern\`'s default stays \`false\`. (4) Wake-latch race: \`let notify=null; await new Promise(r=>notify=r)\` loses signals if producer calls \`wakeConsumer()\` before assignment — use a latched \`pendingWake\` flag. +* **Whole-buffer matchAll slower than split+test when aggregated over many files**: grep perf/correctness traps in \`src/lib/scan/grep.ts\`: (1) Whole-buffer \`regex.exec\` 12× faster per-file but ~1.6× SLOWER than \`split('\n')+test\` aggregated over 10k files — early-exit at \`maxResults\` via \`mapFilesConcurrent.onResult\` wins. (2) Literal prefilter is FILE-LEVEL gate (\`indexOf\` → skip), then whole-buffer exec. Per-line-verify breaks cross-newline patterns and Unicode length-changing \`toLowerCase\` (Turkish \`İ\`→\`i̇\`). Extractor handles multi-char escapes via \`escapeSequenceLength\`; \`hasTopLevelAlternation\`+\`skipGroup\` must call \`skipCharacterClass\` (PCRE \`\[]abc]\` ≠ JS empty class). (3) \`GrepOptions.multiline\` defaults \`true\` (rg/git-grep); \`compilePattern\` default stays \`false\`. (4) Wake-latch race: \`let notify=null; await new Promise(r=>notify=r)\` loses signals if producer wakes before assignment — use latched \`pendingWake\` flag. 
### Pattern @@ -1063,12 +1063,9 @@ mock.module("./some-module", () => ({ * **Event view cross-org fallback chain: project → org → all orgs**: Event view cross-org fallback chain: \`fetchEventWithContext\` in \`src/commands/event/view.ts\` tries (1) \`getEvent(org, project, eventId)\`, (2) \`resolveEventInOrg(org, eventId)\`, (3) \`findEventAcrossOrgs(eventId, { excludeOrgs })\`. Extracted into \`tryEventFallbacks()\` for Biome complexity limit. \`excludeOrgs\` only set when same-org search returned null (not on transient error). Both catch blocks re-throw \`AuthError\` while swallowing transient errors. Warning distinguishes same-org/different-project from different-org via \`crossOrg.org === org\`. - -* **Extract shared pipeline setup between streaming and collecting variants of same operation**: When a module exposes both a streaming (\`AsyncGenerator\`) and collecting (\`Promise\\`) variant of the same pipeline (e.g., \`grepFiles\` + \`collectGrep\` in \`src/lib/scan/grep.ts\`), extract a shared setup helper (e.g., \`setupGrepPipeline\`) that handles pattern compilation, matcher compilation, option defaulting, walker construction, and filter wiring. Otherwise divergence between the two setup paths becomes a silent correctness bug. The collecting variant typically adds a \`+1 probe\` pattern over the streaming variant's \`maxResults\` to distinguish 'exactly N' from 'N+ available'. - * **PR review workflow: Cursor Bugbot + Seer + human cycle**: PR review workflow (Cursor Bugbot + Seer + human): \`gh pr checks \ --json state,link\` + \`gh run view --log-failed\` for CI. Unresolved threads via \`gh api graphql\` with \`reviewThreads\` query filtering \`isResolved:false\`+\`isMinimized:false\`. After fixes: reply via \`gh api repos/.../pulls/comments/\/replies\` then resolve via \`resolveReviewThread\` mutation. Bots auto-resolve on detected fix. Statuses: \`UNSTABLE\`=non-blocking bots running, \`BLOCKED\`=required CI pending, \`CLEAN\`+\`MERGEABLE\`=ready. 
Repo is squash-merge only. Expect 4-6 rounds on subtle regex/Unicode PRs. Bugbot findings usually real but occasionally assume PCRE/Python semantics (e.g. \`\[]abc]\` is class-with-\`]\` in PCRE but empty class in JS) — verify with reproduction. Dispatch self-review subagent between rounds. -* **Worker bodies as real .js files via text-import, not inline strings**: Prefer real \`.js\` files imported as text over inline template-literal worker sources. Inline strings are invisible to Biome/TypeScript — lint bugs (\`noIncrementDecrement\`, \`useBlockStatements\`, \`useTemplate\`, unused vars) go undetected. Pattern: write worker as plain JS (\`grep-worker.js\`, CJS-style \`require("node:fs")\` to avoid top-level-await parsing), import via \`with { type: "text" }\`, feed to \`Blob\` + \`URL.createObjectURL\`. Works in \`bun run\`, \`bun test\`, and compiled binaries (with the esbuild text-import plugin). Trade-off: no TS types in the worker body — keep message protocol types on main-thread side (\`worker-pool.ts\`). Must be self-contained (no user imports — worker module registry won't have them). +* **Worker bodies as real .js files via text-import, not inline strings**: \*\*Worker bodies as real .js files via text-import\*\*: Prefer real \`.js\` files imported as text over inline template-literal worker sources — inline strings invisible to Biome/TypeScript. Pattern: write worker as plain JS (CJS-style \`require("node:fs")\` to avoid TLA parsing), import via \`with { type: "text" }\`, feed to \`Blob\` + \`URL.createObjectURL\`. Must be self-contained (no user imports). \*\*esbuild doesn't support \`with { type: "text" }\`\*\* — fix: \`script/text-import-plugin.ts\` intercepts via \`onResolve\`/\`onLoad\`, reads file, emits \`export default "\"\`. Shared between \`script/build.ts\` and \`script/bundle.ts\`. Bun runtime handles natively. TypeScript doesn't model the attribute — cast through \`as unknown as string\`. 
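Reviewer sketch for the wake-latch race described in the grep bullet above: a minimal illustration of a latched `pendingWake` flag, assuming a single consumer. The class name `WakeLatch` and its `wake`/`wait`/`hasPendingWake` members are hypothetical and are not taken from `src/lib/scan/grep.ts`; only the `pendingWake` idea comes from the note.

```typescript
// Hypothetical helper, not the code in grep.ts. Assumes exactly one consumer.
class WakeLatch {
  // Set when the producer wakes while nobody is waiting.
  private pendingWake = false;
  // Resolver of the consumer's current wait, if any.
  private notify: (() => void) | null = null;

  /** Exposed only so the latch state can be inspected in this sketch. */
  get hasPendingWake(): boolean {
    return this.pendingWake;
  }

  /** Producer side: wake a waiting consumer, or latch the signal for later. */
  wake(): void {
    if (this.notify) {
      const resolve = this.notify;
      this.notify = null;
      resolve();
    } else {
      this.pendingWake = true;
    }
  }

  /** Consumer side: consume a latched wake immediately, otherwise park. */
  wait(): Promise<void> {
    if (this.pendingWake) {
      this.pendingWake = false;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.notify = resolve;
    });
  }
}
```

With a bare `notify` resolver, a `wake()` that arrives before the consumer parks is dropped; here it is recorded and the next `wait()` returns at once.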
diff --git a/src/lib/sourcemap/inject.ts b/src/lib/sourcemap/inject.ts index 03fa29602..476605b55 100644 --- a/src/lib/sourcemap/inject.ts +++ b/src/lib/sourcemap/inject.ts @@ -5,8 +5,9 @@ * then injects Sentry debug IDs into each pair. */ -import { readdir, readFile, stat } from "node:fs/promises"; -import { extname, join } from "node:path"; +import { readFile, stat } from "node:fs/promises"; +import { resolve as resolvePath } from "node:path"; +import { walkFiles } from "../scan/index.js"; import { EXISTING_DEBUGID_RE, injectDebugId } from "./debug-id.js"; /** Default JavaScript file extensions to scan. */ @@ -92,28 +93,46 @@ async function findCompanionMap(jsPath: string): Promise { /** * Recursively discover JS files with companion .map files. + * + * Uses the shared `walkFiles` engine from `src/lib/scan/` for + * directory traversal. Sourcemap injection targets build output + * directories — so we: + * + * - Disable `respectGitignore` (build outputs like `dist/` are + * typically gitignored; the walker would otherwise prune them). + * - Skip dot-prefixed entries (`hidden: false`) and `node_modules` (`SOURCEMAP_SKIP_DIRS`); + * the walker's `DEFAULT_SKIP_DIRS` is too broad — it prunes + * `dist`/`build`/`.next` etc., which are exactly the dirs users + * want to scan into. + * - Disable the `maxFileSize` cap. The walker defaults to 256 KB, + * but webpack / rollup / Next.js bundles routinely exceed that. + * The old hand-rolled `readdir` loop had no size limit; silently + * dropping large JS files would skip debug-ID injection on the + * exact bundles users care about most. */ +const SOURCEMAP_SKIP_DIRS: readonly string[] = ["node_modules"]; + async function discoverFilePairs( dir: string, extensions: Set ): Promise { + // `walkFiles` requires an absolute cwd. CLI callers pass + // user-supplied positional args like `./dist` directly through to + // `injectDirectory`, so we resolve here rather than push the + // requirement up to every caller. 
+ const absDir = resolvePath(dir); const pairs: FilePair[] = []; - const entries = await readdir(dir, { withFileTypes: true }); - - for (const entry of entries) { - const fullPath = join(dir, entry.name); - if (entry.isDirectory()) { - // Skip node_modules and hidden directories - if (entry.name === "node_modules" || entry.name.startsWith(".")) { - continue; - } - const subPairs = await discoverFilePairs(fullPath, extensions); - pairs.push(...subPairs); - } else if (entry.isFile() && extensions.has(extname(entry.name))) { - const mapPath = await findCompanionMap(fullPath); - if (mapPath) { - pairs.push({ jsPath: fullPath, mapPath }); - } + for await (const entry of walkFiles({ + cwd: absDir, + extensions, + alwaysSkipDirs: SOURCEMAP_SKIP_DIRS, + hidden: false, + respectGitignore: false, + maxFileSize: Number.POSITIVE_INFINITY, + })) { + const mapPath = await findCompanionMap(entry.absolutePath); + if (mapPath) { + pairs.push({ jsPath: entry.absolutePath, mapPath }); } } return pairs; diff --git a/test/lib/sourcemap/inject.test.ts b/test/lib/sourcemap/inject.test.ts new file mode 100644 index 000000000..00b3c1896 --- /dev/null +++ b/test/lib/sourcemap/inject.test.ts @@ -0,0 +1,187 @@ +/** + * Tests for directory-level debug-ID injection. Covers the discovery + * walk (used to be hand-rolled, now delegates to `walkFiles`) — + * specifically the skip policy for `node_modules` / dotfiles, the + * `.gitignore` bypass for build-output dirs, and the extension + * filter. 
+ */ + +import { afterEach, beforeEach, describe, expect, test } from "bun:test"; +import { + mkdirSync, + mkdtempSync, + rmSync, + symlinkSync, + writeFileSync, +} from "node:fs"; +import { tmpdir } from "node:os"; +import { join } from "node:path"; +import { injectDirectory } from "../../../src/lib/sourcemap/inject.js"; + +describe("injectDirectory — discovery", () => { + let dir: string; + + beforeEach(() => { + dir = mkdtempSync(join(tmpdir(), "sentry-inject-")); + }); + + afterEach(() => { + rmSync(dir, { recursive: true, force: true }); + }); + + /** Write a .js + .js.map pair at `rel` inside `dir`. */ + function writePair(rel: string): void { + const full = join(dir, rel); + mkdirSync(join(full, ".."), { recursive: true }); + writeFileSync(full, `// ${rel}\n`); + writeFileSync(`${full}.map`, "{}\n"); + } + + test("discovers .js pairs in nested dirs", async () => { + writePair("app.js"); + writePair("a/nested.js"); + writePair("a/b/deep.js"); + + const results = await injectDirectory(dir, { dryRun: true }); + const paths = results.map((r) => r.jsPath.slice(dir.length + 1)).sort(); + expect(paths).toEqual(["a/b/deep.js", "a/nested.js", "app.js"]); + }); + + test("skips .js files without a companion .map", async () => { + writePair("withmap.js"); + writeFileSync(join(dir, "orphan.js"), "// orphan\n"); + + const results = await injectDirectory(dir, { dryRun: true }); + const paths = results.map((r) => r.jsPath.slice(dir.length + 1)); + expect(paths).toEqual(["withmap.js"]); + }); + + test("discovers .cjs and .mjs files by default", async () => { + writePair("a.js"); + writePair("b.cjs"); + writePair("c.mjs"); + + const results = await injectDirectory(dir, { dryRun: true }); + const paths = results.map((r) => r.jsPath.slice(dir.length + 1)).sort(); + expect(paths).toEqual(["a.js", "b.cjs", "c.mjs"]); + }); + + test("respects custom extensions", async () => { + writePair("a.js"); + writePair("b.ts"); + + const results = await injectDirectory(dir, { + dryRun: true, 
+ extensions: ["ts"], + }); + const paths = results.map((r) => r.jsPath.slice(dir.length + 1)); + expect(paths).toEqual(["b.ts"]); + }); + + test("skips node_modules", async () => { + writePair("app.js"); + writePair("node_modules/foo/lib.js"); + + const results = await injectDirectory(dir, { dryRun: true }); + const paths = results.map((r) => r.jsPath.slice(dir.length + 1)); + expect(paths).toEqual(["app.js"]); + }); + + test("skips hidden (dot-prefixed) directories", async () => { + writePair("app.js"); + writePair(".cache/cached.js"); + writePair(".git/hooks/script.js"); + + const results = await injectDirectory(dir, { dryRun: true }); + const paths = results.map((r) => r.jsPath.slice(dir.length + 1)); + expect(paths).toEqual(["app.js"]); + }); + + test("ignores .gitignore — build-output dirs are always scanned", async () => { + // Typical build setup: `dist/` is gitignored but contains the + // files we want to inject into. + writeFileSync(join(dir, ".gitignore"), "dist/\nbuild/\n"); + writePair("src/a.js"); + writePair("dist/bundle.js"); + writePair("build/out.js"); + + const results = await injectDirectory(dir, { dryRun: true }); + const paths = results.map((r) => r.jsPath.slice(dir.length + 1)).sort(); + expect(paths).toEqual(["build/out.js", "dist/bundle.js", "src/a.js"]); + }); + + test("scans a directory that's itself named like a gitignore target", async () => { + // User passes `dist/` directly as the scan root. The default + // skip list in `scan/` includes "dist" — we explicitly narrow + // it to `["node_modules"]` for this use case. + writePair("bundle.js"); + writePair("chunks/one.js"); + + const results = await injectDirectory(dir, { dryRun: true }); + const paths = results.map((r) => r.jsPath.slice(dir.length + 1)).sort(); + expect(paths).toEqual(["bundle.js", "chunks/one.js"]); + }); + + test("does not follow symlinks", async () => { + // Default: symlinks are ignored (matches pre-refactor behavior). 
+ writePair("real.js"); + const realDir = join(dir, "src"); + const linkDir = join(dir, "link"); + mkdirSync(realDir, { recursive: true }); + writePair("src/x.js"); + try { + symlinkSync(realDir, linkDir, "dir"); + } catch { + // Some filesystems (e.g. Windows without dev mode) can't + // create symlinks — skip this assertion in that case. + return; + } + const results = await injectDirectory(dir, { dryRun: true }); + const paths = results.map((r) => r.jsPath.slice(dir.length + 1)).sort(); + // `real.js` + `src/x.js` should be discovered; `link/x.js` must NOT. + expect(paths).toEqual(["real.js", "src/x.js"]); + }); + + test("returns empty for missing directory", async () => { + const results = await injectDirectory(join(dir, "does-not-exist"), { + dryRun: true, + }); + expect(results).toEqual([]); + }); + + test("accepts relative paths (not just absolute)", async () => { + // Regression: `walkFiles` enforces absolute cwd and throws on + // relative input. CLI callers (`sourcemap inject ./dist`) pass + // the user-supplied arg straight through, so the adapter must + // resolve it to absolute itself. + writePair("app.js"); + const originalCwd = process.cwd(); + process.chdir(dir); + try { + for (const relDir of ["./", ".", "./."]) { + const results = await injectDirectory(relDir, { dryRun: true }); + expect(results).toHaveLength(1); + // The jsPath must still be absolute — consumers expect + // absolute paths for downstream file ops. + expect(results[0]?.jsPath).toMatch(/^\//); + } + } finally { + process.chdir(originalCwd); + } + }); + + test("discovers large JS bundles (> walker's default 256 KB)", async () => { + // Regression: `walkFiles` defaults to `maxFileSize: 256 KB`, + // which silently skipped any `.js` file larger than that — + // i.e. every real-world webpack/rollup/Next.js bundle. The + // adapter must opt out of the size cap. + const bundlePath = join(dir, "bundle.js"); + // 512 KB of filler — exceeds the walker's default 256 KB cap. 
+ writeFileSync(bundlePath, "x".repeat(512 * 1024)); + writeFileSync(`${bundlePath}.map`, "{}\n"); + + const results = await injectDirectory(dir, { dryRun: true }); + const paths = results.map((r) => r.jsPath.slice(dir.length + 1)); + expect(paths).toEqual(["bundle.js"]); + }); +});
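Reviewer sketch of the directory skip policy this diff relies on, written as a pure function so the expected behavior is easy to check. It assumes `hidden: false` has the same effect as the removed `entry.name.startsWith(".")` check, which the tests above suggest but the walker's internals (not shown here) would need to confirm. The names `SkipPolicy`, `shouldDescend`, and `sourcemapPolicy` are hypothetical and do not exist in `src/lib/scan/`.

```typescript
/** Hypothetical shape mirroring two of the options passed to `walkFiles` in the diff. */
interface SkipPolicy {
  /** Directory names that are never descended into. */
  alwaysSkipDirs: readonly string[];
  /** When false, dot-prefixed directories are not descended into (assumed semantics). */
  hidden: boolean;
}

/** Decide whether a directory with the given base name should be walked. */
function shouldDescend(dirName: string, policy: SkipPolicy): boolean {
  if (policy.alwaysSkipDirs.includes(dirName)) {
    return false;
  }
  if (!policy.hidden && dirName.startsWith(".")) {
    return false;
  }
  return true;
}

// The narrowed policy used for sourcemap discovery in this diff.
const sourcemapPolicy: SkipPolicy = {
  alwaysSkipDirs: ["node_modules"],
  hidden: false,
};
```

Under this policy `dist` and `build` are walked, while `node_modules` and any dot-prefixed directory below the scan root (including a nested `.next`) are not, which matches the two checks in the removed `readdir` loop.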