Skip to content

Add unit tests, security hardening, and error logging improvements#43

Open
huangyingting wants to merge 1 commit into
mainfrom
huangyingting/improve-security-testing-errors
Open

Add unit tests, security hardening, and error logging improvements#43
huangyingting wants to merge 1 commit into
mainfrom
huangyingting/improve-security-testing-errors

Conversation

@huangyingting
Copy link
Copy Markdown
Collaborator

Summary

High-priority improvements identified during a full project review:

1. Unit Tests (new)

  • Added test suite using Node.js built-in node:test runner — no new dependencies needed
  • test/utils.test.mjs — covers slugify, normalizeCompanyName, normalizeDomain, canonicalSourceUrl, companySlugFromRunId, runDateFromRunId, isRunId, researchCacheDir, hasText, parseDate, registrableDomain, collectClaimRefs, normalizeRevision
  • test/fetch-url.test.mjs — covers canonicalCacheKey, looksLikePdfBuffer, looksLikeBotChallenge, waybackUrl, readerUrl, htmlToText, cleanExtractedText, extractTitle, stripWaybackToolbar, registrableDomain
  • Added npm run test to root package.json

2. Security Hardening

  • URL scheme whitelist: canonicalSourceUrl() now rejects javascript:, data:, file: and other non-http(s) schemes (returns '')
  • SSRF protection: fetchUrl() validates URL scheme before making any network request
  • Memory exhaustion guard: fetchUrl() enforces a 50 MB max response body size (MAX_BODY_BYTES) — checks both Content-Length header and actual body

3. Error Handling

  • Replaced silent catch {}return null/false patterns with console.warn() logging so failures are diagnosable:
    • check-reports.mjs: folderDigest() and loadCache() now log the specific error
    • utils.mjs: listDirs() logs which path failed to stat
    • fetch.mjs: readCache() and loadHostStrategies() log cache read failures

4. DX

  • Added .nvmrc pinning Node 22 for consistent environments

Testing

Run: npm run test

- Add test suite using Node.js built-in test runner (test/utils.test.mjs,
  test/fetch-url.test.mjs) covering utils functions, URL handling, caching,
  bot detection, HTML extraction, and more.
- Add 'npm run test' script to root package.json.
- Security: reject non-http(s) URL schemes in canonicalSourceUrl() to prevent
  javascript:/data:/file: injection.
- Security: add URL scheme validation in fetchUrl() to block SSRF via
  disallowed protocols.
- Security: enforce MAX_BODY_BYTES (50 MB) on fetch responses to prevent
  memory exhaustion from oversized downloads.
- Error handling: replace silent catch blocks with console.warn() logging in
  check-reports.mjs (folderDigest, loadCache), utils.mjs (listDirs), and
  fetch.mjs (readCache, loadHostStrategies) so failures are diagnosable.
- Add .nvmrc pinning Node 22.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants