Skip to content

fix(cache): handle ValueError from parsed.port on Windows file:// URLs#1446

Merged
danielmeppiel merged 2 commits into
mainfrom
danielmeppiel/fix-windows-file-url-port-valueerror
May 21, 2026
Merged

fix(cache): handle ValueError from parsed.port on Windows file:// URLs#1446
danielmeppiel merged 2 commits into
mainfrom
danielmeppiel/fix-windows-file-url-port-valueerror

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

Problem

Windows CI (run 26252988143) failed with:

ValueError: Port could not be cast to integer value as '\\Users\\runneradmin\\AppData\\Local\\Temp\\...\\bare.git'

at src/apm_cli/cache/url_normalize.py:86 (port = parsed.port).

Root cause: On Windows, urlparse("file://C:\\path\\bare.git") places C:\path... into the netloc. Python's parsed.port property splits the netloc on : and interprets the drive-letter colon as a host:port separator -- raising ValueError because the 'port' isn't a digit. This crashed cache_shard_key() for every test that constructed a file://{tmp_path} URL (e.g. TestGetCheckoutLayout.test_full_variant_layout).

Linux/macOS were unaffected because their tmp paths start with /, producing an empty netloc.

Fix

Wrap hostname/username/port extraction in try/except ValueError. On failure, fall back to the raw lowercased netloc as the authority. This keeps cache-key derivation:

  • Non-raising -- never crashes on malformed netlocs
  • Deterministic -- same URL always yields same shard
  • Per-URL distinct -- different Windows paths (C:\repo_a.git vs C:\repo_b.git) still get different shards (the prior trivial fix collapsed them all to file://c)

Tests

Added two regression tests in tests/unit/cache/test_url_normalize.py:

  • test_windows_file_url_does_not_raise -- verifies no ValueError
  • test_windows_file_urls_distinct_paths_distinct_keys -- verifies shard-key distinctness

Validation

  • uv run --extra dev pytest tests/unit/cache/ -q -> 44 passed
  • uv run --extra dev ruff check src/ tests/ -> All checks passed
  • uv run --extra dev ruff format --check src/ tests/ -> 1046 files already formatted
  • uv run --extra dev python -m pylint --disable=all --enable=R0801 --min-similarity-lines=10 --fail-on=R0801 src/apm_cli/ -> 10.00/10
  • bash scripts/lint-auth-signals.sh -> auth-signal lint clean

On Windows, file://C:\path\bare.git parses with netloc 'C:\path...'
and the drive-letter colon is interpreted as host:port. Accessing
parsed.port raises ValueError, crashing cache_shard_key in any test
that constructs file:// URLs from tmp_path.

Catch ValueError and fall back to the raw lowercased netloc as the
authority, preserving per-URL distinctness for cache shard keys.

Adds regression tests for the Windows file:// case.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 21, 2026 21:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes Windows CI failures in the cache URL normalizer by preventing urllib.parse from raising ValueError when encountering malformed file://C:\... URLs (where the drive-letter colon is misinterpreted as a host:port separator).

Changes:

  • Wrap parsed.hostname / parsed.port extraction in try/except ValueError and fall back to a deterministic authority derived from parsed.netloc.
  • Add unit regression tests to ensure Windows-style file://C:\... inputs do not raise and produce distinct cache shard keys for distinct paths.
Show a summary per file
File Description
tests/unit/cache/test_url_normalize.py Adds regression tests for Windows-style malformed file:// URLs to ensure shard-key derivation doesn’t raise and remains distinct per path.
src/apm_cli/cache/url_normalize.py Catches ValueError from urlparse port/hostname parsing and falls back to raw netloc to keep cache-key derivation deterministic on Windows.

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 1

Comment thread src/apm_cli/cache/url_normalize.py Outdated
Comment on lines +83 to +90
# host:port. Fall back to the raw netloc (lowercased) so cache-key
# derivation stays deterministic and per-URL distinct without raising.
try:
hostname = (parsed.hostname or "").lower()
username = parsed.username or ""
port = parsed.port
except ValueError:
authority = parsed.netloc.lower()
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch -- fixed in 2573439. The fallback now splits userinfo on @, drops everything after : in the user portion, then lowercases, mirroring Step 4 in the happy path. Added test_fallback_strips_password_from_netloc to lock it in.

Address Copilot review feedback on #1446. The except-ValueError
fallback path previously did 'authority = parsed.netloc.lower()',
which retained userinfo passwords (e.g. user:token@host:badport)
and violated documented Step 4 (strip password). That would bake
credentials into cache keys and risk leaking them via any caller
that logs normalize_repo_url() output.

Split on '@', drop everything after ':' in the userinfo, then
lowercase -- mirroring the happy-path behaviour.

Adds regression test test_fallback_strips_password_from_netloc.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit 4430711 into main May 21, 2026
17 checks passed
@danielmeppiel danielmeppiel deleted the danielmeppiel/fix-windows-file-url-port-valueerror branch May 21, 2026 21:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants