
fix: marketplace build respects GITHUB_HOST for GHE repos #1009

Merged: danielmeppiel merged 5 commits into main from fix/1008-marketplace-build-ghe on Apr 28, 2026

@sergio-sisternes-epam (Collaborator) commented Apr 27, 2026

Description

apm marketplace build hardcoded github.com in four places, so GITHUB_HOST had no effect on ref resolution, token lookup, or metadata fetch. This PR threads the existing default_host() / build_https_clone_url() / AuthResolver pattern (already used by apm install) through the marketplace build pipeline, and decouples auth from marketplace generation by reusing existing resolution infrastructure.

Fixes #1008
Related: #1010 (ADO marketplace support -- not covered here; URL parsing accepts ADO forms but downstream resolution still uses GITHUB_HOST)

Changes

Phase A -- Bug fix (commit 11a9d27)

ref_resolver.py -- RefResolver accepts an optional host parameter (defaults to GITHUB_HOST or github.com). Both list_remote_refs() and resolve_ref_sha() use build_https_clone_url() instead of hardcoded github.com.
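A minimal sketch of the host-selection rule, assuming GITHUB_HOST is read from the environment with a github.com fallback. default_host_sketch(), build_https_clone_url_sketch() and RefResolverSketch are hypothetical stand-ins, not the real apm_cli helpers, whose signatures and URL shape (for example, whether a .git suffix is appended) may differ.

```python
import os
from typing import Optional


def default_host_sketch() -> str:
    """Hypothetical stand-in for default_host(): GITHUB_HOST, else github.com."""
    return os.environ.get("GITHUB_HOST") or "github.com"


def build_https_clone_url_sketch(host: str, owner_repo: str) -> str:
    """Hypothetical stand-in: HTTPS clone URL for owner/repo on the given host."""
    return f"https://{host}/{owner_repo}.git"


class RefResolverSketch:
    """Illustrative only: git ls-remote targets the configured host, not github.com."""

    def __init__(self, host: Optional[str] = None) -> None:
        # An explicit host wins; otherwise fall back to the environment default.
        self._host = host or default_host_sketch()

    def remote_url(self, owner_repo: str) -> str:
        return build_https_clone_url_sketch(self._host, owner_repo)
```

With GITHUB_HOST=corp.ghe.com, RefResolverSketch().remote_url("acme/plugins") yields https://corp.ghe.com/acme/plugins.git; passing host= explicitly overrides the environment.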

builder.py -- MarketplaceBuilder stores a normalised host and HostInfo:

  • _resolve_github_token() resolves against the configured host, not "github.com"
  • _fetch_remote_metadata() uses the GitHub REST API for GHES/GHE Cloud (since raw.githubusercontent.com is github.com-only), skips metadata for non-GitHub hosts, and short-circuits tokenless GHE Cloud requests
  • AuthResolver import moved to top of try block to fix a scoping issue when auth_resolver is pre-injected
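The host-kind branching in _fetch_remote_metadata() can be sketched as a pure URL-selection function. This is an illustration that assumes the host kinds, URL shapes and api_base fallback described by the review panel below; HostInfoSketch and metadata_url() are hypothetical names, and the real method also performs the HTTP request and YAML parsing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HostInfoSketch:
    """Hypothetical mirror of HostInfo: host kind plus optional REST API base."""
    kind: str                       # assumed values: "github", "ghe_cloud", "ghes", other
    api_base: Optional[str] = None


def metadata_url(host: str, info: HostInfoSketch, token: Optional[str],
                 source_repo: str, sha: str, file_path: str = "apm.yml") -> Optional[str]:
    """Pick where to fetch apm.yml from, or return None to skip enrichment."""
    if info.kind not in ("github", "ghe_cloud", "ghes"):
        return None  # non-GitHub host: metadata is optional, so skip quietly
    if info.kind == "ghe_cloud" and not token:
        return None  # short-circuit instead of collecting a 401 per package
    if host == "github.com":
        # raw.githubusercontent.com only serves github.com repositories
        return f"https://raw.githubusercontent.com/{source_repo}/{sha}/{file_path}"
    # GHES / GHE Cloud: REST contents endpoint, to be requested with
    # "Accept: application/vnd.github.raw" so the body is the raw file
    api_base = info.api_base or f"https://{host}/api/v3"
    return f"{api_base}/repos/{source_repo}/contents/{file_path}?ref={sha}"
```

Keeping URL selection separate from I/O makes every branch (raw CDN, REST API, skip) testable without network access.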

Phase B -- Resolution decoupling (commit 239064d)

ref_resolver.py -- RefResolver accepts an optional token parameter. When set, git ls-remote uses authenticated URLs (x-access-token), so private GHES repos work without separate git credential setup.
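A sketch of how a token could be threaded into the git ls-remote call, assuming the x-access-token userinfo form and the non-interactive environment variables named in the review panel's flow diagram. ls_remote_invocation() is hypothetical; it only builds the command and environment and does not run git.

```python
import os
from typing import Dict, List, Optional, Tuple


def ls_remote_invocation(host: str, owner_repo: str,
                         token: Optional[str] = None) -> Tuple[List[str], Dict[str, str]]:
    """Build (argv, env) for git ls-remote without executing it."""
    # With a token, embed it as HTTPS userinfo so no credential helper is needed.
    userinfo = f"x-access-token:{token}@" if token else ""
    url = f"https://{userinfo}{host}/{owner_repo}.git"
    env = dict(os.environ)
    # Non-interactive settings, as listed in the review panel's flow diagram.
    env["GIT_TERMINAL_PROMPT"] = "0"
    env["GIT_ASKPASS"] = "echo"
    return ["git", "ls-remote", url], env
```

The pair can be passed to subprocess.run(argv, env=env, capture_output=True, text=True). Because argv now carries the token, it must never be logged verbatim, and stderr needs redaction before surfacing in errors.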

builder.py -- Extracted lazy _ensure_auth() method with _auth_resolved sentinel for true idempotency (including offline mode). Called from _get_resolver() so both resolve() and build() benefit from authenticated git ls-remote. Resolver is eagerly initialised before the thread pool to prevent a race condition. Fixed _host_info type annotation (Optional["HostInfo"] with TYPE_CHECKING guard).
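The lazy-auth and eager-resolver ordering can be sketched with a toy builder. BuilderSketch is hypothetical: token_source stands in for AuthResolver.resolve(host), and a string stands in for the RefResolver instance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


class BuilderSketch:
    """Illustrative: idempotent lazy auth plus resolver creation before the pool."""

    def __init__(self, token_source: Callable[[], Optional[str]]) -> None:
        self._token_source = token_source    # stand-in for AuthResolver.resolve(host)
        self._github_token: Optional[str] = None
        self._auth_resolved = False          # sentinel: True even when no token exists
        self._resolver: Optional[str] = None

    def _ensure_auth(self) -> None:
        if self._auth_resolved:
            return
        self._github_token = self._token_source()
        self._auth_resolved = True

    def _get_resolver(self) -> str:
        if self._resolver is None:
            self._ensure_auth()
            # stand-in for RefResolver(host=..., token=self._github_token)
            self._resolver = f"resolver(token={'set' if self._github_token else 'none'})"
        return self._resolver

    def resolve(self, packages: List[str]) -> List[str]:
        # Create the resolver on the main thread: the unlocked check-then-set in
        # _get_resolver() would race if worker threads called it first.
        resolver = self._get_resolver()
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(lambda pkg: f"{pkg}@{resolver}", packages))
```

Workers only read the already-built resolver, so no lock is required.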

resolver.py -- _resolve_url_source() now delegates to DependencyReference.parse() instead of hardcoding github.com prefix matching. This reuses the existing resolution infrastructure (as suggested by @danielmeppiel) and gives marketplace type: url sources broader URL form acceptance. Note: the URL's host is not preserved -- downstream resolution uses the configured GITHUB_HOST. True cross-host resolution is tracked in #1010.
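The host-dropping behaviour can be illustrated with a deliberately small normaliser. normalise_url_source() is hypothetical and is not DependencyReference.parse(): it only accepts plain two-segment owner/repo paths (no GitLab subgroups, no ADO _git URLs) and assumes a #ref suffix carries the ref.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse

# scp-like SSH form, e.g. git@host:owner/repo.git
_SCP_LIKE = re.compile(r"^[\w.-]+@[\w.-]+:(?P<path>.+)$")


def normalise_url_source(url: str) -> Tuple[str, Optional[str]]:
    """Reduce a Git URL to (owner/repo, ref). The URL's host is discarded."""
    url, _, ref = url.partition("#")
    match = _SCP_LIKE.match(url)
    path = match.group("path") if match else urlparse(url).path
    path = path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    parts = path.split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected owner/repo in {url!r}")
    return "/".join(parts), (ref or None)
```

URLs on github.com, gitlab.com, bitbucket.org and corp.ghe.com all collapse to the same owner/repo, which is why downstream resolution can only use the configured GITHUB_HOST.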

Review feedback (commits 51a1760, 94fe220)

Addressed findings from both the APM Review Panel and Copilot code review:

  • Added _auth_resolved sentinel to _ensure_auth() for true idempotency (panel + Copilot)
  • Fixed offline branch to also set sentinel (Copilot)
  • Clarified _resolve_url_source() docstring: host is not preserved (panel)
  • Added test documenting host-is-ignored behaviour for non-GitHub URLs (panel)
  • Split CHANGELOG entry into two bullets; reworded to clarify GITHUB_HOST drives resolution (panel + Copilot)
  • Updated marketplace-authoring docs to warn against cross-host URL reliance (Copilot)

Tests and docs

  • 22 new tests covering GHE host resolution, metadata fetch paths, token injection, lazy auth, cross-source URL resolution, and host-is-ignored behaviour
  • Updated CHANGELOG, marketplace-authoring guide, and apm-usage skill resource

Type of change

  • Bug fix
  • New feature (cross-source URL parsing in marketplace)
  • Documentation
  • Maintenance / refactor (auth decoupling)

Testing

  • Tested locally
  • All existing tests pass (6,649 passed)
  • Added tests for new functionality (22 new tests)

Thread the existing default_host() / build_https_clone_url() / AuthResolver
pattern (used by apm install) through the marketplace build pipeline.

Changes:
- RefResolver: accept optional host parameter, use build_https_clone_url()
  instead of hardcoded github.com for git ls-remote URLs
- MarketplaceBuilder: resolve tokens against configured host, use REST API
  for metadata fetch on GHES/GHE Cloud (raw.githubusercontent.com is
  github.com-only), skip metadata for non-GitHub hosts
- Fix AuthResolver import scoping so classify_host() works when
  auth_resolver is pre-injected
- Add GHE Cloud early-exit when no token (avoids pointless 401)

Tests:
- Update URL assertions to use urlparse (test convention)
- Add 4 RefResolver GHE host tests
- Add 3 metadata fetch path tests (GHES REST API, non-GitHub skip,
  GHE Cloud no-token skip)
- Add builder host env test

Docs:
- CHANGELOG: Fixed entry under [Unreleased]
- marketplace-authoring guide: GHES section
- apm-usage authentication skill: marketplace build example

Closes #1008

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sergio-sisternes-epam added the panel-review label (triggers the apm-review-panel gh-aw workflow) Apr 27, 2026
@github-actions

APM Review Panel Verdict

Disposition: APPROVE (two minor pre-merge suggestions; neither is a blocker)


Per-persona findings

Python Architect:

This is a routine bug-fix PR: two existing classes (MarketplaceBuilder, RefResolver) receive a host parameter; there are no new abstractions and no hierarchy changes. One class diagram and one flow diagram apply.

OO / class diagram

classDiagram
    direction LR
    class MarketplaceBuilder {
        <<Builder>>
        +_host: str
        +_host_info: Optional[object]
        +_github_token: Optional[str]
        +_get_resolver() RefResolver
        +_resolve_github_token() Optional[str]
        +_fetch_remote_metadata(pkg) Optional[Dict]
        +build() BuildResult
    }
    class RefResolver {
        <<Service>>
        +_host: str
        +list_remote_refs(owner_repo) List[RemoteRef]
        +resolve_ref_sha(owner_repo, ref) str
    }
    class AuthResolver {
        <<Strategy>>
        +classify_host(host) HostInfo
        +resolve(host) AuthContext
    }
    class HostInfo {
        <<ValueObject>>
        +kind: str
        +api_base: str
    }
    class AuthContext {
        <<ValueObject>>
        +token: str
        +source: str
    }
    class default_host {
        <<Pure>>
        +default_host() str
        +build_https_clone_url(host, repo) str
    }
    class ResolvedPackage {
        <<ValueObject>>
        +source_repo: str
        +sha: str
        +subdir: Optional[str]
    }
    MarketplaceBuilder *-- RefResolver : creates lazily
    MarketplaceBuilder ..> AuthResolver : classify_host and resolve
    MarketplaceBuilder ..> HostInfo : stores as _host_info
    MarketplaceBuilder ..> ResolvedPackage : reads in _fetch_remote_metadata
    MarketplaceBuilder ..> default_host : reads host at init
    RefResolver ..> default_host : reads host at init
    AuthResolver ..> HostInfo : returns
    AuthResolver ..> AuthContext : returns
    class MarketplaceBuilder:::touched
    class RefResolver:::touched
    classDef touched fill:#fff3b0,stroke:#d47600

Execution flow diagram

flowchart TD
    A["apm marketplace build\ncli.py"] --> B["MarketplaceBuilder.__init__()\nbuilder.py\n_host = default_host() or 'github.com'"]
    B --> C["_prefetch_metadata(resolved)\nbuilder.py:589"]
    C --> D["_resolve_github_token()\nbuilder.py:547\nsets _host_info AND _github_token"]
    D --> E["[NET] AuthResolver.classify_host(self._host)\nsrc/apm_cli/core/auth.py:134\nreturns HostInfo(kind, api_base)"]
    D --> F["[NET] resolver.resolve(self._host)\nreturns AuthContext.token"]
    F --> G["pool.submit(_fetch_remote_metadata, pkg)\nbuilder.py:553\nfor each resolved package"]
    G --> H{"host_kind?"}
    H -->|"not github/ghe_cloud/ghes"| I["logger.debug skip\nreturn None"]
    H -->|"ghe_cloud and no token"| J["logger.debug skip\nreturn None"]
    H -->|"self._host == 'github.com'"| K["[NET] raw.githubusercontent.com\n/{source_repo}/{sha}/{path}/apm.yml\nurllib.request.urlopen"]
    H -->|"ghes or ghe_cloud+token"| L["[NET] {api_base}/repos/{source_repo}\n/contents/{file}?ref={sha}\nAccept: application/vnd.github.raw\nurllib.request.urlopen"]
    K --> M["yaml.safe_load(raw) -> dict"]
    L --> M
    M --> N["return metadata dict"]

Design patterns

  • Used in this PR: Lazy initialization -- _host_info is populated as a side effect of _resolve_github_token() (called once before the thread pool in _prefetch_metadata()). _get_resolver() already used this pattern; _host_info extends it consistently.
  • Pragmatic suggestion: Move AuthResolver.classify_host(self._host) into __init__() (directly after self._host is set) rather than as a side effect of _resolve_github_token(). classify_host is a pure, cheap operation and its placement inside a method named "resolve token" is surprising. No new abstraction needed -- one line moved up. This eliminates the Optional[object] None-guard in _fetch_remote_metadata and makes the class invariant explicit: _host_info is always populated after __init__.
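The suggested invariant can be sketched as follows. classify_host_sketch() and its rules (the .ghe.com suffix check and the list of known non-GitHub hosts) are hypothetical; the real AuthResolver.classify_host() logic is not shown in this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical list; the real classifier's rules are not shown in this PR.
_KNOWN_NON_GITHUB = {"gitlab.com", "bitbucket.org", "dev.azure.com"}


@dataclass(frozen=True)
class HostInfoSketch:
    kind: str
    api_base: Optional[str] = None


def classify_host_sketch(host: str) -> HostInfoSketch:
    """Pure and cheap: depends on the host string alone, so safe to call in __init__."""
    if host == "github.com":
        return HostInfoSketch("github", "https://api.github.com")
    if host.endswith(".ghe.com"):
        return HostInfoSketch("ghe_cloud")
    if host in _KNOWN_NON_GITHUB:
        return HostInfoSketch("generic")
    return HostInfoSketch("ghes", f"https://{host}/api/v3")


class BuilderSketch:
    def __init__(self, host: str) -> None:
        self._host = host
        # Classified once here, so _host_info is never None afterwards.
        self._host_info: HostInfoSketch = classify_host_sketch(host)

    def host_kind(self) -> str:
        # No "if self._host_info else 'github'" fallback is needed.
        return self._host_info.kind
```

Call order no longer matters: any method can read _host_info immediately after construction.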

CLI Logging Expert: All new log calls use logger.debug() at library layer -- correct. No _rich_* or CommandLogger calls introduced in builder or resolver. The two new debug messages ("Skipping metadata fetch for %s (non-GitHub host: %s)" and "Skipping metadata fetch for %s (GHE Cloud requires auth)") follow the "named thing, reason" style and pass the verbose-mode "So What?" test. No user-visible output changes. No issues.


DevX UX Expert: This is a silent behavior fix -- no new flags, no command surface changes. The key UX property preserved: GITHUB_HOST set once, all apm commands obey it. Previously apm install respected it but apm marketplace build did not; now the mental model is consistent. The marketplace-authoring.md addition is clean: 2-line runnable example, cross-link to the authentication docs. The apm-usage/authentication.md skill update is correctly scoped (one line showing apm marketplace build uses the same env convention). No new flags to document in cli-commands.md (no surface change). No blocking issues.


Supply Chain Security Expert: Reviewed against the threat model:

  • Identity: Metadata URLs use pkg.source_repo and pkg.sha from the already-resolved ResolvedPackage (sourced from the lockfile). The SHA-pinning integrity model is untouched.
  • Integrity: apm.yml metadata is informational enrichment only; it does not affect install integrity decisions.
  • Token scope: resolver.resolve(self._host) routes through AuthResolver for the configured host -- correct. Token appears only in the Authorization header, not in any URL or log line.
  • Fail closed: Non-GitHub hosts return None (skip); GHE Cloud without a token returns None (skip). Both fail closed without error -- appropriate since metadata is optional enrichment.
  • api_base fallback construction: f"https://{self._host}/api/v3" when api_base is not set. self._host comes from os.environ.get("GITHUB_HOST", "github.com"). This is not a path traversal vector (network, not filesystem). No path security guard is needed here.
  • One observation: the _host_info is None fallback in _fetch_remote_metadata defaults host_kind to "github", meaning a non-GitHub host could reach the URL-construction branch if called out of sequence. In the production path this cannot happen (the call order is guaranteed via _prefetch_metadata). Moving classify_host to __init__ (per the Python Architect suggestion) would make this invariant structural rather than relying on call order.

No new supply-chain surface opened.


Auth Expert: Activated -- the PR changes resolver.resolve("github.com") to resolver.resolve(self._host) and introduces AuthResolver.classify_host(self._host).

  • Token resolution: Correct fix. Previously token resolution always happened against github.com even with GITHUB_HOST=corp.ghe.com. Now it routes through the full precedence chain (GITHUB_APM_PAT_{ORG} -> GITHUB_APM_PAT -> GITHUB_TOKEN -> GH_TOKEN -> git credential fill) for the configured host, consistent with resolve_for_dep().
  • AuthResolver import scoping: The lazy from ..core.auth import AuthResolver was moved to the top of the try block. This is the correct fix for the scoping issue when auth_resolver is pre-injected -- classify_host() can now be called regardless of whether a custom resolver was provided.
  • Thread safety: _host_info is set with if self._host_info is None: guard. Since _resolve_github_token() is called from the main thread before executor.submit(), there is no TOCTOU risk in the current implementation.
  • One concern (minor): self._host_info: Optional[object] should be Optional["HostInfo"] (with a TYPE_CHECKING import). The weak object type means type checkers cannot catch incorrect attribute access on this field. Suggested fix: add from typing import TYPE_CHECKING block with if TYPE_CHECKING: from ..core.auth import HostInfo and annotate _host_info: Optional["HostInfo"].
  • AuthResolver precedence invariant: Unchanged. The precedence diagram in docs/getting-started/authentication.md does not need updating -- the PR adds a call site, not a new strategy.
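The precedence chain the Auth Expert describes can be sketched as an ordered lookup. resolve_token_sketch() is hypothetical: the normalisation of the org into the GITHUB_APM_PAT_{ORG} suffix (upper-case, "-" to "_") is an assumption, and the real AuthResolver may apply host-specific rules this sketch omits.

```python
import os
import subprocess
from typing import Mapping, Optional, Tuple


def resolve_token_sketch(host: str, org: Optional[str], env: Mapping[str, str],
                         try_git_credential: bool = False) -> Tuple[Optional[str], str]:
    """Return (token, source): first match wins, in the order named in the review."""
    names = []
    if org:
        # Assumed normalisation of the org into an env-var suffix.
        names.append("GITHUB_APM_PAT_" + org.upper().replace("-", "_"))
    names += ["GITHUB_APM_PAT", "GITHUB_TOKEN", "GH_TOKEN"]
    for name in names:
        if env.get(name):
            return env[name], name
    if try_git_credential:
        # Last resort: ask git's configured credential helpers for this host.
        try:
            proc = subprocess.run(
                ["git", "credential", "fill"],
                input=f"protocol=https\nhost={host}\n\n",
                capture_output=True, text=True, timeout=10,
                env={**os.environ, "GIT_TERMINAL_PROMPT": "0"},
            )
        except (OSError, subprocess.TimeoutExpired):
            proc = None
        if proc is not None and proc.returncode == 0:
            for line in proc.stdout.splitlines():
                if line.startswith("password="):
                    return line[len("password="):], "git credential fill"
    return None, "none"
```

The fix in this PR only changes which host is passed in; the chain itself stays the same.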

OSS Growth Hacker: This fix completes APM's GHES story: apm install already respected GITHUB_HOST; now apm marketplace build does too. For enterprise teams that use APM on GHES, this removes a silent failure mode that was invisible until build time. The CHANGELOG entry is clean and story-shaped. The doc addition gives enterprise users a self-contained 2-line recipe. Side-channel to CEO: once merged, this is worth a release-note beat that frames APM's complete GHE support posture -- "APM now fully supports GitHub Enterprise across install and marketplace build workflows" -- with a concrete GITHUB_HOST example. This is directly on the enterprise conversion surface and should be included in the next v0.10.x or v0.11.0 release announcement.


CEO arbitration

Specialists agree: this is a correct, well-tested, well-documented bug fix from an external contributor. The two minor suggestions (move classify_host to __init__, fix Optional[object] type annotation) are non-blocking quality improvements. Neither changes behavior in the production path -- _prefetch_metadata() guarantees _resolve_github_token() runs before _fetch_remote_metadata() -- but both make the code easier to reason about and type-safe. The PR is in draft; the author should address these before marking ready. No specialist disagreements to arbitrate. The Growth Hacker's release-beat note is filed for the maintainer's release planning (not a merge gate). Disposition: APPROVE pending the two suggestions below.


Required actions before merge

  1. Move classify_host to __init__ (src/apm_cli/marketplace/builder.py): After self._host: str = default_host() or "github.com" (line ~154), add self._host_info = AuthResolver.classify_host(self._host) (requires the lazy import to become an eager import, or use TYPE_CHECKING for the type hint and keep the lazy import). Remove the if self._host_info is None: guard in _resolve_github_token() and the if self._host_info else "github" fallback in _fetch_remote_metadata(). This makes the class invariant explicit and eliminates silent fallback behavior.

  2. Fix _host_info type annotation (src/apm_cli/marketplace/builder.py, line ~155): Change self._host_info: Optional[object] = None to self._host_info: Optional["HostInfo"] = None with a TYPE_CHECKING guard: if TYPE_CHECKING: from ..core.auth import HostInfo. This is a one-line change and ensures type checkers can validate _host_info.kind and _host_info.api_base access.
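Action 2 looks roughly like this. The absolute import path apm_cli.core.auth is inferred from the relative ..core.auth import in src/apm_cli/marketplace/builder.py; MarketplaceBuilderSketch is a hypothetical stand-in.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Evaluated by type checkers only: no runtime import, no import cycle.
    from apm_cli.core.auth import HostInfo


class MarketplaceBuilderSketch:
    def __init__(self) -> None:
        # The quoted forward reference lets a type checker validate
        # self._host_info.kind and self._host_info.api_base accesses.
        self._host_info: Optional["HostInfo"] = None
```

At runtime TYPE_CHECKING is False, so the module imports cleanly even when apm_cli is not installed.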


Optional follow-ups

  • Once merged, include a release-note beat for the next release that tells the complete GHE support story across apm install and apm marketplace build -- the Growth Hacker flags this as a concrete enterprise conversion surface.
  • Future: if a third host-routing branch is added to _fetch_remote_metadata (e.g., Bitbucket Server), consider a small HostMetadataStrategy abstraction; at 3 branches inline is still appropriate.
  • The _resolve_github_token() method now does two things (token resolution + host classification). If it grows further, consider splitting into _init_host_info() and _resolve_github_token(). Not needed now.

Generated by PR Review Panel for issue #1009

…yReference for URL sources

Phase B of #1008 -- decouples authentication from marketplace generation
and reuses existing resolution infrastructure for cross-source compatibility.

Changes:
- RefResolver: accept optional token for authenticated git ls-remote
- Builder: extract lazy _ensure_auth() called from _get_resolver() so
  both resolve() and build() benefit from authenticated ls-remote
- Builder: eagerly init resolver before thread pool (race prevention)
- Builder: fix _host_info type annotation (Optional["HostInfo"] with
  TYPE_CHECKING guard)
- resolver.py: _resolve_url_source() now delegates to
  DependencyReference.parse() -- accepts any valid Git URL (GitHub,
  GHES, GitLab, Bitbucket, ADO, SSH) instead of github.com only
- 13 new tests covering token injection, lazy auth, and cross-source
  URL resolution

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
sergio-sisternes-epam removed and re-added the panel-review label (triggers the apm-review-panel gh-aw workflow) Apr 28, 2026
@github-actions

APM Review Panel Verdict

Disposition: APPROVE (with one recommended docstring fix and one follow-up on URL-source semantics)


Per-persona findings

Python Architect:

This is a focused, well-scoped fix. Two commits: one that threads GITHUB_HOST through the marketplace build pipeline, and one that extracts lazy auth and delegates URL-source parsing to DependencyReference.parse(). Neither constitutes a major architectural shift (no new base classes, protocols, or registries), so two diagrams apply.

1. OO / class diagram

classDiagram
    direction LR

    class MarketplaceBuilder {
        <<Builder>>
        -_host str
        -_host_info HostInfo
        -_github_token Optional[str]
        -_resolver Optional[RefResolver]
        -_auth_resolver Optional[AuthResolver]
        +_ensure_auth() None
        +_get_resolver() RefResolver
        +_resolve_github_token() Optional[str]
        +_fetch_remote_metadata(pkg) Optional[dict]
        +resolve() ResolveResult
        +build() BuildReport
    }

    class RefResolver {
        <<IOBoundary>>
        -_host str
        -_token Optional[str]
        -_cache RefCache
        -_lock threading.Lock
        +list_remote_refs(owner_repo) List[RemoteRef]
        +resolve_ref_sha(owner_repo, ref) str
    }

    class AuthResolver {
        <<Strategy>>
        +resolve(host) AuthContext
        +classify_host(host) HostInfo
    }

    class HostInfo {
        <<ValueObject>>
        +kind str
        +api_base Optional[str]
    }

    class AuthContext {
        <<ValueObject>>
        +token Optional[str]
        +source str
    }

    class DependencyReference {
        <<ValueObject>>
        +repo_url str
        +reference Optional[str]
        +is_local bool
        +parse(url) DependencyReference
    }

    class _resolve_url_source {
        <<Pure>>
        accepts any Git URL via DependencyReference
    }

    MarketplaceBuilder *-- RefResolver : lazily creates
    MarketplaceBuilder ..> AuthResolver : calls resolve(host)
    MarketplaceBuilder ..> HostInfo : reads kind, api_base
    AuthResolver ..> HostInfo : returns from classify_host()
    AuthResolver ..> AuthContext : returns from resolve()
    _resolve_url_source ..> DependencyReference : delegates parse()
    RefResolver ..> AuthContext : uses token field

    class MarketplaceBuilder:::touched
    class RefResolver:::touched
    class _resolve_url_source:::touched
    classDef touched fill:#fff3b0,stroke:#d47600

2. Execution flow diagram

flowchart TD
    A["apm marketplace build\n(cli.py)"] --> B["MarketplaceBuilder.build()"]
    B --> C["MarketplaceBuilder.resolve()"]
    C --> D["_get_resolver()\n[TOUCHED]"]
    D --> E["_ensure_auth()\n[TOUCHED]"]
    E --> F{"_github_token\nalready set?"}
    F -- yes --> G["return early"]
    F -- no --> H["_resolve_github_token()"]
    H --> I["[NET] AuthResolver.classify_host(host)\nset _host_info"]
    I --> J["[NET] AuthResolver.resolve(host)\ntoken precedence chain"]
    J --> K["self._github_token = token or None"]
    K --> L["RefResolver(host=, token=)\n[TOUCHED]"]
    L --> M["ThreadPoolExecutor spawned\nfor each package entry"]
    M --> N["RefResolver.list_remote_refs(owner_repo)"]
    N --> O["build_https_clone_url(host, owner_repo, token=)\nproduces x-access-token URL or plain URL"]
    O --> P["[EXEC] subprocess: git ls-remote\nGIT_TERMINAL_PROMPT=0, GIT_ASKPASS=echo"]
    P --> Q["ref + sha resolved"]
    B --> R["_prefetch_metadata(resolved)"]
    R --> S["_ensure_auth() again\n(idempotent if token set;\nre-calls _resolve_github_token if None)"]
    S --> T{"host_kind?"}
    T -- github.com --> U["[NET] urllib: raw.githubusercontent.com"]
    T -- ghes / ghe_cloud --> V["[NET] urllib: {api_base}/repos/.../contents/...\nAccept: application/vnd.github.raw"]
    T -- generic --> W["skip metadata enrichment"]
    T -- ghe_cloud + no token --> X["skip metadata enrichment"]

Design patterns

  • Used in this PR: Lazy Initialization -- _ensure_auth() + _get_resolver() each guard with "already set?" short-circuits; race prevented by pre-calling _get_resolver() before ThreadPoolExecutor in resolve(). Adapter -- _resolve_url_source() delegates to DependencyReference.parse(), normalising any Git URL to owner/repo[#ref] (visible as <<Pure>> + <<ValueObject>> in diagram).
  • Pragmatic suggestion: none -- the current shape is the simplest correct design at this scope. A sentinel value (_UNRESOLVED = object()) could make _ensure_auth() fully idempotent in the no-token case, but it adds complexity that is not justified until there is evidence of repeated AuthResolver.resolve() calls in hot paths.

Minor architecture note: _ensure_auth() docstring says "Short-circuits when already resolved" but when no token is available, self._github_token remains None so the method re-invokes _resolve_github_token() on every subsequent call. In practice both call sites (inside _get_resolver() and _prefetch_metadata()) are pre-pool so the redundancy is harmless, but the docstring promises more than is delivered. Fixing the docstring (or the guard) is the one recommended action below.
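The gap is easy to reproduce with a toy class that copies the pre-fix guard. NoneGuardAuthSketch is hypothetical; it counts how often the underlying token resolution would run.

```python
from typing import Callable, Optional


class NoneGuardAuthSketch:
    """Mimics the pre-fix guard: short-circuits only on a non-None token."""

    def __init__(self, source: Callable[[], Optional[str]]) -> None:
        self._source = source  # stand-in for _resolve_github_token()
        self._github_token: Optional[str] = None
        self.resolve_calls = 0

    def ensure_auth(self) -> None:
        if self._github_token is not None:
            return
        self.resolve_calls += 1
        self._github_token = self._source()


no_creds = NoneGuardAuthSketch(lambda: None)
with_token = NoneGuardAuthSketch(lambda: "tok")
for _ in range(3):
    no_creds.ensure_auth()
    with_token.ensure_auth()

print(no_creds.resolve_calls, with_token.resolve_calls)  # 3 1
```

A separate boolean flag set after the first attempt, such as the _auth_resolved sentinel in the required action, brings the no-credentials count down to 1 as well.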


CLI Logging Expert: No concerns. All new diagnostic output routes through logger.debug() -- never through _rich_* helpers directly. Token values are stripped from stderr via _redact_token() before appearing in GitLsRemoteError.hint. The GHE-skip debug messages follow the "name the thing" rule ("Skipping metadata fetch for {pkg.name} (non-GitHub host: {host})"). Nothing in this PR changes the user-visible output path; the happy-path surface is identical before and after.


DevX UX Expert: Transparent to the user. A marketplace author on a GHE instance sets GITHUB_HOST (the same env var they already set for apm install) and apm marketplace build now just works. No new flags, no new error messages, no command-surface change. The GHES section added to marketplace-authoring.md is appropriately concise. The type: url acceptance expansion quietly removes a friction point for authors using mixed-host environments. No UX regressions.


Supply Chain Security Expert: Three surfaces reviewed.

  1. Token in subprocess URL: build_https_clone_url(..., token=token) embeds the PAT as x-access-token:{token}@host in the URL passed to git ls-remote. The docstring on build_https_clone_url warns callers not to log the raw URL, and _redact_token() in _git_utils.py redacts the embedded token from all stderr before it reaches error messages. GIT_TERMINAL_PROMPT=0 and GIT_ASKPASS=echo are set. No credential leakage path identified.

  2. DependencyReference.parse() scope expansion: _resolve_url_source() previously rejected non-GitHub URLs (raised ValueError). It now accepts any URL DependencyReference.parse() can handle (GitLab, Bitbucket, ADO, SSH). The resulting owner/repo is then resolved against the configured GitHub host (self._host), not the URL's own host. This means a GitLab URL silently resolves to owner/repo against github.com -- the expansion does not introduce a new network destination (the configured host is always used). The is_local guard prevents path traversal via ./ or ../ sources. No new attack surface identified; the semantic gap (accepting a GitLab URL but hitting GitHub) is a UX confusion issue, not a security one.

  3. Thread safety of auth state: _github_token is set before thread pool creation and read-only inside workers. _host_info is set inside _resolve_github_token() which is called before the pool. No lock needed and none is missing. Safe.
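A stderr redaction step of this kind can be sketched with one regular expression. redact_token_sketch() and its *** placeholder are hypothetical; the exact pattern and replacement used by _redact_token() in _git_utils.py are not shown in this PR.

```python
import re

# Matches the userinfo part of an http(s) URL, e.g. "x-access-token:<pat>@".
_USERINFO_IN_URL = re.compile(r"(https?://)[^/@\s]+@")


def redact_token_sketch(text: str) -> str:
    """Replace any URL userinfo (which may carry a PAT) with a placeholder."""
    return _USERINFO_IN_URL.sub(r"\1***@", text)
```

Applying it to all git stderr before building an error hint means a failing ls-remote cannot echo the token back to the user.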


Auth Expert: Activated (fallback self-check: YES -- the PR changes how tokens are injected into git ls-remote URLs, how AuthResolver.classify_host() is used to determine host kind, and changes resolver.resolve("github.com") to resolver.resolve(self._host), which directly changes credential resolution semantics).

The key fix -- changing from the hardcoded resolver.resolve("github.com") to resolver.resolve(self._host) -- is correct and mirrors the pattern used throughout apm install. Token precedence chain (GITHUB_APM_PAT_{ORG} -> GITHUB_APM_PAT -> GITHUB_TOKEN -> GH_TOKEN -> git credential fill) is fully preserved because the fix routes through AuthResolver.resolve() rather than bypassing it.

The x-access-token format used by build_https_clone_url is compatible with GHES and GHE Cloud. The GHE Cloud + no-token -> skip metadata guard is correct: GHE Cloud has no public API surface and a 401 on every package would be wasteful and noisy.

One idempotency gap (noted by Python Architect): when _resolve_github_token() returns None (no credentials found), _ensure_auth() re-invokes _resolve_github_token() on every subsequent call, because its guard condition (self._github_token is not None) never becomes true. This results in repeated AuthResolver.resolve() calls in the no-auth path. It is harmless (all calls are pre-pool, and AuthResolver.resolve() is cheap for env-var resolution), but the docstring is misleading. Recommend either a sentinel guard or a docstring correction.

No auth precedence regression. No credential leakage path. No new os.getenv() bypasses of AuthResolver.


OSS Growth Hacker: This is an enterprise unlock. Marketplace authoring previously required a github.com-accessible network path; GHE/GHES users were silently blocked. The fix uses the same GITHUB_HOST env var that GHE users already know from apm install, meaning zero new concepts for the target audience.

Side-channel to CEO: The type: url acceptance expansion (GitLab/Bitbucket/ADO URLs now accepted) is a quiet capability increase that may deserve a sentence in the release note -- it reinforces APM's "works with any Git host" positioning. The CHANGELOG entry is dense; consider splitting into two bullets (GHE host fix / URL-source expansion) in the release narrative to maximize two distinct story beats.

The marketplace-authoring.md GHES section is appropriately minimal. No quickstart changes needed; the GITHUB_HOST pattern is already documented in the auth guide.


CEO arbitration

Specialists are in agreement: this is a correct, well-tested fix with no regressions. The core change -- routing AuthResolver.resolve(self._host) instead of the hardcoded resolve("github.com") -- mirrors the pattern used everywhere else in the codebase and closes a gap that blocked enterprise marketplace authoring. The _ensure_auth() idempotency gap is the only item worth resolving before merge; it is a docstring accuracy issue (the method does not short-circuit in the no-token case), not a correctness bug. The DependencyReference URL-source expansion is net-positive: it removed a github.com-only restriction without introducing a new attack surface. The semantic note (GitLab URLs are accepted but resolved against the configured GitHub host) belongs in a follow-up issue, not as a blocker. Ratified: APPROVE with one pre-merge fix.


Required actions before merge

  1. _ensure_auth() docstring accuracy (src/apm_cli/marketplace/builder.py, _ensure_auth method): The docstring says "Short-circuits when already resolved" but the guard if self._github_token is not None: return does not short-circuit when token resolution returned None (no credentials found). Either update the docstring to "Short-circuits when token is already set to a non-None value" or add a sentinel to make the method truly idempotent. The sentinel approach:
    # In __init__:
    self._auth_resolved: bool = False
    # In _ensure_auth:
    if self._auth_resolved:
        return
    ...
    self._auth_resolved = True
    Either fix is acceptable; the docstring-only fix is lower risk.

Optional follow-ups

  • URL-source host semantics (src/apm_cli/marketplace/resolver.py): _resolve_url_source() now accepts GitLab/Bitbucket/ADO URLs and normalises them to owner/repo, but the downstream RefResolver resolves that owner/repo against the configured GitHub host -- not the URL's own host. The docstring currently says "any valid Git URL (GitHub, GHES, GitLab, Bitbucket, ADO, SSH) is accepted" which implies cross-host resolution that is not implemented. A follow-up issue should clarify intended semantics and either document the limitation or implement true cross-host support.
  • test_non_github_url test intent shift: The old test asserted that GitLab URLs raised ValueError; the new test asserts they resolve to "owner/repo". The test comment ("DependencyReference.parse() handles any valid Git host URL") is accurate but worth pairing with a test that documents the host-is-ignored behavior explicitly, so future contributors don't accidentally implement cross-host resolution that breaks the owner/repo assumption.

Generated by PR Review Panel for issue #1009

- Add _auth_resolved sentinel to _ensure_auth() for true idempotency
- Clarify _resolve_url_source() docstring: host is not preserved (#1010)
- Split CHANGELOG #1008 entry into GHE fix + URL-source expansion
- Add test documenting host-is-ignored behaviour for non-GitHub URLs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam (Collaborator, Author)

Review Panel Findings -- Addressed

All findings from both panel reviews have been addressed in 51a17603:

Required fixes

| Finding | Status | Details |
|---|---|---|
| _ensure_auth() idempotency gap | Fixed | Added _auth_resolved: bool sentinel -- method now short-circuits even when no token was found. Docstring updated to match. |
| _host_info: Optional[object] type | Already done | Fixed in Phase B commit (239064d1) with TYPE_CHECKING guard. |

Optional fixes (also implemented)

| Finding | Status | Details |
|---|---|---|
| _resolve_url_source() docstring overpromises | Fixed | Clarified that URL host is not preserved; downstream uses configured GITHUB_HOST. Cross-ref to #1010. |
| Test for host-is-ignored behaviour | Added | test_url_host_is_not_preserved_in_output -- 4 hosts (github.com, gitlab.com, bitbucket.org, corp.ghe.com) all resolve to the same owner/repo. |
| CHANGELOG bullet split | Done | Split into two distinct story beats: GHE host fix + URL-source expansion. |

Validation

  • 6,649 unit tests pass (16.3s)
  • ASCII compliance verified on all changed lines

sergio-sisternes-epam marked this pull request as ready for review April 28, 2026 10:12
Copilot AI review requested due to automatic review settings April 28, 2026 10:12
Copilot AI (Contributor) left a comment


Pull request overview

Fixes apm marketplace build to respect GITHUB_HOST (consistent with the install/auth infrastructure) so GHES/GHE Cloud repos can be resolved/authenticated correctly, and expands marketplace type: url parsing by delegating to DependencyReference.parse().

Changes:

  • Thread default_host() / build_https_clone_url() + host/token handling through marketplace ref resolution (git ls-remote) and metadata fetching.
  • Add lazy auth resolution (_ensure_auth) so both resolve() and build() use authenticated ref resolution when available.
  • Update docs/changelog and add unit tests for GHE host behavior and URL parsing.
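The host and token threading above might be sketched as follows. The sketch uses a hypothetical pick_host() that prefers an explicit host, then GITHUB_HOST, then github.com. Two further hypothetical helpers assemble the git ls-remote call with the x-access-token URL form and the non-interactive environment (GIT_TERMINAL_PROMPT=0, GIT_ASKPASS=echo) described in the review. The helper names and the lower-casing normalisation are assumptions; these are not APM's default_host() or build_https_clone_url().

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple


def pick_host(explicit: Optional[str] = None,
              env: Mapping[str, str] = os.environ) -> str:
    """Explicit host first, then GITHUB_HOST, then github.com (assumed order)."""
    return (explicit or env.get("GITHUB_HOST") or "github.com").strip().lower()


def https_clone_url(host: str, owner_repo: str, token: Optional[str] = None) -> str:
    """Build an HTTPS clone URL; the token-bearing form must never be logged."""
    userinfo = f"x-access-token:{token}@" if token else ""
    return f"https://{userinfo}{host}/{owner_repo}.git"


def ls_remote_command(host: str, owner_repo: str, token: Optional[str] = None
                      ) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for a non-interactive git ls-remote against host."""
    env = dict(os.environ)
    env["GIT_TERMINAL_PROMPT"] = "0"  # fail instead of prompting on a terminal
    env["GIT_ASKPASS"] = "echo"       # non-interactive askpass, so prompts cannot hang
    argv = ["git", "ls-remote", "--tags", "--heads",
            https_clone_url(host, owner_repo, token)]
    return argv, env
```

The pair could then be passed to subprocess.run(argv, env=env, capture_output=True, text=True). Because argv carries the token, any stderr should be redacted before it is logged.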

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/apm_cli/marketplace/ref_resolver.py Adds host + optional token support; builds git ls-remote URLs via shared host utilities.
src/apm_cli/marketplace/builder.py Uses default host, lazy auth resolution, and host-aware metadata fetch (raw CDN vs REST API).
src/apm_cli/marketplace/resolver.py Delegates type: url resolution to DependencyReference.parse() instead of github.com-only matching.
tests/unit/marketplace/test_ref_resolver.py Adds URL parsing assertions + GHE host/token-in-URL coverage.
tests/unit/marketplace/test_marketplace_resolver.py Adds tests for URL-source parsing behavior (including host stripping).
tests/unit/marketplace/test_builder.py Adds tests for host-kind branching in _fetch_remote_metadata() and _ensure_auth() behavior.
tests/unit/commands/test_marketplace_build.py Confirms GITHUB_HOST is respected by MarketplaceBuilder.
docs/src/content/docs/guides/marketplace-authoring.md Documents GHES usage for marketplace build.
packages/apm-guide/.apm/skills/apm-usage/authentication.md Updates auth guide to mention marketplace build respects GITHUB_HOST.
CHANGELOG.md Adds Unreleased Fixed entries for GHE host support and URL parsing behavior.

- Fix _ensure_auth() offline branch to set _auth_resolved sentinel
- Clarify CHANGELOG and docs: URL host is not preserved, GITHUB_HOST required
- Update marketplace-authoring.md to warn against cross-host URL reliance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel added panel-review Trigger the apm-review-panel gh-aw workflow and removed panel-review Trigger the apm-review-panel gh-aw workflow labels Apr 28, 2026
@github-actions

APM Review Panel Verdict

Disposition: APPROVE (with optional follow-ups noted below)


Per-persona findings

Python Architect:

The PR is a well-scoped bug fix that threads host awareness through the marketplace build pipeline. Three files in the problem space are modified: MarketplaceBuilder (builder.py), RefResolver (ref_resolver.py), and _resolve_url_source (resolver.py). The changes are internally consistent and proportional to the problem.

1. OO / Class Diagram

classDiagram
    direction LR
    class MarketplaceBuilder {
        <<Service>>
        -_host str
        -_host_info Optional[HostInfo]
        -_auth_resolved bool
        -_github_token Optional[str]
        -_resolver Optional[RefResolver]
        +build() MarketplaceOutput
        +resolve() list
        -_ensure_auth() None
        -_get_resolver() RefResolver
        -_resolve_github_token() Optional[str]
        -_fetch_remote_metadata(pkg) Optional[dict]
    }
    class RefResolver {
        <<Service>>
        -_host str
        -_token Optional[str]
        -_cache RefCache
        -_lock Lock
        +list_remote_refs(owner_repo) List[RemoteRef]
        +resolve_ref_sha(owner_repo, ref) str
    }
    class AuthResolver {
        <<Strategy>>
        +classify_host(host) HostInfo
        +resolve(host) AuthContext
    }
    class HostInfo {
        <<ValueObject>>
        +kind str
        +api_base Optional[str]
    }
    class AuthContext {
        <<ValueObject>>
        +token str
        +source str
    }
    class DependencyReference {
        <<ValueObject>>
        +repo_url str
        +reference Optional[str]
        +is_local bool
        +parse(url) DependencyReference
    }
    class github_host_utils {
        <<Module>>
        +default_host() Optional[str]
        +build_https_clone_url(host, repo, token) str
    }
    class resolver_module {
        <<Module>>
        +_resolve_url_source(source) str
    }
    class MarketplaceBuilder:::touched
    class RefResolver:::touched
    class resolver_module:::touched
    MarketplaceBuilder *-- RefResolver : creates and owns
    MarketplaceBuilder ..> AuthResolver : token and host classification
    MarketplaceBuilder ..> HostInfo : branches on kind (github/ghes/ghe_cloud/generic)
    MarketplaceBuilder ..> github_host_utils : default_host()
    RefResolver ..> github_host_utils : build_https_clone_url()
    AuthResolver ..> HostInfo : returns
    AuthResolver ..> AuthContext : returns
    resolver_module ..> DependencyReference : delegates URL parsing
    note for MarketplaceBuilder "Lazy init: _ensure_auth() is idempotent\n_auth_resolved flag prevents re-entry\n_host_info set as side-effect in _resolve_github_token()"
    classDef touched fill:#fff3b0,stroke:#d47600

2. Execution Flow Diagram

flowchart TD
    A["apm marketplace build (cli.py)"] --> B["MarketplaceBuilder.build()"]
    B --> C["_get_resolver() [eager, pre-thread-pool]"]
    C --> D["_ensure_auth()"]
    D --> E{_auth_resolved?}
    E -- yes --> F["return (idempotent)"]
    E -- no --> G{offline mode?}
    G -- yes --> H["_auth_resolved = True, token = None"]
    G -- no --> I["_resolve_github_token()"]
    I --> J["[NET] AuthResolver.classify_host(self._host) -> HostInfo"]
    J --> K["[NET] AuthResolver.resolve(self._host) -> AuthContext"]
    K --> L["self._github_token = ctx.token\nself._host_info = HostInfo\n_auth_resolved = True"]
    L --> M["RefResolver(host=self._host, token=self._token)"]
    M --> N["ThreadPoolExecutor: _resolve_references()"]
    N --> O["[EXEC] RefResolver.list_remote_refs(owner_repo)"]
    O --> P["build_https_clone_url(host, owner_repo, token)"]
    P --> Q["[EXEC] git ls-remote --tags --heads url.git\nGIT_TERMINAL_PROMPT=0, GIT_ASKPASS=echo"]
    B --> R["_prefetch_metadata(resolved)"]
    R --> S["_ensure_auth() (idempotent)"]
    R --> T["ThreadPoolExecutor: _fetch_remote_metadata(pkg)"]
    T --> U{host_info.kind?}
    U -- generic --> V["return None (skip, no HTTP)"]
    U -- ghe_cloud no token --> W["return None (skip, no HTTP)"]
    U -- github.com --> X["[NET] urllib GET raw.githubusercontent.com/repo/sha/apm.yml\nAuthorization: token ..."]
    U -- ghes/ghe_cloud with token --> Y["[NET] urllib GET api_base/repos/repo/contents/apm.yml?ref=sha\nAccept: application/vnd.github.raw\nAuthorization: token ..."]
    X --> Z["yaml.safe_load(raw) -> dict"]
    Y --> Z

3. Design patterns


  • Used in this PR: Lazy init with idempotency flag (_ensure_auth() + _auth_resolved) -- prevents re-entrant auth resolution across _get_resolver() and _prefetch_metadata() call sites; shown as note for MarketplaceBuilder in the class diagram above.
  • Used in this PR: Adapter -- _resolve_url_source() now delegates to DependencyReference.parse() rather than reimplementing URL parsing, eliminating the github.com-only hard-coding.
  • Pragmatic suggestion: The host-dispatch logic in _fetch_remote_metadata() (if/elif on host_kind) could eventually be a small Strategy object (MetadataFetcher per host kind), but only if a third host type with distinct behavior emerges. At current scope, the if/elif is the simplest correct design.

One code smell (not blocking): _host_info is set as a side effect inside _resolve_github_token(). If _resolve_github_token() raises and the exception is swallowed by the surrounding catch-all try/except, _host_info may remain None; _fetch_remote_metadata() defensively falls back to host_kind = "github" in that case, so the failure mode is a graceful degradation to the github.com CDN path rather than a crash. The smell is the implicit coupling, not a correctness bug.
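The host-kind dispatch in _fetch_remote_metadata() could look roughly like the following, including the fallback to "github" when _host_info is None. metadata_request() is hypothetical and only builds the URL and headers; it does not perform the HTTP call. The kind strings come from the class diagram, and the api_base guard is an added assumption.

```python
from typing import Dict, Optional, Tuple


def metadata_request(
    host_kind: Optional[str],
    api_base: Optional[str],
    source_repo: str,
    sha: str,
    subdir: str = "",
    token: Optional[str] = None,
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (url, headers) for fetching apm.yml, or None to skip the fetch."""
    kind = host_kind or "github"  # missing HostInfo degrades to the github.com path
    sub = subdir.strip("/")
    file_path = f"{sub}/apm.yml" if sub else "apm.yml"
    headers: Dict[str, str] = {}
    if token:
        headers["Authorization"] = f"token {token}"

    if kind == "github":
        # raw.githubusercontent.com serves github.com repositories only.
        url = f"https://raw.githubusercontent.com/{source_repo}/{sha}/{file_path}"
        return url, headers
    if kind in ("ghes", "ghe_cloud"):
        if kind == "ghe_cloud" and not token:
            return None  # a tokenless GHE Cloud request would only fail, so skip it
        if not api_base:
            return None  # assumption: without an API base there is nothing to call
        headers["Accept"] = "application/vnd.github.raw"
        base = api_base.rstrip("/")
        return f"{base}/repos/{source_repo}/contents/{file_path}?ref={sha}", headers
    return None  # generic (non-GitHub) host: skip metadata, no HTTP request
```

Keeping this as a single if/elif function matches the panel's view that a per-host Strategy object only pays off once a third host type with distinct behaviour appears.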


CLI Logging Expert: No output path changes. The PR adds only logger.debug() calls with %s-placeholder format strings, consistent with the established pattern. Debug messages correctly use pkg.name and self._host for concrete context ("Skipping metadata fetch for %s (non-GitHub host: %s)"). No _rich_* helpers or CommandLogger phases are touched. Clean.


DevX UX Expert: No CLI surface changes -- no new flags, no new commands, no help text changes. The fix is transparent for github.com users. For GHES users, the pattern is export GITHUB_HOST=corp.ghe.com && apm marketplace build, which is identical to the existing apm install pattern -- zero new mental model required.

The docs addition in marketplace-authoring.md is accurate, concise, and includes a working example. The caveat that type: url host is ignored (with #1010 forward reference) is the right level of honesty -- it prevents users from assuming cross-host resolution works when it does not. The authentication skill resource (packages/apm-guide/.apm/skills/apm-usage/authentication.md) is updated in the same PR per Rule 4.

One note: the cli-commands.md reference doc is not updated, but marketplace build already exists there with no GHE-specific flag -- so no update is required.


Supply Chain Security Expert: No new security surface introduced.

  1. Token in git ls-remote URL: build_https_clone_url() embeds x-access-token:{token}@ -- this is the pre-existing pattern used across APM for git operations. GIT_TERMINAL_PROMPT=0 + GIT_ASKPASS=echo prevent interactive prompt fallbacks. stderr is passed through _redact_token() before any error logging (ref_resolver.py lines 236, 252, 313, 329). The github_host.py docstring explicitly notes "callers must avoid logging raw token-bearing URLs" and this caller does not log the URL.

  2. GHES REST API path construction: {api_base}/repos/{pkg.source_repo}/contents/{file_path}?ref={sha}. source_repo is the owner/repo coordinate from the marketplace.yml, validated upstream. file_path is {subdir}/apm.yml where subdir comes from the resolved package -- not from user URL input. No path traversal surface.

  3. _resolve_url_source host stripping: Non-github.com URLs now parse to owner/repo and resolve against GITHUB_HOST. This is documented and tracked in #1010 ("feat: ADO marketplace support (marketplace.yml with Azure DevOps repos)"). Not a security regression: the owner/repo coordinate goes through the same auth and integrity pipeline as a github-source entry. The risk is user misconfiguration (wrong host), not adversarial injection.

  4. Fail-closed behavior: Non-GitHub host -> metadata skipped (not fetched with no auth). GHE Cloud without token -> metadata skipped. Auth exception -> token None, debug-logged, not raised. These are all correct graceful-degradation paths that fail to "no enrichment" rather than to a security bypass.
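Redaction of token-bearing git output can be sketched with a regular expression over URL userinfo. redact() is a hypothetical stand-in; the signature and exact behaviour of the real _redact_token() in ref_resolver.py are not shown in this thread.

```python
import re
from typing import Optional


def redact(text: str, token: Optional[str] = None) -> str:
    """Hypothetical: mask credentials in git stderr before it is logged."""
    # Replace the userinfo of any https://user:secret@host URL with ***.
    text = re.sub(r"(https?://)[^/@\s]+@", r"\1***@", text)
    if token:
        text = text.replace(token, "***")  # also catch the bare token elsewhere
    return text
```

Masking the whole userinfo, not just the password, also hides the x-access-token marker, which is harmless and keeps the pattern host-agnostic.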


Auth Expert: Activated -- the PR changes token resolution from hardcoded github.com to self._host, uses AuthResolver.classify_host(), and injects the resolved token into RefResolver.

Token resolution chain: AuthResolver.resolve(self._host) follows the documented precedence (GITHUB_APM_PAT_{ORG} -> GITHUB_APM_PAT -> GITHUB_TOKEN -> GH_TOKEN -> git credential fill). Previously this chain was called against "github.com" regardless of GITHUB_HOST, causing GHES tokens to be missed. The fix is correct.
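The environment-variable part of that precedence chain works as a first-match lookup. env_token_for() is hypothetical. The {ORG} suffix normalisation (upper-case, hyphens to underscores) is an assumption about how APM derives the variable name, and the final git credential fill step is omitted.

```python
import os
from typing import List, Mapping, Optional, Tuple


def env_token_for(org: Optional[str], env: Mapping[str, str] = os.environ
                  ) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical: return (token, source_var) using the documented order."""
    names: List[str] = []
    if org:
        # Assumed normalisation of the org suffix.
        names.append("GITHUB_APM_PAT_" + org.upper().replace("-", "_"))
    names += ["GITHUB_APM_PAT", "GITHUB_TOKEN", "GH_TOKEN"]
    for name in names:
        value = env.get(name)
        if value:
            return value, name
    return None, None  # the real chain would continue to git credential fill
```

The fix in this PR does not change the order. It changes which host the chain runs against, so a GHES-scoped credential is now consulted for a GHES build.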

Thread safety: _ensure_auth() is called before ThreadPoolExecutor spawns workers (eagerly via _get_resolver() at line 396, confirmed in the diff). _auth_resolved, _github_token, and _host_info are all set before workers read them. AuthContext is frozen. Consistent with the Auth Expert guidance.
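The ordering described here can be sketched as follows: auth state is written once on the calling thread, and only then does the pool start, so workers only read it. EagerAuthDemo is hypothetical and leaves out the real builder's resolver and metadata logic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


class EagerAuthDemo:
    """Hypothetical: resolve auth once on the main thread, then fan out."""

    def __init__(self, token_source: Callable[[], Optional[str]]) -> None:
        self._token_source = token_source
        self._auth_resolved = False
        self._token: Optional[str] = None
        self.resolutions = 0  # counts how often auth actually ran

    def _ensure_auth(self) -> None:
        if self._auth_resolved:
            return
        self.resolutions += 1
        self._token = self._token_source()
        self._auth_resolved = True

    def _worker(self, repo: str) -> str:
        # Workers only read state that was fully written before the pool started.
        return f"{repo}:{'auth' if self._token else 'anon'}"

    def build(self, repos: List[str]) -> List[str]:
        self._ensure_auth()  # eager, before any worker thread exists
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(self._worker, repos))
```

Resolving before submit() means no lock is needed around the auth fields. If resolution were left lazy inside workers, two threads could race through the sentinel check and resolve twice.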

Offline mode: _ensure_auth() short-circuits with _auth_resolved = True and _github_token = None. Offline builds do not attempt network auth. Correct.

Side-effect coupling (minor): self._host_info = AuthResolver.classify_host(self._host) is set inside _resolve_github_token() rather than in _ensure_auth() directly. The defensive fallback in _fetch_remote_metadata() (host_kind = self._host_info.kind if self._host_info else "github") handles the None case, so the failure mode is graceful. This is a mild design smell -- not a correctness bug.

ADO / non-GitHub hosts: _resolve_url_source() now parses ADO-style URLs via DependencyReference.parse(), extracting owner/repo and resolving against GITHUB_HOST. The documented limitation (cross-host resolution deferred to #1010) is appropriate.

No regressions to AuthResolver precedence, host classification, or credential leakage surface.


OSS Growth Hacker: This fix closes a gap that blocked enterprise customers from using apm marketplace build on GHES, while apm install already worked. With this PR, APM has a consistent GHES story across its main commands.

Story angle for release notes: "APM marketplace now works with GitHub Enterprise Server -- export GITHUB_HOST=corp.ghe.com && apm marketplace build resolves, authenticates, and fetches metadata from your GHES instance using the same token you already configured." This reinforces the enterprise-readiness frame without requiring new auth setup.

The CHANGELOG entries are clean raw material. The type: url limitation note and #1010 forward reference show maturity -- shipping the immediate fix while being transparent about what's next builds community trust.

Side-channel to CEO: GHES parity across install + marketplace build is a concrete enterprise unlock. Worth a dedicated bullet in the next release post targeting enterprise DevEx leads.


CEO arbitration

The five specialists and the Auth Expert are in strong agreement: this is a correct, well-tested, well-documented bug fix. The lazy-init pattern with _ensure_auth() is clean; the _host_info side-effect in _resolve_github_token() is a minor cleanliness observation, not a correctness issue, and fixing it inline would increase the PR's diff scope without adding safety value -- track in a follow-up if desired. The security posture is neutral (no new surfaces, pre-existing token-in-URL pattern, explicit graceful-degradation paths). The Auth Expert confirms no regressions to AuthResolver precedence or host classification.

The Growth Hacker's framing is sound: GHES parity across install and marketplace build is a concrete enterprise unlock worth calling out in the release narrative. No strategic concerns.

Ratification: APPROVE. The change ships the right fix at the right scope.


Required actions before merge

  1. None. The disposition is a clean APPROVE. The _host_info side-effect coupling in _resolve_github_token() is noted but is not blocking -- it is a gracefully handled degradation path with defensive code in _fetch_remote_metadata().

Optional follow-ups

  • Cross-host resolution (#1010, "feat: ADO marketplace support (marketplace.yml with Azure DevOps repos)"): The type: url sources strip the URL host and resolve against GITHUB_HOST. True cross-host resolution (routing to the URL's actual host) is the natural next step; the existing #1010 reference in the code and docs is the right place to track this.
  • _host_info side-effect refactor: Move the AuthResolver.classify_host(self._host) call from inside _resolve_github_token() into _ensure_auth() directly, making the data-flow explicit and removing the implicit side effect. Low urgency; purely a readability improvement.
  • Release narrative: The OSS Growth Hacker suggests a dedicated bullet in the next release post highlighting GHES parity across install and marketplace build for enterprise audiences.


@danielmeppiel danielmeppiel enabled auto-merge April 28, 2026 16:50
@danielmeppiel danielmeppiel added this pull request to the merge queue Apr 28, 2026
Merged via the queue into main with commit 3fdaa94 Apr 28, 2026
39 checks passed
@danielmeppiel danielmeppiel deleted the fix/1008-marketplace-build-ghe branch April 28, 2026 16:58
@danielmeppiel danielmeppiel added this to the 0.11.0 milestone Apr 29, 2026
sergio-sisternes-epam added a commit that referenced this pull request May 19, 2026
* fix: marketplace build respects GITHUB_HOST for GHE repos (#1008)

Thread the existing default_host() / build_https_clone_url() / AuthResolver
pattern (used by apm install) through the marketplace build pipeline.

Changes:
- RefResolver: accept optional host parameter, use build_https_clone_url()
  instead of hardcoded github.com for git ls-remote URLs
- MarketplaceBuilder: resolve tokens against configured host, use REST API
  for metadata fetch on GHES/GHE Cloud (raw.githubusercontent.com is
  github.com-only), skip metadata for non-GitHub hosts
- Fix AuthResolver import scoping so classify_host() works when
  auth_resolver is pre-injected
- Add GHE Cloud early-exit when no token (avoids pointless 401)

Tests:
- Update URL assertions to use urlparse (test convention)
- Add 4 RefResolver GHE host tests
- Add 3 metadata fetch path tests (GHES REST API, non-GitHub skip,
  GHE Cloud no-token skip)
- Add builder host env test

Docs:
- CHANGELOG: Fixed entry under [Unreleased]
- marketplace-authoring guide: GHES section
- apm-usage authentication skill: marketplace build example

Closes #1008

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* refactor(marketplace): decouple auth from resolution, reuse DependencyReference for URL sources

Phase B of #1008 -- decouples authentication from marketplace generation
and reuses existing resolution infrastructure for cross-source compatibility.

Changes:
- RefResolver: accept optional token for authenticated git ls-remote
- Builder: extract lazy _ensure_auth() called from _get_resolver() so
  both resolve() and build() benefit from authenticated ls-remote
- Builder: eagerly init resolver before thread pool (race prevention)
- Builder: fix _host_info type annotation (Optional["HostInfo"] with
  TYPE_CHECKING guard)
- resolver.py: _resolve_url_source() now delegates to
  DependencyReference.parse() -- accepts any valid Git URL (GitHub,
  GHES, GitLab, Bitbucket, ADO, SSH) instead of github.com only
- 13 new tests covering token injection, lazy auth, and cross-source
  URL resolution

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(marketplace): address review panel findings

- Add _auth_resolved sentinel to _ensure_auth() for true idempotency
- Clarify _resolve_url_source() docstring: host is not preserved (#1010)
- Split CHANGELOG #1008 entry into GHE fix + URL-source expansion
- Add test documenting host-is-ignored behaviour for non-GitHub URLs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(marketplace): address Copilot review findings

- Fix _ensure_auth() offline branch to set _auth_resolved sentinel
- Clarify CHANGELOG and docs: URL host is not preserved, GITHUB_HOST required
- Update marketplace-authoring.md to warn against cross-host URL reliance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Sergio Sisternes <sergio.sisternes@epam.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Labels

panel-review Trigger the apm-review-panel gh-aw workflow


Development

Successfully merging this pull request may close these issues.

[BUG] apm marketplace build fails for GitHub Enterprise Server repositories

3 participants