Fix aw-portfolio-yield agent job by switching to the OTel queries skill #31470

Merged
mnkiefer merged 3 commits into main from copilot/fix-agent-job-failure-again on May 11, 2026

Copilot AI commented May 11, 2026

  • Inspect the existing workflow source, generated lock file, and recent commit history
  • Recompile aw-portfolio-yield with the repository toolchain and minimize unintended diff changes
  • Run targeted validation for the workflow generation path
  • Scan changed files, validate the final diff, and reply to the PR comment

Copilot AI and others added 2 commits May 11, 2026 10:27
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Co-authored-by: mnkiefer <8320933+mnkiefer@users.noreply.github.com>
Copilot AI changed the title from "fix: remove broken otel MCP server import from aw-portfolio-yield (#31459 follow-up)" to "Fix aw-portfolio-yield agent job by switching to the OTel queries skill" on May 11, 2026
Copilot AI requested a review from mnkiefer May 11, 2026 10:33
@mnkiefer mnkiefer marked this pull request as ready for review May 11, 2026 10:34
Copilot AI review requested due to automatic review settings May 11, 2026 10:34

Copilot AI left a comment

Pull request overview

This PR fixes the failing aw-portfolio-yield agent workflow by switching its shared import from the legacy OTel observability module (which provisions MCP gateway OTLP wiring) to the newer “OTel queries” guidance/skill, so the workflow retains telemetry interpretation instructions without generating broken gateway config.

Changes:

  • Replaced shared/otel-observability.md with shared/otel-queries.md in the aw-portfolio-yield workflow source.
  • Updated the workflow prompt contract to reference the OTel queries skill instead of an otel MCP server.
  • Recompiled aw-portfolio-yield.lock.yml so the generated workflow no longer includes OTLP env/secrets, OTel MCP server config, or MCP gateway OpenTelemetry wiring.
Summary per file:

| File | Description |
| --- | --- |
| .github/workflows/aw-portfolio-yield.md | Switches the shared import to OTel queries and updates agent instructions accordingly. |
| .github/workflows/aw-portfolio-yield.lock.yml | Compiled workflow output reflecting the new import (removes OTLP/MCP OTel wiring and updates generated workflow details). |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 2

Comment thread .github/workflows/aw-portfolio-yield.lock.yml
Comment thread .github/workflows/aw-portfolio-yield.lock.yml
@mnkiefer

@copilot Recompile workflow

Copilot AI commented May 11, 2026

> @copilot Recompile workflow

Ran ./gh-aw compile aw-portfolio-yield --validate locally with the current repo toolchain. It produced no diff, so .github/workflows/aw-portfolio-yield.lock.yml is already up to date with the current markdown source.
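
A minimal sketch of that no-diff check, assuming a POSIX shell at the repository root. `recompile_is_noop` is a hypothetical helper, not part of gh-aw: it snapshots the lock file, runs whatever compile command it is given, and compares the result byte for byte. How the original check confirmed "no diff" is not stated in the comment, so the `cmp` comparison here is an assumption.

```shell
# Hypothetical helper (not part of gh-aw): run a compile command and
# report whether it changed the given lock file.
# Returns 0 if unchanged, 1 if changed, 2 if setup or the command failed.
recompile_is_noop() {
  lock_file="$1"
  shift
  snapshot=$(mktemp) || return 2
  cp "$lock_file" "$snapshot" || { rm -f "$snapshot"; return 2; }

  # Run the compile command passed as the remaining arguments.
  if ! "$@"; then
    rm -f "$snapshot"
    return 2
  fi

  # cmp -s is silent and exits 0 only when the files are identical.
  if cmp -s "$snapshot" "$lock_file"; then
    result=0
  else
    result=1
  fi
  rm -f "$snapshot"
  return "$result"
}

# Usage with the command quoted in the comment above:
#   recompile_is_noop .github/workflows/aw-portfolio-yield.lock.yml \
#     ./gh-aw compile aw-portfolio-yield --validate
```

Comparing against a snapshot rather than the Git index keeps the check independent of whatever is already staged.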

Copilot AI requested a review from mnkiefer May 11, 2026 10:48
@mnkiefer mnkiefer merged commit b2d167b into main May 11, 2026
23 checks passed
@mnkiefer mnkiefer deleted the copilot/fix-agent-job-failure-again branch May 11, 2026 10:54
@github-actions github-actions Bot mentioned this pull request May 11, 2026

@github-actions github-actions Bot left a comment

Skills-Based Review 🧠

Applied /diagnose as this is a bug fix PR. The fix is directionally correct — switching from shared/otel-observability.md to shared/otel-queries.md cleans up the OTel dependency and removes the OTLP_ENDPOINT/OTLP_TOKEN secrets that were previously required.

Key Themes

  • GH_AW_VERSION: dev in production lock file: The recompilation was done with a local development build (dev) rather than a tagged release. This means the live workflow will pick up a non-deterministic dev build at runtime — a production correctness concern.
  • Broad diff scope: The lock file diff includes several unrelated improvements (better node detection, AWF_REFLECT_ENABLED, chmod a+rX, maxdepth 4→5, removal of the observability summary step). These are compiler improvements being baked in as a side effect of the recompile, which is fine, but GH_AW_VERSION: dev stands out as an artifact that should not be in a release lock file.
  • Root cause not documented (per /diagnose Phase 6 — state which hypothesis was correct in the commit/PR): The PR description is a task checklist rather than an explanation of why the lock file was stale or what originally caused the agent job failure. Future debuggers won't know whether this was a missed make recompile, a compiler rollback, or something else.

Positive Highlights

  • ✅ Secrets removed cleanly: OTLP_ENDPOINT and OTLP_TOKEN no longer appear in the lock file manifest — the surface area for credential exposure is reduced.
  • ✅ Better node-not-found error handling in the compiled agent bootstrap — the new exit 127 path with a helpful message is an improvement over silently falling back to a bare node string.
  • ✅ The shared/otel-queries.md skill is a more focused, purpose-built import for this workflow's needs.
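
The secrets removal noted above could be spot-checked with something like the following. This is a sketch under assumptions: `check_no_otlp_secrets` is a hypothetical helper, and it only searches for the two secret names cited in this review, so it says nothing about other OTel wiring.

```shell
# Hypothetical helper: succeed only if the file contains neither
# OTLP_ENDPOINT nor OTLP_TOKEN (the secret names cited in the review).
# Returns 0 if clean, 1 if a reference remains, 2 if the file is unreadable.
check_no_otlp_secrets() {
  lock_file="$1"
  [ -r "$lock_file" ] || return 2

  # grep -n prints matching lines with line numbers; -E enables alternation.
  if matches=$(grep -nE 'OTLP_(ENDPOINT|TOKEN)' "$lock_file"); then
    echo "OTLP references remain in $lock_file:" >&2
    echo "$matches" >&2
    return 1
  fi
  return 0
}

# Usage:
#   check_no_otlp_secrets .github/workflows/aw-portfolio-yield.lock.yml
```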

Verdict

Requesting changes specifically for the GH_AW_VERSION: dev value in the lock file — this should be a tagged version before merging to keep production workflows reproducible.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · ● 10.1M

@@ -767,7 +735,7 @@ jobs:
GH_AW_PHASE: agent

[/diagnose] GH_AW_VERSION is set to dev in the compiled lock file. This will cause the production workflow to pull a development build of gh-aw at runtime rather than a pinned release.

This appears to be an artifact of recompiling with a local development build instead of a tagged release. Before merging, recompile with a tagged version (e.g. v0.71.5 or the latest release) so the lock file reflects a stable, reproducible runtime.

Running a production workflow against dev makes the execution environment non-deterministic across runs.
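
One way to catch this before merge is a small guard over the compiled lock file. The sketch below is illustrative only: `check_gh_aw_version` is a hypothetical helper, and it assumes the value appears on a single line of the form `GH_AW_VERSION: dev` (optionally double-quoted), as described in this comment.

```shell
# Hypothetical helper: fail if a compiled lock file sets GH_AW_VERSION to dev.
# Returns 0 if no dev value is found, 1 if one is found, 2 if unreadable.
check_gh_aw_version() {
  lock_file="$1"
  [ -r "$lock_file" ] || return 2

  # Match the key followed by dev, optionally wrapped in double quotes.
  if grep -Eq 'GH_AW_VERSION:[[:space:]]*"?dev"?[[:space:]]*$' "$lock_file"; then
    echo "error: $lock_file sets GH_AW_VERSION to dev" >&2
    return 1
  fi
  return 0
}

# Usage:
#   check_gh_aw_version .github/workflows/aw-portfolio-yield.lock.yml
```

The pattern anchors at end of line so that a tagged value beginning with other characters is not flagged.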