feat(ci): consolidate Arc canary + prod release into one pipeline #1715

Merged
zanejohnson-azure merged 1 commit into ci_prod from zane/consolidate-arc-canary-prod on Jun 17, 2026

Conversation

@zanejohnson-azure

Contributor

What

Consolidates the separate Arc K8s extension canary and prod release pipelines into a single self-contained pipeline by prepending three canary stages to ci-arc-k8s-extension-prod-release.yaml.

Changes

  • Stage_Canary_MCR (auto) — RELEASE_STAGE_NAME=Canary; packages the local chart and pushes to canary/stable via the arc-k8s-extension-Managed-SDP Ev2 root.
  • Stage_Canary_Regions (manual) — RELEASE_STAGE_NAME=CanaryStable; registers canary regions via the arc-k8s-extension-release-v2-Managed-SDP Ev2 root.
  • Wait_After_Canary — 25h (delayForMinutes: 1500) bake before prod.
  • Stage_1 (prod1/stable push) — now dependsOn: Wait_After_Canary and gated with trigger: manual for extra safety, so prod is never re-pushed without an explicit human start (in addition to the existing in-stage ApprovalTask).

No existing prod-tier logic was modified beyond the Stage_1 header.
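
For illustration, the 25h bake described above could be expressed roughly as follows. This is a minimal sketch that assumes a plain Azure Pipelines stage with an agentless job running the built-in Delay task; the job name `Bake` and the `timeoutInMinutes` value are assumptions and are not copied from the pipeline file, which may wrap its stages in templates.

```yaml
# Hypothetical sketch of the bake stage; not the actual pipeline file.
stages:
- stage: Wait_After_Canary
  jobs:
  - job: Bake                # assumed job name
    pool: server             # agentless job; the Delay task runs only in agentless jobs
    timeoutInMinutes: 1560   # assumed value; must exceed the delay (the default job timeout is 60)
    steps:
    - task: Delay@1
      inputs:
        delayForMinutes: '1500'   # 25 h x 60 min = 1500 min
```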

Result

One pipeline now runs canary → 25h bake → manual-gated prod tiers, so the canary pipeline no longer needs to be run separately. Queue it the same way as before (same ContainerInsights-MultiArch-MergedBranches artifact, same VAR_* variables).

Validation

  • YAML parses cleanly: 13 stages, chain intact (Canary_MCR → Canary_Regions → Wait → Stage_1 → Stage_2 … → Stage_7).
  • Three-dot diff vs ci_prod shows only the single pipeline file.

zanejohnson-azure requested a review from a team as a code owner on June 12, 2026 23:50
@zanejohnson-azure

Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

zanejohnson-azure enabled auto-merge (squash) on June 16, 2026 23:44
zanejohnson-azure merged commit 7ea95a2 into ci_prod on Jun 17, 2026
19 checks passed
suyadav1 added a commit that referenced this pull request Jun 19, 2026
)

* Revert "add canary in arc prod release pipeline (#1715)"

This reverts commit 7ea95a2.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Recover ama-logs workspace key from extension protected secret + checksum restart

When the extension manager stops re-delivering protectedParameters (e.g. during an
extension auto-update), the rendered .Values workspace key becomes empty and the chart
would overwrite the live ama-logs-secret KEY with an empty value, breaking the agent
~10-14 days later when cached credentials expire.

Changes:
- ama-logs-secret.yaml: when the incoming workspace key is empty/placeholder, fall back
  to the extension manager's on-cluster protected-parameters secret
  'protected-ext-parameters-<release>' (data key 'OmsAgent.workspaceKey' for AKS or
  'amalogs.secret.key' for Arc). This is the secret the config agent persists from
  protectedParameters; the agent does not clear it when it drops the CR reference, so it
  remains the source of truth. Only the KEY is recovered - WSID is non-protected and still
  delivered. lookup is a no-op on first install (incoming key populated).
- ama-logs-daemonset.yaml / ama-logs-daemonset-windows.yaml / ama-logs-deployment.yaml:
  add checksum/secret annotation to the AKS (non-Arc) Linux and Windows pod templates so
  pods roll when the effective secret changes. The previous WSID-only annotation did not
  detect workspace KEY changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Gate workspace-key fallback on non-AAD clusters

The protected-parameters key fallback only applies to workspace-key (non-AAD) clusters.
On AAD/managed-identity clusters the workspace key is empty by design (auth uses a token,
not a key), so recovering a key there is meaningless. Evaluate isUsingAADAuth (AKS) /
useAADAuth (Arc) the same way the daemonset/deployment templates do, and skip the lookup +
fallback entirely when AAD auth is in use.

Validated on a live AKS cluster: with a key injected into protected-ext-parameters-*,
isUsingAADAuth=true does NOT recover it (gated), isUsingAADAuth=false does.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Recover workspace key only when missing (non-AAD, broken cluster)

Gate the protected-parameters lookup on BOTH non-AAD auth AND the incoming workspace key
being empty/placeholder, so the recovery runs only when the mandatory key is missing
(the cluster is already broken / about to be). Healthy clusters (key supplied) and AAD
clusters skip the live secret read entirely — no unnecessary get-secret API call on the
common path.

Uses an explicit if/else (not ternary) to pick the active path's incoming key:
OmsAgent.workspaceKey for AKS, amalogs.secret.key for Arc.

Validated on a live AKS cluster: non-AAD+empty recovers; non-AAD+real key uses the
supplied key (lookup skipped); AAD+empty stays empty (gated). helm lint clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add dummy env vars

* remove arc secret changes and refactor

* removed checksum/secret

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Madhav Jakkampudi <hejakkam@microsoft.com>
zanejohnson-azure added a commit that referenced this pull request Jun 23, 2026
Restores the canary stages reverted in #1718 (Stage_Canary_MCR,
Stage_Canary_Regions, Wait_After_Canary) so the Arc prod pipeline runs
canary -> 25h bake -> manual-gated prod tiers in one pipeline.

Includes the fix that the original #1715 was missing: Stage_1 is
trigger: manual WITHOUT an explicit dependsOn (ADO rejects dependsOn on
manually-triggered stages). Stage_1 still orders after Wait_After_Canary
via default sequential dependency.
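
As a rough sketch of the corrected header described in this commit message (assuming plain Azure Pipelines stage syntax; the job and step shown are placeholders, not the real release jobs):

```yaml
# Hypothetical fragment; the Wait_After_Canary stage is assumed to be
# defined immediately above this entry in the same stages list.
stages:
- stage: Stage_1
  trigger: manual   # a person must start this stage explicitly
  # No dependsOn here: when none is given, a stage depends on the stage
  # listed just before it, so Stage_1 still runs after Wait_After_Canary.
  jobs:
  - job: Placeholder          # assumed name, stands in for the prod1/stable push
    steps:
    - script: echo "prod1/stable push would run here"
```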