Fix for extension dropped config issue & revert canary release PR#1718

Merged
suyadav1 merged 7 commits into ci_prod from suyadav/fix-extension-dropped-config on Jun 19, 2026

@suyadav1

Contributor

This pull request introduces improvements to the Azure Monitor Container Insights Helm chart, focusing on enhancing secret management for workspace keys and adding a configuration fix flag. The main changes ensure that missing workspace keys are recovered from an existing protected secret, preventing agent outages, and add a checksum annotation to trigger pod restarts when secrets change.

Secret management and recovery improvements:

  • Added logic in ama-logs-secret.yaml to recover the workspace key from the protected-ext-parameters-<release> secret if the key is missing and the cluster is not using AAD authentication, preventing agent downtime during extension manager updates. The effective workspace key ($arcKey/$aksKey) is now resolved using this recovery logic.
  • Updated the secret data in ama-logs-secret.yaml to use the resolved workspace key variables ($arcKey and $aksKey) instead of the direct values from .Values, ensuring the recovered key is used when necessary. [1] [2]
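
The gating described above can be modeled outside Helm as a small decision function. This is a minimal sketch in Python, not the chart's template code: the function name, its parameters, and the placeholder string are hypothetical, and the on-cluster secret is represented as a plain mapping of data keys to base64 values (the shape a Kubernetes Secret's `data` field has). Only the two data key names come from the commit messages in this PR.

```python
import base64
from typing import Mapping, Optional

# Hypothetical placeholder values; the chart's real placeholder string is not shown in this PR.
PLACEHOLDER_KEYS = {"", "<your_workspace_key>"}

# Data keys of protected-ext-parameters-<release>, as named in the commit message.
AKS_DATA_KEY = "OmsAgent.workspaceKey"
ARC_DATA_KEY = "amalogs.secret.key"


def resolve_workspace_key(
    incoming_key: Optional[str],
    using_aad: bool,
    is_arc: bool,
    protected_secret_data: Optional[Mapping[str, str]],
) -> str:
    """Return the effective workspace key for the rendered secret."""
    key = (incoming_key or "").strip()

    # AAD clusters authenticate with a token, so the key is empty by design.
    # Clusters that received a real key need no recovery either.
    if using_aad or key not in PLACEHOLDER_KEYS:
        return key

    # Recovery path: non-AAD cluster with a missing key.
    if not protected_secret_data:
        return key  # secret absent, nothing to recover

    data_key = ARC_DATA_KEY if is_arc else AKS_DATA_KEY
    encoded = protected_secret_data.get(data_key)
    if not encoded:
        return key

    # Secret data values are base64-encoded.
    return base64.b64decode(encoded).decode("utf-8")
```

With a key stored in the protected secret, the three cases validated in the commit messages map to: non-AAD with an empty key returns the recovered key, non-AAD with a supplied key returns the supplied key without consulting the secret, and AAD with an empty key stays empty.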

Deployment and pod restart improvements:

  • Added a checksum/secret annotation to the pod templates in ama-logs-daemonset.yaml, ama-logs-daemonset-windows.yaml, and ama-logs-deployment.yaml to ensure pods are restarted automatically when the secret changes. [1] [2] [3]
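
Why a checksum over the whole secret matters can be illustrated with a short Python sketch. It is an assumption-laden model, not the chart's template: the helper names are hypothetical, and the exact fields the chart hashes are not shown in this PR. Helm charts commonly compute such annotations as a SHA-256 digest of rendered content, which is what the sketch imitates. It contrasts an annotation derived from the workspace ID only (which a commit message in this PR says missed key changes) with one derived from both the ID and the key.

```python
import hashlib


def sha256_annotation(*rendered_fields: str) -> str:
    """Summarize the given fields as a hex SHA-256 digest, as a checksum annotation would."""
    payload = "\n".join(rendered_fields).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def wsid_only_checksum(wsid: str, key: str) -> str:
    # Models the earlier annotation: the key does not contribute to the digest.
    return sha256_annotation(wsid)


def secret_checksum(wsid: str, key: str) -> str:
    # Models checksum/secret: both the workspace ID and the key contribute.
    return sha256_annotation(wsid, key)
```

Because the annotation lives in the pod template, a changed digest changes the pod spec and causes a rollout. With the WSID-only variant, rotating or recovering the key leaves the digest unchanged, so running pods keep the old secret.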

Configuration fixes:

  • Added the EXTENSION_DROPPED_CONFIG_FIX_ENABLED environment variable (set to "true") to all relevant deployments and daemonsets, enabling a fix for extension-dropped configuration issues. [1] [2] [3]

suyadav1 and others added 5 commits June 19, 2026 00:27
This reverts commit 7ea95a2.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ksum restart

When the extension manager stops re-delivering protectedParameters (e.g. during an
extension auto-update), the rendered .Values workspace key becomes empty and the chart
would overwrite the live ama-logs-secret KEY with an empty value, breaking the agent
~10-14 days later when cached credentials expire.

Changes:
- ama-logs-secret.yaml: when the incoming workspace key is empty/placeholder, fall back
  to the extension manager's on-cluster protected-parameters secret
  'protected-ext-parameters-<release>' (data key 'OmsAgent.workspaceKey' for AKS or
  'amalogs.secret.key' for Arc). This is the secret the config agent persists from
  protectedParameters; the agent does not clear it when it drops the CR reference, so it
  remains the source of truth. Only the KEY is recovered - WSID is non-protected and still
  delivered. lookup is a no-op on first install (incoming key populated).
- ama-logs-daemonset.yaml / ama-logs-daemonset-windows.yaml / ama-logs-deployment.yaml:
  add checksum/secret annotation to the AKS (non-Arc) Linux and Windows pod templates so
  pods roll when the effective secret changes. The previous WSID-only annotation did not
  detect workspace KEY changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The protected-parameters key fallback only applies to workspace-key (non-AAD) clusters.
On AAD/managed-identity clusters the workspace key is empty by design (auth uses a token,
not a key), so recovering a key there is meaningless. Evaluate isUsingAADAuth (AKS) /
useAADAuth (Arc) the same way the daemonset/deployment templates do, and skip the lookup +
fallback entirely when AAD auth is in use.

Validated on a live AKS cluster: with a key injected into protected-ext-parameters-*,
isUsingAADAuth=true does NOT recover it (gated), isUsingAADAuth=false does.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Gate the protected-parameters lookup on BOTH non-AAD auth AND the incoming workspace key
being empty/placeholder, so the recovery runs only when the mandatory key is missing
(the cluster is already broken / about to be). Healthy clusters (key supplied) and AAD
clusters skip the live secret read entirely — no unnecessary get-secret API call on the
common path.

Uses an explicit if/else (not ternary) to pick the active path's incoming key:
OmsAgent.workspaceKey for AKS, amalogs.secret.key for Arc.

Validated on a live AKS cluster: non-AAD+empty recovers; non-AAD+real key uses the
supplied key (lookup skipped); AAD+empty stays empty (gated). helm lint clean.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

suyadav1 requested a review from a team as a code owner June 19, 2026 00:59
@suyadav1

Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@suyadav1

Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

suyadav1 merged commit 03c4bb5 into ci_prod on Jun 19, 2026
19 checks passed
zanejohnson-azure added a commit that referenced this pull request Jun 23, 2026
Restores the canary stages reverted in #1718 (Stage_Canary_MCR,
Stage_Canary_Regions, Wait_After_Canary) so the Arc prod pipeline runs
canary -> 25h bake -> manual-gated prod tiers in one pipeline.

Includes the fix that the original #1715 was missing: Stage_1 is
trigger: manual WITHOUT an explicit dependsOn (ADO rejects dependsOn on
manually-triggered stages). Stage_1 still orders after Wait_After_Canary
via default sequential dependency.
3 participants