Azure VM deployment with Terraform + Ansible #143

Open: gaganso wants to merge 34 commits into main from terraform

Conversation

gaganso (Collaborator) commented Feb 18, 2026

Summary

Adds fully automated deployment of Kubernetes clusters on Azure VMs using Terraform and Ansible, with a single-command deploy.py script.

Deployment automation (Terraform + Ansible)

  • deploy.py: Single-command orchestrator that provisions Azure VMs, runs Ansible, installs local tools (kubectl, helm, poetry), and configures AIOpsLab -- supports --plan, --apply, and --destroy
  • Mode A (AIOpsLab on controller VM) and Mode B (AIOpsLab on laptop with remote kubectl)
  • Terraform: Azure VMs (controller + N workers), VNet, NSG with configurable source via --allowed-ips / nsg_allowed_source, public IPs
  • Ansible: Docker CE + cri-dockerd, Kubernetes v1.31 (kubeadm), Flannel CNI, kubeconfig fetch with public IP SAN
  • generate_inventory.py: Auto-generates Ansible inventory from Terraform outputs
  • Security: Kubeconfig permissions set to 0600, admin.conf restored to 0600 after fetch, join token/cert hash debug tasks use no_log: true, kubectl binary verified with SHA256 checksum
  • terraform.tfvars.example: Documented config template with sizing examples
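The generate_inventory.py step is essentially a JSON-to-dict transformation over `terraform output -json`. A minimal sketch of that mapping (the function and the exact output field names here are illustrative, not the script's actual API):

```python
# Hypothetical sketch of the Terraform-output -> Ansible-inventory mapping.
# Output names (`controller`, `workers`) and fields (`public_ip`, `private_ip`)
# mirror the PR description, but this is not generate_inventory.py's real code.
def build_inventory(tf_output: dict) -> dict:
    controller = tf_output["controller"]["value"]
    workers = tf_output["workers"]["value"]
    return {
        "all": {
            "children": {
                "controller": {
                    "hosts": {
                        controller["name"]: {
                            "ansible_host": controller["public_ip"],
                            "private_ip": controller["private_ip"],
                        }
                    }
                },
                "workers": {
                    "hosts": {
                        w["name"]: {
                            "ansible_host": w["public_ip"],
                            "private_ip": w["private_ip"],
                        }
                        for w in workers
                    }
                },
            }
        }
    }

# Sample shaped like `terraform output -json` (documentation IPs only).
sample = {
    "controller": {"value": {"name": "ctrl-0", "public_ip": "203.0.113.10",
                             "private_ip": "10.0.1.4"}},
    "workers": {"value": [{"name": "worker-0", "public_ip": "203.0.113.11",
                           "private_ip": "10.0.1.5"}]},
}
inventory = build_inventory(sample)
```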

Documentation changes

  • scripts/terraform/README.md: Full deployment guide with Mode A/B, VM sizing, troubleshooting, cost management
  • CLAUDE.md: Added Azure deployment section with key files, config reference, common issues, architecture diagram
  • README.md and TutorialSetup.md: Added Terraform/Ansible as cluster setup option

Other changes (not deployment-specific)

  • aiopslab/service/kubectl.py: Fix kubeconfig context for non-kind clusters -- read k8s_host from config.yml, only use kind-* context for kind/localhost
  • aiopslab/service/helm.py: Guard -f values_file in upgrade with if values_file: check; add FileNotFoundError with submodule hint for missing charts
  • aiopslab/service/telemetry/prometheus.py: Downgrade "release not found" from ERROR+traceback to WARNING on first run
  • poetry.lock: Refreshed lock file
  • .gitignore: Add terraform state files, ansible retry files
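The kubectl.py context fix above boils down to a small decision rule. A hedged restatement of that rule (illustrative only, not the literal code in aiopslab/service/kubectl.py):

```python
def select_context(k8s_host, cluster_env=None):
    """Pick a kubeconfig context; None means 'use the kubeconfig default'.

    Illustrative restatement of the rule described in the PR summary,
    not the literal code in aiopslab/service/kubectl.py.
    """
    if cluster_env:  # AIOPSLAB_CLUSTER env var: parallel kind clusters
        return f"kind-{cluster_env}"
    if k8s_host in ("kind", "localhost"):
        return "kind-kind"  # local kind cluster
    return None  # remote host from config.yml: use the default context
```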

Test plan

  • deploy.py --plan dry-run succeeds with correct Terraform plan
  • deploy.py --apply --mode B provisions VMs, runs Ansible, configures AIOpsLab
  • kubectl get nodes shows all nodes Ready from laptop
  • python3 cli.py + start misconfig_app_hotel_res-detection-1 runs end-to-end (Prometheus, OpenEBS, app deploy, fault injection, workload)
  • deploy.py --destroy cleans up all Azure resources

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gaganso and others added 30 commits August 24, 2025 00:14
poetry shell was removed in Poetry 2.0. Update README.md and
TutorialSetup.md to use the new activation command and drop the
poetry-plugin-shell dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add nsg_allowed_source variable (default "*") to control SSH and K8s
API NSG rules declaratively. Replaces hardcoded CorpNetPublic so
deploys work without corporate VPN. Users can pass a CIDR or Azure
service tag to restrict access.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove add_nsg_corpnet_rule() in favor of Terraform nsg_allowed_source
- Add --allowed-ips flag to pass NSG source through to Terraform
- Add auto-install for kubectl, helm, and poetry (Linux/amd64)
- Add setup_aiopslab_mode_b(): verifies kubeconfig, generates
  aiopslab/config.yml, runs poetry install, prints summary table
- Add setup_aiopslab_mode_a() placeholder
- Remove emoji from log messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document --mode B automatic setup, add tested-on platform note, add
git worktree WSL submodule caveat, update Mode A instructions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove duplicate Terraform quick start, merge into single Azure
Deployment section with deploy.py single-command workflow. Fix
poetry shell references and add tested-on platform note.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The submodule path check incorrectly treated remote helm repo
references (e.g. chaos-mesh/chaos-mesh) as local paths. Add
remote_chart flag to Chaos Mesh config and fix upgrade method
to respect the same flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	CLAUDE.md
#	poetry.lock
- Replace cluster_size with worker_vm_count in manual tf commands
- Replace poetry shell with eval $(poetry env activate)
- Add poetry env use python3.11 before poetry install
- Remove Documentation section (links to nonexistent files)
- Remove SECURITY.md references
- Replace --restrict-ssh-corpnet with --allowed-ips CorpNetPublic
- Remove deploy_old.py and DEPLOYMENT_GUIDE.md from file tree
- Remove "Migration from v1.0" section

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
NSG rules default to open (*), not restricted to CorpNetPublic.
Document --allowed-ips flag and nsg_allowed_source variable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Change kubeconfig copy mode from 0644 to 0600 on control plane
- Restore /etc/kubernetes/admin.conf to 0600 after fetch
- Set ~/.kube/config to 0600 on localhost after fetch
- Add no_log: true to kube_token and cert_hash debug tasks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Guard -f values_file with conditional check in helm upgrade
- Remove unused azurerm_public_ip_prefix resource from main.tf
- Remove unused location variable from variables.tf and tfvars example
- Add gen1/gen2 note to os_sku variable description
- Add trailing newlines to main.tf and variables.tf

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
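The values_file guard can be pictured as a small argument builder (hypothetical helper, not helm.py's actual structure):

```python
def helm_upgrade_args(release, chart, values_file=None):
    """Build a `helm upgrade --install` argv; -f is added only when a
    values file is actually given (the guard this commit describes).
    Hypothetical helper, not helm.py's real structure.
    """
    args = ["helm", "upgrade", "--install", release, chart]
    if values_file:
        args += ["-f", values_file]
    return args
```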
Download and verify kubectl binary checksum before installing.
Add note about helm pipe-to-bash being the official install method.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
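The checksum verification described here follows the usual pattern of hashing the downloaded bytes and comparing against the published digest. A sketch of that idea (the helper name and sample data are made up):

```python
import hashlib

def sha256_matches(data, expected_hex):
    """Compare downloaded bytes against a published SHA256 checksum.

    Same idea as verifying kubectl against its published .sha256 file;
    a sketch, not deploy.py's actual code.
    """
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()

blob = b"pretend this is the kubectl binary"
published = hashlib.sha256(blob).hexdigest()  # stands in for the fetched checksum
```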
The merge from main introduced AIOPSLAB_CLUSTER which hardcodes a
kind- prefix on the context name, breaking remote/Azure clusters.
Now reads k8s_host from config.yml: uses kind-kind for kind/localhost,
default kubeconfig context for remote hosts, and AIOPSLAB_CLUSTER
env var still works for parallel kind clusters.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When Prometheus isn't installed yet, Helm.status() raises RuntimeError.
This is expected on first run -- log a warning instead of a full traceback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
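The downgrade pattern this commit describes — treat an expected first-run failure as a warning, not an error with a traceback — can be sketched like this (`lookup` stands in for Helm.status(); this is not prometheus.py's actual code):

```python
import logging

logger = logging.getLogger("telemetry.sketch")

def release_status(lookup):
    """Return a release's status, treating 'not installed yet' as a warning.

    `lookup` stands in for Helm.status(); illustrative pattern only.
    """
    try:
        return lookup()
    except RuntimeError as exc:  # expected on first run: release not installed yet
        logger.warning("Release not found (expected on first run): %s", exc)
        return None

def missing():
    raise RuntimeError("release: not found")
```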
@gaganso gaganso requested a review from Copilot February 18, 2026 21:48
@gaganso gaganso linked an issue Feb 18, 2026 that may be closed by this pull request
Copilot AI (Contributor) left a comment


Pull request overview

This PR introduces comprehensive automated deployment capabilities for AIOpsLab on Azure using Terraform and Ansible. It transforms the deployment process from manual VM setup to a single-command operation that provisions infrastructure, configures Kubernetes clusters, and sets up the AIOpsLab environment.

Changes:

  • Single-command deployment orchestration via deploy.py with support for plan, apply, and destroy operations
  • Dynamic Kubernetes cluster provisioning (1 controller + N workers) on Azure VMs with configurable sizing
  • Automated Ansible playbook execution for Docker, Kubernetes, and CNI installation
  • Enhanced application code to support remote clusters with proper kubeconfig context handling
  • Comprehensive documentation covering deployment modes, troubleshooting, and security best practices

Reviewed changes

Copilot reviewed 21 out of 23 changed files in this pull request and generated 26 comments.

Show a summary per file

File | Description
scripts/terraform/deploy.py | Main orchestrator script (new) - handles Terraform, Ansible, tool installation, and AIOpsLab configuration
scripts/terraform/generate_inventory.py | Generates Ansible inventory from Terraform outputs (new)
scripts/terraform/main.tf | Refactored infrastructure definition with dynamic worker nodes and proper NSG rules
scripts/terraform/variables.tf | Complete rewrite with new variables for VM size, count, SSH keys, and NSG configuration
scripts/terraform/outputs.tf | Restructured outputs with controller/worker details for automation
scripts/terraform/providers.tf | Updated provider versions and added skip_provider_registration guidance
scripts/terraform/terraform.tfvars.example | New configuration template with examples (new)
scripts/terraform/README.md | Comprehensive deployment guide with Mode A/B, troubleshooting, and cost management
scripts/terraform/ssh.tf | Removed (replaced with user-provided SSH keys)
scripts/terraform/data.tf | Moved to main.tf
scripts/ansible/setup_common.yml | Added conntrack package for kubeadm
scripts/ansible/remote_setup_controller_worker.yml | Major enhancements: public IP SAN support, proper permissions, idempotency, verification steps
scripts/ansible/inventory.yml.example | Updated with private_ip fields and improved documentation
aiopslab/service/kubectl.py | Fixed context selection to support non-kind clusters
aiopslab/service/helm.py | Added chart existence checks and values_file guard
aiopslab/service/telemetry/prometheus.py | Downgraded "release not found" from ERROR to WARNING
aiopslab/generators/fault/inject_symp.py | Added remote_chart flag for Chaos Mesh
README.md | Added Azure deployment option with Poetry installation guidance
TutorialSetup.md | Updated to use eval $(poetry env activate)
CLAUDE.md | Documented Azure deployment workflow and troubleshooting
.gitignore | Added terraform state files and sensitive configuration files


Comment on lines +97 to +98
regexp: 'https://{{ control_plane_ip }}:6443'
replace: 'https://{{ control_plane_public_ip }}:6443'

Copilot AI Feb 18, 2026


The regexp replacement uses the private IP but should match any IP pattern. If the control_plane_ip format doesn't match exactly (e.g., has different formatting), the replacement will silently fail. Consider using a more robust pattern like 'regexp: "server: https://[^:]+:6443"' to match any IP or hostname.

Suggested change
- regexp: 'https://{{ control_plane_ip }}:6443'
- replace: 'https://{{ control_plane_public_ip }}:6443'
+ regexp: 'server: https://[^:]+:6443'
+ replace: 'server: https://{{ control_plane_public_ip }}:6443'

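The reviewer's point can be demonstrated directly with Python's re (the kubeconfig line and IP addresses below are made-up samples):

```python
import re

# Made-up kubeconfig line; addresses are documentation IPs.
line = "    server: https://10.0.1.4:6443"

# Anchored on a literal private IP: if the rendered IP differs at all,
# the substitution silently does nothing.
brittle = re.sub(r"https://10\.0\.1\.99:6443", "https://203.0.113.10:6443", line)

# The reviewer's host-agnostic pattern still fires.
robust = re.sub(r"server: https://[^:]+:6443",
                "server: https://203.0.113.10:6443", line)
```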
# Confirm destruction
confirm = input("This will destroy all resources. Type 'yes' to confirm: ")
if confirm.lower() != 'yes':
    logger.info("Destruction cancelled")

Copilot AI Feb 18, 2026


The destroy operation requires manual confirmation (line 675), which is good for safety. However, if the confirmation is not 'yes', the function returns False but logs 'Destruction cancelled' as INFO level. Consider using logger.warning() instead to make it more visible that the operation was intentionally cancelled, distinguishing it from an actual failure.

Suggested change
- logger.info("Destruction cancelled")
+ logger.warning("Destruction cancelled")

value = {
  name       = azurerm_linux_virtual_machine.controller.name
  public_ip  = azurerm_public_ip.controller.ip_address
  private_ip = azurerm_network_interface.controller.ip_configuration[0].private_ip_address

Copilot AI Feb 18, 2026


The private_ip field is referenced in generate_inventory.py (line 88) and remote_setup_controller_worker.yml (line 14), but it's accessed as an output of ip_configuration. This access pattern is correct for Azure, but consider adding a validation check in generate_inventory.py to ensure the private_ip is not empty before writing to the inventory.

Comment on lines +310 to +315
ver = subprocess.run(
[path, "--version"], capture_output=True, text=True, check=True
).stdout.strip()
minor = int(ver.split(".")[1])
if minor >= 11:
return candidate, ver

Copilot AI Feb 18, 2026


The Python version detection logic parses the version string by splitting on '.' and accessing index [1] for the minor version. This will fail if Python returns a version string in an unexpected format or if the split doesn't produce enough elements. Add error handling or validation to prevent IndexError.

Suggested change
- ver = subprocess.run(
-     [path, "--version"], capture_output=True, text=True, check=True
- ).stdout.strip()
- minor = int(ver.split(".")[1])
- if minor >= 11:
-     return candidate, ver
+ proc = subprocess.run(
+     [path, "--version"], capture_output=True, text=True, check=True
+ )
+ # Some Python versions print to stderr instead of stdout
+ ver_output = (proc.stdout or proc.stderr or "").strip()
+ match = re.search(r"Python\s+(\d+)\.(\d+)\.(\d+)", ver_output)
+ if not match:
+     continue
+ major = int(match.group(1))
+ minor = int(match.group(2))
+ # Accept Python 3.11+ and any higher major versions
+ if major > 3 or (major == 3 and minor >= 11):
+     return candidate, ver_output

Comment on lines 28 to 31
if cluster_env:
context = f"kind-{cluster_env}"
elif k8s_host == "kind" or k8s_host == "localhost":
context = "kind-kind"

Copilot AI Feb 18, 2026


The logic for context selection has a potential issue: if k8s_host is set to "localhost", it will use "kind-kind" context (line 30-31). However, for remote clusters accessed via localhost (like with port forwarding), this would incorrectly use the kind context. Consider checking if the context actually exists before using it, or refining the logic to distinguish between kind clusters and remote clusters accessed via localhost.

Suggested change
- if cluster_env:
-     context = f"kind-{cluster_env}"
- elif k8s_host == "kind" or k8s_host == "localhost":
-     context = "kind-kind"
+ # Detect whether the default kind context ("kind-kind") exists in the current kubeconfig.
+ kind_context_exists = False
+ try:
+     contexts, _ = config.list_kube_config_contexts()
+     if contexts:
+         kind_context_exists = any(
+             ctx.get("name") == "kind-kind" for ctx in contexts
+         )
+ except Exception:
+     # If kubeconfig can't be listed, fall back to using the default context.
+     kind_context_exists = False
+ if cluster_env:
+     context = f"kind-{cluster_env}"
+ elif k8s_host == "kind" or k8s_host == "localhost":
+     # Only use the kind context if it actually exists; otherwise, use the default context.
+     context = "kind-kind" if kind_context_exists else None

Comment on lines +169 to +183
try:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(5)
    result = sock.connect_ex((host, port))
    sock.close()

    if result == 0:
        logger.info(f"SSH available on {host}")
        return True

except socket.error:
    pass

time.sleep(interval)

Copilot AI Feb 18, 2026


The socket connection check (line 172) uses connect_ex which returns 0 on success. However, the socket is not properly closed in all cases - if an exception occurs in the except block (line 179), the socket remains open. Consider using a 'with' context manager or ensure the socket.close() is in a finally block.

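The reviewer's fix — a `with` block so the socket is closed on every path — might look like this (a sketch under the same host/port/timeout semantics, not deploy.py's actual code):

```python
import socket

def port_open(host, port, timeout=5.0):
    """One connectivity probe; the `with` block closes the socket even if
    settimeout or connect_ex raises. Sketch of the reviewer's suggestion,
    not deploy.py's actual code.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            return sock.connect_ex((host, port)) == 0
    except socket.error:
        return False
```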
Comment on lines +24 to 47
  # SSH access - restrict via var.nsg_allowed_source or --allowed-ips in deploy.py
  security_rule {
    name                       = "SSH"
-   priority                   = 1001
+   priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "22"
-   source_address_prefix      = "*"
+   source_address_prefix      = var.nsg_allowed_source
    destination_address_prefix = "*"
  }
}

resource "azurerm_network_security_group" "aiopslab_nsg_2" {
  name                = "${var.resource_name_prefix}_aiopslabNSG_2"
  location            = var.resource_location
  resource_group_name = var.resource_group_name

  # Kubernetes API server - for remote kubectl access (Mode B)
  security_rule {
-   name                       = "SSH"
-   priority                   = 1001
+   name                       = "KubernetesAPI"
+   priority                   = 110
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
-   destination_port_range     = "22"
-   source_address_prefix      = "*"
+   destination_port_range     = "6443"
+   source_address_prefix      = var.nsg_allowed_source
    destination_address_prefix = "*"

Copilot AI Feb 18, 2026


The azurerm_network_security_group rules for SSH and Kubernetes API use var.nsg_allowed_source, which currently defaults to *, leaving ports 22 and 6443 exposed to the entire internet. An external attacker can scan these endpoints and attempt to exploit SSH or kube-apiserver vulnerabilities or stolen credentials to gain control of the VMs and cluster. Restrict source_address_prefix to specific CIDRs or Azure service tags by default (and require explicit opt-in for *) so that management and API ports are not globally accessible.

Comment on lines +63 to +64
description = "Source address prefix for NSG rules (SSH + K8s API). Use '*' for open access, a CIDR like '203.0.113.0/24', or an Azure service tag like 'CorpNetPublic'."
default = "*"

Copilot AI Feb 18, 2026


The nsg_allowed_source variable defaults to *, which means that unless explicitly overridden, both SSH (22) and Kubernetes API (6443) NSG rules will allow inbound traffic from any IP address. This broad default significantly increases the attack surface by exposing management and control-plane services to internet-wide scanning and exploitation attempts. Use a safer default (such as a corporate CIDR or no default that forces the caller to supply a restricted prefix) and treat * only as an explicit, documented testing override.

Suggested change
- description = "Source address prefix for NSG rules (SSH + K8s API). Use '*' for open access, a CIDR like '203.0.113.0/24', or an Azure service tag like 'CorpNetPublic'."
- default     = "*"
+ description = "Required: source address prefix for NSG rules (SSH + K8s API). Use a restricted CIDR like '203.0.113.0/24' or an Azure service tag like 'CorpNetPublic'. Use '*' only as an explicit, temporary testing override."

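A deploy-time guard could enforce exactly that explicit opt-in. The helper below is hypothetical — deploy.py does not necessarily contain it — and its service-tag check is deliberately crude:

```python
import ipaddress

def check_allowed_source(value, allow_wildcard=False):
    """Validate an NSG source: a CIDR/IP, a service-tag-like name, or '*'
    only when explicitly opted in. Hypothetical guard, not deploy.py code.
    """
    if value == "*":
        if not allow_wildcard:
            raise ValueError(
                "'*' exposes ports 22 and 6443 to the internet; "
                "opt in explicitly if this is really intended"
            )
        return value
    try:
        ipaddress.ip_network(value, strict=False)  # CIDR such as 203.0.113.0/24
        return value
    except ValueError:
        if value.isalnum():  # crude stand-in for an Azure service tag, e.g. CorpNetPublic
            return value
        raise
```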
Comment on lines +160 to +164
# Install Docker, Kubernetes packages on all nodes
ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i inventory.yml setup_common.yml

# Initialize K8s cluster and join workers
ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i inventory.yml remote_setup_controller_worker.yml

Copilot AI Feb 18, 2026


The documentation recommends running Ansible with ANSIBLE_HOST_KEY_CHECKING=False, which disables SSH host key verification for cluster provisioning. This allows a network attacker in the path between your machine and Azure to perform a man-in-the-middle attack on the SSH connections and inject or observe all provisioning commands and secrets. Remove ANSIBLE_HOST_KEY_CHECKING=False from the recommended commands and instead manage host keys via known_hosts or Ansible's ssh_known_hosts facilities so host authenticity is still verified.

Suggested change
- # Install Docker, Kubernetes packages on all nodes
- ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i inventory.yml setup_common.yml
- # Initialize K8s cluster and join workers
- ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i inventory.yml remote_setup_controller_worker.yml
+ # Ensure target host SSH keys are present in ~/.ssh/known_hosts (e.g., via ssh-keyscan or Ansible's ssh_known_hosts module).
+ # Install Docker, Kubernetes packages on all nodes
+ ansible-playbook -i inventory.yml setup_common.yml
+ # Initialize K8s cluster and join workers
+ ansible-playbook -i inventory.yml remote_setup_controller_worker.yml

Comment on lines +118 to +119
ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i inventory.yml setup_common.yml
ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i inventory.yml remote_setup_controller_worker.yml

Copilot AI Feb 18, 2026


The Azure deployment instructions suggest running Ansible with ANSIBLE_HOST_KEY_CHECKING=False, which turns off SSH host key verification for provisioning the Kubernetes cluster. With host key checks disabled, a network-positioned attacker can impersonate the VMs during Ansible runs and gain control over the cluster or capture credentials without being detected. Drop ANSIBLE_HOST_KEY_CHECKING=False from the example commands and rely on proper SSH host key management so users maintain protection against man-in-the-middle attacks.

Suggested change
- ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i inventory.yml setup_common.yml
- ANSIBLE_HOST_KEY_CHECKING=False ansible-playbook -i inventory.yml remote_setup_controller_worker.yml
+ ansible-playbook -i inventory.yml setup_common.yml
+ ansible-playbook -i inventory.yml remote_setup_controller_worker.yml

Gagan and others added 3 commits February 23, 2026 16:25
Mode A runs AIOpsLab on a kubeadm controller VM where k8s_host is
"localhost" but the kubeconfig context is kubernetes-admin@kubernetes,
not kind-kind. Only treat k8s_host="kind" as a kind cluster.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
setup_aiopslab.yml installs Python 3.11 (deadsnakes PPA), Poetry, Helm,
delivers code via git clone or rsync (dev mode), adds user to docker
group for VirtualizationFaultInjector, generates config.yml from Jinja2
template, and runs poetry install.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements setup_aiopslab_mode_a() which runs the Ansible playbook with
extra-vars for clone vs rsync mode. Adds setup_only() for re-running
setup without reprovisioning VMs. New CLI flags: --setup-only (mutually
exclusive with --plan/--apply/--destroy), --dev (rsync local repo
instead of git clone, Mode A only).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
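The mutual-exclusion described in this commit maps naturally onto argparse. One possible flag layout (the real deploy.py may wire these differently — for example, it may allow some modes to combine):

```python
import argparse

def build_parser():
    """One possible flag layout matching the commit message; the real
    deploy.py may wire these differently."""
    p = argparse.ArgumentParser(prog="deploy.py")
    group = p.add_mutually_exclusive_group()
    group.add_argument("--plan", action="store_true")
    group.add_argument("--apply", action="store_true")
    group.add_argument("--destroy", action="store_true")
    group.add_argument("--setup-only", dest="setup_only", action="store_true")
    p.add_argument("--dev", action="store_true",
                   help="rsync the local repo instead of git clone (Mode A only)")
    return p
```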

Development

Successfully merging this pull request may close these issues.

Automate provisioning and deployment