[CLI] no in-place sandbox restart or reconcile command; operators must kubectl delete pod after every CR edit #2041

@davidglogan

Description

nemoclaw <name> subcommands cover connect, status, logs, snapshot create/restore/list, rebuild, destroy, skill install, policy-add/list/remove. None of them reconcile the current CR into the running pod.

rebuild is the closest, but it is documented as "Upgrade sandbox to current agent version": it is heavyweight, touches the image, and is not intended for "the pod spec just changed, please roll the pod." destroy removes the sandbox entirely, which is too destructive for this purpose. snapshot restore rebuilds from a snapshot, not from the live CR.

Result: when the Sandbox CR spec changes (for example, after a hostAliases patch that has no CLI surface; see companion issue), the operator's only option is:

docker exec openshell-cluster-nemoclaw kubectl -n openshell delete pod my-assistant
# wait for the operator to recreate the pod from the updated CR

Environment

  • NemoClaw v0.0.17, OpenShell v0.0.26
  • Sandbox: my-assistant on DietPi2 (x86_64, 64GB, MicroK8s)
  • Context: added a new hostAliases entry to the CR; needed to force the operator to re-roll the pod

Expected behaviour

nemoclaw my-assistant restart [--wait]
# or
nemoclaw my-assistant reconcile [--wait]

Delete and recreate the pod (preserving the PVC / memory), wait for Ready, exit 0 once the new pod is serving. Keep snapshots as the heavy-surface backup / rollback primitive.

Actual behaviour

No such command. Operators who need an in-place pod refresh must kubectl delete pod inside the k3s container, which (a) the docs discourage and (b) requires knowing the cluster name and the namespace layout.

Impact

  • CR-edit workflows end with a kubectl detour on the docker-exec path.
  • Scripting a "hot-reload policy + DNS" workflow has no supported entrypoint.
  • Diagnostic workflows (restart the pod, watch the fresh boot logs) currently require several manual steps.

Suggested fix

Add nemoclaw <name> restart that:

  1. Deletes the pod via the operator's authority, not direct kubectl.
  2. Waits for the operator to recreate it.
  3. Waits for the new pod to reach Ready.
  4. Optionally prints the boot log tail.
  5. Exits non-zero if anything times out.

Accept --wait-timeout <seconds> with a sensible default (300s).
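
For illustration only, the five steps above can be approximated today with plain kubectl calls. This is a rough sketch, not the proposed implementation: restart_sandbox, kc, and WAIT_TIMEOUT are hypothetical names, the container and namespace are copied from the workaround earlier in this issue, and it assumes the operator recreates the pod under the same name. It deletes the pod directly, whereas step 1 asks for the deletion to go through the operator.

```shell
#!/usr/bin/env bash
# Sketch of the requested restart flow (hypothetical helper, not a NemoClaw command).

WAIT_TIMEOUT="${WAIT_TIMEOUT:-300}"   # seconds; mirrors the suggested 300s default

# Run kubectl inside the cluster container, as in the manual workaround.
kc() {
  docker exec openshell-cluster-nemoclaw kubectl -n openshell "$@"
}

restart_sandbox() {
  local pod="$1" deadline old_uid new_uid remaining
  deadline=$(( $(date +%s) + WAIT_TIMEOUT ))

  # Remember the current pod UID so the replacement can be told apart from it.
  old_uid="$(kc get pod "$pod" -o jsonpath='{.metadata.uid}')" || return 1

  # Step 1: delete the pod without blocking on finalization.
  kc delete pod "$pod" --wait=false || return 1

  # Step 2: poll until a pod with the same name but a different UID exists.
  # Assumption: the operator reuses the pod name when it recreates the pod.
  while :; do
    new_uid="$(kc get pod "$pod" -o jsonpath='{.metadata.uid}' 2>/dev/null)"
    [ -n "$new_uid" ] && [ "$new_uid" != "$old_uid" ] && break
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out waiting for the pod to be recreated" >&2
      return 1   # Step 5: non-zero exit on timeout
    fi
    sleep 2
  done

  # Step 3: wait for Ready using whatever time budget is left.
  remaining=$(( deadline - $(date +%s) ))
  if [ "$remaining" -le 0 ]; then
    echo "timed out before the readiness wait" >&2
    return 1
  fi
  kc wait --for=condition=Ready "pod/$pod" --timeout="${remaining}s" || return 1

  # Step 4: print the tail of the fresh boot log.
  kc logs "$pod" --tail=20
}
```

Sourcing the file and running `WAIT_TIMEOUT=120 restart_sandbox my-assistant` would exercise the flow. Comparing UIDs rather than calling kubectl wait immediately avoids a race where the wait runs before the replacement pod exists.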

Companion issues

Third in a cluster of three CLI-gap filings: #2039, #2040, and this issue (#2041).

All three describe the same pattern: the supported CLI covers the common cases, but custom-infrastructure edits drop below the supported surface. #2039 covers the egress-policy axis, #2040 covers the DNS axis, and this issue covers the reconcile axis. A workflow that edits the Sandbox CR (for example to add a hostAliases entry via #2040's workaround) typically needs #2039's policy-edit and this issue's reconcile-pod step to complete end-to-end.

Notes

Filed on 2026-04-17 alongside #2039 and #2040.

Labels: NemoClaw CLI, enhancement