
docs(container-gateway): fix Docker driver setup for containerized gateway #1419

Open
ericcurtin wants to merge 1 commit into NVIDIA:main from ericcurtin:docs-container-gateway-docker-driver/ec


@ericcurtin
Contributor

@ericcurtin ericcurtin commented May 17, 2026

Summary

The container-gateway docs were missing or misstating several requirements for running the gateway as a Docker container with the Docker compute driver. Validated by deploying on a Fedora Kinoite (bootc) system.

Related Issue

N/A — discovered during hands-on deployment on a bootc system.

Changes

  • Add OPENSHELL_GRPC_ENDPOINT to all Docker driver examples (required; gateway refuses to start without it)
  • Add supervisor binary extraction step — the binary must exist on the host filesystem at the same path mounted into the gateway container, because the host Docker daemon uses that path as a bind-mount source when creating sandbox containers
  • Keep port binding as 127.0.0.1:8080 — the Docker driver automatically binds the gateway to the bridge network interface via gateway_bind_addresses(), so exposing on 0.0.0.0 is unnecessary
  • Add group_add: [docker] to the compose service — the gateway image runs as nvs:nvs (UID 1000) which needs the docker group to access the Docker socket
  • Add remote gateway registration instructions (--remote flag for LAN access)
  • Add --server-san host.openshell.internal to generate-certs in the mTLS section — sandbox containers resolve host.openshell.internal to reach the gateway, so this SAN must be present in the server cert
  • Complete the mTLS docker run with the missing docker driver requirements (--group-add docker, supervisor binary mount, OPENSHELL_GRPC_ENDPOINT, OPENSHELL_DOCKER_SUPERVISOR_BIN)
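Taken together, the non-mTLS requirements listed above could look roughly like the following Compose service. This is a sketch under stated assumptions, not the file from the PR: the value of `OPENSHELL_GRPC_ENDPOINT`, the supervisor file name, and the use of `/var/lib/openshell` as the location of the extracted supervisor binary are placeholders chosen for illustration.

```yaml
# Sketch only. Values marked "assumed" are placeholders, not taken from the PR diff.
services:
  gateway:
    image: ghcr.io/nvidia/openshell/gateway:latest
    # The image runs as nvs:nvs (UID 1000); the docker group grants socket access.
    group_add:
      - docker
    ports:
      # Loopback only; the Docker driver binds the bridge interface itself.
      - "127.0.0.1:8080:8080"
    environment:
      # Required: the gateway refuses to start without it. Value is assumed.
      OPENSHELL_GRPC_ENDPOINT: "http://host.openshell.internal:8080"
      # Assumed file name; the directory matches the bind mount below.
      OPENSHELL_DOCKER_SUPERVISOR_BIN: "/var/lib/openshell/supervisor"
    volumes:
      # Lets the gateway create and manage sandbox containers.
      - /var/run/docker.sock:/var/run/docker.sock
      # Same path on the host and in the container, because the host Docker
      # daemon reuses it as a bind-mount source for sandbox containers.
      - type: bind
        source: /var/lib/openshell
        target: /var/lib/openshell
```

The identical `source` and `target` matter here: a path that exists only inside the gateway container would pass the gateway's startup check but fail when the host daemon tries to mount it into a sandbox.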

Testing

  • mise run pre-commit passes (markdownlint clean; python:proto failure is pre-existing env issue unrelated to this change)
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

@copy-pr-bot

copy-pr-bot Bot commented May 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment thread docs/about/container-gateway.mdx Outdated
Comment thread docs/about/container-gateway.mdx Outdated
@ericcurtin ericcurtin force-pushed the docs-container-gateway-docker-driver/ec branch from eafe4f7 to c1ff3e7 Compare May 18, 2026 10:36
…teway

The existing docs omitted or misstated several requirements when running
the gateway as a container with the Docker compute driver:

1. OPENSHELL_GRPC_ENDPOINT is required. The Docker driver rejects
   startup if this env var is missing, but it was not mentioned.

2. The supervisor binary must be extracted to a host path before
   starting the gateway. The gateway validates the path at startup
   from inside the container, and the host Docker daemon uses the
   same path as a bind-mount source when creating sandbox containers.
   Extracting to a path inside the gateway container alone is
   insufficient.

3. Docker socket access requires adding the docker group. The gateway
   image runs as nvs:nvs (UID 1000) which does not have access to the
   Docker socket by default.

4. Port binding should remain 127.0.0.1. The Docker driver
   automatically binds the gateway to the bridge network interface
   (gateway_bind_addresses in the driver) so sandbox containers can
   reach it without exposing the port on 0.0.0.0.

5. The mTLS setup section was missing --server-san host.openshell.internal
   on generate-certs. Sandbox containers resolve host.openshell.internal
   to reach the gateway, so this SAN must be present in the server cert.
   The mTLS docker run was also missing --group-add docker, the supervisor
   binary mount, OPENSHELL_GRPC_ENDPOINT, and OPENSHELL_DOCKER_SUPERVISOR_BIN.

Validated by deploying OpenShell on a Fedora Kinoite (bootc) system
using the updated compose.yml.
@ericcurtin ericcurtin force-pushed the docs-container-gateway-docker-driver/ec branch from c1ff3e7 to 05176ab Compare May 18, 2026 11:42
Comment on lines +36 to +37
bind_address = "0.0.0.0:8080"
health_bind_address = "0.0.0.0:8081"
Member


Another PR had a comment about not using 0.0.0.0. Should this be changed here? Is something like :8080 a better format to use?


services:
  gateway:
    image: ghcr.io/nvidia/openshell/gateway:latest
Member


Question: Does the image support environment variable substitution, so that one could use ${IMAGE_TAG:-latest}?
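For reference, substitution of this form is performed by Docker Compose when it parses the file, independently of what the image supports. A minimal sketch, where `IMAGE_TAG` is the hypothetical variable name from the question above:

```yaml
services:
  gateway:
    # Compose reads IMAGE_TAG from the shell environment or a .env file
    # and falls back to "latest" when it is unset or empty.
    image: ghcr.io/nvidia/openshell/gateway:${IMAGE_TAG:-latest}
```

Running `IMAGE_TAG=<tag> docker compose up` would then select a specific tag without editing the file.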


volumes:
# Docker socket — lets the gateway create and manage sandbox containers.
- /var/run/docker.sock:/var/run/docker.sock
Member


I was under the impression that when running dind containers, for example, one needs to run the "outer" container as privileged. Is this not the case here? What permissions are required to communicate over the socket?

Comment on lines +87 to +88
source: /var/lib/openshell
target: /var/lib/openshell
Member


Should this be gateway-specific?
