feat(gpu): derive sandbox access requirements from CDI specs

## Problem Statement

Parent: #1444

OpenShell GPU sandbox setup currently relies on a static filesystem baseline plus runtime enumeration of `/dev/nvidia[0-9]` device nodes. That covers the common native Linux CUDA path, but it does not derive the inner sandbox requirements from the selected CDI device spec.

That leaves non-standard CDI layouts fragile. The container runtime may inject the right device nodes, mounts, libraries, and group metadata into the outer sandbox container, while the OpenShell supervisor still denies or misconfigures the inner workload because those CDI-derived requirements are absent from the Landlock policy or process credentials.

Motivating examples:

- WSL2 exposes NVIDIA GPUs through `/dev/dxg` and WSL-specific libraries. Prior work in #404 and closed PR #608 identified `/dev/dxg`, `/usr/lib/wsl`, and CDI mode as important for WSL GPU support.
- Tegra/Jetson systems can require additional platform device nodes, host-file mounts, and supplemental group handling. Closed PR #625 called out CDI `additionalGIDs`, GID 44 / `video`, and `/dev/nvmap`.
- Other NVIDIA platforms may expose CDI device nodes or library mount destinations that do not match the current static native Linux baseline.

This issue is related to #398, but narrower. #398 tracks using CDI for GPU injection. This issue tracks translating selected CDI metadata into the OpenShell supervisor's inner Landlock policy and process setup.

## Proposed Design

File this as a design/spike issue first. Do not mark it `state:agent-ready` until the runtime contract and implementation split are accepted.

Settled direction:

- Docker is the first build target.
- Docker-selected CDI device IDs come from the existing GPU request path.
- Docker CDI spec directories come from daemon-reported `Info.CDISpecDirs`.
- Docker delivers `cdi-context.json` through the container archive upload API after `create_container` and before `start_container`.
- The supervisor resolves selected opaque CDI IDs from supervisor-only CDI spec mounts and applies derived Landlock/process requirements before exec.
- CDI specs are the source of truth for device nodes, mount destinations, writable single-file mount candidates, and `additionalGIDs`.
- CUDA `/proc` write access is out of scope for the CDI resolver and remains a separate CUDA sandbox runtime compatibility requirement linked to #1522.
- Non-GPU sandboxes receive no CDI context, spec mounts, CDI-derived policy, or CDI warnings.
- Kubernetes, Podman, WSL2 hardware validation, and Tegra/Jetson hardware validation are follow-ups unless their dependencies are available during the first implementation.
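
The create, upload, start ordering above, together with the cleanup requirement listed under Definition Of Done, can be sketched roughly as follows. This is a minimal illustration, not the driver's actual code: the `ContainerRuntime` trait, `launch_with_cdi_context`, `FakeRuntime`, and the `CONTEXT_DEST_DIR` upload destination are all hypothetical names introduced here, and the real Docker client calls will have different signatures.

```rust
/// Hypothetical abstraction over the Docker calls named in this issue.
trait ContainerRuntime {
    fn create_container(&mut self) -> Result<String, String>;
    fn upload_archive(&mut self, id: &str, dest_dir: &str, tar: &[u8]) -> Result<(), String>;
    fn start_container(&mut self, id: &str) -> Result<(), String>;
    fn remove_container(&mut self, id: &str) -> Result<(), String>;
}

/// Assumed destination directory for `cdi-context.json`; not fixed by the issue.
const CONTEXT_DEST_DIR: &str = "/run/openshell/supervisor";

/// Create the container, upload the CDI context archive, then start it.
/// If the upload fails, remove the created container before returning.
fn launch_with_cdi_context<R: ContainerRuntime>(
    rt: &mut R,
    context_tar: &[u8],
) -> Result<String, String> {
    let id = rt.create_container()?;
    if let Err(upload_err) = rt.upload_archive(&id, CONTEXT_DEST_DIR, context_tar) {
        // Best-effort cleanup; the original upload error is what gets reported.
        let _ = rt.remove_container(&id);
        return Err(format!("cdi context upload failed: {upload_err}"));
    }
    rt.start_container(&id)?;
    Ok(id)
}

/// Test double that records the order of calls it receives.
#[derive(Default)]
struct FakeRuntime {
    calls: Vec<&'static str>,
    fail_upload: bool,
}

impl ContainerRuntime for FakeRuntime {
    fn create_container(&mut self) -> Result<String, String> {
        self.calls.push("create");
        Ok("c1".to_string())
    }
    fn upload_archive(&mut self, _id: &str, _dest: &str, _tar: &[u8]) -> Result<(), String> {
        self.calls.push("upload");
        if self.fail_upload {
            Err("simulated upload failure".to_string())
        } else {
            Ok(())
        }
    }
    fn start_container(&mut self, _id: &str) -> Result<(), String> {
        self.calls.push("start");
        Ok(())
    }
    fn remove_container(&mut self, _id: &str) -> Result<(), String> {
        self.calls.push("remove");
        Ok(())
    }
}
```

With `fail_upload` set, the fake records `create`, `upload`, `remove` and never reaches `start`, which is the behavior a driver test could assert on.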

The shared CDI context schema and resolver should live in `openshell-core::cdi` unless implementation uncovers a dependency reason to split it into a dedicated crate. Docker-specific tar upload and bind construction should remain in `openshell-driver-docker`; Landlock and process credential application should remain in `openshell-sandbox`.

The runtime-to-supervisor contract should be a small versioned supervisor-only CDI context. Initial shape:

```json
{
  "version": 1,
  "runtime": "docker",
  "selected_devices": ["nvidia.com/gpu=all"],
  "spec_dirs": [
    {
      "path": "/run/openshell/supervisor/cdi-specs/0",
      "source": "/etc/cdi"
    }
  ]
}
```

`source` is diagnostic only. The supervisor must use container-side `path` values for CDI spec discovery and must not treat host-side source paths as policy paths.
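
As a rough sketch of how the supervisor might hold and validate this context, assuming the type names, the `SPEC_ROOT` constant (taken from the example path above), and the specific validation rules, none of which are settled by this issue. JSON deserialization is left out so the sketch stays dependency-free.

```rust
use std::path::{Component, Path};

/// One supervisor-only spec directory mount.
#[derive(Debug, Clone)]
struct SpecDir {
    path: String,   // container-side path, used for spec discovery
    source: String, // host-side path, diagnostic only
}

/// Mirrors the JSON shape above; field names follow the issue text.
#[derive(Debug, Clone)]
struct CdiContext {
    version: u32,
    runtime: String,
    selected_devices: Vec<String>,
    spec_dirs: Vec<SpecDir>,
}

const SUPPORTED_VERSION: u32 = 1;
/// Assumed mount root for spec directories, inferred from the example.
const SPEC_ROOT: &str = "/run/openshell/supervisor/cdi-specs";

/// Fail closed on unknown versions, empty selections, and spec directories
/// that are not clean absolute paths under the supervisor-only root.
fn validate_context(ctx: &CdiContext) -> Result<(), String> {
    if ctx.version != SUPPORTED_VERSION {
        return Err(format!("unsupported cdi context version {}", ctx.version));
    }
    if ctx.selected_devices.is_empty() {
        return Err("cdi context selects no devices".to_string());
    }
    if ctx.spec_dirs.is_empty() {
        return Err("cdi context lists no spec directories".to_string());
    }
    for dir in &ctx.spec_dirs {
        let p = Path::new(&dir.path);
        let clean = p
            .components()
            .all(|c| matches!(c, Component::RootDir | Component::Normal(_)));
        if !p.is_absolute() || !clean || !p.starts_with(SPEC_ROOT) {
            return Err(format!("spec dir {} is not under {}", dir.path, SPEC_ROOT));
        }
    }
    Ok(())
}

/// The example context from this issue, built by hand.
fn example_context() -> CdiContext {
    CdiContext {
        version: 1,
        runtime: "docker".to_string(),
        selected_devices: vec!["nvidia.com/gpu=all".to_string()],
        spec_dirs: vec![SpecDir {
            path: "/run/openshell/supervisor/cdi-specs/0".to_string(),
            source: "/etc/cdi".to_string(),
        }],
    }
}
```

Only `path` is checked against `SPEC_ROOT`; `source` is never consulted for validation, matching its diagnostic-only role.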

The supervisor-side resolver should:

- resolve opaque fully-qualified CDI device IDs to spec `kind` plus device `name`;
- apply spec-level `containerEdits` once per spec when at least one device from that spec is selected;
- apply device-level `containerEdits` for each selected device;
- extract `deviceNodes[].path` as container-side device paths;
- extract `mounts[].containerPath` as container-side mount destinations;
- extract `additionalGIDs` and apply supplemental groups before exec;
- ignore host paths for Landlock policy;
- ignore CDI hooks, environment variables, Intel RDT, and network devices in the first filesystem/process-credential implementation, other than reporting them in diagnostics.
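
The resolution steps above could look roughly like this. It is a simplified sketch that assumes specs have already been parsed into in-memory structs. `Spec`, `Device`, `ContainerEdits`, `Requirements`, `split_device_id`, and `resolve` are hypothetical names, and only the three extracted fields are modeled.

```rust
use std::collections::BTreeSet;

/// Only the parts of CDI `containerEdits` that this issue extracts.
#[derive(Default, Clone)]
struct ContainerEdits {
    device_nodes: Vec<String>,       // deviceNodes[].path
    mount_destinations: Vec<String>, // mounts[].containerPath
    additional_gids: Vec<u32>,
}

struct Device {
    name: String,
    edits: ContainerEdits,
}

struct Spec {
    kind: String, // e.g. "nvidia.com/gpu"
    edits: ContainerEdits,
    devices: Vec<Device>,
}

/// Deduplicated container-side requirements derived from the selection.
#[derive(Debug, Default, PartialEq)]
struct Requirements {
    device_nodes: BTreeSet<String>,
    mount_destinations: BTreeSet<String>,
    additional_gids: BTreeSet<u32>,
}

impl Requirements {
    fn absorb(&mut self, e: &ContainerEdits) {
        self.device_nodes.extend(e.device_nodes.iter().cloned());
        self.mount_destinations.extend(e.mount_destinations.iter().cloned());
        self.additional_gids.extend(e.additional_gids.iter().copied());
    }
}

/// Split a fully-qualified ID of the form `vendor/class=name` into (kind, name).
fn split_device_id(id: &str) -> Option<(&str, &str)> {
    let (kind, name) = id.split_once('=')?;
    let (vendor, class) = kind.split_once('/')?;
    if vendor.is_empty() || class.is_empty() || name.is_empty() {
        return None;
    }
    Some((kind, name))
}

/// Resolve each selected ID; spec-level edits are applied once per spec,
/// device-level edits once per selected device. Unknown IDs fail closed.
fn resolve(specs: &[Spec], selected: &[&str]) -> Result<Requirements, String> {
    let mut req = Requirements::default();
    let mut specs_applied = BTreeSet::new();
    for id in selected {
        let (kind, name) =
            split_device_id(id).ok_or_else(|| format!("malformed CDI device ID: {id}"))?;
        let (idx, spec, dev) = specs
            .iter()
            .enumerate()
            .filter(|(_, s)| s.kind == kind)
            .find_map(|(i, s)| s.devices.iter().find(|d| d.name == name).map(|d| (i, s, d)))
            .ok_or_else(|| format!("selected CDI device not found: {id}"))?;
        if specs_applied.insert(idx) {
            req.absorb(&spec.edits);
        }
        req.absorb(&dev.edits);
    }
    Ok(req)
}
```

Host paths never enter `Requirements` because the sketch does not model them at all, which is one simple way to guarantee they cannot leak into Landlock policy.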

CDI mount destinations should default to read-only. Read-write access may be allowed for single files only when an explicit policy opt-in is present. Read-write CDI directory mounts should be rejected in the first implementation.
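
One way to express that rule is a single decision function, sketched below. The names are hypothetical, and two points are assumptions rather than decisions made in this issue: that the write request and the file-versus-directory distinction are determined before this function is called, and that a writable file without opt-in is downgraded to read-only rather than rejected.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Access {
    ReadOnly,
    ReadWrite,
}

#[derive(Debug, PartialEq, Clone, Copy)]
enum TargetKind {
    File,
    Directory,
}

/// A CDI mount destination as seen from inside the container.
struct MountRequest<'a> {
    container_path: &'a str,
    wants_write: bool, // assumed to be derived from the CDI mount entry
    kind: TargetKind,  // assumed to be determined by inspecting the destination
}

/// Decide the Landlock access for one CDI mount destination.
/// `rw_opt_in` lists the single-file paths the policy explicitly allows
/// to be writable.
fn decide_mount_access(m: &MountRequest<'_>, rw_opt_in: &[&str]) -> Result<Access, String> {
    if !m.wants_write {
        return Ok(Access::ReadOnly);
    }
    match m.kind {
        // Read-write directories are rejected outright.
        TargetKind::Directory => Err(format!(
            "read-write CDI directory mount rejected: {}",
            m.container_path
        )),
        // Single files become writable only with an explicit opt-in.
        TargetKind::File if rw_opt_in.iter().any(|p| *p == m.container_path) => {
            Ok(Access::ReadWrite)
        }
        // Assumption: without opt-in, fall back to the read-only default.
        TargetKind::File => Ok(Access::ReadOnly),
    }
}
```

If the accepted design prefers to fail closed on writable files without opt-in, only the last match arm changes.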

Kubernetes should use the same supervisor-side resolution path only after a node-local provider can supply selected CDI device IDs and supervisor-only CDI spec mounts for the actual sandbox pod/container. The Kubernetes driver must not infer selected CDI IDs from `nvidia.com/gpu`, because that is a resource request, not an opaque CDI device ID.

## Alternatives Considered

- **Static path expansion:** rejected as the long-term design. It repeats the current failure mode for WSL2, Tegra/Jetson, and future CDI layouts.
- **Inspect injected container state:** useful for diagnostics, but rejected as the authorization source because it cannot reliably distinguish selected CDI paths from unrelated devices or mounts.
- **Driver-side pre-resolution:** rejected for Docker because it would make local and remote daemons behave differently. The supervisor should resolve selected CDI IDs from daemon-host CDI specs mounted into the sandbox.
- **Hard-coded Docker CDI spec dir defaults:** rejected for the initial implementation. Use `Info.CDISpecDirs`; if Docker reports no CDI spec dirs, keep rejecting GPU/CDI sandbox creation as unsupported.

## Test Strategy

Most behavior can be tested without physical GPUs by constructing test-only CDI specs and presenting them to the supervisor through the same runtime-to-supervisor contract used by Docker.

Minimum hardware-free fixture suite:

- `native-single`: one selected device with `/dev/nvidiactl`, `/dev/nvidia0`, and a read-only library mount.
- `native-all`: aggregate `nvidia.com/gpu=all` entry with multiple device nodes.
- `wsl-shape`: fake WSL-style entry with `/dev/dxg` and WSL library mount destinations.
- `tegra-shape`: fake Tegra/Jetson-style entry with `/dev/nvmap`, another platform device node, and `additionalGIDs`.
- `rw-file`: writable single-file mount, accepted only with explicit policy opt-in.
- `rw-directory`: writable directory mount, rejected.
- `missing-device`: selected CDI ID absent from all mounted specs.
- `malformed-spec`: invalid YAML or unsupported CDI shape.
- `unsafe-paths`: relative paths, paths containing `..`, and the broad roots `/`, `/dev`, `/proc`, `/sys`, `/run`, and `/usr` appearing as derived policy paths.
- `duplicate-conflict`: duplicate paths that request incompatible access modes.
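
For the `unsafe-paths` fixture, the check under test might look something like the sketch below. `check_policy_path` and `BROAD_ROOTS` are hypothetical names, and treating exactly these six roots as too broad is an assumption drawn from the fixture description.

```rust
use std::path::{Component, Path};

/// Roots that are too broad to grant as CDI-derived policy paths.
const BROAD_ROOTS: [&str; 6] = ["/", "/dev", "/proc", "/sys", "/run", "/usr"];

/// Reject relative paths, paths with `..` components, and broad roots.
fn check_policy_path(raw: &str) -> Result<(), String> {
    let path = Path::new(raw);
    if !path.is_absolute() {
        return Err(format!("relative CDI path rejected: {raw}"));
    }
    if path.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err(format!("CDI path contains '..': {raw}"));
    }
    // `Path` equality is component-wise, so "/dev/" and "/dev" compare equal.
    if BROAD_ROOTS.iter().any(|root| path == Path::new(root)) {
        return Err(format!("CDI path is too broad: {raw}"));
    }
    Ok(())
}
```

Specific paths below a broad root, such as `/dev/nvmap` or `/usr/lib/wsl`, pass the check; only the roots themselves are refused.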

## Definition Of Done

- [ ] Document the Docker CDI metadata source and runtime-to-supervisor contract.
- [ ] Define how Docker mounts daemon-host CDI spec directories for local and remote daemon cases.
- [ ] Define how Docker uploads `cdi-context.json` without bind-mounting a gateway-local metadata file.
- [ ] Ensure Docker removes a created container if CDI context upload fails before `start_container`.
- [ ] Ensure CDI context, spec mounts, policy changes, and warnings apply only to GPU/CDI sandboxes.
- [ ] Define fail-closed diagnostics for missing selected IDs, missing specs, malformed specs, unsafe paths, writable directory requests, and custom-policy conflicts.
- [ ] Confirm the initial implementation does not add static GPU device-node or library fallbacks.
- [ ] Document that CUDA `/proc` write access is out of scope for the CDI resolver and remains linked to #1522.
- [ ] Define explicit policy opt-in behavior for CDI writable single-file mount destinations.
- [ ] Reject CDI-derived read-write directory mounts.
- [ ] Define Kubernetes selected-device handoff requirements or split that into a follow-up issue.
- [ ] Add resolver tests for representative CDI specs, including `additionalGIDs`.
- [ ] Add supervisor tests using test-only CDI specs and fake/non-standard CDI devices.
- [ ] Add sandbox policy enrichment tests proving CDI-derived paths are included only for GPU-requested sandboxes.
- [ ] Add tests for missing selected IDs, malformed specs, duplicate paths, writable mounts, and unsafe broad paths.
- [ ] Preserve or improve diagnostics for custom policies that conflict with CDI-required GPU paths.
- [ ] Create follow-up issues for Podman, Kubernetes selected-device discovery, WSL2 validation, and Tegra/Jetson validation as needed.

## Agent Investigation

Local design iteration reviewed the Docker GPU/CDI request path, the sandbox GPU baseline, Kubernetes GPU resource handling, related WSL/Tegra history, and the CDI runtime contract constraints.

Current relevant code paths:

- `crates/openshell-driver-docker/src/lib.rs`: Docker GPU support detection from `Info.CDISpecDirs`, CDI `DeviceRequest`, Docker create/start flow.
- `crates/openshell-sandbox/src/lib.rs`: current static GPU Landlock baseline and custom-policy conflict behavior.
- `crates/openshell-core/src/gpu.rs`: shared GPU-to-CDI device ID helper.
- `crates/openshell-driver-kubernetes/src/driver.rs`: Kubernetes `nvidia.com/gpu` resource request path.

Related work:

- #1444 GPU roadmap
- #398 CDI GPU injection migration
- #404 WSL2 GPU passthrough
- #608 closed WSL2 GPU support PR
- #568 Jetson platform compatibility
- #625 closed Tegra/Jetson GPU support PR
- #1486, #1524, #1522 immediate Docker CUDA filesystem policy fixes

