improvement(helm): update GPU device plugin and add cert-manager issuers #3036
Greptile Overview
Greptile Summary
This PR improves GPU support and certificate management infrastructure for the Sim Helm chart.
Key Changes:
Improvements:
Confidence Score: 4.5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Helm as Helm Install
participant K8s as Kubernetes API
participant CM as cert-manager
participant GPU as GPU Device Plugin
participant Node as GPU Node
Note over Helm,Node: cert-manager Issuer Bootstrap (if certManager.enabled=true)
Helm->>K8s: Create SelfSigned ClusterIssuer
K8s->>CM: Register bootstrap issuer
Helm->>K8s: Create Root CA Certificate
K8s->>CM: Request certificate from bootstrap issuer
CM->>CM: Generate self-signed root CA
CM->>K8s: Store CA cert in secret (cert-manager namespace)
Helm->>K8s: Create CA ClusterIssuer
K8s->>CM: Register CA issuer (references root CA secret)
Note over CM: CA issuer auto-reconciles when secret ready
Note over Helm,Node: GPU Device Plugin Setup (if ollama.gpu.enabled=true)
Helm->>K8s: Create ConfigMap with GPU strategy config
Note over K8s: Config includes MIG or time-slicing settings
Helm->>K8s: Deploy DaemonSet (v0.18.2)
K8s->>Node: Schedule pod on nodes with accelerator=nvidia
Node->>GPU: Mount ConfigMap at /etc/device-plugin/
GPU->>GPU: Parse config.yaml (MIG or time-slicing)
GPU->>Node: Register GPU resources with kubelet
Note over GPU,Node: GPUs now available as nvidia.com/gpu
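The issuer bootstrap in the first half of the diagram follows the usual cert-manager pattern of a self-signed issuer that signs a root CA, which then backs a CA issuer. A minimal sketch of those three resources is shown here. All resource and secret names are hypothetical placeholders, not the names used by this chart, and the key algorithm is an assumption.

```yaml
# Step 1: self-signed ClusterIssuer used only to bootstrap the root CA.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-bootstrap        # hypothetical name
spec:
  selfSigned: {}
---
# Step 2: root CA certificate, signed by the bootstrap issuer.
# For a ClusterIssuer, the CA secret must live in cert-manager's
# cluster resource namespace (cert-manager by default).
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: root-ca                     # hypothetical name
  namespace: cert-manager
spec:
  isCA: true
  commonName: root-ca               # hypothetical common name
  secretName: root-ca-secret        # hypothetical secret name
  privateKey:
    algorithm: ECDSA                # assumption; RSA also works
    size: 256
  issuerRef:
    name: selfsigned-bootstrap
    kind: ClusterIssuer
    group: cert-manager.io
---
# Step 3: CA ClusterIssuer that signs workload certificates
# using the root CA secret created in step 2.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: ca-issuer                   # hypothetical name
spec:
  ca:
    secretName: root-ca-secret
```

All three can be applied in one pass because the CA issuer only becomes ready once the secret exists, which matches the auto-reconcile note in the diagram.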
2 files reviewed, 2 comments
@greptile
2 files reviewed, 2 comments
@greptile
No files reviewed, no comments
Cursor Bugbot has reviewed your changes and found 1 potential issue.
failOnInitError: false
plugin:
  passDeviceSpecs: true
  deviceListStrategy: envvar
Invalid config structure for NVIDIA device plugin settings
Medium Severity
The ConfigMap places passDeviceSpecs and deviceListStrategy under a plugin: section, but the NVIDIA k8s-device-plugin config schema expects these settings under the flags: section. The original code passed these as CLI arguments (--pass-device-specs=true, --device-list-strategy=envvar), which map to the flags: section in config file format. With the current structure, the device plugin may ignore these settings and use default values instead, potentially causing GPU device passthrough and enumeration issues.
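A sketch of the layout the comment describes, assuming the config file format documented in the NVIDIA k8s-device-plugin README, where `plugin:` is nested inside `flags:` rather than placed at the top level. The values are the ones from the diff above. The exact schema should be checked against the plugin version the chart deploys (v0.18.2).

```yaml
# config.yaml mounted at /etc/device-plugin/ (sketch, not the chart's actual file)
version: v1
flags:
  failOnInitError: false
  plugin:
    # Config-file equivalent of --pass-device-specs=true
    passDeviceSpecs: true
    # Config-file equivalent of --device-list-strategy=envvar
    deviceListStrategy: envvar
```

If the rendered ConfigMap has `plugin:` as a sibling of `flags:` instead of a child, moving it one level down is likely the whole fix.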
Summary
Type of Change
Testing
Tested with helm lint and helm template - all templates render correctly
Checklist