feat: add API and webhook validation for GCS active-passive-head#4910

Open
gangli113 wants to merge 1 commit into ray-project:master from gangli113:feat/active-passive-head

Conversation

@gangli113

Why are these changes needed?

This PR implements the "Customer-Facing API Changes" part of the GCS active-passive-head feature.

Related issue number

ray-project/ray#63643

Checks

  • I've made sure the tests are passing.
  • Testing Strategy
    • Unit tests
    • Manual tests
    • This PR is not tested :(

@gangli113 gangli113 marked this pull request as draft June 11, 2026 23:49
Comment thread ray-operator/pkg/webhooks/v1/raycluster_webhook.go
Comment thread ray-operator/pkg/webhooks/v1/raycluster_webhook.go

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a638b94649

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread ray-operator/config/crd/bases/ray.io_rayjobs.yaml
Comment thread ray-operator/apis/ray/v1/raycluster_types.go
@gangli113 gangli113 force-pushed the feat/active-passive-head branch from a638b94 to 2f43881 Compare June 12, 2026 00:04
@gangli113 gangli113 marked this pull request as ready for review June 12, 2026 00:45

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2f43881caf

// EnableActivePassiveHead enables active-passive high availability for the GCS.
// If enabled, KubeRay will provision a standby head node to ensure quick recovery.
// +kubebuilder:default:=false
EnableActivePassiveHead *bool `json:"enableActivePassiveHead,omitempty"`

P2: Regenerate the Helm-packaged CRDs

Adding these API fields only to config/crd/bases leaves Helm installs with the old schemas: I checked helm-chart/kuberay-operator/crds/ray.io_ray{clusters,jobs,services,cronjobs}.yaml in this commit and none contains enableActivePassiveHead or the leader-election fields. Users who install/upgrade KubeRay via the published Helm chart will have these fields rejected under strict validation or pruned before the webhook/controller sees them, so the new active-passive configuration cannot be used in that deployment path.

@gangli113 gangli113 force-pushed the feat/active-passive-head branch from 2f43881 to 32767e2 Compare June 12, 2026 17:41

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 32767e2. Configure here.


if err := w.validateGcsFTOptions(rayCluster); err != nil {
	allErrs = append(allErrs, err)
}

Active-passive rules webhook-only

Medium Severity

Active-passive GCS checks live only on the RayCluster admission webhook, not in ValidateRayClusterSpec, which RayJob and RayService reconcilers use via ValidateRayJobSpec / ValidateRayServiceSpec. Invalid embedded gcsFaultToleranceOptions can pass controller validation and leave RayJobs requeueing on failed RayCluster creates instead of ValidationFailed.
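A possible shape for the fix is to keep the active-passive checks in one function that both the webhook and the shared spec validator call. The sketch below is illustrative only: the types are reduced stand-ins for the KubeRay API types, the lowercase function names are hypothetical, and the rule itself (`HypotheticalConflict`) is a placeholder, since the actual active-passive rules are not quoted in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// GcsFaultToleranceOptions is a reduced stand-in for the KubeRay type.
// HypotheticalConflict is an invented field used only to have a rule to check.
type GcsFaultToleranceOptions struct {
	EnableActivePassiveHead *bool
	HypotheticalConflict    bool
}

// RayClusterSpec is a reduced stand-in for the KubeRay type.
type RayClusterSpec struct {
	GcsFaultToleranceOptions *GcsFaultToleranceOptions
}

// validateActivePassive holds the active-passive checks in one place.
// The rule shown is a placeholder, not the rule implemented in this PR.
func validateActivePassive(opts *GcsFaultToleranceOptions) error {
	if opts == nil || opts.EnableActivePassiveHead == nil || !*opts.EnableActivePassiveHead {
		return nil // feature off: nothing to validate
	}
	if opts.HypotheticalConflict {
		return errors.New("enableActivePassiveHead conflicts with hypotheticalConflict")
	}
	return nil
}

// validateRayClusterSpec stands in for the shared spec validator that the
// reconcilers call.
func validateRayClusterSpec(spec *RayClusterSpec) error {
	return validateActivePassive(spec.GcsFaultToleranceOptions)
}

// webhookValidate stands in for the admission webhook; it delegates instead
// of carrying its own copy of the rules.
func webhookValidate(spec *RayClusterSpec) error {
	return validateRayClusterSpec(spec)
}

// validateRayJobSpec applies the same checks to an embedded cluster spec.
func validateRayJobSpec(embedded *RayClusterSpec) error {
	if embedded == nil {
		return nil
	}
	if err := validateRayClusterSpec(embedded); err != nil {
		return fmt.Errorf("rayClusterSpec: %w", err)
	}
	return nil
}

func main() {
	on := true
	bad := &RayClusterSpec{GcsFaultToleranceOptions: &GcsFaultToleranceOptions{
		EnableActivePassiveHead: &on,
		HypotheticalConflict:    true,
	}}
	fmt.Println("webhook:", webhookValidate(bad))
	fmt.Println("rayjob: ", validateRayJobSpec(bad))
}
```

With this layout, an invalid embedded spec fails controller-side validation with the same error the webhook would return, so the owning resource can be marked as failed validation rather than requeueing on rejected RayCluster creates.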

… Head HA

Signed-off-by: Gang Li <ganglica@google.com>
@gangli113 gangli113 force-pushed the feat/active-passive-head branch from 32767e2 to 4788d90 Compare June 12, 2026 21:04