
[ML] Add xpack.ml.model_graph_validation_enabled cluster setting#145157

Merged
edsavage merged 16 commits into elastic:main from edsavage:feature/model-graph-validation-setting
Apr 9, 2026


@edsavage edsavage commented Mar 29, 2026

Summary

Adds a dynamic node-scope cluster setting to control TorchScript model graph validation in the pytorch_inference native process.

PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.model_graph_validation_enabled": false
  }
}

When set to false, the pytorch_inference process is launched with --skipModelValidation, bypassing the operation allowlist check. Default is true (validation enabled).
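For operators who script cluster changes instead of using the console, the request above might be assembled roughly as follows. This is a minimal sketch using the JDK's built-in `java.net.http` types. The class name, method names, and the `http://localhost:9200` endpoint are assumptions made for illustration, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class GraphValidationSettingRequest {

    // Setting key as given in the PR description.
    static final String SETTING_KEY = "xpack.ml.model_graph_validation_enabled";

    // Builds the JSON body for a persistent cluster settings update.
    static String settingsBody(boolean validationEnabled) {
        return "{\"persistent\":{\"" + SETTING_KEY + "\":" + validationEnabled + "}}";
    }

    // Builds (but does not send) the PUT _cluster/settings request.
    // baseUrl is a placeholder for the cluster endpoint (an assumption).
    static HttpRequest buildRequest(String baseUrl, boolean validationEnabled) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_cluster/settings"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(settingsBody(validationEnabled)))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://localhost:9200", false);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(settingsBody(false));
    }
}
```

A real client would also need authentication and TLS configuration, which are omitted here.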

This provides an operator-accessible escape hatch for all deployment types (self-managed, Cloud, serverless) via the cluster settings API.

Changes

  • MachineLearningField.java — defines xpack.ml.model_graph_validation_enabled (dynamic, node-scope)
  • MachineLearning.java — registers the setting
  • NativePyTorchProcessFactory.java — reads the setting, subscribes to dynamic updates, passes to builder
  • PyTorchBuilder.java — adds --skipModelValidation to the command when validation is disabled
  • PyTorchBuilderTests.java — updated existing tests + new testBuildWithValidationDisabled
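The builder change listed above can be pictured with a small sketch. This is not the actual `PyTorchBuilder` code: the class `PyTorchCommandSketch`, the method `buildCommand`, and the process path are hypothetical, and the real builder passes other arguments that are left out here. Only the `--skipModelValidation` flag comes from the PR description.

```java
import java.util.ArrayList;
import java.util.List;

public class PyTorchCommandSketch {

    // Flag name taken from the PR description.
    static final String SKIP_MODEL_VALIDATION_ARG = "--skipModelValidation";

    // Hypothetical helper: assembles the command line for the native process.
    // The flag is appended only when validation has been disabled.
    static List<String> buildCommand(String processPath, boolean validationEnabled) {
        List<String> command = new ArrayList<>();
        command.add(processPath);
        if (validationEnabled == false) {
            command.add(SKIP_MODEL_VALIDATION_ARG);
        }
        return command;
    }

    public static void main(String[] args) {
        // "./pytorch_inference" is a placeholder path, not the real install location.
        System.out.println(buildCommand("./pytorch_inference", true));
        System.out.println(buildCommand("./pytorch_inference", false));
    }
}
```

Keeping validation as the default and emitting the flag only in the disabled case means an unset or misread setting falls back to the safer behavior.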

Cross-repo dependency

  • ml-cpp: elastic/ml-cpp#3013 adds the --skipModelValidation CLI flag to the pytorch_inference binary

Test plan

  • PyTorchBuilderTests pass
  • Integration: set xpack.ml.model_graph_validation_enabled: false, deploy a model, verify WARN log "Model graph validation SKIPPED"
  • Integration: set back to true, deploy a model, verify validation runs normally

Depends on: elastic/ml-cpp#3013

Made with Cursor

Adds a dynamic node-scope setting to control TorchScript model graph
validation. When set to false, the pytorch_inference process is
launched with --skipModelValidation, bypassing the operation
allowlist/forbidden list check.

This provides an operator-accessible escape hatch for all deployment
types (self-managed, Cloud, serverless) via the cluster settings API,
without requiring infrastructure access or a rebuild.

The setting is dynamic — changes take effect on the next model
deployment without restarting the node.
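One way to picture "takes effect on the next model deployment" is a factory that holds the current value in a volatile field, updates it from a settings listener, and reads it only when a process is launched. The sketch below is an illustration under those assumptions: `SettingsStub` and `ProcessFactory` are hypothetical stand-ins, not the Elasticsearch classes touched by this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class DynamicValidationSketch {

    // Hypothetical stand-in for a settings service that notifies listeners on change.
    static class SettingsStub {
        private final List<Consumer<Boolean>> listeners = new ArrayList<>();

        void addUpdateConsumer(Consumer<Boolean> listener) {
            listeners.add(listener);
        }

        void update(boolean newValue) {
            listeners.forEach(listener -> listener.accept(newValue));
        }
    }

    // Hypothetical factory: keeps the latest setting value and reads it at launch time.
    static class ProcessFactory {
        private volatile boolean validationEnabled;

        ProcessFactory(SettingsStub settings, boolean initialValue) {
            this.validationEnabled = initialValue;
            settings.addUpdateConsumer(value -> this.validationEnabled = value);
        }

        // Extra arguments for a process launched right now.
        List<String> launchArgs() {
            return validationEnabled ? List.of() : List.of("--skipModelValidation");
        }
    }

    public static void main(String[] args) {
        SettingsStub settings = new SettingsStub();
        ProcessFactory factory = new ProcessFactory(settings, true);

        List<String> firstLaunch = factory.launchArgs(); // captured before the change
        settings.update(false);                          // operator disables validation
        List<String> nextLaunch = factory.launchArgs();  // the next deployment sees the new value

        System.out.println("first launch args: " + firstLaunch);
        System.out.println("next launch args:  " + nextLaunch);
    }
}
```

A process that is already running keeps the arguments it was started with. Only launches after the update pick up the new value, which is why no node restart is needed.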

Companion to elastic/ml-cpp#3013 which adds the --skipModelValidation
CLI flag to the pytorch_inference binary.

Made-with: Cursor

Copilot AI left a comment


Pull request overview

Adds an operator-configurable ML cluster setting to control TorchScript model graph validation for the pytorch_inference native process, allowing operators to bypass the operation allowlist check by launching the process with --skipModelValidation.

Changes:

  • Introduces xpack.ml.model_graph_validation_enabled as a dynamic node-scope cluster setting (default true).
  • Wires the setting into NativePyTorchProcessFactory and passes it into PyTorchBuilder.
  • Extends PyTorchBuilder to append --skipModelValidation when validation is disabled, with corresponding unit test updates.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/MachineLearningField.java | Defines the new cluster setting and documents its intent/security implications. |
| x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MachineLearning.java | Registers the new setting with the ML plugin. |
| x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/pytorch/process/NativePyTorchProcessFactory.java | Reads the setting and applies it to new pytorch_inference process launches; subscribes to dynamic updates. |
| x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/pytorch/process/PyTorchBuilder.java | Adds --skipModelValidation flag emission when validation is disabled. |
| x-pack/plugin/ml/src/test/java/org/elasticsearch/xpack/ml/inference/pytorch/process/PyTorchBuilderTests.java | Updates existing expectations and adds coverage for the disabled-validation command line. |


…ml/MachineLearningField.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@edsavage edsavage added >non-issue :ml Machine learning labels Mar 30, 2026
Register xpack.ml.model_graph_validation_enabled on MachineLearning (single
registration), use it in NativePyTorchProcessFactory, document the setting,
add ModelGraphValidationEnabledIT, and adjust PyTorchModelRestTestCase.

Resolves merge with duplicate MachineLearningField registration.

Made-with: Cursor
Remove duplicate Setting from MachineLearning; register and consume
MachineLearningField.MODEL_GRAPH_VALIDATION_ENABLED only.

Made-with: Cursor
@edsavage edsavage marked this pull request as ready for review April 1, 2026 21:10
@edsavage edsavage requested a review from a team as a code owner April 1, 2026 21:10
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Apr 1, 2026
@elasticsearchmachine

Pinging @elastic/ml-core (Team:ML)

Comment thread docs/reference/elasticsearch/configuration-reference/machine-learning-settings.md Outdated
@edsavage edsavage requested a review from valeriy42 April 7, 2026 22:12

@valeriy42 valeriy42 left a comment


LGTM in general. Just a few comments. It would be great to have it in 9.4 alongside the main change, but I am not sure whether this is still possible.


I think you don't need this test. It essentially tests the Elasticsearch settings infrastructure itself.


$$$xpack.ml.model_graph_validation_enabled$$$

`xpack.ml.model_graph_validation_enabled` {applies_to}`stack: ga 9.4`

  1. Yesterday was feature freeze (FF) for 9.4. Is it still possible to add it?
  2. Maybe calling it xpack.ml.trained_models.graph_validation_enabled would fit the usual naming conventions better.

Contributor Author


I think we may have just missed the boat for 9.4.0, but it will be there for the 9.4.1 patch release.

edsavage added 3 commits April 9, 2026 09:42
Rename xpack.ml.model_graph_validation_enabled to
xpack.ml.trained_models.graph_validation_enabled per naming
conventions. Remove ModelGraphValidationEnabledIT as it only
tests the settings infrastructure.

Made-with: Cursor
@edsavage edsavage merged commit 8d1bbc3 into elastic:main Apr 9, 2026
35 of 36 checks passed
lukewhiting pushed a commit to lukewhiting/elasticsearch that referenced this pull request Apr 14, 2026
…stic#145157)

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>

Labels

:ml Machine learning >non-issue Team:ML Meta label for the ML team v9.4.0


5 participants