[Relax][TensorRT] Update TensorRT runtime to 10 #19789
TensorRT 10 removed a large set of APIs that the Relax TensorRT BYOC integration relied on, so it failed to compile against TRT >= 10 (apache#19609). Port the runtime and codegen to the TRT10 API and require TensorRT >= 10:

- Lifetime: obj->destroy() -> delete (destroy() removed in TRT10).
- Builder: drop implicit-batch mode (networks are always explicit-batch via createNetworkV2(0); setMaxBatchSize removed); setMaxWorkspaceSize -> setMemoryPoolLimit(kWORKSPACE); buildEngineWithConfig -> buildSerializedNetwork + deserializeCudaEngine, keeping the IRuntime alive alongside the engine.
- Execution: the binding-index model (getNbBindings / getBindingIndex / setBindingDimensions / execute / executeV2) -> the named-tensor model (getNbIOTensors / setInputShape / setTensorAddress / enqueueV3); deserializeCudaEngine drops the trailing IPluginFactory* argument.
- Layers: addConvolution / addPooling / addDeconvolution / addPadding -> the *Nd variants; set{Stride,Dilation} -> *Nd; IFullyConnectedLayer / addFullyConnected removed -> dense rebuilt with addConstant + addMatrixMultiply.
- Add a build-time guard that emits a clear error on TensorRT < 10.

Also fix pre-existing issues that prevented this path from running end-to-end: the runtime had drifted from the current tvm-ffi API (TVMTensorCopyToBytes / TVMGetLastError, VectorToTrtDims over ffi::Array, a stale `override` on the destructor), and the conv converters read a Relay-era "channels" attribute that Relax does not emit (output channels are now derived from the kernel shape).

Correctness fixes from an old-vs-new parity review, plus tests:

- Conv1D assumed an implicit batch dimension and dropped the spatial dimension under explicit batch; the reshape now derives from the full input rank.
- INT8 calibration: the per-input element count included the batch dimension even though the calibrator multiplies by the batch size itself, so it over-read the input; it now excludes the batch dimension. The calibrator's device buffers were sized for a single sample, so copying a full batch wrote past their end; they are now sized for a full batch. Both bugs crashed INT8 calibration for batch > 1.
- Single-engine reuse now requires an exact batch match, since an explicit-batch engine's optimization profile pins the built batch size.
- TRT_HAS_IMPLICIT_BATCH is unconditionally false and no longer calls the deprecated hasImplicitBatchDimension().
- Run on TVM's current CUDA stream instead of the default stream.
- Warn instead of silently ignoring use_implicit_batch=True, and default it to False in the codegen config.
- Null-check the engine build/deserialize paths and free the runtime on failure.
- conv2d_transpose / conv3d_transpose now use the IOHW / IODHW kernel layout (Relax's default, which also matches TensorRT's deconvolution weight layout) instead of the Relay-era OIHW assumption, so the weight is passed through directly and the output channel count comes from the second kernel dimension.
- Remove dead pre-5.1.5 padding blocks and unused builder members.
- Add offload tests for conv1d, max_pool2d, avg_pool2d, softmax, sigmoid, tanh, conv2d_transpose, conv3d_transpose, and INT8 calibration.

Verified: builds against TensorRT 10.16 with CUDA 12.8, and the added tests pass on both an RTX 2070 (Turing) and an RTX 5090 (Blackwell).

Fixes apache#19609
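The shape bookkeeping behind several of the changes above (dense as a matrix multiply, the Conv1D reshape, deconvolution output channels, INT8 calibration buffer sizes, and exact-batch engine reuse) can be sketched in plain Python. This is an illustrative sketch, not code from the PR: every helper name is hypothetical, shapes are plain lists, and it assumes a Relay-style dense weight stored as [out_features, in_features], a Conv1D lowered to a 2D convolution by appending a unit spatial axis, and groups == 1 for the transposed convolutions.

```python
from math import prod


def dense_as_matmul(x, weight):
    """Dense (no bias) written as a plain matmul (hypothetical helper).

    Assumes a Relay-style weight of shape [out_features, in_features], so the
    result is x @ weight^T: y[i][o] = sum_k x[i][k] * weight[o][k].
    """
    return [[sum(a * b for a, b in zip(row, w_row)) for w_row in weight] for row in x]


def conv1d_reshape_dims(input_dims):
    """Explicit batch: (N, C, W) -> (N, C, W, 1) so a 2D convolution can run it.

    Every input dimension is kept and a unit spatial axis is appended; a
    hypothetical rank-2 rule such as [dims[0], dims[1], 1] would drop W here.
    """
    return list(input_dims) + [1]


def conv_out_channels(kernel_shape, transpose=False):
    """Output channels from the kernel shape instead of a "channels" attribute.

    conv: OIHW / OIDHW kernel, so O is kernel_shape[0].
    conv_transpose: IOHW / IODHW kernel, so O is kernel_shape[1] (groups == 1 assumed).
    """
    return kernel_shape[1] if transpose else kernel_shape[0]


def calibration_sizes(input_shape, batch_size, itemsize):
    """Per-sample element count and device buffer bytes for one calibration batch.

    input_shape is the explicit-batch shape (N, ...). The per-sample count skips
    dim 0, and the batch size is applied exactly once, when sizing the buffer.
    """
    per_sample = prod(input_shape[1:])
    buffer_bytes = per_sample * batch_size * itemsize
    return per_sample, buffer_bytes


def can_reuse_engine(built_batch, requested_batch):
    """An explicit-batch profile pins the built batch size, so require equality."""
    return requested_batch == built_batch
```

Applying the batch size in exactly one place (the buffer size) is what keeps the element count and the buffer allocation consistent for batch > 1.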
Code Review
This pull request updates the TVM TensorRT integration to target the TensorRT 10 API, which removes implicit-batch mode, binding indices, and several deprecated layer creation APIs (such as addFullyConnected). The changes transition the codebase to explicit-batch mode, update the operator converters to use the new Nd layer APIs, and manage the deserialization runtime lifetime alongside the engine. The review feedback highlights several critical safety improvements, specifically recommending null-pointer checks for createInferRuntime, addConstant, and addShuffle calls, guarding against integer overflow when handling dynamic dimensions during INT8 calibration, and preventing out-of-bounds access when resolving the device ID from input_var_eid_.
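Two of the suggested guards can be sketched as follows. The function names and the list-based representation of entry IDs and devices are hypothetical stand-ins, not the runtime's actual code. Python integers do not overflow, so the explicit INT64_MAX comparison stands in for the check a C++ int64_t accumulator would need; TensorRT reports dynamic dimensions as -1.

```python
INT64_MAX = 2**63 - 1


def checked_element_count(dims):
    """Element count of a static shape, checked as a C++ int64_t product would be.

    Dynamic dimensions (reported as -1) cannot size a calibration buffer, and a
    product above INT64_MAX would overflow in C++, so both raise here.
    """
    count = 1
    for d in dims:
        if d < 0:
            raise ValueError(f"dynamic dimension {d} cannot size a buffer")
        count *= d
        if count > INT64_MAX:
            raise OverflowError("element count does not fit in int64")
    return count


def resolve_device_id(input_var_eid, entry_device_ids):
    """Hypothetical: device of the first subgraph input, with bounds checks.

    input_var_eid lists the graph-entry ids of the inputs; entry_device_ids maps
    an entry id (used as a list index) to its CUDA device id.
    """
    if not input_var_eid:
        raise ValueError("subgraph has no inputs to infer a device from")
    eid = input_var_eid[0]
    if not 0 <= eid < len(entry_device_ids):
        raise IndexError(f"entry id {eid} is out of range")
    return entry_device_ids[eid]
```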
This PR fixes #19609. All tests pass locally. This PR contains only API updates; no new features are added.
…#19810) This PR is the follow-up to #19789. The current TensorRT BYOC converters were ported from Relay and still read attribute names and shapes that no longer match the Relax ops, so most ops crashed ("Key: <name> is not found") or produced wrong results when offloaded. This PR changes:

- Converters (tensorrt_ops.cc): port reduce, matmul, expand_dims, layer_norm, clip, reshape, strided_slice, split and layout_transform to read Relax's attributes/arguments. Notable differences: clip min/max are PrimValue arguments (not a_min/a_max attrs), reshape's shape is a Shape argument, matmul has no transpose flags, split is multi-output with no "mode", and layout_transform is an IndexMap rather than src/dst_layout strings. Unsupported cases (non-static reshape, non-permutation layout_transform) now raise a clear error instead of crashing.
- Codegen (codegen.cc): serialize an op's non-tensor arguments (PrimValue / ShapeExpr / tuple) as "arg_"-prefixed node attributes, materialize a reduce op's all-axes default, and translate a pure-permutation layout_transform IndexMap into a transpose order.
- Runtime: disable the TF32 builder flag so offloaded FP32 subgraphs match TVM's FP32 reference, and use a process-lifetime TensorRT logger (a per-runtime logger was left dangling once its runtime was destroyed, corrupting the heap during TensorRT teardown).

All tests are validated locally.
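The two codegen translations described for #19810 can be sketched in plain Python: turning a pure-permutation layout_transform IndexMap into a transpose order, and making a reduce op's all-axes default explicit. The IndexMap is modelled here as two lists of index names, a hypothetical simplification of the real object, and all helper names are hypothetical. The bitmask helper reflects that TensorRT's addReduce takes the reduced axes as a bit mask.

```python
def index_map_to_transpose_order(initial_indices, final_indices):
    """Transpose order for a layout_transform whose IndexMap only permutes axes.

    Example map: (n, c, h, w) -> (n, h, w, c). The result `order` satisfies
    out_shape[j] == in_shape[order[j]], the usual transpose-axes convention.
    """
    is_permutation = len(set(initial_indices)) == len(initial_indices) and sorted(
        final_indices
    ) == sorted(initial_indices)
    if not is_permutation:
        raise ValueError("layout_transform is not a pure permutation")
    return [initial_indices.index(name) for name in final_indices]


def materialize_reduce_axes(axis, ndim):
    """Make a reduction's axes explicit: axis=None means every axis."""
    if axis is None:
        return list(range(ndim))
    # Normalize negative axes so each axis maps to one bit position.
    return sorted(a + ndim if a < 0 else a for a in axis)


def axes_to_bitmask(axes):
    """Pack axes into a reduce bit mask (bit i set means axis i is reduced)."""
    mask = 0
    for a in axes:
        mask |= 1 << a
    return mask
```

A map that is not a pure permutation, such as one producing `c // 4`, fails the check, which corresponds to raising a clear error for non-permutation layout_transform instead of crashing.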