Expected behavior
TVM should produce the same results as onnxruntime and ONNX's ReferenceEvaluator.
Actual behavior
For the model in the attachment (11.onnx), onnxruntime and ONNX's ReferenceEvaluator produce identical results:
onnxruntime: [[[[ 0.52760196 -0.04696967 0.13909698 0.33770403]
[ 0.00713499 -0.0047839 0.07727996 0.09848484]
[ 1.1356945 1.2606536 1.0541786 0.07991865]
[ 1.7707846 -0.1069039 0.5416299 1.1630629 ]]
[[ 1.5288247 1.5974303 0.04450445 1.2441877 ]
[ 0.37789103 0.20678943 0.2639845 0.46727613]
[ 1.0393754 2.0902128 0.22515067 1.8636966 ]
[ 1.2390026 -0.03022202 0.1429838 2.5852468 ]]
[[ 1.0609826 0.19212584 0.23427449 1.3817313 ]
[ 0.2130472 0.12426434 0.18794645 1.7725699 ]
[ 0.38522267 0.55802476 0.48586282 0.12431115]
[ 1.6056815 -0.088125 0.46956664 0.5826947 ]]
[[ 0.4485376 3.0486135 0.2851691 1.221788 ]
[ 0.12897041 0.56625 0.20755884 0.8285841 ]
[ 0.7572699 -0.03610509 0.8448761 1.3712262 ]
[ 0.9805093 0.9206943 1.141221 2.1911495 ]]]]
ReferenceEvaluator: [[[[ 0.52760196 -0.04696967 0.13909698 0.33770403]
[ 0.00713499 -0.0047839 0.07727996 0.09848484]
[ 1.1356945 1.2606536 1.0541786 0.07991865]
[ 1.7707846 -0.1069039 0.5416299 1.1630629 ]]
[[ 1.5288247 1.5974303 0.04450445 1.2441877 ]
[ 0.37789103 0.20678943 0.2639845 0.46727613]
[ 1.0393754 2.0902128 0.22515067 1.8636966 ]
[ 1.2390026 -0.03022202 0.1429838 2.5852468 ]]
[[ 1.0609826 0.19212584 0.23427449 1.3817313 ]
[ 0.2130472 0.12426434 0.18794645 1.7725699 ]
[ 0.38522267 0.55802476 0.48586282 0.12431115]
[ 1.6056815 -0.088125 0.46956664 0.5826947 ]]
[[ 0.4485376 3.0486135 0.2851691 1.221788 ]
[ 0.12897041 0.56625 0.20755884 0.8285841 ]
[ 0.7572699 -0.03610509 0.8448761 1.3712262 ]
[ 0.9805093 0.9206943 1.141221 2.1911495 ]]]]
However, TVM outputs different results as follows:
TVM: [[[[0.52760196 0.50379753 0.13909698 0.23304316]
[0.00713499 0.05131219 0.07727996 0.09848484]
[1.1356945 1.2606536 1.8464966 0.05515035]
[1.7707846 1.1466533 0.94871765 1.1630629 ]]
[[1.5288247 1.5974303 0.07795388 1.2441877 ]
[0.37789103 0.20678943 0.2639845 0.32245842]
[1.0393754 2.0902128 0.3943734 1.8636966 ]
[1.2390026 0.3241619 0.25045007 1.78403 ]]
[[1.0609826 0.19212584 0.4103546 1.3817313 ]
[0.2130472 0.12426434 0.18794645 1.223217 ]
[0.38522267 0.55802476 0.85103613 0.12431115]
[1.6056815 0.94523036 0.82249177 0.5826947 ]]
[[0.4485376 3.0486135 0.2851691 1.221788 ]
[0.12897041 0.56625 0.20755884 0.8285841 ]
[0.7572699 0.3872638 0.8448761 1.3712262 ]
[0.9805093 0.9206943 1.141221 1.5120709 ]]]]
21.9% of the elements (14 / 64) are mismatched:
Mismatched elements: 14 / 64 (21.9%)
Max absolute difference among violations: 1.2535572
Max relative difference among violations: 11.726019
ACTUAL: array([[[[0.527602, 0.503798, 0.139097, 0.233043],
[0.007135, 0.051312, 0.07728 , 0.098485],
[1.135695, 1.260654, 1.846497, 0.05515 ],...
DESIRED: array([[[[ 0.527602, -0.04697 , 0.139097, 0.337704],
[ 0.007135, -0.004784, 0.07728 , 0.098485],
[ 1.135695, 1.260654, 1.054179, 0.079919],...
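To make the mismatch easier to localize, here is a minimal sketch that lists the positions where two outputs disagree beyond the tolerances used in the reproduction script (rtol=0.1, atol=0.1). The helper name `find_mismatches` is hypothetical and not part of TVM or NumPy; the sample values are the first channel of the onnxruntime and TVM outputs printed above.

```python
import numpy as np


def find_mismatches(actual, desired, rtol=0.1, atol=0.1):
    """Return index tuples where |actual - desired| > atol + rtol * |desired|.

    Hypothetical helper; it applies the same tolerance rule as
    numpy.testing.assert_allclose, with `desired` as the reference.
    """
    actual = np.asarray(actual, dtype=np.float64)
    desired = np.asarray(desired, dtype=np.float64)
    ok = np.isclose(actual, desired, rtol=rtol, atol=atol)
    return [tuple(int(i) for i in idx) for idx in np.argwhere(~ok)]


# First channel of the onnxruntime output (copied from the report).
ort_ch0 = np.array([
    [0.52760196, -0.04696967, 0.13909698, 0.33770403],
    [0.00713499, -0.0047839, 0.07727996, 0.09848484],
    [1.1356945, 1.2606536, 1.0541786, 0.07991865],
    [1.7707846, -0.1069039, 0.5416299, 1.1630629],
])

# First channel of the TVM output (copied from the report).
tvm_ch0 = np.array([
    [0.52760196, 0.50379753, 0.13909698, 0.23304316],
    [0.00713499, 0.05131219, 0.07727996, 0.09848484],
    [1.1356945, 1.2606536, 1.8464966, 0.05515035],
    [1.7707846, 1.1466533, 0.94871765, 1.1630629],
])

mismatches = find_mismatches(tvm_ch0, ort_ch0)
for idx in mismatches:
    print(idx, "TVM:", tvm_ch0[idx], "onnxruntime:", ort_ch0[idx])
```

With these tolerances the first channel has four violations, at (0, 1), (2, 2), (3, 1) and (3, 2). Three further elements in that channel also differ, but by less than the tolerance, so they are not counted among the 14 reported mismatches.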
Environment
OS: Ubuntu 20.04
TVM: 0.23.dev0 (f4e28d3)
onnxruntime: 1.23.2
Steps to reproduce
This bug can be reproduced by the following code with the model in the attachment.
import numpy as np
import onnx
from onnx.reference import ReferenceEvaluator
import onnxruntime
import tvm
import tvm.testing
from tvm import relax
from tvm.relax.frontend.onnx import from_onnx
import pickle
import sys  # Needed for sys.exit() below.


def test() -> None:
    onnx_model = onnx.load("11.onnx")
    # Configure model format.
    onnx_model.ir_version = 8
    onnx_model.opset_import[0].version = 14

    with open("inputs.pkl", 'rb') as fp:
        inputs = pickle.load(fp)

    # onnxruntime.
    try:
        ort_session = onnxruntime.InferenceSession(
            onnx_model.SerializeToString(), providers=["CPUExecutionProvider"]
        )
        ort_output = ort_session.run([], inputs)
    except Exception as e:
        print(e)
        print("This model cannot be executed by onnxruntime!")
        sys.exit(1)
    print(ort_output[0])

    # ReferenceEvaluator
    sess = ReferenceEvaluator("11.onnx")
    re_output = sess.run(None, inputs)
    print(re_output[0])
    tvm.testing.assert_allclose(re_output[0], ort_output[0], rtol=0.1, atol=0.1)

    # TVM
    tvm_model = from_onnx(onnx_model, opset=14, keep_params_in_input=True)
    tvm_model = relax.transform.DecomposeOpsForInference()(tvm_model)
    tvm_model = relax.transform.LegalizeOps()(tvm_model)

    # Separate model from parameters.
    tvm_model, params = relax.frontend.detach_params(tvm_model)

    # Compile the relax graph into a VM then run.
    with tvm.transform.PassContext(opt_level=3):
        ex = tvm.compile(tvm_model, target="llvm")
        vm = relax.VirtualMachine(ex, tvm.cpu())

    # Prepare inputs.
    input_list = [
        inputs[key.name_hint] for key in tvm_model["main"].params if key.name_hint in inputs
    ]
    if params:
        input_list += params["main"]

    # Run model and check outputs.
    vm.set_input("main", *input_list)
    vm.invoke_stateful("main")
    tvm_output = vm.get_outputs("main")
    print(tvm_output)
    tvm.testing.assert_allclose(tvm_output.numpy(), ort_output[0], rtol=0.1, atol=0.1)


if __name__ == "__main__":
    test()
testcase.zip
Triage
Please refer to the list of label tags here to find the relevant tags and add them below in a bullet format (example below).