A hierarchical reinforcement learning-based testing framework for deep learning compilers.
In this study, we propose HiGen, a novel testing framework based on hierarchical reinforcement learning, to hierarchically configure neural network generators for detecting bugs in deep learning compilers.
HiGen integrates neural network generation, backend validation, differential testing, reward calculation, bug case collection, coverage measurement, crash deduplication, and triage utilities. It is designed to improve the effectiveness of deep learning compiler testing by guiding neural network generation toward bug-prone compiler behaviors.
- Overview
- Tested Environment
- External Dependencies
- Environment Setup
- Backend Installation
- Quick Start
- Project Structure
- Testing Modes
- Coverage Measurement
- Output Organization
- Main Arguments
- Environment Variables
- Bug Case Collection
- Crash Deduplication
- Bug Triage
- Version Control Notes
- Notes
HiGen follows a hierarchical testing workflow:
Dimension Selection (High-level Policy) -> Configuration Generation (Low-level Policy) -> ONNX Model Generation (NNSmith) -> Backend Execution and Differential Testing -> Reward Feedback
The testing process consists of the following steps:
- The high-level policy selects promising configuration dimensions.
- The low-level policy generates concrete parameter configurations.
- NNSmith generates ONNX models according to the selected configuration.
- Multiple deep learning compiler/runtime backends execute the generated models.
- Execution results are compared to detect crashes, inconsistencies, and differential failures.
- Feedback signals are converted into rewards and used to guide subsequent exploration.
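The loop above can be summarized with the following Python sketch. It is a simplified illustration, not HiGen's implementation. Everything in it is hypothetical: the dimension pool, the preference-weighted sampling that stands in for the PPO-based high-level policy, the random value choice that stands in for the SAC-based low-level policy, and the simulated reward that stands in for NNSmith generation and backend execution. Only the high-level preferences are updated here, whereas HiGen defines rewards for both levels.

```python
import random

# Hypothetical dimension pool (HiGen keeps its real pools in higen/config.py).
DIMENSIONS = {
    "max_nodes": [5, 10, 20],
    "dtype": ["float32", "float16"],
    "rank": [1, 2, 3, 4],
}


def select_dimension(prefs, rng):
    """High-level step: sample a dimension in proportion to its preference.
    Stand-in for the PPO-based dimension selector."""
    names = list(prefs)
    return rng.choices(names, weights=[prefs[n] for n in names], k=1)[0]


def generate_config(dim, rng):
    """Low-level step: choose a concrete value for the selected dimension.
    Stand-in for the SAC-based configuration generator."""
    return {dim: rng.choice(DIMENSIONS[dim])}


def generate_and_test(config, rng):
    """Placeholder for NNSmith generation plus backend execution.
    Returns 1.0 when a (simulated) failure is found, else 0.0."""
    return 1.0 if rng.random() < 0.2 else 0.0


def fuzz(big_epochs, small_epochs, seed=0):
    rng = random.Random(seed)
    prefs = {dim: 1.0 for dim in DIMENSIONS}
    history = []
    for big in range(big_epochs):
        dim = select_dimension(prefs, rng)
        found = 0.0
        for small in range(small_epochs):
            config = generate_config(dim, rng)
            reward = generate_and_test(config, rng)
            found += reward
            history.append((big, small, config, reward))
        # Reward feedback: dimensions whose configurations triggered more
        # failures become more likely to be selected in later big epochs.
        prefs[dim] += found / small_epochs
    return prefs, history


if __name__ == "__main__":
    prefs, history = fuzz(big_epochs=20, small_epochs=100)
    print(prefs)
```

Calling fuzz(20, 100) mirrors the --big-epochs 20 --small-epochs 100 setting used in the Quick Start commands. The returned preferences show which (hypothetical) dimensions the feedback loop came to favor.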
Supported backends:
| Backend | Usage |
|---|---|
| ONNX Runtime | Model validation and differential testing |
| TVM | Compiler testing and coverage measurement |
| OpenVINO | Compiler/runtime testing and coverage measurement |
| PyTorch | Oracle/reference backend in normal testing mode |
HiGen was mainly evaluated on the following Linux environment.
| Component | Version / Configuration |
|---|---|
| Operating System | Ubuntu 22.04.4 LTS |
| CPU | 13th Gen Intel(R) Core(TM) i9-13900K, 32 logical processors |
| GPU | NVIDIA GPU |
| Memory | 128.0 GiB |
| Python | Python 3.11 |
| Conda | Anaconda / Miniconda |
| TVM | 0.23.0dev, nightly/development build |
| OpenVINO | 2026.0.0.dev20251223, nightly/development build |
| ONNX Runtime | Installed through requirements.txt |
| NNSmith | Included in this repository under nnsmith-main/ |
| Coverage Tool | gcovr |
To record the exact environment on your machine, run:
lsb_release -a
uname -a
python --version
gcc --version
g++ --version
cmake --version
python -c "import onnx; print('onnx', onnx.__version__)"
python -c "import onnxruntime as ort; print('onnxruntime', ort.__version__)"
python -c "import tvm; print('tvm', tvm.__version__)"
python -c "import openvino as ov; print('openvino', ov.__version__)"
gcovr --version
If TVM or OpenVINO is built from source, we recommend recording the corresponding commit IDs:
git -C /path/to/tvm rev-parse HEAD
git -C /path/to/openvino rev-parse HEAD
This repository includes the NNSmith source code used by HiGen for neural network generation. However, it does not include the source code or instrumented builds of TVM or OpenVINO. Users need to prepare TVM and OpenVINO separately according to their own system environment and selected testing mode.
| Component | Required For | Provided by This Repository |
|---|---|---|
| NNSmith | ONNX model generation | Yes |
| ONNX Runtime | ORT testing and differential testing | Installed by requirements.txt |
| TVM | TVM testing and TVM differential testing | No |
| OpenVINO | OpenVINO testing and OpenVINO differential testing | No |
| Instrumented TVM | TVM coverage measurement | No |
| Instrumented OpenVINO | OpenVINO coverage measurement | No |
| gcovr | Coverage collection | Installed by requirements.txt |
HiGen relies on NNSmith for ONNX model generation. The NNSmith source code used by HiGen is included in this repository under:
nnsmith-main/
The default configuration assumes that the NNSmith directory is located at:
HiGen/nnsmith-main/
If you move NNSmith to a different location, please modify the corresponding path configuration in:
higen/config.py
ONNX Runtime is required for the default testing mode and differential testing. It is installed through:
pip install -r requirements.txt
Verify ONNX Runtime:
python -c "import onnxruntime as ort; print(ort.__version__)"
TVM is required when running:
--compiler tvm
or when using TVM in differential testing:
--diff-backends "ort,ov,tvm"
If you only run ONNX Runtime testing, TVM is not required.
OpenVINO is required when running:
--compiler ov
or when using OpenVINO in differential testing:
--diff-backends "ort,ov,tvm"
If you only run ONNX Runtime testing, OpenVINO is not required.
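Before launching a run, it can help to check that the Python modules needed by the chosen mode are importable. The hypothetical helper below is not part of HiGen. It encodes the rules stated above and also requires PyTorch in normal testing mode, where PyTorch serves as the oracle backend. Note that importlib.util.find_spec only checks that a module can be found; it does not confirm GPU support or coverage instrumentation.

```python
import importlib.util

# Import name for each backend short name accepted by --compiler / --diff-backends.
BACKEND_MODULES = {"ort": "onnxruntime", "ov": "openvino", "tvm": "tvm"}


def required_modules(compiler="ort", diff_backends=None):
    """Python modules a run needs, following the rules described above."""
    if diff_backends:
        # Differential testing mode: only the listed backends are executed.
        names = [b.strip() for b in diff_backends.split(",") if b.strip()]
        extra = []
    else:
        # Normal testing mode: the chosen backend plus PyTorch as the oracle.
        names = [compiler]
        extra = ["torch"]
    unknown = [n for n in names if n not in BACKEND_MODULES]
    if unknown:
        raise ValueError(f"unsupported backend name(s): {unknown}")
    return [BACKEND_MODULES[n] for n in names] + extra


def missing_modules(compiler="ort", diff_backends=None):
    """Subset of the required modules that cannot be found in this environment."""
    return [
        mod
        for mod in required_modules(compiler, diff_backends)
        if importlib.util.find_spec(mod) is None
    ]


if __name__ == "__main__":
    print("missing for --compiler tvm:", missing_modules("tvm"))
    print("missing for --diff-backends ort,ov,tvm:",
          missing_modules(diff_backends="ort,ov,tvm"))
```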
git clone https://github.com/dutZ1855/HiGen.git
cd HiGen
conda env create -f doc/environment.yml
conda activate higen
pip install -r doc/requirements.txt
python -c "import onnx; print('onnx', onnx.__version__)"
python -c "import onnxruntime as ort; print('onnxruntime', ort.__version__)"
python -c "import numpy as np; print('numpy', np.__version__)"
python -c "import torch; print('torch', torch.__version__)"
For TVM and OpenVINO testing, also verify:
python -c "import tvm; print('tvm', tvm.__version__); print(tvm.__file__)"
python -c "import openvino as ov; print('openvino', ov.__version__); print(ov.__file__)"
Detailed TVM/OpenVINO installation instructions, including source builds and coverage-instrumented builds, are provided in:
doc/BACKEND_INSTALLATION.md
This separate guide covers:
- TVM installation from pip
- TVM source build
- TVM coverage-instrumented build
- OpenVINO installation from pip
- OpenVINO source build
- OpenVINO coverage-instrumented build
- gcovr-based coverage export
For strict reproducibility, please record the exact TVM/OpenVINO versions or commit IDs used in your environment.
Run the following commands from the repository root.
python -m higen.main \
--big-epochs 20 \
--small-epochs 100 \
--compiler ort
This mode writes results to rl_runs_ort/ by default and compares:
ORT CPU vs ORT GPU vs PyTorch
python -m higen.main \
--big-epochs 20 \
--small-epochs 100 \
--compiler tvm \
--tvm-timeout 300
This mode writes results to rl_runs_tvm/ by default and compares:
TVM CPU vs TVM GPU vs PyTorch
python -m higen.main \
--big-epochs 20 \
--small-epochs 100 \
--compiler ov
This mode writes results to rl_runs_ov/ by default and compares:
OpenVINO CPU vs OpenVINO GPU vs PyTorch
| File / Directory | Description |
|---|---|
| higen/config.py | Manages dimension pools, hyperparameters, paths, and global configurations. |
| higen/reward.py | Implements high-level and low-level reward functions, including vulnerability, validity, and diversity rewards. |
| higen/utils/testing.py | Encapsulates the workflow from parameter configuration to NNSmith model generation and backend validation. |
| higen/utils/filter.py | Provides crash-case deduplication based on normalized error signatures. |
| higen/utils/triage_bug_cases.py | Provides utilities for triaging and prioritizing reported bug cases. |
| higen/env.py | Implements CompilerFuzzEnv, exposes step_small_epoch and step_big_epoch, and maintains diversity statistics. |
| higen/agents.py | Provides PPO-based dimension selection and SAC-based configuration generation agents implemented with PyTorch. |
| higen/main.py | Main training and testing entry point. |
| nnsmith-main/ | NNSmith source code used for ONNX model generation. |
| docs/BACKEND_INSTALLATION.md | TVM/OpenVINO installation and coverage build guide. |
| README.md | Project documentation. |
HiGen provides two mutually exclusive testing modes.
Normal testing mode is used when --diff-backends is not specified.
| Compiler Option | Compared Backends | Output Directory |
|---|---|---|
| --compiler ort | ORT CPU, ORT GPU, PyTorch | rl_runs_ort/ |
| --compiler tvm | TVM CPU, TVM GPU, PyTorch | rl_runs_tvm/ |
| --compiler ov | OpenVINO CPU, OpenVINO GPU, PyTorch | rl_runs_ov/ |
If an inconsistency is detected, the corresponding model and execution information will be saved as a bug case.
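The CPU/GPU/PyTorch comparison can be illustrated with a minimal NumPy sketch, assuming each backend returns a list of output arrays. The function name, the order of the checks, and the 1e-3 tolerances are assumptions for illustration; the defaults of --diff-rtol and --diff-atol are not documented here. The nan_location_mismatch and inf_location_mismatch labels mirror the error types mentioned in the triage section.

```python
import numpy as np


def compare_outputs(outputs, rtol=1e-3, atol=1e-3):
    """Compare every backend against the first one listed.

    outputs: dict mapping a backend label (e.g. "cpu", "gpu", "torch") to its
    list of output arrays. Returns (reference, other, output_index, reason)
    tuples; an empty list means all backends agree.
    """
    names = list(outputs)
    ref_name, ref = names[0], outputs[names[0]]
    problems = []
    for other in names[1:]:
        cand = outputs[other]
        if len(cand) != len(ref):
            # HiGen treats an output-count mismatch as an invalid configuration.
            problems.append((ref_name, other, None, "output_count_mismatch"))
            continue
        for i, (a, b) in enumerate(zip(ref, cand)):
            a = np.asarray(a, dtype=np.float64)
            b = np.asarray(b, dtype=np.float64)
            if a.shape != b.shape:
                reason = "shape_mismatch"
            elif not np.array_equal(np.isnan(a), np.isnan(b)):
                reason = "nan_location_mismatch"
            elif not (np.array_equal(np.isposinf(a), np.isposinf(b))
                      and np.array_equal(np.isneginf(a), np.isneginf(b))):
                reason = "inf_location_mismatch"
            elif not np.allclose(a, b, rtol=rtol, atol=atol, equal_nan=True):
                reason = "value_mismatch"
            else:
                continue  # this output agrees within tolerance
            problems.append((ref_name, other, i, reason))
    return problems
```

Checking NaN and Inf positions before the tolerance test keeps these cases distinguishable from ordinary numerical drift, which is what later triage relies on.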
Differential testing mode is enabled when --diff-backends is specified.
Example:
python -m higen.main \
--big-epochs 20 \
--small-epochs 100 \
--compiler ort \
--diff-device cpu \
--diff-backends "ort,ov,tvm"
Supported backend names:
ort, ov, tvm
The output directory will be automatically created as:
rl_runs_diff_ort_ov_tvm/
When --diff-backends is specified, HiGen enters differential testing mode and will not perform the normal CPU/GPU/PyTorch comparison.
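The naming rules for output roots can be expressed as a small function. This is a hypothetical re-derivation from the examples in this README, not HiGen's code. Two details are assumptions: backend names are joined in the order they are passed to --diff-backends, and --run-root takes precedence in both modes.

```python
from pathlib import Path

SUPPORTED = ("ort", "ov", "tvm")


def output_root(compiler="ort", diff_backends=None, run_root=None):
    """Derive the default output root from the CLI options described above."""
    if run_root:  # --run-root overrides the default directory
        return Path(run_root)
    if diff_backends:
        names = [b.strip() for b in diff_backends.split(",") if b.strip()]
        unknown = [n for n in names if n not in SUPPORTED]
        if unknown:
            raise ValueError(f"unsupported backend name(s): {unknown}")
        # Differential mode: backend names joined in the given order (assumption).
        return Path("rl_runs_diff_" + "_".join(names))
    if compiler not in SUPPORTED:
        raise ValueError(f"unsupported compiler: {compiler}")
    # Normal mode: one output root per main validation backend.
    return Path(f"rl_runs_{compiler}")
```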
HiGen supports runtime coverage collection for TVM and OpenVINO using gcovr. To reproduce coverage results, TVM and OpenVINO must be compiled with coverage instrumentation flags.
For detailed coverage build instructions, please refer to:
docs/BACKEND_INSTALLATION.md
The coverage output is periodically written to:
coverage_by_steps.csv
The coverage file can be used to plot coverage curves during fuzzing.
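A minimal sketch for turning coverage_by_steps.csv into a curve uses the standard library csv module, with matplotlib as an optional dependency for plotting. The column layout of this file is not documented here, so the sketch does not hard-code column names. Inspect the header first, then pass the step and coverage columns you want to plot. The helper names are hypothetical.

```python
import csv


def read_header(csv_path):
    """Return the column names so you can pick the ones to plot."""
    with open(csv_path, newline="") as f:
        return next(csv.reader(f))


def load_curve(csv_path, x_col, y_col):
    """Read two numeric columns, skipping rows where either cell is empty."""
    xs, ys = [], []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            x, y = row.get(x_col), row.get(y_col)
            if not x or not y:
                continue
            xs.append(float(x))
            ys.append(float(y))
    return xs, ys


def plot_curve(xs, ys, out_png, x_label, y_label):
    """Save a simple coverage curve to a PNG file (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(xs, ys)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    fig.savefig(out_png)
    plt.close(fig)
```

For example, call read_header on the file inside a run directory, then pass the reported column names to load_curve and plot_curve.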
Install gcovr:
pip install gcovr==8.6
Verify gcovr:
gcovr --version
Example run with TVM and OpenVINO coverage collection enabled:
python -m higen.main \
--big-epochs 50 \
--small-epochs 200 \
--compiler ort \
--diff-device cpu \
--diff-backends "tvm,ov,ort" \
--cov-reset --cov-every 50 \
--ov-cov-reset --ov-cov-every 50 \
--ov-cov-src /path/to/openvino \
--ov-cov-build /path/to/openvino/build_gcov
After a run, users can manually export coverage with gcovr.
For TVM:
gcovr \
-r /path/to/tvm \
--object-directory /path/to/tvm/build_gcov \
--csv tvm_coverage.csv
For OpenVINO:
gcovr \
-r /path/to/openvino \
--object-directory /path/to/openvino/build_gcov \
--csv openvino_coverage.csv
By default, each run creates a new numbered subdirectory under the corresponding output root.
Example:
rl_runs_diff_ort_ov_tvm/3/
Each run directory contains:
| File / Directory | Description |
|---|---|
| training.log | Records training and reward information. |
| bug_cases/ | Stores generated models that trigger crashes, inconsistencies, or differential failures. |
| coverage_by_steps.csv | Stores coverage statistics if coverage measurement is enabled. |
The output root also maintains a LATEST file, which records the absolute path of the latest run directory.
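To locate the most recent run programmatically, a helper can read the LATEST file and fall back to the highest-numbered session directory. Both the helper and its fallback are assumptions for illustration, not a documented HiGen feature.

```python
from pathlib import Path


def latest_run(output_root):
    """Return the most recent run directory under an output root, or None."""
    root = Path(output_root)
    if not root.is_dir():
        return None
    latest = root / "LATEST"
    if latest.is_file():
        # LATEST stores the absolute path of the latest run directory.
        target = Path(latest.read_text().strip())
        if target.is_dir():
            return target
    # Fallback (assumption): pick the highest-numbered session subdirectory,
    # comparing numerically so that "10" sorts after "3".
    sessions = [p for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    return max(sessions, key=lambda p: int(p.name)) if sessions else None
```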
To disable the session subdirectory mechanism and write directly to the output root, use:
--no-session-subdir
| Argument | Description |
|---|---|
| --big-epochs | Number of high-level epochs. |
| --small-epochs | Number of low-level epochs within each high-level epoch. |
| --compiler {ort,tvm,ov} | Specifies the main validation backend. |
| --tvm-timeout | Overrides the TVM execution timeout in seconds. |
| --run-root | Overrides the default output directory. |
| --diff-backends | Enables differential testing and specifies the backend list, such as "ort,ov,tvm". |
| --diff-device | Specifies the device used in differential testing, such as cpu or gpu. |
| --diff-rtol | Relative tolerance for differential comparison. |
| --diff-atol | Absolute tolerance for differential comparison. |
| --cov-every | TVM coverage sampling interval. Use 0 to disable TVM coverage. |
| --cov-reset | Resets TVM coverage data before running. |
| --ov-cov-every | OpenVINO coverage sampling interval. Use 0 to disable OpenVINO coverage. |
| --ov-cov-reset | Resets OpenVINO coverage data before running. |
| --ov-cov-src | OpenVINO source directory for coverage collection. Required when OpenVINO coverage is enabled or reset is requested. |
| --ov-cov-build | OpenVINO build directory for coverage collection. Required when OpenVINO coverage is enabled or reset is requested. |
| --cov-out-csv | Overrides the output path of coverage_by_steps.csv. |
| --no-session-subdir | Disables numbered session directories and writes directly to the output root. |
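The mode selection and the OpenVINO coverage requirement in this table can be summarized with a small argparse sketch. It is not HiGen's actual parser. It declares only a subset of the documented flags and sets no defaults except treating an omitted --ov-cov-every as 0 (disabled), which is an assumption. The error message wording is also invented.

```python
import argparse


def build_parser():
    """Illustrative subset of the documented options (not HiGen's real parser)."""
    p = argparse.ArgumentParser(prog="higen-sketch")
    p.add_argument("--big-epochs", type=int)
    p.add_argument("--small-epochs", type=int)
    p.add_argument("--compiler", choices=["ort", "tvm", "ov"])
    p.add_argument("--diff-backends")
    p.add_argument("--diff-device")
    p.add_argument("--ov-cov-every", type=int, default=0)  # 0 disables (assumed default)
    p.add_argument("--ov-cov-reset", action="store_true")
    p.add_argument("--ov-cov-src")
    p.add_argument("--ov-cov-build")
    return p


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # OpenVINO coverage needs both paths whenever sampling or reset is requested.
    wants_ov_cov = args.ov_cov_every > 0 or args.ov_cov_reset
    if wants_ov_cov and not (args.ov_cov_src and args.ov_cov_build):
        parser.error("--ov-cov-src and --ov-cov-build are required for OpenVINO coverage")
    # The two testing modes are mutually exclusive and selected by --diff-backends.
    args.mode = "differential" if args.diff_backends else "normal"
    return args
```

Because parser.error exits with status 2, a missing coverage path fails fast before any model is generated.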
HiGen supports the following environment variables for timeout, process isolation, and crash deduplication control.
| Variable | Description |
|---|---|
| HIGEN_ORT_TIMEOUT_S | Timeout in seconds for ONNX Runtime inference in differential testing. Default: 60. |
| HIGEN_ORT_ISOLATED | Whether to run ONNX Runtime inference in an isolated process. Default: 1. Set to 0 to disable isolation. |
| HIGEN_OV_TIMEOUT_S | Timeout in seconds for OpenVINO inference in differential testing. Default: 60. |
| HIGEN_TVM_ISOLATED | Whether to run TVM inference in an isolated process. Default: 1. Set to 0 to disable isolation. |
| HIGEN_DEDUP_CRASH | Whether to enable crash-case deduplication. Default: 1. Set to 0 to disable deduplication. |
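A minimal sketch of how these variables and their documented defaults could be read in Python follows. The helper name read_settings is hypothetical. Two behaviors are also assumptions: any value other than 0 enables a switch, and timeouts are parsed as floating-point seconds.

```python
import os

# Documented defaults from the table above.
DEFAULTS = {
    "HIGEN_ORT_TIMEOUT_S": "60",
    "HIGEN_ORT_ISOLATED": "1",
    "HIGEN_OV_TIMEOUT_S": "60",
    "HIGEN_TVM_ISOLATED": "1",
    "HIGEN_DEDUP_CRASH": "1",
}


def read_settings(env=None):
    """Resolve HiGen environment variables, falling back to documented defaults."""
    env = os.environ if env is None else env
    raw = {key: env.get(key, default).strip() for key, default in DEFAULTS.items()}
    return {
        "ort_timeout_s": float(raw["HIGEN_ORT_TIMEOUT_S"]),
        "ort_isolated": raw["HIGEN_ORT_ISOLATED"] != "0",
        "ov_timeout_s": float(raw["HIGEN_OV_TIMEOUT_S"]),
        "tvm_isolated": raw["HIGEN_TVM_ISOLATED"] != "0",
        "dedup_crash": raw["HIGEN_DEDUP_CRASH"] != "0",
    }
```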
Example:
export HIGEN_ORT_TIMEOUT_S=60
export HIGEN_ORT_ISOLATED=1
export HIGEN_OV_TIMEOUT_S=60
export HIGEN_TVM_ISOLATED=1
export HIGEN_DEDUP_CRASH=1
HiGen stores bug-triggering cases under bug_cases/. Each bug case directory is typically named:
big_<big_epoch>_small_<small_epoch>
Each bug case may contain:
| File | Description |
|---|---|
| model.onnx | The generated ONNX model. |
| oracle.pkl | Oracle input data. |
| gir.pkl | NNSmith graph intermediate representation. |
| model.pth | PyTorch model if available. |
| error.log | Error message, mismatch information, or traceback. |
Successful runs and invalid configurations are removed after validation to avoid cluttering the output directory.
Cases in which the number of outputs does not match are treated as invalid configurations rather than real bugs.
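To get a quick overview of a bug_cases/ directory, the hypothetical helper below parses the big_<big_epoch>_small_<small_epoch> directory names, lists which of the artifacts above are present, and reads the first line of error.log. It skips entries that do not follow the naming pattern, such as the deduplication index file. That filtering rule is an assumption.

```python
import re
from pathlib import Path

CASE_RE = re.compile(r"^big_(\d+)_small_(\d+)$")
ARTIFACTS = ("model.onnx", "oracle.pkl", "gir.pkl", "model.pth", "error.log")


def list_bug_cases(bug_cases_dir):
    """Summarize bug case directories, sorted by (big epoch, small epoch)."""
    cases = []
    for path in Path(bug_cases_dir).iterdir():
        match = CASE_RE.match(path.name)
        if not (path.is_dir() and match):
            continue  # e.g. .seen_crash_signatures.jsonl
        first_error = ""
        log = path / "error.log"
        if log.is_file():
            lines = log.read_text(errors="replace").splitlines()
            first_error = lines[0] if lines else ""
        cases.append({
            "big": int(match.group(1)),
            "small": int(match.group(2)),
            "files": [a for a in ARTIFACTS if (path / a).is_file()],
            "first_error_line": first_error,
            "path": path,
        })
    return sorted(cases, key=lambda c: (c["big"], c["small"]))
```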
HiGen provides a crash-case deduplication filter to avoid repeatedly saving the same crash-type failure.
Deduplication is controlled by:
export HIGEN_DEDUP_CRASH=1
To disable crash deduplication:
export HIGEN_DEDUP_CRASH=0
The deduplication index is stored under the bug case directory as:
.seen_crash_signatures.jsonl
This file is generated automatically and should not be committed to the repository.
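Signature-based deduplication works in three steps: normalize the error text so that run-specific details (addresses, paths, numbers) disappear, hash the result, and append unseen hashes to the JSONL index. The sketch below illustrates this idea. The normalization rules, the use of SHA-256, and the {"sig": ...} record format are assumptions. HiGen's actual logic lives in higen/utils/filter.py and may differ.

```python
import hashlib
import json
import re
from pathlib import Path

# Illustrative normalization rules, applied in order (assumptions).
_RULES = [
    (re.compile(r"0x[0-9a-fA-F]+"), "0xADDR"),  # memory addresses
    (re.compile(r"(/[^\s:'\"]+)+"), "<PATH>"),  # absolute file paths
    (re.compile(r"\b\d+\b"), "<N>"),            # line numbers, shapes, sizes
]


def crash_signature(error_text, max_lines=5):
    """Hash the normalized tail of an error message."""
    lines = [ln.strip() for ln in error_text.strip().splitlines() if ln.strip()]
    text = "\n".join(lines[-max_lines:])  # the tail usually names the failure
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_new_crash(bug_cases_dir, error_text):
    """Return True (and record the signature) if this crash was not seen before."""
    index = Path(bug_cases_dir) / ".seen_crash_signatures.jsonl"
    sig = crash_signature(error_text)
    if index.is_file():
        with index.open() as f:
            for line in f:
                if line.strip() and json.loads(line).get("sig") == sig:
                    return False
    with index.open("a") as f:
        f.write(json.dumps({"sig": sig}) + "\n")
    return True
```

Normalizing before hashing is what lets two crashes from different runs collapse into one entry even when their tracebacks differ only in temporary paths or pointer values.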
Not every reported mismatch represents a true compiler bug. Some differences may come from floating-point precision, FP16 rounding, NaN/Inf propagation, or backend-specific implementation details.
HiGen provides a triage utility:
python -m higen.utils.triage_bug_cases \
--cases-root /path/to/bug_cases \
--out-json triage.json \
--top 30
The triage tool considers:
- Error type in error.log
- NaN/Inf mismatch patterns
- Maximum absolute difference
- Operators and data types in model.onnx
- Whether oracle inputs contain NaN or Inf
- Known high-risk operators such as Acos, Asin, and Atan
The output is a prioritized list of bug candidates.
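The factors listed above can be combined into a priority score. The sketch below is illustrative only and does not reproduce the scoring in higen.utils.triage_bug_cases. The summary fields and weights are assumptions, as is the choice to lower the priority of cases involving Acos, Asin, or Atan (treating them as already known, as in the --exclude-ops example). The onnx_op_types helper uses the onnx package's onnx.load to list operator types from a case's model.onnx.

```python
import math

# Operators named above as known high-risk (also used with --exclude-ops).
HIGH_RISK_OPS = {"Acos", "Asin", "Atan"}


def onnx_op_types(model_path):
    """Collect operator types from a bug case's model.onnx (needs the onnx package)."""
    import onnx

    model = onnx.load(model_path)
    return sorted({node.op_type for node in model.graph.node})


def score_case(case):
    """Illustrative priority score; higher means 'look at this first'.

    `case` is a hypothetical summary dict with optional keys: error_type,
    max_abs_diff, ops, dtypes, inputs_have_nan_or_inf. All weights are assumptions.
    """
    score = 0.0
    if case.get("error_type") == "crash":
        score += 100.0  # crashes are rarely explained by numerical noise
    if case.get("error_type") in {"nan_location_mismatch", "inf_location_mismatch"}:
        score -= 20.0  # often caused by NaN/Inf propagation differences
    if case.get("inputs_have_nan_or_inf"):
        score -= 30.0  # ill-defined oracle inputs make mismatches less trustworthy
    diff = case.get("max_abs_diff")
    if diff is not None and math.isfinite(diff):
        score += min(50.0, 10.0 * math.log10(1.0 + diff))  # larger gaps rank higher
    if "float16" in case.get("dtypes", ()):
        score -= 10.0  # FP16 rounding explains many small mismatches
    if HIGH_RISK_OPS & set(case.get("ops", ())):
        score -= 10.0  # likely already known or reported (assumption)
    return score


def rank_cases(cases, top=30):
    """Return the `top` highest-scoring cases."""
    return sorted(cases, key=score_case, reverse=True)[:top]
```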
View only crash-like cases:
python -m higen.utils.triage_bug_cases \
--cases-root /path/to/rl_runs_diff_tvm_ort_ov/bug_cases \
--only-crashes \
--top 50
Exclude known submitted operators:
python -m higen.utils.triage_bug_cases \
--cases-root /path/to/rl_runs_diff_tvm_ort_ov/bug_cases \
--exclude-ops "Acos,Asin,Atan" \
--top 50
Exclude infrastructure or unstable mismatch types:
python -m higen.utils.triage_bug_cases \
--cases-root /path/to/rl_runs_diff_tvm_ort_ov/bug_cases \
--exclude-error-types "output_name_mismatch,nan_location_mismatch,inf_location_mismatch" \
--top 50
Generated artifacts are not intended to be committed to the repository. Please exclude the following files and directories through .gitignore:
__pycache__/
*.pyc
.env
.venv/
config.local.yaml
rl_runs_*/
bug_cases/
training.log
coverage_by_steps.csv
triage*.json
*.onnx
*.pkl
*.pth
*.gcda
*.gcno
*.gcov
**/.seen_crash_signatures.jsonl
.DS_Store
- The project is tested mainly on Ubuntu 22.04.4 LTS.
- This repository includes the NNSmith source code used by HiGen.
- This repository does not include TVM source code, OpenVINO source code, or their instrumented builds.
- HiGen was tested with TVM 0.23.0dev and OpenVINO 2026.0.0.dev20251223, both of which are development/nightly-style builds.
- Detailed TVM/OpenVINO installation and coverage build instructions are provided in docs/BACKEND_INSTALLATION.md.
- Users need to install TVM and OpenVINO according to their local environment and selected testing mode.
- For GPU execution, make sure CUDA, cuDNN, and related backend libraries are correctly installed.
- For TVM and OpenVINO coverage collection, source-built instrumented versions are required.
- OpenVINO coverage requires users to explicitly provide --ov-cov-src and --ov-cov-build.
- Differential testing and normal testing are mutually exclusive.
- The generated bug cases should be triaged before being reported upstream.
- Generated artifacts such as rl_runs_*, bug_cases/, training.log, coverage_by_steps.csv, *.onnx, *.pkl, and *.pth should be excluded from version control.