This is the artifact for the OOPSLA'25 paper titled "An Empirical Study of Rust-Specific Bugs in the rustc Compiler".
This work conducts a comprehensive and systematic study of Rust-specific bugs in rustc, with a particular focus on the components that support its unique language features. Our analysis examines issues and fixes reported between 2022 and 2024, with a manual review of 301 valid issues. We categorize these bugs based on their causes, symptoms, affected compilation stages, and test case characteristics. Additionally, we evaluate existing rustc testing tools to assess their effectiveness and limitations.
Our bug collection and analysis approach is summarized in the following figure. First, we perform bug data collection by collecting all closed issues from the official Rust GitHub repository within a specified time frame (2022-01-01 to 2025-01-01). We apply an initial filter using official issue labels, focusing on those related to Rust-specific language mechanisms. Then, we manually filter irrelevant or ineligible issues, such as duplicates or those without test cases. For each remaining issue, we identify the corresponding pull request (PR) and extract the test case provided in the issue description. The final result is a curated dataset of rustc bugs, each comprising an issue, a PR, and a test case.
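The initial label and time-frame filter described above can be sketched as follows. This is a minimal illustration, not the crawler in data_collection: the record layout, the `is_candidate` helper, and the choice of treating the end date as exclusive are assumptions made for the example.

```python
from datetime import date

# Hypothetical in-memory issue records; the real crawler's data model may differ.
issues = [
    {"number": 1, "labels": ["A-trait-system", "C-bug"], "closed_at": date(2023, 5, 4)},
    {"number": 2, "labels": ["A-LLVM"], "closed_at": date(2023, 6, 1)},
    {"number": 3, "labels": ["A-lifetimes"], "closed_at": date(2021, 12, 31)},
    {"number": 4, "labels": ["A-borrow-checker"], "closed_at": date(2024, 12, 31)},
]

# A small subset of the labels listed in Table 1.
TARGET_LABELS = {"A-trait-system", "A-lifetimes", "A-borrow-checker"}
START_DATE = date(2022, 1, 1)  # inclusive
END_DATE = date(2025, 1, 1)    # assumed exclusive

def is_candidate(issue):
    """Keep issues closed inside the window that carry at least one target label."""
    in_window = START_DATE <= issue["closed_at"] < END_DATE
    has_label = bool(TARGET_LABELS & set(issue["labels"]))
    return in_window and has_label

candidates = [issue["number"] for issue in issues if is_candidate(issue)]
print(candidates)
```

Issues that pass this filter would still need the manual review step (duplicates, missing test cases) before entering the dataset.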
The artifact can be evaluated on a standard Linux-based system. It does not require any specialized hardware such as GPUs or high-core-count processors. The minimum recommended system requirements are:
- Operating System: Linux (any recent distribution)
- Memory: 8 GB RAM
- Storage: 2 GB of free disk space
No access to special-purpose hardware is necessary. The artifact is designed to run efficiently on a typical desktop or laptop environment used for software development.
There are two ways to run this project: using Docker or running it directly on a local machine.
A pre-configured Docker environment can simplify setup and ensure compatibility across systems. You can use the provided Dockerfile to build the environment with all necessary dependencies.
To build the Docker image named rustc-bug-study from source, run the following command (estimated running time: ~5 min):
docker build -t rustc-bug-study .
Run the following command to create a new container:
docker run -it --rm \
-v $(pwd)/data_collection:/app/data_collection \
-v $(pwd)/plot:/app/plot \
-v $(pwd)/scripts:/app/scripts \
-v $(pwd)/Test_Case_Characteristics:/app/Test_Case_Characteristics \
-v $(pwd)/test_cases:/app/test_cases \
-v $(pwd)/data:/app/data \
-w /app \
rustc-bug-study
To run the project locally, your system should meet the following requirements:
- Operating System: A Unix-like operating system (tested on Ubuntu)
- Rust: Installed and properly configured (tested on Rust 1.87)
- Python: Version 3.12 or higher (tested on Python 3.12)
Make sure cargo and python3 are available in your environment. You may also need to install Python packages listed in requirements.txt by running:
pip install -r requirements.txt
- data_collection folder: A Python project for automatically crawling GitHub issues.
- plot folder: Contains all scripts for data visualization.
- scripts folder: Contains all scripts used to reproduce the results in our study.
- Test_Case_Characteristics folder: A Rust project for extracting ASTs and collecting node-level statistics to analyze test case features.
- test_cases folder: Stores all collected bug-revealing test cases.
- Data Files folder:
  - all_issues.csv: Contains all collected and labeled GitHub issues.
  - type_node_counts.csv: Contains all the backend-related issues.
  - item_node_counts.csv: Contains statistics of Item nodes in ASTs. This file can be generated by running the Test_Case_Characteristics project.
  - type_node_counts.csv: Contains statistics of Type nodes in ASTs. This file can also be generated by running the Test_Case_Characteristics project.
- Fig4a.pdf, Fig4b.pdf, Fig10a.pdf, and Fig10b.pdf: These are the visualizations corresponding to Figures 4(a), 4(b), 10(a), and 10(b) in our paper. They can be plotted using the scripts in the plot folder.
In the following section, we provide instructions for reproducing the results presented in the paper.
To reproduce Table 1, please run the following script to print statistics of our collected labels.
python3 ./scripts/labels.py
The above script prints the following:
Label Count
========================================
A-HIR 20
A-THIR 1
----------------------------------------
A-MIR 43
A-mir-opt 78
A-mir-opt-inlining 23
A-mir-opt-GVN 0
A-mir-opt-nrvo 0
A-stable-MIR 1
----------------------------------------
A-type-system 25
A-inference 29
A-closures 29
A-coercions 13
A-const-generics 70
A-DSTs 0
A-zst 0
A-trait-system 77
A-impl-trait 68
A-trait-objects 27
A-auto-traits 14
A-implied-bounds 9
A-coinduction 0
A-coherence 14
----------------------------------------
A-lifetimes 70
A-borrow-checker 45
Backend (Excluded)
========================================
A-LLVM 275
A-gcc 2
A-cranelift 10
To reproduce Table 2, please run the following script to print statistics related to our bug collection phases.
python3 ./scripts/status.py
The above script prints the following:
Status Count
------------------------------
Duplicate 88
Not a bug 28
Not reproducible 56
Discussion 6
Exclude 92
Valid 301
------------------------------
Total 571
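A tally of this kind can be approximated with the standard library alone. The sketch below is not the code of status.py; it assumes the annotated CSV has a `Status` column (the name that appears in the later feature statistics), and the `Issue` column and sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Tiny stand-in for ./data/all_issues.csv; the real file has more columns.
SAMPLE = """Issue,Status
100,Valid
101,Duplicate
102,Valid
103,Not a bug
"""

def count_status(handle):
    """Count how many rows fall under each value of the 'Status' column."""
    reader = csv.DictReader(handle)
    return Counter(row["Status"].strip() for row in reader)

counts = count_status(io.StringIO(SAMPLE))
for status, n in counts.most_common():
    print(f"{status:<20}{n:>5}")
print(f"{'Total':<20}{sum(counts.values()):>5}")
```

To try it on the real data, pass an open file handle for ./data/all_issues.csv instead of the in-memory sample.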
Running the crawler is not required for the subsequent steps. Crawling can be time-consuming, and the collected data only contains raw GitHub issue information, without our annotations for bug causes, symptoms, etc.
To ensure consistency, we have already provided the annotated CSV files in this artifact. You can directly proceed to the reproduction steps using the provided ./data/all_issues.csv file.
If you still want to crawl the issues yourself, please run the following scripts.
Note: Please replace <token> with your GitHub authentication token.
cd data_collection
python3 main.py issue ./all_issues.csv <token>
The generated all_issues.csv file is stored in ./rustc-bug-study-artifact/data_collection/all_issues.csv.
In the first research question, we compute the distribution of bug causes. Specifically, the script reproduces Table 3.
python3 ./scripts/rq1_causes.py
The above script prints the following:
Bug Cause Count Ratio
--------------------------------------------------
Trait & Bound 37 12.3%
Opaque types 38 12.6%
New solver 7 2.3%
Well-formedness 9 3.0%
Subtotal 91 30.2%
--------------------------------------------------
Borrow & Move 7 2.3%
Lifetime 34 11.3%
Subtotal 41 13.6%
--------------------------------------------------
Wrong implementations 34 11.3%
Missing cases 12 4.0%
Subtotal 46 15.3%
--------------------------------------------------
Basic structure 38 12.6%
Error handling & Reporting 75 24.9%
Compatibility 10 3.3%
Subtotal 123 40.9%
--------------------------------------------------
Total 301 100.0%
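The Ratio column is each count divided by the 301 valid bugs. As a quick cross-check, the four subtotals above can be recomputed as shown below; the group names are borrowed from the later tables of this README, and the snippet is only an illustration, not part of rq1_causes.py.

```python
# Subtotals copied from the output above, keyed by cause group.
subtotals = {
    "Type System Errors": 91,
    "Ownership & Lifetime Errors": 41,
    "MIR Optimization Errors": 46,
    "General Errors": 123,
}

total = sum(subtotals.values())
ratios = {group: round(100 * n / total, 1) for group, n in subtotals.items()}

for group, n in subtotals.items():
    print(f"{group:<30}{n:>5}{ratios[group]:>7.1f}%")
print(f"{'Total':<30}{total:>5}")
```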
As for the bug-prone compilation stages, please run the following script to reproduce Fig. 4 (a):
python3 ./scripts/rq1_components_a.py
The above script prints the following:
Stage Cause Ratio
--------------------------------------------------------
1. AST Type System Errors 0.3%
1. AST Ownership & Lifetime Errors 3.7%
1. AST MIR Optimization Errors 0.0%
1. AST General Errors 4.0%
Subtotal 24 8.0%
--------------------------------------------------------
2. HIR Type System Errors 20.3%
2. HIR Ownership & Lifetime Errors 5.0%
2. HIR MIR Optimization Errors 0.0%
2. HIR General Errors 19.6%
Subtotal 135 44.9%
--------------------------------------------------------
3. MIR Type System Errors 6.3%
3. MIR Ownership & Lifetime Errors 5.0%
3. MIR MIR Optimization Errors 15.3%
3. MIR General Errors 8.6%
Subtotal 106 35.2%
--------------------------------------------------------
4. Code Gen Type System Errors 0.0%
4. Code Gen Ownership & Lifetime Errors 0.0%
4. Code Gen MIR Optimization Errors 0.0%
4. Code Gen General Errors 1.7%
Subtotal 5 1.7%
--------------------------------------------------------
5. Utility Type System Errors 3.3%
5. Utility Ownership & Lifetime Errors 0.0%
5. Utility MIR Optimization Errors 0.0%
5. Utility General Errors 4.7%
Subtotal 24 8.0%
--------------------------------------------------------
6. LLVM Type System Errors 0.0%
6. LLVM Ownership & Lifetime Errors 0.0%
6. LLVM MIR Optimization Errors 0.0%
6. LLVM General Errors 2.3%
Subtotal 7 2.3%
--------------------------------------------------------
Total 301 100.0%
Similarly, please run the following script to reproduce Fig. 4 (b):
python3 ./scripts/rq1_components_b.py
The above script prints the following:
Stage Cause Ratio
---------------------------------------------------------------------
2. HIR-type & wf check Type System Errors 25.2%
2. HIR-type & wf check Ownership & Lifetime Errors 3.0%
2. HIR-type & wf check MIR Optimization Errors 0.0%
2. HIR-type & wf check General Errors 23.0%
Subtotal 69 51.1%
---------------------------------------------------------------------
2. HIR-trait solving Type System Errors 17.8%
2. HIR-trait solving Ownership & Lifetime Errors 1.5%
2. HIR-trait solving MIR Optimization Errors 0.0%
2. HIR-trait solving General Errors 8.1%
Subtotal 37 27.4%
---------------------------------------------------------------------
2. HIR-Type Infer Type System Errors 1.5%
2. HIR-Type Infer Ownership & Lifetime Errors 4.4%
2. HIR-Type Infer MIR Optimization Errors 0.0%
2. HIR-Type Infer General Errors 1.5%
Subtotal 10 7.4%
---------------------------------------------------------------------
2. HIR-report Type System Errors 0.7%
2. HIR-report Ownership & Lifetime Errors 2.2%
2. HIR-report MIR Optimization Errors 0.0%
2. HIR-report General Errors 11.1%
Subtotal 19 14.1%
---------------------------------------------------------------------
3. MIR-MIR Transform Type System Errors 1.9%
3. MIR-MIR Transform Ownership & Lifetime Errors 4.7%
3. MIR-MIR Transform MIR Optimization Errors 42.5%
3. MIR-MIR Transform General Errors 0.9%
Subtotal 53 50.0%
---------------------------------------------------------------------
3. MIR-Borrow Check Type System Errors 16.0%
3. MIR-Borrow Check Ownership & Lifetime Errors 4.7%
3. MIR-Borrow Check MIR Optimization Errors 0.9%
3. MIR-Borrow Check General Errors 3.8%
Subtotal 27 25.5%
---------------------------------------------------------------------
3. MIR-Dataflow Analysis Type System Errors 0.0%
3. MIR-Dataflow Analysis Ownership & Lifetime Errors 0.0%
3. MIR-Dataflow Analysis MIR Optimization Errors 0.0%
3. MIR-Dataflow Analysis General Errors 8.5%
Subtotal 9 8.5%
---------------------------------------------------------------------
3. MIR-report Type System Errors 0.0%
3. MIR-report Ownership & Lifetime Errors 4.7%
3. MIR-report MIR Optimization Errors 0.0%
3. MIR-report General Errors 11.3%
Subtotal 17 16.0%
---------------------------------------------------------------------
Optionally, the following script can plot Fig. 4 (a) and Fig. 4 (b) and save them as ./plot/Fig4a.pdf and ./plot/Fig4b.pdf.
python3 ./plot/fig4a.py
python3 ./plot/fig4b.py
If you are using the Docker container, the files Fig4a.pdf and Fig4b.pdf will be generated inside the container and can be directly accessed from your host machine in the ./rustc-bug-study-artifact/plot/ directory.
NOTE: When generating plots, you might see warnings like the following:
findfont: Generic family 'sans-serif' not found because none of the following families were found: Arial, DejaVu Sans, Liberation Sans, Bitstream Vera Sans, sans-serif.
This warning is harmless and can be ignored; it does not affect the correctness or display of the resulting PDF plots.
In the second research question, we compute the distribution of bug symptoms. Specifically, the following two scripts reproduce Table 4.
python3 ./scripts/rq2_a.py
This script reproduces Table 4 by presenting the detailed breakdown of bug symptoms. It categorizes and counts each sub-type of symptom observed in the dataset. It prints the following:
Symptom Count Ratio
-----------------------------------------------------------------------------
1. Crash-Front-end Panic(valid) 42 14.0%
1. Crash-Front-end Panic(invalid) 75 24.9%
1. Crash-Back-end Crash 3 1.0%
Subtotal 120 39.9%
-----------------------------------------------------------------------------
2. Correctness Issues-Completeness Issues 56 18.6%
2. Correctness Issues-Soundness Issues 22 7.3%
Subtotal 78 25.9%
-----------------------------------------------------------------------------
3. Miscompilation-Inconsistent Output Issues 18 6.0%
3. Miscompilation-Safe Rust Causes UB 12 4.0%
Subtotal 30 10.0%
-----------------------------------------------------------------------------
4. Diagnostic Issues-Incorrect Warning/Error 20 6.6%
4. Diagnostic Issues-Improper Fix Suggestion 38 12.6%
Subtotal 58 19.3%
-----------------------------------------------------------------------------
5. Misoptimization-incorrect 9 3.0%
5. Misoptimization-performance 6 2.0%
Subtotal 15 5.0%
-----------------------------------------------------------------------------
Total 301 100.0%
The following script also reproduces Table 4, but focuses on showing the joint distribution and proportions of bug symptoms and their corresponding causes, providing a cross-tabulation of these two dimensions.
python3 ./scripts/rq2_b.py
It prints the following:
Symptom GroupCause Count Ratio
----------------------------------------------------------------
1. Type System Errors 30 25.0%
1. Ownership & Lifetime Errors 3 2.5%
1. MIR Optimization Errors 19 15.8%
1. General Errors 68 56.7%
Subtotal 120 100.0%
----------------------------------------------------------------
2. Type System Errors 43 55.1%
2. Ownership & Lifetime Errors 17 21.8%
2. MIR Optimization Errors 7 9.0%
2. General Errors 11 14.1%
Subtotal 78 100.0%
----------------------------------------------------------------
3. Type System Errors 5 16.7%
3. Ownership & Lifetime Errors 4 13.3%
3. MIR Optimization Errors 12 40.0%
3. General Errors 9 30.0%
Subtotal 30 100.0%
----------------------------------------------------------------
4. Type System Errors 12 20.7%
4. Ownership & Lifetime Errors 16 27.6%
4. MIR Optimization Errors 1 1.7%
4. General Errors 29 50.0%
Subtotal 58 100.0%
----------------------------------------------------------------
5. Type System Errors 1 6.7%
5. Ownership & Lifetime Errors 1 6.7%
5. MIR Optimization Errors 7 46.7%
5. General Errors 6 40.0%
Subtotal 15 100.0%
----------------------------------------------------------------
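In this output, each ratio is computed within a symptom group, so every group sums to 100%. A cross-tabulation of that shape can be sketched as below; the annotation pairs are invented for the example and the helper is not taken from rq2_b.py.

```python
from collections import Counter

# Hypothetical (symptom, cause group) annotations, one pair per bug.
bugs = [
    ("Crash", "General Errors"),
    ("Crash", "General Errors"),
    ("Crash", "Type System Errors"),
    ("Miscompilation", "MIR Optimization Errors"),
]

pair_counts = Counter(bugs)
symptom_totals = Counter(symptom for symptom, _ in bugs)

def ratio(symptom, cause):
    """Share of one cause group within a symptom, in percent."""
    return 100 * pair_counts[(symptom, cause)] / symptom_totals[symptom]

for (symptom, cause), n in sorted(pair_counts.items()):
    print(f"{symptom:<16}{cause:<28}{n:>3}{ratio(symptom, cause):>7.1f}%")
```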
In the third research question, we analyze the test case characteristics.
To reproduce Table 5, please run the following script to get the LOC information:
python3 ./scripts/rq3_loc.py
It prints the following:
Test case (original) Value
--------------------------------------
Mean 17.8
Median 12.0
Second smallest non-zero 2
Max 346
Test case (reduced) Value
--------------------------------------
Mean 14.2
Median 11.0
Second smallest non-zero 2
Max 123
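The four summary values can be derived from a list of per-test-case line counts. The sketch below is an assumption about how they might be computed, not the code of rq3_loc.py; in particular, reading "second smallest non-zero" as the second entry among the distinct non-zero values is a guess.

```python
import statistics

def loc_summary(locs):
    """Summarize a list of per-test-case line counts."""
    # Assumption: duplicates are collapsed before picking the second smallest value.
    nonzero = sorted({n for n in locs if n > 0})
    return {
        "mean": round(statistics.mean(locs), 1),
        "median": statistics.median(locs),
        "second_smallest_nonzero": nonzero[1] if len(nonzero) > 1 else None,
        "max": max(locs),
    }

summary = loc_summary([0, 1, 2, 2, 12, 40])
print(summary)
```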
To reproduce Table 6, please run the following script to get the information of Item node:
python3 ./scripts/rq3_ast_item.py
It prints the following:
Node Type Total Prevalence Average Max
Function 524 100.0% 1.9 8
Struct 130 37.6% 1.3 4
Impl 157 37.6% 1.5 6
Trait 144 34.3% 1.5 6
Use 64 20.3% 1.2 3
Type 29 7.4% 1.4 6
Enum 8 3.0% 1.0 1
Macro 11 3.0% 1.4 2
ExternCrate 7 2.6% 1.0 1
Static 7 2.2% 1.2 2
Mod 8 1.8% 1.6 3
Const 5 1.8% 1.0 1
Verbatim 4 1.1% 1.3 2
ForeignMod 2 0.7% 1.0 1
TraitAlias 1 0.4% 1.0 1
Union 0 0.0% 0.0 0
Other 0 0.0% 0.0 0
To reproduce Table 6, please run the following script to get the information of Type node:
python3 ./scripts/rq3_ast_type.py
It prints the following:
Node Type Total Prevalence Average Max
Path 1262 88.2% 5.3 41
Reference 276 43.9% 2.3 10
Tuple 161 30.3% 2.0 8
ImplTrait 87 20.7% 1.6 10
Array 55 11.4% 1.8 10
TraitObject 49 10.7% 1.7 3
Ptr 35 7.7% 1.7 4
Infer 18 4.8% 1.4 2
BareFn 21 4.1% 1.9 5
Slice 13 3.0% 1.6 3
Never 1 0.4% 1.0 1
Paren 1 0.4% 1.0 1
Group 0 0.0% 0.0 0
Macro 0 0.0% 0.0 0
Verbatim 0 0.0% 0.0 0
Other 0 0.0% 0.0 0
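One reading of the four columns that appears consistent with the printed numbers: Total sums a node type over all parsed test cases, Prevalence is the share of test cases containing it at least once, and Average and Max are taken over only those test cases that contain it. This interpretation is an assumption, not something stated by the scripts. Under it, the statistics for one node type could be computed like this:

```python
def node_stats(per_file_counts):
    """Summarize one node type given its count in each parsed test case."""
    present = [c for c in per_file_counts if c > 0]
    total = sum(present)
    return {
        "total": total,
        # Share of test cases that contain the node type at least once.
        "prevalence": round(100 * len(present) / len(per_file_counts), 1),
        # Assumption: averaged over test cases that contain the node type.
        "average": round(total / len(present), 1) if present else 0.0,
        "max": max(present, default=0),
    }

stats = node_stats([2, 0, 1, 0, 3])
print(stats)
```

A node type that never occurs yields zeros in every column, matching rows such as Group or Other above.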
To reproduce Table 7, please run the following script to get all the information:
python3 ./scripts/rq3_features.py
It prints the following:
Top 5 unstable features:
Feature Line Count Proportion
----------------------------------------------------------------------
#![feature(generic_const_exprs)] 13 17.8%
#![feature(type_alias_impl_trait)] 11 15.1%
#![feature(core_intrinsics)] 9 12.3%
#![feature(custom_mir)] 8 11.0%
#![feature(async_closure)] 3 4.1%
Number of valid rows with non-empty 'unstable features' (X): 73
Ratio (X / valid 'Status'): 24.3%
--------------------------------------------------------------------------------
Top 5 flags:
Flag Count Proportion
------------------------------------------------------------
-Zmir-opt-level=X 26 45.6%
-Zmir-enable-passes=+X 9 15.8%
+nightly 8 14.0%
-Copt-level=X 8 14.0%
--edition=X 7 12.3%
Number of valid rows with non-empty 'command' (X): 57
Ratio (X / valid 'Status'): 18.9%
--------------------------------------------------------------------------------
Top 5 traits:
Trait Count Proportion
--------------------------------------------------
Sized 32 49.2%
FnOnce 8 12.3%
Iterator 5 7.7%
Copy 4 6.2%
FnMut 3 4.6%
Number of valid rows with non-empty 'trait' (X): 65
Ratio (X / valid 'Status'): 21.6%
--------------------------------------------------------------------------------
Feature Count Frequency
------------------------------
Lifetime 104 34.6%
std 56 18.6%
dyn 30 10.0%
async 22 7.3%
core 19 6.3%
--------------------------------------------------------------------------------
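The flag table groups concrete values under a placeholder, for example -Zmir-opt-level=X. One way such grouping could be done is sketched below; the regular expression, the `normalize` helper, and the sample commands are assumptions for illustration and may differ from what rq3_features.py does.

```python
import re
from collections import Counter

def normalize(flag):
    """Replace the value after '=' with X, keeping a leading '+' if present."""
    return re.sub(r"=(\+?)\S+", r"=\1X", flag)

# Hypothetical reproduction commands, one per test case.
commands = [
    "rustc -Zmir-opt-level=3 a.rs",
    "rustc -Zmir-opt-level=2 --edition=2021 b.rs",
    "rustc -Zmir-enable-passes=+GVN c.rs",
]

flag_counts = Counter(
    normalize(token)
    for command in commands
    for token in command.split()
    if token.startswith(("-", "+"))
)

for flag, n in flag_counts.most_common():
    print(f"{flag:<28}{n:>3}{100 * n / len(commands):>7.1f}%")
```

Here the proportion is taken over the number of commands, mirroring how the output above divides by the rows with a non-empty command.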
Optionally, if you want to parse the AST and get all the information, please run the Rust project in Test_Case_Characteristics. The script is as follows:
cd Test_Case_Characteristics
cargo run
Then, two files (item_node_counts.csv and type_node_counts.csv) will be saved in the ./Test_Case_Characteristics folder; they should be the same as ./item_node_counts.csv and ./type_node_counts.csv.
If you are using the Docker container, these files can be directly accessed from your host machine in the ./rustc-bug-study-artifact/Test_Case_Characteristics/ directory.
IMPORTANT NOTE: During the execution of cargo run, you may encounter log messages such as Failed to parse file. This behavior is expected. As mentioned in our paper, a small number of files fail during AST parsing. As long as the final outputs ./Test_Case_Characteristics/item_node_counts.csv and ./Test_Case_Characteristics/type_node_counts.csv match the corresponding files provided in the artifact directory: ./item_node_counts.csv and ./type_node_counts.csv, the process has completed as intended and the results are correct.
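The comparison between generated and provided files can be automated. The following sketch compares two CSV files as sets of rows; ignoring row order is an assumption made here, and the demo files and column names are invented so the example runs on its own.

```python
import csv
import tempfile
from pathlib import Path

def csv_rows(path):
    """Read a CSV file and return its rows in sorted order."""
    with open(path, newline="") as handle:
        return sorted(tuple(row) for row in csv.reader(handle))

def same_csv(generated, reference):
    """True if both files contain the same rows, regardless of row order."""
    return csv_rows(generated) == csv_rows(reference)

# Self-contained demo with two temporary files that differ only in row order.
with tempfile.TemporaryDirectory() as tmp:
    generated = Path(tmp) / "generated.csv"
    reference = Path(tmp) / "reference.csv"
    generated.write_text("node,total\nPath,1262\nTuple,161\n")
    reference.write_text("node,total\nTuple,161\nPath,1262\n")
    matches = same_csv(generated, reference)

print(matches)
```

For the artifact, the same helper could be called as `same_csv("./Test_Case_Characteristics/item_node_counts.csv", "./item_node_counts.csv")`, and likewise for type_node_counts.csv.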
In the last research question, we analyze the status of existing techniques.
To reproduce Fig. 10(a), please run the following script to get the distribution of bug symptoms across existing tools:
python3 ./scripts/rq4_a.py
It prints the following:
Symptom Group Found by Count
---------------------------------------------------
1. Crash members 18
icemaker 56
developer 41
Rustlantis 1
fuzz-rustc 4
Subtotal 120
---------------------------------------------------
2. Correctness Issues members 40
developer 38
Subtotal 78
---------------------------------------------------
3. Miscompilation members 8
icemaker 1
developer 13
Rustlantis 8
Subtotal 30
---------------------------------------------------
4. Diagnostic Issues members 14
icemaker 2
developer 42
Subtotal 58
---------------------------------------------------
5. Misoptimization members 6
developer 9
Subtotal 15
---------------------------------------------------
To reproduce Fig. 10(b), please run the following script to get the distribution of bug causes across testing tools:
python3 ./scripts/rq4_b.py
It prints the following:
Cause Group Found by Count
-------------------------------------------------------
Type System Errors members 31
icemaker 11
developer 47
fuzz-rustc 2
Subtotal 91
-------------------------------------------------------
Ownership & Lifetime Errors members 19
icemaker 1
developer 19
Rustlantis 1
fuzz-rustc 1
Subtotal 41
-------------------------------------------------------
MIR Optimization Errors members 14
icemaker 8
developer 19
Rustlantis 5
Subtotal 46
-------------------------------------------------------
General Errors members 22
icemaker 39
developer 58
Rustlantis 3
fuzz-rustc 1
Subtotal 123
-------------------------------------------------------
Optionally, the following script can plot Fig. 10 (a) and Fig. 10 (b) and save them as ./Fig10a.pdf and ./Fig10b.pdf.
python3 ./plot/fig10a.py
python3 ./plot/fig10b.py
If you are using the Docker container, the files Fig10a.pdf and Fig10b.pdf will be generated inside the container and can be directly accessed from your host machine in the ./rustc-bug-study-artifact/plot/ directory.
NOTE: When generating plots, you might see warnings like the following:
findfont: Generic family 'sans-serif' not found because none of the following families were found: Arial, DejaVu Sans, Liberation Sans, Bitstream Vera Sans, sans-serif.
This warning is harmless and can be ignored; it does not affect the correctness or display of the resulting PDF plots.
The core components of this artifact that are intended for reuse include:
- Issue Collection Scripts: These scripts can be easily adapted to collect GitHub issues over different timeframes or targeting different labels. Users only need to modify a few configuration parameters (e.g., target_labels, start_date, end_date) to collect a customized issue set.
- Data Analysis and Visualization Scripts: All data processing and visualization components are reusable, provided that the input CSV files retain the expected column structure. These scripts generate summary statistics and plots that can be applied to other datasets with minimal modification.
- AST Node Collection Tool: This tool extracts and summarizes AST node types from Rust programs. It is fully reusable: given any Rust source code as input, the tool can generate the corresponding AST node types and statistics automatically.
To adapt the artifact to new use cases:
- Issue Collection: Edit the configuration file or script arguments to specify new target_labels or a different time range. The rest of the pipeline will function without changes.
- Data Analysis: Ensure that input CSV files maintain the necessary columns (as described in the provided script comments). You can then reuse the scripts for aggregation and visualization.
- AST Node Collection: Simply replace the input Rust file(s) with your own; the tool will output updated AST node statistics accordingly.
