A full-featured compiler for a statically typed custom language, written in Java. This project implements a complete compiler pipeline: lexical analysis, parsing, semantic validation, and Java bytecode generation using ASM.
This compiler translates source code written in a custom language (.lang files) into executable Java bytecode. The compilation process follows a traditional multi-stage architecture with clear separation of concerns between lexing, parsing, semantic analysis, and code generation.
The language supports:
Data Types:
- int: 32-bit signed integers
- float: 32-bit floating-point numbers
- bool: boolean values (true/false)
- string: immutable text strings
- Arrays and custom structs
Control Flow:
- if / else if / else statements
- for loops
- while loops
Operations:
- Arithmetic: +, -, *, /, % (modulo is defined for integers only)
- Comparison: ==, !=, <, >, <=, >=
- Logical: && (and), || (or), ! (negation)
- String concatenation with +
- Array and string indexing with []
Functions:
- User-defined function declarations and calls
- Optional parameters
- Return statements
Built-in Functions:
- bool !(bool): negates a boolean
- string chr(int): converts a character code to a string
- int len(string or array): returns the length of a string or array
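The built-ins above have direct Java counterparts. The following is a minimal sketch of their semantics, assuming chr takes a Unicode code point; the class Builtins is hypothetical and is not claimed to be how the code generator actually lowers these calls.

```java
// Hypothetical sketch: Java equivalents of the built-in functions listed above.
// The real code generator may lower these calls differently.
final class Builtins {
    private Builtins() {}

    // bool !(bool): negate a boolean
    static boolean not(boolean value) {
        return !value;
    }

    // string chr(int): convert a character code (assumed to be a Unicode code point) to a string
    static String chr(int code) {
        return new String(Character.toChars(code));
    }

    // int len(string): number of UTF-16 chars in the string
    static int len(String s) {
        return s.length();
    }

    // int len(array): number of elements (int[] shown; other element types are analogous)
    static int len(int[] array) {
        return array.length;
    }

    public static void main(String[] args) {
        System.out.println(chr(72) + chr(105));       // prints Hi
        System.out.println(len("hello"));             // prints 5
        System.out.println(len(new int[] {1, 2, 3})); // prints 3
        System.out.println(not(true));                // prints false
    }
}
```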
Variables:
- Type declarations with an optional final keyword
- Automatic type promotion (int to float in mixed expressions)
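A sketch of the typing rules that the operator list and the promotion rule imply. It assumes that + concatenates only when both operands are strings, that comparisons accept any mix of int and float, and that == and != require matching (or both numeric) types; the names LangType and TypeRules are hypothetical.

```java
// Hypothetical sketch of binary-operator typing with int-to-float promotion.
enum LangType { INT, FLOAT, BOOL, STRING }

final class TypeRules {
    private TypeRules() {}

    static boolean isNumeric(LangType t) {
        return t == LangType.INT || t == LangType.FLOAT;
    }

    // Returns the result type of "left op right", or throws if the operands are invalid.
    static LangType binaryResult(String op, LangType left, LangType right) {
        switch (op) {
            case "+":
                if (left == LangType.STRING && right == LangType.STRING) {
                    return LangType.STRING; // string concatenation (assumed: both sides strings)
                }
                // otherwise fall through to the arithmetic rule
            case "-": case "*": case "/":
                if (isNumeric(left) && isNumeric(right)) {
                    // int op int stays int; any float operand promotes the result to float
                    return (left == LangType.FLOAT || right == LangType.FLOAT) ? LangType.FLOAT : LangType.INT;
                }
                break;
            case "%":
                if (left == LangType.INT && right == LangType.INT) {
                    return LangType.INT; // modulo is defined for integers only
                }
                break;
            case "<": case ">": case "<=": case ">=":
                if (isNumeric(left) && isNumeric(right)) {
                    return LangType.BOOL;
                }
                break;
            case "==": case "!=":
                if (left == right || (isNumeric(left) && isNumeric(right))) {
                    return LangType.BOOL;
                }
                break;
            case "&&": case "||":
                if (left == LangType.BOOL && right == LangType.BOOL) {
                    return LangType.BOOL;
                }
                break;
            default:
                break;
        }
        throw new IllegalArgumentException("invalid operands for " + op + ": " + left + ", " + right);
    }

    public static void main(String[] args) {
        System.out.println(binaryResult("+", LangType.INT, LangType.FLOAT)); // prints FLOAT
    }
}
```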
The compiler is organized into four main pipeline stages:
Lexer (Lexer/): Tokenizes source code into a stream of symbols, handling keywords, operators, literals, and identifiers.
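A minimal hand-written lexer in the same spirit, assuming a keyword set taken from the feature list and double-quoted strings without escape sequences. TokenKind, Token, and MiniLexer are hypothetical names, not the classes in Lexer/. Two-character operators are matched before single characters so that >= is not split into > and =.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical token kinds; the real Lexer/ package may use different names.
enum TokenKind { KEYWORD, IDENTIFIER, INT_LITERAL, FLOAT_LITERAL, STRING_LITERAL, OPERATOR }

final class Token {
    final TokenKind kind;
    final String text;
    Token(TokenKind kind, String text) { this.kind = kind; this.text = text; }
    @Override public String toString() { return kind + "(" + text + ")"; }
}

final class MiniLexer {
    // Assumed keyword set, derived from the feature list (not the project's actual list).
    private static final Set<String> KEYWORDS = Set.of(
        "int", "float", "bool", "string", "final", "if", "else", "for", "while", "return", "true", "false");
    // Two-character operators are tried first so "<=" is not lexed as "<" followed by "=".
    private static final List<String> TWO_CHAR_OPS = List.of("==", "!=", "<=", ">=", "&&", "||");
    private static final String ONE_CHAR_OPS = "+-*/%<>=!()[]{},;.";

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < src.length() && (Character.isLetterOrDigit(src.charAt(i)) || src.charAt(i) == '_')) i++;
                String word = src.substring(start, i);
                tokens.add(new Token(KEYWORDS.contains(word) ? TokenKind.KEYWORD : TokenKind.IDENTIFIER, word));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                boolean isFloat = i < src.length() && src.charAt(i) == '.';
                if (isFloat) {
                    i++; // consume '.' and the fractional digits
                    while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                }
                tokens.add(new Token(isFloat ? TokenKind.FLOAT_LITERAL : TokenKind.INT_LITERAL, src.substring(start, i)));
            } else if (c == '"') {
                int end = src.indexOf('"', i + 1); // no escape sequences in this sketch
                if (end < 0) throw new IllegalArgumentException("unterminated string at " + i);
                tokens.add(new Token(TokenKind.STRING_LITERAL, src.substring(i + 1, end)));
                i = end + 1;
            } else if (i + 1 < src.length() && TWO_CHAR_OPS.contains(src.substring(i, i + 2))) {
                tokens.add(new Token(TokenKind.OPERATOR, src.substring(i, i + 2)));
                i += 2;
            } else if (ONE_CHAR_OPS.indexOf(c) >= 0) {
                tokens.add(new Token(TokenKind.OPERATOR, String.valueOf(c)));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Illustrative input only; it is not claimed to be valid .lang syntax.
        System.out.println(tokenize("final int x = 3 + 4.5; if (x >= 2) {}"));
    }
}
```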
Parser (Parser/): Builds an Abstract Syntax Tree (AST) from the token stream using recursive descent parsing. The AST represents the program structure as nested statement and expression objects.
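A recursive descent expression parser typically uses one method per precedence level, each calling the next-higher level. The sketch below assumes conventional C-like precedence (|| lowest, then &&, equality, comparison, additive, multiplicative, unary) and works on a pre-split list of token strings; Literal, Binary, Unary, and ExprParser are hypothetical stand-ins for the classes in Parser/StatementsAndExpressions/.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical AST node types.
interface Expr {}

final class Literal implements Expr {
    final String value;
    Literal(String value) { this.value = value; }
    @Override public String toString() { return value; }
}

final class Binary implements Expr {
    final String op; final Expr left; final Expr right;
    Binary(String op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    @Override public String toString() { return "(" + op + " " + left + " " + right + ")"; }
}

final class Unary implements Expr {
    final String op; final Expr operand;
    Unary(String op, Expr operand) { this.op = op; this.operand = operand; }
    @Override public String toString() { return "(" + op + " " + operand + ")"; }
}

// One method per precedence level, lowest first (assumed precedence, see above).
final class ExprParser {
    private final List<String> tokens;
    private int pos = 0;

    private ExprParser(List<String> tokens) { this.tokens = tokens; }

    static Expr parse(String... tokens) {
        ExprParser p = new ExprParser(Arrays.asList(tokens));
        Expr e = p.or();
        if (p.pos != tokens.length) throw new IllegalStateException("unexpected token: " + tokens[p.pos]);
        return e;
    }

    private Expr or()         { return leftAssoc(this::and, "||"); }
    private Expr and()        { return leftAssoc(this::equality, "&&"); }
    private Expr equality()   { return leftAssoc(this::comparison, "==", "!="); }
    private Expr comparison() { return leftAssoc(this::additive, "<", ">", "<=", ">="); }
    private Expr additive()   { return leftAssoc(this::term, "+", "-"); }
    private Expr term()       { return leftAssoc(this::unary, "*", "/", "%"); }

    // Parses "next (op next)*" and folds it into a left-associative tree.
    private Expr leftAssoc(Supplier<Expr> next, String... ops) {
        Expr left = next.get();
        while (pos < tokens.size() && Arrays.asList(ops).contains(tokens.get(pos))) {
            String op = tokens.get(pos++);
            left = new Binary(op, left, next.get());
        }
        return left;
    }

    private Expr unary() {
        if (pos < tokens.size() && (tokens.get(pos).equals("!") || tokens.get(pos).equals("-"))) {
            String op = tokens.get(pos++);
            return new Unary(op, unary());
        }
        return primary();
    }

    private Expr primary() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end of input");
        String tok = tokens.get(pos++);
        if (tok.equals("(")) {
            Expr inner = or();
            if (pos >= tokens.size() || !tokens.get(pos++).equals(")")) throw new IllegalStateException("expected ')'");
            return inner;
        }
        return new Literal(tok); // identifiers and literals are not distinguished in this sketch
    }

    public static void main(String[] args) {
        System.out.println(parse("1", "+", "2", "*", "3")); // prints (+ 1 (* 2 3))
    }
}
```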
Semantic analysis (Semantic/): Validates the AST for semantic correctness:
- Type checking and type promotion rules
- Symbol table management
- Function and struct declarations
- Variable scope and redeclaration checks
- Throws SemanticException on violations
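Scope tracking and redeclaration checks are commonly built on a stack of maps, one per open block. This sketch assumes that redeclaring a name in the same scope is an error while shadowing it in an inner scope is allowed. SymbolTable is a hypothetical name, and the SemanticException defined here is a local stand-in, since the project's own class and its constructor are not documented in this README.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the project's SemanticException.
class SemanticException extends RuntimeException {
    SemanticException(String message) { super(message); }
}

// Hypothetical scoped symbol table: name -> type, innermost scope at the head of the deque.
final class SymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    SymbolTable() { enterScope(); } // global scope

    void enterScope() { scopes.push(new HashMap<>()); }

    void exitScope() { scopes.pop(); }

    // Declares a name in the current scope; redeclaring it in the same scope is an error.
    void declare(String name, String type) {
        Map<String, String> current = scopes.peek();
        if (current.containsKey(name)) {
            throw new SemanticException("variable '" + name + "' already declared in this scope");
        }
        current.put(name, type);
    }

    // Looks a name up from the innermost scope outwards (ArrayDeque iterates head first).
    String lookup(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) return type;
        }
        throw new SemanticException("undeclared variable '" + name + "'");
    }

    public static void main(String[] args) {
        SymbolTable table = new SymbolTable();
        table.declare("count", "int");
        table.enterScope();
        System.out.println(table.lookup("count")); // prints int (found in the enclosing scope)
        table.exitScope();
    }
}
```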
Code generator (CodeGenerator/): Translates the validated AST into Java bytecode (.class files) using the ASM library, managing the operand stack and allocating local variables.
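To illustrate stack management, the sketch below lowers an expression tree to JVM-style instructions in post-order, inserting I2F where an int operand meets a float. It records mnemonics as strings instead of calling ASM; with ASM the same sequence would be emitted through MethodVisitor methods such as visitVarInsn, visitLdcInsn, and visitInsn. CgExpr, IntConst, FloatConst, IntLocal, Add, and StackCodegen are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expression nodes for this sketch only.
abstract class CgExpr {
    abstract boolean isFloat();
}

final class IntConst extends CgExpr {
    final int value;
    IntConst(int value) { this.value = value; }
    boolean isFloat() { return false; }
}

final class FloatConst extends CgExpr {
    final float value;
    FloatConst(float value) { this.value = value; }
    boolean isFloat() { return true; }
}

final class IntLocal extends CgExpr {
    final int slot; // local variable index
    IntLocal(int slot) { this.slot = slot; }
    boolean isFloat() { return false; }
}

final class Add extends CgExpr {
    final CgExpr left, right;
    Add(CgExpr left, CgExpr right) { this.left = left; this.right = right; }
    boolean isFloat() { return left.isFloat() || right.isFloat(); }
}

final class StackCodegen {
    final List<String> code = new ArrayList<>();

    // Post-order traversal: operands are pushed first, then the operator pops them.
    void emit(CgExpr e) {
        if (e instanceof IntConst) {
            code.add("LDC " + ((IntConst) e).value);
        } else if (e instanceof FloatConst) {
            code.add("LDC " + ((FloatConst) e).value + "f");
        } else if (e instanceof IntLocal) {
            code.add("ILOAD " + ((IntLocal) e).slot);
        } else if (e instanceof Add) {
            Add add = (Add) e;
            boolean promote = add.isFloat();
            emitOperand(add.left, promote);
            emitOperand(add.right, promote);
            code.add(promote ? "FADD" : "IADD");
        }
    }

    // Emits an operand and, in a mixed expression, converts an int result to float with I2F.
    private void emitOperand(CgExpr operand, boolean promote) {
        emit(operand);
        if (promote && !operand.isFloat()) code.add("I2F");
    }

    public static void main(String[] args) {
        StackCodegen gen = new StackCodegen();
        gen.emit(new Add(new IntLocal(1), new FloatConst(2.5f)));
        gen.code.forEach(System.out::println);
    }
}
```

Because an int subtree is fully evaluated before I2F is applied, an expression such as (1 + 2) + 2.5 performs an integer add first and promotes only its result.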
Requirements:
- Java 11+
- Gradle
Build:
./gradlew build
On Windows:
gradlew.bat build
Compile a .lang file:
./gradlew run --args="-compiler path/to/file.lang"
This generates a .class file that can be executed with the Java runtime.
Lexer only (output tokens):
./gradlew run --args="-lexer path/to/file.lang"
Parser only (output AST):
./gradlew run --args="-parser path/to/file.lang"
Semantic analysis only:
./gradlew run --args="-semantic path/to/file.lang"
See code_example.lang for a complete example of the language syntax and features.
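Each of the four modes stops the pipeline at a different stage. A hypothetical sketch of the flag handling (the real logic in Compiler.java is not shown in this README), assuming exactly one mode flag followed by one source path, which is how Gradle splits the --args strings above; Stage and CliSketch are illustrative names.

```java
// Hypothetical mapping from command-line mode flags to the last pipeline stage to run.
enum Stage { LEXER, PARSER, SEMANTIC, COMPILER }

final class CliSketch {
    // Expects exactly two arguments: a mode flag and a .lang source path.
    static Stage parseStage(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("usage: -lexer|-parser|-semantic|-compiler <file.lang>");
        }
        switch (args[0]) {
            case "-lexer":    return Stage.LEXER;    // print the token stream
            case "-parser":   return Stage.PARSER;   // print the AST
            case "-semantic": return Stage.SEMANTIC; // run semantic analysis only
            case "-compiler": return Stage.COMPILER; // emit a .class file
            default: throw new IllegalArgumentException("unknown mode: " + args[0]);
        }
    }

    public static void main(String[] args) {
        Stage stage = parseStage(args);
        System.out.println("running " + stage + " on " + args[1]);
    }
}
```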
The project includes test suites for each compiler stage:
- TestLexer.java: lexical analysis tests
- testParser.java: parsing tests
- testSemantic.java: semantic validation tests
- testCodegen.java: code generation tests
Run tests with:
./gradlew test
Fully Implemented:
- ✅ Lexical analysis
- ✅ Parsing and AST construction
- ✅ Type system and type checking
- ✅ Variable declarations (with final support)
- ✅ Function declarations and calls
- ✅ Control flow (for, while, if/else)
- ✅ All operators (arithmetic, logical, comparison)
- ✅ String concatenation
- ✅ Arrays and array operations
- ✅ Structs (custom data types)
- ✅ Bytecode generation and execution
Project structure:
src/main/java/compiler/
├── Compiler.java # Entry point and CLI
├── Lexer/ # Tokenization
├── Parser/ # AST construction
│ └── StatementsAndExpressions/ # AST node classes
├── Semantic/ # Type checking and validation
└── CodeGenerator/ # Bytecode generation
build/ # Compiled output
Implementation notes:
- The compiler uses the visitor pattern for AST traversal during semantic analysis and code generation
- Type promotion (int to float) is handled automatically in mixed expressions
- The ASM library is used for efficient bytecode generation
- All symbols and variables are tracked in symbol tables maintained during compilation
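A minimal sketch of the visitor pattern mentioned in these notes: each node implements accept, and each compiler pass is a separate Visitor implementation, so adding a pass does not require touching the node classes. Node, Num, AddNode, Printer, and Evaluator are hypothetical names, not the project's classes.

```java
// Hypothetical visitor interface with one method per node type.
interface Visitor<R> {
    R visitNum(Num n);
    R visitAddNode(AddNode n);
}

interface Node {
    <R> R accept(Visitor<R> v);
}

final class Num implements Node {
    final int value;
    Num(int value) { this.value = value; }
    public <R> R accept(Visitor<R> v) { return v.visitNum(this); }
}

final class AddNode implements Node {
    final Node left, right;
    AddNode(Node left, Node right) { this.left = left; this.right = right; }
    public <R> R accept(Visitor<R> v) { return v.visitAddNode(this); }
}

// One pass: pretty-printing (a semantic checker or code generator would be another Visitor).
final class Printer implements Visitor<String> {
    public String visitNum(Num n) { return Integer.toString(n.value); }
    public String visitAddNode(AddNode n) { return "(" + n.left.accept(this) + " + " + n.right.accept(this) + ")"; }
}

// A second pass over the same tree: constant evaluation.
final class Evaluator implements Visitor<Integer> {
    public Integer visitNum(Num n) { return n.value; }
    public Integer visitAddNode(AddNode n) { return n.left.accept(this) + n.right.accept(this); }
}

final class VisitorDemo {
    public static void main(String[] args) {
        Node tree = new AddNode(new Num(1), new AddNode(new Num(2), new Num(3)));
        System.out.println(tree.accept(new Printer()));   // prints (1 + (2 + 3))
        System.out.println(tree.accept(new Evaluator())); // prints 6
    }
}
```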
This is an educational compiler project.