AI Compiler Engineer — I turn high‑level models (PyTorch · TensorFlow · JAX · ONNX) into optimized inference artifacts across CPU, GPU, FPGA, and custom accelerators, working across the TVM · MLIR · LLVM · IREE stack to cut build time, memory footprint, and on‑device latency.
Languages
Compilers & IR
Frameworks & Serving
Hardware & Acceleration
- 🔭 I’m currently an AI Software Engineer at DreamBig Semiconductor, optimizing LLM inference through graph- and kernel-level transformations in TVM and IREE for GPU, FPGA, and custom accelerators.
- 🧱 I architect compiler passes and analyses that cut build time, memory footprint, and runtime latency while preserving model fidelity. I work across the TVM, LLVM, MLIR, and IREE stack, targeting CUDA, ROCm, TensorRT, OpenVINO, Metal, Vulkan, and custom silicon.
- 🚗 I deploy and optimize models for edge / embedded AI accelerators, including NVIDIA Jetson Thor, for low‑latency on‑device inference.
- 💬 Ask me about compilers, LLM inference, transformers, GPU/CUDA, and HPC.
- 📧 Contact me at: hamza7771.861@gmail.com.


