CMU-CS-25-130
Computer Science Department
School of Computer Science, Carnegie Mellon University



Towards Effortless High-Performance
Kernel Development for LLM Workloads

Jinqi (Kathryn) Chen

M.S. Thesis

August 2025


Keywords: Large Language Models, LLM Inference, Machine Learning Compiler, High-Performance Computing, Distributed Systems, Programming Model

Recent advances in large language models (LLMs) have pushed GPU hardware to its limits, requiring highly optimized kernels for compute- and bandwidth-intensive operations such as matrix multiplication, attention, and inter-GPU communication. However, achieving state-of-the-art efficiency often demands deep low-level expertise, slowing development and limiting accessibility.

This thesis presents TIR+, a multi-level compiler framework that unifies high-level productivity and low-level optimization within a single compilation and runtime infrastructure. TIR+ spans from a Python-based tiling DSL, enabling rapid kernel prototyping, to a hardware-centric intermediate representation (IR) offering fine-grained control over memory, parallelism, and specialized instructions. Between these extremes, it provides optimized tensor libraries and reusable primitives. Crucially, TIR+ is distributed-aware, supporting multi-GPU execution with built-in communication management and compute–communication overlap. We demonstrate the capabilities of TIR+ on key LLM kernels, including GEMM, attention, and fused compute–communication kernels. Across these cases, TIR+ matches state-of-the-art performance with significantly less development effort than hand-tuned CUDA, demonstrating a unified and scalable path toward hardware-aware kernel optimization for current and future AI workloads.
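The abstract does not reproduce the TIR+ tiling DSL itself; as a rough, hypothetical illustration of the tiling abstraction such a DSL automates, the sketch below computes a GEMM tile by tile in plain Python/NumPy. The function name and tile size are illustrative only and are not part of the TIR+ API:

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Illustrative tiled GEMM: C = A @ B computed one (tile x tile)
    block at a time, mirroring how a tiling DSL partitions work so
    each block can fit in fast on-chip memory on a GPU."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=A.dtype)
    for i0 in range(0, M, tile):          # tile rows of C
        for j0 in range(0, N, tile):      # tile columns of C
            for k0 in range(0, K, tile):  # accumulate over the K dimension
                # NumPy slicing clips at array bounds, so ragged
                # edge tiles are handled automatically.
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile]
                )
    return C
```

In a real tiling DSL the same loop structure is expressed declaratively, and the compiler maps the tiles onto thread blocks, shared memory, and tensor-core instructions rather than NumPy slices.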

49 pages

Thesis Committee:
Tianqi Chen (Chair)
Zhihao Jia

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

