What Are the Stages of Rust Compilation? (Parsing, HIR, MIR, LLVM IR)

Rust compiles code through Parsing, HIR, MIR, and LLVM IR stages to ensure safety and performance before generating machine code.

When the spinner starts

You type cargo build and watch the terminal progress bar. Behind that spinner, rustc is not performing one giant translation from text to CPU instructions. It breaks the job into distinct phases, each with a narrow responsibility. The compiler transforms your source code through four main representations before emitting a binary. Understanding these stages explains why Rust error messages are so precise, why borrow checking happens when it does, and how to debug compilation failures that refuse to make sense.

The kitchen pipeline

Think of a professional kitchen. You do not hand raw vegetables to the dishwasher. First, a prep cook washes and chops ingredients into uniform pieces. Next, a sous chef seasons and arranges them on a tray, checking that nothing is undercooked or contaminated. Finally, the line cook plates the dish and sends it out. Rust follows the same pipeline. It converts your .rs files through progressively lower-level representations before generating machine code. Each stage strips away syntax noise, enforces rules, and prepares the code for the next phase.

Stage one: Parsing and the AST

The journey starts with plain text. rustc reads your file character by character and groups them into tokens. Keywords, identifiers, operators, and punctuation become discrete units. The lexer handles this tokenization. It ignores whitespace and comments, keeping only the structural elements. The parser then arranges these tokens into an Abstract Syntax Tree. The AST captures the structure of your code without understanding types or semantics. A function definition becomes a node with children for the name, parameters, and body. A for loop becomes a node with an iterator and a block.
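To make the tokenization step concrete, here is a minimal sketch of a toy tokenizer. The Token enum and the tokenize function are hypothetical names invented for this illustration; rustc's real lexer is far more detailed (it distinguishes keywords, handles comments, string literals, and multi-character operators), and none of that is modeled here.

```rust
// Toy tokenizer: an invented, heavily simplified stand-in for a real lexer.
// Keywords are treated as plain identifiers and comments are not handled.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Number(String),
    Punct(char),
}

fn tokenize(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace separates tokens but produces none.
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            // Collect a run of identifier characters.
            let mut word = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_alphanumeric() || d == '_' {
                    word.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(word));
        } else if c.is_ascii_digit() {
            // Collect a run of decimal digits.
            let mut digits = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_digit() {
                    digits.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Number(digits));
        } else {
            // Everything else becomes a single-character punctuation token.
            tokens.push(Token::Punct(c));
            chars.next();
        }
    }
    tokens
}
```

Calling tokenize("let x = 42;") yields five tokens: two identifiers, an equals sign, a number, and a semicolon. A parser would then arrange a sequence like this into a tree.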

At this stage, the compiler only cares about syntax. Missing semicolons, mismatched braces, or invalid token sequences trigger parsing errors. The AST is language-specific and closely mirrors your source layout. It exists briefly in memory before the next phase consumes it. You rarely interact with the AST directly, but it is the foundation for everything that follows. The parser also performs basic error recovery, allowing the compiler to report multiple syntax mistakes in a single pass instead of stopping at the first one.

Trust the parser to catch typos early. If your code does not even form a valid tree, nothing else matters.

Stage two: Macro expansion and HIR

Once the parser finishes, macro expansion runs. Procedural macros and declarative macros replace their call sites with generated Rust code. This happens before type checking, which is why macro-generated code can sometimes produce confusing errors. The expanded code then feeds into the High-Level Intermediate Representation.
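As a small illustration of what expansion does, the sketch below defines a hypothetical declarative macro named square and shows a hand-written equivalent of the code it produces. The names square, area, and area_expanded are assumptions made up for this example.

```rust
// Hypothetical declarative macro, used only for illustration.
macro_rules! square {
    ($x:expr) => {
        $x * $x
    };
}

// What you write: the macro call site.
fn area(side: i32) -> i32 {
    square!(side)
}

// Roughly what the compiler sees after expansion: the call site has been
// replaced by the macro body with `side` substituted for `$x`.
fn area_expanded(side: i32) -> i32 {
    side * side
}
```

Both functions behave identically. Type checking only ever sees the second form, which is why an error inside generated code can point at a call site that looks innocent.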

HIR desugars Rust syntax into a flatter, more uniform structure. It removes syntactic sugar like for loops, replacing them with their underlying iterator mechanics. Names have already been resolved by this point, since name resolution runs alongside macro expansion on the AST. Type inference and type checking then operate on HIR. This is where the compiler figures out what T is in a generic function or whether a variable holds an i32 or a String. If you pass a u8 where an i32 is expected, the compiler rejects you with E0308 (mismatched types).

/// Calculates a doubled value with a fallback
fn calculate(x: i32) -> i32 {
    // HIR keeps this match intact; it becomes explicit branches later, in MIR
    // Type inference resolves the literals 10 and 2 to i32 from the signature
    match x {
        0 => 10,
        _ => x * 2,
    }
}

HIR stays close to your original code structure but strips away ambiguity. Many diagnostic messages are produced from it. When you see a type error pointing to a specific line, the message usually originates from HIR analysis, while borrow errors come from the later MIR stage. The compiler keeps HIR close enough to the source that its spans map back to the lines you wrote. You can inspect it directly using nightly flags. When inference fails, add an explicit type annotation, because type checking cannot finish without concrete types.
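The for loop desugaring mentioned above can be approximated by hand. The sketch below is not the compiler's exact output, only a rough equivalent written in ordinary Rust; the function names are made up for this example.

```rust
// The sugared form you normally write.
fn sum_with_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

// Roughly what the loop lowers to: obtain an iterator once,
// then call `next` until it returns `None`.
fn sum_desugared(values: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(values);
    loop {
        match iter.next() {
            Some(v) => total += *v,
            None => break,
        }
    }
    total
}
```

Seeing the loop in this shape explains why a type error inside a for loop sometimes mentions Iterator or IntoIterator even though you never wrote those names.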

Let the compiler infer when it can. Add annotations when it cannot.

Stage three: MIR and the borrow checker

After type checking passes, the compiler lowers HIR into the Mid-Level Intermediate Representation. MIR is where Rust's safety guarantees get enforced. It transforms your code into a control flow graph. Every function becomes a set of basic blocks connected by edges. Nested expressions are flattened into simple statements over numbered locals and temporaries. Borrows are tracked explicitly.
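To picture basic blocks and edges, here is a toy control flow graph for the calculate function shown earlier. Every type and function in this sketch (Terminator, BasicBlock, calculate_cfg, run) is invented for illustration and is much simpler than rustc's real MIR data structures, which also carry statements, locals, and type information.

```rust
// Toy control flow graph. These types are assumptions made for this
// example and do not match rustc's internal definitions.
enum Terminator {
    // Jump to `if_zero` when the input is 0, otherwise to `otherwise`.
    SwitchOnZero { if_zero: usize, otherwise: usize },
    // Leave the function with a fixed result.
    Return(i32),
    // Leave the function with the input multiplied by a factor.
    ReturnTimes(i32),
}

struct BasicBlock {
    terminator: Terminator,
}

// `match x { 0 => 10, _ => x * 2 }` as three blocks.
fn calculate_cfg() -> Vec<BasicBlock> {
    vec![
        // Block 0: branch on the value of x.
        BasicBlock { terminator: Terminator::SwitchOnZero { if_zero: 1, otherwise: 2 } },
        // Block 1: the `0 => 10` arm.
        BasicBlock { terminator: Terminator::Return(10) },
        // Block 2: the `_ => x * 2` arm.
        BasicBlock { terminator: Terminator::ReturnTimes(2) },
    ]
}

// Walk the graph from block 0 until a block returns.
fn run(blocks: &[BasicBlock], x: i32) -> i32 {
    let mut current = 0;
    loop {
        match blocks[current].terminator {
            Terminator::SwitchOnZero { if_zero, otherwise } => {
                current = if x == 0 { if_zero } else { otherwise };
            }
            Terminator::Return(value) => return value,
            Terminator::ReturnTimes(factor) => return x * factor,
        }
    }
}
```

An analysis over a graph like this can reason about each edge separately, which is the property the borrow checker relies on.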

The borrow checker runs on MIR. It analyzes every reference, determines lifetimes, and verifies that mutable and immutable borrows never overlap. This is where E0502 (cannot borrow as mutable because it is also borrowed as immutable) gets caught. MIR also makes implicit behavior explicit: pattern matches become switches on discriminants, method-call syntax becomes ordinary function calls, and drops and unwinding paths appear as their own edges in the graph.

/// Attempts to mutate a vector while still holding a reference into it
fn process(data: &mut Vec<String>) {
    // A shared borrow of the vector's contents starts here
    let first = &data[0];
    // Rejected with E0502: push needs a mutable borrow of data
    // while first is still in use on the next line
    data.push(String::from("new"));
    println!("{first}");
}
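One common way to satisfy the borrow checker in a situation like this is to end the shared borrow before mutating. The sketch below does so by cloning the element into an owned String first. The function name push_copy_of_first is made up for this example, and the code assumes the vector is non-empty (indexing an empty vector panics).

```rust
fn push_copy_of_first(data: &mut Vec<String>) {
    // Clone into an owned String. The shared borrow of `data` ends
    // as soon as this statement finishes.
    let first: String = data[0].clone();
    // No reference into `data` is live any more, so the mutable
    // borrow needed by `push` is accepted.
    data.push(first);
}
```

On the control flow graph there is no point where a shared and a mutable borrow of data are both in use, so the check passes.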

Optimization begins at this stage too. On MIR the compiler applies Rust-aware transformations such as constant propagation, inlining of small functions, and removal of unreachable blocks. Heavier work like loop unrolling and vectorization is left to LLVM. MIR looks like a generic control flow graph in structure but is Rust-specific in semantics. It exists long enough for the borrow checker and early optimizations to run, then gets lowered further. You can inspect MIR to understand how the compiler views control flow and borrow scopes. The control flow graph makes it obvious why certain patterns trigger borrow errors. The compiler sees every possible execution path, not just the one you intended.

Read the borrow checker errors as graph analysis, not as personal criticism.

Stage four: LLVM IR and machine code

The final stage hands off to LLVM. MIR gets translated into LLVM Intermediate Representation. LLVM IR is a low-level, assembly-like format that is largely independent of the target CPU. It uses static single assignment (SSA) form, meaning every value is defined exactly once. This structure makes aggressive optimization possible.
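Static single assignment is easier to grasp with a side-by-side comparison. The sketch below stays in plain Rust rather than showing real LLVM IR: the second function imitates the SSA discipline by giving every intermediate value a fresh name. Both function names are assumptions for this example, and real SSA additionally needs phi nodes where branches merge, which this straight-line code does not exercise.

```rust
// Ordinary style: one variable, assigned several times.
fn poly_mutable(x: i32) -> i32 {
    let mut acc = x;
    acc = acc * 2;
    acc = acc + 3;
    acc
}

// SSA-flavoured rewrite: each intermediate value gets its own name
// and is assigned exactly once.
fn poly_ssa_style(x: i32) -> i32 {
    let acc0 = x;
    let acc1 = acc0 * 2;
    let acc2 = acc1 + 3;
    acc2
}
```

Because no name is ever reassigned, an optimizer can tell where each value came from without tracking the history of a mutable variable.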

LLVM takes over from here. It performs register allocation, instruction scheduling, and architecture-specific optimizations. It knows how to emit x86, ARM, WebAssembly, or RISC-V machine code. The heavy lifting of turning Rust into fast binaries happens inside LLVM. Rust's compiler delegates code generation to LLVM because LLVM has decades of optimization research built into it.

You can inspect LLVM IR with rustc --emit=llvm-ir (or cargo rustc -- --emit=llvm-ir inside a Cargo project), though the output looks closer to assembly than Rust. The convention in the Rust community is to trust LLVM for performance tuning. Profile your code first. Only look at LLVM IR if you suspect the optimizer is missing a pattern that your CPU could handle better. LLVM handles the messy details of calling conventions, stack alignment, and instruction selection.

Do not fight LLVM's optimizer. Measure first, then tweak.

Why four stages instead of one

Separation of concerns drives better error messages. Type checking happens before borrow checking, so you get E0308 before E0502. The compiler reports your type mistakes first, then checks your references. If it did both at once, you would drown in cascading errors.

The staged approach also supports incremental compilation. rustc records the results of its internal analyses (type checking, MIR construction, and so on) together with compiled code units, and on the next build it reuses whatever your edit did not affect. Cargo additionally skips crates whose sources have not changed. Together these are why cargo build feels fast on the second run. The architecture also isolates safety checks from code generation. The borrow checker runs on MIR, which is normalized and predictable. LLVM handles performance, which is platform-dependent. Each stage does one thing well.

Stop trying to optimize before the borrow checker runs. Fix the structure first.

Debugging with intermediate output

When compilation fails in confusing ways, inspecting intermediate stages saves hours. Macro expansion often hides the real problem. A procedural macro might generate code that looks correct but violates trait bounds. You can see the expanded code by running the compiler with specific flags.

# Inspect macro-expanded code before type checking
rustc -Z unpretty=expanded src/main.rs

# Inspect HIR to see desugared syntax and type inference
rustc -Z unpretty=hir src/main.rs

# Inspect MIR to see control flow and borrow tracking
rustc -Z unpretty=mir src/main.rs

The -Z flags require the nightly toolchain. Switch to it temporarily with rustup override set nightly, run the command, then switch back with rustup override unset. For a one-off, rustc +nightly followed by the usual arguments also works when rustc is managed by rustup. For macro expansion, the community convention is to use the cargo-expand subcommand rather than calling -Z unpretty=expanded by hand. It runs the compiler through Cargo, so dependencies, features, and workspace members resolve as they would in a normal build. Install it with cargo install cargo-expand and run cargo expand in your project root.

Use the right tool for the layer you are debugging. Do not mix stages.

Pitfalls and compiler limits

Inspecting intermediate representations has limits. HIR output can be hundreds of lines long for a single function. MIR output looks like a state machine rather than Rust code. LLVM IR resembles assembly with SSA variables like %add1 and %ptr2. Do not expect readability. Use these tools to verify specific behaviors, not to read entire programs.

You will encounter E0282 (type annotations needed) when the compiler cannot infer a type from HIR analysis. This usually happens with generic functions that lack enough context. The compiler stops at HIR because it cannot proceed to MIR without knowing the concrete types. Add explicit type annotations or constrain the generic parameters.
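The sketch below shows two usual ways to give the compiler the missing information: an annotation on the binding, and the turbofish syntax on the call. The function names are made up for this example, and both functions assume every whitespace-separated piece of the input parses as an integer (unwrap panics otherwise).

```rust
// Option 1: annotate the binding.
fn sum_numbers(input: &str) -> i32 {
    // Without `Vec<i32>` here, `collect()` could build many different
    // containers and the compiler would ask for type annotations.
    let numbers: Vec<i32> = input
        .split_whitespace()
        .map(|s| s.parse().unwrap())
        .collect();
    numbers.iter().sum()
}

// Option 2: name the types at the call sites with the turbofish.
fn sum_numbers_turbofish(input: &str) -> i32 {
    input
        .split_whitespace()
        .map(|s| s.parse::<i32>().unwrap())
        .collect::<Vec<_>>()
        .iter()
        .sum()
}
```

Either form pins down the element type and the container, which is all the type checker needs to finish and hand the function on to MIR.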

Nightly flags change without warning. The -Z unpretty interface is unstable. Output formats may shift between releases. Do not write scripts that parse -Z unpretty output for production tooling. Use it for manual debugging only. The compiler team reserves the right to change intermediate representations to improve error messages or performance.

Treat intermediate output as a diagnostic window, not as a specification.

When to inspect which stage

Use -Z unpretty=expanded or cargo expand when a macro generates unexpected code and you need to see the raw Rust before type checking. Use -Z unpretty=hir when type inference fails or you want to verify how the compiler desugars control flow and trait calls. Use -Z unpretty=mir when borrow checker errors feel wrong and you need to inspect the exact control flow graph and borrow scopes. Use --emit=llvm-ir to see the final IR, or -C llvm-args=-print-after-all to watch individual LLVM passes, when profiling shows a specific function is slower than expected. Reach for standard cargo build output for everyday development; the intermediate stages are debugging tools, not daily drivers.

Keep your workflow simple. Inspect only when the error message leaves you guessing.

Where to go next