Why Is Rust Slow to Compile and What Can Be Done About It?

The coffee break that never ends

You type cargo build. You hit enter. The terminal spins. You walk to the kitchen. You make a sandwich. You eat the sandwich. You wash the plate. You check the terminal. It is still compiling.

This happens. Rust trades compile time for runtime speed and memory safety. The compiler does work that other languages defer to the runtime or skip entirely. You pay for zero-cost abstractions up front, at compile time, not later at runtime. The binary you get is fast and safe because the compiler spent time proving it.

Understanding why the compiler takes time helps you tune the process. You can reduce wait times without sacrificing the guarantees that make Rust valuable.

Monomorphization: The price of generics

Rust generics are fast because they compile to concrete code for each type you use. This process is called monomorphization. The compiler generates a separate version of the function for every type parameter instantiation.

/// Finds the maximum of two values using generics.
fn max<T: PartialOrd>(a: T, b: T) -> T {
    // The compiler generates a comparison instruction
    // specific to the type T.
    if a >= b { a } else { b }
}

fn main() {
    // Triggers generation of max::<i32>.
    let int_max = max(10, 20);

    // Triggers generation of max::<f64>.
    let float_max = max(3.14, 2.71);

    // Triggers generation of max::<String>.
    let s1 = String::from("hello");
    let s2 = String::from("world");
    let string_max = max(s1, s2);
}

When you call max with an i32, the compiler produces machine code that compares integers. When you call it with an f64, it produces code that compares floating-point numbers. When you call it with String, it produces code that compares the strings' contents byte by byte. Each version is optimized for its specific type. There is no runtime dispatch overhead.

The trade-off is compile time. If you use a generic function with fifty different types, the compiler generates fifty functions. If you use a generic struct with fifty types, you get fifty struct layouts. The compiler must type-check and optimize each version. Large codebases with heavy generic usage can trigger a combinatorial explosion of generated code.
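One common way to limit the amount of duplicated code is to keep the generic part of a function thin and push the real work into a non-generic inner function. The sketch below is an illustration under assumptions: component_count and inner are hypothetical names chosen for this example, and the saving only matters when the inner function is large.

```rust
use std::path::{Path, PathBuf};

/// Generic entry point: a small copy is generated for every caller type `P`.
fn component_count<P: AsRef<Path>>(path: P) -> usize {
    // Non-generic worker: compiled once, no matter how many
    // different `P` types call the outer function.
    fn inner(path: &Path) -> usize {
        path.components().count()
    }
    // The only per-type code is this conversion and the call.
    inner(path.as_ref())
}

fn main() {
    // Three instantiations of the thin wrapper, one shared `inner`.
    let from_str = component_count("usr/local/bin");
    let from_string = component_count(String::from("usr/local/bin"));
    let from_path_buf = component_count(PathBuf::from("usr/local/bin"));
    println!("{from_str} {from_string} {from_path_buf}");
}
```

The wrapper is still monomorphized for &str, String, and PathBuf, but each copy is only a conversion and a call, so the compiler type-checks and optimizes the body of inner once.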

Generics are free at runtime because you paid for them at compile time.

The borrow checker: one function at a time

Rust's safety guarantees come from the borrow checker. The borrow checker analyzes your code to ensure that references are always valid. It rules out use-after-free and dangling references, and together with the Send and Sync traits it prevents data races.

This analysis does not need a global view. The borrow checker works on one function body at a time and treats every function signature as a contract. The ownership, borrows, and lifetimes written in a signature tell callers everything they need to know, so the compiler never has to trace data through your entire program. Inside each body, it has to prove that no reference outlives the data it points to.

Imagine a conductor checking every instrument's sheet music before the concert starts. The conductor goes through the score one part at a time and relies on the shared bar numbers to know that the parts fit together. The conductor checks that the trumpets rest when the woodwinds take the solo. This check happens before the music plays.

In Rust, the borrow checker performs this analysis at compile time. It works out where each borrow is live. It checks that mutable borrows don't overlap with other borrows of the same data. It ensures that values are not moved while they are borrowed. In most crates this is a modest share of the total build time, but the cost grows with function size. Very large functions and complex generic code with many lifetime parameters can slow down the checker noticeably.

The borrow checker reads every page, but only one page at a time.
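A small sketch of the rules in action may help. It assumes nothing beyond the standard library; longest is a hypothetical helper written for this example. The lifetime in its signature is all the caller's borrow check needs, so the body of longest is never re-examined from main.

```rust
/// Returns the longer of two string slices.
/// The signature says the result may borrow from either input.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let mut words = vec![String::from("borrow"), String::from("checker")];

    // Two shared borrows of `words` are live while `pick` is in use.
    let pick = longest(&words[0], &words[1]);
    println!("longest: {pick}");

    // `pick` is not used again, so the shared borrows have ended
    // and a mutable borrow is allowed here.
    words.push(String::from("done"));

    // Using `pick` after the push would make the borrows overlap,
    // and the compiler would reject the program with error E0502:
    // println!("{pick}");

    println!("{} words", words.len());
}
```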

LLVM: The optimization engine

Rust compiles to LLVM Intermediate Representation (IR). LLVM is a mature compiler infrastructure used by many languages. Rust hands the IR to LLVM, which runs a series of optimization passes.

These passes transform the code to make it faster and smaller. LLVM performs dead code elimination, inlining, loop unrolling, vectorization, and register allocation. It analyzes the control flow graph and reorders instructions for better CPU pipeline usage. It replaces expensive operations with cheaper equivalents when possible.

Each pass adds to compile time. The more aggressive the optimization level, the more passes LLVM runs. Debug builds use fewer passes. Release builds use many passes. The --release flag tells Cargo to use the release profile, which enables full optimization. This produces a faster binary but takes longer to compile.
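The optimization level is a profile setting, so you can choose a middle ground instead of the two extremes. The fragment below is a sketch of one possible setup, not a recommendation for every project: it assumes you want lightly optimized code for your own crates and fully optimized dependencies, which are rebuilt far less often.

```toml
# Cargo.toml
[profile.dev]
# The dev default is 0 (no optimization). Level 1 runs a small
# set of passes at a modest cost in build time.
opt-level = 1

# Profile override for all dependencies: they compile once and
# are then reused, so the extra passes are paid rarely.
[profile.dev.package."*"]
opt-level = 3
```

Whether this helps depends on how much time your program spends in dependency code during development, so compare build and run times before keeping it.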

LLVM is the muscle. Rust is the brain. The muscle takes time to flex.

Incremental compilation: The cache that saves you

Cargo enables incremental compilation by default for the dev profile. Release builds leave it off unless you opt in. When you modify a file in your own workspace, the compiler only recompiles the affected parts of your code. It caches intermediate artifacts in the target directory. Subsequent builds reuse these artifacts.

# Cargo.toml
[profile.dev]
# Incremental compilation is enabled by default.
# Explicitly setting it documents the intent.
incremental = true

The incremental cache stores the results of earlier compiler work, such as type checking and MIR (Mid-level IR) construction, along with the compiled codegen units. When you change a function, the compiler updates the cache for that function and any code that depends on it. If you change a dependency, the compiler rebuilds the dependency and any crates that use it.

Incremental compilation makes development fast. The first build of a large project can take minutes. Subsequent builds after small changes often take seconds. The cache lives in the target directory and persists across builds. It is only thrown away when you explicitly run cargo clean.

Convention aside: The community treats incremental = true as a given for dev builds. You rarely see it in Cargo.toml because it is the default for the dev profile. If you disable it, you are opting out of a major performance feature. Only disable it for benchmarking compile times or in CI environments where caches are discarded anyway.

Trust the cache. If it says "Finished", it's done.

Tuning the build: Codegen units and LTO

You can tune compilation speed and binary performance using Cargo profiles. The most impactful settings are codegen-units and lto.

codegen-units controls how many pieces the compiler splits each crate into for code generation. LLVM can process the pieces in parallel, so more units mean more parallelism and faster compilation. Fewer units mean less parallelism but better optimization opportunities, because LLVM optimizes each unit largely on its own and can miss inlining chances that span units. With a single unit, it sees the whole crate at once.

# Cargo.toml
[profile.dev]
# More units mean more parallelism and faster dev builds.
# 256 is already the default for incremental builds (16 otherwise),
# so this line only documents the intent.
codegen-units = 256

[profile.release]
# One unit for maximum optimization.
# Slower compile, faster binary.
codegen-units = 1

Convention aside: Projects that care most about runtime speed often set codegen-units = 1 for release builds, which gives LLVM the most room to optimize. Cargo's default for release is 16, and for large projects where compile time is painful, staying at 16 is a reasonable compromise. The runtime penalty is usually small, but it varies by workload, so measure before deciding. Pick the value that balances your pain tolerance with performance needs.

lto stands for Link Time Optimization. LTO defers part of the optimization work to link time so that LLVM can optimize across crate boundaries. It can inline functions from dependencies into your code. It can eliminate dead code from dependencies that you don't use.

# Cargo.toml
[profile.release]
# Thin LTO is faster to compile than fat LTO.
# It optimizes across crates but limits memory usage.
lto = "thin"

# Fat LTO gives maximum optimization.
# It can be very slow and memory-intensive.
# lto = "fat"

Thin LTO is the sweet spot for many projects. It provides significant optimization benefits without the extreme compile times of fat LTO. Fat LTO can multiply compile time and memory use for large projects. Use it only when you have measured that thin LTO is not enough.
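If you want the slowest, most aggressive settings only for builds you ship, a custom profile keeps them out of everyday release builds. The fragment below is a sketch under assumptions: the profile name dist is hypothetical, and the values are the ones discussed in this section rather than measured recommendations.

```toml
# Cargo.toml
# Hypothetical profile for shipping builds.
# Build with: cargo build --profile dist
[profile.dist]
# Start from the release settings and override two of them.
inherits = "release"
lto = "fat"
codegen-units = 1
```

Cargo writes the output to target/dist, so the regular release artifacts and their caches are left untouched.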

Tweak codegen-units until the pain stops, then stop tweaking.

Pitfalls: Macros, features, and false starts

Macros can hurt compile times. Macros expand to code at compile time. If a macro generates a lot of code, the compiler has to process all of it. Derive macros like serde's Serialize and Deserialize generate serialization and deserialization code for every field in a struct. Large structs with many fields generate large amounts of code.

use serde::{Serialize, Deserialize};

/// A large struct that triggers significant macro expansion.
#[derive(Serialize, Deserialize)]
struct BigConfig {
    // Each field generates serialization code.
    // Many fields mean more code to compile.
    field_1: String,
    field_2: i32,
    field_3: Vec<u8>,
    // ... imagine 100 fields here
}

If you see a long compile time after adding a derive macro, check the macro expansion. You can use cargo expand to see the generated code. It is a separate subcommand that you install with cargo install cargo-expand. If the expansion is huge, consider splitting the struct or writing the implementation by hand for the types that generate the most code.
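To see how one short macro call turns into many items the compiler must process, consider this self-contained sketch. It uses macro_rules! instead of a derive macro so that it runs without dependencies. The TypeName trait and the impl_type_name macro are hypothetical and exist only for this illustration.

```rust
/// Hypothetical trait used only for this illustration.
trait TypeName {
    fn type_name() -> &'static str;
}

/// Generates one `impl TypeName` block per listed type.
macro_rules! impl_type_name {
    ($($ty:ty),* $(,)?) => {
        $(
            impl TypeName for $ty {
                fn type_name() -> &'static str {
                    stringify!($ty)
                }
            }
        )*
    };
}

// One line of source expands to eight separate impl blocks,
// each of which is type-checked and compiled.
impl_type_name!(u8, u16, u32, u64, i8, i16, i32, i64);

fn main() {
    println!("{}", <u8 as TypeName>::type_name());
    println!("{}", <i64 as TypeName>::type_name());
}
```

A derive on a struct with a hundred fields works the same way at a larger scale: the source stays short while the expanded code grows with every field.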

Feature flags can also impact compile times. Enabling many features in a dependency can pull in extra code and increase compile time. For example, serde with the derive feature is slower to compile than serde without it. tokio with full features is slower than tokio with only the features you need.

Audit your dependencies. Disable features you don't use. Use cargo tree -e features to see which features are enabled and which crate requested them. Remove unused dependencies. Every dependency adds to the compilation graph.
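A trimmed dependency section might look like the sketch below. The feature selection is an assumption made for illustration: which tokio features you need depends on what your code actually calls, so treat the list as a starting point to adjust.

```toml
# Cargo.toml
[dependencies]
# Only the runtime and the attribute macros, instead of "full".
tokio = { version = "1", features = ["rt", "macros"] }

# The derive feature is opt-in. Leave it out in crates that
# never derive Serialize or Deserialize.
serde = "1"
```

If the build fails after trimming, the compiler error usually names the missing item, and the crate's documentation lists the feature that provides it.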

Compiler errors can appear after long waits. If you have complex generic constraints, the compiler might spend time resolving traits before failing. You might see E0277 (trait bound not satisfied) after a long compile. This happens because the compiler explores many paths to satisfy the bound before concluding it cannot. The error is correct, but the wait is frustrating. Simplify your trait bounds or split complex generic functions to reduce resolution time.

Check your dependencies. A single crate with too many features can kill your build speed.

Decision matrix

Use cargo check when you are iterating on logic and just need type errors. It skips code generation and is much faster than cargo build.

Use cargo build when you need the binary or want to verify linking.

Use codegen-units higher than one when compile time is the bottleneck and you can accept a small runtime penalty.

Use lto = "thin" when you want better optimization than the default but full LTO is too slow.

Use sccache or mold when building large projects repeatedly. sccache caches compiler results across builds. mold is a fast linker that reduces linking time.
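Both sccache and mold can be wired in through Cargo's configuration file. The sketch below rests on several assumptions: sccache, mold, and clang are installed and on your PATH, and you build on x86_64 Linux. Adjust the target triple and linker for other setups.

```toml
# .cargo/config.toml
[build]
# Route every rustc invocation through sccache.
rustc-wrapper = "sccache"

# Assumed target: change the triple to match your platform.
[target.x86_64-unknown-linux-gnu]
linker = "clang"
# Ask clang to link with mold instead of the default linker.
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

Time a clean build and an incremental build before and after the change, because the gain depends on how much of your build is spent compiling and how much is spent linking.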

Pick the tool that matches your current bottleneck. Compile time or runtime? You can't have both for free.

Where to go next

Rust takes time to compile because it double-checks your entire codebase for safety errors and optimizes the final program before you even run it. Think of it like a strict editor who proofreads and perfects your essay before you hand it in, rather than just letting you submit a rough draft. This upfront work means your program runs faster and crashes less often later.