What Is Monomorphization and How Does It Affect Binary Size?

The cookie cutter that vanishes

You're building a 2D game engine. You write a slick Vector2<T> struct to handle positions and velocities. You use it for floating-point positions and integer grid coordinates. You compile for release. The binary jumps from 2MB to 8MB. You didn't add much code. The compiler did.

This is monomorphization. It is the mechanism that makes Rust generics fast, and it is the reason your binary size can explode if you aren't careful.

How generics become machine code

Rust generics are templates. When you write a function with a type parameter T, you are describing a pattern, not a concrete implementation. The compiler cannot generate machine code for a pattern. Machine code must operate on specific types with known sizes and instructions.

Monomorphization is the process where the compiler fills in the blanks. It takes your generic code and generates a separate, specialized version for every concrete type you actually use.

Think of a generic function as a cookie cutter. The cutter defines the shape, but it's not edible. When you use the function with a specific type, the compiler presses that cutter into the dough of that type and bakes a cookie. The cookie is the actual machine code. If you use the function with i32, you get an i32 cookie. If you use it with f64, you get an f64 cookie. The cutter disappears. The binary contains only the cookies.

This approach gives you zero-cost abstractions. The generated code is as fast as if you had written the function by hand for each type. There is no runtime overhead. There is no dynamic dispatch. The cost is paid in compile time and binary size, not in execution speed.

Minimal example

Here is a generic function that finds the largest element in a slice.

/// Finds the largest element in a slice.
///
/// Panics if `list` is empty, because it indexes the first element.
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];
    for item in list {
        // Compare items using the trait bound.
        if item > largest {
            largest = item;
        }
    }
    largest
}

fn main() {
    let ints = vec![1, 2, 3];
    let chars = vec!['a', 'b', 'c'];

    // Compiler generates largest::<i32> here.
    let _ = largest(&ints);

    // Compiler generates largest::<char> here.
    let _ = largest(&chars);
}

When you compile this, the compiler analyzes the call sites. It sees largest(&ints). It infers that T is i32. It checks that i32 implements PartialOrd. It generates a specialized instance, largest::<i32>, that compares integers using integer comparison instructions. It does the same for char, generating largest::<char>. In the binary these instances appear under mangled symbol names.

The final binary contains two distinct functions. There is no largest function at runtime. There is no vtable lookup. There is no dynamic dispatch. Just two static functions tailored to their types.
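As a rough mental model, the two instantiations behave as if the following had been written by hand. This is only a sketch: the names largest_i32 and largest_char are hypothetical and chosen for readability, and the code the compiler actually emits may differ after optimization.

```rust
// Hypothetical hand-written equivalents of the two instantiations.
// The names are illustrative; real symbols are compiler-mangled.
fn largest_i32(list: &[i32]) -> &i32 {
    let mut largest = &list[0];
    for item in list {
        // Integer comparison.
        if item > largest {
            largest = item;
        }
    }
    largest
}

fn largest_char(list: &[char]) -> &char {
    let mut largest = &list[0];
    for item in list {
        // Comparison by Unicode scalar value.
        if item > largest {
            largest = item;
        }
    }
    largest
}

fn main() {
    let ints = vec![1, 2, 3];
    let chars = vec!['a', 'b', 'c'];
    assert_eq!(*largest_i32(&ints), 3);
    assert_eq!(*largest_char(&chars), 'c');
}
```

The bodies are textually identical. Only the types differ, and that repetition is what the generic version saves you from writing.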

If you try to call largest with a type that does not implement PartialOrd, the compiler rejects you with E0277 (trait bound not satisfied). This check happens at compile time. You never get a runtime panic for a missing trait.

Trust the compiler to generate the right instructions. You provide the logic; Rust provides the specialization.

Realistic impact on binary size

The math of monomorphization is simple but brutal. A generic struct with fifty methods, each compiling to a kilobyte of machine code, takes up to fifty kilobytes per type. Use that struct with twenty types, and you've added a megabyte to your binary. Use it with a hundred types, and you're at five megabytes. These are upper bounds: the compiler only instantiates the methods you actually call for each type.
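The multiplication is easy to check in code. The helper below is a hypothetical back-of-the-envelope estimator, not a measurement tool. The method count and per-method size are the illustrative figures from the paragraph above, and real numbers depend heavily on inlining and optimization.

```rust
/// Rough upper-bound estimate of code added by monomorphization, in bytes.
/// All three inputs are illustrative assumptions, not measurements.
fn estimated_code_bytes(methods: usize, bytes_per_method: usize, types: usize) -> usize {
    methods * bytes_per_method * types
}

fn main() {
    // Fifty methods at one kilobyte each, for a single type.
    let per_type = estimated_code_bytes(50, 1024, 1);
    println!("per type: {} KiB", per_type / 1024);

    // The same struct instantiated with more and more types.
    for types in [20, 100] {
        let total = estimated_code_bytes(50, 1024, types);
        println!("{} types: {} KiB", types, total / 1024);
    }
}
```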

Consider a math library with a generic Matrix2<T> struct.

/// A simple 2x2 matrix for math operations.
struct Matrix2<T> {
    data: [[T; 2]; 2],
}

impl<T: std::ops::Add<Output = T> + Copy> Matrix2<T> {
    /// Adds two matrices element-wise.
    fn add(&self, other: &Matrix2<T>) -> Matrix2<T> {
        Matrix2 {
            data: [
                [self.data[0][0] + other.data[0][0], self.data[0][1] + other.data[0][1]],
                [self.data[1][0] + other.data[1][0], self.data[1][1] + other.data[1][1]],
            ],
        }
    }
}

fn main() {
    let m1 = Matrix2 { data: [[1.0, 2.0], [3.0, 4.0]] };
    let m2 = Matrix2 { data: [[0.5, 0.5], [0.5, 0.5]] };
    // Generates Matrix2::<f64>::add.
    let _ = m1.add(&m2);

    let i1 = Matrix2 { data: [[1, 2], [3, 4]] };
    let i2 = Matrix2 { data: [[1, 1], [1, 1]] };
    // Generates Matrix2::<i32>::add.
    let _ = i1.add(&i2);
}

The add method gets duplicated. One version uses floating-point addition instructions. The other uses integer addition. The binary size grows linearly with the number of types.

The #[inline] attribute is sometimes suggested as a fix here, but it does not deduplicate anything. Within one crate, every call to largest::<i32> normally shares a single instantiation already. Generic functions are also already available for inlining in downstream crates, because their bodies must be visible wherever they are instantiated. #[inline] is only a hint that the body should be copied into call sites, which removes call overhead for small helpers and tends to make the binary larger, not smaller. It doesn't stop monomorphization, so treat it as a speed tool for tiny functions, not as a size optimization.
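A separate technique that does limit duplicated code, and one the text above does not cover, is to keep the generic part of a function as thin as possible and move the type-independent work into a non-generic inner function. The sketch below assumes a hypothetical describe function. Only the small outer shim is monomorphized per caller type, while inner is compiled once.

```rust
use std::path::{Path, PathBuf};

/// Generic outer shim: one small copy per caller type `P`.
/// (`describe` is a hypothetical example function.)
fn describe<P: AsRef<Path>>(path: P) -> String {
    // Non-generic inner function: compiled once and shared by
    // every instantiation of `describe`.
    fn inner(path: &Path) -> String {
        let ext = path
            .extension()
            .and_then(|e| e.to_str())
            .unwrap_or("none");
        format!("{} (extension: {})", path.display(), ext)
    }
    inner(path.as_ref())
}

fn main() {
    // Three different caller types, three thin shims, one shared body.
    println!("{}", describe("notes.txt"));
    println!("{}", describe(String::from("archive.tar.gz")));
    println!("{}", describe(PathBuf::from("README")));
}
```

This only pays off when most of the body does not depend on the type parameter. It would not help the Matrix2 example, where every addition depends on T.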

Generics trade bytes for cycles. If the hot code still fits in the instruction cache, the speed wins. If binary size is your binding constraint, switch to trait objects.

Pitfalls and compiler behavior

The biggest pitfall is uncontrolled bloat. If you write a generic library and users instantiate it with dozens of types, the binary size can explode. In optimized builds, the compiler generally does not share instantiations across crates. Each crate monomorphizes the generics it uses independently.

If crate_a generates largest::<i32> and crate_b generates largest::<i32>, the final binary can end up carrying both copies. This is code duplication. Link-Time Optimization (LTO) can often fix this. LTO defers optimization to link time, allowing the compiler to see the whole program and merge identical monomorphized functions from different crates. LTO recovers space but increases compile time. The trade-off is time for space.

Another pitfall is compile time. Monomorphization happens at compile time. If you have a complex generic type used with many types, the compiler has to generate and optimize many versions. This can slow down builds significantly. The compiler caches object files, so incremental builds help, but full rebuilds can be painful.

If you forget a trait bound, the compiler stops you with E0277. If you pass the wrong type, you get E0308 (mismatched types). These errors are clear and actionable. The compiler guides you to fix the type or add the bound.

Monitor your binary size. Bloat creeps in silently when you add a new type to a generic collection. Use tools like cargo bloat or cargo-llvm-lines to track which monomorphizations are consuming space.
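A typical session with these two tools might look like the following. Both are third-party Cargo subcommands that need a one-time install, and the result counts used here (10 and 20) are arbitrary choices.

```sh
# One-time installs of the two third-party Cargo subcommands.
cargo install cargo-bloat
cargo install cargo-llvm-lines

# Ten largest functions in the release build.
cargo bloat --release -n 10

# Size attributed to each crate in the dependency graph.
cargo bloat --release --crates

# Generic functions ranked by LLVM IR lines across all instantiations.
cargo llvm-lines | head -20
```

A function with a high copy count in the cargo llvm-lines output is a likely candidate for a trait object or a thinner generic wrapper.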

Decision: when to use generics vs alternatives

Use generics when you need static dispatch and the set of concrete types is small and known at compile time. The compiler generates specialized code for each type, giving you the speed of hand-written implementations without the repetition. This is the default choice for most Rust code.

Use trait objects when binary size matters more than peak performance. A Box<dyn Trait> is a fat pointer: one pointer to the data and one to a vtable. Code that takes a trait object is compiled once, whatever concrete type sits behind it, instead of once per type. You pay a small runtime cost for the indirection and lose inlining across the call, but you avoid code duplication. A vtable is a small table of function pointers that the program consults at run time to find the correct method. It's like a menu at a restaurant: you point to the item, the waiter brings the food. With generics, the waiter memorizes your order and brings the food directly. Faster, but the waiter needs to know every possible order beforehand.
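The contrast can be sketched with two functions that do the same work. The Shape trait and the Circle and Square types are hypothetical examples, not part of any library.

```rust
use std::mem::size_of;

/// Hypothetical trait used only for this illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Circle {
    radius: f64,
}

struct Square {
    side: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

/// Static dispatch: monomorphized once per concrete `S` it is called with.
fn total_area_static<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Dynamic dispatch: compiled once; each `area` call goes through the vtable.
fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    // All elements share one type, so the generic version applies.
    let squares = vec![Square { side: 2.0 }, Square { side: 3.0 }];
    println!("static: {}", total_area_static(&squares));

    // Mixed types in one collection require trait objects.
    let mixed: Vec<Box<dyn Shape>> = vec![
        Box::new(Square { side: 2.0 }),
        Box::new(Circle { radius: 1.0 }),
    ];
    println!("dynamic: {}", total_area_dyn(&mixed));

    // A boxed trait object is two pointers wide: data plus vtable.
    assert_eq!(size_of::<Box<dyn Shape>>(), 2 * size_of::<usize>());
}
```

The dynamic version also accepts a mixed collection, which the generic version cannot. That flexibility is often a stronger reason to choose trait objects than binary size.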

Be sparing with #[inline] on generic functions. It is a hint to copy the body into call sites, which can speed up tiny helpers but tends to grow the binary. It doesn't stop monomorphization, and it does not merge duplicate instantiations.

Use LTO (Link-Time Optimization) when you suspect bloat across crate boundaries. It lets the optimizer see the whole program at link time, at the cost of longer builds. Enable it in Cargo.toml by setting lto = true under [profile.release].
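In Cargo.toml the setting is laid out as below. The comment about "thin" describes an alternative value that Cargo also accepts. Which one suits a given project is something to measure, not assume.

```toml
[profile.release]
# "Fat" LTO across every crate in the dependency graph.
# lto = "thin" is an alternative that usually builds faster
# while keeping much of the benefit.
lto = true
```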

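Use concrete types when the generic parameter is only used in a few places. If you only need i32 and f64, writing two separate structs might be simpler and avoids generic complexity. This does not shrink the binary by itself, since two hand-written copies cost about as much as two monomorphized ones, but it makes the duplication explicit and easy to audit, which can help in embedded contexts where every byte counts. This is rare.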

Pick the tool that matches your constraint. Speed needs generics. Size needs trait objects. Balance needs LTO.

Where to go next

Monomorphization is like a factory that builds a custom tool for every specific material you give it, rather than using one universal tool. This ensures the final program runs very fast because the code is perfectly optimized for each data type, but it makes the final file larger because it includes a separate copy of the instructions for every type you used.