The copy-paste optimization
You're racing against a deadline. Your Rust application is sluggish. You run a profiler and spot a tiny function called millions of times. It's doing almost nothing, yet the call overhead is dragging down the frame rate. You remember a magic attribute from a blog post: #[inline]. You paste it above the function, rebuild, and the benchmark jumps 15%. Victory.
Then you decide to be thorough. You add #[inline] to every function in the crate. Compile time triples. The binary doubles in size. The benchmark drops 20%. The CPU is now spending more time fetching instructions than executing them. #[inline] is a tool, not a spell. It trades binary size and compile time for execution speed, but only when the math works out.
What inlining actually does
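Inlining is copy-pasting. When you call a function, the CPU jumps to the function's address, executes the code, and jumps back. That round trip costs cycles: arguments have to be moved into place, registers may be saved and restored, and the return address is pushed and popped. Inlining tells the compiler: "Don't jump. Just paste the function's code right here where the call happens." The bigger win often comes afterwards. Once the body sits inside the caller, the optimizer can fold constants, drop dead branches, and simplify across what used to be a call boundary.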
Imagine a recipe that says "Add the spice mix." You have to walk to the pantry, get the mix, come back, and add it. If you're making 100 dishes, that walking adds up. Inlining is like pre-mixing the spices into every bowl before you start. You skip the pantry trips, but now every bowl has the spices sitting in it, taking up space. If the spice mix is tiny, the space cost is negligible, and you saved all those pantry trips. If the mix is huge, you've wasted a lot of spice and made the bowls too big to fit on the counter.
/// Adds two integers. Small body, likely to be inlined.
#[inline]
fn add(a: i32, b: i32) -> i32 {
    // The body is one instruction.
    // Inlining removes the call overhead entirely.
    a + b
}

fn main() {
    // The compiler might replace this call with `2 + 3` directly.
    // The result is computed at compile time or with a single ALU op.
    let result = add(2, 3);
    println!("{}", result);
}
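The #[inline] attribute is a hint. The compiler makes the final call. Rust's optimizer is smart. Within a crate it often inlines small functions automatically, even without the attribute. The attribute matters most across crate boundaries: a non-generic function without #[inline] is normally compiled once in its own crate, and callers in other crates can only inline it when link-time optimization is enabled (newer compilers relax this for very small functions). Inside a crate, the attribute nudges the compiler when it's on the fence. The compiler weighs the size of the function against the benefit at each call site. If the function is small and called often, the hint usually wins. If the function is large and has many call sites, the compiler tends to ignore the hint to keep the binary lean.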
Trust the optimizer. It knows your binary size budget better than you do.
The hidden cost: Instruction cache
Modern CPUs are fast, but they have a memory hierarchy. The fastest level for code is the L1 instruction cache, sitting right next to the execution units. It's tiny, usually 32 to 64 kilobytes per core. If the code the CPU is executing fits in L1, it runs at full speed. If the code is larger, the CPU has to fetch instructions from L2 cache, which is several times slower, or from main memory, which is roughly two orders of magnitude slower.
When you inline a function everywhere, the binary grows. Every call site gets a copy of the function body. If you inline too aggressively, the hot path of your program might blow past the L1 cache size. The CPU starts thrashing the cache, evicting useful instructions to make room for duplicated code. Performance tanks.
This is why profiling matters. Inlining helps when the function is small enough that the duplication doesn't hurt the cache, and frequent enough that the call overhead is significant. Inlining hurts when the duplication pushes code out of cache, or when the function is called rarely so the call overhead doesn't matter.
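One way to act on this is to keep the hot loop small and push bulky, rarely executed code out of line. The sketch below assumes an overflow error path that is almost never taken. The function names (report_overflow, checked_accumulate, sum_all) are hypothetical and chosen only for illustration.

```rust
/// Rarely taken error path. `#[cold]` and `#[inline(never)]` ask the
/// optimizer to keep this formatting code out of the hot loop.
#[cold]
#[inline(never)]
fn report_overflow(index: usize, value: u32) -> String {
    format!("overflow at index {index} while adding {value}")
}

/// Small hot helper: a reasonable inlining candidate.
#[inline]
fn checked_accumulate(total: u32, value: u32) -> Option<u32> {
    total.checked_add(value)
}

/// Sums the slice and reports the first overflow, if any.
fn sum_all(values: &[u32]) -> Result<u32, String> {
    let mut total = 0u32;
    for (index, &value) in values.iter().enumerate() {
        match checked_accumulate(total, value) {
            Some(next) => total = next,
            None => return Err(report_overflow(index, value)),
        }
    }
    Ok(total)
}

fn main() {
    println!("{:?}", sum_all(&[1, 2, 3]));
    println!("{:?}", sum_all(&[u32::MAX, 1]));
}
```

Both attributes are hints, so whether this layout actually helps is something only a profile of the real workload can confirm.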
Convention aside: The Rust community generally avoids #[inline(always)] except in very specific cases. #[inline(always)] is a much stronger request: the compiler will inline wherever it can, even if it's a bad idea. It is still a hint, and the compiler can decline it where inlining is impossible, such as unbounded recursion. Used widely, it can cause binary bloat and cache thrashing. The safe default is #[inline], which gives the compiler permission to inline but lets it back down if the cost is too high.
Don't fight the hardware. If your code doesn't fit in L1, no amount of inlining will save you.
Generics and the inlining dance
Generics are where inlining shines. When you write a generic function, the compiler creates a concrete version for each type you use. This process is called monomorphization. Each version is generated in the crate that uses it, so generic functions are candidates for inlining across crates even without the attribute. If the generic function is small, inlining merges that concrete version into the caller, and the optimizer can work on both together.
The type-specific code itself comes from monomorphization, not from inlining: a u8 instantiation uses byte-sized operations whether or not the call is inlined. What inlining adds is context. Once the body sits in the caller, the optimizer can fold constant arguments, remove checks the caller has already made, and keep values in registers instead of passing them through a call.
/// Computes the squared distance between two points.
/// Inlining lets the optimizer see the caller's arguments along with the math.
#[inline]
fn dist_sq<T>(x1: T, y1: T, x2: T, y2: T) -> T
where
    // `Copy` lets `dx` and `dy` be used twice; `Add` is needed for the final sum.
    T: Copy
        + std::ops::Sub<Output = T>
        + std::ops::Mul<Output = T>
        + std::ops::Add<Output = T>,
{
    let dx = x2 - x1;
    let dy = y2 - y1;
    dx * dx + dy * dy
}

fn main() {
    // Monomorphization produces an `f64` version of `dist_sq`.
    // With inlining, the optimizer can also fold these constant arguments.
    let d = dist_sq(0.0f64, 0.0f64, 3.0f64, 4.0f64);
    println!("{}", d);
}
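A second sketch shows the kind of context inlining can give the optimizer: constant arguments at the call site. The helper scale_and_offset and its caller identity_transform are hypothetical names. Whether the optimizer really reduces the call to a plain copy depends on the optimization level, so treat the comments as what is possible rather than guaranteed.

```rust
use std::ops::{Add, Mul};

/// Evaluates `a * x + b` for any type that supports the two operators.
#[inline]
fn scale_and_offset<T>(x: T, a: T, b: T) -> T
where
    T: Mul<Output = T> + Add<Output = T>,
{
    a * x + b
}

/// Calls the helper with constant `a = 1` and `b = 0`.
/// If the call is inlined, the optimizer can see both constants and
/// may simplify the whole body down to returning `x`.
fn identity_transform(x: u8) -> u8 {
    scale_and_offset(x, 1, 0)
}

fn main() {
    println!("{}", identity_transform(7));
    println!("{}", scale_and_offset(2.0f64, 1.5, 0.5));
}
```

If the helper stayed a separate call, the multiply and add would run on every invocation even though the constants make them redundant.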
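Trait methods benefit from inlining too. When the concrete type is known, as in a generic function bounded by the trait, the call is dispatched statically and the compiler can inline the implementation. A call through dyn Trait goes through a vtable and generally cannot be inlined unless the optimizer manages to work out the concrete type. The attribute does not choose between the two. How you call the method does.
Convention aside: If you're writing a trait implementation that's meant to be fast, put #[inline] on the methods in the impl block, or on default method bodies in the trait. The attribute has no effect on a declaration without a body. This gives the compiler permission to inline the implementation when it knows the concrete type, including from other crates. Without it, a non-generic implementation may stay a regular call when used from another crate, losing optimization opportunities. This is common in crates like num-traits and in the standard library's Iterator implementations.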
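The difference between a call where the concrete type is known and one where it is not can be sketched with a small trait. Shape, Square, and the two total_area functions are hypothetical names used only for this illustration.

```rust
/// Hypothetical trait for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    // The attribute goes on the implementation, where the body lives.
    #[inline]
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// Static dispatch: `S` is a concrete type after monomorphization,
/// so the call to `area` is direct and eligible for inlining.
fn total_area_static<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Dynamic dispatch: each call goes through a vtable, so it generally
/// stays a real call unless the optimizer can devirtualize it.
fn total_area_dyn(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let squares = [Square(1.0), Square(2.0)];
    println!("{}", total_area_static(&squares));

    let boxed: Vec<Box<dyn Shape>> = vec![Box::new(Square(1.0)), Box::new(Square(2.0))];
    println!("{}", total_area_dyn(&boxed));
}
```

Both functions return the same result. The attribute only pays off in the first, where the compiler knows which area it is calling.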
Mark trait method implementations #[inline] if you want callers to get the best performance.
Pitfalls and compiler feedback
Inlining isn't free, and the compiler won't tell you when a hint goes unused. Put #[inline(always)] on a recursive function and rustc accepts it without complaint. The optimizer cannot paste a function into itself forever, so the hint simply cannot be honored in full. Clippy's pedantic inline_always lint flags every use of the attribute, recursive or not.
// This compiles, and rustc does not warn about it.
// The hint cannot be fully honored: the recursion cannot be expanded without end.
#[inline(always)]
fn factorial(n: u64) -> u64 {
    if n == 0 { 1 } else { n * factorial(n - 1) }
}
The same goes for size. When a function marked #[inline] has many instructions, complex control flow, or calls to other functions that can't be inlined, the optimizer quietly declines and emits no diagnostic. To find out what actually happened, inspect the generated assembly (for example with rustc's --emit asm) or check whether the function still shows up under its own name in a profiler.
Inlining changes the stack trace. An inlined function has no frame of its own, so it disappears from the raw call stack. Debuggers and backtraces can reconstruct inlined frames from debug information, but in a stripped binary or a raw trace the function is simply gone. This is usually fine, but good to know when debugging.
Don't force the compiler. #[inline(always)] is a sledgehammer that usually breaks the window you're trying to fix.
When to use inline
Use #[inline] for small, hot functions in tight loops where profiling shows call overhead is the bottleneck. Use #[inline] on trait methods to enable optimization across abstraction boundaries when the concrete type is known. Use #[inline(always)] only when you have measured that the default behavior is insufficient and you accept the risk of binary bloat. Use #[inline(never)] when you need to reduce binary size or when a function is large and called rarely. Reach for profiling tools before adding attributes. Guessing about performance is rarely accurate.
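For reference, the three forms of the attribute can be set side by side. This is a minimal sketch with hypothetical function names (clamp_to_byte, is_even, describe). Which form suits real code is something only measurement can settle.

```rust
/// Plain hint: the optimizer may inline this where it judges it worthwhile.
#[inline]
fn clamp_to_byte(v: i32) -> u8 {
    v.clamp(0, 255) as u8
}

/// Strong hint: asks for inlining at every call site. Still a hint.
#[inline(always)]
fn is_even(v: u32) -> bool {
    v & 1 == 0
}

/// Opposite hint: asks the optimizer to keep this as a real call,
/// which suits larger, rarely used code such as string formatting.
#[inline(never)]
fn describe(v: i32) -> String {
    format!("value {v} clamps to {}", clamp_to_byte(v))
}

fn main() {
    println!("{}", describe(300));
    println!("{}", is_even(4));
}
```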
Measure first. Inline second. Regret never.