When your binary grows overnight
You add a single generic helper to your Rust project. It looks elegant. It compiles instantly. Then you run cargo build --release and watch your binary jump from two megabytes to forty. The compiler did not crash. It quietly duplicated your function a hundred times, once for every type you fed it. You need to find exactly which types are multiplying your code before you start guessing.
What monomorphization actually does
Rust handles generics through monomorphization. The compiler takes your generic function and generates a separate copy for every concrete type you use it with. If you call a function with i32, String, and Vec<u8>, the compiler produces three distinct machine code versions. Each version is optimized for its specific type, which is why generics are fast. The tradeoff is space. Every copy that survives optimization lives in your binary. When you nest generics, use collections, or depend on heavy crates, those copies multiply: a function generic over two parameters can be instantiated once for every combination of types you pass it.
Think of it like a bakery making custom cakes. A generic recipe is the template. Every time an order comes in for a different flavor, the baker prints a new recipe card tailored to that flavor. The cakes taste better because the instructions are precise. But if you get fifty orders for slightly different flavors, your kitchen fills up with recipe cards. You need a way to count them before the shelves collapse.
Monomorphization happens at compile time. The compiler does not keep the generic template in the final binary. It replaces it with concrete implementations. This is why calls to Rust generics are statically dispatched, with no runtime lookup cost. It is also why a single fn process<T: Trait>(item: T) can become what amounts to process_i32, process_String, process_CustomStruct, and so on (the real symbol names are mangled, but the idea is the same). Each instantiation gets its own symbol, its own stack layout, and its own optimized instruction sequence.
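A minimal sketch can make the copies visible. The function name describe is hypothetical and chosen only for illustration; std::any::type_name is used because each monomorphized copy resolves it to its own concrete type.

```rust
use std::any::type_name;
use std::fmt::Debug;

/// Hypothetical generic function; the name `describe` is illustrative only.
fn describe<T: Debug>(item: T) -> String {
    // `type_name::<T>()` is resolved separately in each monomorphized copy,
    // so the returned string differs per instantiation.
    format!("{}: {:?}", type_name::<T>(), item)
}

fn main() {
    // Three concrete types, so the compiler instantiates `describe` three times.
    println!("{}", describe(42_i32));
    println!("{}", describe(String::from("hi")));
    println!("{}", describe(vec![1_u8, 2, 3]));
}
```

The exact strings returned by type_name are not guaranteed to be stable across compiler versions, so treat the printed names as a debugging aid rather than something to match on.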
How cargo-llvm-lines measures the damage
cargo-llvm-lines is that counter. It compiles your project, has rustc emit LLVM intermediate representation, and counts the lines of IR generated for each function. It groups all instantiations of the same generic function under one demangled name and reports both the total lines and the number of copies. A high line count paired with a high copy count means the compiler generated many monomorphized versions. A high line count with a single copy just means one large function.
The tool does not measure final binary size. It measures LLVM IR lines. LLVM IR is a low-level, largely platform-independent representation that sits between Rust and machine code. The lines counted are the IR that rustc hands to LLVM, before LLVM's optimization passes run, so one line of IR does not map to one machine instruction. Inlining, dead code elimination, and the linker's removal of unused symbols all change what ends up in the final bytes. But IR lines are still a useful proxy for how much code the compiler is generating, and they tend to track compile time as well as code size.
Install it and run it with --release so the counts reflect the profile you ship. The dev profile enables debug assertions and overflow checks, which add IR that never reaches a release binary and inflate the numbers.
# Run these commands in your terminal
# Install the tool once per machine
cargo install cargo-llvm-lines
# Compile your project and count the generated LLVM lines
cargo llvm-lines --release
The output prints a table sorted by line count. The Lines column shows the total IR lines across all instantiations of a function. The Copies column shows how many times that function was instantiated. The last column shows the function name. Look for functions where the line count is high and the copy count is well above one. Those are your bloat sources.
Reading the output
When you run the command, cargo-llvm-lines triggers a normal compilation but tells rustc to emit LLVM IR instead of machine code. It parses that IR, extracts function definitions, and strips away the compiler's internal mangling to show readable names. It then aggregates the line counts. The tool does not change your code. It just reads what the compiler is about to assemble.
If you see std::vec::Vec<T>::push with three thousand lines and fifty instances, your code is creating vectors of fifty different types and pushing to them. The compiler generated fifty separate push implementations. Each one handles the memory layout, alignment, and drop logic for its specific T. That is expected behavior. It only becomes a problem when those instances bloat your final binary beyond what your target platform can handle.
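Raw output gets noisy fast. Large projects print thousands of functions. The report covers the crate being built, and a generic function is counted in the crate that instantiates it, not the crate that defines it. The path prefix on each function name (serde::, tokio::, core::) tells you where the generic was defined. To measure a dependency's own contribution, point the tool at that package with -p.
# Analyze one dependency instead of the root crate (replace serde with a package from your own dependency tree)
cargo llvm-lines --release -p serde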
Once you spot a suspicious function, narrow the view. The --filter flag accepts a regular expression and shows only the functions whose names match. This lets you verify whether the bloat comes from your module or a third-party crate.
# Show only the functions whose names match the pattern
cargo llvm-lines --release --filter "my_crate::process_data"
Convention aside: if the counts look inconsistent between runs, run cargo clean and try again to rule out stale artifacts in target/. Add --locked if you want to guarantee identical dependency versions across runs, so that a change in the numbers reflects a change in your code and not a dependency update.
Treat the output as a map, not a verdict. High line counts do not automatically mean bad code. They mean the compiler did exactly what you asked. Your job is to decide whether the duplication is worth the speed.
A realistic bloat scenario
Consider a data processing pipeline that reads configuration, parses records, and writes results. You write a generic serializer to handle multiple output formats.
/// Serializes a record into the target format
fn serialize_record<T: std::fmt::Display>(record: &T) -> String {
    // Format the record and wrap it in metadata
    let payload = format!("{{\"data\": \"{}\"}}", record);
    // Add a timestamp header for logging
    let timestamp = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap()
        .as_secs();
    format!("[{}] {}", timestamp, payload)
}
You call this function with String, i64, f32, CustomEvent, and NetworkPacket. The compiler generates five separate functions. Each one contains the full formatting logic, the timestamp calculation, and the string allocation. If you add more types, the count grows. Run cargo llvm-lines --release --filter "serialize_record" and you will see the instance count climb. The line count multiplies because every instantiation includes the same boilerplate.
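To reproduce the scenario in a scratch project, a sketch along these lines should be enough. CustomEvent and NetworkPacket are not defined in the text, so the struct fields and Display implementations here are assumptions made purely for illustration.

```rust
use std::fmt;

// Hypothetical stand-ins for CustomEvent and NetworkPacket;
// the fields are assumptions made for this sketch.
struct CustomEvent {
    name: &'static str,
}

struct NetworkPacket {
    len: usize,
}

impl fmt::Display for CustomEvent {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "event:{}", self.name)
    }
}

impl fmt::Display for NetworkPacket {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "packet:{}B", self.len)
    }
}

/// Same body as the generic serializer in the scenario.
fn serialize_record<T: fmt::Display>(record: &T) -> String {
    let payload = format!("{{\"data\": \"{}\"}}", record);
    let timestamp = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap()
        .as_secs();
    format!("[{}] {}", timestamp, payload)
}

fn main() {
    // Five distinct `T`s, so five instantiations of `serialize_record`.
    println!("{}", serialize_record(&String::from("hello")));
    println!("{}", serialize_record(&-7_i64));
    println!("{}", serialize_record(&1.5_f32));
    println!("{}", serialize_record(&CustomEvent { name: "login" }));
    println!("{}", serialize_record(&NetworkPacket { len: 64 }));
}
```

Adding or removing a call in main and rerunning the filter command is a quick way to watch the copy count move.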
The fix depends on what varies and what stays the same. The timestamp and JSON wrapping are identical across types. Only the Display formatting changes. Extract the shared logic into a non-generic helper.
/// Wraps a pre-formatted string with metadata
fn wrap_with_metadata(payload: &str) -> String {
    // Calculate the current epoch time once
    let timestamp = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap()
        .as_secs();
    // Return the combined log line
    format!("[{}] {{\"data\": \"{}\"}}", timestamp, payload)
}

/// Serializes a record using the shared wrapper
fn serialize_record<T: std::fmt::Display>(record: &T) -> String {
    // Format only the type-specific part
    let payload = record.to_string();
    // Delegate the repetitive wrapping to the non-generic helper
    wrap_with_metadata(&payload)
}
Now the compiler generates one copy of wrap_with_metadata and multiple thin wrappers for serialize_record. The line count drops. The binary shrinks. The runtime behavior stays identical.
Fixing the problem
Finding the bloat is only half the work. Fixing it requires tradeoffs. The compiler generates separate copies because it wants to optimize each one. Removing the copies usually means removing the optimization.
If a function is generic over a trait, you can switch to dynamic dispatch. Wrapping the type in Box<dyn Trait> or &dyn Trait tells the compiler to generate one function that uses a vtable. You lose compile-time specialization, but you gain a single binary copy. The runtime cost is a virtual call and an extra pointer indirection. It is usually negligible unless the function sits in a tight loop.
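As a sketch of that trade, the serializer from the scenario could take a trait object instead of a type parameter. The name serialize_record_dyn is hypothetical; the body mirrors the generic version.

```rust
use std::fmt::Display;

/// Hypothetical trait-object variant of the serializer.
/// It has no type parameter, so only one copy of it is compiled.
fn serialize_record_dyn(record: &dyn Display) -> String {
    let timestamp = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap()
        .as_secs();
    // The call to `record`'s Display impl goes through the vtable here.
    format!("[{}] {{\"data\": \"{}\"}}", timestamp, record)
}

fn main() {
    let text = String::from("hello");
    let count = -7_i64;
    let ratio = 1.5_f32;
    // Different concrete types, all coerced to the same `&dyn Display`.
    let records: [&dyn Display; 3] = [&text, &count, &ratio];
    for record in records {
        println!("{}", serialize_record_dyn(record));
    }
}
```

Each concrete type still carries its own Display implementation, so the saving applies to the shared body only.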
If the function is generic over concrete types like Vec<T> or HashMap<K, V>, trait objects will not help. You need to reduce the number of unique T values. Introduce a wrapper type that normalizes the input before it reaches the generic. Or extract the shared logic into a non-generic helper that takes slices or trait objects.
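One common shape for the non-generic helper is a thin generic shim that converts its argument to a slice and forwards it. The following is a minimal sketch; checksum and checksum_bytes are hypothetical names, and the hash itself is a toy used only to give the helper a body.

```rust
/// Hypothetical generic entry point. Only this one-line shim is
/// duplicated for each `B` it is called with.
fn checksum<B: AsRef<[u8]>>(input: B) -> u32 {
    checksum_bytes(input.as_ref())
}

/// Non-generic body: compiled once, however many types call `checksum`.
fn checksum_bytes(bytes: &[u8]) -> u32 {
    // Toy rolling hash, for illustration only.
    bytes
        .iter()
        .fold(0_u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

fn main() {
    // Four different argument types, one shared implementation.
    println!("{}", checksum("abc"));
    println!("{}", checksum(String::from("abc")));
    println!("{}", checksum(vec![97_u8, 98, 99]));
    println!("{}", checksum([97_u8, 98, 99]));
}
```

Callers keep the ergonomic generic signature, while the bulk of the IR sits in a function that is instantiated once.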
Dependency bloat often comes from feature flags. Crates like serde or regex gate optional functionality behind Cargo features, and a crate's default features can pull in code you never call. Check your Cargo.toml. Set default-features = false on the dependency and re-enable only the features you actually use.
Some developers reach for #[inline] to shrink binaries. The attribute hints that the compiler should paste the function body at each call site. That removes call overhead, but it does not reduce the number of monomorphized copies, and it usually makes the binary larger because the body is duplicated wherever it is called. It only saves space when the body is smaller than the call it replaces. It can also increase compile time and put pressure on the instruction cache if overused. Reserve it for hot paths where profiling proves a win, and check the binary size before and after.
Iterate deliberately. Make one change. Run cargo llvm-lines --release again. Verify the line count drops. Commit the refactor. Do not chase marginal gains across the entire codebase. Start with the handful of functions at the top of the report. They often account for a disproportionate share of the bloat.
When to reach for what
Use cargo llvm-lines when your release binary exceeds your target size limit and you need a data-driven starting point. Use dynamic dispatch (Box<dyn Trait> or &dyn Trait) when a generic function is instantiated dozens of times over trait bounds and the performance difference is acceptable. Use feature flag pruning when third-party dependencies are generating thousands of lines you never call. Use wrapper types, normalization, or a non-generic helper when you control the input types and can reduce the number of unique monomorphizations. Use #[inline] sparingly, as a speed optimization for small hot functions where profiling shows a win, and do not count on it to reduce binary size.
Treat the line count as a symptom, not a disease. Optimize for the binary size your users actually need, not for a perfectly flat function list.