How to Reduce Binary Size in Rust

The 12-megabyte surprise

You write a twenty-line command-line tool. It counts words in a file. You run cargo build and check the output directory. The binary is twelve megabytes. You stare at it. A comparable Python script is a few kilobytes. A C program is under a hundred kilobytes. Rust just handed you a brick.

The compiler is not being difficult. It is doing exactly what you asked. By default, Rust prioritizes fast compilation, detailed debugging information, and maximum runtime safety. It leaves the instruction manual, the safety cones, and the extra toolboxes inside the final executable. Shipping that to users or flashing it onto a microcontroller is a mistake. Shrinking the binary requires telling the compiler exactly what to discard and how hard to optimize.

Debug builds are construction sites

Think of a debug build like an active construction site. The foreman leaves blueprints on every table. Safety markers are taped to every beam. The crew keeps spare tools in every room because they do not know which wall they will tear down next. This setup makes it easy to walk around, inspect the work, and fix mistakes quickly. It also makes the site massive.

A release build packs the building for occupancy. The blueprints get archived. The safety markers come down. The crew consolidates tools into a single storage closet. The structure is the same, but the footprint shrinks dramatically. Rust's default debug profile keeps the construction site open. Switching to release mode closes it.

// This is just a placeholder to show the structure.
// Real size reduction happens in Cargo.toml and build flags.
fn main() {
    // Debug mode compiles fast.
    // It embeds debuginfo for GDB and LLDB.
    // It skips aggressive optimization passes.
    println!("Hello, world");
}

Run cargo build --release to flip the switch. The compiler activates LLVM's optimization pipeline. It inlines functions, eliminates dead code, unrolls loops, and removes bounds checks it can prove redundant. On a recent toolchain, a typical hello-world binary on Linux drops from several megabytes to a few hundred kilobytes. It does not get much smaller than that on its own, because the precompiled standard library is still linked in. For many projects, that single flag delivers most of the available savings.
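Release mode also changes which runtime checks are compiled in. Under the default profiles, integer overflow panics in a dev build and wraps silently in a release build. The sketch below uses hypothetical helper names and shows arithmetic that behaves the same under both profiles, because the overflow policy is spelled out explicitly rather than left to the profile.

```rust
/// Add two u8 values, returning None instead of overflowing.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

/// Add two u8 values, wrapping around on overflow in every profile.
fn add_wrapping(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

/// Add two u8 values, clamping the result to u8::MAX on overflow.
fn add_saturating(a: u8, b: u8) -> u8 {
    a.saturating_add(b)
}

fn main() {
    // cfg!(debug_assertions) is true under the default dev profile
    // and false under the default release profile.
    let profile = if cfg!(debug_assertions) { "dev-like" } else { "release-like" };
    println!("built with a {} profile", profile);

    // 250 + 10 does not fit in a u8, so each helper shows its policy.
    println!("checked:    {:?}", add_checked(250, 10));
    println!("wrapping:   {}", add_wrapping(250, 10));
    println!("saturating: {}", add_saturating(250, 10));
}
```

Running it once with cargo run and once with cargo run --release should change only the first line of output.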

Trust the release profile. It is the foundation everything else builds on.

The release profile baseline

The --release flag points to the [profile.release] section in your Cargo.toml. You can tune it without touching environment variables. The most impactful setting is opt-level. It controls how aggressively LLVM optimizes machine code.

[profile.release]
# opt-level = 3 is the default. It maximizes speed.
# opt-level = "s" optimizes for size without sacrificing much speed.
# opt-level = "z" optimizes for size and disables some vectorization.
opt-level = "s"

The community convention is to stick with 3 for performance-critical tools and switch to "s" or "z" when binary size matters more than peak throughput. "z" tells LLVM to favor size even more strongly than "s", and it also turns off loop vectorization. It does not always produce the smaller binary, so build with both and compare. Use it when you are shipping to constrained devices or distributing a CLI tool over slow networks.

Change the level, run cargo build --release, and check the file size. The difference is usually immediate.
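Checking the file size after each change is easy to script. The following is a minimal sketch, not part of any Cargo tooling. The helper names and the target/release/wordcount path are assumptions for illustration, so substitute your own binary name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Format a byte count using binary units (KiB, MiB, GiB).
fn human_size(bytes: u64) -> String {
    const UNITS: [&str; 4] = ["B", "KiB", "MiB", "GiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    // Divide down until the value fits the current unit or units run out.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} B", bytes)
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}

/// Return the on-disk size of a file in bytes.
fn file_size(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.len())
}

fn main() {
    // Hypothetical artifact path; replace it with your own binary.
    let path = Path::new("target/release/wordcount");
    match file_size(path) {
        Ok(bytes) => println!("{}: {}", path.display(), human_size(bytes)),
        Err(err) => eprintln!("could not read {}: {}", path.display(), err),
    }
}
```

Record the number before and after each profile change so you know which setting actually paid off.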

Link-time optimization

LLVM optimizes each compilation unit independently by default. It sees a function, optimizes it, and moves on. Apart from generic and #[inline] functions, which are compiled into the crate that uses them, it does not look across crate boundaries. Link-time optimization changes that. It waits until every object file is ready, then runs the optimizer on the entire program as a single unit.

LTO eliminates cross-crate dead code, inlines functions across dependencies, and merges duplicate constants. A program that pulls in serde, clap, and tokio often sheds megabytes after LTO. The trade-off is compile time. LTO forces the linker to run the optimizer on the whole graph. Compilation can double or triple.

[profile.release]
# true enables full ("fat") LTO. It runs after all crates are compiled
# and optimizes the whole program as one unit.
# "thin" still optimizes across crates, but works in parallel from
# per-module summaries. It is faster and usually nearly as effective.
lto = true

Enable lto = true when you are ready to ship. Keep it disabled during active development. The compiler will thank you with faster feedback loops.

Codegen units and parallelism

Rust splits your crate into multiple codegen units to compile them in parallel. More units mean faster builds. Fewer units mean the optimizer has a wider view of the code, which often produces smaller and faster binaries.

[profile.release]
# Default is 16 for release builds (256 for incremental builds).
# Setting it to 1 forces sequential compilation.
# The optimizer sees the whole crate at once.
codegen-units = 1

Setting codegen-units = 1 is a common convention for final releases. It sacrifices build speed for optimization quality. The compiler cannot split the crate into chunks, so it runs a more thorough analysis on every function and data structure. Combine it with LTO for maximum reduction.

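Keep this setting under [profile.release], where it does not affect ordinary cargo build runs. Do not copy it into [profile.dev]. With a single codegen unit, every small change recompiles the whole crate, and incremental rebuilds slow to a crawl.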

Stripping the fat

Even after optimization, the binary contains symbol tables, debug information, and relocation data. Stripping removes metadata that the runtime does not need. You can do it via the linker, the strip command, or Cargo's built-in setting.

[profile.release]
# "debuginfo" removes DWARF data but keeps symbols.
# "symbols" removes everything except what the dynamic linker needs.
# "none" keeps everything. This is the debug default.
strip = "symbols"

The strip = "symbols" setting is the modern convention. It replaces the old RUSTFLAGS="-C link-arg=-s" pattern. Cargo forwards the setting to rustc as -C strip=symbols, and rustc picks the mechanism for your platform. On Linux, it asks the linker to strip the output. On macOS, it runs the system strip tool on the finished binary. Either way, the binary shrinks, and the executable runs exactly the same.

Convention aside: developers often reach for cargo bloat to measure what is actually inside the binary. Install it with cargo install cargo-bloat, then run cargo bloat --release --crates. It breaks down size by crate and function. You will quickly see which dependency is hoarding space.

Dependency bloat and feature flags

Rust's ecosystem is powerful. It is also heavy. A single dependency can pull in dozens of transitive crates. Many of them enable features you never use. serde enables std by default. Older clap releases pulled in atty, strsim, and textwrap, and current ones still enable colored output, suggestions, and help generation by default. Each feature flag unlocks code paths, macros, and runtime checks that inflate the binary.

Audit your Cargo.toml. Disable features you do not need.

[dependencies]
# Disable default features to avoid pulling in std or heavy backends.
# Explicitly enable only what your code actually calls.
serde = { version = "1", default-features = false, features = ["alloc"] }

The alloc feature tells serde to build against the alloc crate, which gives you Vec and String without pulling in the full standard library. Below that sits the core crate, which gives you primitives and slices without heap allocation. Switching from std to core or alloc is a common convention in embedded and size-sensitive projects. It requires marking your crate as #![no_std], and a binary must also provide a panic handler (plus a global allocator if it uses alloc), but the size savings are dramatic.
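Much everyday logic needs neither std nor a heap. As a rough illustration, here is the word-counting job from the introduction written with string methods that live in core only. The function names are hypothetical, and the #![no_std] attribute is left out so the sketch also runs as an ordinary program.

```rust
// In a real size-sensitive crate this file would start with `#![no_std]`.
// The two helpers below use only APIs that come from `core`.

/// Count whitespace-separated words without allocating.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Find the longest word without allocating.
/// The result is a slice borrowed from `text`, not a new String.
fn longest_word(text: &str) -> Option<&str> {
    text.split_whitespace().max_by_key(|word| word.len())
}

fn main() {
    // println! comes from std; it is used here only to demonstrate the helpers.
    let text = "size matters a lot";
    println!("{} words", count_words(text));
    println!("longest: {:?}", longest_word(text));
}
```

Because both helpers borrow from the input instead of building a Vec or String, they can move into a no_std library unchanged.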

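Read dependency documentation before adding them. A lighter crate often exists. clap is full-featured and correspondingly heavy. A minimal argument parser such as pico-args or lexopt might be enough. serde_json adds weight too. If you control the data format, a compact binary format such as bincode or postcard can shrink payloads and binaries. Check maintenance status before you commit: serde_cbor, for example, is no longer maintained.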

Pitfalls and trade-offs

Aggressive optimization changes how the compiler schedules instructions. It can reorder memory accesses, eliminate bounds checks, and replace loops with lookup tables. These transformations are mathematically sound for safe Rust code. They break when you introduce undefined behavior.

If your program uses unsafe blocks that assume a specific memory layout or execution order, opt-level = "z" or LTO might expose the bug. The compiler will not warn you. It will generate smaller machine code that crashes at runtime. Test thoroughly after changing optimization levels.
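One way to sidestep this class of bug is to drop the layout assumption altogether. The sketch below uses a hypothetical read_u32_le helper that decodes an integer from a byte buffer in safe code. The pointer cast shown in the comment is the kind of pattern that can misbehave once the optimizer gets more aggressive.

```rust
/// Decode a little-endian u32 from the first four bytes of `bytes`.
/// Returns None when fewer than four bytes are available.
///
/// The unsafe shortcut this replaces looks like
///     unsafe { *(bytes.as_ptr() as *const u32) }
/// which assumes the buffer is long enough and 4-byte aligned.
/// Neither is guaranteed for an arbitrary &[u8].
fn read_u32_le(bytes: &[u8]) -> Option<u32> {
    // get(..4) checks the length; no alignment requirement is involved.
    let head = bytes.get(..4)?;
    let mut raw = [0u8; 4];
    raw.copy_from_slice(head);
    Some(u32::from_le_bytes(raw))
}

fn main() {
    let packet: [u8; 5] = [0x78, 0x56, 0x34, 0x12, 0xff];
    match read_u32_le(&packet) {
        Some(value) => println!("decoded {:#x}", value),
        None => println!("packet too short"),
    }
}
```

The safe version gives the same result at every optimization level, so changing opt-level or enabling LTO cannot turn it into a crash.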

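Another pitfall is over-stripping. Removing all symbols breaks debuggers and crash reporters. If you ship to users and want stack traces, keep strip = "debuginfo", which preserves the symbol names that backtraces rely on. To keep full debug data instead, set debug = true and split-debuginfo = "packed" in the profile, rather than passing RUSTFLAGS="-C split-debuginfo=packed", since the release profile generates no debug info by default. The binary stays small. The debug data lives in a companion file.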

LTO also increases memory usage during linking. Machines with four gigabytes of RAM can swap heavily when linking large dependency graphs. Add swap space, or fall back to lto = "thin", which usually needs less memory than full LTO.

Optimization is a dial, not a switch. Turn it until the binary fits your constraints, then stop.

Decision matrix

Use opt-level = "s" when you need a smaller binary without sacrificing noticeable runtime performance.

Use opt-level = "z" when every kilobyte matters and you are targeting embedded devices or distribution over constrained networks.

Use lto = true when your dependency graph is large and cross-crate dead code elimination will remove megabytes of unused functions.

Use codegen-units = 1 when you want the optimizer to analyze the entire crate at once and you can tolerate slower build times.

Use strip = "symbols" when you are shipping a final release and do not need runtime debugging or crash symbolication.

Use default-features = false on dependencies when you only need a subset of a crate's functionality and want to avoid pulling in heavy backends.

Reach for #![no_std] and core/alloc when the standard library's runtime initialization and heap allocator are too heavy for your target environment.

Treat optimization settings as part of your build contract. Lock them in Cargo.toml. Do not rely on environment variables in CI.

Where to go next

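Reducing binary size in Rust is mostly a matter of build configuration: you tell the compiler what to discard and how hard to optimize. It works like packing a suitcase for a trip: you remove the heavy, unnecessary items (debug symbols) and organize the rest tightly (optimizations) so everything fits better. A smaller binary is not automatically a faster one, so start with cargo build --release, measure with cargo bloat, and add one setting at a time until the binary fits your constraints.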