The speed trap
You finish your Rust project. It compiles. You run cargo run. The output is correct. Then you time it. It takes four seconds. You run the same logic in a C program or a Go binary and it takes 0.4 seconds. You didn't write bad code. You just forgot to turn on the engine.
Cargo defaults to debug mode. This default prioritizes fast compilation and helpful debugging information. It does not prioritize runtime performance. When you ship code or measure speed, you need release mode. Release mode tells the compiler to stop helping you debug and start making the code fast.
Debug versus release
Debug mode is like a student driver with an instructor in the passenger seat. The instructor checks every mirror, verifies every speed limit, and ensures every turn signal is used. The car moves slowly, but mistakes are caught immediately. The instructor also carries a detailed log of every action, which is useful if something goes wrong.
Release mode is the driver alone on the highway. The instructor is gone. The checks are removed. The car moves fast. The detailed log is discarded to save space. If the driver makes a mistake, the consequences are immediate and the log won't help.
Rust uses this distinction to give you the best of both worlds. During development, you want the instructor. You want fast rebuilds and detailed runtime diagnostics. When you deploy, you want the highway. You want maximum speed and minimal size.
What changes in release mode
When you add --release, Cargo switches to the release profile. This profile changes several compiler flags. The most important changes are:
- Optimizations are enabled. The compiler performs aggressive transformations to speed up execution.
- Debug information is omitted. The compiler does not generate the source-level data a debugger needs.
- Debug assertions are disabled. debug_assert! calls are compiled out, and integer overflow checks are turned off.
The compiler performs inlining, dead code elimination, loop unrolling, and vectorization. Inlining copies the body of a small function directly into the caller, removing the overhead of a function call. Dead code elimination removes code that is never reached. Loop unrolling duplicates loop bodies to reduce branch overhead. Vectorization uses SIMD instructions to process multiple data elements in parallel.
These optimizations require significant analysis. The compiler spends more time thinking about your code, which increases build time. Release builds are slower to compile. This is the trade-off. You pay with build time to gain runtime speed.
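One release-profile change worth seeing in code is integer overflow checking: the dev profile enables overflow checks, so an overflowing `+` panics, while the release profile disables them and the same `+` wraps silently. The sketch below avoids the difference by using the standard `checked_add` and `wrapping_add` methods, which behave the same in both profiles. The function names `add_or_none` and `add_wrapping` are made up for illustration.

```rust
/// Adds two bytes, reporting overflow as `None` in every build profile.
fn add_or_none(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

/// Adds two bytes with explicit wrapping in every build profile.
fn add_wrapping(a: u8, b: u8) -> u8 {
    a.wrapping_add(b)
}

fn main() {
    // A plain `a + b` on the runtime values 250 and 10 would panic under
    // the dev profile and wrap to 4 under the release profile.
    println!("{:?}", add_or_none(250, 10));
    println!("{}", add_wrapping(250, 10));
}
```

Spelling out the intended overflow behavior this way means a debug test run and a release deployment agree on the result.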
// A simple function that benefits from inlining.
// In debug mode, this is a function call.
// In release mode, the compiler likely inlines this into the caller.
fn square(x: f64) -> f64 {
    x * x
}

fn main() {
    // The compiler analyzes this loop.
    // In release mode, it may unroll the loop and use SIMD instructions.
    let mut sum = 0.0;
    for i in 0..10000 {
        sum += square(i as f64);
    }
    println!("{sum}");
}
Run this with cargo run and then cargo run --release. The output is the same. The execution time is often dramatically different. For compute-heavy code, the difference can be an order of magnitude.
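To see the gap on your own machine, a rough timing harness like the following can help. It is a minimal sketch, not a proper benchmark: the function name `sum_of_squares` and the iteration count are arbitrary choices, and `std::hint::black_box` is only a best-effort hint that discourages the optimizer from precomputing the result.

```rust
use std::hint::black_box;
use std::time::Instant;

fn square(x: f64) -> f64 {
    x * x
}

/// Sums the squares of 0..n.
fn sum_of_squares(n: u32) -> f64 {
    let mut sum = 0.0;
    // `black_box` hides the loop bound from the optimizer (best effort).
    for i in 0..black_box(n) {
        sum += square(i as f64);
    }
    sum
}

fn main() {
    // `cfg!(debug_assertions)` is true under the default dev profile
    // and false under the default release profile.
    let profile = if cfg!(debug_assertions) { "debug" } else { "release" };

    let start = Instant::now();
    let sum = sum_of_squares(10_000_000);
    let elapsed = start.elapsed();

    println!("{profile}: sum = {sum}, took {elapsed:?}");
}
```

Run it once with cargo run and once with cargo run --release and compare the reported durations. A single run is noisy, so repeat it a few times before drawing conclusions.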
The debug_assert! trap
The most dangerous difference between debug and release mode is debug_assert!. This macro behaves like assert!, but it is removed entirely in release builds.
fn divide(a: f64, b: f64) -> f64 {
    // This check exists in debug mode.
    // This check vanishes in release mode.
    debug_assert!(b != 0.0, "Divisor must not be zero");
    a / b
}

fn main() {
    // In debug mode, this panics.
    // In release mode, this returns inf and the program continues.
    let result = divide(10.0, 0.0);
    println!("{result}");
}
If you test this in debug mode, the program panics when b is zero. You fix the bug. You ship the release build. A user passes zero. The debug_assert! is gone. The division produces infinity (or NaN, if the numerator is also zero). The program continues with garbage data and crashes later in an unrelated place.
Use debug_assert! only for invariants that the surrounding logic already guarantees and that are expensive to check. Use assert! for any check that matters for correctness in production.
Treat debug_assert! as a comment that the compiler enforces during development.
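Here is one way the divide example could keep its check in every build. This is a sketch under the assumption that a zero divisor is a recoverable input error. The names `checked_divide` and `divide_or_panic` are hypothetical.

```rust
/// Divides `a` by `b`, returning `None` for a zero divisor.
/// Unlike `debug_assert!`, this check is present in release builds.
fn checked_divide(a: f64, b: f64) -> Option<f64> {
    if b == 0.0 {
        return None;
    }
    Some(a / b)
}

/// Alternative: keep the panic, but in every build profile.
fn divide_or_panic(a: f64, b: f64) -> f64 {
    assert!(b != 0.0, "Divisor must not be zero");
    a / b
}

fn main() {
    match checked_divide(10.0, 0.0) {
        Some(q) => println!("{q}"),
        None => eprintln!("refusing to divide by zero"),
    }
    println!("{}", divide_or_panic(10.0, 4.0));
}
```

Returning Option pushes the decision to the caller, while assert! stops the program at the point of the bad input. Either way, the behavior no longer depends on the build profile.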
Binary size and stripping
Debug mode includes debug symbols. These symbols map machine code back to your source code. They are essential for debugging. They also bloat the binary. A release binary can still be large without stripping, because it keeps its symbol table and statically links the Rust standard library.
The strip command removes symbols from a binary. This reduces file size significantly. It also makes the binary much harder to inspect with a debugger and leaves backtraces without readable function names.
# Build the release binary
cargo build --release
# Strip the binary to reduce size
strip target/release/my_project
You can automate stripping in Cargo.toml. This is the modern convention.
[profile.release]
# Strip symbols from the final binary.
# "symbols" removes debug info and the symbol table. "debuginfo" removes only debug info.
strip = "symbols"
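A note on convention: strip = "symbols" is a common choice for production binaries where size matters. If you want function names in production backtraces, use strip = "debuginfo", which keeps the symbol table. For full file and line information, leave stripping disabled for an archived copy of the build and symbolicate crash reports against it, for example through an external symbol server.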
Tuning the release profile
The default release profile is a good starting point. For performance-critical applications, you can tune it further. The two most impactful settings are lto and codegen-units.
Link Time Optimization (lto) allows the compiler to optimize across crate boundaries. Normally, each crate is compiled independently. With LTO, optimization runs again at link time over the whole program, so the compiler can inline functions across crates and eliminate dead code across crates. LTO increases build time significantly but can improve runtime performance and reduce binary size.
Code generation units (codegen-units) control parallelism. The compiler splits each crate into units and processes them in parallel. More units mean faster builds but less optimization opportunity. Fewer units mean slower builds but better optimization. The default is 16 for non-incremental builds, which includes the default release profile, and 256 for incremental builds. Setting codegen-units = 1 forces the compiler to optimize each crate as a single unit. This maximizes optimization depth at the cost of build time.
[profile.release]
# Enable Link Time Optimization.
# Allows cross-crate inlining and dead code elimination.
# Increases build time.
lto = true
# Set codegen units to 1 for maximum optimization.
# The release default is 16. 1 builds slower but can run faster.
codegen-units = 1
# Optimization level. 3 optimizes for speed and is the release default.
# "s" optimizes for size. "z" optimizes for size more aggressively.
opt-level = 3
LTO trades build time for runtime speed. Measure the gain. If your build time doubles for a 2% speedup, you probably don't need it.
Set codegen-units = 1 only for builds you ship. Every build that uses the profile pays the extra compile time, so frequent release builds during development will suffer.
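One way to keep the expensive settings out of everyday release builds is a custom Cargo profile. The fragment below is a sketch: the profile name `dist` is an arbitrary choice for illustration, not something Cargo defines for you.

```toml
# Hypothetical profile name. It inherits everything from the release
# profile and then overrides the two expensive settings.
[profile.dist]
inherits = "release"
lto = true
codegen-units = 1
```

With this in Cargo.toml, cargo build --release stays at the default settings, and cargo build --profile dist produces the heavily optimized artifact under target/dist/.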
Cross-compilation
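You often need to build for a different target than your development machine. The Rust compiler can generate code for other targets out of the box. You install the target's standard library with rustup and pass the target to Cargo. Depending on the host and target, you may also need a linker or C toolchain for that target.
# Install the standard library for the target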
rustup target add x86_64-unknown-linux-musl
# Build for the target in release mode
cargo build --release --target x86_64-unknown-linux-musl
The musl target produces a statically linked binary. This binary does not depend on system libraries. It runs on almost any Linux system. This is useful for deployment in containers or minimal environments.
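Cross-compilation works with release mode. The optimizations apply to the target architecture. By default the compiler aims at a baseline CPU for that architecture rather than one specific processor model, so the binary runs on any machine that matches the target.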
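When you juggle several targets, it can help to have the binary report what it was built for. The sketch below uses the standard `std::env::consts` constants and `cfg!` checks. The function name `target_summary` is hypothetical.

```rust
/// Describes the target this binary was compiled for.
/// These values are fixed at compile time, so a cross-compiled binary
/// reports its target, not the machine that built it.
fn target_summary() -> String {
    let libc = if cfg!(target_env = "musl") {
        "musl"
    } else if cfg!(target_env = "gnu") {
        "gnu"
    } else {
        "other"
    };
    format!(
        "arch={} os={} env={}",
        std::env::consts::ARCH,
        std::env::consts::OS,
        libc
    )
}

fn main() {
    println!("{}", target_summary());
}
```

Built with --target x86_64-unknown-linux-musl, the env field should read musl, which is a quick sanity check that the deployed artifact came from the intended build.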
Pitfalls and undefined behavior
Release mode changes how the compiler generates code. This can expose bugs that debug mode hides. Unoptimized code often happens to tolerate invalid operations, for example because a value sits in a predictable stack slot or because an extra check is still present. Release mode removes these accidental safety nets.
If your code works in debug mode but segfaults in release mode, you likely have undefined behavior. The compiler assumes your code is valid. If it is not, the compiler may generate code that crashes or produces incorrect results. Safe Rust alone should not produce undefined behavior, so start by auditing unsafe blocks, FFI calls, and the dependencies that use them. Release mode is a stress test for undefined behavior.
Another pitfall is println! in hot loops. Each println! call locks stdout, and because stdout is line-buffered, every line is flushed as it is written. In a tight loop, this overhead can dominate execution time. Use println! for debugging. Use tracing or log for production logging. These libraries let you change or disable log output through configuration, without editing each call site.
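When a hot loop really does have to print, a common mitigation is to lock stdout once and write through a buffer. The following is a minimal sketch using only the standard library. The helper name `write_numbers` is made up for illustration.

```rust
use std::io::{self, BufWriter, Write};

/// Writes the numbers 0..n, one per line, to any writer.
fn write_numbers<W: Write>(out: &mut W, n: u32) -> io::Result<()> {
    for i in 0..n {
        writeln!(out, "{i}")?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Lock stdout once and wrap it in a buffer, instead of letting each
    // `println!` take the lock and flush at every newline.
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());
    write_numbers(&mut out, 100_000)?;
    // Flush explicitly so write errors are reported instead of being
    // silently dropped when the buffer goes out of scope.
    out.flush()
}
```

Taking a generic writer also makes the function easy to test against an in-memory Vec<u8> instead of the real stdout.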
Release mode doesn't fix undefined behavior. It just changes when the explosion happens.
Realistic workflow
Your development workflow should separate debug and release usage.
- Active development: Use cargo run. Fast iteration. Helpful errors.
- Performance testing: Use cargo run --release. Measure speed. Profile the code.
- Deployment: Use cargo build --release. Produce the artifact. Strip the binary. Distribute.
# Daily development
cargo run
# Performance check
cargo run --release
# Build for deployment
cargo build --release
strip target/release/my_project
This workflow keeps development fast and ensures you test the actual performance of your code. Never optimize based on debug timings. Debug timings are meaningless for performance.
Decision matrix
- Use cargo run for active development when you need fast rebuilds and debug symbols.
- Use cargo run --release for performance measurement when you need to benchmark runtime speed.
- Use cargo build --release for deployment when you need an optimized binary for distribution.
- Use lto = true when you need maximum speed or smaller binaries and can tolerate longer build times.
- Use strip = true when binary size matters for distribution or embedded targets.
- Use codegen-units = 1 when optimization depth is more important than build parallelism.
- Use debug_assert! only for invariants that are guaranteed by logic but expensive to check in production.
- Use assert! for invariants that must hold in production.
Measure before optimizing. The default release profile is sufficient for most projects. Tune lto and codegen-units only when profiling shows a bottleneck and the build time cost is acceptable.