How to View Generated Assembly from Rust Code

Generate assembly output from Rust code using the rustc --emit=asm flag.

When the source code isn't enough

You wrote a hot loop. You suspect it's doing more work than it should. Maybe it's allocating on every iteration. Maybe the compiler didn't unroll it. Maybe you're passing a struct by value when a reference would suffice, and you want to see if the compiler caught that. Rust gives you the source, but the CPU only executes machine code, and assembly is the readable form of that machine code. Seeing the assembly bridges the gap between your logic and the silicon. It turns "I think the compiler optimized this" into "I can see that it did." Pair it with a benchmark to know whether the result is actually fast.

Viewing assembly is how you verify optimizations, debug performance regressions, and understand the ABI. It's also how you learn what the compiler is actually doing with your code. The compiler is a powerful optimizer, but it's not magic. Sometimes it needs a hint. Sometimes it gets confused. Sometimes it does something clever you didn't expect. The assembly shows you the truth.

What assembly actually is

Assembly is the textual representation of machine code. It's the lowest level of abstraction before raw binary. Conceptually, every Rust program becomes assembly, the assembler turns it into object files, and the linker turns those into an executable. In practice, rustc has LLVM write object files directly and only produces assembly text when you ask for it, but the instructions are the same. The assembly depends on your target architecture. x86_64 assembly looks different from ARM64 assembly. The instructions, registers, and calling conventions change.

Think of Rust code as a recipe for a cake. "Mix flour and eggs." "Bake at 350 degrees." The assembly is the exact sequence of muscle contractions in the baker's arm, the precise rotation of the wrist, the millisecond timing of the pour. The recipe is readable. The muscle movements are messy, repetitive, and only make sense if you understand the body. Assembly is the muscle movements of your program. You don't write it by hand. You read it to understand what the machine is actually doing.

rustc lowers your code through its own intermediate representations (HIR and MIR) to LLVM IR. LLVM optimizes that IR and then translates it to assembly. You can view the optimized IR with --emit=llvm-ir, and it shows most of what the optimizer did. Instruction selection, register allocation, and scheduling happen after that stage, though. To see the code the CPU will actually run, look at the assembly.

The firehose approach: rustc --emit=asm

The most direct way to get assembly is to ask rustc to emit it. You can compile a file and output the assembly to a text file.

/// Adds two integers.
/// This function is simple enough to be inlined or optimized away entirely.
fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    let result = add(10, 20);
    println!("{}", result);
}

Run this command in your terminal:

rustc --emit=asm main.rs -o main.s

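This generates a file named main.s. Open it in a text editor. You'll see a lot of text. Most of it is not your code. It's assembler directives, section and symbol metadata, and monomorphized copies of generic standard-library code that your program instantiates. Examples include the std::rt::lang_start startup shim and the generic helpers behind println!. Symbol names are mangled, so add shows up as a long name that merely contains the word add. The precompiled standard library itself isn't in the file, but the generic code your program instantiates is.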

The convention here is to use rustc --emit=asm only when you need a full dump of a single file for offline analysis or scripting. For interactive work, it's too noisy. The output includes everything, so finding your function requires searching through hundreds of lines of boilerplate.

Don't fight the compiler here. Reach for cargo asm for focused inspection.

The focused approach: cargo asm

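The community standard for viewing assembly is cargo asm. It's a Cargo subcommand that builds your crate with assembly output enabled and extracts the assembly for specific functions. It handles dependencies, optimization levels, and symbol demangling automatically. The maintained implementation is the cargo-show-asm crate, which installs the cargo asm command. The older cargo-asm crate provides a command with the same name but is no longer maintained.

Install it with:

cargo install cargo-show-asm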

Then run it on your project:

cargo asm add

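This outputs the assembly for the add function. The output is clean. It shows just the function you asked for. Add --rust to interleave the source lines each block of instructions came from. If the name matches more than one function, cargo asm lists the candidates with a number, and you can pass that number as a second argument to pick one:

cargo asm add 0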

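cargo asm also respects your build profile. Current versions of cargo asm (from cargo-show-asm) build with the release profile by default, so the plain command above already shows optimized code. Pass --release to make that explicit, or --dev to see the unoptimized build:

cargo asm --release add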
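cargo asm can only show functions that exist as symbols in the compiled output. In a release build, a private helper in a binary is often inlined away. Here is a minimal sketch, assuming a hypothetical library crate whose src/lib.rs looks like this. Public, non-generic functions in a library are compiled even when nothing in the crate calls them. A generic function produces no machine code until it's used with a concrete type, so a small non-generic wrapper gives cargo asm something to find. The names largest and largest_i32 are illustrative.

```rust
// src/lib.rs of a hypothetical library crate (names are illustrative).

/// Public and non-generic: compiled into the library even if nothing calls it,
/// so `cargo asm add` can locate it.
pub fn add(a: i32, b: i32) -> i32 {
    a + b
}

/// Generic: no machine code exists for this until it is used with a concrete `T`.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    // Keep the running maximum; `>` comes from PartialOrd.
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}

/// Non-generic wrapper that forces an i32 instantiation of `largest`,
/// giving `cargo asm largest_i32` a concrete symbol to show.
pub fn largest_i32(items: &[i32]) -> Option<i32> {
    largest(items)
}
```

With this in place, cargo asm largest_i32 should show the i32 version, most likely with largest inlined into it.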

The convention is to use cargo asm for local iteration. It's fast, focused, and integrates with your existing project. You can run it from your editor's terminal or a command palette. It's the tool you reach for when you're tweaking a function and want to see the result immediately.

Trust cargo asm for daily work. It saves you from the firehose.

Reading the output: registers and instructions

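Assembly output varies by architecture and by syntax. On x86_64 there are two notations: AT&T syntax (source operand first, % before register names) and Intel syntax (destination first, no sigils). rustc --emit=asm uses AT&T syntax by default; add -C llvm-args=-x86-asm-syntax=intel to switch. cargo asm uses Intel syntax by default, which is more readable for most people. Here's a simplified sketch of what an unoptimized add can look like in Intel syntax: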

example::add:
    push    rbp
    mov     rbp, rsp
    mov     DWORD PTR [rbp-4], edi
    mov     DWORD PTR [rbp-8], esi
    mov     eax, DWORD PTR [rbp-4]
    mov     edx, DWORD PTR [rbp-8]
    add     eax, edx
    pop     rbp
    ret

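This is unoptimized debug assembly, simplified. It's verbose and slow. It saves the arguments to the stack, loads them back, and adds them. The push and pop set up and tear down the stack frame. The mov instructions copy data between registers and memory. The add instruction performs the addition, and the result is left in eax, the lower 32 bits of the return register rax. Real rustc debug output also contains an overflow check, because debug builds enable overflow-checks. You'll typically see the add followed by a seto or jo and a branch to a panic call. The exact instructions change between compiler versions.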

Now look at the optimized version:

example::add:
    lea     eax, [rdi+rsi]
    ret

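The optimizer removed the stack frame. It reads the arguments straight from the registers they arrive in (edi and esi, the lower halves of rdi and rsi) and never touches memory. It uses lea (load effective address) for the addition. lea computes rdi + rsi and writes the result to a third register, eax, in one instruction. It overwrites neither input and leaves the flags untouched. The function is two instructions. There's no overflow check either, because release builds disable overflow-checks by default and let i32 addition wrap.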
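lea is a general arithmetic trick, not just an addition. Its address form, base + index*scale, can also compute small multiplications. Here is a sketch to try in cargo asm or Godbolt. On x86_64 with optimizations, LLVM commonly compiles these to a single lea such as lea eax, [rdi + 4*rdi] rather than an imul. The exact output depends on the compiler version and target. The function names times_five and times_nine are hypothetical.

```rust
/// x * 5 == x + 4*x: with optimizations on x86_64, LLVM commonly emits
/// `lea eax, [rdi + 4*rdi]` here instead of a multiply instruction.
/// `wrapping_mul` keeps the behavior identical in debug and release builds.
#[inline(never)]
pub fn times_five(x: i32) -> i32 {
    x.wrapping_mul(5)
}

/// x * 9 == x + 8*x: the same trick with the largest scale factor lea supports.
#[inline(never)]
pub fn times_nine(x: i32) -> i32 {
    x.wrapping_mul(9)
}
```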

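Reading assembly requires knowing the registers and common instructions. x86_64 has sixteen 64-bit general-purpose registers: rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp, and r8 through r15. Names like eax and edi refer to their lower 32 bits. The calling convention specifies which registers hold arguments and return values. Under the System V AMD64 ABI (Linux, macOS), the first six integer or pointer arguments go in rdi, rsi, rdx, rcx, r8, r9, and the return value is in rax. Windows x64 uses a different convention, starting with rcx, rdx, r8, r9. Plain Rust functions use Rust's own unspecified ABI, which in practice passes simple integer arguments the same way. Only extern "C" functions are guaranteed to follow the platform convention.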
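To see the calling convention in action, give a function more arguments than fit in registers. Here is a minimal sketch, assuming an x86_64 System V target (Linux or macOS). The comments show where each argument should arrive under that ABI. The seventh argument has no register left and is passed on the stack, so its load shows up as a memory access, something like [rsp + 8] when the function has no stack frame. The function name weighted_sum is hypothetical, and other targets use different registers.

```rust
/// Under the System V AMD64 ABI, this `extern "C"` function receives:
///   a -> rdi, b -> rsi, c -> rdx, d -> rcx, e -> r8, f -> r9,
///   g -> the stack (all six integer argument registers are taken).
/// The i64 return value comes back in rax.
#[inline(never)]
pub extern "C" fn weighted_sum(a: i64, b: i64, c: i64, d: i64, e: i64, f: i64, g: i64) -> i64 {
    // Wrapping arithmetic so the result is defined for any input in every profile.
    a.wrapping_add(b.wrapping_mul(2))
        .wrapping_add(c.wrapping_mul(3))
        .wrapping_add(d.wrapping_mul(4))
        .wrapping_add(e.wrapping_mul(5))
        .wrapping_add(f.wrapping_mul(6))
        .wrapping_add(g.wrapping_mul(7))
}
```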

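Look for patterns. Vectorization shows up as instructions operating on wide registers. On the default x86_64 target that means SSE2 instructions such as movdqu and paddd on 128-bit xmm registers. With AVX enabled, you'll see VEX-encoded forms such as vmovdqu, vpaddd, and vpslld on 256-bit ymm registers (zmm with AVX-512). Loop unrolling shows up as repeated blocks of code. Inlining shows up as your function disappearing because it's merged into the caller.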
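To practice spotting these patterns, compare two loops in cargo asm or Godbolt with optimizations on. This sketch uses hypothetical function names. The first function sums a slice with wrapping addition, which LLVM can usually vectorize because the lanes can be summed independently. On the default x86_64 target, look for paddd on xmm registers and an unrolled main loop. The second stops at the first zero, and that data-dependent early exit typically keeps the loop scalar. Whether vectorization happens depends on the compiler version and target, so treat these as things to look for, not guaranteed output.

```rust
/// Likely to vectorize: independent wrapping additions over the whole slice.
/// Expect packed adds (e.g. `paddd` on xmm registers) in an optimized x86_64 build.
#[inline(never)]
pub fn sum_all(data: &[u32]) -> u32 {
    data.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Likely to stay scalar: whether the loop continues depends on each element,
/// which typically prevents processing several elements at once.
#[inline(never)]
pub fn sum_until_zero(data: &[u32]) -> u32 {
    let mut acc = 0u32;
    for &x in data {
        if x == 0 {
            break;
        }
        acc = acc.wrapping_add(x);
    }
    acc
}
```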

If the function is missing, it was most likely inlined (or removed as dead code). Add #[inline(never)] to keep it as a separate symbol.

Debug vs Release: two different programs

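Debug and release builds produce different assembly. Debug builds prioritize fast compilation and debugging. They disable optimizations, turn on integer overflow checks and debug assertions, and include debug info. Release builds prioritize performance. They enable optimizations, remove dead code, turn overflow checks off, and omit debug info. Slice bounds checks are present in both. Release builds remove them only when the optimizer can prove the index is in range.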
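Overflow checks are the difference you'll notice most in integer code. A plain a + b carries a check-and-panic branch in a debug build. In a default release build it becomes a bare add or lea that silently wraps. Below is a minimal sketch, with hypothetical function names, of making the intent explicit so both profiles have the same semantics. wrapping_add never checks. checked_add makes the overflow test part of the function's result, so it typically survives in optimized assembly as an add followed by a flag test.

```rust
/// Always wraps on overflow, in debug and release alike: no panic branch.
pub fn add_wrapping(a: i32, b: i32) -> i32 {
    a.wrapping_add(b)
}

/// Reports overflow as `None` instead of panicking (debug) or wrapping (release).
/// The overflow test feeds the return value, so the optimizer must keep it.
pub fn add_checked(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Plain `+`: panics on overflow in a debug build, wraps in a default release build.
pub fn add_plain(a: i32, b: i32) -> i32 {
    a + b
}
```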

The assembly for a debug build is not representative of runtime performance. It's a different program. Analyzing debug assembly for performance is like analyzing a prototype for fuel efficiency. The prototype works, but it's not the final car.

Always use the release profile when inspecting assembly for performance. cargo asm --release is the standard. If you're calling rustc directly, add -O (--release is a Cargo flag; rustc doesn't accept it):

rustc -O --emit=asm main.rs -o main.s

There's a nuance with Cargo. By default, Cargo splits each crate into multiple codegen units (16 for release builds) and optimizes them in parallel. This speeds up compilation but can limit optimization across codegen-unit boundaries within the crate, for example by blocking some inlining. For maximum optimization, set codegen-units = 1 in the release profile of your Cargo.toml:

[profile.release]
codegen-units = 1

This makes rustc compile the entire crate as a single unit, allowing the optimizer to see the whole picture. It slows down compilation, so reserve it for release builds or final profiling. Optimization across crate boundaries is a separate setting (lto). If the assembly cargo asm shows doesn't match what you expect from your final binary, check that both are built with the same profile settings.

Never analyze performance based on debug assembly. It's a lie.

Pitfalls and compiler tricks

Viewing assembly reveals compiler behavior, but it also reveals traps. Here are common pitfalls.

Inlining can hide functions. If you ask for the assembly of a function and it's not there, the compiler inlined it into the caller. Inlining is usually good. It removes call overhead and enables further optimization. If you need to see the function, add #[inline(never)]. This is a convention for performance debugging. It forces the compiler to emit the function as a separate symbol.

/// Forces this function to remain a separate symbol.
/// Use this only for debugging assembly, not in production code.
#[inline(never)]
fn critical_loop(data: &[u8]) -> usize {
    let mut count = 0;
    for byte in data {
        if *byte > 128 {
            count += 1;
        }
    }
    count
}

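Optimization levels matter. Cargo's release profile already uses opt-level = 3, the most aggressive level. opt-level = 2 inlines less aggressively, and both levels run LLVM's auto-vectorizer. On the rustc command line, -O has historically meant -C opt-level=2, and newer versions map it to level 3. Pass -C opt-level=3 explicitly when you want to match cargo --release. You only need to set it in your Cargo.toml if something has overridden the default: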

[profile.release]
opt-level = 3

Target features affect assembly. The compiler generates different code based on the CPU features available. If you compile for a generic x86_64 CPU, you won't see AVX instructions. If you compile for native, you'll see instructions specific to your CPU. Use -C target-cpu=native to enable all features of the host CPU. This is useful for local testing but not for distributing binaries.

rustc -C target-cpu=native -O --emit=asm main.rs -o main.s
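For binaries you distribute, a common alternative to target-cpu=native is to enable a feature for a single function and check for it at runtime. Here is a minimal sketch, assuming an x86_64 target, with hypothetical function names. #[target_feature(enable = "avx2")] lets LLVM use AVX2 instructions inside that one function, so cargo asm should show ymm registers there while the fallback stays on baseline instructions. is_x86_feature_detected! makes the unsafe call sound by confirming CPU support first. On other architectures the AVX2 path is compiled out.

```rust
/// Baseline version: compiled for the target's default features (SSE2 on x86_64).
pub fn sum_scalar(data: &[u32]) -> u32 {
    data.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Same loop, but LLVM may use AVX2 (ymm registers) inside this function only.
/// Safety: the caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
#[inline(never)]
unsafe fn sum_avx2(data: &[u32]) -> u32 {
    data.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Picks the AVX2 version at runtime when available, otherwise the baseline one.
pub fn sum_dispatch(data: &[u32]) -> u32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: we just confirmed the CPU supports AVX2.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_scalar(data)
}
```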

The compiler might optimize away code entirely. If a function has no side effects and its result is unused, the compiler deletes the call. This is dead code elimination, and it's correct behavior. If the inputs are constants, the compiler may also compute the result at compile time and emit only the answer. To keep the code visible, pass values through std::hint::black_box so the compiler can't see through them:

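use std::hint::black_box;

/// Sums a slice. #[inline(never)] keeps it as its own symbol in the output.
#[inline(never)]
fn sum(data: &[i32]) -> i32 {
    data.iter().sum()
}

fn main() {
    let data = vec![1, 2, 3];
    // black_box on the input stops the compiler from computing the sum at compile time.
    let result = sum(black_box(data.as_slice()));
    black_box(result); // Prevents the compiler from optimizing away sum.
}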

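black_box tells the compiler to treat the value as used in an opaque way, so it can't assume anything about it. Its documentation describes it as a best-effort hint rather than a guarantee, but in practice it keeps the surrounding code alive. This is a convention for microbenchmarks and assembly inspection.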

Trust the optimizer, but verify. If the assembly looks wrong, check your flags, your inlining, and your usage.

Decision: which tool fits your workflow

Use rustc --emit=asm when you need a full dump of a single file for offline analysis or scripting. Use cargo asm when you're iterating on a function and want to see the assembly without leaving your editor or terminal. Use Godbolt (Compiler Explorer) when you want to share a snippet, compare compilers, or test small changes instantly. Use profiling tools like perf or flamegraph when you need to find where the time is spent, not just what the code looks like. Use objdump when you need to disassemble a binary file directly, such as a release artifact from CI.

Godbolt is your sandbox. Use it.

Where to go next