How to avoid unnecessary allocations

Cut Rust heap allocations by pre-sizing collections, chaining iterators without intermediate Vecs, borrowing instead of owning, using Cow, and reusing buffers across calls.

The hidden cost of "just call collect"

You've written a nice short Rust function. It reads some numbers, doubles them, keeps the positive ones, joins them with commas. It works. You ship it. A profiler later tells you 30% of CPU time is spent in malloc and free. Nothing in your code looks expensive. So where's the time going?

The answer is often heap allocation. Each String::from, each non-empty vec![...], each .collect::<Vec<_>>() asks the allocator for memory. Allocators are fast, but they're not free, and once you make millions of requests in a tight loop the bill adds up. Allocation also hurts cache locality: values scattered across the heap are less likely to be in cache when you read them back, so accesses are more likely to stall.

The good news is that Rust gives you a lot of tools to write the same logic without allocating. The trick is to learn where the allocations are hiding.

Where allocations sneak in

Three big sources, in roughly the order you'll meet them:

Resizing collections. Vec::new() starts with capacity zero and does not allocate. The first push allocates a small buffer, and each time that buffer fills up the Vec allocates a larger one. The current standard library roughly doubles the capacity on each growth, although the exact strategy is not guaranteed. By the time you've pushed a thousand items, you've reallocated about ten times, and a reallocation may have to copy every existing element into the new buffer. That's wasted work if you knew the size up front.

Intermediate collections in iterator chains. Each .collect::<Vec<_>>() allocates. If you have three of them in a row to "split things up for readability," you've allocated three vectors when one might do.

String building. format! allocates. to_string() allocates. String::from allocates. Concatenating with + reuses the left-hand String's buffer but may reallocate to grow it. None of these are wrong, but in a hot path you may not need any of them.
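
To see the resizing cost for yourself, you can watch capacity() change while pushing. The following is a minimal sketch, not a benchmark: count_regrowths is a hypothetical helper written for this illustration, and the exact count it reports for Vec::new() depends on the standard library's growth policy, which is not guaranteed.

```rust
/// Pushes `n` elements into `v` and counts how often its capacity changed.
/// `count_regrowths` is a made-up helper name, used only for this illustration.
fn count_regrowths(mut v: Vec<u32>, n: u32) -> usize {
    let mut regrowths = 0;
    let mut last_cap = v.capacity();
    for i in 0..n {
        v.push(i);
        // A capacity change means the Vec had to ask for a bigger buffer.
        if v.capacity() != last_cap {
            regrowths += 1;
            last_cap = v.capacity();
        }
    }
    regrowths
}
```

Calling count_regrowths(Vec::new(), 1000) reports a handful of regrowths, while count_regrowths(Vec::with_capacity(1000), 1000) reports none, because the buffer was big enough from the start.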

Pre-allocating capacity

If you know roughly how many elements a collection will hold, tell it.

fn doubled_positives(input: &[i32]) -> Vec<i32> {
    // We can't have more output elements than input elements, so reserve
    // input.len() up front. One allocation, no resizes.
    let mut out = Vec::with_capacity(input.len());

    for &n in input {
        if n > 0 {
            // .push won't trigger a realloc because capacity is already big enough.
            out.push(n * 2);
        }
    }

    // Note: the Vec keeps its full capacity even if half the elements were filtered.
    // Call .shrink_to_fit() if memory is tighter than time.
    out
}

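The same trick works for String::with_capacity, HashMap::with_capacity, HashSet::with_capacity, and most other growable collections in std. If you're writing a function that takes an Iterator, size_hint() gives you a lower bound (and an optional upper bound) on the number of remaining items, which you can use to reserve space. Iterator::collect already does this when building a Vec: it reserves based on the size hint, so iterators that report an exact length, such as a map over a slice, typically collect with a single allocation.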
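
A minimal sketch of reserving from size_hint() by hand, for cases where you fill the Vec in a loop rather than with collect. The function name collect_doubled is an assumption made for this example.

```rust
/// Doubles every item of an arbitrary iterator, reserving space up front.
/// `collect_doubled` is a hypothetical name chosen for this example.
fn collect_doubled<I>(items: I) -> Vec<i32>
where
    I: IntoIterator<Item = i32>,
{
    let iter = items.into_iter();
    // size_hint() returns (lower_bound, optional_upper_bound).
    // The lower bound is always safe to reserve: at worst it is too small
    // and the Vec grows normally.
    let (lower, _upper) = iter.size_hint();
    let mut out = Vec::with_capacity(lower);
    for n in iter {
        out.push(n * 2);
    }
    out
}
```

For a source with a known length, such as a Vec or a range, the lower bound is exact and the loop never reallocates. For a filtered iterator the lower bound is usually zero, so the hint gives no benefit but also does no harm.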

Iterators are the cheap path

Iterators in Rust are lazy. data.iter().filter(...).map(...) does not allocate; it builds a small struct that knows how to compute the next element on demand. As long as you don't collect in the middle, the whole chain runs as a single loop after the compiler is done with it.

fn sum_of_doubled_positives(input: &[i32]) -> i32 {
    // No intermediate Vec, no allocations. The iterator chain is fused into
    // a single loop by the optimizer.
    input.iter()
        .filter(|&&x| x > 0)
        .map(|&x| x * 2)
        .sum()
}

Compare with the version that allocates twice:

fn sum_of_doubled_positives_slow(input: &[i32]) -> i32 {
    // Allocates a Vec for the filter step.
    let positives: Vec<i32> = input.iter().filter(|&&x| x > 0).copied().collect();
    // Allocates another Vec for the map step.
    let doubled: Vec<i32> = positives.iter().map(|&x| x * 2).collect();
    // Finally a third pass to sum.
    doubled.iter().sum()
}

Both are correct. The first does the same work in one pass, with zero allocations.
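
The same idea applies to the function from the introduction, which doubles numbers, keeps the positive ones, and joins them with commas. Below is one possible sketch that avoids both an intermediate Vec<i32> and a Vec<String>; the name join_doubled_positives is invented here, and the only allocation left is the output String itself (which may grow as it fills).

```rust
use std::fmt::Write;

/// Keeps the positive inputs, doubles them, and joins them with commas.
/// `join_doubled_positives` is a hypothetical name for this sketch.
fn join_doubled_positives(input: &[i32]) -> String {
    let mut out = String::new();
    // The filter/map chain is lazy, so no intermediate collection is built.
    for n in input.iter().filter(|&&x| x > 0).map(|&x| x * 2) {
        if !out.is_empty() {
            out.push(',');
        }
        // Formats the number directly into `out` instead of a temporary String.
        write!(out, "{}", n).expect("writing to a String does not fail");
    }
    out
}
```

For the input [3, -1, 0, 5] this produces "6,10".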

Borrow instead of own

A function signature like fn process(s: String) forces the caller to hand over an owned String. If the caller had a &str, they have to allocate one. Take a &str instead unless you genuinely need to own the value:

// Bad: caller has to allocate even if they had a &str.
fn print_upper(s: String) {
    println!("{}", s.to_uppercase());
}

// Good: accepts &str, &String, and string literals without forcing the
// caller to allocate. (to_uppercase still allocates its own result.)
fn print_upper_borrowed(s: &str) {
    println!("{}", s.to_uppercase());
}

The same lesson applies to Vec<T> versus &[T]. Take the slice when you only need to read.
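
As a small illustration of the slice version, here is a sketch of a read-only function that takes &[i32]. The name largest is an assumption for this example.

```rust
/// Returns the largest value, or None for an empty slice.
/// Taking `&[i32]` lets callers pass a Vec, an array, or part of either
/// without cloning anything. `largest` is a hypothetical example function.
fn largest(values: &[i32]) -> Option<i32> {
    values.iter().copied().max()
}
```

A caller holding a Vec<i32> can write largest(&v), a caller with an array can write largest(&arr), and a caller interested in a sub-range can write largest(&v[1..]). None of these copy the data.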

When you DO need ownership: Cow

Sometimes a function might or might not need to allocate, depending on the input. The classic example is "lowercase a string, but only if it isn't already lowercase." If the input is already lowercase, you have nothing to do; if it isn't, you need a new owned string. Cow<'a, str> ("clone on write") expresses this:

use std::borrow::Cow;

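// ASCII-only lowercasing. Returns the same slice if no work was needed;
// otherwise an owned String. The check and the conversion must agree:
// pairing an ASCII check with the Unicode-aware to_lowercase() would
// return inputs like "É" unchanged.
fn lower(input: &str) -> Cow<'_, str> {
    if input.bytes().all(|b| !b.is_ascii_uppercase()) {
        // Cheap: just borrow the existing slice. No allocation.
        Cow::Borrowed(input)
    } else {
        // Expensive: allocate a new String for the lowercased version.
        Cow::Owned(input.to_ascii_lowercase())
    }
}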

Callers treat Cow<str> like a &str for reading. They only pay for the allocation when one was actually needed.
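
To make the caller's side concrete, here is a second, self-contained sketch of the same pattern. The function normalize_newlines is invented for this example; it only allocates when the input actually contains a Windows-style line ending.

```rust
use std::borrow::Cow;

/// Converts "\r\n" to "\n", borrowing the input when there is nothing to change.
/// `normalize_newlines` is a hypothetical function used for illustration.
fn normalize_newlines(input: &str) -> Cow<'_, str> {
    if input.contains("\r\n") {
        // Needs a new buffer: replace() returns an owned String.
        Cow::Owned(input.replace("\r\n", "\n"))
    } else {
        // Nothing to do: hand the original slice back.
        Cow::Borrowed(input)
    }
}

/// Callers can use any &str method through deref, without caring which
/// variant they received.
fn count_lines(input: &str) -> usize {
    normalize_newlines(input).lines().count()
}
```

If a caller does need an owned String at the end, into_owned() converts the Cow, allocating only in the borrowed case.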

Reusing buffers across calls

If a function allocates a buffer every call, hoist the buffer out and pass it in, clearing it between uses. Vec::clear keeps the allocation but resets the length to zero.

// Slower: allocates a new Vec every call.
fn build_lines_slow(items: &[u32]) -> Vec<String> {
    items.iter().map(|n| format!("item={}", n)).collect()
}

// Faster: reuse a caller-provided buffer.
fn build_lines_into(items: &[u32], out: &mut Vec<String>) {
    out.clear();
    out.reserve(items.len());
    for n in items {
        // We still allocate a String per item, but no Vec growth happens.
        out.push(format!("item={}", n));
    }
}

If you push the same idea further, you can reuse the String buffer too, using the write! macro with the std::fmt::Write trait in scope:

use std::fmt::Write;

fn build_into(items: &[u32], out: &mut String) {
    out.clear();
    for n in items {
        // writeln! (write! plus a trailing newline) appends into out's
        // existing allocation. No new String per iteration.
        writeln!(out, "item={}", n).expect("writing to a String does not fail");
    }
}
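
The payoff shows up on the caller's side, where one buffer serves many calls. The sketch below is self-contained and uses hypothetical names (render_batch, total_rendered_bytes); it assumes the caller processes several batches in a loop.

```rust
use std::fmt::Write;

/// Renders one batch into `buf`, replacing its previous contents.
/// (Hypothetical helper mirroring the buffer-reuse pattern.)
fn render_batch(items: &[u32], buf: &mut String) {
    // clear() resets the length to zero but keeps the capacity.
    buf.clear();
    for n in items {
        writeln!(buf, "item={}", n).expect("writing to a String does not fail");
    }
}

/// Renders every batch with a single shared buffer and sums the output sizes.
fn total_rendered_bytes(batches: &[Vec<u32>]) -> usize {
    // Allocated lazily on first use, then reused for every later batch.
    let mut buf = String::new();
    let mut total = 0;
    for batch in batches {
        render_batch(batch, &mut buf);
        total += buf.len();
    }
    total
}
```

Once the buffer has grown to fit the largest batch, later calls do not touch the allocator at all.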

Knowing when to bother

Allocations only matter if the function is hot. For a CLI that runs once and exits, replacing format! with write! saves time nobody will notice. For a per-request handler in a busy web server, the same change can make a measurable difference to latency. Profile first.

A few good measurement tools:

  • cargo flamegraph shows which functions accumulate samples.
  • dhat-rs (heap profiler) shows total allocation counts and bytes per call site.
  • criterion benchmarks the same function across changes.

You'll often find that one or two call sites account for almost all the allocator pressure. Fix those, ignore the rest.
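
If you want a quick number before reaching for those tools, one rough alternative is a counting wrapper around the system allocator, using only std. This is a simplified stand-in for a real heap profiler, not how dhat-rs works; the names CountingAlloc, ALLOCS, and allocations_during are made up for this sketch, and the counter is process-wide, so allocations from other threads are included.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Forwards to the system allocator and counts every allocation request.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
    // realloc is left at its default, which calls alloc above,
    // so buffer growth is counted as well.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Runs `f` and returns how many allocations happened while it ran.
fn allocations_during<F: FnOnce()>(f: F) -> usize {
    let before = ALLOCS.load(Ordering::Relaxed);
    f();
    ALLOCS.load(Ordering::Relaxed) - before
}
```

Wrapping a suspect call in allocations_during gives a crude per-call count, which is often enough to confirm that a rewrite actually removed the allocations you were targeting.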

A pitfall: clone() is sometimes free, sometimes not

u32 clones are effectively free; the bit pattern is copied. String clones allocate a new buffer and copy the bytes. Rc<T> clones bump a reference count and do not allocate. The compiler can't tell you the difference because all of these implement Clone. Skim a function for .clone() calls and ask, for each, "what does this actually copy?" If the answer is String or Vec, consider whether you could borrow instead.
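
The difference is easy to observe by comparing pointers. A minimal sketch, with compare_clones as a hypothetical name, assuming a non-empty input string:

```rust
use std::rc::Rc;

/// Returns (string_clone_shares_buffer, rc_clone_shares_value).
/// `compare_clones` is a made-up helper; pass a non-empty `text`.
fn compare_clones(text: &str) -> (bool, bool) {
    let owned = String::from(text);
    // String::clone allocates a second buffer and copies the bytes into it.
    let copy = owned.clone();
    let string_shares = owned.as_ptr() == copy.as_ptr();

    let shared = Rc::new(owned);
    // Rc::clone only increments the reference count; both handles
    // point at the same heap value.
    let handle = Rc::clone(&shared);
    let rc_shares = Rc::ptr_eq(&shared, &handle);

    (string_shares, rc_shares)
}
```

For any non-empty input the result is (false, true): the cloned String lives in its own buffer, while the cloned Rc points at the original value.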

Where to go next