How to Minimize Heap Allocations in Rust

The hidden cost of `collect()`

You are writing a command-line tool that parses a configuration stream. The stream contains a program name, a query string, and a file path. You write a function that takes a Vec<String>. You call .collect() on your input source to build that vector. Your code works. Then you test it with a massive log file. Your tool crashes with an out-of-memory error. The parser only needs three lines. You allocated gigabytes for a job that required kilobytes.

The problem is the collection. You forced Rust to allocate a contiguous buffer large enough to hold every item before you could read the first one. You filled a warehouse before you started unpacking boxes. Rust gives you a better way. You can pass the stream directly. You process items as they arrive. You never hold the whole dataset in memory at once. You avoid the heap allocation for the collection entirely.
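Here is a minimal sketch of the streaming approach, assuming the input arrives through any BufRead source. The helper name count_matches and the in-memory Cursor standing in for a large log file are assumptions made for illustration; they are not part of the tool described above.

```rust
use std::io::{self, BufRead, Cursor};

/// Hypothetical helper: counts the lines that contain `needle`.
/// Only one line is held in memory at a time.
fn count_matches<R: BufRead>(reader: R, needle: &str) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        // Each line is read, checked, and dropped before the next one is read.
        if line?.contains(needle) {
            count += 1;
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // A Cursor over an in-memory string stands in for a large log file.
    let log = Cursor::new("ok\nerror: disk full\nok\nerror: timeout\n");
    let errors = count_matches(log, "error")?;
    assert_eq!(errors, 2);
    Ok(())
}
```

Memory use here depends on the longest line, not on the number of lines. Swapping the Cursor for a buffered file or stdin should not require changing the function.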

Iterators are state machines, not lists

An iterator in Rust is not a list. It is a state machine that knows how to produce the next item. When you call .next(), the state machine advances. It returns the item and updates its internal state. If there are no more items, it returns None.

The iterator does not store the items. It generates them or fetches them from a source. This distinction matters for memory. A Vec stores all items. An iterator stores only the current position. Passing an iterator to a function lets the function pull items one by one. The function controls the pace. The function decides when to stop. The function avoids allocating a container for items it might never see.
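To make the state machine concrete, here is a small sketch of a hand-written iterator. The Countdown type is hypothetical and exists only for this illustration. Its entire state is one integer, no matter how many items it yields.

```rust
/// Hypothetical iterator that counts down from `remaining` to 1.
struct Countdown {
    remaining: u32,
}

impl Iterator for Countdown {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        if self.remaining == 0 {
            return None; // Exhausted: nothing left to produce.
        }
        let current = self.remaining;
        self.remaining -= 1; // Advance the internal state.
        Some(current)
    }
}

fn main() {
    let mut countdown = Countdown { remaining: 3 };
    assert_eq!(countdown.next(), Some(3));
    assert_eq!(countdown.next(), Some(2));
    assert_eq!(countdown.next(), Some(1));
    assert_eq!(countdown.next(), None);

    // The iterator is exactly as large as its single u32 field.
    assert_eq!(std::mem::size_of::<Countdown>(), std::mem::size_of::<u32>());
}
```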

Think of an iterator like a tap on a water main. You can open the tap and fill a bucket. That's collect(). You need the bucket, and you need space for the water. Or you can hold a cup under the tap and drink as it flows. That's iterating. You only hold what you're currently using. The water main doesn't care. The tap just delivers.

Stop filling the warehouse. Turn on the conveyor belt.

Minimal example: consuming arguments

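The standard library provides std::env::args(), which returns an iterator over command-line arguments. It does not return a Vec. This lets you process arguments without building a collection of your own. How Args stores the arguments internally is an implementation detail of the standard library and may involve its own buffer on some platforms. The point is that your code does not add another one.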

/// Consumes an iterator of strings to build a configuration, avoiding intermediate allocation.
fn build_config<I: Iterator<Item = String>>(mut args: I) -> Result<Config, String> {
    // Extract the first item. The iterator advances. No Vec created.
    let _program_name = args.next().ok_or("No program name")?;
    
    // Extract the second item.
    let query = args.next().ok_or("No query")?;
    
    // Extract the third item.
    let file_path = args.next().ok_or("No file path")?;
    
    // Return the config. The iterator is consumed and dropped.
    Ok(Config { query, file_path })
}

struct Config {
    query: String,
    file_path: String,
}

fn main() {
    // std::env::args() returns an iterator, not a Vec.
    // We pass the iterator directly. Our code allocates no collection of its own.
    let config = build_config(std::env::args());
    
    match config {
        Ok(c) => println!("Config loaded: {} -> {}", c.query, c.file_path),
        Err(e) => eprintln!("Error: {}", e),
    }
}

The bound I: Iterator<Item = String> accepts any type that implements the Iterator trait and yields String. The mut on the args parameter is required because .next() takes &mut self: each call advances the iterator's internal state. The function pulls exactly three items. If the iterator has more items, they are never pulled. If it has fewer, the function returns an error. build_config itself never creates intermediate storage.
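Because the bound names a trait rather than a concrete type, the same function can be exercised without real command-line arguments. The sketch below repeats build_config and Config so it compiles on its own, and adds Debug and PartialEq derives as an assumption so the results can be compared. The fake argument lists are invented for illustration.

```rust
#[derive(Debug, PartialEq)]
struct Config {
    query: String,
    file_path: String,
}

fn build_config<I: Iterator<Item = String>>(mut args: I) -> Result<Config, String> {
    let _program_name = args.next().ok_or("No program name")?;
    let query = args.next().ok_or("No query")?;
    let file_path = args.next().ok_or("No file path")?;
    Ok(Config { query, file_path })
}

fn main() {
    // Hypothetical stand-in for std::env::args(): any iterator of Strings works.
    let fake_args = ["tool", "needle", "haystack.txt"].iter().map(|s| s.to_string());
    let config = build_config(fake_args).expect("three arguments were supplied");
    assert_eq!(config.query, "needle");
    assert_eq!(config.file_path, "haystack.txt");

    // With only two items, the third next() returns None and the function errors out.
    let too_few = ["tool", "needle"].iter().map(|s| s.to_string());
    assert_eq!(build_config(too_few), Err("No file path".to_string()));
}
```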

The iterator is a state machine. Once .next() is called, that item is gone forever.

Walkthrough: the lifecycle of a consumed iterator

When build_config runs, here is what happens.

1. std::env::args() creates an Args struct. Its internal layout is an implementation detail of the standard library, but your code never sees a Vec and never builds one.
2. build_config receives the Args struct by value. Ownership moves into the function.
3. The first args.next() call executes. The Args implementation advances its internal position and returns Some(String) for the first argument.
4. The String is moved into _program_name. The heap allocation for that string exists, but build_config makes no allocation for a collection.
5. Steps 3 and 4 repeat for query and file_path.
6. The function returns. The args variable goes out of scope and the Args struct is dropped, along with any arguments that were never pulled.

If you had called .collect::<Vec<_>>() first, you would have allocated a heap buffer for the vector and moved every string into it, including the ones you never read. When the final length is not known in advance, the vector may also reallocate and copy its elements as it grows. You would have paid for the buffer, the growth, and an extra pass over the data. By passing the iterator, you paid only for the strings you actually kept.

Trust the borrow checker. It usually has a point.

Realistic example: validation before allocation

Iterators shine when you need to inspect data before deciding to allocate. You can validate a header, check a checksum, or peek at the first element. If the validation fails, you return early. You never allocate the container for the body.

/// Uses Peekable to inspect the first element without consuming it permanently.
fn validate_and_process<I: Iterator<Item = String>>(records: I) -> Result<Vec<String>, String> {
    // Wrap the iterator in Peekable. This adds a buffer for one item.
    let mut peekable = records.peekable();
    
    // Peek looks ahead without advancing the iterator.
    // The item remains in the iterator for the next .next() call.
    match peekable.peek() {
        Some(header) if header.starts_with("VALID:") => {},
        _ => return Err("Bad header".to_string()),
    }
    
    // Only allocate if validation passed.
    // The Vec allocation happens here, not at the start.
    // If the header was bad, we allocated nothing for the body.
    Ok(peekable.collect())
}

The Peekable wrapper adds a small amount of overhead. It stores at most one item so .peek() can return a reference without consuming the item. That slot lives inline in the Peekable struct, so the wrapper itself triggers no heap allocation. The heap allocation for the Vec only happens on the success path, in the final collect(). If the header is invalid, the function returns an error immediately. The caller avoids allocating memory for data they will never use.
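One way to check the early-return claim is to count how many items the function pulls from its source. The sketch below repeats validate_and_process so it compiles on its own. The Cell counter and the sample records are assumptions added purely for the demonstration.

```rust
use std::cell::Cell;

fn validate_and_process<I: Iterator<Item = String>>(records: I) -> Result<Vec<String>, String> {
    let mut peekable = records.peekable();
    match peekable.peek() {
        Some(header) if header.starts_with("VALID:") => {}
        _ => return Err("Bad header".to_string()),
    }
    Ok(peekable.collect())
}

fn main() {
    // Counts how many items are pulled from the source iterator.
    let pulled = Cell::new(0);
    let source = ["BROKEN", "row 1", "row 2"]
        .iter()
        .map(|s| s.to_string())
        .inspect(|_| pulled.set(pulled.get() + 1));

    // Bad header: the function returns after peeking at a single item.
    assert_eq!(validate_and_process(source), Err("Bad header".to_string()));
    assert_eq!(pulled.get(), 1);

    // Good header: peek() did not consume it, so it is still part of the result.
    let good = vec!["VALID:v1".to_string(), "row 1".to_string()];
    let records = validate_and_process(good.into_iter()).expect("header is valid");
    assert_eq!(records, vec!["VALID:v1", "row 1"]);
}
```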

Inspect first, allocate second. That's how you save memory on error paths.

Pitfalls: one-way streets and trait bounds

Iterators have constraints. You must respect them or the compiler will stop you.

Iterators are consumed. You cannot iterate over the same iterator twice. If you try to call .next() after the iterator is exhausted, you get None. If you try to use the iterator variable after moving it, the compiler rejects you with E0382 (use of moved value).

let mut args = std::env::args();
let first = args.next(); // Fine: next() only borrows args mutably.
let remaining = args.count(); // count() takes args by value. args is moved here.
// let again = args.next(); // E0382: borrow of moved value: `args`

If you need to see the data more than once, you have three options. Collect into a Vec and iterate over the slice as many times as you like. Use .peekable() if you only need to look ahead one item. Or use .by_ref() to borrow the iterator instead of moving it, which lets you keep using it afterwards, though any items pulled through the borrow are still consumed.
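The .by_ref() option is the least obvious of these, so here is a short sketch. The function name split_header and the sample lines are hypothetical. The adaptor borrows the iterator for the header, and the same iterator then continues where the borrow left off.

```rust
/// Hypothetical helper: returns the first `header_len` lines and the number of lines after them.
fn split_header<I: Iterator<Item = String>>(mut lines: I, header_len: usize) -> (Vec<String>, usize) {
    // by_ref() lends the iterator to take(), so `lines` is not moved here.
    let header: Vec<String> = lines.by_ref().take(header_len).collect();
    // `lines` is still usable and resumes after the header.
    let body_count = lines.count();
    (header, body_count)
}

fn main() {
    let input = ["name,size", "a,1", "b,2", "c,3"].iter().map(|s| s.to_string());
    let (header, body_count) = split_header(input, 1);
    assert_eq!(header, vec!["name,size"]);
    assert_eq!(body_count, 3);
}
```

The header lines are gone from the iterator once they have been pulled. by_ref() avoids the move, not the consumption.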

// E0596: cannot borrow `args` as mutable, as it is not declared as mutable
// fn bad(args: impl Iterator<Item = String>) {
//     args.next(); // Error: args is not declared mut
// }

The compiler enforces mutability. Iterators must be mutable to advance. If you forget mut, you get E0596.

Trait bounds can be tricky. If your function requires the iterator to yield references, you must handle lifetimes.

/// Takes an iterator yielding references. The caller must keep the source alive.
fn process_refs<'a, I: Iterator<Item = &'a str>>(iter: I) {
    for item in iter {
        println!("{}", item);
    }
}

If you build the iterator from a temporary that is dropped at the end of the statement, the compiler rejects the code, typically with E0716 (temporary value dropped while borrowed). The iterator must not outlive the data it references.
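A short sketch of the working case, where the source is bound to a variable that outlives the iterator. The function name longest and the sample text are assumptions for illustration. The lifetime 'a ties the returned reference to the source string rather than to the iterator.

```rust
/// Hypothetical helper: returns the longest item yielded by an iterator of string slices.
fn longest<'a, I: Iterator<Item = &'a str>>(iter: I) -> Option<&'a str> {
    iter.max_by_key(|word| word.len())
}

fn main() {
    // `text` owns the data and stays alive for the rest of main.
    let text = String::from("keep the source alive");
    let word = longest(text.split_whitespace());
    assert_eq!(word, Some("source"));

    // Building the iterator from a temporary String and using the result afterwards
    // is rejected, because the String is dropped at the end of the statement:
    // let bad = longest(String::from("a bb").split_whitespace());
    // println!("{:?}", bad);
}
```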

Iterators are one-way streets. If you need to go back, you paid for the map.

Convention: `IntoIterator` for ergonomics

When writing library functions, the community convention is to accept IntoIterator instead of Iterator. This allows the caller to pass a Vec, a slice, or an iterator. The function still receives an iterator internally. The conversion happens at the call site.

/// Accepts anything that can become an iterator of owned Strings.
/// The caller can pass a Vec<String>, an array of Strings, or any iterator yielding String.
/// A slice yields &String, so accepting one would need a different Item bound.
fn sum_lengths<I: IntoIterator<Item = String>>(items: I) -> usize {
    // into_iter() converts the argument into an iterator.
    // This is a no-op for iterators. For a Vec<String>, it moves the vector in.
    items.into_iter().map(|s| s.len()).sum()
}

This pattern is idiomatic. It makes your API flexible without sacrificing performance. The caller decides how to provide the data. Your function just consumes it.
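If you also want callers to pass borrowed data such as a slice or a list of string literals, one option is to loosen the Item bound. The sketch below is a variant under that assumption, not a replacement for the function above. The name total_len is hypothetical, and AsRef<str> is just one reasonable bound for read-only string access.

```rust
/// Hypothetical variant: accepts owned or borrowed string-like items.
fn total_len<I>(items: I) -> usize
where
    I: IntoIterator,
    I::Item: AsRef<str>,
{
    items.into_iter().map(|s| s.as_ref().len()).sum()
}

fn main() {
    let owned: Vec<String> = vec!["ab".to_string(), "cde".to_string()];

    // Borrowed: &Vec<String> and &[String] both yield &String.
    assert_eq!(total_len(&owned), 5);
    assert_eq!(total_len(owned.as_slice()), 5);

    // An array of string literals yields &str.
    assert_eq!(total_len(["ab", "cde"]), 5);

    // Owned: the Vec is moved into the function.
    assert_eq!(total_len(owned), 5);
}
```

The trade-off is that the function can only read the items as &str. If it needs to take ownership of each String, the stricter Item = String bound is the right one.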

Use IntoIterator in public APIs. Use Iterator in internal helpers where you control the input.

Decision: iterators versus collections

Choose the right tool based on your access pattern and memory constraints.

Use impl Iterator<Item = T> when you consume the data once and want to avoid allocating a container. Use impl Iterator<Item = T> when the data source is infinite or lazily generated, like reading from a network stream or generating primes. Use impl Iterator<Item = T> when you want to chain transformations like .filter() and .map() without creating intermediate collections.
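As a sketch of the lazy case, the function below returns an infinite iterator built from a chain of adaptors. The name even_squares is hypothetical. Nothing is computed until the caller pulls items, and no intermediate collection exists between the filter and the map.

```rust
/// Hypothetical generator: squares of the even numbers 2, 4, 6, and so on.
fn even_squares() -> impl Iterator<Item = u64> {
    (1u64..).filter(|n| n % 2 == 0).map(|n| n * n)
}

fn main() {
    // The caller decides how much of the infinite sequence to materialize.
    let first: Vec<u64> = even_squares().take(3).collect();
    assert_eq!(first, vec![4, 16, 36]);
}
```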

Reach for Vec<T> when you need random access by index. Reach for Vec<T> when the caller needs to mutate the collection after passing it. Reach for Vec<T> when you must iterate over the data multiple times and the cost of re-fetching is higher than the allocation cost.

Pick &[T] when you have a slice of data that lives elsewhere and you only need read access. Pick &[T] when you want zero-copy semantics and the lifetime is simple.
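For contrast, here is a sketch of a function that makes two passes over its input, which is where a slice fits better than an iterator. The name spread and the sample values are assumptions for illustration.

```rust
/// Hypothetical helper: difference between the largest and smallest value.
/// Returns None for an empty slice.
fn spread(values: &[i32]) -> Option<i32> {
    // Two independent passes over the same borrowed data.
    let min = values.iter().min()?;
    let max = values.iter().max()?;
    Some(max - min)
}

fn main() {
    let readings = vec![3, 9, 4];
    assert_eq!(spread(&readings), Some(6));

    // The Vec was only borrowed, so the caller can keep using it.
    assert_eq!(readings.len(), 3);

    assert_eq!(spread(&[]), None);
}
```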

Default to the iterator. Collect only when you have a reason.

Where to go next

Think of heap allocation like renting a storage unit for your data: it costs time and memory to set up. By passing an iterator directly, you move the data straight from the source to its final destination without stopping at a temporary storage unit. This saves memory and can make your program faster by avoiding unnecessary allocation and copying.