How to Efficiently Build Strings in Rust
You're building a log aggregator. You read lines from a file and want to stitch them into a single report. You grab a String, loop through the lines, and rebuild the report on every pass with something like report = format!("{report}{line}\n"). For a hundred lines, it's instant. You run it on a production log with a million lines, and the CPU spikes to 100% while the program crawls. The logic is identical. The difference is how Rust handles memory behind the scenes.
Don't let a tidy one-liner trick you. It looks like an append, but it allocates a fresh String and copies everything built so far.
The backpack analogy
A String in Rust is a growable buffer on the heap. Think of it like a backpack. String::new() doesn't hand you a backpack at all; nothing is allocated until you add the first character. As you add characters, they go inside. When the backpack fills up, Rust doesn't stretch the fabric. It buys a larger backpack, moves every item from the old one to the new one, and discards the old backpack. This reallocation is expensive. If you add items one by one without planning, you trigger this move operation repeatedly. Pre-allocating capacity is simply buying the right-sized backpack before you start packing.
Measure twice, allocate once. A capacity estimate pays for itself after the first reallocation.
Pre-allocating the buffer
When you know the approximate size of the final string, use String::with_capacity. This tells Rust to allocate a block of memory large enough to hold that many bytes upfront. The string tracks three numbers: a pointer to the memory, the current length (how many bytes are used), and the capacity (how many bytes are available).
fn build_report(lines: &[&str]) -> String {
    // Estimate total length to minimize reallocations.
    // Sum of line lengths plus one byte per newline.
    let estimated_len: usize = lines.iter().map(|l| l.len() + 1).sum();
    // Allocate the buffer once with the estimated size.
    let mut report = String::with_capacity(estimated_len);
    for line in lines {
        // push_str appends directly to the buffer.
        // No new allocation occurs if capacity is sufficient.
        report.push_str(line);
        report.push('\n');
    }
    report
}
When you call push_str, Rust checks if the new data fits within the remaining capacity. If it does, Rust copies the bytes and updates the length. No allocation happens. If the data doesn't fit, Rust allocates a larger buffer, moves the existing content, copies the new data, and frees the old buffer.
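The fit-or-grow check can be observed from the outside. The sketch below is only an illustration: snapshot and append_and_check_growth are hypothetical helper names, not part of the standard library. They rely solely on String::len, String::capacity, and String::as_ptr.

```rust
/// Returns (length, capacity, buffer pointer): the three numbers a String tracks.
fn snapshot(s: &String) -> (usize, usize, *const u8) {
    (s.len(), s.capacity(), s.as_ptr())
}

/// Appends `extra` and reports whether the capacity had to grow to fit it.
fn append_and_check_growth(s: &mut String, extra: &str) -> bool {
    let cap_before = s.capacity();
    s.push_str(extra);
    s.capacity() > cap_before
}
```

Starting from String::with_capacity(16), appending "h\u{e9}llo" (6 bytes, because é takes two bytes in UTF-8) leaves the capacity and the pointer unchanged. Appending more bytes than the remaining capacity makes append_and_check_growth return true. The pointer may or may not change in that case, since the allocator is free to extend the block in place.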
Trust the doubling strategy for unknown sizes, but pre-allocate when you have the data.
The math of geometric growth
When the buffer fills, String grows geometrically: the current standard library implementation roughly doubles the capacity, although the exact factor is not a documented guarantee. This isn't arbitrary. Geometric growth gives amortized constant time per push, so pushing N characters without pre-allocation costs work proportional to N, not N squared. The number of reallocations grows only logarithmically with N: starting from a few bytes and doubling, a million characters takes about 20 reallocations. That's acceptable. But if you can avoid those reallocations entirely, you save the copying and put less pressure on the allocator.
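A rough way to check the reallocation count on your own toolchain is to watch the capacity change while pushing. count_reallocations is a hypothetical helper written for this illustration. The exact number it reports depends on the standard library's growth policy, so treat the result as an observation, not a guarantee.

```rust
/// Pushes `n` one-byte characters into `s` and counts how many times the
/// capacity changed along the way (each change is one trip to the allocator).
fn count_reallocations(mut s: String, n: usize) -> usize {
    let mut reallocations = 0;
    let mut last_capacity = s.capacity();
    for _ in 0..n {
        s.push('a');
        if s.capacity() != last_capacity {
            reallocations += 1;
            last_capacity = s.capacity();
        }
    }
    reallocations
}
```

Calling it with String::new() and n = 1_000_000 should report a count in the low tens, while String::with_capacity(1_000_000) should report zero.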
Pre-allocation removes the logarithmic overhead. Use it when the estimate is cheap to compute.
Formatting in loops
Real code often involves formatting. You might be tempted to use format! inside a loop. format! returns a newly allocated String on every call, so each iteration creates a temporary string, push_str copies it into the main buffer, and the temporary is freed. That is one extra heap allocation and one extra copy per iteration.
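The community prefers write! for formatted output inside loops. write! works with any destination that implements std::fmt::Write (which String does) or std::io::Write, and it formats straight into that destination. With std::fmt::Write in scope, it appends to a String without the intermediate allocation of format!.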
use std::fmt::Write;

fn build_csv(rows: &[(&str, i32)]) -> String {
    // Pre-allocate based on row count and average size.
    // A rough estimate prevents most reallocations.
    let mut csv = String::with_capacity(rows.len() * 20);
    for (name, value) in rows {
        // write! appends formatted text directly to the String.
        // No intermediate String allocation occurs.
        // write! returns fmt::Result. String's fmt::Write impl never
        // returns Err; running out of memory aborts instead.
        let _ = write!(csv, "{},{}\n", name, value);
    }
    csv
}
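Convention aside: The let _ = pattern signals to readers that you considered the Result and chose to ignore it. String's fmt::Write implementation never returns Err, so the only possible error source is a Display or Debug implementation of one of the arguments, and &str and i32 never fail. Running out of memory does not surface as an Err either; it aborts the process. The alternative, write!(...).unwrap(), would panic if a formatting implementation ever did return an error, whereas let _ = silently discards the Result.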
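If discarding the Result feels uncomfortable, another option is to propagate it with ?. The sketch below puts that variant next to a format!-based baseline so the two can be checked against each other. csv_with_format and csv_with_write are hypothetical names for this illustration, and pre-allocation is left out to keep it short.

```rust
use std::fmt::{self, Write};

/// Baseline: format! allocates a temporary String per row,
/// which push_str then copies into the main buffer.
fn csv_with_format(rows: &[(&str, i32)]) -> String {
    let mut csv = String::new();
    for (name, value) in rows {
        let line = format!("{},{}\n", name, value);
        csv.push_str(&line);
    }
    csv
}

/// Formats straight into the buffer and propagates the fmt::Result
/// with `?` instead of discarding it.
fn csv_with_write(rows: &[(&str, i32)]) -> Result<String, fmt::Error> {
    let mut csv = String::new();
    for (name, value) in rows {
        // writeln! appends the trailing newline itself.
        writeln!(csv, "{},{}", name, value)?;
    }
    Ok(csv)
}
```

Both functions produce the same text. The second skips the per-row temporary, at the price of a Result in the signature that callers have to handle.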
Treat write! as the loop-native formatter. It keeps the buffer hot and the allocator cold.
Pitfalls and compiler friction
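Repeated rebuilding is the silent killer. Any loop that produces a brand-new String per iteration, such as s = format!("{}{}", s, part) or s = s.clone() + part, allocates a buffer, copies everything accumulated so far, and drops the old value. This happens every iteration, so the total cost is quadratic. You won't get a compiler error, but your profiler will show time spent in allocation and copy routines. The + operator on its own is not the problem: a + b consumes a, appends b into a's existing buffer (growing it if needed), and returns it, so s = s + part costs about the same as s.push_str(part). The trouble starts when the left operand is a fresh copy or the right operand is a temporary built just for the append.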
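To put numbers on the quadratic cost, the sketch below rebuilds the string from scratch on every iteration, using format! because it always returns a freshly allocated String, and tallies the bytes written. It then does the same for an in-place append. Both function names are hypothetical, and the tally is a simple model that ignores the occasional reallocation copy in the in-place version.

```rust
/// Rebuilds the whole string each iteration and tallies bytes written.
/// Every pass copies the accumulated content plus the new part.
fn rebuild_each_time(parts: &[&str]) -> (String, usize) {
    let mut s = String::new();
    let mut bytes_copied = 0;
    for part in parts {
        s = format!("{}{}", s, part);
        bytes_copied += s.len(); // old content plus the new part
    }
    (s, bytes_copied)
}

/// Appends in place and tallies bytes written.
/// Only the new part is copied each iteration.
fn append_in_place(parts: &[&str]) -> (String, usize) {
    let mut s = String::new();
    let mut bytes_copied = 0;
    for part in parts {
        s.push_str(part);
        bytes_copied += part.len();
    }
    (s, bytes_copied)
}
```

For 1,000 one-byte parts, the rebuild tally is 1 + 2 + ... + 1000 = 500,500 bytes, against 1,000 bytes for the in-place version. At a million parts the gap becomes roughly 5 × 10^11 bytes against 10^6.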
If you try to concatenate two String values directly, the compiler rejects you with E0308 (mismatched types). The Add trait is implemented for String + &str, not String + String. You must borrow the second string: s1 + &s2. This design forces you to think about ownership, but it also makes the + operator slightly awkward for repeated use.
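A minimal sketch of the borrowing rule, using a hypothetical helper name. The left operand is moved into the result, while the right operand is only borrowed and stays usable afterwards.

```rust
/// Concatenates two owned strings and hands the second one back
/// to show that it was only borrowed.
fn concat_keep_second(first: String, second: String) -> (String, String) {
    // `first + second` would be rejected with E0308: expected `&str`, found `String`.
    // `&second` is a `&String`, which deref-coerces to `&str`.
    let joined = first + &second; // `first` is moved into the result
    (joined, second) // `second` is still owned here
}
```

Calling it with "foo" and "bar" (as owned Strings) returns ("foobar", "bar").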
Convention aside: Use push for single characters and push_str for slices. Neither one validates anything at runtime: a char is always a valid Unicode scalar value and a &str is always valid UTF-8, so the type system has already done the checking. push encodes the char into one to four UTF-8 bytes and appends them. push_str copies the slice's bytes in one bulk operation. If you have a char, use push. If you have a &str, use push_str; pushing it one char at a time means decoding and re-encoding every character, which is slower than a bulk copy.
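The two methods side by side, as a small sketch with hypothetical helper names. Both produce the same bytes; the difference is how much work each one does per character.

```rust
/// Appends a &str one char at a time. Each push re-encodes the char
/// into one to four UTF-8 bytes.
fn append_char_by_char(dst: &mut String, src: &str) {
    for c in src.chars() {
        dst.push(c);
    }
}

/// Appends the same &str with a single bulk copy of its bytes.
fn append_bulk(dst: &mut String, src: &str) {
    dst.push_str(src);
}
```

Pushing 'a' grows the length by one byte and pushing '\u{e9}' (é) grows it by two, which is the multi-byte encoding that push handles for you.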
If your profiler shows string allocations in a hot loop, hunt for code that creates a new String per iteration: format!, to_string, clone. It's almost always the culprit.
Decision matrix
Use String::with_capacity when you can estimate the final size. Even a rough guess prevents multiple reallocations.
Use push_str when appending known slices of text in a loop. It writes directly to the buffer without intermediate allocations.
Use write! when you need formatted output inside a loop. It writes to the buffer directly, avoiding the temporary string that format! creates.
Use format! when constructing a single string from multiple values outside a loop. The convenience outweighs the cost for one-off construction.
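Avoid rebuilding the string inside a loop with format! or clone() followed by +. Each iteration allocates a new String and copies the entire content. A plain s = s + part reuses the buffer, but s.push_str(part) says the same thing more directly.
Reach for Vec<u8> when building binary data or when you need byte-level manipulation. String guarantees UTF-8, so getting arbitrary bytes into one means validating them first (String::from_utf8), which is wasted work if you're just moving bytes.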
Pick the tool that matches your data. Strings are for text. Bytes are for data. Don't force UTF-8 where it doesn't belong.
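As a sketch of the bytes-versus-text split, the example below builds a buffer with Vec<u8> and converts to String only at the boundary. The frame layout (one length byte followed by the payload) and both function names are invented for illustration.

```rust
/// Builds a length-prefixed frame as raw bytes.
/// Assumes the payload is at most 255 bytes long.
fn build_frame(payload: &[u8]) -> Vec<u8> {
    let mut frame = Vec::with_capacity(payload.len() + 1);
    frame.push(payload.len() as u8);
    frame.extend_from_slice(payload);
    frame
}

/// Converts bytes to text at the boundary. String::from_utf8 validates
/// the bytes and reuses the Vec's allocation when they are valid UTF-8.
fn bytes_to_text(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}
```

Valid input such as b"hi" converts cleanly, while a sequence like [0xff, 0xfe] is rejected, which is the UTF-8 guarantee at work.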