How to Split a String in Rust

The rope and the scissors

You have a log line like 2023-10-27 ERROR: disk full. You need the date, the level, and the message. Or you're processing a CSV where fields are separated by commas, but the last field might contain a comma inside a comment. Or you just want to count words in a user's bio. Splitting strings is the first thing you reach for when data arrives as a blob and you need pieces.

Rust gives you a family of split methods. They all share a core philosophy: split without copying. The methods return iterators that yield slices pointing back into the original string, so no allocation happens for the pieces. The borrow checker ensures you never hold a slice after the original string dies.

Zero-copy slicing and UTF-8

A Rust string is a sequence of bytes that is always valid UTF-8. Characters have different encoded lengths. An ASCII letter is one byte. An emoji can be four bytes. You cannot slice a string at an arbitrary byte index without risking cutting a character in half.

The split methods handle the byte math for you. They scan the string, find the delimiter, and ensure the cut happens at a valid character boundary. The result is a &str slice. A slice is just a pointer and a length. It borrows the original data.

Think of the string as a roll of film. Splitting is finding a frame marker and noting where the next frame starts. You don't cut the film. You just record the start and end positions. This is why split is fast. It avoids allocating new buffers for every piece.
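To make the "record the positions, don't cut the film" idea concrete, here is a small sketch that prints where each piece starts inside the original buffer. The helper offset_in is a hypothetical name introduced for illustration, and it assumes the piece really was sliced from the given string.

```rust
/// Byte offset of `piece` within `whole`.
/// Hypothetical helper for illustration; it assumes `piece` was sliced from `whole`.
fn offset_in(whole: &str, piece: &str) -> usize {
    piece.as_ptr() as usize - whole.as_ptr() as usize
}

fn main() {
    // "caf\u{e9}" ends in a two-byte character; "\u{1F980}" (a crab emoji) is four bytes.
    let text = "caf\u{e9},\u{1F980},ok";

    for piece in text.split(',') {
        // Each piece is a view into `text`: same buffer, different start and length.
        println!(
            "{:?}: starts at byte {}, {} bytes long",
            piece,
            offset_in(text, piece),
            piece.len()
        );
    }
}
```

The offsets are byte positions, not character counts, which is why the second piece starts at byte 6 even though only five characters precede it.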

/// Split a log line into date, level, and message without allocating.
fn parse_log(line: &str) {
    // splitn returns an iterator.
    // Each call to next() yields a &str slice.
    // No new strings are created.
    // Capping at 3 pieces keeps the message ("disk full") in one slice;
    // split_whitespace would yield only "disk" as the third piece.
    let mut parts = line.splitn(3, ' ');

    // Take the three parts.
    let date = parts.next();
    let level = parts.next();
    let message = parts.next();

    println!("Date: {:?}, Level: {:?}, Msg: {:?}", date, level, message);
}

fn main() {
    parse_log("2023-10-27 ERROR: disk full");
}

The iterator holds a pointer to the original string and an index. It advances through the bytes as you request pieces. If you drop the iterator early, the rest of the string is untouched. This lazy behavior saves memory and CPU. Don't collect into a vector unless you actually need random access to all pieces.

Trust the iterator. It's lazy.
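As a rough illustration of that laziness, the sketch below pulls out one field and counts words without ever building a vector. The function names nth_field and word_count are made up for this example.

```rust
/// Returns the field at `index` in a comma-separated line, if present.
/// `nth_field` is a hypothetical helper, not a standard library function.
fn nth_field(line: &str, index: usize) -> Option<&str> {
    // nth() advances the iterator only as far as it needs to.
    line.split(',').nth(index)
}

/// Counts words without storing them anywhere.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let line = "id,name,email,notes";
    println!("{:?}", nth_field(line, 1)); // Some("name")
    println!("{:?}", nth_field(line, 9)); // None
    println!("{}", word_count("the quick  brown fox")); // 4
}
```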

The iterator machine

When you call split, you pass a pattern. The pattern tells the iterator where to cut. The simplest patterns are a char or a &str.

fn main() {
    let csv = "apple,banana,cherry";
    
    // Split by a specific character.
    // Returns an iterator of &str slices.
    // The iterator borrows csv.
    for fruit in csv.split(',') {
        println!("Fruit: {}", fruit);
    }
}

The loop consumes the iterator. Each iteration finds the next delimiter and yields the slice before it. The last slice includes everything after the final delimiter. If the string ends with the delimiter, the last slice is empty.
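The edge cases are easier to see side by side. This sketch collects the pieces only so they can be printed; pieces is a throwaway helper name, not part of any API.

```rust
/// Collects the pieces of a comma split (helper for illustration only).
fn pieces(input: &str) -> Vec<&str> {
    input.split(',').collect()
}

fn main() {
    println!("{:?}", pieces("a,b"));  // ["a", "b"]
    println!("{:?}", pieces("a,b,")); // ["a", "b", ""]  trailing delimiter
    println!("{:?}", pieces(",a,b")); // ["", "a", "b"]  leading delimiter
    println!("{:?}", pieces(""));     // [""]            empty input still yields one piece
}
```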

You can also pass a closure as the pattern. The closure receives a char and returns true if that character is a delimiter. This lets you split by multiple delimiters or complex rules.

/// Split by any punctuation or whitespace.
fn tokenize(text: &str) -> Vec<&str> {
    // The closure defines a custom delimiter rule.
    // split calls this for each character.
    // Returns true for any char that is ASCII punctuation or whitespace.
    // (char has is_ascii_punctuation; there is no is_punctuation method.)
    text.split(|c: char| c.is_whitespace() || c.is_ascii_punctuation())
        .filter(|s| !s.is_empty()) // Remove empty strings from adjacent delimiters.
        .collect()
}

fn main() {
    let text = "Hello, world! Rust is... great.";
    let tokens = tokenize(text);
    println!("{:?}", tokens); // ["Hello", "world", "Rust", "is", "great"]
}

The closure approach is the most flexible option. A fixed char or &str pattern states the intent more directly and leaves the standard library free to use a specialized search, so use a closure only when you need custom logic.

Convention aside: when splitting by a single character, pass a char literal like ',' instead of a string slice like ",". The intent is clearer, and a char pattern can be cheaper to search for than a one-character string. Reach for char patterns whenever the delimiter is a single code point.

Real-world parsing patterns

Splitting is rarely just about getting pieces. You usually need to handle edge cases. Empty fields, missing delimiters, or fixed structures. Rust provides variants of split to match these scenarios.

Splitting once

If you expect exactly one delimiter, split_once is the right tool. It returns an Option containing two parts. This avoids the overhead of an iterator and makes the code self-documenting.

/// Extract the domain from a URL.
fn get_domain(url: &str) -> Option<&str> {
    // split_once cuts at the first occurrence of the delimiter.
    // Returns Some((before, after)) or None if the delimiter is missing.
    // The missing-delimiter case is explicit, unlike with split(...).next().
    let (_, after) = url.split_once("://")?;
    
    // The domain ends at the first slash.
    after.split('/').next()
}

fn main() {
    let url = "https://rust-lang.org/docs/book";
    println!("Domain: {:?}", get_domain(url));
}

split_once is the idiomatic choice for a single break. Its Option<(&str, &str)> return type forces you to handle the missing-delimiter case, and it signals that you want one cut, not a stream of pieces. Use split_once when you parse key-value pairs, extract paths, or split a header from a body.
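For the key-value case, a minimal sketch might look like the following. The function name parse_pair and the choice of '=' as the separator are assumptions made for the example.

```rust
/// Parses "key = value" into trimmed halves.
/// `parse_pair` and the '=' separator are illustrative choices.
fn parse_pair(line: &str) -> Option<(&str, &str)> {
    // Only the first '=' counts, so the value may itself contain '='.
    let (key, value) = line.split_once('=')?;
    Some((key.trim(), value.trim()))
}

fn main() {
    println!("{:?}", parse_pair("name = Ferris"));
    println!("{:?}", parse_pair("url=https://example.com/?a=b"));
    println!("{:?}", parse_pair("no delimiter here")); // None
}
```

Because the cut happens at the first delimiter, everything after it stays in one piece, which is usually what you want for values.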

Limiting splits

Simple CSV-like lines often have a fixed number of columns, and the last column might contain the delimiter character. If you use split, the last column gets chopped up. splitn caps the number of pieces, so the final piece keeps the rest of the line intact.

/// Parse a CSV line with a fixed number of columns.
fn parse_csv_line(line: &str) -> Option<(&str, &str, &str)> {
    // splitn(3, ',') yields at most 3 pieces, so it cuts at most twice.
    // The third piece contains the rest of the string.
    // This protects the last field from internal delimiters.
    let mut parts = line.splitn(3, ',');
    
    let name = parts.next()?;
    let email = parts.next()?;
    let bio = parts.next()?;
    
    Some((name, email, bio))
}

fn main() {
    let line = "Alice,alice@example.com,Loves Rust, coding, and pizza";
    if let Some((name, email, bio)) = parse_csv_line(line) {
        println!("Name: {}, Email: {}, Bio: {}", name, email, bio);
    }
}

splitn takes a count. It yields at most that many pieces. The final piece includes everything after the last split, even if it contains delimiters. This is the standard pattern for parsing structured data where the last field is free-form.

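Use splitn for simple delimited lines with a free-form last field. Your last field will thank you. Quoted fields and escapes are a different problem and call for a real CSV parser.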
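The count argument is easy to misread as "number of cuts", so here is a small sketch of how it behaves for different values. split_fields is a hypothetical wrapper used only to print the results.

```rust
/// Splits on ',' into at most `max_pieces` pieces (illustrative wrapper).
fn split_fields(line: &str, max_pieces: usize) -> Vec<&str> {
    line.splitn(max_pieces, ',').collect()
}

fn main() {
    let line = "a,b,c,d";
    println!("{:?}", split_fields(line, 0));  // []
    println!("{:?}", split_fields(line, 1));  // ["a,b,c,d"]
    println!("{:?}", split_fields(line, 2));  // ["a", "b,c,d"]
    println!("{:?}", split_fields(line, 3));  // ["a", "b", "c,d"]
    println!("{:?}", split_fields(line, 10)); // ["a", "b", "c", "d"]
}
```

A count larger than the number of available fields is harmless: you simply get fewer pieces, which is why the parsing code still has to check each next() call.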

Keeping the delimiter

Sometimes the delimiter carries meaning. Line endings, record markers, or separators that must be preserved. split_inclusive keeps the delimiter attached to each piece.

fn process_records(data: &str) {
    // split_inclusive keeps the delimiter at the end of each piece.
    // Useful when the delimiter is part of the record format.
    for record in data.split_inclusive('\n') {
        println!("Record: {:?}", record);
    }
}

fn main() {
    let data = "record1\nrecord2\nrecord3\n";
    process_records(data);
}

This is less common but essential for stream processing where you need to emit complete chunks including their terminators.
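One way to see the difference is to run split and split_inclusive over the same input. In this sketch the helper names are invented for the comparison; the last assertion checks that the inclusive pieces concatenate back to the original text.

```rust
/// Pieces with the newline removed (illustrative helper).
fn plain_pieces(data: &str) -> Vec<&str> {
    data.split('\n').collect()
}

/// Pieces with the newline kept at the end of each one (illustrative helper).
fn inclusive_pieces(data: &str) -> Vec<&str> {
    data.split_inclusive('\n').collect()
}

fn main() {
    let data = "record1\nrecord2\nrecord3\n";

    // split drops the delimiter and reports an empty piece after the last one.
    println!("{:?}", plain_pieces(data));

    // split_inclusive keeps the delimiter and yields no trailing empty piece.
    println!("{:?}", inclusive_pieces(data));

    // Nothing was lost: gluing the inclusive pieces together restores the input.
    let rebuilt: String = inclusive_pieces(data).concat();
    assert_eq!(rebuilt, data);
}
```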

Pitfalls and compiler traps

Splitting strings exposes a few common traps. The compiler catches most of them, but understanding the root cause saves debugging time.

Empty strings

Adjacent delimiters produce empty slices. "a,,b".split(',') yields ["a", "", "b"]. If you want to skip empty results, filter them out. For whitespace-delimited text, split_whitespace does this for you.

fn main() {
    let text = "a,,b";
    
    // split produces empty strings for adjacent delimiters.
    let all: Vec<&str> = text.split(',').collect();
    println!("{:?}", all); // ["a", "", "b"]
    
    // Filter to remove empty strings.
    let non_empty: Vec<&str> = text.split(',').filter(|s| !s.is_empty()).collect();
    println!("{:?}", non_empty); // ["a", "b"]
}

split_whitespace automatically skips empty results because it treats consecutive whitespace as a single delimiter. Use split_whitespace for natural language tokenization. Use split with filtering when you need to preserve structure but drop blanks.
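The contrast shows up as soon as the input has irregular spacing. The two helpers below are named only for this comparison.

```rust
/// Splits on a single space character (illustrative helper).
fn by_space(text: &str) -> Vec<&str> {
    text.split(' ').collect()
}

/// Splits on runs of any Unicode whitespace (illustrative helper).
fn by_whitespace(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

fn main() {
    let text = "  one  two\tthree ";

    // Every space is a cut, so leading, doubled, and trailing spaces leave
    // empty pieces, and the tab is not treated as a delimiter at all.
    println!("{:?}", by_space(text)); // ["", "", "one", "", "two\tthree", ""]

    // Runs of spaces and tabs collapse, and the edges are trimmed.
    println!("{:?}", by_whitespace(text)); // ["one", "two", "three"]
}
```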

Lifetimes and borrowing

The slices returned by split borrow the original string. You cannot return a slice if the original string dies.

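fn bad_example() -> &'static str {
    // Note: a bare `-> &str` with no input to borrow from fails even earlier,
    // at the signature, with E0106 (missing lifetime specifier).
    let s = String::from("hello world");
    // E0515: cannot return value referencing local variable `s`
    // The slice points to s. s drops at the end of the function.
    // The slice would point to freed memory.
    s.split(' ').next().unwrap()
}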

The compiler rejects this with E0515 (cannot return value referencing local variable). The slice points into s, and s is dropped when the function returns, so the slice would dangle. The borrow checker stops this at compile time.

If you need to return owned data, convert the slice to a String.

fn get_first_word(s: &str) -> String {
    // Convert the slice to an owned String.
    // This allocates a new buffer.
    s.split(' ').next().map(String::from).unwrap_or_default()
}

Allocation is the price of ownership. Avoid it when possible. Return &str if the caller owns the data. Return String only when the data must outlive the input.
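When the caller owns the data, the borrowed version is short. In this sketch, first_word is an illustrative name; lifetime elision ties the returned slice to the input parameter, so no annotation is needed.

```rust
/// Returns the first space-separated word, borrowed from the input.
/// The returned slice lives exactly as long as `s` does.
fn first_word(s: &str) -> &str {
    // split always yields at least one piece, so the fallback is a formality.
    s.split(' ').next().unwrap_or("")
}

fn main() {
    // The caller owns the String; the function only hands back a view into it.
    let owner = String::from("hello world");
    let word = first_word(&owner);
    println!("{}", word); // hello
}
```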

Byte slices

str::split and its patterns live on str. A &[u8] has its own split method, but it takes a closure over elements instead of a pattern, so passing a delimiter byte does not compile.

fn bad_bytes(data: &[u8]) {
    // E0277: expected a `FnMut(&u8)` closure, found `u8`
    // The slice version of split wants a predicate, not a delimiter value.
    // data.split(b','); // Error
}

The compiler rejects the call with E0277 (trait bound not satisfied), because b',' is a u8 and not a closure. If you want &str pieces, convert to &str first with from_utf8.

use std::str;

fn good_bytes(data: &[u8]) -> Result<Vec<&str>, str::Utf8Error> {
    // Convert bytes to &str.
    // This checks UTF-8 validity.
    let text = str::from_utf8(data)?;
    Ok(text.split(',').collect())
}

Validate UTF-8 before treating bytes as text. Raw bytes might contain invalid sequences, which is why from_utf8 returns a Result. Propagate the error, or unwrap only when the data is known to be valid.
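If the pieces can stay as raw bytes, there is also a route that skips the UTF-8 check: slices have a split method of their own that takes a closure over elements. A minimal sketch, with split_bytes as a made-up helper name:

```rust
/// Splits raw bytes on b',' without requiring valid UTF-8.
/// Uses the slice method `[T]::split`, which takes a predicate closure.
fn split_bytes(data: &[u8]) -> Vec<&[u8]> {
    data.split(|&b| b == b',').collect()
}

fn main() {
    // 0xff is never valid in UTF-8, so str::from_utf8 would reject this input.
    let raw: &[u8] = &[0xff, b',', b'o', b'k'];

    for piece in split_bytes(raw) {
        println!("{:?}", piece);
    }
}
```

The pieces are &[u8], so any text handling still needs a conversion later, but only for the fields you actually treat as text.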

Splitting by string vs char

Splitting by a &str pattern is more general but can be slower. A string pattern has to run a substring search to support multi-character delimiters, while a char pattern only looks for a single code point.

fn main() {
    let text = "a,b,c";
    
    // char split is faster for single characters.
    let by_char: Vec<&str> = text.split(',').collect();
    
    // &str split works but has more overhead.
    let by_str: Vec<&str> = text.split(",").collect();
    
    assert_eq!(by_char, by_str);
}

The performance difference is small for short strings. It matters in tight loops processing gigabytes of data. Profile before optimizing. Use char for readability and speed when the delimiter is a single character.

Decision matrix

Use split_whitespace when you need to tokenize natural language or ignore all whitespace variations. It handles spaces, tabs, and newlines automatically and skips empty results.

Use split when you need to divide by a specific delimiter and want all resulting pieces. It returns an iterator of slices, so it allocates nothing until you collect.

Use split_once when you expect exactly one delimiter or only care about the first break. It returns an Option with two parts, avoiding the overhead of an iterator and making the intent clear.

Use splitn when you have a fixed number of fields and the delimiter might appear in the last field. It caps the number of pieces, preserving the rest of the string in the final piece.

Use split_inclusive when you need to keep the delimiter attached to each piece. This is useful for processing chunks where the separator carries meaning, like line endings or record markers.

Reach for char patterns over &str patterns when the delimiter is a single character. A char pattern can be cheaper to search for, and the code reads cleaner.

Counter-intuitive but true: the more you collect, the slower your code gets. Keep data as iterators as long as possible. Chain filters and maps. Collect only at the boundary where you need a concrete collection.
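A sketch of what "collect only at the boundary" can look like: the chain below trims, filters, parses, and sums in one pass with no intermediate vector. The name sum_fields is hypothetical, and silently skipping fields that fail to parse is an assumption made to keep the example short.

```rust
/// Sums the integer fields of a comma-separated line.
/// Fields that are empty or not valid integers are skipped (an assumption
/// made for this sketch; real code may prefer to report them).
fn sum_fields(line: &str) -> i64 {
    line.split(',')
        .map(str::trim)                           // drop surrounding spaces
        .filter(|s| !s.is_empty())                // skip blanks from ",,"
        .filter_map(|s| s.parse::<i64>().ok())    // keep only valid integers
        .sum()                                    // the only point where work is consumed
}

fn main() {
    println!("{}", sum_fields("1, 2,,3, x")); // 6
}
```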

Where to go next

Splitting a string breaks a long piece of text into smaller, manageable chunks based on a rule, such as a comma, whitespace, or punctuation. You use it whenever you need to process individual fields, words, or lines from a larger block of text. Think of it like cutting a loaf of bread into slices so you can handle each piece separately, except that in Rust the slices are borrowed views and the loaf stays whole.