The UTF-8 boundary trap
You write let sub = &text[0..5]; because you want the first five characters. The code compiles without warnings. You run the program with the input "世界, hello", and it crashes with a panic: byte index 5 is not a char boundary.
Your intent was reasonable: you asked for five characters. Rust interpreted the range as five bytes, and five bytes isn't always five characters. Here the first two characters alone occupy six bytes, so byte 5 lands in the middle of 界. Rust refuses to hand you a slice that isn't valid UTF-8. Slicing a string in Rust requires byte offsets that fall exactly on the start of a character (or at the end of the string). Slicing through the middle of a character would produce a broken string, so the standard library panics before that broken string can exist and corrupt the rest of your program.
Bytes, characters, and the cost of safety
Rust stores String data as UTF-8 encoded bytes. UTF-8 is a variable-width encoding: every character takes between one and four bytes. ASCII characters like a, z, 0, and punctuation take one byte. Accented Latin letters such as é, and Greek or Cyrillic letters, take two bytes. Most Chinese, Japanese, and Korean characters take three bytes. Most emoji, and other characters outside the Basic Multilingual Plane, take four bytes.
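You can see these widths directly with char::len_utf8(), which reports how many bytes a character occupies in UTF-8. The minimal sketch below prints one character from each width class. char_widths is a hypothetical helper written for this illustration.

```rust
/// Hypothetical helper: pairs each character of `s` with its UTF-8 width in bytes.
fn char_widths(s: &str) -> Vec<(char, usize)> {
    s.chars().map(|c| (c, c.len_utf8())).collect()
}

fn main() {
    // One sample per width class: ASCII, accented Latin, CJK, emoji
    let sample = "aé世🦀";
    for (c, width) in char_widths(sample) {
        println!("{c:?} takes {width} byte(s)");
    }
    // Four characters, but 1 + 2 + 3 + 4 = 10 bytes
    println!("chars: {}, bytes: {}", sample.chars().count(), sample.len());
}
```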
When you index a string with [start..end], the indices are byte offsets, not character counts. The compiler accepts any range of usize values; the check happens at runtime. The indexing operation verifies that start and end lie within the string and fall on character boundaries. If they don't, the program panics.
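A quick way to see which byte offsets are legal slice points is to ask str::is_char_boundary() about every offset. In this sketch, the hypothetical helper char_boundaries runs that check on "Hello, 世界". Offsets 8, 9, 11, and 12 fall inside the two three-byte characters, so they are missing from the result.

```rust
/// Hypothetical helper: every byte offset in `s` (including `s.len()`)
/// that is a valid place to start or end a slice.
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}

fn main() {
    let text = "Hello, 世界";
    // '世' occupies bytes 7..10 and '界' occupies bytes 10..13
    println!("{:?}", char_boundaries(text));
    // Offset 9 falls inside '世', so &text[0..9] would panic
    println!("is 9 a boundary? {}", text.is_char_boundary(9));
    // Offset 7 is where '世' starts, so this slice is fine
    println!("{}", &text[0..7]);
}
```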
Think of a string as a row of houses. In some languages, every house is the same width. You can jump to house five by counting five units. In Rust, houses vary in width. Some are one brick wide, some are three. If you count five bricks, you might land in the middle of a wide house. Rust won't let you break the wall. You have to count houses to find the right wall.
This design guarantees that every &str in safe Rust is valid UTF-8. You never have to check for encoding errors when you receive a string slice. When you slice, the safety check happens once, at the slice point. The cost is that you must calculate byte offsets carefully when you work with character counts.
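The guarantee is also enforced when raw bytes first become a string. Here is a minimal sketch, assuming the bytes arrive as a &[u8]. std::str::from_utf8 validates the bytes and returns an error for an incomplete sequence, so no &str is ever created in the broken case. The helper name validate is hypothetical.

```rust
use std::str;

/// Hypothetical helper: turns raw bytes into a &str only if they are valid UTF-8.
fn validate(bytes: &[u8]) -> Option<&str> {
    str::from_utf8(bytes).ok()
}

fn main() {
    // The UTF-8 encoding of '世' is the three bytes E4 B8 96
    let whole: [u8; 3] = [0xE4, 0xB8, 0x96];
    // Dropping the last byte leaves an incomplete multi-byte sequence
    let truncated: [u8; 2] = [0xE4, 0xB8];

    println!("{:?}", validate(&whole)); // Some("世")
    println!("{:?}", validate(&truncated)); // None
}
```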
Finding the byte offset with char_indices
When you need to slice by character count, use char_indices(). This iterator yields pairs of (byte_offset, char) as it decodes the string. You can advance the iterator to the character you want, grab the byte offset, and use that offset for slicing.
```rust
fn main() {
    let text = "Hello, 世界";

    // char_indices yields (byte_offset, char) for each character.
    // .nth(5) skips five items and returns the sixth character (index 5), ','.
    // It stops there instead of walking the whole string.
    let byte_offset = text
        .char_indices()
        .nth(5)
        .map(|(i, _)| i) // Extract the byte index from the tuple
        .unwrap_or(text.len()); // Fall back to the end if the string is shorter

    // Slice from the calculated byte offset to the end.
    // This cannot panic: char_indices only yields offsets on character
    // boundaries, and text.len() is always a boundary.
    let sub = &text[byte_offset..];
    println!("{sub}"); // Output: ", 世界"
}
```
The iterator decodes UTF-8 one character at a time and tracks the current byte position. Calling nth(5) discards the first five items and returns the sixth. That gives you the byte offset where the sixth character begins, which is 5 here because the first six characters are all ASCII. The slice &text[byte_offset..] starts on that boundary, so the runtime boundary check passes.
Convention aside: Use char_indices() when you need the byte offset. Use chars() when you only need the characters and don't care about positions. chars() drops the byte index, which saves a tiny amount of overhead, but char_indices() is the right tool when you need to slice.
Once you have the byte offset, slicing costs almost nothing: a constant-time boundary check, then a new pointer and length. No data is copied. The work happens in finding the offset, not in creating the slice.
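The char_indices() pattern above fits naturally into a small helper for the problem from the opening section: taking the first n characters. This is a sketch, and first_n_chars is a hypothetical name, not a standard library function. The as_ptr() comparison in main checks that the prefix points into the original buffer, which confirms that nothing was copied.

```rust
/// Hypothetical helper: returns the first `n` characters of `s` as a borrowed slice.
/// If `s` has fewer than `n` characters, the whole string is returned.
fn first_n_chars(s: &str, n: usize) -> &str {
    // The byte offset where character `n` starts is the end of the prefix
    let end = s.char_indices().nth(n).map(|(i, _)| i).unwrap_or(s.len());
    &s[..end]
}

fn main() {
    let text = "世界, hello";
    let prefix = first_n_chars(text, 5);
    println!("{prefix}"); // prints "世界, h"

    // The prefix shares the original buffer: same start pointer, shorter length
    assert_eq!(prefix.as_ptr(), text.as_ptr());
    println!("{} bytes borrowed, nothing copied", prefix.len());
}
```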
The O(1) alternative: split_at
If you already know the byte offset, use split_at(). This method splits the string into two parts at a given byte index. It checks the boundary and returns a tuple of two slices. It runs in O(1) time because it doesn't iterate the string. It just validates the index and adjusts pointers.
```rust
fn main() {
    let text = "Rust is great";
    // Byte index 4 is the space after "Rust"
    // split_at checks the boundary and returns two views
    // This is O(1) and does not allocate or copy data
    let (prefix, suffix) = text.split_at(4);
    println!("Prefix: {prefix}"); // "Rust"
    println!("Suffix: {suffix}"); // " is great"
}
```
split_at panics if the index is past the end of the string or is not a character boundary. Use it when you are confident the offset is correct, such as when you parsed the offset from a fixed-width header or calculated it from previous safe operations.
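When the offset comes from a less trustworthy source, you can guard split_at with is_char_boundary(). That method returns false both for offsets inside a character and for offsets past the end. The helper below, try_split_at, is a hypothetical sketch. Rust 1.80 and later also provide str::split_at_checked, which returns an Option in the same way.

```rust
/// Hypothetical helper: like str::split_at, but returns None instead of
/// panicking when `mid` is past the end or inside a multi-byte character.
fn try_split_at(s: &str, mid: usize) -> Option<(&str, &str)> {
    // is_char_boundary is false for offsets inside a character and for
    // offsets greater than s.len(), so split_at below cannot panic.
    if s.is_char_boundary(mid) {
        Some(s.split_at(mid))
    } else {
        None
    }
}

fn main() {
    let text = "Hello, 世界";
    println!("{:?}", try_split_at(text, 7)); // Some(("Hello, ", "世界"))
    println!("{:?}", try_split_at(text, 8)); // None: inside '世'
    println!("{:?}", try_split_at(text, 20)); // None: past the end
}
```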
If you need to handle invalid offsets gracefully, use get() instead. get() returns an Option<&str>. It returns None if the range is out of bounds or crosses a character boundary. This lets you handle errors without panicking.
```rust
fn main() {
    let text = "Hello, 世界";
    // Byte index 7 is the start of '世'
    // Byte index 10 is just past the end of '世' (where '界' starts)
    // get returns Option<&str> to handle invalid ranges safely
    let sub = text.get(7..10);
    match sub {
        Some(s) => println!("Found: {s}"), // "世"
        None => println!("Invalid range"),
    }
    // Byte index 8 is inside '世', so this returns None
    let bad = text.get(8..10);
    println!("Bad range: {bad:?}"); // None
}
```
Convention aside: The community prefers get() over is_char_boundary() followed by a slice. get() combines the check and the slice in one call. It's clearer and less error-prone. If you see is_char_boundary() in code, it's usually because the author needs to validate an offset without creating a slice, such as when building a custom parser state machine.
Trust get() for safe slicing. It handles the boundary check and the slice atomically.
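Sometimes you don't want None for a bad offset. You want the nearest valid one. A common case is fitting text into a fixed byte budget, such as a database column or a protocol field. The sketch below walks backwards from the budget with is_char_boundary() until it reaches the start of a character. This is the kind of offset validation described above, done before any slice is created. truncate_to_bytes is a hypothetical name.

```rust
/// Hypothetical helper: returns the longest prefix of `s` that fits in
/// `max_bytes` bytes without cutting a character in half.
fn truncate_to_bytes(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    // Walk backwards from the budget until we reach a character start.
    // Offset 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    let text = "Hello, 世界";
    for budget in [9, 10, 13] {
        println!("{budget}: {:?}", truncate_to_bytes(text, budget));
    }
}
```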
Realistic example: parsing a log line
Real code often needs to extract substrings from structured text. Consider a log parser that extracts the username from a line formatted as "USER:alice:logged_in". The username length varies, but the delimiters are fixed bytes. You can find the delimiters by byte offset and slice safely.
```rust
/// Extracts the username from a log line with format "USER:username:action".
/// Returns None if the line has fewer than two colons or the username is empty.
/// (The "USER" tag itself is not checked.)
fn extract_username(line: &str) -> Option<&str> {
    // Byte offset of the first colon
    let first_colon = line.find(':')?;
    // Byte offset of the second colon, relative to the slice after the first one.
    // first_colon + 1 is a valid boundary because ':' is a single byte.
    let second_colon = line[first_colon + 1..].find(':')?;

    // The username starts right after the first colon
    let start = first_colon + 1;
    // second_colon is relative to the sub-slice, so shift it back
    // to an offset in the full line
    let end = start + second_colon;

    // get() cannot fail here, since both offsets come from find(),
    // but it keeps the function panic-free if the logic changes later
    line.get(start..end).filter(|s| !s.is_empty())
}

fn main() {
    let log = "USER:alice:logged_in";
    match extract_username(log) {
        Some(user) => println!("User: {user}"), // "User: alice"
        None => println!("Invalid log line"),
    }
}
```
The find() method returns byte offsets. It scans for the delimiter byte. Once you have the offsets, you use get() to extract the substring. The filter() call ensures the username isn't empty. This function handles variable-length usernames and returns None for malformed input.
Don't assume find() returns character indices. It returns byte offsets, and it only reports positions where a match begins, which are always character boundaries. That is why its results work directly with get() and with slicing.
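With a fixed delimiter like this, you can also let the standard library compute the offsets. The sketch below is an alternative to the find() version, not a replacement for it. str::split_once splits at the first match and returns the text on either side. Like extract_username, it does not check that the tag is "USER". extract_username_split is a hypothetical name.

```rust
/// Hypothetical alternative to extract_username: split_once does the
/// byte-offset arithmetic, so no manual index math is needed.
fn extract_username_split(line: &str) -> Option<&str> {
    // "USER:alice:logged_in" -> ("USER", "alice:logged_in")
    let (_tag, rest) = line.split_once(':')?;
    // "alice:logged_in" -> ("alice", "logged_in")
    let (user, _action) = rest.split_once(':')?;
    // Reject an empty username, as the find()-based version does
    Some(user).filter(|u| !u.is_empty())
}

fn main() {
    for line in ["USER:alice:logged_in", "USER:josé:logged_in", "USER::logged_in"] {
        println!("{line} -> {:?}", extract_username_split(line));
    }
}
```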
Pitfalls and performance traps
The len() method returns the byte length, not the character count. If you use len() in a character-based calculation, you will get wrong results. For "Hello, 世界", len() returns 13, but the string contains 9 characters. Treating len() as a character count leads to wrong slices and boundary panics.
```rust
fn main() {
    let text = "Hello, 世界";
    // len() returns bytes, not characters
    println!("Bytes: {}", text.len()); // 13
    println!("Chars: {}", text.chars().count()); // 9
    // This would panic: byte index 9 falls inside '世' (bytes 7..10)
    // let sub = &text[0..9]; // Panic: byte index 9 is not a char boundary
}
```
Performance trap: Calling char_indices().nth(i) for every character index i in a loop creates an O(N²) algorithm. Each call to nth(i) walks from the start of the string to position i, so the loop rescans the string over and over.
```rust
// BAD: O(N²) performance
fn print_chars_slow(text: &str) {
    for i in 0..text.chars().count() {
        // nth(i) walks from the start of the string on every iteration
        let byte = text.char_indices().nth(i).map(|(b, _)| b).unwrap();
        let ch = text.chars().nth(i).unwrap();
        println!("{byte}: {ch}");
    }
}

// GOOD: O(N) performance
fn print_chars_fast(text: &str) {
    // A single pass yields each byte offset together with its character
    for (byte, ch) in text.char_indices() {
        println!("{byte}: {ch}");
    }
}

fn main() {
    // Both functions print the same lines; only the amount of work differs
    print_chars_slow("Hello, 世界");
    print_chars_fast("Hello, 世界");
}
```
Use a single char_indices() loop when you need to process multiple characters. Cache the byte offsets if you need random access later. Don't count on the compiler to optimize away the repeated scans. Structure the loop so it never rescans the string.
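Caching the offsets can be as simple as collecting them into a Vec<usize> once. Appending text.len() as a final entry makes the end of the string addressable. After that, any character range maps to a byte range with two lookups. The helpers char_offsets and slice_chars below are a hypothetical sketch. They trade one usize of memory per character for constant-time character slicing.

```rust
/// Hypothetical helper: offsets[k] is the byte offset where character k starts;
/// the final entry is text.len(), so a range can end at the last character.
fn char_offsets(text: &str) -> Vec<usize> {
    let mut offsets: Vec<usize> = text.char_indices().map(|(i, _)| i).collect();
    offsets.push(text.len());
    offsets
}

/// Hypothetical helper: slices by character positions [start, end) using the
/// cached offsets. Returns None if either position is out of range or start > end.
fn slice_chars<'a>(text: &'a str, offsets: &[usize], start: usize, end: usize) -> Option<&'a str> {
    let from = *offsets.get(start)?;
    let to = *offsets.get(end)?;
    // Both offsets are character boundaries; get() returns None if from > to
    text.get(from..to)
}

fn main() {
    let text = "Hello, 世界";
    let offsets = char_offsets(text); // one pass over the string
    // Each lookup below is O(1): two Vec lookups and a boundary-checked get()
    println!("{:?}", slice_chars(text, &offsets, 7, 9)); // Some("世界")
    println!("{:?}", slice_chars(text, &offsets, 0, 5)); // Some("Hello")
    println!("{:?}", slice_chars(text, &offsets, 8, 10)); // None: only 9 characters
}
```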
Compiler error: If you try to slice a String and assign it to a String, you get E0308 (mismatched types). Slicing produces a &str, not a String. You must call .to_string() or .into() to convert the slice to an owned string.
```rust
fn main() {
    let text = String::from("Hello");
    // E0308: mismatched types
    // expected `String`, found `&str`
    // let sub: String = &text[0..5];
    // Correct: convert slice to owned String
    let sub: String = text[0..5].to_string();
    println!("{sub}");
}
```
Slicing is cheap. Converting to String allocates memory. Work with slices wherever you can, and convert to String only when you need ownership.
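Here is a minimal sketch of where ownership becomes necessary. It assumes each line is a temporary String built inside a loop; the line format and the names fields and usernames are hypothetical. The slices returned by fields borrow from line, which is dropped at the end of each iteration. Only the username that must outlive the loop is converted with to_string().

```rust
/// Hypothetical helper: borrows each colon-separated field from `line`; no allocation per field.
fn fields(line: &str) -> Vec<&str> {
    line.split(':').collect()
}

/// Hypothetical: builds each log line on the fly and keeps only the username.
fn usernames(count: usize) -> Vec<String> {
    let mut kept = Vec::new();
    for n in 0..count {
        // Temporary String, dropped at the end of this iteration
        let line = format!("USER:user{n}:logged_in");
        let parts = fields(&line);
        // Storing parts[1] itself in a Vec<&str> declared outside the loop would be
        // rejected by the borrow checker, because it borrows from `line`.
        // Converting to String copies just the piece we need to keep.
        kept.push(parts[1].to_string());
    }
    kept
}

fn main() {
    println!("{:?}", fields("USER:alice:logged_in"));
    println!("{:?}", usernames(3));
}
```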
Decision matrix
Use char_indices() when you need to convert a character count into a byte offset for slicing.
Use split_at(byte_offset) when you already have a valid byte offset and want to split the string into two parts without allocating.
Use get(start..end) when you have byte offsets that might be invalid and need to handle the failure gracefully without panicking.
Use chars() when you only need to iterate over characters and don't care about byte positions.
Use is_char_boundary(byte_offset) when you must validate a byte offset before performing a raw slice operation in a low-level context.
Use find() and rfind() when you need to locate delimiters by byte offset for structured parsing.
Use to_string() on a slice only when you need an owned String. Slices are &str and don't allocate.