You can iterate over the characters of a Rust string using the .chars() method, which yields char items representing Unicode scalar values, or .bytes() for raw byte access. Use .chars() when you need to handle multi-byte characters correctly, as Rust strings are UTF-8 encoded and a single character can span multiple bytes.
Here is a practical example showing the difference between iterating by character versus by byte:
```rust
fn main() {
    let s = "Hello, 世界";

    // Iterate over Unicode characters (recommended for text processing)
    println!("Characters:");
    for c in s.chars() {
        println!("'{}' (len: {} bytes)", c, c.len_utf8());
    }

    // Iterate over raw bytes (no decoding, but each multi-byte char shows up as several separate bytes)
    println!("\nBytes:");
    for b in s.bytes() {
        println!("0x{:02x}", b);
    }
}
```
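The same distinction shows up when you count. Below is a small sketch comparing the byte length reported by `str::len` with the number of `char`s yielded by `.chars()`; the helper name `count_both` is hypothetical and only used for illustration.

```rust
/// Hypothetical helper: returns (byte length, number of chars) for a string slice.
fn count_both(s: &str) -> (usize, usize) {
    // `len()` is the UTF-8 byte length, the same value as `s.bytes().count()`.
    let bytes = s.len();
    // `chars().count()` decodes the string and counts Unicode scalar values.
    let chars = s.chars().count();
    (bytes, chars)
}

fn main() {
    let s = "Hello, 世界";
    let (bytes, chars) = count_both(s);
    // "Hello, " is 7 ASCII bytes; each of the two CJK characters takes 3 bytes.
    println!("{} bytes, {} chars", bytes, chars); // prints "13 bytes, 9 chars"
}
```

Note that `.chars().count()` is O(n), because it has to walk the whole string, whereas `.len()` is O(1).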
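If you also need each character's position, use `.char_indices()`, which yields `(usize, char)` pairs. The index is the byte offset where the character starts, not its position counted in characters. In UTF-8 the two diverge as soon as a multi-byte character appears, and the byte offset is the one you need for slicing.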
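```rust
fn main() {
    let s = "Rust 🦀";
    // `i` is a byte offset: "Rust " is 5 bytes, so the crab starts at byte 5 (and occupies 4 bytes).
    for (i, c) in s.char_indices() {
        println!("Index {}: '{}'", i, c);
    }
}
```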
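To see the byte offset and the character position drift apart, you can pair `.char_indices()` with `.enumerate()`. This is a minimal sketch; the helper name `offsets` is hypothetical, and the input uses a precomposed `é` (U+00E9, 2 bytes in UTF-8) written as an escape so the encoding is unambiguous.

```rust
/// Hypothetical helper: returns (char position, byte offset, char) for every char in `s`.
fn offsets(s: &str) -> Vec<(usize, usize, char)> {
    s.char_indices()
        .enumerate() // counts chars: 0, 1, 2, ...
        .map(|(pos, (byte_idx, c))| (pos, byte_idx, c))
        .collect()
}

fn main() {
    // "héllo" with a precomposed é, which takes 2 bytes in UTF-8.
    let s = "h\u{e9}llo";
    for (pos, byte_idx, c) in offsets(s) {
        // From the first 'l' onward the byte offset is one ahead of the char position.
        println!("char #{} '{}' starts at byte {}", pos, c, byte_idx);
    }
}
```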
Key Takeaways:
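- Use `.chars()` for almost all text manipulation tasks, so that accented letters, CJK text, emoji, and other multi-byte characters are decoded correctly. Be aware that a `char` is a Unicode scalar value, not a user-perceived character: an `e` followed by a combining accent, or an emoji built from several code points (such as a flag), comes out as more than one `char`. For those cases you need grapheme-cluster segmentation, for example from the `unicode-segmentation` crate.
- Use `.bytes()` when you genuinely need byte-level access (binary protocols, hashing, scanning for ASCII delimiters) or want to skip UTF-8 decoding for speed. Treating each byte as a character is only correct when the string is known to be pure ASCII.
- Use `.char_indices()` when you need to slice the string or track positions. Slicing with `&s[a..b]` panics if `a` or `b` falls inside a multi-byte character, while the offsets from `.char_indices()` are always valid char boundaries.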
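As a sketch of the last point, here is one way to take the first `n` characters of a string without risking a panic. The helper name `first_n_chars` is hypothetical; the approach uses `.char_indices().nth(n)` to find the byte offset where the character after the first `n` begins, and `str::is_char_boundary` to show why a naive byte slice would fail.

```rust
/// Hypothetical helper: returns the first `n` chars of `s` as a borrowed slice.
/// `char_indices().nth(n)` yields the byte offset where the (n+1)-th char starts,
/// which is always a char boundary, so slicing there cannot panic.
fn first_n_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // `s` has n or fewer chars: return it whole
    }
}

fn main() {
    let s = "Hello, 世界";
    println!("{}", first_n_chars(s, 8)); // prints "Hello, 世"

    // A naive byte slice such as &s[..8] would panic, because byte 8 lies
    // inside the 3-byte encoding of '世' (bytes 7, 8 and 9).
    assert!(!s.is_char_boundary(8));
}
```

Because the result borrows from `s`, no allocation happens. If you need an owned `String`, `s.chars().take(n).collect::<String>()` is a simpler alternative.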