How to Iterate Over Characters in a Rust String

You can iterate over the characters of a Rust string using the `.chars()` method, which yields `char` items representing Unicode scalar values, or `.bytes()` for raw byte access.

Use `.chars()` when you need to handle multi-byte characters correctly, as Rust strings are UTF-8 encoded and a single character can span multiple bytes.
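To make the byte-versus-character distinction concrete, here is a minimal sketch that compares a string's byte length with its `char` count. The helper name `byte_and_char_counts` is made up for illustration; `str::len` and `Chars::count` are the standard library calls doing the work.

```rust
/// Returns (length in bytes, number of `char`s) for a string slice.
/// Hypothetical helper, used only for this illustration.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // `len()` counts UTF-8 bytes; `chars().count()` counts Unicode scalar values.
    (s.len(), s.chars().count())
}

fn main() {
    // "h\u{e9}llo" is "héllo" with a precomposed é (2 bytes in UTF-8).
    for s in ["Hello", "h\u{e9}llo", "世界"] {
        let (bytes, chars) = byte_and_char_counts(s);
        println!("{:?}: {} bytes, {} chars", s, bytes, chars);
    }
}
```

For pure ASCII input the two numbers agree; they diverge as soon as any character needs more than one byte.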

Here is a practical example showing the difference between iterating by character versus by byte:

fn main() {
    let s = "Hello, 世界";

    // Iterate over Unicode characters (recommended for text processing)
    println!("Characters:");
    for c in s.chars() {
        println!("'{}' (len: {} bytes)", c, c.len_utf8());
    }

    // Iterate over raw bytes (a multi-byte character shows up as several separate bytes)
    println!("\nBytes:");
    for b in s.bytes() {
        println!("0x{:02x}", b);
    }
}

If you need each character's position as well, use `.char_indices()`. It yields the byte offset at which each character starts, not the character's ordinal position. The two diverge as soon as the string contains a multi-byte character.

fn main() {
    let s = "Rust 🦀";

    // `i` is the byte offset where `c` starts, not a character count
    for (i, c) in s.char_indices() {
        println!("Byte index {}: '{}'", i, c);
    }
}
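Because the offsets from `.char_indices()` always fall on character boundaries, they are safe to use for slicing. The following is a small sketch of that idea, not the only way to do it; `first_n_chars` is a hypothetical helper name.

```rust
/// Returns the prefix of `s` that contains at most `n` chars.
/// Hypothetical helper, used only for this illustration.
fn first_n_chars(s: &str, n: usize) -> &str {
    // The n-th item (0-based) of `char_indices()` starts exactly where
    // the first n chars end, so its byte offset is a valid slice bound.
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than n + 1 chars: the whole string is the prefix.
        None => s,
    }
}

fn main() {
    println!("{:?}", first_n_chars("世界 hello", 1));
    println!("{:?}", first_n_chars("Rust 🦀!", 6));
}
```

Slicing with a hand-computed byte count instead (for example `&s[..1]` on a string that starts with a three-byte character) would panic at runtime.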

Key Takeaways:

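  1. Use `.chars()` for most text manipulation so that accented characters, emoji, and non-Latin scripts are not split in the middle of their encoding. Keep in mind that a `char` is a Unicode scalar value, so a user-perceived character built from several scalar values (for example, a letter followed by a combining accent) still yields more than one `char`.
  2. Use `.bytes()` only for low-level, byte-oriented work, or when performance matters and you are certain the string contains only ASCII.
  3. Use `.char_indices()` when you need to slice the string or track positions. The offsets it yields always fall on character boundaries, whereas slicing at a byte index inside a multi-byte character panics.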
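The second and third points can be sketched in code as follows. Both function names are hypothetical; the standard library calls used are `str::is_ascii`, `str::bytes`, and `str::get`, the last of which returns `None` rather than panicking when a range does not fall on character boundaries.

```rust
/// Counts ASCII vowels by scanning bytes, but only if the input is pure ASCII.
/// Hypothetical helper, used only for this illustration.
fn count_ascii_vowels(s: &str) -> Option<usize> {
    // Guard first: byte-wise logic is only meaningful for ASCII text here.
    if !s.is_ascii() {
        return None;
    }
    let count = s
        .bytes()
        .filter(|b| matches!(b.to_ascii_lowercase(), b'a' | b'e' | b'i' | b'o' | b'u'))
        .count();
    Some(count)
}

/// Checked prefix: `None` if `end` is out of range or splits a character.
/// Hypothetical helper, used only for this illustration.
fn checked_prefix(s: &str, end: usize) -> Option<&str> {
    s.get(..end)
}

fn main() {
    println!("{:?}", count_ascii_vowels("Hello, World"));
    println!("{:?}", count_ascii_vowels("Hello, 世界"));
    println!("{:?}", checked_prefix("世界", 1));
    println!("{:?}", checked_prefix("世界", 3));
}
```

Returning `Option` in both helpers is a design choice for this sketch: it pushes the "is this input safe for byte-level handling?" question to the caller instead of risking a panic or a silently wrong count.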