You cannot index a String by character position. Range slices such as &s[a..b] are allowed, but a and b are byte offsets into the UTF-8 data, not character counts, and slicing at an offset that is not a character boundary panics at runtime. To slice by character count, first convert the character position into a byte offset with char_indices(); if you already have a valid byte offset, you can slice directly or use split_at.
Here is the safe, idiomatic way to extract a substring by character count using char_indices():
fn main() {
    let text = "Hello, 世界"; // Contains multi-byte characters

    // Find the byte offset of the character at position 7 ('世').
    // If the string has fewer characters, fall back to the end of the string.
    let byte_index = text
        .char_indices()
        .nth(7)
        .map(|(i, _)| i)
        .unwrap_or(text.len());

    // Safe slicing: byte_index always lies on a character boundary
    let substring = &text[byte_index..];
    println!("Substring: {}", substring); // Prints: Substring: 世界
}
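If you need this in more than one place, the lookup can be wrapped in a small helper. The following is only a sketch: substring_by_chars is a hypothetical name, not a standard-library function, and it assumes that "character" means a Rust char (a Unicode scalar value) rather than a user-perceived grapheme. Out-of-range positions are clamped to the end of the string instead of panicking.

```rust
/// Hypothetical helper: returns the slice covering `len` characters
/// starting at character position `start`. Positions past the end of
/// the string are clamped, so this never panics.
fn substring_by_chars(s: &str, start: usize, len: usize) -> &str {
    // Byte offset of the n-th char, or s.len() if the string is shorter.
    let byte_offset = |n: usize| s.char_indices().nth(n).map_or(s.len(), |(i, _)| i);

    let begin = byte_offset(start);
    let end = byte_offset(start.saturating_add(len));
    &s[begin..end]
}

fn main() {
    let text = "Hello, 世界";
    println!("{}", substring_by_chars(text, 7, 2)); // 世界
    println!("{}", substring_by_chars(text, 0, 5)); // Hello
    println!("{:?}", substring_by_chars(text, 20, 3)); // ""
}
```

Each call walks the string from the start, so for repeated slicing of a long string it may be cheaper to collect the char_indices() offsets once and reuse them.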
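If you already know the exact byte offset (for example, from a previous calculation or a fixed-width encoding context), you can use split_at, which returns both halves as a tuple. Note that split_at is no safer than direct indexing: it does not return an Option, and it panics if the offset is past the end of the string or not on a UTF-8 character boundary. If the offset might be invalid, check it with is_char_boundary first, or use str::get, which returns None instead of panicking. With a known-good offset, split_at looks like this: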
fn main() {
    let text = "Rust is great";

    // Split at byte index 4 (after "Rust")
    // This will panic if 4 is not a valid UTF-8 boundary
    let (prefix, suffix) = text.split_at(4);
    println!("Prefix: {}", prefix); // "Rust"
    println!("Suffix: {}", suffix); // " is great"

    // To get just the substring without the prefix:
    let sub = &text[4..];
    println!("Direct slice: {}", sub);
}
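When the offset comes from somewhere you do not control, a non-panicking variant may be preferable. The sketch below shows two standard-library tools for that, str::get and str::is_char_boundary. The safe_split wrapper is a hypothetical helper written for illustration; recent toolchains also provide str::split_at_checked, which serves the same purpose.

```rust
/// Hypothetical helper: like split_at, but returns None instead of
/// panicking when `offset` is out of range or inside a character.
fn safe_split(s: &str, offset: usize) -> Option<(&str, &str)> {
    // is_char_boundary is false for offsets past the end of the string.
    if s.is_char_boundary(offset) {
        Some(s.split_at(offset))
    } else {
        None
    }
}

fn main() {
    let text = "Hello, 世界";

    // Byte 7 is where '世' starts, so the range is valid.
    println!("{:?}", text.get(7..)); // Some("世界")

    // Byte 8 falls inside the 3-byte encoding of '世'.
    println!("{:?}", text.get(8..)); // None

    // Ranges that run past the end also yield None.
    println!("{:?}", text.get(..100)); // None

    println!("{:?}", safe_split(text, 7)); // Some(("Hello, ", "世界"))
    println!("{:?}", safe_split(text, 8)); // None
}
```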
Remember that String indices are byte offsets, not character counts. If you iterate over a string with .chars(), you are counting characters, but the underlying String data is stored as bytes. Slicing &str requires that the start and end indices fall exactly on the boundary of a UTF-8 character. If you slice in the middle of a multi-byte character (like the Chinese characters in the first example), Rust will panic to prevent creating invalid UTF-8 data. Always calculate byte offsets carefully when working with user input or variable-length text.
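To make the byte/character distinction concrete, here is a small sketch that compares the two counts and lists the byte offsets at which a string can legally be sliced. char_boundaries is a hypothetical helper introduced for this illustration only.

```rust
/// Hypothetical helper: every byte offset at which `s` may be sliced.
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}

fn main() {
    let text = "a世界"; // 'a' is 1 byte, each Chinese character is 3 bytes

    println!("bytes: {}", text.len()); // 7
    println!("chars: {}", text.chars().count()); // 3
    println!("valid slice offsets: {:?}", char_boundaries(text)); // [0, 1, 4, 7]
}
```

Any offset not in that list (2, 3, 5, or 6 here) would cause &text[offset..] to panic.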