You're cleaning up some user input. Maybe it's email addresses, maybe it's tag names, maybe it's the answer to a yes/no prompt. Whatever it is, you want it normalized: everything in one case so comparisons stop tripping over "YES" versus "yes" versus "Yes". In most languages this is a one-liner. Rust is no exception, but the moment you scratch the surface you find Unicode lurking beneath, and a couple of details that quietly matter.
The short version: to_uppercase() and to_lowercase() on a &str give you a new String with every character converted to the requested case. The slightly longer version explains why those methods return owned strings instead of mutating in place, why they sometimes produce strings of a different length than the input, and what to do when ASCII would actually be enough.
The basic moves
let text = "Hello, World!";
// to_uppercase / to_lowercase return a new String. The original &str
// is untouched, because string slices are immutable views.
let upper = text.to_uppercase(); // "HELLO, WORLD!"
let lower = text.to_lowercase(); // "hello, world!"
println!("{} | {} | {}", text, upper, lower);
A few facts worth pinning down. First, text here is a &str, a borrowed view into a string somewhere. You can't change it because string slices are immutable. The case-conversion methods produce a fresh String that owns its bytes. The original is still there, still readable, still pointing at the same bytes it always did.
Second, this works on both &str and String, because String derefs to str automatically, so every method on string slices is available on a String too. String::from("hi").to_uppercase() is fine, and so is "hi".to_uppercase().
Third, the result is allocated on the heap. to_uppercase cannot reuse the source bytes, because, surprise, the upper and lower case versions of a string can have different byte lengths. We'll get to why in a moment.
What "uppercase" actually means
Here's where Rust gets interesting. Strings in Rust are UTF-8 encoded Unicode. That means "straße" (German for "street") is a perfectly valid string, and so is "İstanbul" with its dotted capital I, and so is "日本語". When you ask for the uppercase form, Rust consults the Unicode database, which has rules for every character in every script.
Most of the time the rules are boring: 'a' becomes 'A'. But sometimes:
// German sharp s uppercases to TWO characters: SS
let german = "straße";
println!("{}", german.to_uppercase()); // "STRASSE"
// The Turkish dotted capital İ lowercases to 'i' followed by a combining dot above (U+0307)
let turkish = "İSTANBUL";
println!("{}", turkish.to_lowercase()); // "i̇stanbul"
// Greek final sigma is contextual: Σ at the end of a word lowercases to ς, not σ
let greek = "ΟΔΥΣΣΕΥΣ";
println!("{}", greek.to_lowercase()); // "οδυσσευς"
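The first example is the classic surprise. The German "ß" lowercases as itself, but when you uppercase it, you get two letters: SS. So "straße".to_uppercase() has seven characters where the original had six. (The byte length happens to stay the same here, because "ß" occupies two bytes in UTF-8 and so does "SS"; other mappings, such as "İ" lowercasing to "i" plus a combining dot, do change the byte count.) This is exactly why the methods return a brand new String and not, say, an in-place mutation: the result might not fit in the same memory.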
You don't need to memorize the rules. You just need to know that "uppercase" in Unicode is not always character-for-character.
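To make the length change concrete, here is a small sketch that reports byte length and character count before and after conversion. The helper name sizes is made up for this illustration and is not part of the standard library.

```rust
/// Hypothetical helper: returns (byte length, char count) of a string.
fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // ß -> SS: one char becomes two, but both forms take two bytes in UTF-8.
    let upper = "straße".to_uppercase();
    println!("{:?} -> {:?}", sizes("straße"), sizes(&upper)); // (7, 6) -> (7, 7)

    // İ -> i + U+0307: a two-byte char becomes two chars totalling three bytes.
    let lower = "İ".to_lowercase();
    println!("{:?} -> {:?}", sizes("İ"), sizes(&lower)); // (2, 1) -> (3, 2)
}
```

Either count can move, which is why neither is safe to assume unchanged after a Unicode case conversion.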
Iterating without allocating
If you only need to compare a single character at a time, or want to stream characters through some other process, the char type has its own to_uppercase() and to_lowercase(). They return iterators, because, as we saw, one input char can produce multiple output chars.
// Each char has to_uppercase / to_lowercase methods that yield an iterator
let c = 'ß';
let upper: String = c.to_uppercase().collect(); // "SS"
// to_ascii_uppercase / to_ascii_lowercase return a single char, ASCII only
let a = 'a';
let up = a.to_ascii_uppercase(); // 'A'
println!("{} -> {}", a, up);
If you write a loop and do this character-by-character, you're paying for the Unicode logic on every step. Usually that doesn't matter, but in tight loops over huge strings it can.
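One way to use the char-level iterators is to compare two strings by streaming their lowercase mappings, so neither side is collected into a String. This is a minimal sketch: eq_lowercased is a hypothetical helper name, and because it maps each char in isolation it does not apply the context-dependent final-sigma rule that str::to_lowercase applies.

```rust
/// Hypothetical helper: compares two strings by their per-char lowercase
/// mappings, without building an intermediate String for either side.
fn eq_lowercased(a: &str, b: &str) -> bool {
    a.chars()
        .flat_map(char::to_lowercase)
        .eq(b.chars().flat_map(char::to_lowercase))
}

fn main() {
    assert!(eq_lowercased("Hello", "hELLO"));
    assert!(eq_lowercased("ÜBER", "über"));
    // Lowercasing is not full case folding: ß stays ß, so these differ.
    assert!(!eq_lowercased("straße", "STRASSE"));
}
```

The last assertion is worth noticing: comparing lowercase forms treats "straße" and "STRASSE" as different, because only uppercasing expands "ß".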
ASCII fast path
Sometimes you're not dealing with international text at all. Maybe you're parsing HTTP headers (which the spec defines as ASCII case-insensitive), or normalizing a hex string, or processing log lines that are all English. In those cases, you don't need the Unicode machine. ASCII has its own much faster methods:
let header = "Content-Type";
// Allocates a new String, but uses simple ASCII-only logic. Fast.
let lower = header.to_ascii_lowercase(); // "content-type"
// In-place mutation; works on String/&mut str. No allocation at all.
let mut s = String::from("Hello");
s.make_ascii_lowercase(); // mutates in place
println!("{}", s); // "hello"
// Comparison helper that ignores ASCII case without allocating
assert!("Content-Type".eq_ignore_ascii_case("CONTENT-TYPE"));
Three rules of thumb. If you're 100% sure the data is ASCII, prefer the _ascii_ variants. They're faster, they don't allocate when you use the in-place forms, and there's no chance of a Turkish-I surprise hiding in your tests. If the data could be any Unicode, use the regular methods and accept the allocation. If you're just comparing ASCII data for equality, use eq_ignore_ascii_case, which saves you from building a normalized form just to throw it away. For full Unicode, compare a.to_lowercase() == b.to_lowercase() and accept the two temporary allocations.
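As a sketch of how these rules might look in practice, the two helpers below cover the ASCII-only comparison and the Unicode-aware one. Both names (is_yes and same_lowercase) are hypothetical and chosen only for this example.

```rust
/// Hypothetical helper for a yes/no prompt: ASCII-only, no allocation.
fn is_yes(answer: &str) -> bool {
    let answer = answer.trim();
    answer.eq_ignore_ascii_case("y") || answer.eq_ignore_ascii_case("yes")
}

/// Hypothetical helper for text that may contain any Unicode: allocates two
/// temporary Strings and compares their lowercase forms.
fn same_lowercase(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}

fn main() {
    assert!(is_yes("YES"));
    assert!(is_yes("  y\n"));
    assert!(!is_yes("no"));
    assert!(same_lowercase("ÉCOLE", "école"));
}
```

The ASCII version never touches the heap, while the Unicode version trades two short-lived allocations for correctness on non-ASCII letters.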
A more realistic example
Let's say you're writing a tiny URL slug generator: take a title like "Rust: A Brief Overview!" and produce "rust-a-brief-overview". Lowercasing is one of the steps.
// Convert a title to a URL-safe slug.
fn slugify(title: &str) -> String {
    title
        .to_lowercase() // Unicode-aware lowercase
        .chars()
        .map(|c| {
            // Keep ASCII letters/digits, replace everything else with '-'
            if c.is_ascii_alphanumeric() {
                c
            } else {
                '-'
            }
        })
        .collect::<String>()
        // Squash runs of '-' into single '-' and trim leading/trailing '-'
        .split('-')
        .filter(|piece| !piece.is_empty())
        .collect::<Vec<_>>()
        .join("-")
}

fn main() {
    println!("{}", slugify("Rust: A Brief Overview!"));
    // "rust-a-brief-overview"
    println!("{}", slugify("Über die Straße"));
    // "ber-die-stra-e" (ü and ß are not ASCII, so each is replaced with '-')
}
Notice we used to_lowercase (Unicode-aware) and then is_ascii_alphanumeric to replace anything that isn't safe for a URL. Note what this does not do: "ß" is already a lowercase letter, so to_lowercase leaves it alone and the ASCII filter then turns it into '-', which is why "Straße" comes out as "stra-e". Only to_uppercase expands "ß" into "SS". The ordering still matters for the few characters whose lowercase form is ASCII: the Kelvin sign "K" (U+212A) lowercases to a plain "k", so lowercasing first keeps it, while filtering first would replace it with '-'. If you want "strasse", you need an explicit transliteration step before the filter.
Pitfalls
Trying to compare without normalizing. "Hello" == "HELLO" is false. Always. If you want a case-insensitive comparison, either normalize both sides first, or use eq_ignore_ascii_case for ASCII data.
Locale weirdness. Rust's case methods follow the Unicode default casing rules, which are locale-independent. They don't care that you're in Turkey, where "I".to_lowercase() should arguably yield a dotless "ı". If you need locale-aware casing, you're outside the standard library and into crates like icu. For the vast majority of code, the default is what you want.
Forgetting that case conversion allocates. Every call to to_uppercase/to_lowercase produces a fresh String. If you do this millions of times in a hot loop, that's millions of small allocations. Switch to ASCII variants if you can, or build a single buffer once and reuse it.
Unicode strings can change length. As discussed, to_uppercase may return a longer string than its input. If your code assumes "uppercasing keeps the byte length the same," it's wrong on Unicode and right only on ASCII. Don't index into the result by source byte offsets.
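The buffer-reuse advice above could look something like the following sketch. The helper lowercase_into is a hypothetical name. It maps each char individually, so unlike str::to_lowercase it does not apply the final-sigma rule; treat it as an illustration under that assumption, not as a drop-in replacement.

```rust
/// Hypothetical helper: writes the lowercase form of `src` into `buf`,
/// reusing whatever capacity `buf` already has.
fn lowercase_into(src: &str, buf: &mut String) {
    buf.clear(); // keeps the allocation, drops the old contents
    buf.extend(src.chars().flat_map(char::to_lowercase));
}

fn main() {
    let mut buf = String::new();
    for line in ["Content-Type", "ACCEPT", "Über"] {
        lowercase_into(line, &mut buf);
        println!("{}", buf);
    }
}
```

After the first few iterations the buffer usually has enough capacity, so later calls tend not to allocate at all.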
The compiler error you'll see if you accidentally try to mutate a &str:
error[E0596]: cannot borrow `*text` as mutable, as it is behind a `&` reference
 --> src/main.rs:3:5
  |
3 |     text.make_ascii_lowercase();
  |     ^^^^ `text` is a `&` reference, so the data it refers to cannot be borrowed as mutable
The fix is to either own the string (let mut s = text.to_string()) or accept a &mut str/&mut String parameter.
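Both fixes can be sketched in a few lines. The function name lowercase_in_place is hypothetical and used only for illustration.

```rust
/// Hypothetical helper: lowercases ASCII letters in place through a
/// mutable borrow.
fn lowercase_in_place(s: &mut str) {
    s.make_ascii_lowercase();
}

fn main() {
    let text = "Hello"; // &str: cannot be mutated

    // Option 1: take ownership of a copy, then mutate the copy.
    let mut owned = text.to_string();
    owned.make_ascii_lowercase();
    assert_eq!(owned, "hello");
    assert_eq!(text, "Hello"); // the original is unchanged

    // Option 2: pass a mutable borrow; &mut String coerces to &mut str.
    let mut other = String::from("WORLD");
    lowercase_in_place(&mut other);
    assert_eq!(other, "world");
}
```

Taking &mut str instead of &mut String keeps the helper usable with any mutable string slice, not just an owned String.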