How to use the regex crate for regular expressions in Rust

Parsing text without the pain

You're building a CLI tool to analyze server logs. You need to extract IP addresses, timestamps, and error codes from thousands of lines. You reach for a regular expression. In JavaScript or Python, you'd write a pattern string, pass it to a function, and hope the syntax is correct. Rust gives you the regex crate, which follows a similar workflow but enforces a discipline that prevents runtime crashes and performance traps. The crate forces you to handle invalid patterns upfront. It also guarantees that your regex will never hang your program, regardless of how malicious the input data is. This combination of safety and performance makes regex the standard choice for text processing in Rust.

Pre-compilation is the key

Rust's standard library does not include regular expressions. You must add the regex crate to your project. The core concept is pre-compilation. When you create a Regex, you are not just storing a string. You are compiling that string into a finite state machine. Do this once per pattern. After that, each match only has to run the compiled machine over the input, which is where the speed comes from.

Think of it like a chef reading a recipe. Regex::new is the chef studying the instructions, prepping the ingredients, and setting up the station. If the recipe says "add three cups of flour and then subtract five," the chef stops you immediately. You find out the recipe is broken before you start baking, not halfway through. The regex crate validates your pattern at the moment you create the Regex object. If the pattern is invalid, Regex::new returns an error. You cannot accidentally create a regex that fails silently or panics later.

This design also eliminates catastrophic backtracking. Backtracking regex engines can take exponential time on certain patterns and inputs, which in practice looks like a hang. The regex crate guarantees worst-case matching time that is linear in the length of the input for a given pattern. If the pattern matches, you get the result. If it does not, you get a failure. There is no exponential slowdown. In exchange, you give up a few features of Perl-compatible regex that cannot be implemented efficiently, notably backreferences and look-around.

Minimal example

Add regex to your Cargo.toml dependencies. Then compile a pattern and use it to check strings.

use regex::Regex;

fn main() {
    // Compile the pattern. Regex::new returns a Result because patterns can be syntactically invalid.
    // expect() panics with a message if compilation fails. This is safe for CLI tools where a bad pattern is a developer error.
    let re = Regex::new(r"^\d{3}-\d{4}$").expect("Invalid phone pattern");

    let valid = "555-0199";
    let invalid = "555-019";

    // is_match returns true if the pattern matches anywhere in the input.
    // Here the ^ and $ anchors force the pattern to cover the whole string.
    println!("{}: {}", valid, re.is_match(valid));
    println!("{}: {}", invalid, re.is_match(invalid));
}

Convention aside: Always use raw strings r"..." for regex patterns. In a normal string literal the backslash starts an escape sequence, so you have to write \\d to get \d (a bare "\d" does not even compile). Raw strings keep the backslash as written, making patterns readable. Writing r"\d+" is much cleaner than "\\d+".
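A quick way to convince yourself is to compare the two spellings directly. This is a minimal sketch that uses only the standard library; the variable names are illustrative.

```rust
fn main() {
    // A raw string keeps backslashes exactly as written.
    let raw = r"\d+\.\d+";
    // The same pattern in a normal literal needs every backslash doubled.
    let escaped = "\\d+\\.\\d+";
    assert_eq!(raw, escaped);

    // r#"..."# lets a pattern contain double quotes without escaping them.
    let quoted = r#""(\w+)""#;
    assert_eq!(quoted, "\"(\\w+)\"");
}
```

Both assertions pass, so the two forms produce the same pattern text. The raw form is simply easier to read and harder to get wrong.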

Compile the regex once. Never recreate it inside a loop.

What happens under the hood

When you call Regex::new, the crate parses your pattern, validates syntax, and builds an internal state machine. This allocation happens on the heap. The resulting Regex object is immutable and thread-safe. It implements Send and Sync, which means you can share it across threads without locks.

The heavy lifting is done upfront. Calling .is_match() or .captures() just runs the pre-built machine against your text. This separation is why regex in Rust feels slightly more verbose than in dynamic languages but runs significantly faster in tight loops.

Cloning a Regex is cheap. re.clone() does not copy the state machine. It clones a pointer to the shared machine. This makes passing regexes around efficient. You can hand a Regex to multiple threads, and they will all share the same compiled data.

Convention aside: In production libraries, avoid unwrap() or expect() on Regex::new. Return the Result to the caller. In CLI tools or tests, expect with a descriptive message is acceptable because a bad pattern indicates a bug in your code, not a runtime condition.

Trust the linear time guarantee. The regex crate will not hang.

Extracting data with captures

Checking for a match is useful, but often you need to extract parts of the text. Use captures to find groups. Named groups make your code robust against pattern changes.

use regex::Regex;

fn extract_emails(text: &str) -> Vec<String> {
    // Pattern to find simple email addresses.
    // Named groups make extraction cleaner than index-based access.
    // The name "email" lets us retrieve the match by label, not position.
    let re = Regex::new(r"(?P<email>[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})").unwrap();

    let mut results = Vec::new();

    // captures_iter returns an iterator over all non-overlapping matches.
    // This is efficient for processing large texts without allocating intermediate strings.
    for cap in re.captures_iter(text) {
        // Get the named group. unwrap() is safe here because the pattern guarantees the group exists.
        // If the group were optional, you would need to handle None.
        let email = cap.name("email").unwrap().as_str();
        results.push(email.to_string());
    }

    results
}

Convention aside: Use named groups whenever you extract data. Index-based access like cap.get(1) breaks if you reorder groups or add new ones. Named groups like cap.name("email") self-document and survive refactoring.

Use named groups. Index-based access is fragile.

Transforming text with replace

The regex crate shines when you need to replace text. replace_all replaces every match. You can pass a string, or a closure that computes the replacement dynamically.

use regex::Regex;

fn redact_emails(text: &str) -> String {
    let re = Regex::new(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}").unwrap();

    // replace_all accepts a closure that receives the captures for each match.
    // This allows complex transformations based on the matched text.
    let redacted = re.replace_all(text, |cap: &regex::Captures| {
        let email = cap.get(0).unwrap().as_str();
        // Extract the username part before the @ symbol.
        let username = &email[..email.find('@').unwrap()];
        format!("[{}@...]", username)
    });

    redacted.into_owned()
}

The closure receives a Captures object. You can inspect the match and return a custom string. replace_all returns a Cow<str>. If no replacements were made, it returns a borrowed reference to the original string, avoiding allocation. If replacements occurred, it returns an owned String. Call .into_owned() if you need a String.

Convention aside: Prefer replace_all over replace. replace only changes the first match, which is rarely what you want. If you need the first match, use replace, but be explicit about the intent.

Reach for closures in replace_all when the replacement depends on the match content.

Pitfalls and compiler errors

Recreating a regex inside a loop destroys performance. Each call to Regex::new pays the compilation cost. If you process a million lines, you compile the pattern a million times. Move the Regex creation outside the loop. Cache it at module scope if it is used globally.

If you try to pass a Vec<u8> to is_match, the compiler rejects it with E0308 (mismatched types), because the method takes a &str. Regex operates on UTF-8 strings. You must convert bytes to a string first, or use the regex::bytes module for binary data.

use regex::Regex;

fn main() {
    let re = Regex::new(r"hello").unwrap();
    let data: Vec<u8> = vec![104, 101, 108, 108, 111]; // "hello" in bytes

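    // This fails with E0308 (mismatched types). is_match takes &str, not &Vec<u8>.
    // re.is_match(&data); // Error!

    // Convert to a string slice. from_utf8 returns an Err on invalid UTF-8;
    // unwrap() is fine for this known-good input, but handle the Err for real data.
    let text = std::str::from_utf8(&data).unwrap();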
    println!("{}", re.is_match(text));
}

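When you have many patterns to check, running a separate Regex for each one means scanning the text once per pattern. Use RegexSet to check multiple patterns in a single pass. RegexSet builds a combined machine that identifies which patterns match without redundant work. It reports only which patterns matched, not where; if you need positions or capture groups, run the individual Regex for the patterns that hit.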

use regex::RegexSet;

fn classify_token(token: &str) -> &'static str {
    // RegexSet compiles multiple patterns into a single machine.
    // It is built on every call here for brevity; in real code, build the set once and reuse it.
    let set = RegexSet::new(&[
        r"^\d+$",       // digits
        r"^[a-z]+$",    // lowercase
        r"^[A-Z]+$",    // uppercase
    ]).unwrap();

    // matches returns a set of indices for patterns that matched.
    let matches = set.matches(token);

    if matches.matched(0) {
        "DIGIT"
    } else if matches.matched(1) {
        "LOWER"
    } else if matches.matched(2) {
        "UPPER"
    } else {
        "OTHER"
    }
}

Convention aside: Cache global regexes using std::sync::OnceLock. It is in the standard library and thread-safe. Define the regex at module scope and initialize it lazily. This ensures compilation happens exactly once, on first access.

If you check ten patterns, use RegexSet.

Decision matrix

Use the regex crate when you need pattern matching beyond simple substring checks, such as extracting groups or replacing text based on complex patterns.

Use str::contains or str::starts_with for simple literal checks; regex adds overhead you do not need.

Use a parser combinator library like nom when you are parsing structured data with nested rules; regex struggles with recursion and context.

Use once_cell::sync::Lazy or std::sync::OnceLock to cache a Regex instance at module scope when the pattern is constant and used across the application.

Use regex::bytes when you are processing binary data that may not be valid UTF-8.

Use RegexSet when you need to check multiple patterns against the same text; it combines them into a single scan.

Where to go next

The regex crate lets you find, extract, and replace text patterns from Rust. You compile a pattern once, then reuse it to check whether text matches, pull out groups, or rewrite parts of a string. From here, the crate's own documentation covers the full pattern syntax and the rest of the API, and the log-analysis tool from the introduction is a good first project to try it on.