How to check if a string contains a substring

The substring check

You're building a CLI tool that reads configuration flags. The user types --verbose, and you need to know if that flag exists in the input string before you do anything else. Or maybe you're filtering a list of filenames and only want the ones ending in .rs. In Python, you'd write "verbose" in args. In JavaScript, args.includes("verbose"). Rust gives you something similar, but it comes with a few details about how strings are stored that change how you think about the search.

The method you reach for is contains. It lives on string slices, &str, and returns a bool. The twist is that Rust strings are UTF-8 bytes, not arrays of characters. contains searches for the byte sequence that is the UTF-8 encoding of the substring, so it works correctly with emojis, accented characters, and CJK text without extra work on your part. Operating on bytes also keeps the search fast. contains itself only answers yes or no; related methods such as find report positions, and those positions are byte offsets, not character counts.

What contains actually does

contains is a method on &str. It takes a pattern and returns true if the pattern appears anywhere in the string. The signature looks like this:

fn contains<P: Pattern>(&self, pat: P) -> bool

The P: Pattern part is the key. Pattern is a trait that allows contains to accept different kinds of search targets. You can pass a &str, a char, or even a closure that tests each character. This flexibility is built into the standard library, so you don't need separate methods for strings, characters, and predicates.
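
As a minimal sketch of that flexibility, the snippet below passes each of the three pattern kinds to contains. The helper name pattern_checks and the sample log line are made up for illustration.

```rust
/// Hypothetical helper: runs one check per pattern kind and returns
/// the results as (by_str, by_char, by_closure).
fn pattern_checks(text: &str) -> (bool, bool, bool) {
    // A `&str` pattern: matches the whole substring.
    let by_str = text.contains("disk");
    // A `char` pattern: matches a single character.
    let by_char = text.contains(':');
    // A closure pattern: matches the first character for which it returns true.
    let by_closure = text.contains(|c: char| c.is_ascii_digit());
    (by_str, by_char, by_closure)
}

fn main() {
    let checks = pattern_checks("2024-05-20 ERROR: disk full");
    println!("{:?}", checks); // (true, true, true)
}
```

All three calls go through the same contains method; only the argument type changes.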

When you call text.contains("sub"), the compiler resolves the Pattern implementation for &str. At runtime, contains scans the bytes of text. The exact algorithm is an implementation detail of the standard library: at the time of writing, substring patterns use the Two-Way string-matching algorithm, and char patterns use a memchr-style byte scan. The search stops as soon as it finds a match. If it reaches the end without finding the pattern, it returns false. No allocation happens; the method works directly on the borrowed bytes.

fn main() {
    let log_entry = "2024-05-20 ERROR: disk full";
    
    // `contains` checks for the presence of a substring.
    // It returns `true` if the pattern is found, `false` otherwise.
    // The method works on `&str` slices, which is the standard string view in Rust.
    let is_error = log_entry.contains("ERROR");
    
    if is_error {
        println!("Alert triggered");
    }
}

contains also works on String values. String implements Deref<Target = str>, so a method call on a String automatically dereferences to str, and you can call contains without converting first. The related deref coercion rule lets you pass a &String wherever a function expects &str. Together they let you write code that works with both owned strings and borrowed slices without cluttering the syntax.
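
A short sketch of both call styles, assuming a hypothetical helper named is_error_line that takes &str:

```rust
/// Hypothetical helper that accepts any borrowed string slice.
fn is_error_line(line: &str) -> bool {
    line.contains("ERROR")
}

fn main() {
    let owned: String = String::from("2024-05-20 ERROR: disk full");
    let borrowed: &str = "2024-05-20 INFO: all good";

    // Calling `contains` directly on a `String` works without conversion.
    println!("{}", owned.contains("ERROR")); // true

    // A `&String` is accepted where `&str` is expected.
    println!("{}", is_error_line(&owned)); // true
    println!("{}", is_error_line(borrowed)); // false
}
```

Taking &str in the helper's signature keeps it usable from both owned and borrowed callers.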

Walking through the search

Consider the code above. log_entry is a &str pointing to a string literal. contains is called with "ERROR", which is also a &str. The compiler sees that &str implements the Pattern trait, so the call is valid. At runtime, contains iterates over the bytes of log_entry. It compares sequences of bytes against the bytes of "ERROR". When it finds the match, it returns true. The if block executes.

If the string were "2024-05-20 INFO: disk full", the search would scan the entire byte sequence, find no match, and return false. The if block would be skipped. The search is linear in the length of the string in the worst case, and the implementation can skip ahead using properties of the pattern rather than testing every position. Multi-byte characters need no special handling: UTF-8 is self-synchronizing, so a valid pattern can only match starting at a character boundary, never in the middle of a character.
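
The sketch below illustrates the search on non-ASCII text. The sample string and the wrapper name mentions are invented for this example.

```rust
/// Hypothetical helper: thin wrapper so the checks below read uniformly.
fn mentions(text: &str, needle: &str) -> bool {
    text.contains(needle)
}

fn main() {
    let text = "naïve café ☕ 東京";

    // Multi-byte needles work exactly like ASCII ones.
    println!("{}", mentions(text, "café")); // true
    println!("{}", mentions(text, "東京")); // true
    println!("{}", text.contains('☕')); // true

    // 'é' and 'ï' share their first UTF-8 byte (0xC3), but a search for
    // "é" cannot match half of "ï": the full byte sequence must match.
    println!("{}", mentions("naïve", "é")); // false

    // Byte length and character count differ for non-ASCII text.
    println!("{} bytes, {} chars", text.len(), text.chars().count());
}
```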

Real-world usage and costs

In real code, you often need more than a simple case-sensitive check. Case-insensitive search is common. The naive approach is to convert both strings to lowercase and then check. This works, but it has a cost.

/// Returns `true` if any input contains the keyword, ignoring case.
///
/// This example demonstrates a common pattern: case-insensitive search.
/// Note the allocation cost of `to_lowercase`, paid once per input.
fn has_keyword(inputs: &[String], keyword: &str) -> bool {
    // Convert keyword once to avoid repeated work.
    let keyword_lower = keyword.to_lowercase();
    
    // Iterate over inputs and check each one.
    inputs.iter().any(|input| {
        // `to_lowercase` allocates a new `String`.
        // This is safe but has a performance cost proportional to the input size.
        input.to_lowercase().contains(&keyword_lower)
    })
}

to_lowercase allocates a new String on the heap. If you call this inside a tight loop over thousands of strings, the allocation overhead adds up. The allocation is necessary because lowercase conversion can change the byte length of the string (for example, 'İ' lowercases to two code points), and the result must still be valid UTF-8. For short inputs, the cost is small. For large datasets, normalize the text once before searching, or use a crate that supports case-insensitive search without allocating. If both sides are known to be ASCII, standard-library helpers such as eq_ignore_ascii_case compare without allocating.
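
When the text is known to be ASCII, one allocation-free option is to compare byte windows with eq_ignore_ascii_case. This is a rough sketch, not a tuned implementation: the function name contains_ignore_ascii_case is made up, the scan is a naive window-by-window comparison, and only ASCII letters are folded, so non-ASCII letters must match exactly.

```rust
/// Sketch: ASCII-only case-insensitive substring check without allocating.
/// Non-ASCII bytes are compared exactly; only A-Z / a-z are folded.
fn contains_ignore_ascii_case(haystack: &str, needle: &str) -> bool {
    let h = haystack.as_bytes();
    let n = needle.as_bytes();

    // `windows(0)` would panic, and `str::contains("")` is always true,
    // so handle the empty needle up front.
    if n.is_empty() {
        return true;
    }

    // Slide a window of the needle's length across the haystack.
    // If the needle is longer than the haystack, there are no windows.
    h.windows(n.len())
        .any(|window| window.eq_ignore_ascii_case(n))
}

fn main() {
    println!("{}", contains_ignore_ascii_case("Disk FULL", "full")); // true
    println!("{}", contains_ignore_ascii_case("Disk FULL", "empty")); // false
}
```

The naive scan costs more comparisons than contains on long inputs, so it is a trade: no allocation, but no optimized matcher either.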

Another common scenario is checking for multiple patterns. You can chain contains calls or use any with an iterator.

fn main() {
    let text = "rust is cool and fast";
    
    // Check if the text contains any of several keywords.
    // `any` short-circuits: it stops as soon as one keyword matches.
    let keywords = ["rust", "go", "python"];
    let has_language = keywords.iter().any(|kw| text.contains(kw));
    
    println!("Contains a language keyword: {}", has_language);
}

any is efficient here. It evaluates the closure for each keyword until one returns true. If text.contains("rust") returns true, any stops and returns true without checking "go" or "python". This short-circuiting behavior saves work.

Pitfalls and edge cases

The contains method is safe and simple, but there are traps if you misuse the results or ignore UTF-8 details.

If you need the position of the substring, contains is the wrong tool. Use find instead. find returns an Option<usize> containing the byte index of the first match. The byte index is not the same as the character index: in text with multi-byte characters, the two numbers diverge.

fn main() {
    let text = "rust is cool";
    
    // `find` returns the byte index of the match.
    if let Some(pos) = text.find("cool") {
        // `pos` always lies on a char boundary, so slicing at it is safe.
        println!("Found at byte index: {}", pos);
    }
}

The index that find returns always marks the start of the match, so it lies on a character boundary, and slicing at that index or at that index plus the pattern's byte length is safe even when the text or the pattern is multi-byte. The danger comes from offsets you derive yourself: adding 1 to step over a character, or treating the byte index as a character count. If such an offset lands inside a multi-byte character, slicing panics with a "byte index is not a char boundary" error. The compiler won't catch this. Check computed offsets with is_char_boundary, slice with get (which returns None instead of panicking), or use char_indices to walk the boundaries. If you need to work in units that users perceive as characters, use a crate that handles Unicode segmentation.
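
A small sketch of boundary-safe slicing: the index returned by find marks the start of the match, while offsets produced by arithmetic are the ones to check with is_char_boundary or get. The helper name text_after and the sample string are made up for illustration.

```rust
/// Hypothetical helper: returns the text after the first occurrence of `marker`.
fn text_after<'a>(text: &'a str, marker: &str) -> Option<&'a str> {
    let start = text.find(marker)?;
    // `start` and `start + marker.len()` both lie on char boundaries,
    // because they delimit a complete match of a valid UTF-8 pattern.
    Some(&text[start + marker.len()..])
}

fn main() {
    let text = "prix: 12 € total";

    println!("{:?}", text_after(text, "€")); // Some(" total")

    // A hand-computed offset is a different story: '€' is three bytes long,
    // so `pos + 1` lands inside the character.
    let pos = text.find('€').unwrap();
    println!("{}", text.is_char_boundary(pos + 1)); // false

    // `get` returns `None` instead of panicking on a bad boundary.
    println!("{:?}", text.get(pos + 1..)); // None
}
```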

Normalization is another edge case. The character "é" can be encoded as a single code point (composed) or as "e" followed by a combining accent (decomposed). Both look the same, but they have different byte sequences. contains matches bytes. If you search for the composed form and the text has the decomposed form, contains returns false. This is a normalization issue, not a Rust bug. If you're processing international text, normalize the strings before searching using a crate like unicode-normalization.
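
The mismatch can be reproduced with the standard library alone by spelling out both encodings with escapes. This sketch only demonstrates the problem; it does not normalize anything, and the helper name has_composed_e_acute is hypothetical.

```rust
/// Hypothetical helper: looks for the precomposed "é" (U+00E9).
fn has_composed_e_acute(text: &str) -> bool {
    text.contains('\u{00E9}')
}

fn main() {
    let composed = "caf\u{00E9}"; // "é" as a single code point
    let decomposed = "cafe\u{0301}"; // "e" followed by a combining acute accent

    // Both render as "café", but the bytes differ.
    println!("{} bytes vs {} bytes", composed.len(), decomposed.len()); // 5 vs 6
    println!("{}", composed == decomposed); // false

    println!("{}", has_composed_e_acute(composed)); // true
    println!("{}", has_composed_e_acute(decomposed)); // false
}
```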

The Pattern trait allows closures, which is powerful but can be misused. You can search for any predicate.

fn main() {
    let text = "rust is cool";
    
    // Search for any vowel using a closure.
    // The closure receives each character and returns true if it matches.
    // This is idiomatic for checking properties rather than fixed substrings.
    let has_vowel = text.contains(|c: char| matches!(c, 'a' | 'e' | 'i' | 'o' | 'u'));
    println!("Has vowel: {}", has_vowel);
}

This works because contains calls the closure on each character in turn and stops at the first one for which it returns true. It's efficient for simple predicates. Keep the closure cheap: in the worst case, when nothing matches, it runs once for every character in the string.

Choosing the right tool

Use contains when you need a boolean check for a substring or character. Use find when you need the byte offset of the match to extract surrounding text. Use starts_with or ends_with when validating prefixes or suffixes; these methods compare only the bytes at one end of the string, so they never scan the whole thing. Use contains with a closure when searching for a predicate, like checking if a string contains any digit. Reach for aho-corasick when searching for multiple patterns in large text; the crate builds a finite automaton that scans the input in a single pass.
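
As a brief sketch of the prefix and suffix checks, here are the two scenarios from the introduction: spotting a long flag and keeping only .rs filenames. The function names is_long_flag and rust_sources are invented for this example.

```rust
/// Hypothetical helper: true for arguments shaped like `--verbose`.
fn is_long_flag(arg: &str) -> bool {
    arg.starts_with("--")
}

/// Hypothetical helper: keeps only the names that end in `.rs`.
fn rust_sources<'a>(names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        // `ends_with` pins the match to the end, so "rs.md" is rejected.
        .filter(|name| name.ends_with(".rs"))
        .collect()
}

fn main() {
    println!("{}", is_long_flag("--verbose")); // true
    println!("{}", is_long_flag("-v")); // false

    let names = ["main.rs", "notes.txt", "lib.rs", "rs.md"];
    println!("{:?}", rust_sources(&names)); // ["main.rs", "lib.rs"]
}
```

Using contains(".rs") here would also accept a name like "main.rs.bak", which is why the anchored check is the better fit.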

Convention favors contains('x') for single characters. It signals intent and can be slightly more efficient, because a char pattern uses a simpler single-character search than the general substring matcher. Clippy's single_char_pattern lint suggests the same change. Use contains("x") only when the pattern really is a string. The compiler accepts both.

Case-insensitive search costs allocation. Pay the price only when you have to. If performance matters, normalize the text once or use a specialized crate.

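Byte indices returned by find always sit on a character boundary, so slicing at them is safe. Offsets you compute yourself are a different matter: check them with is_char_boundary or slice with get, because the compiler won't catch a bad boundary for you.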

Where to go next

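contains answers a single question: is this text present? When the answer needs to carry more information, stay in the same family of methods. find gives you the byte offset of the match, starts_with and ends_with pin the match to one end of the string, and all of them accept the same Pattern arguments as contains. The standard library documentation for str and for the Pattern trait covers the rest.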