How to use HashSet in Rust

HashSet stores unique values with constant-time insert and contains. Use insert's bool return for dedup, the union/intersection/difference methods for set algebra, and derive Hash + Eq for your own types.

When you only care if something is in the bag

You're processing a list of words. You want to know which ones are unique. Or you have a stream of user IDs and you need to detect when one shows up twice. Or you're building the world's tiniest spell-checker, and you need a "is this a real word, yes or no" lookup.

A Vec is the wrong tool. Checking whether a Vec contains a value means scanning every element until you find it (or run out). For a few items that's fine. For ten thousand it's a noticeable pause every time. What you want is a data structure that, given a value, can answer "is this in the collection?" in roughly constant time, regardless of how big it gets. That's HashSet.

A HashSet<T> is a collection of unique values. Add a value, it's there. Add the same value again, it's still just there once. Ask whether a value is in the set, and you get an answer almost instantly because the set hashes your query and jumps straight to where it would have stored it.

The simplest possible use

use std::collections::HashSet;

fn main() {
    // HashSet::new() creates an empty set with default hasher (SipHash 1-3,
    // which is randomly seeded to resist adversarial collisions).
    let mut seen = HashSet::new();

    // Insert returns true if the value was new, false if it was already there.
    // We can use that return value to detect duplicates as we go.
    seen.insert("apple");
    seen.insert("banana");
    let was_new = seen.insert("apple"); // false, "apple" was already in.

    println!("apple was new this time? {was_new}");
    println!("size: {}", seen.len());     // prints 2
    println!("contains kiwi? {}", seen.contains("kiwi")); // false
}

A few things to notice. insert returns bool. People often forget that and the warning slides past, but it's a useful return: it tells you "this is the first time we've seen this value." Lots of dedup logic falls out of that one fact.

contains takes a borrowed reference. You don't need to construct a fresh value to query the set; you can ask with whatever you've already got. That's why seen.contains("kiwi") works without a .to_string().

Counting uniques in one line

HashSet plays well with iterators. Counting unique items in a slice becomes a one-liner:

use std::collections::HashSet;

fn main() {
    let words = ["the", "cat", "sat", "on", "the", "mat"];

    // collect into HashSet<&str>; each duplicate is silently merged. .len()
    // then tells us how many distinct values there were.
    let unique = words.iter().collect::<HashSet<_>>().len();
    println!("{unique} unique words"); // 5
}

For a Vec<String> you'd typically into_iter().collect::<HashSet<String>>(), which moves the strings into the set so you don't need clones.

Set algebra

The thing HashSet is named after, but most beginners don't reach for: union, intersection, and difference. These are the operations that make a set worth its name.

use std::collections::HashSet;

fn main() {
    let a: HashSet<i32> = [1, 2, 3, 4].into_iter().collect();
    let b: HashSet<i32> = [3, 4, 5, 6].into_iter().collect();

    // intersection returns an iterator of items in both sets. We collect into
    // a fresh HashSet to materialise it. .copied() turns &i32 into i32 by
    // copying, which is free for small Copy types.
    let common: HashSet<i32> = a.intersection(&b).copied().collect();
    let either: HashSet<i32> = a.union(&b).copied().collect();
    let only_in_a: HashSet<i32> = a.difference(&b).copied().collect();

    println!("intersection: {common:?}");      // {3, 4}
    println!("union: {either:?}");             // {1, 2, 3, 4, 5, 6}
    println!("only in a: {only_in_a:?}");      // {1, 2}
}

There's also symmetric_difference, which is "in either but not both." If you find yourself doing for-loops with manual contains checks, you almost certainly want one of these methods instead.

Custom types as set members

Built-in types (numbers, strings, tuples of these) work in a HashSet out of the box. For your own types, you have to opt in to the trait machinery:

use std::collections::HashSet;

// HashSet needs to know how to hash, compare for equality, and (for clear
// error messages) format its elements. PartialEq + Eq is the equality pair.
// Hash gives the hashing function. Without these, HashSet won't compile.
#[derive(Debug, Hash, Eq, PartialEq)]
struct UserId(u64);

fn main() {
    let mut seen = HashSet::new();
    seen.insert(UserId(101));
    seen.insert(UserId(202));

    let probe = UserId(101);
    println!("seen 101? {}", seen.contains(&probe)); // true
}

If you skip Hash or Eq, the error is direct:

error[E0277]: the trait bound `UserId: Hash` is not satisfied
  --> src/main.rs:8:5
   |
8  |     seen.insert(UserId(101));
   |          ^^^^^^ the trait `Hash` is not implemented for `UserId`

The fix is the four-trait derive line above. The two equalities (PartialEq and Eq) are siblings: every Eq is also a PartialEq, but Rust splits them so floats can implement only PartialEq (because of NaN). For set keys, you need both.

Speed knobs

The default hasher is cryptographically strong, randomly seeded, and slower than necessary for non-adversarial workloads. If you're hashing millions of integers in a hot loop and you don't need protection against attacker-chosen collisions, swap in a faster hasher:

[dependencies]
ahash = "0.8"
use std::collections::HashSet;
use ahash::RandomState;

// AHash is much faster than the default for typical data. The downside is
// it's not designed to resist adversarial input, so don't use it for sets
// keyed by user-controlled strings on a public web server.
let mut fast: HashSet<u64, RandomState> = HashSet::default();
fast.insert(1);

For the opposite case, where you want deterministic iteration order or ordered traversal, use BTreeSet instead. It's slightly slower per operation but keeps elements sorted.

Common pitfalls

Iteration order. HashSet makes no promise about iteration order. Each run of your program may produce a different order even with the same input, because the hasher is randomly seeded. If you need stable order, sort after collecting, or use BTreeSet.

Mutating elements after insertion. HashSet keys are immutable. There's no get_mut. If you need to change an element's identity, you have to remove it and re-insert. Otherwise the hash and the storage location no longer match, and lookups silently break.

Hashing floats. f32 and f64 don't implement Eq (because NaN != NaN), so they can't be HashSet members directly. Either round to integers, use OrderedFloat from the ordered_float crate, or work in a fixed-point representation.

Storing references that don't outlive the set. HashSet<&str> borrows; the strings must live at least as long as the set. If the strings come and go, store Strings instead.

When to reach for what

Use HashSet when you want fast "is X present?" lookups and don't care about order. Use BTreeSet when you want sorted iteration or smaller per-element memory at the cost of O(log n) operations. Use a Vec when membership tests are rare and you mostly iterate. Use HashMap when you want associated values, not just membership.

For deduplicating a stream while keeping insertion order, the indexmap crate gives you IndexSet, which combines hash lookups with insertion order tracking.

Where to go next