How to remove duplicates from Vec

The most efficient way to remove duplicates from a `Vec` while preserving order is to sort the vector and call `dedup()`, which works in-place for types implementing `PartialEq` and `Ord`.

The neighbor rule

You have a Vec of user IDs coming from an API. The API is messy. It sends duplicates. You need a clean list. You reach for dedup(), run the code, and stare at the output. The duplicates are still there. The compiler didn't complain. The logic is just wrong.

dedup() is a local operation. It only compares adjacent elements. It doesn't scan the whole vector. To remove all duplicates, you have to group them first.

Think of dedup() as a quality control inspector on a conveyor belt. The inspector looks at the current item and the one that just passed. If they match, the inspector tosses the current one. If the duplicates are separated by other items, the inspector never sees them together. dedup() assumes the duplicates are already neighbors. Your job is to make them neighbors.

Sort and dedup: the fast path

When order doesn't matter, sorting followed by deduplication is the fastest approach. It runs in-place. It requires no heap allocations. It only needs the elements to implement Ord.

fn main() {
    let mut numbers = vec![3, 1, 2, 3, 1, 4, 2];

    // Sort brings identical values next to each other.
    // Unstable sort is faster for primitives like i32.
    numbers.sort_unstable();

    // Dedup removes consecutive duplicates in-place.
    // It compacts the vector and updates the length.
    numbers.dedup();

    println!("{:?}", numbers); // [1, 2, 3, 4]
}

Sorting rearranges the data. sort_unstable uses an algorithm like quicksort or timsort without tracking the relative order of equal elements. That tracking costs time. If you don't care whether the first 1 or the second 1 survives, sort_unstable is the choice. The community convention is to use sort_unstable for primitives and simple types. Use sort only when stability matters, such as when sorting structs where you want to preserve the original order of items with equal keys.

dedup() scans the sorted vector once. It compares vec[i] with vec[i-1]. If they are equal, it skips vec[i]. If not, it keeps it. The method compacts the vector by shifting elements left and truncating the length. The return value is the new length, though you rarely need to capture it since the vector is already updated.

Sort first. dedup is blind to distance.

Preserving order with HashSet

When the sequence matters, sorting destroys the information you need. Logs, timestamps, and event streams often require deduplication without reordering. The solution is a HashSet.

A HashSet tracks seen values using a hash table. Lookups are O(1) on average. You iterate through the vector, check if the item is in the set, and keep it only if it's new.

use std::collections::HashSet;

fn main() {
    let numbers = vec![3, 1, 2, 3, 1, 4, 2];

    // HashSet tracks seen values for O(1) lookups.
    let mut seen = HashSet::new();
    // Pre-allocate capacity to avoid reallocations.
    let mut unique = Vec::with_capacity(numbers.len());

    for num in numbers {
        // Insert returns true only if the value was new.
        if seen.insert(num) {
            unique.push(num);
        }
    }

    println!("{:?}", unique); // [3, 1, 2, 4]
}

This approach preserves the order of first occurrences. It works for any type that implements Hash and Eq. It is slower than sort-and-dedup because of hashing overhead and heap allocation for the set. It also allocates a new vector. The trade-off is order preservation and flexibility with types that don't implement Ord.

The HashSet preserves order but pays for it with memory and hashing overhead.

In-place order preservation with retain

Allocating a second vector can be expensive for large datasets. You can combine order preservation with in-place modification using retain. The retain method keeps elements where a closure returns true. You can use the HashSet insertion return value directly inside the closure.

use std::collections::HashSet;

fn main() {
    let mut numbers = vec![3, 1, 2, 3, 1, 4, 2];
    let mut seen = HashSet::new();

    // Retain keeps elements where the closure returns true.
    // Insert returns true on first encounter, false on duplicates.
    numbers.retain(|num| seen.insert(*num));

    println!("{:?}", numbers); // [3, 1, 2, 4]
}

This pattern modifies the original vector. It avoids allocating a second buffer. The HashSet still allocates memory for its buckets, but the vector itself stays in place. This is useful when you are processing a large vector and want to minimize allocation pressure. The closure captures the HashSet by mutable reference. retain handles the borrowing correctly.

Use retain with a HashSet when you need order preservation without allocating a second vector.

Custom equality with dedup_by

Sometimes "duplicate" doesn't mean exactly equal. You might want to ignore case in strings, or compare only specific fields of a struct. dedup_by lets you define custom equality logic. It still requires sorting first to group the items, but the comparison is up to you.

#[derive(Debug, Clone)]
struct Point { x: i32, y: i32 }

fn main() {
    let mut points = vec![
        Point { x: 1, y: 10 },
        Point { x: 2, y: 20 },
        Point { x: 1, y: 30 }, // Duplicate x
        Point { x: 3, y: 40 },
    ];

    // Sort by x so points with same x are neighbors.
    points.sort_by_key(|p| p.x);

    // Dedup based only on x coordinate.
    points.dedup_by(|a, b| a.x == b.x);

    println!("{:?}", points);
}

Here, two points are duplicates if their x coordinates match. The y coordinate is ignored. sort_by_key ensures points with the same x are adjacent. dedup_by uses the custom predicate to decide what to keep. The first occurrence in the sorted group survives.

Define your own equality with dedup_by. The default == isn't the only option.

Trait bounds and compiler errors

The methods above rely on trait bounds. sort_unstable requires Ord. dedup requires PartialEq. HashSet requires Hash and Eq. If your type doesn't implement these traits, the compiler stops you.

If you try to put a custom struct in a HashSet without deriving Hash and Eq, the compiler rejects it with E0277 (trait bound not satisfied). The error message points to the missing trait. You fix it by adding #[derive(Hash, Eq, PartialEq)] to your struct. Note that HashSet requires both Eq and PartialEq. Eq implies PartialEq, but the derive macro needs both attributes to generate the correct implementations.

Hashing maps values to buckets. Collisions happen when different values hash to the same bucket. Eq resolves collisions by checking exact equality. Without Eq, the set can't distinguish items in the same bucket. The compiler enforces this to prevent silent logic errors.

Derive Hash and Eq. The compiler won't guess for you.

Decision matrix

Use sort_unstable followed by dedup when order doesn't matter and you want maximum speed with zero heap allocations.

Use HashSet filtering when you must preserve the original insertion order.

Use retain with a HashSet when you need order preservation and want to modify the vector in place to avoid allocating a second buffer.

Use dedup_by when your definition of duplicate depends on custom logic, like ignoring case or comparing specific fields.

Use BTreeSet when you want a sorted unique collection in a single step and don't mind the overhead of a tree structure.

Where to go next

Removing duplicates from a Vec eliminates repeated items so each value appears only once. You use it when you need a clean list of unique items, like filtering a list of user IDs. Think of it like sorting a deck of cards and then taking out the extra copies of each card.