How to use barrier for thread synchronization

You have four workers processing chunks of a video frame. Each worker finishes at a different speed. If you render the frame before everyone is done, you get a glitched mess. If you wait for the slowest one manually, you waste CPU cycles spinning. You need a gate that stays shut until every worker arrives, then opens all at once. That is what std::sync::Barrier does.

The revolving door analogy

Think of a barrier like a revolving door in a secure building. The door won't turn until exactly four people step into the slots. Three people can stand there forever, but the door stays locked. The moment the fourth person arrives, the mechanism engages, the door spins, and everyone moves to the next room simultaneously.

The barrier tracks how many threads have arrived. It blocks each thread until the count matches the total. Then it releases them all. The barrier doesn't care about the order. It only cares about the count.

Minimal example

use std::sync::{Arc, Barrier};
use std::thread;

fn main() {
    // Create a barrier for 4 threads.
    // The barrier lives on the heap via Arc so threads can share it.
    let barrier = Arc::new(Barrier::new(4));

    let mut handles = vec![];

    // Spawn 4 threads, each getting a clone of the Arc.
    for i in 0..4 {
        // Convention: Use Arc::clone to signal pointer cloning, not deep cloning.
        let b = Arc::clone(&barrier);
        let handle = thread::spawn(move || {
            println!("Thread {i} is working...");
            
            // Block here until all 4 threads call wait().
            b.wait();
            
            println!("Thread {i} passed the barrier!");
        });
        handles.push(handle);
    }

    // Wait for threads to finish.
    for h in handles {
        h.join().unwrap();
    }
}

Many Rust codebases prefer Arc::clone(&barrier) over barrier.clone(). Both compile and do the same thing, but the explicit form signals to readers that you are cloning the smart pointer, not the underlying data. It prevents confusion with deep clones.

What happens at runtime

When you call Barrier::new(4), the barrier stores a target count of four and an arrival counter that starts at zero. Each thread calls wait(). The first three threads to arrive block inside wait(). They stop executing and give up the CPU, so the operating system can schedule other work.

The fourth thread arrives, sees the count is full, and wakes up the other three. All four threads return from wait() and continue execution. The return value of wait() is a BarrierWaitResult, which tells you whether this thread is the "leader" for the round. Exactly one thread per round gets a result where is_leader() returns true; the others get false. Which thread is chosen is unspecified, so don't rely on it being a particular one.
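To make the park-and-wake sequence concrete, here is a toy barrier built from a Mutex and a Condvar. ToyBarrier, State, and toy_round are hypothetical names invented for this illustration, and the standard library's internals may differ, so read this as a sketch of the idea rather than the real implementation.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical teaching type; not the standard library's implementation.
struct ToyBarrier {
    state: Mutex<State>,
    cvar: Condvar,
    total: usize,
}

struct State {
    arrived: usize,    // threads that have arrived in the current round
    generation: usize, // bumped each time the gate opens
}

impl ToyBarrier {
    fn new(total: usize) -> Self {
        ToyBarrier {
            state: Mutex::new(State { arrived: 0, generation: 0 }),
            cvar: Condvar::new(),
            total,
        }
    }

    /// Blocks until `total` threads have called `wait`.
    /// Returns true only for the thread that completed the round.
    fn wait(&self) -> bool {
        let mut state = self.state.lock().unwrap();
        let my_generation = state.generation;
        state.arrived += 1;
        if state.arrived < self.total {
            // Not everyone is here yet: sleep until the generation changes.
            // The loop guards against spurious wake-ups.
            while state.generation == my_generation {
                state = self.cvar.wait(state).unwrap();
            }
            false
        } else {
            // Last arrival: reset the count, open the gate, wake everyone.
            state.arrived = 0;
            state.generation = state.generation.wrapping_add(1);
            self.cvar.notify_all();
            true
        }
    }
}

/// Runs `n` threads through the toy barrier once and returns how many
/// of them were told they completed the round.
fn toy_round(n: usize) -> usize {
    let barrier = Arc::new(ToyBarrier::new(n));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let b = Arc::clone(&barrier);
            thread::spawn(move || b.wait())
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&completed| completed)
        .count()
}
```

Calling toy_round(4) spawns four threads and reports that exactly one of them completed the round. The generation counter is what makes the toy reusable: a fast thread that re-enters wait() for the next round cannot be confused with a slow thread still waking up from the previous one.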

Trust the barrier to wake everyone up. You don't need to manage the wake-up logic.

Realistic example: Two-phase computation

Barriers shine in pipelines where threads do independent work, then need to synchronize before a shared phase. The leader pattern picks exactly one thread per round to do the post-barrier work, so the other threads never contend for it.

use std::sync::{Arc, Barrier};
use std::thread;

/// Simulates a two-phase parallel computation.
/// Phase 1: Independent work. Phase 2: Shared aggregation.
fn parallel_pipeline() {
    let num_threads = 3;
    // Barrier ensures Phase 2 starts only after all threads finish Phase 1.
    let barrier = Arc::new(Barrier::new(num_threads));

    let mut handles = vec![];

    for id in 0..num_threads {
        let b = Arc::clone(&barrier);
        let handle = thread::spawn(move || {
            // Phase 1: Do work independently.
            let result = id * 10;
            println!("Thread {id} computed {result}");

            // Synchronize: Wait for all threads to finish Phase 1.
            let wait_result = b.wait();

            // Phase 2: Exactly one thread per round is the leader.
            if wait_result.is_leader() {
                println!("Leader thread {id} aggregating results...");
                // In real code, you'd read shared state here. Rust still
                // requires a Sync type for it (Mutex, atomics, ...), but the
                // lock is uncontended because only the leader takes it.
            } else {
                // Non-leaders do NOT wait for the leader; they continue at once.
                println!("Thread {id} is not the leader; continuing.");
            }
        });
        handles.push(handle);
    }

    for h in handles {
        h.join().unwrap();
    }
}

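If every thread locks a mutex right after the barrier, they all wake at about the same time and contend for that lock. When the follow-up work only needs to happen once, the leader pattern avoids the contention: one thread does the work and the others skip the block. Two caveats apply. Rust still requires shared mutable state to live behind a Sync type such as Mutex or an atomic, so the lock does not disappear; it is simply uncontended. And the other threads do not wait for the leader, so if they need the aggregated result, they must call wait() a second time after the leader's block.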
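Here is one way the full pattern might look when the other threads need the leader's result. The function name aggregate_with_leader and the id * 10 workload are assumptions made for illustration. Each worker publishes a value, the leader sums them, and a second wait() holds everyone back until the sum is ready.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Hypothetical helper: each of `n` workers computes `id * 10`, then the
/// leader sums the results. Returns the total that each worker observed.
fn aggregate_with_leader(n: usize) -> Vec<usize> {
    let barrier = Arc::new(Barrier::new(n));
    let results = Arc::new(Mutex::new(vec![0usize; n]));
    let total = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..n)
        .map(|id| {
            let barrier = Arc::clone(&barrier);
            let results = Arc::clone(&results);
            let total = Arc::clone(&total);
            thread::spawn(move || {
                // Phase 1: publish this worker's result.
                results.lock().unwrap()[id] = id * 10;

                // First wait: every result is written once this returns.
                if barrier.wait().is_leader() {
                    // Between the two waits only the leader touches these
                    // locks, so they are uncontended.
                    let sum: usize = results.lock().unwrap().iter().sum();
                    *total.lock().unwrap() = sum;
                }

                // Second wait: non-leaders block here until the leader is done.
                barrier.wait();

                // Phase 3: everyone can now read the aggregated value.
                let seen = *total.lock().unwrap();
                seen
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

With three workers, the published values are 0, 10, and 20, so every worker reads a total of 30. Removing the second wait() would let a non-leader read total before the leader has written it.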

Use the leader thread to do one-time work. Keep the shared state behind a lock or an atomic; the leader pattern removes the contention, not the need for synchronization.

Reusability

Barriers reset automatically. After all threads pass, the internal counter goes back to zero. You can call wait() again for the next phase. This makes barriers perfect for iterative algorithms like parallel matrix multiplication or simulation steps. You don't need to recreate the barrier every iteration.

use std::sync::{Arc, Barrier};
use std::thread;

fn iterative_simulation() {
    let num_threads = 2;
    let barrier = Arc::new(Barrier::new(num_threads));

    let handle = thread::spawn({
        let b = Arc::clone(&barrier);
        move || {
            for step in 0..3 {
                println!("Thread 1 step {step}");
                b.wait(); // Waits for Thread 2 at each step.
            }
        }
    });

    for step in 0..3 {
        println!("Thread 0 step {step}");
        barrier.wait(); // Waits for Thread 1 at each step.
    }

    handle.join().unwrap();
}

The barrier tracks rounds implicitly. You call wait(), it blocks until the group arrives, then resets. The next wait() starts a new round. This saves you from creating and redistributing a fresh barrier in every iteration of a tight loop.
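A quick way to check the round-by-round behavior is to count how many leaders are elected. The sketch below uses a hypothetical helper, count_leaders, that pushes several threads through the same barrier repeatedly. Since each round elects exactly one leader, the count should equal the number of rounds.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// Hypothetical helper: runs `threads` workers through `rounds` barrier
/// rounds and returns how many times any thread was elected leader.
fn count_leaders(threads: usize, rounds: usize) -> usize {
    let barrier = Arc::new(Barrier::new(threads));
    let leaders = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let barrier = Arc::clone(&barrier);
            let leaders = Arc::clone(&leaders);
            thread::spawn(move || {
                for _ in 0..rounds {
                    // The same barrier is reused; each wait() joins the
                    // current round and returns when that round completes.
                    if barrier.wait().is_leader() {
                        leaders.fetch_add(1, Ordering::Relaxed);
                    }
                }
            })
        })
        .collect();

    // join() makes every thread's increments visible to this thread.
    for h in handles {
        h.join().unwrap();
    }
    leaders.load(Ordering::Relaxed)
}
```

For example, count_leaders(4, 5) runs four threads through five rounds with a single Barrier and returns 5.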

Pitfalls and errors

If a thread panics or exits without calling wait(), the barrier never reaches the target count. The remaining threads block forever. This is a deadlock. The program hangs. Rust doesn't prevent this at compile time. You must ensure every thread that clones the barrier actually calls wait().
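One possible defense, sketched below, is to catch a panic in the risky section so the thread still reaches wait(). The names risky_work and run_guarded are hypothetical, and the approach assumes the default unwinding panic strategy; it does nothing under panic = "abort". The panic message is still printed to stderr.

```rust
use std::panic;
use std::sync::{Arc, Barrier};
use std::thread;

/// Hypothetical fallible work: worker 1 panics, the others succeed.
fn risky_work(id: usize) -> usize {
    if id == 1 {
        panic!("worker {} failed", id);
    }
    id * 10
}

/// Runs `n` workers; each reaches the barrier even if its work panicked.
/// Returns one entry per worker: Some(result), or None if it panicked.
fn run_guarded(n: usize) -> Vec<Option<usize>> {
    let barrier = Arc::new(Barrier::new(n));
    let handles: Vec<_> = (0..n)
        .map(|id| {
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Catch the panic so this thread still arrives at the barrier.
                let outcome = panic::catch_unwind(|| risky_work(id)).ok();
                barrier.wait();
                outcome
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Without the catch_unwind call, worker 1 would die before wait() and the other workers would block forever. Where the work can report failure through a Result instead of panicking, that is usually the simpler route: handle the error, then call wait() unconditionally.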

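You cannot drop a barrier out from under a waiting thread. A thread blocked in wait() holds its own Arc clone (or a borrow), and the barrier is freed only when the last handle goes away. The main thread does not need to keep an extra handle alive for this; ownership rules already guarantee that the barrier outlives every thread that can call wait().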

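Mismatched counts cause hangs with no error message. If you create a barrier for four threads but only three call wait(), those three block forever. If five threads call wait(), the first four to arrive are released and the fifth starts a new round that never fills, so it blocks indefinitely. Always match the count to the number of threads that call wait() in each round, ideally by deriving both from the same variable.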

If you forget to wrap the barrier in Arc, the compiler stops you. You try to move the barrier into the first thread, then the second iteration fails with E0382 (use of moved value). The barrier must be shared, and with thread::spawn, Arc is the usual tool. Scoped threads (std::thread::scope) can borrow a plain Barrier instead.
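For completeness, here is a sketch of sharing a barrier without Arc by using scoped threads, which are available in std since Rust 1.63. The helper name scoped_leaders is hypothetical. The scope guarantees that every spawned thread finishes before the barrier goes out of scope, so a plain borrow is enough.

```rust
use std::sync::Barrier;
use std::thread;

/// Hypothetical helper: shares one Barrier by reference across `n` scoped
/// threads and returns how many of them were the leader.
fn scoped_leaders(n: usize) -> usize {
    let barrier = Barrier::new(n); // plain local value, no Arc

    thread::scope(|s| {
        let mut handles = Vec::new();
        for _ in 0..n {
            // Each closure borrows `barrier`; the scope joins all threads
            // before `barrier` can be dropped.
            handles.push(s.spawn(|| barrier.wait().is_leader()));
        }

        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .filter(|&is_leader| is_leader)
            .count()
    })
}
```

A call such as scoped_leaders(4) runs four threads against one borrowed barrier and reports a single leader. Arc remains the right choice when threads are created with thread::spawn, because those threads may outlive the function that created the barrier.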

Ensure every thread calls wait. A missing call turns your barrier into a permanent wall.

When to use Barrier

Use Barrier when multiple threads must synchronize at a specific point before continuing. Use Barrier when you need a reusable synchronization point in a loop. Use Mutex when threads need to share mutable state, not just coordinate timing. Use Condvar when threads need to signal arbitrary events, not just count arrivals. Use JoinHandle when you only need to wait for a thread to finish, not synchronize mid-execution. Reach for Barrier when the leader pattern lets one thread do the post-sync work without contention.

Barriers synchronize time. Mutexes protect data. Don't confuse the two.

Where to go next

A barrier is like a meeting point where threads must all arrive before any of them can continue. You use it to coordinate tasks so that every thread finishes a specific step before the group starts the next one together. It ensures no thread gets ahead of the others at critical moments.