When channels get crowded
You are building a parallel event processor. Three worker threads need to pull tasks from a shared queue. You reach for std::sync::mpsc. It works fine until you realize mpsc only supports a single receiver. You switch to a Mutex<VecDeque>. It works until you profile the application and see threads spending 40% of their time waiting for the mutex lock. The contention kills throughput. You need a queue that handles multiple producers and multiple consumers without a central lock, or a structure that lets threads steal work from each other without blocking.
Crossbeam is the toolkit for this exact situation. It provides high-performance synchronization primitives, many of them lock-free, that often outperform lock-based alternatives under contention. It is not a replacement for the standard library. It is the set of tools you reach for when standard locks become the bottleneck or when you need concurrency patterns the standard library does not provide.
What crossbeam actually is
Crossbeam is a family of crates designed for low-level concurrent programming. The family also includes crossbeam-epoch and crossbeam-queue; this article covers three of its crates:
crossbeam-channel provides message-passing channels with support for multiple producers and multiple consumers, plus a select! macro for waiting on multiple channels.
crossbeam-deque provides work-stealing deques for building parallel task schedulers.
crossbeam-utils provides low-level utilities like AtomicCell for atomic operations on Copy types, scoped threads, and adaptive backoff for spin loops.
Lock-free data structures guarantee progress even if threads get preempted. No thread can block another thread's ability to make progress. This contrasts with mutex-based structures where one thread holding a lock can stall all other threads waiting for that lock. Lock-free algorithms rely on atomic operations provided by the CPU, such as compare-and-swap, to update shared state without acquiring a lock.
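The compare-and-swap pattern mentioned above can be sketched with the standard library alone. This is an illustration of the general technique, not crossbeam's internals; the add_clamped helper and the limit of 1000 are made up for the example.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Hypothetical helper: adds `delta` to `counter` without exceeding `max`,
/// using a compare-and-swap retry loop. Returns the value that was stored.
fn add_clamped(counter: &AtomicU32, delta: u32, max: u32) -> u32 {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        let next = current.saturating_add(delta).min(max);
        // Succeeds only if nobody changed the value since we read it.
        match counter.compare_exchange_weak(current, next, Ordering::AcqRel, Ordering::Relaxed) {
            Ok(_) => return next,
            // Another thread won the race (or the CAS failed spuriously);
            // retry with the value that is actually there.
            Err(observed) => current = observed,
        }
    }
}

fn main() {
    let counter = AtomicU32::new(0);
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| {
                for _ in 0..300 {
                    add_clamped(&counter, 1, 1000);
                }
            });
        }
    });
    // 4 * 300 = 1200 increments were attempted, clamped at 1000.
    println!("final = {}", counter.load(Ordering::Relaxed));
}
```

No thread ever holds a lock here: a thread that loses the race simply retries, and a preempted thread cannot stop the others from making progress.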
Crossbeam channels use lock-free algorithms for sending and receiving. Work-stealing deques use atomic operations so that one thread can take an item from another thread's queue without acquiring a lock. AtomicCell wraps a value in an atomic container; for Copy types it offers load and store much like AtomicUsize, which makes it handy for custom flags or small structs.
Channels that scale
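The most common entry point to crossbeam is crossbeam-channel. Unlike std::sync::mpsc, which allows multiple producers but only a single consumer, crossbeam channels support multiple producers and multiple consumers: both Sender and Receiver can be cloned. They have also traditionally been faster thanks to optimized lock-free internals, although since Rust 1.67 the standard library's mpsc is itself built on a port of crossbeam-channel, so the gap has narrowed.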
use crossbeam_channel::unbounded;

fn main() {
    // unbounded creates a channel with no capacity limit.
    // Messages are allocated on the heap as needed.
    let (sender, receiver) = unbounded();
    // Clone the sender to move it into multiple threads.
    let tx1 = sender.clone();
    let tx2 = sender.clone();
    std::thread::spawn(move || {
        // Send does not block in an unbounded channel.
        // It allocates and pushes atomically.
        tx1.send("data from thread 1").unwrap();
    });
    std::thread::spawn(move || {
        tx2.send("data from thread 2").unwrap();
    });
    // Drop the original sender so the channel closes when threads finish.
    drop(sender);
    // Receive messages. recv blocks until a message is available.
    while let Ok(msg) = receiver.recv() {
        println!("Received: {}", msg);
    }
}
The unbounded channel never blocks on send. This is useful when producers are fast and you want to buffer work. However, unbounded channels can grow without limit and cause out-of-memory errors if producers outpace consumers. Use bounded(capacity) when you need backpressure. A bounded channel blocks the sender when the buffer is full, forcing producers to slow down and match the consumer's rate.
Bounded channels provide flow control. Unbounded channels provide throughput. Pick the one that matches your workload.
The select! macro
One of the strongest features of crossbeam-channel is the select! macro. It allows you to wait for messages on multiple channels simultaneously without polling or complex state machines. The macro expands to a match expression that efficiently registers interest in each channel and wakes up when any one of them has data.
use crossbeam_channel::{bounded, select};
use std::time::Duration;

fn main() {
    let (data_tx, data_rx) = bounded(10);
    let (stop_tx, stop_rx) = bounded(1);
    // Producer sends data and then a stop signal.
    std::thread::spawn(move || {
        for i in 0..5 {
            data_tx.send(i).unwrap();
            std::thread::sleep(Duration::from_millis(100));
        }
        // Signal shutdown after data is sent.
        stop_tx.send(()).unwrap();
    });
    // Consumer uses select! to handle both data and shutdown.
    loop {
        select! {
            recv(data_rx) -> msg => {
                match msg {
                    Ok(val) => println!("Processing: {}", val),
                    Err(_) => break, // Channel closed
                }
            }
            recv(stop_rx) -> _ => {
                println!("Shutdown signal received");
                break;
            }
        }
    }
}
The select! macro simplifies event loops. You can listen for data, control signals, timeouts, and channel closures in a single readable block. The macro is fair in the sense that it has no built-in priority: if multiple operations are ready at the same time, it picks one of them at random, so no arm is systematically starved. This pattern replaces verbose polling loops and reduces bugs in concurrent control flow.
Use select! whenever you need to react to multiple asynchronous sources. It turns complex concurrency logic into a straightforward pattern match.
AtomicCell for small data
The standard library provides atomic types like AtomicUsize, AtomicBool, and AtomicPtr. These are great for counters and flags. What if you need an atomic flag with multiple bits, or an atomic small struct? The standard library does not support atomic operations on arbitrary types.
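crossbeam-utils provides AtomicCell<T>, found at crossbeam_utils::atomic::AtomicCell. Any type can be stored and swapped; reading with load requires Copy, and compare_exchange additionally requires Eq. Internally it wraps the value in an UnsafeCell and uses native atomic operations when the type's size and alignment allow it, falling back to an internal lock otherwise. This gives you the flexibility to create atomic flags, counters, or small configurations without defining custom atomic types.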
use crossbeam_utils::atomic::AtomicCell;
use std::sync::Arc;

// compare_exchange needs Eq; calling unwrap() on its result needs Debug.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct Flags {
    active: bool,
    count: u8,
}

fn main() {
    // load() and compare_exchange() require the inner type to be Copy.
    let cell = Arc::new(AtomicCell::new(Flags { active: true, count: 0 }));
    // Load the entire struct atomically.
    let flags = cell.load();
    println!("Active: {}, Count: {}", flags.active, flags.count);
    // Store a new value atomically.
    cell.store(Flags { active: false, count: 5 });
    // Compare-and-swap for conditional updates.
    let old = cell.load();
    let new = Flags { active: old.active, count: old.count + 1 };
    // This returns Err if another thread changed the cell after the load.
    cell.compare_exchange(old, new).unwrap();
}
AtomicCell is a powerful tool for lock-free state management. It works for any Copy type, including enums, tuples, and small structs. The size and alignment of the type matter: when a type does not fit one of the native atomic types, AtomicCell falls back to an internal lock, which is slower under contention. AtomicCell::<T>::is_lock_free() tells you which path a given type takes. Stick to small types for best performance.
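AtomicCell::new itself accepts any type, including String, but load and compare_exchange are only available when the inner type is Copy. Calling load on an AtomicCell<String> is rejected by the compiler, typically with E0599 (the method exists but its trait bounds were not satisfied). For heap-allocated data that several threads need to read in place, use Arc or Mutex.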
Work-stealing deques
Work-stealing is a concurrency pattern where threads maintain their own queues of work. When a thread runs out of work, it steals tasks from another thread's queue. This balances load dynamically and reduces contention compared to a single shared queue. crossbeam-deque provides the primitives to build work-stealing schedulers.
The crate has three main types. A Worker is the queue owned by a single thread: only that thread pushes to it and pops from it. Worker::stealer() returns a Stealer, a handle that other threads use to take tasks from that queue. Injector is a shared queue that any thread can push to and steal from, typically used as the entry point for new tasks. A steal attempt returns Steal::Success(task), Steal::Empty, or Steal::Retry when it lost a race with another thread.
use crossbeam_deque::{Injector, Stealer, Worker};
use crossbeam_utils::thread;

fn main() {
    // Injector is a shared queue that any thread can push to or steal from.
    let injector = Injector::new();
    // Create one local FIFO queue per thread, plus a stealer handle for each.
    let workers: Vec<Worker<i32>> = (0..4).map(|_| Worker::new_fifo()).collect();
    let stealers: Vec<Stealer<i32>> = workers.iter().map(|w| w.stealer()).collect();
    // Push initial tasks to the injector.
    for i in 0..10 {
        injector.push(i);
    }
    // Process tasks using scoped threads. Each thread owns one Worker.
    thread::scope(|s| {
        for local in workers {
            let (injector, stealers) = (&injector, &stealers);
            s.spawn(move |_| {
                loop {
                    // Prefer local work, then the injector, then other threads' queues.
                    let task = local.pop().or_else(|| {
                        std::iter::repeat_with(|| {
                            injector
                                .steal_batch_and_pop(&local)
                                .or_else(|| stealers.iter().map(|st| st.steal()).collect())
                        })
                        // Steal::Retry means a race was lost; try again.
                        .find(|steal| !steal.is_retry())
                        .and_then(|steal| steal.success())
                    });
                    match task {
                        Some(task) => println!("Processing task: {}", task),
                        // No work anywhere. Nothing new arrives in this example, so stop.
                        None => break,
                    }
                }
            });
        }
    })
    .unwrap();
}
Work-stealing deques are the backbone of parallel task schedulers like Rayon. They minimize contention by keeping work local and only coordinating when threads are idle. This pattern scales well on multi-core systems.
Building a work-stealing scheduler from scratch is complex. crossbeam-deque gives you the low-level primitives, but you still need to manage thread lifetimes, task priorities, and shutdown logic. Use crossbeam-deque when you are implementing a custom parallel algorithm or scheduler. For general-purpose parallel iteration, use Rayon instead.
Adaptive backoff for spin loops
Spin loops are common in lock-free programming. A thread repeatedly checks a condition until it becomes true. A naive spin loop burns CPU cycles and can cause contention. crossbeam-utils provides Backoff to adapt the spin behavior based on contention.
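Backoff starts by spinning: each step issues an exponentially growing number of CPU spin-loop hints. After a few steps, snooze() switches to yielding the thread to the OS scheduler. Backoff does not measure contention and never sleeps or blocks on its own; it only counts steps. Once is_completed() returns true, it is advising you to stop spinning and block the thread by another mechanism, such as parking. Used this way, it reduces wasted CPU time and improves throughput under high contention.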
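use crossbeam_utils::Backoff;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

fn main() {
    let flag = Arc::new(AtomicBool::new(false));
    // Another thread sets the flag; without a writer the loop would never end.
    let setter = {
        let flag = Arc::clone(&flag);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(10));
            flag.store(true, Ordering::SeqCst);
        })
    };
    let backoff = Backoff::new();
    // Spin until the flag is set.
    while !flag.load(Ordering::SeqCst) {
        // snooze() spins briefly at first, then yields to the OS scheduler.
        backoff.snooze();
    }
    setter.join().unwrap();
    println!("Flag is set!");
}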
The community convention for spin loops is to use Backoff. It handles the details of spin hints and yielding; choosing the memory ordering on the atomic you are polling remains your responsibility. Write backoff.snooze() inside the loop instead of spinning on the bare condition. This keeps your spin loops well behaved under contention and avoids wasting CPU resources.
Pitfalls and compiler errors
Lock-free does not mean race-free. Crossbeam provides tools for concurrent access, but you still need to ensure correctness. Data races occur when multiple threads access shared data without synchronization, and at least one access is a write. Crossbeam primitives prevent data races by enforcing atomicity, but they cannot prevent logic-level race conditions: a sequence such as "load, check, then store" is made of individually atomic steps, yet another thread can still act between them.
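A common instance of this is a one-time claim. The sketch below uses only standard library atomics; race_to_claim is a hypothetical helper, and the racy variant is shown only in a comment.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: lets `threads` threads race to claim a one-time job
/// and returns how many of them won.
fn race_to_claim(threads: usize) -> usize {
    let claimed = AtomicBool::new(false);
    let winners = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                // Racy version: `if !claimed.load(..) { claimed.store(true, ..); ... }`
                // lets several threads pass the check before any of them stores.
                // A single compare_exchange makes check and update one atomic step.
                if claimed
                    .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
                    .is_ok()
                {
                    winners.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    winners.load(Ordering::Relaxed)
}

fn main() {
    // Exactly one thread can flip the flag from false to true.
    println!("winners: {}", race_to_claim(8));
}
```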
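AtomicCell::new accepts any type, but load requires the inner type to be Copy. Calling load on an AtomicCell<String> is rejected at compile time, typically with E0599 (the method exists but its trait bounds were not satisfied). Use Arc or Mutex for heap data.
use crossbeam_utils::atomic::AtomicCell;

fn main() {
    // Constructing the cell compiles: new() has no Copy bound.
    let cell = AtomicCell::new(String::new());
    // This fails to compile because String is not Copy.
    // error[E0599]: the method `load` exists for struct `AtomicCell<String>`,
    // but its trait bounds were not satisfied
    let _value = cell.load();
}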
select! has no notion of priority. When several operations are ready it picks one at random, so the macro itself does not starve any channel. A busy channel can still dominate in practice: if one channel always has messages, the others are serviced only part of the time and their backlogs can grow. If one source must win, such as a shutdown signal, check it first with try_recv before entering select!.
Work-stealing deques are lock-free but not wait-free. Threads can still contend on the deque, especially when many threads try to steal simultaneously. Performance depends on the workload and the number of threads. Profile your application to ensure work-stealing provides the expected benefits.
Lock-free algorithms are harder to reason about than mutex-based ones. They rely on subtle atomic operations and memory ordering. Use crossbeam when you have measured a bottleneck and need the performance. Do not reach for lock-free structures for simple cases.
When to use crossbeam
Use crossbeam-channel when you need multiple producers and multiple consumers with high throughput, or when you need to select across multiple channels using the select! macro. Use std::sync::mpsc when a single consumer is enough and you want to avoid external dependencies for simple cases.
Use crossbeam-deque when you are building a work-stealing scheduler, like a parallel task pool or a parallel tree traversal, where threads need to grab work from each other efficiently. Use Rayon when you want data-parallel iteration over collections without managing threads manually.
Use crossbeam-utils when you need low-level building blocks like AtomicCell for Copy types, thread::scope for scoped threads, or Backoff for spin loops. Use std::sync::atomic when you only need standard atomic types like AtomicUsize or AtomicBool, and note that since Rust 1.63 the standard library offers its own std::thread::scope.
Pick the tool that matches your concurrency pattern. Do not reach for work-stealing queues when a simple channel will do. Measure performance before optimizing. Lock-free structures add complexity. Use them when the complexity pays off in throughput.