When a lock is too heavy
You are building a high-throughput service. Every incoming request increments a global counter. You wrapped the counter in a Mutex, and it works. Then you run the profiler. The flame graph shows threads spending 40% of their time blocked, waiting for the lock. The data is just a number. You don't need to lock the whole structure; you just need to add one. The mutex is serializing everything, turning parallel work into a queue.
Rust gives you atomics for exactly this. Atomics let you update shared values without locks. They use special hardware instructions that guarantee the operation happens as a single, indivisible step. No waiting for a key. No context switching overhead. The CPU enforces the safety.
Atomics are indivisible
An atomic operation is indivisible. It happens all at once. No other thread can see the value in the middle of the update.
Think of a mutex like a bathroom with a key. One person holds the key, enters, does business, leaves, and returns the key. If ten threads want to increment a counter, nine wait while one holds the key. The waiting is the cost.
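For comparison, here is a minimal sketch of the lock-based counter described above. The helper name count_with_mutex and the thread counts are illustrative assumptions, not taken from any particular service.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: increments a mutex-protected counter from
/// `threads` threads, `per_thread` times each, and returns the total.
fn count_with_mutex(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let mut handles = Vec::with_capacity(threads);

    for _ in 0..threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..per_thread {
                // Every increment takes the lock; other threads block here.
                let mut guard = counter.lock().unwrap();
                *guard += 1;
                // The guard drops at the end of the iteration, releasing the lock.
            }
        }));
    }

    for h in handles {
        h.join().unwrap();
    }

    // Bind the value first so the lock guard is released before returning.
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("Result: {}", count_with_mutex(10, 1_000));
}
```

The result is correct, but every increment goes through lock() and unlock, so under load the threads queue up behind one another.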
Atomics work differently. They use CPU instructions that guarantee the read-modify-write cycle completes without interruption. How that happens depends on the hardware. On x86-64, fetch_add typically compiles to a single locked instruction (lock xadd). On architectures without such an instruction, it becomes a short retry loop: read the value, compute the new value, and try to write it back. If another thread changed the value in the meantime, the write fails and the loop runs again. Either way, the operating system never puts the thread to sleep, which is why an uncontended atomic update is much cheaper than blocking on a lock and being woken up later.
The hardware does the heavy lifting. You just ask for the guarantee.
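The retry cycle described above can be written out by hand with compare_exchange_weak. This is only a sketch to make the mechanism visible; the helper name cas_add is hypothetical, and in real code you would simply call fetch_add.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: a hand-written equivalent of `fetch_add`,
/// built from a compare-and-swap retry loop. Returns the old value.
fn cas_add(counter: &AtomicUsize, delta: usize) -> usize {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        let new = current.wrapping_add(delta);
        // Write `new` only if the value is still `current`.
        match counter.compare_exchange_weak(
            current,
            new,
            Ordering::SeqCst,
            Ordering::Relaxed,
        ) {
            Ok(old) => return old,
            // Another thread got there first (or the weak exchange failed
            // spuriously): retry with the value we actually observed.
            Err(observed) => current = observed,
        }
    }
}

fn main() {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    cas_add(&c, 1);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    println!("Result: {}", counter.load(Ordering::SeqCst));
}
```

A failed exchange costs one more trip around the loop, not a blocked thread.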
Minimal counter
Here is the standard pattern. You wrap the atomic in an Arc so multiple threads can share ownership. Then you call methods like fetch_add to mutate the value.
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Shared counter protected by atomic operations.
fn main() {
    // Arc allows multiple threads to own the same data.
    // AtomicUsize provides lock-free thread-safe access to the usize.
    let counter = Arc::new(AtomicUsize::new(0));
    let mut handles = vec![];

    for _ in 0..10 {
        // Clone the Arc to share ownership with the new thread.
        // Convention: use Arc::clone explicitly. It signals we are bumping
        // the reference count, not copying the inner value.
        let c = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            // fetch_add atomically adds 1 and returns the old value.
            // Atomicity alone guarantees no increment is lost; SeqCst also
            // places the operation in one global order all threads agree on.
            c.fetch_add(1, Ordering::SeqCst);
        });
        handles.push(handle);
    }

    for h in handles {
        h.join().unwrap();
    }

    // Load the final value. join() already makes every thread's fetch_add
    // visible here; SeqCst is simply the conservative choice.
    println!("Result: {}", counter.load(Ordering::SeqCst));
}
Clone the Arc, not the value. The convention is explicit.
How fetch_add works
The Arc wraps the AtomicUsize so multiple threads can hold a reference. Arc itself is thread-safe for sharing ownership. The AtomicUsize inside handles the actual data safety.
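When a thread calls fetch_add, the CPU executes an atomic read-modify-write. If contention is low, the value updates immediately. If several threads hit the same atomic at once, the hardware serializes them: the core performing the update takes exclusive ownership of the cache line, and on architectures that implement the operation as a retry loop, the losing threads retry. You never see an intermediate state, and no increment is lost. Under heavy contention the cache line bounces between cores, so a hot atomic still has a cost, but it is usually much lower than parking threads on a lock.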
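The Ordering::SeqCst parameter asks for sequential consistency: all SeqCst operations appear in a single global order that every thread agrees on, as if the threads' operations had been interleaved one at a time. It is the strongest ordering Rust offers. It also includes the Acquire/Release guarantee: if thread A writes X and then Y, a thread B that observes Y will also observe X. It can cost more than weaker orderings because it may require memory barriers that stop the compiler and CPU from reordering instructions. It is a safe default when you have not yet worked out which weaker ordering is sufficient.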
The CPU handles the retry. You get the result.
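One place where the single global order of SeqCst becomes observable is the classic store-buffering test: two threads each set their own flag and then read the other's. The sketch below uses a hypothetical helper, store_buffering_round. With SeqCst, at least one thread must see the other's store. With only Acquire/Release, both threads reading false would be a permitted outcome.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: runs one round of the store-buffering test and
/// returns what each thread observed of the other thread's flag.
fn store_buffering_round() -> (bool, bool) {
    let x = Arc::new(AtomicBool::new(false));
    let y = Arc::new(AtomicBool::new(false));

    let (x1, y1) = (Arc::clone(&x), Arc::clone(&y));
    let t1 = thread::spawn(move || {
        x1.store(true, Ordering::SeqCst);
        y1.load(Ordering::SeqCst)
    });

    let (x2, y2) = (Arc::clone(&x), Arc::clone(&y));
    let t2 = thread::spawn(move || {
        y2.store(true, Ordering::SeqCst);
        x2.load(Ordering::SeqCst)
    });

    (t1.join().unwrap(), t2.join().unwrap())
}

fn main() {
    for _ in 0..1_000 {
        let (saw_y, saw_x) = store_buffering_round();
        // In any single global order, one of the stores comes first,
        // so the other thread's later load must observe it.
        assert!(saw_y || saw_x);
    }
    println!("No round observed both flags as false.");
}
```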
The ordering zoo
Atomics require you to specify an Ordering. This controls how the atomic operation synchronizes with other memory accesses. Rust offers several orderings, from strongest to weakest.
Ordering::SeqCst is sequential consistency. All threads see all SeqCst operations in the same order, as if they had been interleaved on a single timeline. Use this when you are unsure. It is safe and usually fast enough.
Ordering::Acquire and Ordering::Release form a pair. A Release store makes all prior writes in the current thread visible to any thread that performs an Acquire load on the same atomic and sees the stored value. This is perfect for flags. A producer writes data, then stores a flag with Release. A consumer loads the flag with Acquire. If the flag is set, the consumer is guaranteed to see the data. This avoids the global barrier cost of SeqCst. Ordering::AcqRel combines both halves for read-modify-write operations such as fetch_add.
Ordering::Relaxed guarantees only that the atomic operation itself is atomic. It provides no ordering guarantees for other memory accesses. Use this for counters where you only care about the final sum. If you increment a counter with Relaxed, a concurrent reader may see a value that is slightly behind, but no increment is lost, and the result after all threads join will be correct. This is the cheapest option, although on some architectures (x86-64, for example) a read-modify-write costs the same regardless of the ordering you pick.
Pick the weakest ordering that still gives you correctness. Performance follows.
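The Acquire/Release flag pattern described above can be sketched as follows. The Slot struct and the publish_and_read helper are hypothetical names chosen for illustration. The payload is itself an atomic written with Relaxed so the example stays in safe Rust; the Release store on the flag is what publishes it.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical message slot: `payload` is published by setting `ready`.
struct Slot {
    payload: AtomicU64,
    ready: AtomicBool,
}

/// Hypothetical helper: a producer publishes `value`; a consumer waits
/// for the flag and returns the payload it read.
fn publish_and_read(value: u64) -> u64 {
    let slot = Arc::new(Slot {
        payload: AtomicU64::new(0),
        ready: AtomicBool::new(false),
    });

    let producer = {
        let slot = Arc::clone(&slot);
        thread::spawn(move || {
            // Write the data first. Relaxed is enough for the payload itself.
            slot.payload.store(value, Ordering::Relaxed);
            // Release: everything written above is published with the flag.
            slot.ready.store(true, Ordering::Release);
        })
    };

    let consumer = {
        let slot = Arc::clone(&slot);
        thread::spawn(move || {
            // Acquire: once we observe `true`, the payload write is visible.
            while !slot.ready.load(Ordering::Acquire) {
                thread::yield_now();
            }
            slot.payload.load(Ordering::Relaxed)
        })
    };

    producer.join().unwrap();
    consumer.join().unwrap()
}

fn main() {
    println!("Consumer read: {}", publish_and_read(42));
}
```

If both flag operations were Relaxed, the consumer could see the flag set and still read the initial payload of 0.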
Realistic worker state
Atomics shine when you embed them in structs to coordinate threads. Here is a worker pool state that tracks bytes processed and handles a shutdown signal.
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Worker state shared across threads.
struct WorkerState {
    /// Total bytes processed by all workers.
    bytes_processed: AtomicU64,
    /// Flag to signal workers to stop.
    should_stop: AtomicBool,
}

impl WorkerState {
    fn new() -> Self {
        WorkerState {
            bytes_processed: AtomicU64::new(0),
            should_stop: AtomicBool::new(false),
        }
    }

    fn record_bytes(&self, count: u64) {
        // Atomically add bytes. Relaxed ordering is sufficient here because
        // we only care about the final sum, not synchronization with other variables.
        self.bytes_processed.fetch_add(count, Ordering::Relaxed);
    }

    fn request_shutdown(&self) {
        // Store true. Release ordering ensures writes made by this thread
        // before the store are visible to threads that read the flag with Acquire.
        self.should_stop.store(true, Ordering::Release);
    }

    fn is_stopping(&self) -> bool {
        // Acquire ordering pairs with the Release store above.
        // If we see true, we also see everything written before that store.
        self.should_stop.load(Ordering::Acquire)
    }
}

fn main() {
    let state = Arc::new(WorkerState::new());
    let mut handles = vec![];

    // Spawn workers.
    for id in 0..4 {
        let state = Arc::clone(&state);
        let handle = thread::spawn(move || {
            // Simulate work. The loop index is unused, so it is named `_`.
            for _ in 0..100 {
                if state.is_stopping() {
                    break;
                }
                // Process some data.
                state.record_bytes(1024);
                thread::sleep(Duration::from_millis(1));
            }
            println!("Worker {} finished.", id);
        });
        handles.push(handle);
    }

    // Let them run briefly.
    thread::sleep(Duration::from_millis(50));

    // Signal shutdown.
    state.request_shutdown();

    for h in handles {
        h.join().unwrap();
    }

    // All workers have joined, so any ordering would see the full total.
    // SeqCst is the conservative choice.
    println!("Total bytes: {}", state.bytes_processed.load(Ordering::SeqCst));
}
Relaxed for counters. Acq/Rel for flags. SeqCst for sanity.
Pitfalls and errors
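If you try to put a plain usize inside an Arc and mutate it, the compiler rejects you. Arc<usize> only hands out shared references, so the value inside is immutable. Assigning through it gives E0594 (cannot assign to data in an Arc), and trying to borrow it mutably gives E0596 (cannot borrow as mutable). The problem is not Sync: usize is Sync, but shared access alone does not permit mutation. E0277 (trait bound not satisfied) shows up when you reach for single-threaded tools instead, such as moving an Rc<T> into a spawned thread (Rc is not Send) or sharing a Cell<usize> through an Arc (Cell is not Sync). You need Arc<Mutex<usize>> or Arc<AtomicUsize>. The compiler forces you to choose your synchronization strategy.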
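fetch_add returns the old value. Beginners often assume it returns the new value. If you need the new value, add the same amount to the result. Returning the old value is useful in its own right: each caller gets a distinct number, so you can hand out unique IDs or slot indices, or detect which thread's increment crossed a limit. Note that fetch_add wraps around on overflow instead of panicking.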
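A short sketch of the difference between the old and new value. The helper names next_ticket and increment_and_get are hypothetical, not part of the standard library.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical helper: hands out unique, increasing ticket numbers.
fn next_ticket(counter: &AtomicUsize) -> usize {
    // fetch_add returns the value *before* the addition.
    counter.fetch_add(1, Ordering::Relaxed)
}

/// Hypothetical helper: increments and returns the value *after* the addition.
fn increment_and_get(counter: &AtomicUsize) -> usize {
    // wrapping_add mirrors fetch_add's wrap-around behavior on overflow.
    counter.fetch_add(1, Ordering::Relaxed).wrapping_add(1)
}

fn main() {
    let counter = AtomicUsize::new(0);
    let first = next_ticket(&counter); // old value: 0
    let second = next_ticket(&counter); // old value: 1
    let after = increment_and_get(&counter); // new value: 3
    println!("{} {} {}", first, second, after); // prints "0 1 3"
}
```

Do not reload the counter to find the new value: another thread may have changed it between the fetch_add and the load.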
Ordering::Relaxed is not magic. It only guarantees the atomic operation itself is atomic. It does not order other memory accesses. If you use Relaxed for a flag that guards data, you might see stale data. The compiler or the CPU could reorder the writes and let the flag become visible before the data is ready. Use Acquire/Release when the atomic coordinates with other state.
Relaxed is not a free pass. If you can't prove the order doesn't matter, use SeqCst.
Decision matrix
Use AtomicUsize or AtomicBool when you need a single, small value shared across threads and the operation is a simple read, write, arithmetic update, or compare-and-swap.
Use Arc<Mutex<T>> when you need to update a complex data structure, like a HashMap or a struct with multiple fields that must stay consistent.
Use AtomicU64 with Ordering::Relaxed for high-performance counters where you only care about the final sum and don't need to synchronize with other variables.
Use AtomicBool with Ordering::Acquire and Ordering::Release for lightweight flags, like a shutdown signal, where a mutex would add unnecessary overhead.
Reach for Rc<T> when you have shared ownership but only one thread touches the data. Atomics and Arc are for multi-threading. Rc is single-threaded and faster.
Mutexes are for complex state. Atomics are for simple flags and counters. Don't over-engineer a counter with a lock.
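To illustrate the other side of the decision, here is a sketch of state that atomics cannot protect on their own: a map and a running total that must stay consistent with each other. The Stats struct and the collect_stats helper are hypothetical names used for illustration.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stats: `total` must always equal the sum of `per_worker`.
#[derive(Default)]
struct Stats {
    per_worker: HashMap<usize, u64>,
    total: u64,
}

/// Hypothetical helper: each worker records `bytes` once. Returns the
/// stored total and the sum recomputed from the map.
fn collect_stats(workers: usize, bytes: u64) -> (u64, u64) {
    let stats = Arc::new(Mutex::new(Stats::default()));

    let handles: Vec<_> = (0..workers)
        .map(|id| {
            let stats = Arc::clone(&stats);
            thread::spawn(move || {
                // Both fields change under one lock, so no other thread
                // can observe the map updated but the total not yet.
                let mut s = stats.lock().unwrap();
                *s.per_worker.entry(id).or_insert(0) += bytes;
                s.total += bytes;
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    let s = stats.lock().unwrap();
    let recomputed: u64 = s.per_worker.values().sum();
    (s.total, recomputed)
}

fn main() {
    let (total, recomputed) = collect_stats(4, 1024);
    println!("total = {}, recomputed = {}", total, recomputed);
}
```

Two separate atomics could each be updated safely, but a reader could still catch them between the two updates. The mutex makes the pair change as one unit.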