The lock-free counter
You're running a high-throughput service. Every request increments a counter. You wrapped the counter in a Mutex because threads are involved. The code works. Then you check the profiler. The bottleneck isn't the work. It's the lock. Threads are queuing up just to add one to a number. You don't need a guard dog for a single integer. You need a hardware primitive that guarantees the operation happens all at once. That's where atomic types come in.
Atomic types live in std::sync::atomic. They wrap primitive types like u32, i64, or bool. The word "atomic" comes from chemistry: the smallest unit that cannot be split. In Rust, an atomic operation is indivisible. When thread A reads an atomic, it sees either the old value or the new value. It never sees a torn read where half the bits are old and half are new. More importantly, atomics let you coordinate between threads without the overhead of a Mutex. A contended mutex may ask the OS to put a thread to sleep and wake it up later. An atomic operation usually compiles down to a single CPU instruction or a very short retry loop. It's fast. It's lock-free.
Minimal example
use std::sync::atomic::{AtomicU32, Ordering};

fn main() {
    // Create an atomic counter on the stack.
    let counter = AtomicU32::new(0);

    // Increment atomically. This returns the old value.
    // Ordering::Relaxed is the fastest option; use it when you only care about the count itself.
    counter.fetch_add(1, Ordering::Relaxed);

    // Read the value.
    let current = counter.load(Ordering::Relaxed);
    println!("Count: {}", current);
}
Realistic usage with threads
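In a real application, you'll share the atomic across threads. With thread::spawn, every thread needs its own handle to the value, and Arc provides that shared ownership. (A static or std::thread::scope can also work, but Arc is the common case.)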
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Wrap the atomic in Arc so multiple threads can own the reference.
    let counter = Arc::new(AtomicU64::new(0));
    let mut handles = vec![];

    for _ in 0..10 {
        // Clone the Arc, not the atomic value.
        // This increments the reference count, not the counter.
        let counter_clone = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            // Each thread increments the shared counter 1000 times.
            for _ in 0..1000 {
                // Fetch-add returns the previous value, which we discard.
                let _ = counter_clone.fetch_add(1, Ordering::Relaxed);
            }
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    // Load the final result.
    let total = counter.load(Ordering::Relaxed);
    assert_eq!(total, 10_000);
}
Note the Arc::clone(&counter) call. Both counter.clone() and Arc::clone(&counter) compile. The community convention is the explicit form. counter.clone() looks like it might deep-clone the data, but it doesn't. Using Arc::clone signals to the reader that you're just bumping the reference count.
Don't reach for a Mutex when an atomic will do. Under contention, the performance gap is real.
Ordering controls memory visibility
The Ordering parameter is where atomics get tricky. It controls how the atomic operation synchronizes with other memory accesses. CPUs and compilers reorder instructions to optimize performance. Ordering tells them what they are allowed to reorder.
Relaxed guarantees the operation is atomic but says nothing about other memory. If you have a flag and a buffer, and you update the buffer then set the flag with Relaxed, another thread might see the flag set but still see the old buffer contents because the compiler or the CPU reordered the writes. Use Relaxed only when the atomic value is self-contained, like a counter where you don't care about the exact order of increments, just the final sum.
Acquire and Release create a happens-before relationship. A Release store ensures that all writes before it become visible to any thread whose Acquire load on the same atomic observes the stored value. An Acquire load ensures that reads and writes after it cannot be reordered before it. This is the classic pattern for a producer-consumer flag. The producer writes data, then stores the flag with Release. The consumer loads the flag with Acquire. If the consumer sees the flag, it is guaranteed to see the data.
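The producer-consumer flag described above can be sketched as follows. This is a minimal illustration, not a production queue: the function name handoff and the variables data and ready are made up for this example, and it uses std::thread::scope so the threads can borrow locals instead of going through Arc.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

/// Hypothetical helper: publishes `value` from a producer thread and
/// returns what the consumer thread observed.
fn handoff(value: u64) -> u64 {
    let data = AtomicU64::new(0);
    let ready = AtomicBool::new(false);

    thread::scope(|s| {
        // Producer: write the payload, then publish it with a Release store.
        s.spawn(|| {
            data.store(value, Ordering::Relaxed);
            ready.store(true, Ordering::Release);
        });

        // Consumer: spin until the Acquire load observes the flag.
        let consumer = s.spawn(|| {
            while !ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            // The Acquire load saw the Release store, so the payload
            // written before that store is visible here.
            data.load(Ordering::Relaxed)
        });

        consumer.join().unwrap()
    })
}

fn main() {
    assert_eq!(handoff(42), 42);
    println!("consumer saw the published value");
}
```

The payload itself only needs Relaxed because the Release/Acquire pair on the flag is what orders it. If both flag operations were Relaxed, the consumer would be allowed to read 0.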
SeqCst is the strongest ordering. It adds a single global order: all threads agree on the order of all SeqCst operations. It's safe to use SeqCst everywhere if performance isn't critical. On x86, SeqCst loads and read-modify-write operations such as fetch_add compile to the same instructions as their weaker counterparts; only SeqCst stores pay for a full barrier, so the cost is often negligible there. On weakly ordered architectures such as ARM, SeqCst can be more expensive.
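One place where the global order matters is the "store buffering" pattern: two threads each set their own flag and then read the other's. The sketch below is an illustration only, with a made-up function name (store_buffering_round). With SeqCst, at least one thread must see the other's store. With Release stores and Acquire loads, both threads would be allowed to read false.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

/// Hypothetical helper: runs one round and returns what each thread loaded.
fn store_buffering_round() -> (bool, bool) {
    let x = AtomicBool::new(false);
    let y = AtomicBool::new(false);

    thread::scope(|s| {
        // Thread A sets x, then looks at y.
        let a = s.spawn(|| {
            x.store(true, Ordering::SeqCst);
            y.load(Ordering::SeqCst)
        });
        // Thread B sets y, then looks at x.
        let b = s.spawn(|| {
            y.store(true, Ordering::SeqCst);
            x.load(Ordering::SeqCst)
        });
        (a.join().unwrap(), b.join().unwrap())
    })
}

fn main() {
    for _ in 0..1_000 {
        let (a_saw_y, b_saw_x) = store_buffering_round();
        // One of the two stores comes first in the global SeqCst order,
        // so the thread that stored second must observe it.
        assert!(a_saw_y || b_saw_x);
    }
    println!("no round observed (false, false)");
}
```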
Start with SeqCst. Relax only when the profiler screams.
Pitfalls and compiler errors
You can't print an atomic directly. If you write println!("{}", counter), the compiler rejects it with E0277 (trait bound not satisfied). Atomics don't implement Display. You have to load the value first.
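A short sketch of the fix, using a hypothetical helper named render. It loads the value first and formats the plain integer. Atomic integers do implement Debug, so {:?} also compiles.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper: load the value, then format the plain u32.
fn render(counter: &AtomicU32) -> String {
    format!("Count: {}", counter.load(Ordering::Relaxed))
}

fn main() {
    let counter = AtomicU32::new(7);

    // println!("{}", counter); // rejected: AtomicU32 does not implement Display

    println!("{}", render(&counter));

    // Debug formatting is available without an explicit load.
    println!("{:?}", counter);
}
```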
fetch_add returns the previous value. Beginners often ignore this or use it wrong. If you want the new value, add the increment to the result yourself. Don't call load afterwards to find out what you wrote: another thread may have changed the counter in between. The return value is useful when you need to know the state before the update, like in a lock-free queue implementation.
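To make the old-value versus new-value distinction concrete, here is a small sketch. The helper names next_id and increment_and_get are hypothetical, not part of the standard library.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical ID allocator: the value before the increment is the ID
/// that this caller, and only this caller, receives.
fn next_id(counter: &AtomicU64) -> u64 {
    counter.fetch_add(1, Ordering::Relaxed)
}

/// Hypothetical helper: returns the value after the increment.
/// fetch_add wraps on overflow, so wrapping_add keeps the result consistent.
fn increment_and_get(counter: &AtomicU64) -> u64 {
    counter.fetch_add(1, Ordering::Relaxed).wrapping_add(1)
}

fn main() {
    let counter = AtomicU64::new(0);

    assert_eq!(next_id(&counter), 0); // old value; counter is now 1
    assert_eq!(next_id(&counter), 1); // old value; counter is now 2
    assert_eq!(increment_and_get(&counter), 3); // new value; counter is now 3

    println!("final: {}", counter.load(Ordering::Relaxed));
}
```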
There is no AtomicString. Atomics work on fixed-size primitives. If you need to share a string, use Arc<String> for read-only sharing or a Mutex<String> when threads need to modify it. You cannot atomically update a variable-length value in place.
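For a string that threads need to modify, a Mutex is the straightforward tool. The sketch below assumes a made-up function, build_greeting, that lets one thread append to a shared String.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: one thread appends to a shared String,
/// then the caller reads the result.
fn build_greeting() -> String {
    let shared = Arc::new(Mutex::new(String::from("hello")));

    let writer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            // The lock guards the whole variable-length buffer.
            shared.lock().unwrap().push_str(", world");
        })
    };
    writer.join().unwrap();

    // Copy the contents out while holding the lock briefly.
    let result = shared.lock().unwrap().clone();
    result
}

fn main() {
    println!("{}", build_greeting());
}
```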
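Using the wrong ordering leads to subtle bugs. If you use Relaxed when you need synchronization, other threads can observe the atomic's new value while still seeing stale values for the data it was supposed to guard, and in unsafe code that can become a genuine data race. The compiler won't catch this. The code will compile and run, and it may behave correctly on one architecture and fail on another. Test thoroughly with tools like Miri if you relax orderings.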
Trust the borrow checker for data races, but remember it doesn't check Ordering. You are responsible for the memory model.
When to use atomics
Use AtomicU32 or AtomicI64 when you need to share a simple counter, flag, or state machine between threads and each update is a single atomic operation on one value. Use AtomicBool when you need a lock-free flag to signal between threads, like a shutdown signal. Wrap the atomic in Arc (for example, Arc<AtomicU64>) when multiple threads need to share ownership of it. Reach for Mutex<T> when you need to protect a complex data structure or perform a read-modify-write sequence that spans multiple variables. Use Cell<T> or RefCell<T> when you need interior mutability within a single thread and don't care about concurrency. Pick SeqCst ordering when you are unsure about memory ordering; it provides the strongest guarantees and is the closest match to the mental model of a sequential program, at a performance cost that is usually small on x86 and can be larger on ARM.
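As an illustration of the shutdown-signal case, here is a minimal sketch. The function name run_worker_until_shutdown and the 10 ms delay are assumptions chosen for the example, and the "work" is just a loop counter. It uses SeqCst in line with the advice above; a weaker ordering would also be defensible here because the flag guards no other data.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: starts a worker, signals shutdown, and returns
/// how many loop iterations the worker completed.
fn run_worker_until_shutdown() -> u64 {
    let shutdown = Arc::new(AtomicBool::new(false));

    let worker = {
        let shutdown = Arc::clone(&shutdown);
        thread::spawn(move || {
            let mut iterations: u64 = 0;
            loop {
                iterations += 1; // stand-in for one unit of real work
                if shutdown.load(Ordering::SeqCst) {
                    break;
                }
                thread::yield_now();
            }
            iterations
        })
    };

    // Let the worker run briefly, then ask it to stop.
    thread::sleep(Duration::from_millis(10));
    shutdown.store(true, Ordering::SeqCst);

    worker.join().unwrap()
}

fn main() {
    let iterations = run_worker_until_shutdown();
    println!("worker stopped after {} iterations", iterations);
}
```

The worker checks the flag once per iteration, so shutdown latency is bounded by how long one unit of work takes.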