When one thread isn't enough
You're building a web server. It needs to handle a thousand incoming requests at once. Some requests are just looking up a user in a database. Others are resizing a massive image. If you try to do everything on one thread, the server freezes when the image resize starts. You need a way to juggle the requests without dropping any, and a way to blast through the image work using all your CPU cores. That's the split between concurrency and parallelism.
Concurrency is about structure and coordination. It's managing multiple tasks that make progress over time, even if they don't run at the exact same nanosecond. Parallelism is about execution. It's doing multiple things at the exact same time, usually on different hardware cores.
Think of a coffee shop. Concurrency is one barista taking an order, starting to grind beans, then switching to pour water while the beans grind, then switching back to hand off a drink. The barista is juggling tasks. Parallelism is hiring three baristas so three drinks get made at the same time. You can have concurrency without parallelism (one barista juggling). You can have parallelism without concurrency (three baristas each making one drink sequentially). Rust gives you tools for both, and they solve different problems.
Minimal examples
Concurrency in Rust usually means async and await. You write code that looks synchronous, but the compiler turns it into a state machine that can pause and resume. Parallelism usually means threads. You spawn a new OS thread and run code on it.
use std::thread;
use std::time::Duration;
/// Concurrency: Tasks pause and yield control.
/// join! polls both futures inside one task, so they interleave on a single thread.
async fn concurrent_tasks() {
    let task_a = async {
        // Simulate network wait. The runtime pauses this task.
        // Other tasks can run while we wait.
        tokio::time::sleep(Duration::from_millis(100)).await;
        println!("Task A done");
    };
    let task_b = async {
        tokio::time::sleep(Duration::from_millis(100)).await;
        println!("Task B done");
    };
    // Run both concurrently. Total time ~100ms, not 200ms.
    // The runtime switches between them while sleeping.
    tokio::join!(task_a, task_b);
}
/// Parallelism: Tasks run on separate OS threads.
/// On a multi-core machine they can execute simultaneously.
fn parallel_tasks() {
    let handle_a = thread::spawn(|| {
        // The OS picks the core; it may be a different one from the main thread's.
        // CPU work happens here while the main thread does other work.
        println!("Thread A working");
    });
    let handle_b = thread::spawn(|| {
        println!("Thread B working");
    });
    // Wait for threads to finish.
    handle_a.join().unwrap();
    handle_b.join().unwrap();
}
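In the async example, both futures share a thread. They switch only when they hit an await point. (On a multi-threaded runtime, separately spawned tasks can also land on different worker threads.) Threads run independently. The OS scheduler preempts them whenever it chooses, so they don't need await points to make room for each other.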
How async works under the hood
When you write async fn, the compiler doesn't create a thread. It turns your function into a state machine: an anonymous type that records which await point the function is paused at, plus the local variables that must survive that pause. The runtime calls its poll method, which runs the code up to the next await point. If the awaited operation isn't ready, poll registers a waker and returns Pending. The runtime puts the task aside and runs something else. No thread is blocked.
The Future trait is the engine under async. Every async fn returns a type that implements Future. The runtime calls poll through a pinned mutable reference (the method takes self: Pin<&mut Self>). If poll returns Poll::Ready, the task is done. If it returns Poll::Pending, the runtime sets the task aside and moves on. The key piece is the Waker. When the IO completes, the runtime's IO driver, which learns about readiness from the kernel, calls the waker, and the waker schedules the task to be polled again. This avoids polling loops and busy waiting.
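To make the poll and wake cycle concrete, here is a minimal sketch using only the standard library. CountdownFuture, ThreadWaker, and block_on are hypothetical names invented for this illustration, and the toy executor is nothing like how Tokio is built. It drives a single future and parks the thread between polls.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical future that returns Pending a fixed number of times
/// before completing, so the poll/wake cycle is visible.
struct CountdownFuture {
    remaining: u32,
}

impl Future for CountdownFuture {
    type Output = &'static str;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            return Poll::Ready("done");
        }
        self.remaining -= 1;
        // A real IO future would hand this waker to the runtime's IO driver.
        // Here we wake immediately so the executor polls us again.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Waker that unparks the thread running the executor.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Toy executor for one future: poll, park until woken, poll again.
/// Returns the output and the number of poll calls it took.
fn block_on<F: Future>(future: F) -> (F::Output, u32) {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return (output, polls),
            // Sleep until some waker calls unpark on this thread.
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let (output, polls) = block_on(CountdownFuture { remaining: 3 });
    println!("{output} after {polls} polls");
}
```

The future returns Pending three times and Ready on the fourth poll. Between polls the executor thread is parked, not spinning, which is the point of the waker.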
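A spawned async task keeps its state in a heap allocation sized to its state machine, and runs on the stack of whichever worker thread polls it. That makes tasks cheap: a runtime can hold hundreds of thousands of them, or more, with little memory each. Threads each carry their own stack. Rust's std::thread::spawn defaults to 2 MiB. Spawn 1000 threads and you reserve about 2 GB of address space for stacks alone, although the OS typically commits physical memory only for the pages a thread actually touches. Async scales better for high connection counts.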
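If you do need many threads, the per-thread stack size is adjustable. The sketch below uses std::thread::Builder::stack_size. The helper name spawn_small_stack_threads and the 64 KiB figure are arbitrary choices for illustration, not recommendations, and the platform may round the size up to its own minimum.

```rust
use std::thread;

/// Spawn `count` threads, each with an explicitly requested stack size,
/// and return the sum of their results.
fn spawn_small_stack_threads(count: u64, stack_bytes: usize) -> std::io::Result<u64> {
    let mut handles = Vec::new();
    for i in 0..count {
        let handle = thread::Builder::new()
            .stack_size(stack_bytes)
            // Builder::spawn returns an io::Result instead of panicking
            // when the OS refuses to create the thread.
            .spawn(move || i * 2)?;
        handles.push(handle);
    }
    let mut total = 0;
    for handle in handles {
        total += handle.join().expect("worker thread panicked");
    }
    Ok(total)
}

fn main() -> std::io::Result<()> {
    // 64 KiB per thread: an assumed value, small enough to show the knob.
    let total = spawn_small_stack_threads(100, 64 * 1024)?;
    println!("total = {total}");
    Ok(())
}
```

A small stack only works if the thread's call depth and locals stay small. Overflowing it aborts the process.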
Don't treat async as a magic performance boost. It only helps if you yield control. Write code that awaits often.
How threads work under the hood
std::thread::spawn asks the OS for a new thread. The OS gives you a stack and schedules it on a core. If the thread blocks on IO, the OS pauses it and switches to another thread. This costs more memory and context switching is heavier than async switching.
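Here is a small sketch of threads doing what they are good at: splitting CPU work across cores. It uses std::thread::scope so the workers can borrow the input slice. The parallel_sum helper and its chunking strategy are assumptions made for this example, not a tuned implementation.

```rust
use std::thread;

/// Sum a slice by splitting it into chunks and summing each chunk
/// on its own scoped thread.
fn parallel_sum(values: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in some chunk;
    // chunks() panics on a length of zero, hence the max(1).
    let chunk_len = ((values.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker first, then join, so the chunks overlap in time.
        let handles: Vec<_> = values
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

fn main() {
    // Ask the OS how many threads can usefully run at once; fall back to 1.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let values: Vec<u64> = (1..=1000).collect();
    println!("sum = {} using up to {} workers", parallel_sum(&values, workers), workers);
}
```

For a workload this small the thread overhead outweighs the gain. The pattern pays off when each chunk takes real CPU time.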
Threads are isolated. Each thread has its own stack and registers. The OS manages the scheduling. Rust adds safety on top. Data must be Send to move across threads. Data must be Sync to be shared between threads. The compiler enforces these bounds.
use std::rc::Rc;
use std::thread;
/// This fails to compile.
/// Rc is not Send. You can't move it across threads.
fn bad_parallelism() {
    let data = Rc::new("hello");
    // `move` transfers ownership of `data` into the closure,
    // which is what requires it to be Send.
    thread::spawn(move || {
        // error[E0277]: `Rc<&str>` cannot be sent between threads safely
        println!("{}", data);
    });
}
The compiler rejects this with E0277 (trait bound not satisfied). Rc uses reference counting without atomic operations. It's fast for single-threaded code, but unsafe for threads. Use Arc instead. Arc stands for Atomically Reference Counted. It uses atomic operations to update the counter safely across threads.
Convention aside: write Arc::clone(&data) instead of data.clone(). Both compile and work. The explicit form signals to readers that you're cloning the pointer, not the data. data.clone() looks like a deep clone but isn't.
Treat Send as a contract. If the compiler says it's not Send, believe it.
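A minimal sketch of the Arc fix, written with the explicit Arc::clone form. The share_with_arc helper is a hypothetical name used only for this example.

```rust
use std::sync::Arc;
use std::thread;

/// Share one String with several threads through Arc and return
/// the length each thread observed.
fn share_with_arc(thread_count: usize) -> Vec<usize> {
    let data = Arc::new(String::from("hello"));
    let handles: Vec<_> = (0..thread_count)
        .map(|_| {
            // Clones the pointer and bumps the atomic count.
            // The String itself is not copied.
            let data = Arc::clone(&data);
            thread::spawn(move || data.len())
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("thread panicked"))
        .collect()
}

fn main() {
    let data = Arc::new(String::from("hello"));
    let second = Arc::clone(&data);
    // Both handles point at the same allocation.
    assert!(Arc::ptr_eq(&data, &second));
    println!("strong count = {}", Arc::strong_count(&data));
    println!("{:?}", share_with_arc(3));
}
```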
Realistic example: web server profile fetch
A web server needs to fetch a user profile. The profile includes user data from a database and an image from a CDN. Both are IO bound. You want to fetch them concurrently to reduce latency.
use tokio::task;
/// Fetch user data from database.
/// Simulates IO wait.
async fn fetch_user_data(user_id: u32) -> String {
    // In real code, this would be a database query.
    // The .await yields control to the runtime.
    tokio::time::sleep(std::time::Duration::from_millis(50)).await;
    format!("User data for {}", user_id)
}
/// Fetch image from CDN.
/// Simulates IO wait.
async fn fetch_image(user_id: u32) -> Vec<u8> {
    // In real code, this would be an HTTP request.
    tokio::time::sleep(std::time::Duration::from_millis(50)).await;
    vec![0u8; 1024]
}
/// Get user profile by fetching data and image concurrently.
/// Uses tokio::spawn to run tasks on the runtime's thread pool.
async fn get_user_profile(user_id: u32) -> (String, Vec<u8>) {
    // Spawn two concurrent tasks.
    // On the multi-threaded runtime they may land on different worker threads.
    // Either way, each one yields while it waits.
    let user_handle = task::spawn(fetch_user_data(user_id));
    let image_handle = task::spawn(fetch_image(user_id));
    // Await both handles.
    // Total time ~50ms, not 100ms.
    let user_data = user_handle.await.unwrap();
    let image_data = image_handle.await.unwrap();
    (user_data, image_data)
}
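If the image processing is CPU heavy, running it directly inside an async task ties up the worker thread that task runs on, and every other task scheduled there stalls. Move CPU work onto a thread pool or a std::thread instead.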
use tokio::task;
/// Process image on a separate thread.
/// Uses spawn_blocking to avoid blocking the async runtime.
async fn process_image(image_data: Vec<u8>) -> Vec<u8> {
    // CPU work blocks the thread.
    // spawn_blocking moves this to a blocking thread pool.
    task::spawn_blocking(move || {
        // Heavy CPU work here.
        // This runs on a dedicated thread.
        image_data.iter().map(|&b| b.wrapping_add(1)).collect()
    })
    .await
    .unwrap()
}
Convention aside: use tokio::task::spawn_blocking for CPU work or blocking library calls in async contexts. Never block the async runtime with a heavy loop. Tokio won't panic or warn by default when a task hogs a worker thread. Diagnostic tools like tokio-console can help you spot tasks that run for a long time without yielding.
Async is for IO. Threads are for CPU. Mixing them without spawn_blocking kills your throughput.
Pitfalls and compiler errors
Blocking the async runtime is the biggest pitfall. If you put a heavy loop inside an async fn without yielding, you tie up the worker thread it runs on. On a single-threaded runtime, that is the entire runtime. Other tasks starve. Nothing panics; the server just gets slow, and you usually need a tool like tokio-console to see which task is failing to yield. This is the most common mistake beginners make. They treat async as a magic performance boost, but it only helps if you yield control.
Data races are prevented by the compiler. For parallelism, data must be Send to move across threads. For concurrency, if you share data between tasks, you need thread-safe types if the tasks might run on different threads. Arc instead of Rc. Mutex instead of RefCell.
use std::sync::Arc;
use std::sync::Mutex;
/// Share data between threads safely.
/// Arc provides shared ownership. Mutex provides exclusive access.
fn safe_parallelism() {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));
    let handle = std::thread::spawn({
        let data = Arc::clone(&data);
        move || {
            // Lock the mutex to modify data.
            // This blocks the thread until the lock is acquired.
            let mut vec = data.lock().unwrap();
            vec.push(4);
        }
    });
    handle.join().unwrap();
}
The compiler enforces Send and Sync bounds. If you try to move a non-Send type across a thread, you get E0277. If you try to share a non-Sync type between threads, you get E0277. These errors save you from data races at compile time.
Don't fight the compiler here. Reach for Arc and Mutex when sharing data across threads.
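You can ask the compiler directly whether a type is Send or Sync before you ever spawn a thread. This sketch uses two empty generic functions as compile-time checks. assert_send, assert_sync, and check_thread_safe_types are hypothetical helper names, not part of the standard library.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};

/// These helpers do nothing at runtime. A call compiles only when
/// the type argument satisfies the bound.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

/// Returns true if it compiles at all; the checks happen at compile time.
fn check_thread_safe_types() -> bool {
    // Arc<Mutex<T>> can be both moved to and shared between threads.
    assert_send::<Arc<Mutex<Vec<i32>>>>();
    assert_sync::<Arc<Mutex<Vec<i32>>>>();
    // RefCell<i32> can be moved to another thread, but not shared.
    assert_send::<RefCell<i32>>();
    // Uncommenting either line below fails with E0277:
    // assert_send::<std::rc::Rc<i32>>();
    // assert_sync::<RefCell<i32>>();
    true
}

fn main() {
    println!("all bounds hold: {}", check_thread_safe_types());
}
```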
Decision matrix
Use async/await when your tasks spend most of their time waiting for IO, network, or disk. Use std::thread::spawn when you have CPU-bound work that needs to run on multiple cores simultaneously. Use tokio::task::spawn_blocking when you must call a blocking library or run CPU work from within an async context. Use Rayon when you are doing data-parallel map-reduce style operations on collections. Use channels when you need to pass messages instead of sharing state: std::sync::mpsc between threads, tokio::sync::mpsc between async tasks.
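The message-passing option is the easiest of these to show with the standard library alone. A minimal sketch, with collect_squares as a hypothetical helper: several producer threads send results over one std::sync::mpsc channel and the caller collects them.

```rust
use std::sync::mpsc;
use std::thread;

/// Spawn `producers` threads that each send one value,
/// then gather everything the channel delivers.
fn collect_squares(producers: u32) -> Vec<u32> {
    let (tx, rx) = mpsc::channel();
    for id in 0..producers {
        // Each thread gets its own clone of the sender.
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(id * id).expect("receiver dropped");
        });
    }
    // Drop the original sender so the receiving iterator ends
    // once every producer has finished and dropped its clone.
    drop(tx);
    let mut results: Vec<u32> = rx.iter().collect();
    // Arrival order depends on scheduling, so sort for a stable result.
    results.sort_unstable();
    results
}

fn main() {
    println!("{:?}", collect_squares(4));
}
```

No Mutex appears anywhere. Ownership of each value moves through the channel, so there is nothing to lock.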
Use Rc for single-threaded shared ownership. Use Arc for multi-threaded shared ownership. Use RefCell for interior mutability in single-threaded code. Use Mutex for interior mutability in multi-threaded code.
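Side by side, the two pairings look almost identical, which is the point. A short sketch with hypothetical function names: Rc with RefCell for one thread, Arc with Mutex when a second thread is involved.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Single-threaded: Rc for shared ownership, RefCell for mutation.
fn single_threaded_push() -> Vec<i32> {
    let shared = Rc::new(RefCell::new(vec![1, 2, 3]));
    let other_owner = Rc::clone(&shared);
    // borrow_mut checks the borrow rules at runtime and panics on conflict.
    other_owner.borrow_mut().push(4);
    let result = shared.borrow().clone();
    result
}

/// Multi-threaded: Arc for shared ownership, Mutex for mutation.
fn multi_threaded_push() -> Vec<i32> {
    let shared = Arc::new(Mutex::new(vec![1, 2, 3]));
    let worker = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            // lock blocks until the mutex is free.
            shared.lock().expect("mutex poisoned").push(4);
        })
    };
    worker.join().expect("worker panicked");
    let result = shared.lock().expect("mutex poisoned").clone();
    result
}

fn main() {
    println!("{:?}", single_threaded_push());
    println!("{:?}", multi_threaded_push());
}
```

Swapping Rc and RefCell into the second function would not compile, because neither type can cross the thread::spawn boundary.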
Counter-intuitive but true: the more you use threads, the harder your code becomes to reason about. Prefer async for IO-heavy workloads. Use threads only when you need parallel CPU execution.