Performance: When to Use Async vs Threads in Rust

Use threads for CPU-bound tasks that require true parallelism across multiple cores, and use async for I/O-bound tasks that need to handle many concurrent operations efficiently on a small number of threads. Threads rely on the OS scheduler to run code simultaneously, while async uses a runtime to multiplex many tasks across a few threads.

When one thread isn't enough, and a thousand threads is too many

You're building a service. One request asks for a heavy image resize. Another asks for a user profile from a database. The database takes 200ms. The image resize takes 50ms. If you handle both on the same thread, the profile request waits 50ms for the image to finish. If you spawn a new thread for every request, a traffic spike crashes your server with memory exhaustion. You need a way to do the heavy math without stopping the world, and a way to wait for the database without wasting a thread.

Rust gives you two tools for this. Threads provide true parallelism by handing work to the OS. Async provides high concurrency by multiplexing many tasks across a few threads. Picking the wrong one kills your performance. Picking the right one scales your system.

Threads and async: two different tools

Threads are like hiring multiple chefs. Each chef works on one dish at a time. If a chef is chopping vegetables, they are busy. If they need to wait for the oven, they stand there waiting. You get parallelism: two chefs can chop at the same time. But chefs are expensive. You can't hire a thousand chefs for a small kitchen. Each chef needs a station, tools, and a salary. In Rust, each spawned thread gets a 2MB stack by default and costs CPU cycles to create and switch.

Async is like one chef with a stopwatch. The chef starts a dish, puts it in the oven, and immediately moves to the next station. When the oven timer rings, the chef returns to that dish. The chef never waits. The chef handles hundreds of dishes by switching between them instantly. This is concurrency, not parallelism. The chef is still one person. In Rust, async tasks are lightweight state machines. You can run millions of them on a single thread. The runtime switches between them in nanoseconds.

Threads give you parallelism. Async gives you concurrency. You often need both.

The cost of a thread

Operating system threads are heavy. When you call std::thread::spawn, the OS reserves a stack (2MB by default in Rust) and sets up kernel structures. Creating a thread takes microseconds. Switching between threads requires a context switch. The kernel saves the CPU registers, picks the next thread, and loads that thread's state. This takes microseconds, and the incoming thread usually finds the CPU caches filled with another thread's data, so it starts out slow.

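If you spawn a thread for every incoming request, you hit limits fast. A server handling 10,000 concurrent requests reserves 20GB of stack address space just for threads. Only the pages a thread touches become resident memory, but the scheduler still has 10,000 threads to juggle. The CPU spends more time switching contexts than running code. The system thrashes.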

use std::thread;

fn main() {
    // Spawning a thread is expensive.
    // The OS reserves a stack (2MB by default) and sets up kernel structures.
    let handle = thread::spawn(|| {
        // This runs in parallel with main.
        // If this does heavy math, it uses a CPU core fully.
        // The sum is 499_999_500_000, which overflows i32, so use u64.
        let result: u64 = (0..1_000_000u64).sum();
        result
    });

    // Main thread blocks here until the worker finishes.
    // Work that should overlap with the worker has to happen before the join.
    let result = handle.join().unwrap();
    println!("sum = {result}");
}
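
The stack arithmetic above is easy to check, and the per-thread stack size is tunable through std::thread::Builder. The following is a minimal sketch, assuming the 2 MiB default that Rust's standard library requests for spawned threads. The helper names stack_budget_bytes and spawn_small are hypothetical, and the 64 KiB figure is only an illustrative choice, not a recommendation.

```rust
use std::thread;

/// Default stack size Rust's standard library requests for spawned threads.
const DEFAULT_STACK_BYTES: u64 = 2 * 1024 * 1024;

/// Hypothetical helper: address space reserved for `threads` stacks.
fn stack_budget_bytes(threads: u64, stack_bytes: u64) -> u64 {
    threads * stack_bytes
}

/// Hypothetical helper: run a small job on a thread with a reduced stack.
fn spawn_small(stack_bytes: usize) -> u64 {
    let handle = thread::Builder::new()
        .stack_size(stack_bytes) // the platform may round this up to its minimum
        .spawn(|| (0..1_000u64).sum::<u64>())
        .expect("the OS refused to create the thread");
    handle.join().expect("worker panicked")
}

fn main() {
    let mib = 1024 * 1024;
    let default = stack_budget_bytes(10_000, DEFAULT_STACK_BYTES);
    let small = stack_budget_bytes(10_000, 64 * 1024);
    println!("10,000 default stacks: {} MiB", default / mib);
    println!("10,000 small stacks:   {} MiB", small / mib);
    println!("sum on a small stack:  {}", spawn_small(64 * 1024));
}
```

Shrinking the stack lowers the reservation per thread, but it does nothing about scheduling cost, and a stack that is too small overflows on deep recursion.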

Threads shine when you have work that saturates a CPU core. If a task is doing math, encryption, or compression, it needs the core. A thread keeps that core busy. Threads also shine for FFI calls that block the OS. If you call a C library that sleeps or waits for a file, the thread blocks, but other threads keep running.
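
To make "a thread keeps that core busy" concrete, here is a small sketch that splits one CPU-bound job across several OS threads with std::thread::scope. The function name parallel_sum and the chunking strategy are illustrative assumptions; a production system would more likely reach for a work-stealing pool.

```rust
use std::thread;

/// Hypothetical helper: sum `data` by splitting it across `workers` OS threads.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in some chunk.
    // `chunks` panics on a length of zero, hence the final `.max(1)`.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Scoped threads may borrow `data` because they are joined before `scope` returns.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (0..1_000_000).collect();
    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    println!("sum = {}", parallel_sum(&data, workers));
}
```

Each chunk runs on its own thread, so the partial sums really do execute at the same time on a multi-core machine. Spawning more workers than cores buys nothing for this kind of work.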

How async actually works

Async in Rust is not magic. The compiler transforms your async fn into a state machine struct. Each .await point becomes a state in the machine. The struct holds all the local variables and the current state.

When you poll the future, the state machine runs until the next .await. If the I/O is ready, it advances to the next state. If the I/O isn't ready, it returns Pending and tells the runtime to wake it up when the I/O is ready. The runtime stores the state machine and moves on to poll another future.

This switch is cheap. It's just a function call and a state check. No kernel involvement. No cache flush. You can switch thousands of tasks in the time it takes to switch one OS thread.
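
The state machine described above can be written by hand. The sketch below is not what the compiler literally generates. It is a simplified stand-in with made-up names (Fetch, State, poll_to_completion), and it fakes the I/O source with a counter. The driver busy-polls with a no-op waker, which a real runtime would never do.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// States for something like `async { connect().await; "response" }`.
enum State {
    Start,
    WaitingForIo,
    Done,
}

/// Hand-written stand-in for a compiler-generated future.
/// `polls_until_ready` fakes an I/O source that is not ready at first.
struct Fetch {
    state: State,
    polls_until_ready: u32,
}

impl Fetch {
    fn new(polls_until_ready: u32) -> Self {
        Fetch { state: State::Start, polls_until_ready }
    }
}

impl Future for Fetch {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Fetch has no self-references, so it is Unpin and get_mut is allowed.
        let this = self.get_mut();
        loop {
            match this.state {
                // Run up to the first await point.
                State::Start => this.state = State::WaitingForIo,
                State::WaitingForIo => {
                    if this.polls_until_ready == 0 {
                        this.state = State::Done;
                        return Poll::Ready("response");
                    }
                    this.polls_until_ready -= 1;
                    // A real I/O source would keep the waker and call it later.
                    // Here we simply ask to be polled again.
                    cx.waker().wake_by_ref();
                    return Poll::Pending;
                }
                State::Done => panic!("future polled after completion"),
            }
        }
    }
}

/// Waker that does nothing; enough for a busy-polling demo driver.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy driver: polls until ready and reports how many polls it took.
fn poll_to_completion(mut fut: Fetch) -> (&'static str, u32) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (out, polls);
        }
    }
}

fn main() {
    let (out, polls) = poll_to_completion(Fetch::new(2));
    println!("{out} after {polls} polls");
}
```

Every call to poll is an ordinary function call that inspects a field and returns. That is the whole cost of a task switch here, which is why it involves no kernel work.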

use tokio;

#[tokio::main]
async fn main() {
    // This task runs on the runtime's thread pool.
    // The compiler turns this function into a state machine.
    let _ = tokio::net::TcpStream::connect("127.0.0.1:8080").await;
}

The runtime manages a thread pool. By default, tokio uses a multi-threaded scheduler. It creates one thread per CPU core. Tasks are distributed across these threads. When a task yields at an .await, the thread picks up another task from the queue. The thread never sits idle waiting for I/O. It always has work.

Convention aside: tokio::join! is the standard way to wait for multiple tasks. It's cleaner than managing a vector of handles and avoids allocating a vector for the join. Use join! when you have a fixed number of tasks.

The realistic pattern: mixing both

Real applications have both I/O and CPU work. You fetch data from a database, then process it, then send a response. The fetch is I/O bound. The processing is CPU bound. The response is I/O bound.

If you run the CPU work on the async runtime, you block the thread. Other tasks waiting for I/O can't run. The whole system slows down. The solution is spawn_blocking. This moves the CPU work to a separate thread pool dedicated to blocking tasks. The async runtime stays free to handle I/O.

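use tokio;

// Box<dyn Error> alone is not Send, so tokio::spawn would reject it.
type BoxError = Box<dyn std::error::Error + Send + Sync>;

#[tokio::main]
async fn main() {
    // Fetch data from three sources concurrently.
    // These tasks run on the async runtime.
    let f1 = tokio::spawn(fetch_data("http://api1.com"));
    let f2 = tokio::spawn(fetch_data("http://api2.com"));
    let f3 = tokio::spawn(fetch_data("http://api3.com"));

    // Wait for all fetches to complete.
    // join! is efficient and avoids vector allocation.
    let (d1, d2, d3) = tokio::join!(f1, f2, f3);

    // Each handle yields Result<Result<Vec<u8>, BoxError>, JoinError>.
    // The first expect covers a panicked task, the second a failed fetch.
    let d1 = d1.expect("task 1 panicked").expect("fetch 1 failed");
    let d2 = d2.expect("task 2 panicked").expect("fetch 2 failed");
    let d3 = d3.expect("task 3 panicked").expect("fetch 3 failed");

    // Processing is heavy.
    // Don't block the async runtime with CPU work.
    // Use spawn_blocking to move this to a thread pool.
    let result = tokio::task::spawn_blocking(move || heavy_compute(&d1, &d2, &d3))
        .await
        .expect("compute task panicked");
    println!("result = {result}");
}

async fn fetch_data(_url: &str) -> Result<Vec<u8>, BoxError> {
    // Simulate network I/O.
    // The .await yields the thread while waiting.
    // The runtime switches to other tasks.
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
    Ok(vec![1, 2, 3])
}

fn heavy_compute(d1: &[u8], d2: &[u8], d3: &[u8]) -> i32 {
    // CPU bound work.
    // This runs on a dedicated thread, not the async executor.
    // Widen to i32 first: adding u8 values to an i32 sum does not type-check.
    let step = i32::from(d1[0]) + i32::from(d2[0]) + i32::from(d3[0]);
    let mut sum = 0;
    for _ in 0..1_000_000 {
        sum += step;
    }
    sum
}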

Convention aside: Keep spawn_blocking blocks small. The thread pool is shared. A huge blocking task can starve other tasks. If you have a long CPU job, break it into chunks or use a dedicated thread pool.
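
One way to read "use a dedicated thread pool" is a long-lived worker thread that owns all heavy computation and receives jobs over a channel. The sketch below uses only the standard library and a single worker. The names Job, checksum, start_worker, and run_job are hypothetical, and checksum merely stands in for real CPU work.

```rust
use std::sync::mpsc;
use std::thread;

/// A job for the worker: the input plus a channel to send the answer back on.
struct Job {
    input: Vec<u8>,
    reply: mpsc::Sender<u64>,
}

/// Hypothetical CPU-heavy function standing in for real work.
fn checksum(bytes: &[u8]) -> u64 {
    bytes.iter().map(|&b| u64::from(b)).sum()
}

/// Spawn one long-lived worker thread that owns all heavy computation.
/// Dropping every sender shuts the worker down.
fn start_worker() -> (mpsc::Sender<Job>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<Job>();
    let handle = thread::spawn(move || {
        // `recv` returns Err once all senders are gone, which ends the loop.
        while let Ok(job) = rx.recv() {
            // Ignore the error if the requester stopped waiting for the reply.
            let _ = job.reply.send(checksum(&job.input));
        }
    });
    (tx, handle)
}

/// Submit one job and block the calling thread until its result arrives.
fn run_job(worker: &mpsc::Sender<Job>, input: Vec<u8>) -> u64 {
    let (reply, result) = mpsc::channel();
    worker
        .send(Job { input, reply })
        .expect("worker has shut down");
    result.recv().expect("worker dropped the reply channel")
}

fn main() {
    let (worker, handle) = start_worker();
    println!("checksum = {}", run_job(&worker, vec![1, 2, 3]));
    drop(worker); // no senders left, so the worker loop exits
    handle.join().expect("worker panicked");
}
```

Note that run_job blocks its caller, which is fine from a plain thread but not from an async task. Inside async code the reply would need to travel over an async channel so that waiting for the result yields instead of blocking.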

Pitfalls that trip everyone up

Blocking the async runtime is the cardinal sin. If you call a blocking function like std::thread::sleep or a synchronous file read inside an async function, you freeze the worker thread. All other tasks scheduled on that thread stall until the call returns. Block enough workers at once and your service stops responding.

The compiler won't stop you from blocking. Rust doesn't know which functions block. You have to read the docs. If a function doesn't return a Future, it likely blocks. Wrap it in spawn_blocking or find an async alternative.

Another trap is holding locks across .await points. If you lock a std::sync::Mutex and then .await, you hold the lock while the task yields. Another task on the same thread tries to lock it and blocks. The first task never resumes because the thread is blocked. Deadlock.

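Use tokio::sync::Mutex when a guard has to live across an .await. Its lock() is itself async, so a waiting task yields to the runtime instead of blocking the thread. Otherwise, keep std::sync::Mutex and drop the guard before the .await.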
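
Dropping the lock before the .await usually means confining the guard to an inner block. The sketch below shows that shape with std::sync::Mutex. The functions record and flush are hypothetical, flush stands in for real I/O, and block_on is a toy busy-polling executor included only so the example runs without a runtime crate.

```rust
use std::future::Future;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical async step standing in for network or disk I/O.
async fn flush(_value: u64) {
    std::future::ready(()).await;
}

/// Update shared state, then await, without holding the guard across the await.
async fn record(counter: &Mutex<u64>, delta: u64) -> u64 {
    let snapshot = {
        let mut guard = counter.lock().expect("mutex poisoned");
        *guard += delta;
        *guard
    }; // guard is dropped here, so the lock is free before the task yields

    flush(snapshot).await;
    snapshot
}

/// Waker that does nothing; enough for the toy executor below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: polls in a loop until the future completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = std::pin::pin!(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    let counter = Mutex::new(0);
    println!("first:  {}", block_on(record(&counter, 5)));
    println!("second: {}", block_on(record(&counter, 2)));
}
```

Because the guard never exists at an await point, no other task can find the mutex locked while this one is suspended.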

The compiler helps with data sharing. If you try to move a type that isn't Send into a thread, you get E0277 (trait bound not satisfied). This stops you from sharing data across threads in ways that could cause data races. Trust the compiler here. If it rejects the code, the type isn't safe to send.
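
A short sketch of the Send check in practice: Arc<Mutex<T>> crosses thread boundaries, while the commented-out Rc version is rejected at compile time. The helper name count_in_parallel is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: each of `workers` threads adds 1 to a shared counter
/// `per_worker` times, and the final total is returned.
fn count_in_parallel(workers: usize, per_worker: u64) -> u64 {
    // Arc<Mutex<u64>> is Send + Sync, so clones may move into other threads.
    let total = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    *total.lock().expect("mutex poisoned") += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker panicked");
    }

    let result = *total.lock().expect("mutex poisoned");
    result
}

fn main() {
    // Swapping Arc for std::rc::Rc does not compile:
    //     let shared = std::rc::Rc::new(0u64);
    //     thread::spawn(move || println!("{shared}"));
    // error[E0277]: `Rc<u64>` cannot be sent between threads safely
    println!("total = {}", count_in_parallel(4, 1_000));
}
```

Rc uses a non-atomic reference count, so two threads cloning or dropping it at once would corrupt the count. The Send bound on thread::spawn is what turns that bug into a compile error.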

Never block the async runtime. If you block, every task on that worker thread waits with you.

Decision: picking the right tool

Use threads for CPU-bound tasks that require true parallelism across multiple cores. Use threads when you call external libraries that block the OS and you can't make them async. Use threads for long-running background workers that don't need to yield frequently.

Use async for I/O-bound tasks that spend most of their time waiting for the network, disk, or database. Use async when you need to handle thousands of concurrent connections with low memory overhead. Use async for reactive pipelines where data flows through stages and pauses frequently.

Use tokio::task::spawn_blocking when you must perform heavy computation inside an async function. Use spawn_blocking to wrap synchronous code that holds locks or performs CPU work, keeping the async runtime free to handle other tasks. Reach for async channels like tokio::sync::mpsc when you need to communicate between tasks without blocking.

Match the tool to the bottleneck. CPU bound? Threads. I/O bound? Async. Mixed? Both.

Where to go next

Think of threads as hiring multiple workers to do heavy lifting at the same time, while async is like one super-efficient worker who switches between tasks instantly whenever they have to wait for something. Use threads when your program is stuck doing math or processing data, and use async when your program is mostly waiting for files, databases, or the internet to respond.