The scaling wall
You are building a web scraper. You write a loop that fetches a URL, parses the HTML, and extracts the title. In Python, you reach for threading. You create a thread for every URL. At 100 URLs, your memory usage climbs. Each thread reserves a stack, and the OS has to manage every one of those execution contexts. At 1,000 URLs, the system starts swapping. The CPU spends more time switching threads than processing data. You hit a wall.
You switch to asyncio. You rewrite the loop with async and await. Suddenly you handle 10,000 URLs on a single core. Memory usage stays flat. The CPU stays cool. The code looks similar, but the execution model is completely different.
Rust gives you both tools. You can use OS threads for heavy lifting. You can use async tasks for high-concurrency I/O. The choice determines how your code scales, how the compiler checks your safety, and how you structure your application.
Threads vs tasks: the mental model
Threads are managed by the operating system. The OS allocates a stack, schedules the thread on a CPU core, and handles context switches. A context switch saves the registers of the current thread and loads the registers of the next one. This costs time. Each thread also reserves a fixed region of address space for its stack: 2 MiB by default for threads spawned through Rust's std::thread, and typically 8 MiB for native threads on Linux. If you spawn 10,000 threads at 2 MiB each, you reserve about 20 GB of virtual memory. Only the pages a thread actually touches are backed by physical memory, but the kernel still has to track and schedule every thread. With thousands of runnable threads, the CPU spends a growing share of its cycles saving and restoring state instead of executing your code.
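The stack reservation is something you can see and tune from Rust. The sketch below is a minimal illustration using only the standard library; the function name sum_on_threads and the 128 KiB stack size are assumptions chosen for the example, not recommendations.

```rust
use std::thread;

/// Hypothetical helper: spawns `n` OS threads, each with an explicit stack
/// reservation of `stack_bytes`, and sums the values they return.
fn sum_on_threads(n: u64, stack_bytes: usize) -> u64 {
    let handles: Vec<_> = (0..n)
        .map(|i| {
            thread::Builder::new()
                // Without this call, std::thread reserves 2 MiB per thread.
                .stack_size(stack_bytes)
                .spawn(move || i * 2)
                .expect("the OS refused to create another thread")
        })
        .collect();

    // Joining blocks until each thread has finished and yields its result.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .sum()
}

fn main() {
    // 64 threads, 128 KiB of stack each (an arbitrary value for this sketch).
    let total = sum_on_threads(64, 128 * 1024);
    println!("total = {total}");
}
```

Shrinking the stack lowers the reservation per thread, but the kernel still creates, schedules, and tears down every one of them. That per-thread cost is what async tasks avoid.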
Async tasks are managed by a runtime library in user space. The runtime maintains a pool of OS threads, typically one per CPU core. It schedules thousands of tasks across those threads. Tasks yield control voluntarily. When a task waits for I/O, it tells the runtime, "I am stuck. Wake me up when the data arrives." The runtime switches to another task on the same thread. No OS context switch. No stack swap. Just a function call return and a resume.
Think of threads as dedicated lanes on a highway. The OS manages the traffic lights. If a car stops, its lane is blocked. An async runtime is more like a single juggler with many balls. While one ball is in the air, the juggler handles the others. When it comes back down, the juggler catches it and carries on. The juggler never stops moving.
How async actually works
Async is not magic. It is code generation. The compiler transforms async fn into a state machine. This is the key insight.
An async fn returns a type that implements the Future trait. The compiler generates an anonymous type that holds the local variables that must survive a suspension, plus a tag recording the current state. The Future trait has a single method, poll. The runtime calls poll to advance the task. If the operation is ready, poll returns Poll::Ready with the result. If not, the future stores the Waker it received through the Context argument and returns Poll::Pending. The runtime sets the future aside and calls poll again after something, such as an I/O event or a timer, invokes that Waker.
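To make the poll contract concrete, here is a hand-written future and a deliberately tiny executor, using only the standard library. Countdown, ThreadWaker, and block_on are hypothetical names for this sketch. A production runtime is far more elaborate, but the exchange of Poll values and the Waker is the same idea.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical future that needs `remaining` extra polls before it is ready.
struct Countdown {
    remaining: u32,
    polls: u32,
}

impl Future for Countdown {
    /// Resolves to the total number of times `poll` was called.
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        self.polls += 1;
        if self.remaining == 0 {
            return Poll::Ready(self.polls);
        }
        self.remaining -= 1;
        // Not ready yet: request another poll, then report Pending.
        // A real I/O future would hand the waker to a reactor instead.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor: poll, sleep until woken, poll again.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let polls = block_on(Countdown { remaining: 3, polls: 0 });
    println!("ready after {polls} polls"); // prints "ready after 4 polls"
}
```

The future never blocks. It either finishes or arranges to be woken and returns immediately, which is what lets one thread interleave many of them.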
This cooperative model means tasks must yield. If a task never yields, it blocks the thread. The runtime cannot run other tasks on that thread. This is why blocking operations are dangerous in async code.
use std::time::Duration;

/// A minimal example of an async function.
/// The compiler rewrites this into a state machine.
async fn fetch_data(id: u32) -> String {
    // Simulate network delay without blocking the thread.
    // The .await point yields control to the runtime.
    tokio::time::sleep(Duration::from_millis(100)).await;
    format!("Data for {}", id)
}

fn main() {
    // Create the future. Nothing runs yet: futures are lazy.
    let future = fetch_data(42);
    // In real code, you would pass this to a runtime.
    // The runtime polls the future until it returns Ready.
    // Note: std::any::type_name_of_val requires Rust 1.76 or later.
    println!("Future created: {}", std::any::type_name_of_val(&future));
}
Convention aside: The community treats Rc as forbidden in async code that crosses threads. Rc is not Send. If you try to spawn a task holding an Rc on a multi-threaded runtime, the compiler rejects it with a Send trait-bound error (E0277, often reported as "future cannot be sent between threads safely"). Use Arc instead. Arc uses atomic reference counting, and Arc<T> is Send whenever T is Send + Sync. This rule keeps your code portable across runtimes.
The state machine under the hood
The compiler generates an anonymous type for each async fn. It behaves like an enum with one variant per suspension point: each variant stores the local variables that are still needed after that .await, and the discriminant records which .await the task is parked at.
When poll is called, the generated code matches on the state. Each arm corresponds to a segment of code between .await points. The arm executes until it hits the next .await. If the inner future returns Pending, the generated code saves the current state and returns Pending. If it returns Ready, the code extracts the value, updates local variables, and advances to the next state.
This transformation allows the async function to suspend and resume while keeping its local variables alive. Variables that live across an .await are stored in the future itself, not on the thread's stack. This is why a suspended task needs no stack of its own: between polls, its entire state is that one value.
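Writing one of these state machines by hand shows what the transformation amounts to. The sketch below is an approximation, not the compiler's actual output: the real generated type is anonymous and pins inner futures in place instead of moving them. YieldOnce, AddAfterYields, and run are hypothetical names, and run busy-polls purely to keep the example short.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for an I/O wait: Pending once, then Ready.
#[derive(Default)]
struct YieldOnce {
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Hand-written equivalent of:
///   async fn add_after_yields(a: u32, b: u32) -> u32 {
///       let x = a * 2;
///       YieldOnce::default().await;
///       let y = x + b;
///       YieldOnce::default().await;
///       y + 1
///   }
enum AddAfterYields {
    Start { a: u32, b: u32 },
    FirstAwait { x: u32, b: u32, inner: YieldOnce },
    SecondAwait { y: u32, inner: YieldOnce },
    Done,
}

impl Future for AddAfterYields {
    type Output = u32;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        // Every field is Unpin, so this sketch can take the state out by value.
        let this = self.get_mut();
        loop {
            match std::mem::replace(this, Self::Done) {
                Self::Start { a, b } => {
                    let x = a * 2;
                    *this = Self::FirstAwait { x, b, inner: YieldOnce::default() };
                }
                Self::FirstAwait { x, b, mut inner } => {
                    if Pin::new(&mut inner).poll(cx).is_pending() {
                        // Save the live variables and suspend.
                        *this = Self::FirstAwait { x, b, inner };
                        return Poll::Pending;
                    }
                    *this = Self::SecondAwait { y: x + b, inner: YieldOnce::default() };
                }
                Self::SecondAwait { y, mut inner } => {
                    if Pin::new(&mut inner).poll(cx).is_pending() {
                        *this = Self::SecondAwait { y, inner };
                        return Poll::Pending;
                    }
                    return Poll::Ready(y + 1);
                }
                Self::Done => panic!("future polled after completion"),
            }
        }
    }
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls in a busy loop and reports (output, number of polls).
fn run<F: Future + Unpin>(mut future: F) -> (F::Output, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(output) = Pin::new(&mut future).poll(&mut cx) {
            return (output, polls);
        }
    }
}

fn main() {
    let (value, polls) = run(AddAfterYields::Start { a: 3, b: 4 });
    println!("value = {value}, polls = {polls}"); // prints "value = 11, polls = 3"
}
```

Notice that x disappears from the state once y has been computed. Each variant carries only what the remaining code still needs.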
The overhead is small. A task costs the size of its state machine plus some runtime bookkeeping, and each poll costs a match on the state. The benefit is large. A few threads can drive hundreds of thousands of tasks or more, and memory use grows with the data each task actually holds rather than with a fixed stack per thread.
Realistic example: concurrent fetching
In a real application, you use a runtime like Tokio to manage tasks. Tokio provides task::spawn to create new tasks. Each task runs concurrently on the runtime's thread pool.
use tokio::task;

/// Simulates fetching data from a remote server.
/// This function yields control when waiting for the network.
async fn fetch_data(id: u32) -> String {
    // Yield to the runtime for 100ms.
    // Other tasks can run on this thread during the sleep.
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
    format!("Data for {}", id)
}

/// Processes a batch of IDs concurrently using async tasks.
/// Returns the successful results in input order.
/// Failed tasks are logged and skipped.
async fn process_batch(ids: Vec<u32>) -> Vec<String> {
    let mut handles = Vec::with_capacity(ids.len());

    // Spawn a task for each ID.
    // task::spawn returns a JoinHandle, not the result directly.
    for id in ids {
        let handle = task::spawn(async move {
            // Capture id by move.
            // The task owns the data it needs.
            fetch_data(id).await
        });
        handles.push(handle);
    }

    // Collect results.
    // Awaiting a handle yields Ok(value), or Err if the task panicked
    // or was cancelled.
    let mut results = Vec::with_capacity(handles.len());
    for handle in handles {
        match handle.await {
            Ok(data) => results.push(data),
            Err(e) => eprintln!("Task failed: {}", e),
        }
    }
    results
}

#[tokio::main]
async fn main() {
    let ids = vec![1, 2, 3, 4, 5];
    let results = process_batch(ids).await;
    for result in results {
        println!("{}", result);
    }
}
Convention aside: The #[tokio::main] attribute is the standard entry point for Tokio applications. It sets up the runtime and runs the async main function. Do not write your own reactor. Use the runtime's entry point to ensure proper initialization and shutdown.
Pitfalls and compiler errors
Blocking the runtime is the most common mistake. If you call a blocking function like std::thread::sleep inside an async task, you freeze the thread. The runtime cannot run other tasks on that thread. If the runtime uses one thread per core, you just killed a core. The solution is tokio::task::spawn_blocking. This moves the blocking work to a dedicated thread pool. The async task yields while the blocking work runs. When the work finishes, the result is sent back to the async task.
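The hand-off that spawn_blocking performs can be sketched with the standard library alone. This is a conceptual model under stated assumptions, not Tokio's implementation: Tokio reuses a pool of blocking threads, while this sketch spawns one thread per call. The names offload, Offloaded, Shared, ThreadWaker, and block_on are all hypothetical.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

/// State shared between the worker thread and the waiting future.
struct Shared<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

/// Future that completes once the worker thread has stored a result.
struct Offloaded<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

/// Runs `work` on its own OS thread and returns a future for its result.
fn offload<T, F>(work: F) -> Offloaded<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
    let for_worker = Arc::clone(&shared);
    thread::spawn(move || {
        let value = work(); // blocking is fine here: this is not a runtime thread
        let mut state = for_worker.lock().unwrap();
        state.result = Some(value);
        if let Some(waker) = state.waker.take() {
            waker.wake(); // tell the executor the future can make progress
        }
    });
    Offloaded { shared }
}

impl<T> Future for Offloaded<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut state = self.shared.lock().unwrap();
        match state.result.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Not done yet: leave a waker for the worker and yield.
                state.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

struct ThreadWaker(Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor so the sketch runs without an async runtime.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let answer = block_on(offload(|| {
        thread::sleep(Duration::from_millis(20)); // stands in for a blocking call
        6 * 7
    }));
    println!("answer = {answer}"); // prints "answer = 42"
}
```

In a Tokio application you would write tokio::task::spawn_blocking(...).await instead and let the runtime manage the threads.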
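Holding a lock across .await invites deadlocks. If you hold a std::sync::Mutex guard and then .await, the task yields but the lock stays held for as long as the task is suspended. Any other task that calls lock() on the same mutex blocks its worker thread, and if the task holding the guard needs that thread to resume, nothing makes progress. There is a second problem: the runtime may resume the task on a different thread, and a std MutexGuard must be released on the thread that acquired it, which is why the guard is not Send. The compiler therefore rejects a spawned task that holds one across .await on a multi-threaded runtime, but it cannot help on a single-threaded runtime or with lock types whose guards are Send. Drop locks before .await. If a lock really must span an .await, use an async-aware mutex such as tokio::sync::Mutex.
Async tasks are mobile. A work-stealing runtime can move a task from one thread to another between polls. This requires the task's future to be Send. If the future holds a value that is not Send across an .await, the runtime cannot move it, and the compiler enforces this: spawning a non-Send future with tokio::spawn fails with a Send trait-bound error (E0277). This rule prevents data races. For shared mutable state, it pushes you toward Arc and Mutex.
If you capture an Rc in a spawned task, the compiler rejects it for the same reason: Rc is not Send. You need Arc. If you capture a reference to a local variable, the compiler rejects it with E0373 or a similar lifetime error, because a spawned task must be 'static. Move owned data into the task, or share it through an Arc.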
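The lock and Send rules can be checked at compile time without any runtime. In this sketch, record, assert_send, YieldOnce, run, and demo are hypothetical names. YieldOnce stands in for a real I/O wait, and run busy-polls only so the example needs nothing beyond the standard library.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for an I/O wait: Pending once, then Ready.
#[derive(Default)]
struct YieldOnce(bool);

impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Appends `id` to a shared log, waits, then reports the log length.
async fn record(log: Arc<Mutex<Vec<u32>>>, id: u32) -> usize {
    {
        let mut guard = log.lock().unwrap();
        guard.push(id);
    } // The guard is dropped here, before the suspension point.

    YieldOnce::default().await;

    // Re-acquire the lock after resuming; this guard never crosses an .await.
    let len = log.lock().unwrap().len();
    len
}

/// Compiles only if the value is Send, the bound a multi-threaded spawn imposes.
fn assert_send<T: Send>(_: &T) {}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Busy-polling executor, enough for this sketch.
fn run<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
    }
}

fn demo() -> (usize, Vec<u32>) {
    let log = Arc::new(Mutex::new(Vec::new()));
    let future = record(Arc::clone(&log), 7);
    // Swapping Arc for Rc, or holding `guard` across the .await,
    // would make this line fail to compile.
    assert_send(&future);
    let len = run(future);
    let contents = log.lock().unwrap().clone();
    (len, contents)
}

fn main() {
    let (len, contents) = demo();
    println!("len = {len}, log = {contents:?}"); // prints "len = 1, log = [7]"
}
```

The inner block around the first lock is the important part. It ends the guard's scope before the .await, so the future's saved state never contains a MutexGuard.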
Decision matrix
Use threads for CPU-bound work where you need to saturate all cores and the work involves heavy computation without waiting. Use threads when you must call blocking C libraries that cannot be adapted to async. Use async tasks for high-concurrency I/O where you have thousands of connections and most time is spent waiting for the network or disk. Use async tasks when you need to structure concurrent code with clear suspension points and want to avoid the overhead of OS context switches. Reach for spawn_blocking when you have a small blocking operation inside an async context and need to offload it to a dedicated thread pool.
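For the CPU-bound side of the matrix, scoped threads from the standard library are often enough. The sketch below splits a slice across one thread per available core. The function name parallel_sum_of_squares and the chunking strategy are assumptions made for illustration.

```rust
use std::thread;

/// Hypothetical CPU-bound job: sums the squares of `data` using
/// roughly one OS thread per available core.
fn parallel_sum_of_squares(data: &[u64]) -> u64 {
    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    // Ceiling division, with a floor of 1 because chunks(0) would panic.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    // Scoped threads may borrow `data` because they are all joined
    // before `scope` returns.
    thread::scope(|scope| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().map(|x| x * x).sum::<u64>()))
            .collect();

        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=10).collect();
    println!("{}", parallel_sum_of_squares(&data)); // prints "385"
}
```

There is no .await anywhere because the work never waits. Each thread computes until it is done, which is the situation where one thread per core pays off.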
Async is not a replacement for threads. It is a different tool for a different job. Threads excel at parallelism. Async excels at concurrency. Pick the tool that matches the bottleneck. CPU bound? Threads. I/O bound? Async. Mixed? Use both, with spawn_blocking as the bridge.
Never block the runtime. If you must block, isolate it. Trust the borrow checker. It usually has a point.