How to run multiple futures concurrently

When waiting for one thing isn't enough

You are building a dashboard that shows a user's profile, their recent orders, and a list of recommended products. Each piece of data lives on a different server. You write three async functions to fetch them. If you await them one after another, the total time is the sum of all three network calls. If each call takes 100 milliseconds, the user waits 300 milliseconds. The CPU sits idle during every wait, burning time instead of making progress.

Rust gives you a way to send all three requests at once and wait for the whole batch to return: you run the futures concurrently. The total time drops to roughly the duration of the slowest call, so the user sees the dashboard in about 100 milliseconds instead of 300.
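The arithmetic behind those numbers can be written down as a small sketch. It is an idealized model only: it assumes the calls are fully independent and ignores scheduling overhead, and the function names sequential_ms and concurrent_ms are made up for this illustration.

```rust
/// Idealized model: awaiting one call after another costs the sum of the latencies.
fn sequential_ms(latencies: &[u64]) -> u64 {
    latencies.iter().sum()
}

/// Idealized model: waiting on all calls at once costs only the slowest latency.
fn concurrent_ms(latencies: &[u64]) -> u64 {
    latencies.iter().copied().max().unwrap_or(0)
}

fn main() {
    // Three calls of 100 ms each, as in the dashboard scenario.
    let calls: [u64; 3] = [100, 100, 100];
    println!("sequential: {} ms", sequential_ms(&calls));
    println!("concurrent: {} ms", concurrent_ms(&calls));
}
```

Real measurements will land slightly above the concurrent figure, since polling and wake-ups are not free.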

This is the core benefit of async Rust. You can overlap I/O waits without spawning heavy OS threads. Crates such as futures and tokio, and the trpl crate used in the examples below, provide combinators to group futures together. The two most common tools are join for a fixed set of futures and join_all for a dynamic collection.

Don't chain awaits if the operations are independent. Parallelize the wait, not the work.

Futures are lazy recipes

A future in Rust is a lazy computation. Creating a future does not execute it. It only describes what will happen. The work starts when you .await the future, or when an executor polls it. This laziness is essential for concurrency. If futures ran immediately upon creation, you would have no control over when they start or how they interleave.

Think of a future like a recipe card. Writing the recipe doesn't cook the meal. Handing the card to a chef and asking them to start does. In Rust, the executor is the chef. When you await a future, it is polled as part of the task the executor is running. Each poll runs the future's code until it either finishes or reaches a point where it has to wait, such as pending I/O. In the second case the future reports that it is not ready yet, and the executor moves on to other work until the future is woken.
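Laziness is easy to observe by polling a future by hand. The sketch below uses only the standard library. NoopWake, poll_once, and lazy_demo are hypothetical helpers written for this illustration, not part of trpl or any runtime.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough for polling a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once, the way an executor would on its first visit.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(value) => Some(value),
        Poll::Pending => None,
    }
}

/// Returns whether the async body had run (before polling, after polling).
fn lazy_demo() -> (bool, bool) {
    let ran = Cell::new(false);
    // Writing the recipe: nothing inside the block executes yet.
    let recipe = async {
        ran.set(true);
    };
    let before = ran.get();
    // Handing the recipe to the "chef": the first poll runs the body.
    let _ = poll_once(recipe);
    (before, ran.get())
}

fn main() {
    let (before, after) = lazy_demo();
    println!("ran before poll: {}, ran after poll: {}", before, after);
}
```

The flag stays false until the first poll, which is the property that lets a combinator decide when and how its children start.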

join takes multiple futures and returns a new future that resolves only when all the input futures resolve. It bundles them into a single unit of work. You await the bundle, and each time the executor polls it, the bundle polls the children that are still pending.

use std::time::Duration;

/// Demonstrates running two futures concurrently with join.
fn main() {
    // trpl::block_on runs the async block to completion.
    // In real projects, you would use tokio::main or similar.
    trpl::block_on(async {
        // Create future A. It prints and sleeps.
        // The future is created but not started yet.
        let fut_a = async {
            println!("Task A started");
            trpl::sleep(Duration::from_millis(100)).await;
            println!("Task A finished");
        };

        // Create future B. Independent of A.
        let fut_b = async {
            println!("Task B started");
            trpl::sleep(Duration::from_millis(100)).await;
            println!("Task B finished");
        };

        // join bundles the futures.
        // Awaiting this polls both futures concurrently.
        // The executor switches between them as they yield.
        trpl::join(fut_a, fut_b).await;

        println!("Both tasks completed");
    });
}

The output shows both tasks starting before either finishes. The two sleep periods overlap, so the whole program takes about 100 milliseconds. If you had awaited fut_a and then fut_b separately, the total time would double. join collapses the wait time.

Convention aside: In production code with tokio, you will often see the join! macro instead of the join function. The macro accepts expressions directly and avoids creating intermediate variables. The function form is clearer for learning and works when futures are already bound to variables.

The executor is the traffic cop. It switches lanes only when a car stops.

How join orchestrates the wait

When you await a join, the futures do not run to completion in sequence. The join polls the first future. If that future yields (because it is waiting for a timer, network, or disk), the join moves straight on and polls the second future. Each time the join is woken, it polls the children that are still pending, and it continues this cycle until all of them are ready.

This polling loop is split between the executor, which polls the join, and the join, which polls its children. Your code sees a single .await point. The interleaving is transparent. This is cooperative multitasking. Futures must yield control by awaiting something else. If a future runs a tight loop without yielding, it blocks the executor and starves the other futures in the join.
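The polling cycle can be sketched with the standard library alone. This is a simplified stand-in, not the implementation used by trpl, futures, or tokio. block_on, yield_once, join2, step, and run are hypothetical names chosen for this illustration.

```rust
use std::cell::RefCell;
use std::future::{poll_fn, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the executor thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// A tiny single-threaded executor: poll, and park while pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Pending on the first poll, ready on the second; stands in for a short wait.
async fn yield_once() {
    let mut yielded = false;
    poll_fn(|cx| {
        if yielded {
            return Poll::Ready(());
        }
        yielded = true;
        cx.waker().wake_by_ref(); // ask to be polled again
        Poll::Pending
    })
    .await
}

/// Simplified two-future join: every poll visits each unfinished child.
async fn join2<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    let mut a = pin!(a);
    let mut b = pin!(b);
    let mut out_a = None;
    let mut out_b = None;
    poll_fn(|cx| {
        if out_a.is_none() {
            if let Poll::Ready(v) = a.as_mut().poll(cx) {
                out_a = Some(v);
            }
        }
        if out_b.is_none() {
            if let Poll::Ready(v) = b.as_mut().poll(cx) {
                out_b = Some(v);
            }
        }
        if out_a.is_some() && out_b.is_some() {
            Poll::Ready((out_a.take().unwrap(), out_b.take().unwrap()))
        } else {
            Poll::Pending
        }
    })
    .await
}

/// Logs start and end; yields in between only when `cooperative` is true.
async fn step(log: &RefCell<Vec<String>>, name: &str, cooperative: bool) {
    log.borrow_mut().push(format!("{} start", name));
    if cooperative {
        yield_once().await;
    }
    log.borrow_mut().push(format!("{} end", name));
}

/// Runs two steps under join2 and returns the order of events.
fn run(cooperative: bool) -> Vec<String> {
    let log = RefCell::new(Vec::new());
    block_on(join2(step(&log, "A", cooperative), step(&log, "B", cooperative)));
    log.into_inner()
}

fn main() {
    println!("yielding:     {:?}", run(true));
    println!("non-yielding: {:?}", run(false));
}
```

When the steps yield, both start before either ends. When they do not, A runs to completion before B is polled at all, which is the starvation described above.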

join returns a tuple containing the results of the input futures. The order of the tuple matches the order of the arguments. trpl::join takes two futures and returns a pair. If you need three, trpl::join3 returns a three-element tuple such as (ResultA, ResultB, ResultC). You can destructure the tuple directly in the await.

use std::time::Duration;

/// Fetches data from two simulated sources concurrently.
async fn fetch_profile() -> String {
    trpl::sleep(Duration::from_millis(50)).await;
    "User: Alice".to_string()
}

/// Fetches orders from a simulated database.
async fn fetch_orders() -> Vec<String> {
    trpl::sleep(Duration::from_millis(80)).await;
    vec!["Order #101".to_string(), "Order #102".to_string()]
}

fn main() {
    trpl::block_on(async {
        // Call the async functions to create futures.
        // The functions return futures, not the data yet.
        let profile_fut = fetch_profile();
        let orders_fut = fetch_orders();

        // Await the join.
        // The executor polls both futures concurrently.
        // When both complete, the tuple is returned.
        let (profile, orders) = trpl::join(profile_fut, orders_fut).await;

        println!("Profile: {}", profile);
        println!("Orders: {:?}", orders);
    });
}

The total runtime is approximately 80 milliseconds, determined by the slower fetch_orders. The faster fetch_profile finishes early and waits for its sibling. The tuple destructuring gives you clean access to both results without nested structures.

Structure your data flow around joins. It keeps related async operations grouped and the result types explicit.

Scaling up with join_all

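join has a fixed arity. trpl::join takes exactly two futures, and companions such as join3 or the join! macro cover other small, fixed counts, but none of them can take a number of futures that is only known at run time. If you have a vector of URLs to fetch, or a list of files to process, you need join_all.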

join_all takes an iterator of futures and returns a future that resolves to a vector of results. It consumes the iterator, so you pass ownership of the futures into the combinator. The result vector preserves the order of the input futures. The first element of the result corresponds to the first future in the iterator, regardless of which one finished first.

use std::time::Duration;

/// Processes a batch of tasks concurrently.
fn main() {
    trpl::block_on(async {
        // Create a vector of futures.
        // Each future simulates work with a delay that grows with its index.
        let futures: Vec<_> = (0..5)
            .map(|i| async move {
                let delay = Duration::from_millis((i * 20) as u64);
                trpl::sleep(delay).await;
                format!("Task {} done", i)
            })
            .collect();

        // join_all consumes the vector of futures.
        // It returns a future that yields a Vec<String>.
        let results = trpl::join_all(futures).await;

        // Results are in the same order as the input futures.
        // Here Task 0 finishes first, but even if it finished last
        // it would still be at index 0.
        for result in results {
            println!("{}", result);
        }
    });
}

join_all manages the polling state internally, so you do not write a polling loop yourself. It takes ownership of the futures, polls the ones that are still pending, and holds each output until every future has completed. Only then does it return the results as a single vector.

Convention aside: join_all is the standard way to batch process async work. Avoid looping over a collection and awaiting each future individually. That reverts to sequential execution and defeats the purpose of async. Also, note that join_all requires all futures to have the same type. The closure above satisfies this because every async block it produces has the same type. If you have heterogeneous futures, you must box them as trait objects, for example Pin<Box<dyn Future<Output = T>>>, which adds an allocation and dynamic dispatch per future.
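Boxing heterogeneous futures can be sketched with the standard library alone. The code below is a simplified stand-in for join_all, not the implementation in the futures crate. BoxFuture, join_all_boxed, block_on, yield_once, named, and mixed_batch are hypothetical names for this illustration.

```rust
use std::future::{poll_fn, ready, Future};
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// One common type for futures that would otherwise have different types.
type BoxFuture<T> = Pin<Box<dyn Future<Output = T>>>;

struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// A tiny single-threaded executor: poll, and park while pending.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Pending on the first poll, ready on the second.
async fn yield_once() {
    let mut yielded = false;
    poll_fn(|cx| {
        if yielded {
            return Poll::Ready(());
        }
        yielded = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    })
    .await
}

/// Simplified join_all: polls every unfinished future, keeps input order.
async fn join_all_boxed<T>(mut futs: Vec<BoxFuture<T>>) -> Vec<T> {
    let mut outputs: Vec<Option<T>> = futs.iter().map(|_| None).collect();
    poll_fn(|cx| {
        for (slot, fut) in outputs.iter_mut().zip(futs.iter_mut()) {
            if slot.is_none() {
                if let Poll::Ready(v) = fut.as_mut().poll(cx) {
                    *slot = Some(v);
                }
            }
        }
        if outputs.iter().all(Option::is_some) {
            let done: Vec<T> = outputs.iter_mut().map(|s| s.take().unwrap()).collect();
            Poll::Ready(done)
        } else {
            Poll::Pending
        }
    })
    .await
}

async fn named(label: &'static str) -> String {
    label.to_string()
}

/// Three futures of three different concrete types in one vector.
fn mixed_batch() -> Vec<String> {
    let futs: Vec<BoxFuture<String>> = vec![
        Box::pin(async {
            yield_once().await; // completes one pass later than the others
            "slow block".to_string()
        }),
        Box::pin(named("async fn")),
        Box::pin(ready("ready value".to_string())),
    ];
    block_on(join_all_boxed(futs))
}

fn main() {
    println!("{:?}", mixed_batch());
}
```

The first future completes last, yet its output still sits at index 0. With the real join_all, the same Box::pin pattern applies; only the combinator call changes.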

join_all is your batch processor. Feed it a list, get a list back.

A realistic dashboard fetch

In a real application, you often need to combine data from multiple sources and handle errors. join works well here because it returns a tuple, allowing you to handle each result individually. You can check for errors on one fetch without blocking the others.

use std::time::Duration;

/// Simulates a network fetch that might fail.
async fn fetch_user_data(user_id: u32) -> Result<String, String> {
    trpl::sleep(Duration::from_millis(50)).await;
    if user_id == 0 {
        Err("User not found".to_string())
    } else {
        Ok(format!("User {}", user_id))
    }
}

/// Simulates fetching preferences.
async fn fetch_preferences(user_id: u32) -> Result<String, String> {
    trpl::sleep(Duration::from_millis(30)).await;
    Ok(format!("Prefs for {}", user_id))
}

/// Combines user data and preferences concurrently.
async fn load_dashboard(user_id: u32) -> Result<String, String> {
    // Create futures for both fetches.
    let user_fut = fetch_user_data(user_id);
    let prefs_fut = fetch_preferences(user_id);

    // Await both concurrently.
    // If one fails, the other still completes.
    let (user_result, prefs_result) = trpl::join(user_fut, prefs_fut).await;

    // Handle errors individually.
    let user_data = user_result?;
    let prefs = prefs_result?;

    Ok(format!("Dashboard: {} with {}", user_data, prefs))
}

fn main() {
    trpl::block_on(async {
        match load_dashboard(42).await {
            Ok(dashboard) => println!("{}", dashboard),
            Err(e) => eprintln!("Error: {}", e),
        }
    });
}

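The join runs both fetches concurrently and waits for both to finish. The error handling then applies the ? operator to each result. If fetch_user_data fails, the ? propagates the error, but only after fetch_preferences has also run to completion, because join never short-circuits. That keeps the control flow simple, at the cost of waiting for work whose result you discard. If the second fetch is expensive or has side effects, a fail-fast combinator such as try_join from the futures crate or tokio's try_join! macro is a better fit.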

Convention aside: In tokio, you can use the join! macro with expressions. let (user, prefs) = tokio::join!(fetch_user_data(42), fetch_preferences(42)); Note that the macro awaits the futures itself, so there is no trailing .await. This is idiomatic and concise. The function form is useful when you need to store futures in variables or pass them to other functions.

Structure your error handling around the join. Let independent fetches complete, then aggregate the results.
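One way to aggregate is to inspect both results together instead of returning at the first ?. The sketch below is a plain function that takes the two results a join would return. The name combine and the choice of Vec<String> as the error type are assumptions made for this illustration.

```rust
/// Aggregates the two results that a join hands back as a tuple.
/// On failure, every error is reported, not just the first one.
fn combine(
    user: Result<String, String>,
    prefs: Result<String, String>,
) -> Result<String, Vec<String>> {
    match (user, prefs) {
        (Ok(user), Ok(prefs)) => Ok(format!("Dashboard: {} with {}", user, prefs)),
        // At least one side failed: collect whichever errors are present.
        (user, prefs) => Err(user.err().into_iter().chain(prefs.err()).collect()),
    }
}

fn main() {
    let ok = combine(Ok("User 42".to_string()), Ok("Prefs for 42".to_string()));
    println!("{:?}", ok);

    let failed = combine(
        Err("User not found".to_string()),
        Err("Prefs unavailable".to_string()),
    );
    println!("{:?}", failed);
}
```

Inside load_dashboard, the tuple from the join could be passed straight in, as in combine(user_result, prefs_result), provided the function's error type is changed to match.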

The safety of structured concurrency

join and join_all give you a form of structured concurrency. The lifecycle of the child futures is tied to the parent scope. If the outer async block is dropped, the join is dropped with it, and every child future is dropped at whatever await point it had reached, which is how cancellation works in async Rust. This prevents orphaned tasks from running indefinitely or leaking resources.

Contrast this with spawn. spawn detaches a future from the current scope. It runs independently. If the caller drops, the spawned task continues. This gives flexibility but adds responsibility. You must ensure the spawned task can handle cancellation or that it will finish before the program exits.

join is safer for most use cases. It guarantees that all work is cleaned up when the caller is done. It also makes reasoning about data flow easier. The data lives as long as the join, and the join lives as long as the scope.
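The cleanup guarantee rests on ordinary Drop semantics, which can be checked by hand. The sketch below polls a future once and then drops it in the middle of a wait. A join owns its children, so dropping the join drops them in the same way. Guard, NoopWake, and drop_cancels are hypothetical names for this illustration.

```rust
use std::cell::Cell;
use std::future::{pending, Future};
use std::sync::Arc;
use std::task::{Context, Wake, Waker};

/// A waker that does nothing; enough for polling a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Sets a flag when dropped, standing in for a connection or file handle.
struct Guard<'a>(&'a Cell<bool>);

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// Returns (guard was cleaned up, body ran to the end).
fn drop_cancels() -> (bool, bool) {
    let cleaned_up = Cell::new(false);
    let finished = Cell::new(false);
    {
        let mut child = Box::pin(async {
            let _guard = Guard(&cleaned_up);
            pending::<()>().await; // a wait that never resolves
            finished.set(true);
        });
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        // Start the future so it reaches its first await point.
        assert!(child.as_mut().poll(&mut cx).is_pending());
    } // `child` is dropped here, in the middle of its wait
    (cleaned_up.get(), finished.get())
}

fn main() {
    let (cleaned_up, finished) = drop_cancels();
    println!("cleaned up: {}, finished: {}", cleaned_up, finished);
}
```

The guard's destructor runs even though the code after the await never does. Anything that must happen on cancellation therefore belongs in a Drop implementation, not after the last await.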

join gives you automatic cleanup. spawn gives you responsibility.

Pitfalls and gotchas

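Blocking the executor is the most common mistake. If a future inside a join runs a blocking operation, such as std::thread::sleep or a synchronous file read, it holds the thread that is polling the join. The other futures in the join cannot make progress, because they are all polled from the same task. This turns your concurrent code into sequential code, and on a single-threaded runtime it stalls every other task as well.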

If you must call blocking code, move it to a dedicated thread pool, for example with tokio's spawn_blocking. This keeps the async executor free to poll other futures.

Another pitfall is assuming join cancels on error. join waits for all futures to complete. It does not short-circuit. If you want to drop siblings when one fails, you need a different combinator, such as try_join from the futures crate or tokio's try_join! macro, or manual cancellation logic built on select. join is for "all must finish" scenarios.

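Compiler errors can appear when futures hold non-Send data. join itself does not require Send, because it polls its children inside the current task. The requirement appears when the combined future is handed to a multi-threaded runtime, for example through tokio::spawn, which requires the future to be Send. Some types, like Rc<T> or raw pointers, are not Send. If a future holds such a value across an .await, the compiler rejects the spawn with a trait-bound error, typically reported as "future cannot be sent between threads safely". The fix is to use Arc<T> instead of Rc<T>, or to drop the non-Send value before the next .await.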

Convention aside: Check your captures. If you use async move, the async block takes ownership of the variables it uses. Ensure those values are Send if the executor might move the future to another thread. In single-threaded executors, Send is not required, but writing Send-friendly code makes your code portable.

A blocking future in a join is a traffic jam. Fix the block, don't just add more lanes.

Decision: picking the right combinator

Use join when you have a fixed set of futures and need all results. It returns a tuple, preserves order, and ties lifecycles to the caller. It is the default choice for combining a few independent async operations.

Use join_all when the number of futures varies or comes from a collection. It accepts an iterator and returns a vector. It is the standard tool for batch processing and dynamic workloads.

Use spawn when you want to fire off a task and continue without waiting for it immediately. It detaches the future from the current scope. Use this for background tasks, long-running workers, or when you need to handle the result later via the task's join handle or a channel.

Use select when you only care about the first future to complete. It polls multiple futures and returns as soon as one is ready. Use this for timeouts, for racing redundant requests, or when you want to drop the remaining futures as soon as the first result arrives.

Pick the combinator that matches your dependency graph.

Where to go next

Running multiple futures concurrently lets your program start several operations at once instead of waiting for one to finish before starting the next. Think of it like a chef who chops vegetables while the water comes to a boil instead of standing at the stove waiting. You use join for a small, fixed set of tasks and join_all when you have a dynamic list of tasks to run together.