How to do parallel file reads with async

You are building a tool that needs to load three configuration files before it starts. One for the database, one for the API keys, one for the user preferences. You write the code to read them one after another. It works, but it feels slow. You are staring at the terminal waiting for the disk to spin up three times when it could spin up once and do them all at the same time.

Rust has no built-in async runtime, but a runtime such as Tokio lets you start several file reads and wait for them all to finish without blocking the runtime thread. The key is tokio::spawn, which creates concurrent tasks, and tokio::fs, which keeps file I/O off the runtime thread.

The async kitchen

Async programming is not magic parallelism. It is efficient waiting.

Think of a kitchen with one chef and a stove with three burners. The chef needs to boil three pots of water.

In a sequential approach, the chef puts the first pot on the stove, stands there staring at it until it boils, takes it off, puts the second pot on, stares until it boils, and so on. The chef is busy, but the stove is underutilized. The total time is three times the boil time.

In an async approach, the chef puts all three pots on the stove at once. The chef sets timers and goes back to chopping vegetables. When a timer rings, the chef checks the pot. The chef never stands idle waiting for water to heat. The stove handles the heating in parallel. The chef handles the logic. The total time is roughly the time of the slowest pot.

In Rust, the chef is the runtime thread. The stove is the operating system kernel. The pots are file I/O operations. When you await a file read, you are telling the runtime: "I have asked the OS to read this file. I cannot do anything until the data arrives. Please switch to another task while I wait."

The runtime switches to another task. The OS continues reading the file in the background. When the OS finishes, it notifies the runtime. The runtime resumes your task with the data. The thread never blocks. It stays busy doing useful work.

Minimal example

Here is the pattern for reading two files concurrently.

use tokio::fs;

#[tokio::main]
async fn main() {
    // tokio::spawn creates a new task and schedules it on the runtime.
    // It returns a JoinHandle immediately, so this line does not block.
    let handle1 = tokio::spawn(async {
        // tokio::fs::read_to_string initiates the I/O and yields control
        // back to the runtime while waiting for the disk.
        fs::read_to_string("file1.txt").await.unwrap()
    });

    // Spawn the second task. Both tasks are now running concurrently.
    let handle2 = tokio::spawn(async {
        fs::read_to_string("file2.txt").await.unwrap()
    });

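    // join! waits for both handles to complete and returns a tuple of
    // results in the same order. Each element is a Result<String, JoinError>,
    // so unwrap the task-level result before using the contents.
    let (result1, result2) = tokio::join!(handle1, handle2);
    let content1 = result1.unwrap();
    let content2 = result2.unwrap();

    println!("Read {} bytes from file1", content1.len());
    println!("Read {} bytes from file2", content2.len());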
}

The await inside the spawned tasks is the handoff. Without it, you are just writing sequential code with extra syntax. The runtime needs a yield point to switch tasks.

What happens under the hood

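When you run this code, the sequence of events looks roughly like this. The walkthrough is simplified to a single runtime thread; on a multi-threaded runtime a spawned task may start on another worker before the main task reaches join!.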

tokio::spawn wraps the async block in a task and pushes it onto the runtime's queue. The function returns a JoinHandle instantly.
The main task calls spawn again for the second file. Now there are two tasks in the queue.
The main task hits tokio::join!. This macro waits for both handles.
The runtime picks up the first task. It calls fs::read_to_string, which hands the read off to be performed in the background and returns a future that is not ready yet.
The runtime sees the future is not ready. It suspends the first task and marks it as waiting for I/O.
The runtime picks up the second task. It does the same thing. The OS starts reading the second file. The second task is suspended.
Both tasks are waiting. The runtime has no ready tasks. It waits for the OS to signal completion.
The OS finishes reading file1.txt. It signals the runtime.
The runtime resumes the first task. The future is now ready. The task completes and stores the result in handle1.
The OS finishes reading file2.txt. The runtime resumes the second task. It completes and stores the result in handle2.
join! sees both handles are done. It returns the tuple.

The runtime thread never sat blocked on a single disk read. It was either running a task or idle until some task became ready, and the two reads overlapped in time.

Realistic batch processing

Real code rarely reads exactly two files. You usually have a list of paths, errors to handle, and dynamic behavior.

use tokio::fs;

/// Reads multiple files concurrently and returns their contents.
/// Files that fail to read are skipped with a warning.
async fn read_batch(paths: &[&str]) -> Vec<String> {
    let mut handles = Vec::new();

    for path in paths {
        // We must own the path data because the task might outlive this function.
        // Converting to String gives us owned data.
        let path_owned = path.to_string();

        // spawn requires a 'static future. The async move block captures
        // path_owned by value, satisfying the ownership requirement.
        let handle = tokio::spawn(async move {
            fs::read_to_string(&path_owned).await
        });

        handles.push(handle);
    }

    let mut results = Vec::new();

    // Iterate over handles and collect successful reads.
    for handle in handles {
        // handle.await returns Result<Result<String, io::Error>, JoinError>.
        // The outer Result is from the task execution.
        // The inner Result is from the I/O operation.
        match handle.await {
            Ok(Ok(content)) => results.push(content),
            Ok(Err(io_err)) => eprintln!("IO error reading file: {}", io_err),
            Err(join_err) => eprintln!("Task panicked or was cancelled: {}", join_err),
        }
    }

    results
}

This example highlights three important details.

First, tokio::spawn requires the future to be 'static. This means the future cannot hold references to data that might be dropped. The task runs independently on the runtime and could execute after the calling function returns. You must pass owned data into the task. The path.to_string() call converts the borrowed string slice into an owned String that the task can take.

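Second, spawn returns a JoinHandle<T>, where T is the output of the async block. Awaiting the handle yields Result<T, JoinError>. Here T is itself an io::Result<String>, so you get two layers. The inner Result comes from the async block you passed to spawn. The outer Result comes from the task execution itself. If the task panics or is cancelled, you get a JoinError. If the task completes normally, you get Ok(inner_result). You must handle both layers.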

Third, the loop over handles collects results. The order of results matches the order of handles, not the order of completion. join! preserves order. Iterating handles preserves order. If you need results as they finish, you would use a different pattern like futures::stream::FuturesUnordered.

Handle the double Result. The outer one is the task; the inner one is the I/O. Ignore either at your peril.

Pitfalls and compiler traps

Async file I/O has specific traps that catch developers coming from synchronous code.

Using std::fs in async code

The most common mistake is using std::fs::read_to_string inside an async function. This function blocks the thread until the file is read. If you block the thread, the runtime cannot switch to other tasks. All your other async work stops. The concurrency vanishes.

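Prefer tokio::fs when you are in an async context. Most operating systems do not offer truly asynchronous file APIs, so by default tokio::fs runs each blocking call on a separate thread pool (via spawn_blocking) and lets your task await the result. The runtime thread that drives your tasks never blocks on the disk.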

Capturing references in spawn

If you try to capture a reference in a spawned task, the compiler rejects the code.

let data = String::from("hello");
// This fails. The async block borrows data instead of owning it.
// spawn requires a 'static future, but data is dropped at the end of main.
let _handle = tokio::spawn(async {
    println!("{}", data);
});

The compiler emits E0373 (the async block may outlive the current function, but it borrows data) or a related lifetime error. The fix is to write async move so the block takes ownership of what it uses. Clone the data first if you still need it outside the task.

Panics in spawned tasks

If a spawned task panics, the panic does not crash the entire program. The runtime catches the panic and converts it into a JoinError. When you .await the handle, you get Err(JoinError). This is a safety feature. It prevents one bad task from killing the whole application. It also means you must check the JoinHandle result if you care about task failures.

Too many tasks

tokio::spawn is cheap, but not free. Tasks do not get their own stack, but each one is a heap allocation that holds the future's state, plus scheduling overhead. If you spawn a task for every byte of a large file, you will run out of memory. Tasks are for logical units of work, not for micro-optimization. Use spawn for files, network requests, or independent computations. Do not use it to parallelize a tight loop over small data.

Never call std::fs directly from an async task. You are paying for a race car and driving it in first gear.

Decision matrix

Use tokio::join! when you have a small, fixed number of async operations and you want to wait for all of them without spawning new tasks.

Use tokio::spawn when you need to run work concurrently with other tasks, handle a dynamic list of operations, or isolate potential panics from the main flow.

Use tokio::fs for all file I/O in async code to keep the runtime thread free for other work.

Use std::fs only when you are in a sync context, or inside spawn_blocking when you deliberately hand a blocking operation to a dedicated thread.

Pick the tool that matches your concurrency shape. join! for the fixed batch. spawn for the dynamic swarm.

Where to go next

Concurrent file reads with async let your program start every read up front instead of waiting for one to finish before starting the next. Like the chef with three burners, the total wait approaches the time of the slowest read rather than the sum of all of them. Reach for this pattern whenever startup or batch work depends on several independent files.