How to Debug Async Code in Rust

The Heisenbug of Async

Your async HTTP server hangs silently. You add a println! inside the handler, and suddenly it works. You remove the print, and it hangs again. Or worse, you get a panic, but the stack trace is a wall of core::future::poll_with_tls_context and tokio::runtime::task::core frames that tell you nothing about your actual code. Debugging async Rust feels like trying to trace a conversation in a crowded room where everyone talks at once. The execution flow jumps around, tasks get polled, and state changes in ways that are hard to follow with a standard debugger.

The print statement changes the timing of the program. It introduces a delay that masks a race condition or allows a lock to be released. This is the classic Heisenbug. The act of observation changes the behavior. Async code amplifies this because the executor schedules tasks based on readiness. Adding I/O, even a print to stdout, can shift when tasks are polled, revealing or hiding bugs that depend on precise timing.

Async is a State Machine

Async code in Rust compiles to state machines. When you write an async fn, the compiler turns it into an anonymous type that stores every local variable that lives across an await point, plus a discriminant that tracks which await you left off at. The runtime polls this state machine. If the work isn't done, poll returns Pending and the task yields control. If it is done, the state machine advances to the next await or finishes. Debugging async code means you aren't just looking at a linear stack; you're looking at a collection of state machines being juggled by the runtime.

Think of it like a theater production. The actors (tasks) perform scenes, but they pause whenever they need a prop that isn't ready (awaiting I/O). The stage manager (the executor) checks in on each actor to see if they can continue. If you want to understand why the play stopped, you need to check which actor is waiting for which prop, not just look at the script. A standard stack trace shows you the current scene, but it doesn't show you the queue of actors waiting backstage or the props they're holding. You need tools that visualize the state of the entire production.
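
To make the state machine idea concrete, here is a minimal sketch that uses only the standard library. Countdown, NoopWaker, and drive are hypothetical names invented for this illustration; the type the compiler actually generates is anonymous and more involved. The sketch only shows the shape: an enum of states, a poll method that advances it, and a loop that plays the role of the executor.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical hand-written future with explicit states, loosely
/// mirroring what the compiler generates for an async fn.
enum Countdown {
    /// Not finished yet; `remaining` is how many more polls return Pending.
    Waiting { remaining: u32 },
    /// Finished; polling again is a bug.
    Done,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Countdown is Unpin, so a plain mutable reference is available.
        let this = self.get_mut();
        match *this {
            Countdown::Waiting { remaining } if remaining > 0 => {
                // Record progress in the state, ask to be polled again, yield.
                *this = Countdown::Waiting { remaining: remaining - 1 };
                cx.waker().wake_by_ref();
                Poll::Pending
            }
            Countdown::Waiting { .. } => {
                *this = Countdown::Done;
                Poll::Ready("finished")
            }
            Countdown::Done => panic!("polled after completion"),
        }
    }
}

/// Waker that does nothing; the loop below re-polls unconditionally.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Plays the executor: polls until the future is ready and returns
/// the output together with the number of polls it took.
fn drive(mut fut: Countdown) -> (&'static str, u32) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (out, polls);
        }
    }
}
```

Calling drive(Countdown::Waiting { remaining: 2 }) takes three polls: two that return Pending and one that returns the output. Between polls the only thing that survives is the enum value, which is why a stack trace alone cannot tell you what a suspended task was waiting for.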

Minimal Example: Backtraces

The first step in debugging any Rust program is enabling backtraces. By default a panic prints only its message and source location, because capturing a backtrace is comparatively expensive. Without a backtrace, you get the message but not the call path that led to it. Set the environment variable RUST_BACKTRACE=1 to make the panic handler print the stack.

use tokio;

/// Simulates a task that panics to demonstrate backtrace capture.
async fn fetch_data() -> String {
    // Simulate some async work.
    tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    
    // This panic will trigger the backtrace.
    panic!("Something went wrong in fetch_data");
}

#[tokio::main]
async fn main() {
    // Run the task.
    let result = fetch_data().await;
    println!("Got: {}", result);
}

Run this with RUST_BACKTRACE=1 cargo run. The output includes the panic message and a stack trace. You'll see frames from your code mixed with frames from the standard library and the runtime. Scan past the runtime noise. Look for the frames that match your file names and function names. The backtrace shows the call stack at the moment of the panic. In async code, this stack represents the path through the state machine that led to the error.

Convention aside: Use RUST_BACKTRACE=full instead of 1 if the short trace seems to be missing frames. The value 1 prints a trimmed trace that hides the startup and panic-machinery frames around your code; full prints every frame unfiltered. Whether inlined functions show up depends on the debug info in the build, not on which value you pick. The common habit is 1 for daily work and full when the trace is missing key frames.

Don't let the noise put you off. Scan past the runtime frames until you find your code. In most panics, the first frame from your own crate points at the culprit or very close to it.
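
Backtraces are not limited to panics. As a small sketch, the standard library's std::backtrace::Backtrace can be captured when an error value is created, so the call path travels with the error through await points. TracedError and its methods are hypothetical names for this illustration; Backtrace::capture, Backtrace::force_capture, and BacktraceStatus are the real standard library items.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

/// Hypothetical error type that remembers where it was created.
#[derive(Debug)]
struct TracedError {
    message: String,
    backtrace: Backtrace,
}

impl TracedError {
    /// Captures a backtrace only when RUST_BACKTRACE (or
    /// RUST_LIB_BACKTRACE) enables it; otherwise the capture is skipped.
    fn new(message: &str) -> Self {
        TracedError {
            message: message.to_string(),
            backtrace: Backtrace::capture(),
        }
    }

    /// Captures a backtrace regardless of the environment variables.
    fn forced(message: &str) -> Self {
        TracedError {
            message: message.to_string(),
            backtrace: Backtrace::force_capture(),
        }
    }

    /// True when the capture was skipped because the environment disabled it.
    fn is_disabled(&self) -> bool {
        self.backtrace.status() == BacktraceStatus::Disabled
    }

    /// Renders the message followed by whatever the backtrace recorded.
    fn describe(&self) -> String {
        format!("{}\n{}", self.message, self.backtrace)
    }
}
```

TracedError::new costs almost nothing when backtraces are disabled, which makes it a reasonable default; forced is for the few places where you always want the call path.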

Realistic Example: Tracing and Timeouts

Backtraces help with panics, but they don't help with hangs or logic errors. For those, you need structured logging. The community standard is the tracing crate. It provides spans and events that create a hierarchy of log data. Spans represent a period of time and a context. Events are points-in-time logs attached to a span. This structure lets you correlate logs across await points and multiple tasks.

use tokio::time::{timeout, Duration};

/// Fetches data with tracing spans to visualize async flow.
// `#[tracing::instrument]` creates a span named `process_request` with `id` as a field
// and enters it on every poll. Holding a `span.enter()` guard across an `.await`
// would leave the span entered while other tasks run on the same thread.
#[tracing::instrument]
async fn process_request(id: u32) {

    tracing::info!("Starting request processing");
    
    // Wrap the future in a timeout to detect hangs.
    let result = timeout(Duration::from_secs(5), query_db(id)).await;
    
    match result {
        Ok(data) => {
            tracing::debug!("Database query returned");
            let processed = transform(data);
            tracing::info!("Request complete with data: {}", processed);
        }
        Err(_) => {
            tracing::error!("Database query timed out. Task may be blocked.");
        }
    }
}

async fn query_db(id: u32) -> String {
    // Simulate async I/O.
    tokio::time::sleep(std::time::Duration::from_millis(50)).await;
    format!("data_{}", id)
}

fn transform(data: String) -> String {
    format!("processed_{}", data)
}

#[tokio::main]
async fn main() {
    // Initialize the tracing subscriber to output logs.
    tracing_subscriber::fmt::init();
    
    process_request(42).await;
}

The timeout function is a critical debugging tool. It wraps a future and returns an error if the future doesn't complete within the specified duration. This turns a silent hang into an actionable error. If the timeout fires, you know the future is stuck. You can then narrow down the investigation to the code inside that future.

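Spans create a breadcrumb trail. When you log an event, it's associated with the current span. The subscriber prints the enclosing spans with each event, so you can see the path taken. If a request crosses several await points, the span's name and fields (here, id) tie its log lines together. A task started with tokio::spawn does not inherit the current span automatically; attach one with tracing's Instrument extension trait if the logs need to follow the work into the new task. This is far more powerful than println!, which produces unstructured text that is hard to filter and correlate.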

Convention aside: Prefer tracing for new async projects. The log crate is still maintained and widely used, but it is purely event-based and has no notion of spans, so it cannot carry context across await points. tracing models that context directly, and much of the async ecosystem, Tokio included, is instrumented with it.

Timeouts turn silent hangs into actionable errors. Trust the span hierarchy. It reveals the flow the stack trace hides.

Pitfalls and Compiler Errors

Async code introduces specific pitfalls. The compiler catches some of them at compile time; others only show up at runtime. Understanding both kinds saves hours of debugging.

If you try to spawn a task that holds a non-Send value, the compiler rejects it with E0277 (the trait bound is not satisfied). The tokio::spawn function (also exported as tokio::task::spawn) requires the future to be Send, meaning it can be moved across thread boundaries. If your future holds an Rc or a raw pointer across an await, it fails this bound. Rc is not Send because its reference count is a plain, non-atomic integer: two threads cloning or dropping the same Rc could race on the count. Arc uses atomic reference counting, so Arc<T> is Send whenever T is Send + Sync. Replace Rc with Arc to fix this.

Convention aside: Use Arc for shared ownership in async code that runs on a multi-threaded runtime. Rc is for single-threaded contexts, such as tasks confined to one thread with tokio::task::spawn_local. The moment you hand a future to tokio::spawn, any Rc it holds across an await breaks the Send bound. Reach for Arc by default.
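
The Send bound can be explored without a runtime. The sketch below uses std::thread::spawn as a stand-in, on the assumption that its Send + 'static requirement on the closure is close enough to the one tokio::spawn places on a future to illustrate the point. require_send and share_across_threads are hypothetical helpers written for this example.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a spawn function: it accepts only values
/// that are `Send + 'static`, the same kind of bound a spawn call imposes.
fn require_send<T: Send + 'static>(value: T) -> T {
    value
}

/// Shares a vector with a worker thread through an Arc and returns
/// the length the worker observed.
fn share_across_threads() -> usize {
    // Arc<Vec<i32>> is Send because Vec<i32> is Send + Sync.
    let shared = require_send(Arc::new(vec![1, 2, 3]));
    let for_worker = Arc::clone(&shared);

    let handle = thread::spawn(move || for_worker.len());
    let worker_len = handle.join().expect("worker panicked");

    // The Rc equivalent does not compile:
    // let local = std::rc::Rc::new(5);
    // require_send(local); // error[E0277]: `Rc<i32>` cannot be sent between threads safely

    worker_len
}
```

Uncommenting the Rc lines reproduces the same E0277 diagnostic you would see from a spawn call, which makes this a cheap way to check whether a type can cross a task boundary.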

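Another common error is E0502 (cannot borrow as mutable because it is also borrowed as immutable), along with its sibling E0499 (cannot borrow as mutable more than once). In async code they typically appear when two futures, or a future and the code around it, borrow the same local while one of them wants to mutate it: a borrow captured by a future stays alive for as long as the future does, including across every await point. The borrow checker knows nothing about tasks; it only sees overlapping borrows. Spawned tasks add a further constraint, because tokio::spawn requires a 'static future and so cannot borrow locals at all. Use Mutex or RwLock inside an Arc to share mutable state. Each task owns a clone of the Arc, and the lock serializes access at runtime, so no long-lived mutable borrow is needed.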
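
A minimal sketch of the Arc plus Mutex pattern follows. It again uses std::thread as a stand-in for tasks so that it runs without a runtime; count_with_workers is a hypothetical function for this illustration. With Tokio the structure would be the same, with the caveat that a std::sync::Mutex guard should not be held across an await.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments one shared counter from several workers and returns the total.
/// Each worker owns its own clone of the Arc; the Mutex serializes access.
fn count_with_workers(workers: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // The guard is a temporary, so the lock is released
                    // at the end of this statement.
                    *counter.lock().expect("mutex poisoned") += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker panicked");
    }

    let total = *counter.lock().expect("mutex poisoned");
    total
}
```

Keeping the locked region to a single statement is deliberate: the shorter the guard lives, the less likely it is to end up held while a task waits on something else.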

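Calling block_on inside an async function is a deadlock risk. Tokio has no tokio::task::block_on; its blocking entry points are Runtime::block_on and Handle::block_on, and other crates offer their own, such as futures::executor::block_on. All of them run a future to completion on the current thread, so that thread cannot poll anything else in the meantime. If the thread is one of the executor's workers, the tasks it was supposed to drive stall, and on a single-threaded runtime nothing else can make progress at all. Tokio guards its own entry points: calling Runtime::block_on or Handle::block_on from inside an async context panics with a message saying that a runtime cannot be started from within a runtime, in debug and release builds alike. An executor-agnostic block_on has no such check, so you may get a silent hang instead. Use block_on only to bridge synchronous code into an async context, such as in a main function that isn't async or in a callback from a synchronous library.

Neither the compiler nor the runtime flags ordinary blocking calls, such as std::thread::sleep or synchronous file I/O, inside an async function. The only symptom is that other tasks on the same worker stop making progress. Move such work to tokio::task::spawn_blocking or use the async equivalent.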
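
To see why block_on occupies its thread, it helps to look at what a minimal one does. The sketch below uses only the standard library and follows the well-known thread-parking pattern; ThreadWaker and this block_on are illustrative and are not Tokio's implementation.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread which called `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal illustrative `block_on`: polls the future on the current
/// thread and parks the thread between polls.
fn block_on<F: Future>(fut: F) -> F::Output {
    // Pin the future on the stack so it can be polled.
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);

    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // The calling thread does nothing else until a wake-up arrives.
            Poll::Pending => thread::park(),
        }
    }
}
```

The loop never returns to its caller until the future is ready. If the caller is itself a task running on an executor thread, that executor cannot poll anything else in the meantime, including whatever the inner future is waiting on.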

Treat the Send bound as a contract. If the compiler rejects it, the future cannot be handed to tokio::spawn on a multi-threaded runtime. Fix the data structure, don't fight the bound.

Decision: Debugging Tools

Use RUST_BACKTRACE=1 when you need to see the stack trace on a panic. This is the baseline for any crash investigation. Enable it in your development environment permanently.

Use tracing spans and events when you need to understand the flow of execution across multiple tasks and await points. Structured logs let you filter and correlate activity without cluttering the output. Instrument your code with spans at logical boundaries.

Use a debugger such as gdb or lldb, ideally through the rust-gdb or rust-lldb wrappers that ship with the toolchain and load Rust pretty-printers, when you need to inspect memory state or step through code line by line. This works best for synchronous sections or when you pause execution at a specific point. Async state machines are harder to step through, but debuggers can still show the current state of the future struct.

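Use tokio-console when you need to monitor task health, latency, and blocking behavior in a running application in real time. The application has to be instrumented with the console-subscriber crate and built with the tokio_unstable cfg flag. The console then connects to the process and lists live tasks with their poll counts and busy and idle times, and it warns about suspicious tasks, such as ones that run for a long time without yielding.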

Use tokio::time::timeout when you suspect a future is hanging. Wrap the future in a timeout to convert a hang into an error. This helps isolate the blocking code.

Use println! only for quick, throwaway debugging in small scripts. It lacks structure, and its overhead and effect on timing make it unsuitable for production or complex async flows. Switch to tracing as soon as the script grows.

Pick the tool that matches the symptom. A panic needs a backtrace. A hang needs a timeout or tokio-console. A logic error needs structured logs.

Where to go next

Debugging async code comes down to finding where a task panics or stalls while waiting on other work. Enable backtraces for crashes, wrap suspect futures in timeouts, and instrument await points with tracing spans so you can see what each task was doing. It is like shining a flashlight down a dark hallway to see where you tripped.