How to use tokio tracing for async debugging

When logs become a tangled mess

You are building an async service. It works perfectly on your machine. In staging, it occasionally drops a request or hangs. You sprinkle println! everywhere, deploy, and get a wall of text where logs from three different requests are mashed together. You cannot tell which log belongs to which user. The output is a scrambled mess of interleaved messages. Debugging feels like trying to read a conversation where everyone is shouting at once.

The problem is not the volume of logs. The problem is the lack of context. Flat logs have no structure. They do not know which request is active. They do not track the flow of execution across .await points. When tasks yield and resume, the call stack vanishes, and with it, your ability to reason about what is happening.

tracing solves this by replacing flat logs with a hierarchy. It gives you spans and events. Spans track time intervals and context. Events capture point-in-time observations. Together, they form a structured trace that survives across async boundaries. You get back the ability to follow a single request through a web of concurrent tasks.

Spans, events, and the async call stack

Think of a span as a call frame that lives longer than a function call. A span represents a period of time, such as handling a request, processing a message, or executing a database query. Spans nest inside each other. When you enter a span, it becomes the active context. When you emit an event, the event inherits the context of all active spans.

An event is a single observation. It is like a log message, but it carries structured data and belongs to a specific point in the span hierarchy. You use events to record errors, warnings, or informational messages. The event knows exactly which spans were active when it happened.

In synchronous code, the call stack provides this context automatically. The stack tells you which functions are currently executing. In async code, the call stack is useless for debugging flow. Tasks yield control at .await points. The stack unwinds and rebuilds as tasks resume. The stack does not preserve the logical flow of a request.

tracing provides a span stack. When a task starts, you open a span. When the task yields, the span pauses but remains open. When the task resumes, the span continues. The span stack tracks the logical flow, not the mechanical stack. This is the key difference. The call stack lies in async code. Spans tell the truth.

Minimal setup and instrumentation

You need two crates: tracing for the API and tracing-subscriber for the implementation. tracing defines the macros and types. tracing-subscriber collects the data and formats it. This separation allows you to swap subscribers without changing your instrumentation code.

Add the dependencies to your Cargo.toml.

[dependencies]
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }
tracing = "0.1"
tracing-subscriber = "0.3"

Initialize the subscriber in main before running any async tasks. The subscriber installs a global collector. All events and spans flow through this collector. If you forget to initialize, events go nowhere.

use tracing::{info, instrument, span, Level};

#[tokio::main]
async fn main() {
    // Install the default subscriber to capture and format events.
    // This must run before any tracing macros are called.
    tracing_subscriber::fmt::init();

    // Create a root span for the application lifecycle.
    let app_span = span!(Level::INFO, "app-start");
    let _guard = app_span.enter();

    // Run some async work.
    process_request("req-1").await;
    process_request("req-2").await;
}

// The instrument macro automatically creates a span for this function.
// It records the function name and arguments as span fields.
#[instrument]
async fn process_request(id: &str) {
    // Emit an event inside the span.
    info!("Processing request");
    
    // Simulate async work.
    tokio::time::sleep(tokio::time::Duration::from_millis(100)).await;
    
    info!("Done");
}

The #[instrument] macro does the heavy lifting. It wraps the function body in a span named after the function. It records the arguments as span fields. When the function returns, the span closes. The info! macro emits an event. The subscriber sees the event, looks up the active span stack, and prints the event along with the span context.

Run this code. You will see output like:

2024-01-01T00:00:00.000000Z  INFO app-start: process_request: Processing request
2024-01-01T00:00:00.100000Z  INFO app-start: process_request: Done

The hierarchy is visible. The output shows app-start containing process_request. The events are tagged with their parent spans. You can see the flow. Run it. Watch the hierarchy appear. That is the power of spans.

Structured fields and convention

Logs become useful when you can query them. Flat text logs are hard to search. tracing supports structured fields. You can attach key-value pairs to spans and events. This enables downstream tools to parse and index your logs.

Use the key = value syntax in macros.

use tracing::{info, instrument, warn};

#[instrument]
async fn fetch_user(user_id: u64) {
    // Attach structured data to the event.
    info!(user_id, "Fetching user");
    
    // Simulate work.
    tokio::time::sleep(tokio::time::Duration::from_millis(50)).await;
    
    // Conditional warning with structured data.
    if user_id > 1000 {
        warn!(user_id, latency_ms = 50, "High latency detected");
    }
}

The user_id shorthand records the variable user_id as a field. You can also use explicit key = value pairs. The subscriber formats these fields. The default formatter prints them as key-value pairs. You can swap the subscriber to output JSON, which is essential for log aggregation systems.

Convention aside: always use structured fields for identifiers and metrics. Never embed IDs in the message string. Write info!(user_id, "Login") instead of info!("User {} logged in", user_id). The structured form allows you to filter and aggregate by user_id without regex parsing. It also handles types correctly. The string form requires formatting, which adds overhead and loses type information.

Realistic async instrumentation

Real applications have nested operations, errors, and concurrent tasks. You need to handle these cases correctly.

use tracing::{error, info, instrument, warn, Level};
use std::io;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    tracing_subscriber::fmt::init();
    
    // Simulate concurrent requests.
    let handle1 = tokio::spawn(handle_connection("conn-1"));
    let handle2 = tokio::spawn(handle_connection("conn-2"));
    
    // Await results.
    handle1.await??;
    handle2.await??;
    
    Ok(())
}

// Skip the stream argument because it does not implement Debug.
// Logging non-Debug types causes a compile error.
#[instrument(skip(stream))]
async fn handle_connection(id: &str) -> Result<(), io::Error> {
    info!("Connection established");
    
    // Simulate a nested operation.
    let data = fetch_data(id).await?;
    
    // Log the result with structured data.
    info!(data_len = data.len(), "Data fetched");
    
    Ok(())
}

#[instrument]
async fn fetch_data(id: &str) -> Result<Vec<u8>, io::Error> {
    tokio::time::sleep(tokio::time::Duration::from_millis(50)).await;
    
    // Simulate failure for one request.
    if id == "conn-2" {
        error!(id, "Simulated failure");
        return Err(io::Error::new(io::ErrorKind::Other, "timeout"));
    }
    
    Ok(vec![1, 2, 3])
}

The skip(stream) attribute in #[instrument] is critical. The macro tries to log all arguments. If an argument does not implement Debug, the compiler rejects it. You will see a trait bound error complaining that the type does not implement std::fmt::Debug. The macro requires Debug to format arguments. Use skip to exclude arguments that cannot be logged.

Convention aside: always skip types that do not implement Debug, such as raw pointers, mutex guards, or streams. Also skip large buffers. Logging a megabyte of data will flood your logs and hurt performance. Use skip liberally. Log only what you need.

The code also shows nested spans. handle_connection calls fetch_data. Both functions have #[instrument]. The spans nest automatically. The output shows the hierarchy. You can see handle_connection containing fetch_data. The error event in fetch_data is tagged with both spans. This makes it easy to trace the failure back to the original request.

Pitfalls and compiler errors

Instrumentation has traps. Avoid them by understanding how spans and tasks interact.

Spans do not cross tokio::spawn boundaries automatically. If you create a span in main and then spawn a task, the task runs without that span context. The span stays in the parent task. The child task has no idea about the parent span. You lose the trail.

Fix this by attaching the span to the task. Use #[instrument] on the spawned function. Or use task.instrument(span.or_current()). The or_current() method attaches the span if one is active, or does nothing otherwise. This ensures the span context flows into the task. If the span does not cross the spawn, the context is lost. Instrument the task, or lose the trail.

Another pitfall is span explosion. Creating too many spans hurts performance. Each span has overhead. The subscriber must track span state. If you instrument every function, you create thousands of spans. The overhead adds up. Instrument only logical units of work. Skip trivial functions. Use manual spans for coarse-grained tracking when needed.

Performance aside: tracing is zero-cost when disabled. The macros check a filter before doing any work. If the filter rejects the event, the macro compiles to nothing. You can enable tracing in development and disable it in production with no runtime cost. Use the RUST_LOG environment variable to control filtering. Set RUST_LOG=info to see only info and above. Set RUST_LOG=debug to see debug messages. The filter runs at compile time and runtime. It prevents unnecessary work.

Compiler errors can also trip you up. If you use #[instrument] on a function with a non-Debug argument, the compiler rejects it with E0277 (trait bound not satisfied). The error points to the macro expansion. It complains that the type does not implement Debug. Add skip(arg_name) to fix it. If you forget tracing_subscriber::fmt::init(), you get no output. The events are generated but discarded. There is no compiler error. The code runs silently. Always initialize the subscriber.

When to use tracing versus alternatives

Pick the right tool for your complexity. Not every project needs spans.

Use println! for quick scripts where you run the code once and read the output immediately.

Use the log crate when you are writing a library that needs to emit messages but does not want to force a tracing dependency on users.

Use tracing for async applications where you need to track request flow across .await points and correlate logs from concurrent tasks.

Use #[instrument] for functions that represent a logical unit of work, especially when you want to capture arguments and duration automatically.

Use manual span! and enter() when you need fine-grained control over span lifetimes, such as spanning across multiple function calls or conditional logic.

Use tracing-subscriber env filter when you need dynamic filtering by level or target.

Use JSON formatting when you send logs to an aggregation system like Elasticsearch or Datadog.

Pick the tool that matches your complexity. Async needs spans.

Where to go next

Tracing acts like a dashboard for your async code, showing exactly when tasks start, finish, and what data they are processing. It matters because async code runs in parallel, making it hard to see where delays or errors happen without a timeline. Think of it as a flight recorder for your program that logs every step so you can replay the sequence of events.