How to Write Tower-Style Middleware in Rust

Tower-style middleware in Rust is implemented by creating a struct that wraps a service and implements the `Service` trait to intercept requests before and after they reach the inner service.

The checkpoint pattern

You are building a web service. Every request needs a correlation ID. Every request needs its latency logged. Every request needs a database connection checked before it touches your business logic. Copying that setup code into every route handler turns your application into a maintenance nightmare. You want a single place to intercept the request, do the prep work, hand it off to the handler, and then catch the response on the way out.

Tower solves this with middleware. Think of a middleware layer as a customs checkpoint for your data. The checkpoint does not know what is inside the package. It just knows how to stamp the paperwork before it moves forward, and how to inspect the receipt when it comes back. In Rust, that checkpoint is the Service trait. Any type that implements Service can be wrapped by another Service. The wrapper becomes the new entry point. The original service becomes the inner service. You chain them together, and requests flow through the stack.

Why you need a future wrapper

Async functions in Rust desugar into state machines. When you write async fn handle(req) -> Response, the compiler generates a struct that tracks where execution paused, what values are stored, and how to resume when a dependency finishes. Middleware needs to run code before that state machine starts, and code after it finishes. The first half is easy: it goes in call. The second half is not. If you return the inner future directly, you never see its result. You need a container that holds the inner future, polls it, and yields control back to the executor while it is still pending.

That container is a custom Future implementation. It holds the inner future as a field. It implements poll to delegate work. When the inner future returns Poll::Ready, your wrapper gets a single chance to transform the result, log metrics, or handle errors before passing it up the chain.

The minimal skeleton

The foundation is a struct that holds your inner service. You also need the future wrapper struct. The Service trait requires three associated types: Response, Error, and Future. The first two usually match the inner service. The third points to your custom future type.

use pin_project_lite::pin_project;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use tower::Service;

/// Wraps an inner service to add pre and post processing steps.
struct LoggingMiddleware<S> {
    inner: S,
}

pin_project! {
    /// Holds the future returned by the inner service.
    struct LoggingFuture<F> {
        // `#[pin]` marks the field as structurally pinned, so it can be
        // polled in place without requiring `F: Unpin`.
        #[pin]
        inner: F,
    }
}

impl<F> Future for LoggingFuture<F>
where
    F: Future,
{
    type Output = F::Output;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `Pin::new(&mut self.inner)` only compiles when `F: Unpin`, and
        // futures produced by `async fn` are not `Unpin`. Project the pin instead.
        let this = self.project();
        // Forward the poll to the inner future.
        this.inner.poll(cx)
    }
}

impl<S, Req> Service<Req> for LoggingMiddleware<S>
where
    S: Service<Req>,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = LoggingFuture<S::Future>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        // Check if the inner service is ready to accept a request.
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, req: Req) -> Self::Future {
        // Pre-processing happens here.
        println!("Incoming request");
        let inner_future = self.inner.call(req);
        // Wrap and return. Post-processing happens inside the Future impl.
        LoggingFuture { inner: inner_future }
    }
}

Keep the wrapper thin. The Service impl handles the synchronous setup. The Future impl handles the asynchronous lifecycle. Do not mix them.

How the polling loop actually works

The Service trait has two methods. poll_ready asks the service if it can handle a new request right now. This is how backpressure works. If your database connection pool is exhausted, poll_ready returns Poll::Pending. The caller is woken when the pool frees up, polls poll_ready again, and only invokes call once it sees Poll::Ready(Ok(())). Your middleware must forward this check to the inner service. If you skip it, you will overwhelm downstream resources.
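The readiness handshake can be sketched without any dependencies. The block below is an illustration under stated assumptions, not Tower's code: the Service trait is a local stand-in with the same method shape as tower::Service, and Pool, Passthrough, and readiness_around_one_call are hypothetical names. It shows a wrapper whose readiness simply mirrors the inner service.

```rust
use std::future::{ready, Future, Ready};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Local stand-in with the same method shape as `tower::Service`.
trait Service<Req> {
    type Response;
    type Error;
    type Future: Future<Output = Result<Self::Response, Self::Error>>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;
    fn call(&mut self, req: Req) -> Self::Future;
}

/// Hypothetical inner service with a fixed number of free slots.
struct Pool {
    free_slots: usize,
}

impl Service<u32> for Pool {
    type Response = u32;
    type Error = String;
    type Future = Ready<Result<u32, String>>;

    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), String>> {
        if self.free_slots == 0 {
            // A real service would keep `cx.waker()` and wake it once a slot frees up.
            return Poll::Pending;
        }
        Poll::Ready(Ok(()))
    }

    fn call(&mut self, req: u32) -> Self::Future {
        self.free_slots -= 1;
        ready(Ok(req * 2))
    }
}

/// Middleware that adds nothing except faithful readiness forwarding.
struct Passthrough<S> {
    inner: S,
}

impl<S, Req> Service<Req> for Passthrough<S>
where
    S: Service<Req>,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = S::Future;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, req: Req) -> Self::Future {
        self.inner.call(req)
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Checks readiness before and after the only slot is taken.
fn readiness_around_one_call() -> (bool, bool) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut svc = Passthrough { inner: Pool { free_slots: 1 } };

    // The request type is named explicitly so inference has nothing to guess.
    let before = Service::<u32>::poll_ready(&mut svc, &mut cx).is_ready();
    let _in_flight = Service::<u32>::call(&mut svc, 21);
    let after = Service::<u32>::poll_ready(&mut svc, &mut cx).is_ready();
    (before, after)
}
```

Because the wrapper forwards poll_ready, the caller sees the pool's exhaustion as Pending and stops sending work.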

The call method takes the request and returns a future. That future represents the entire lifecycle of the request. In the example above, call prints a message and wraps the inner future. The real work happens when the executor polls LoggingFuture. The poll method delegates to the inner future. When the inner future returns Poll::Ready(value), that is your hook for post-processing. You can transform the response, log the result, or update metrics before returning Poll::Ready(transformed_value).
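As a rough illustration of that hook, here is a dependency-free sketch. TagFuture, block_on, and tagged_hello are hypothetical names, and the inner future is boxed purely to avoid pin projection in a short example. That is a simplification, not a recommendation.

```rust
use std::future::{ready, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical wrapper that rewrites a successful `String` response.
struct TagFuture<F> {
    // Boxed so the sketch needs no pin projection; this costs one allocation.
    inner: Pin<Box<F>>,
}

impl<F, E> Future for TagFuture<F>
where
    F: Future<Output = Result<String, E>>,
{
    type Output = Result<String, E>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // `TagFuture` is `Unpin` because its only field is a `Pin<Box<F>>`.
        let this = self.get_mut();
        match this.inner.as_mut().poll(cx) {
            // The hook: reached when the inner future completes successfully.
            Poll::Ready(Ok(body)) => Poll::Ready(Ok(format!("{} [checked]", body))),
            Poll::Ready(Err(e)) => Poll::Ready(Err(e)),
            Poll::Pending => Poll::Pending,
        }
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, adequate for futures that never wait on I/O.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Drives the wrapper around an already-completed inner future.
fn tagged_hello() -> Result<String, String> {
    let inner = ready(Ok::<String, String>("hello".to_string()));
    block_on(TagFuture { inner: Box::pin(inner) })
}
```

The Pending arm does nothing but pass the signal up, which is what keeps the wrapper cheap on every poll that is not the last one.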

The Context parameter carries a waker. A waker tells the executor to poll this future again once an external event completes. You rarely interact with wakers directly. You just pass cx down to the inner future, and the executor handles the scheduling. A hand-written poll has no await points, so the guarantee that matters here is Pin: it promises the inner future is not moved between polls.

Forward poll_ready every time. Backpressure is not optional.

Adding real behavior

Let's add actual latency tracking. We will capture a timestamp in call, wait for the inner service to finish, calculate the duration, and log it, leaving a marked spot where you could attach a header to the response. We will also show how to handle errors gracefully.

use pin_project_lite::pin_project;
use std::fmt::Debug;
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll};
use std::time::Instant;
use tower::Service;

struct LatencyMiddleware<S> {
    inner: S,
}

pin_project! {
    struct LatencyFuture<F> {
        // Structurally pinned so it can be polled without `F: Unpin`.
        #[pin]
        inner: F,
        start: Instant,
    }
}

// The output must be a `Result` for the `Ok`/`Err` arms to type-check,
// and the error must be `Debug` to be printed with `{:?}`.
impl<F, T, E> Future for LatencyFuture<F>
where
    F: Future<Output = Result<T, E>>,
    E: Debug,
{
    type Output = Result<T, E>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = self.project();
        match this.inner.poll(cx) {
            Poll::Ready(Ok(response)) => {
                // Post-processing: measure duration and modify success response.
                let duration = this.start.elapsed();
                println!("Request completed in {:?}", duration);
                // Attach latency header or modify metadata here.
                Poll::Ready(Ok(response))
            }
            Poll::Ready(Err(e)) => {
                // Post-processing: log the error and optionally wrap it.
                let duration = this.start.elapsed();
                eprintln!("Request failed in {:?}: {:?}", duration, e);
                Poll::Ready(Err(e))
            }
            Poll::Pending => Poll::Pending,
        }
    }
}

impl<S, Req> Service<Req> for LatencyMiddleware<S>
where
    S: Service<Req>,
    S::Error: Debug,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = LatencyFuture<S::Future>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, req: Req) -> Self::Future {
        // Pre-processing: record the start time.
        let start = Instant::now();
        let inner_future = self.inner.call(req);
        LatencyFuture { inner: inner_future, start }
    }
}

The future struct now carries state. start lives alongside the inner future. The match runs on every poll, but a Ready arm is reached at most once, when the inner future resolves. That single execution point is where you transform responses or handle errors. If the inner service returns an error, you can catch it here, log it, and return a graceful fallback response instead of propagating the failure.
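The fallback idea can be sketched as follows. This is a minimal, dependency-free illustration: FallbackFuture, block_on, and recovered are hypothetical names, the response type is fixed to String for brevity, and the inner future is boxed to sidestep pin projection.

```rust
use std::fmt::Debug;
use std::future::{ready, Future};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical wrapper that swaps an error for a canned response.
struct FallbackFuture<F> {
    inner: Pin<Box<F>>,
    // `Option` lets the body be moved out exactly once.
    fallback: Option<String>,
}

impl<F, E> Future for FallbackFuture<F>
where
    F: Future<Output = Result<String, E>>,
    E: Debug,
{
    type Output = Result<String, E>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // All fields are `Unpin`, so a plain mutable reference is available.
        let this = self.get_mut();
        match this.inner.as_mut().poll(cx) {
            Poll::Ready(Err(e)) => {
                // Log the failure, then answer with the fallback instead.
                eprintln!("inner service failed: {:?}", e);
                let body = this.fallback.take().unwrap_or_default();
                Poll::Ready(Ok(body))
            }
            // Success and Pending pass through untouched.
            other => other,
        }
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, adequate for futures that never wait on I/O.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Wraps a failing inner future and returns what the caller would see.
fn recovered() -> Result<String, String> {
    let failing = ready(Err::<String, String>("db down".to_string()));
    block_on(FallbackFuture {
        inner: Box::pin(failing),
        fallback: Some("service unavailable".to_string()),
    })
}
```

Swallowing errors this way hides failures from outer layers such as retries, so it is usually reserved for the outermost middleware.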

Do not change the request type inside call. Keep the request shape identical across the stack and attach metadata using context objects or request extensions, such as the extensions map on http::Request.

Where things break

Writing middleware by hand exposes you to a few sharp edges. The most common is forgetting to forward poll_ready. If your middleware ignores backpressure, you will queue up requests faster than your inner service can drain them. Memory usage spikes, and latency degrades. Always delegate poll_ready unless you are implementing a custom rate limiter.

Another trap is capturing non-Send data in your future. Tower services often run on multi-threaded executors. The Service trait itself does not demand Send, but spawning onto such an executor does: the future must be Send + 'static. If your middleware struct or future holds an Rc or a raw pointer, the compiler rejects it with E0277 (trait bound not satisfied). A reference tied to the current stack frame fails the 'static requirement instead. Keep shared middleware state behind Arc, with a Mutex or atomics when it must be mutated. If you only need single-threaded execution, stick to Rc and RefCell, but document that constraint clearly.
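One cheap way to catch a missing Send bound early is a compile-time assertion next to the type. The sketch below uses only the standard library. Metrics, assert_send, and count_across_threads are hypothetical names chosen for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical middleware state shared across worker threads.
#[derive(Clone)]
struct Metrics {
    requests: Arc<AtomicU64>,
}

/// Compiles only for types that may cross a thread boundary.
fn assert_send<T: Send + 'static>(_value: &T) {}

/// Bumps the shared counter once from each worker thread.
fn count_across_threads(workers: u64) -> u64 {
    let metrics = Metrics {
        requests: Arc::new(AtomicU64::new(0)),
    };
    // Swapping `Arc<AtomicU64>` for `Rc<Cell<u64>>` makes this line fail with E0277.
    assert_send(&metrics);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let m = metrics.clone();
            thread::spawn(move || {
                m.requests.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker panicked");
    }
    metrics.requests.load(Ordering::Relaxed)
}
```

The assertion costs nothing at runtime and turns a confusing error at the spawn site into a clear one at the type definition.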

You will also run into friction when trying to change the request type. If your middleware changes Req to ModifiedReq, the inner service must implement Service<ModifiedReq>. A mismatch surfaces as E0308 (mismatched types) or, when the trait bound itself is unmet, as E0277. The cleanest pattern is to keep the request type identical across the stack and attach metadata using a context object or request extensions. Do not force the type system to juggle different request shapes unless you are building a dedicated transformation layer.
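The "same type, extra metadata" pattern might look like the sketch below. Everything here is an assumption made for illustration: Service is a local stand-in shaped like tower::Service, and Request, Handler, CorrelationId, and the "correlation-id" key are hypothetical. A real stack would more likely use the typed extensions map on http::Request.

```rust
use std::collections::HashMap;
use std::future::{ready, Future, Ready};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Local stand-in with the same method shape as `tower::Service`.
trait Service<Req> {
    type Response;
    type Error;
    type Future: Future<Output = Result<Self::Response, Self::Error>>;
    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;
    fn call(&mut self, req: Req) -> Self::Future;
}

/// Hypothetical request type with a side channel for metadata.
struct Request {
    path: String,
    extensions: HashMap<&'static str, String>,
}

/// Hypothetical business-logic handler that reads the metadata.
struct Handler;

impl Service<Request> for Handler {
    type Response = String;
    type Error = String;
    type Future = Ready<Result<String, String>>;

    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), String>> {
        Poll::Ready(Ok(()))
    }

    fn call(&mut self, req: Request) -> Self::Future {
        let id = req.extensions.get("correlation-id").cloned().unwrap_or_default();
        ready(Ok(format!("{} handled as {}", req.path, id)))
    }
}

/// Middleware that stamps each request without changing its type.
struct CorrelationId<S> {
    inner: S,
    next_id: u64,
}

impl<S> Service<Request> for CorrelationId<S>
where
    S: Service<Request>,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = S::Future;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, mut req: Request) -> Self::Future {
        self.next_id += 1;
        req.extensions.insert("correlation-id", format!("req-{}", self.next_id));
        self.inner.call(req)
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, adequate for futures that never wait on I/O.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// `Handler` is always ready, so the readiness check is omitted in this demo.
fn handle_users() -> Result<String, String> {
    let mut svc = CorrelationId { inner: Handler, next_id: 0 };
    let req = Request { path: "/users".to_string(), extensions: HashMap::new() };
    block_on(svc.call(req))
}
```

Because both layers speak Request, the middleware can be added or removed without touching the handler's signature.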

The community names future wrappers after the middleware they belong to: LoggingFuture, AuthFuture, RateLimitFuture. This keeps the namespace clean and makes stack traces readable. You will also see tower::util::BoxService used to erase concrete types when the middleware chain gets too long to name comfortably. Type erasure costs a heap allocation and a dynamic dispatch per call but saves you from fighting E0277 on deeply nested generic bounds. Use it at the boundaries of your application, not in hot paths.
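To make the cost and benefit of erasure concrete, here is a hand-rolled sketch of the idea using only the standard library. It is not BoxService's implementation: Service is a local stand-in shaped like tower::Service, and BoxFuture, Erased, BoxFutures, Double, AddOne, build, and run_both are hypothetical names.

```rust
use std::future::{ready, Future, Ready};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Local stand-in with the same method shape as `tower::Service`.
trait Service<Req> {
    type Response;
    type Error;
    type Future: Future<Output = Result<Self::Response, Self::Error>>;
    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;
    fn call(&mut self, req: Req) -> Self::Future;
}

type BoxFuture<T, E> = Pin<Box<dyn Future<Output = Result<T, E>> + Send>>;
/// One nameable type for every `u32 -> u32` service in this sketch.
type Erased = Box<dyn Service<u32, Response = u32, Error = String, Future = BoxFuture<u32, String>>>;

/// Adapter that boxes the inner future so its concrete type disappears.
struct BoxFutures<S> {
    inner: S,
}

impl<S, Req> Service<Req> for BoxFutures<S>
where
    S: Service<Req>,
    S::Future: Send + 'static,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = BoxFuture<S::Response, S::Error>;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, req: Req) -> Self::Future {
        // One allocation per call: the price of erasure.
        Box::pin(self.inner.call(req))
    }
}

struct Double;
struct AddOne;

impl Service<u32> for Double {
    type Response = u32;
    type Error = String;
    type Future = Ready<Result<u32, String>>;
    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), String>> {
        Poll::Ready(Ok(()))
    }
    fn call(&mut self, req: u32) -> Self::Future {
        ready(Ok(req * 2))
    }
}

impl Service<u32> for AddOne {
    type Response = u32;
    type Error = String;
    type Future = Ready<Result<u32, String>>;
    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), String>> {
        Poll::Ready(Ok(()))
    }
    fn call(&mut self, req: u32) -> Self::Future {
        ready(Ok(req + 1))
    }
}

/// Two different concrete stacks, one return type.
fn build(double: bool) -> Erased {
    if double {
        Box::new(BoxFutures { inner: Double })
    } else {
        Box::new(BoxFutures { inner: AddOne })
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny busy-polling executor, adequate for futures that never wait on I/O.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn run_both() -> (Result<u32, String>, Result<u32, String>) {
    let mut doubled = build(true);
    let mut incremented = build(false);
    // Spelled as a trait call so it clearly dispatches through the trait object.
    let a = block_on(Service::call(&mut *doubled, 20));
    let b = block_on(Service::call(&mut *incremented, 20));
    (a, b)
}
```

The caller of build never learns which stack it received, which is exactly why erasure belongs at module or application boundaries.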

Treat the Send bound as a contract. If your middleware cannot cross threads, say so upfront.

Choosing your approach

Use raw Service implementations when you need fine-grained control over the polling loop or when you are building a foundational library component. Use tower::ServiceBuilder when you are stacking standard middleware like timeouts, retries, or compression (the last of which lives in the companion tower-http crate). The builder pattern handles the type wiring and future wrapping automatically. Use tower::layer::Layer when you want to apply the same middleware to multiple services without repeating the wrapping code. Reach for tower::ServiceExt when you need to call a service once with oneshot, which waits for readiness before calling, or when you want to chain one-off adapters like map_request or map_response.
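The Layer idea is small enough to sketch in full. In the block below, Service and Layer are local stand-ins that mirror the shape of the Tower traits, and Counted, CountLayer, Echo, Double, and shared_call_count are hypothetical names. The point is that one layer value can wrap services of unrelated types.

```rust
use std::future::{ready, Future, Ready};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll};

/// Local stand-in with the same method shape as `tower::Service`.
trait Service<Req> {
    type Response;
    type Error;
    type Future: Future<Output = Result<Self::Response, Self::Error>>;
    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>>;
    fn call(&mut self, req: Req) -> Self::Future;
}

/// Local stand-in with the same shape as `tower::Layer`.
trait Layer<S> {
    type Service;
    fn layer(&self, inner: S) -> Self::Service;
}

/// Middleware that counts calls before delegating.
struct Counted<S> {
    inner: S,
    calls: Arc<AtomicUsize>,
}

impl<S, Req> Service<Req> for Counted<S>
where
    S: Service<Req>,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = S::Future;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, req: Req) -> Self::Future {
        self.calls.fetch_add(1, Ordering::Relaxed);
        self.inner.call(req)
    }
}

/// Factory that knows how to wrap any service in `Counted`.
struct CountLayer {
    calls: Arc<AtomicUsize>,
}

impl<S> Layer<S> for CountLayer {
    type Service = Counted<S>;

    fn layer(&self, inner: S) -> Self::Service {
        Counted { inner, calls: Arc::clone(&self.calls) }
    }
}

struct Echo;
struct Double;

impl Service<String> for Echo {
    type Response = String;
    type Error = String;
    type Future = Ready<Result<String, String>>;
    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), String>> {
        Poll::Ready(Ok(()))
    }
    fn call(&mut self, req: String) -> Self::Future {
        ready(Ok(req))
    }
}

impl Service<u32> for Double {
    type Response = u32;
    type Error = String;
    type Future = Ready<Result<u32, String>>;
    fn poll_ready(&mut self, _cx: &mut Context<'_>) -> Poll<Result<(), String>> {
        Poll::Ready(Ok(()))
    }
    fn call(&mut self, req: u32) -> Self::Future {
        ready(Ok(req * 2))
    }
}

/// Applies one layer to two unrelated services and calls each once.
fn shared_call_count() -> usize {
    let calls = Arc::new(AtomicUsize::new(0));
    let layer = CountLayer { calls: Arc::clone(&calls) };

    let mut echo = layer.layer(Echo);
    let mut double = layer.layer(Double);
    let _ = echo.call("ping".to_string());
    let _ = double.call(21u32);

    calls.load(Ordering::Relaxed)
}
```

Separating the factory from the wrapper is what lets a builder stack layers before any concrete service exists.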

Pick the tool that matches your composition depth. Do not write a trait impl when a builder method does the job.

Where to go next