How to add rate limiting to a Rust API

Your API is live. Then the hammering starts.

Your endpoint works. The tests pass. You deploy to production and watch the logs roll in. Traffic is normal. Then a script starts hitting your login route 500 times a second. Your database locks up. Legitimate users get timeouts. You didn't build a rate limiter, so your server treats every request as a valid guest. It processes them all until resources run out.

Rate limiting stops this before it reaches your business logic. It tracks how often a client makes requests and rejects the excess. The client gets a 429 Too Many Requests response instead of a successful reply. Your database stays calm. Your users stay happy.

Rate limiting in plain words

Think of a coffee shop with a "one cup per minute" rule. The barista keeps a mental list of customers. When you ask for a cup, the barista checks the list. If you haven't had one in the last minute, they hand you a cup and update the timestamp. If you ask again five seconds later, they shake their head and tell you to wait.

In Rust, the rate limiter is middleware. It sits between the incoming HTTP request and your route handler. It extracts a key from the request (usually the IP address or an API key), checks a counter for that key, and decides whether to let the request pass. If the counter is over the limit, the middleware returns a 429 response immediately. Your handler never runs. No database query happens. No computation occurs.

There are two common algorithms. A fixed window resets the counter at strict intervals. A sliding window tracks individual request timestamps and slides the window forward as time passes. Sliding windows are smoother. They prevent a burst of requests right at the boundary of a fixed window from doubling the effective limit. Most modern crates use a sliding window or a token bucket approach.
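The boundary problem is easier to see with numbers. The following is a minimal sketch of a fixed-window counter using only the standard library. The names FixedWindow and allowed_burst are made up for illustration and are not taken from any crate. Time is passed in as plain seconds so the behavior is deterministic.

```rust
use std::collections::HashMap;

/// Fixed-window counter: one counter per key, reset whenever the
/// window index (now_secs / window_secs) changes.
struct FixedWindow {
    limit: u32,
    window_secs: u64,
    // key -> (window index, requests counted in that window)
    counters: HashMap<String, (u64, u32)>,
}

impl FixedWindow {
    fn new(limit: u32, window_secs: u64) -> Self {
        Self { limit, window_secs, counters: HashMap::new() }
    }

    /// Returns true if the request at `now_secs` is allowed.
    fn allow(&mut self, key: &str, now_secs: u64) -> bool {
        let window = now_secs / self.window_secs;
        let entry = self.counters.entry(key.to_string()).or_insert((window, 0));
        if entry.0 != window {
            // A new window started: reset the counter.
            *entry = (window, 0);
        }
        if entry.1 < self.limit {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}

/// Counts how many of `n` back-to-back requests at `now_secs` are allowed.
fn allowed_burst(limiter: &mut FixedWindow, key: &str, now_secs: u64, n: u32) -> u32 {
    (0..n).filter(|_| limiter.allow(key, now_secs)).count() as u32
}
```

With a limit of 10 per 60 seconds, a client that sends 15 requests at second 59 and 15 more at second 60 gets 10 through each time. That is 20 accepted requests within two seconds, which is the doubling a sliding window avoids.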

The easy way: actix-web-rate-limit

The Actix ecosystem has a community-standard crate for this. actix-web-rate-limit handles the tracking, the windowing, and the response generation. It uses a DashMap internally for concurrent access, so it's safe to use across multiple threads.

Add the crate to your dependencies.

[dependencies]
actix-web = "4"
actix-web-rate-limit = "0.5"

Configure the limiter in your server setup. The API is builder-style, so you chain the options.

use actix_web::{web, App, HttpServer, HttpResponse};
use actix_web_rate_limit::RateLimit;

/// Handles the root route.
async fn index() -> HttpResponse {
    HttpResponse::Ok().body("Hello, world!")
}

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    // Create the rate limiter configuration.
    // Limit is the max requests allowed within the window duration.
    let rate_limit = RateLimit::new()
        .limit(10) // Allow 10 requests
        .window(std::time::Duration::from_secs(60)); // Per 60 seconds

    HttpServer::new(move || {
        App::new()
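            // Wrap the entire app with the middleware.
            // Every route inherits this limit unless overridden.
            // The factory closure runs once per worker thread, so pass a clone
            // (this assumes RateLimit implements Clone).
            .wrap(rate_limit.clone())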
            .route("/", web::get().to(index))
    })
    .bind("127.0.0.1:8080")?
    .run()
    .await
}

The wrap call applies the middleware to all routes. The limiter tracks keys in memory. When a request arrives, it checks the key. If the key has fewer than 10 requests in the last 60 seconds, the request passes. If not, the middleware intercepts it and returns 429.

Convention aside: The community prefers explicit limit names. limit(10) is clear, but if you share this config across environments, use a constant or a config struct. Hardcoding numbers works for prototypes, but it becomes a maintenance trap when you need to tweak limits without recompiling.
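One way to keep the numbers out of the code is a small config struct that reads from the environment. This is a sketch, not part of any crate: RateLimitConfig and the variable names RATE_LIMIT_MAX and RATE_LIMIT_WINDOW_SECS are hypothetical, and the defaults mirror the 10 requests per 60 seconds used above.

```rust
use std::time::Duration;

/// Rate limit settings gathered in one place.
#[derive(Debug, Clone, PartialEq)]
struct RateLimitConfig {
    max_requests: u32,
    window: Duration,
}

impl RateLimitConfig {
    const DEFAULT_MAX_REQUESTS: u32 = 10;
    const DEFAULT_WINDOW_SECS: u64 = 60;

    /// Builds a config from optional raw strings, falling back to the
    /// defaults when a value is missing or is not a valid number.
    fn from_raw(max_requests: Option<&str>, window_secs: Option<&str>) -> Self {
        let max_requests = max_requests
            .and_then(|s| s.trim().parse::<u32>().ok())
            .unwrap_or(Self::DEFAULT_MAX_REQUESTS);
        let window_secs = window_secs
            .and_then(|s| s.trim().parse::<u64>().ok())
            .unwrap_or(Self::DEFAULT_WINDOW_SECS);
        Self { max_requests, window: Duration::from_secs(window_secs) }
    }

    /// Reads the hypothetical RATE_LIMIT_MAX and RATE_LIMIT_WINDOW_SECS
    /// environment variables.
    fn from_env() -> Self {
        let max = std::env::var("RATE_LIMIT_MAX").ok();
        let window = std::env::var("RATE_LIMIT_WINDOW_SECS").ok();
        Self::from_raw(max.as_deref(), window.as_deref())
    }
}
```

Calling RateLimitConfig::from_env() once at startup and passing max_requests and window to the limiter lets you change limits per environment with a restart instead of a rebuild.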

How the middleware works

When a request hits the server, the middleware runs before your handler. It extracts the key. By default, the key is the client's IP address. The middleware looks up that IP in an internal map. If the IP is new, it creates an entry with a count of one. If the IP exists, it checks the timestamps of recent requests.

The sliding window logic discards timestamps older than the window duration. It counts the remaining timestamps. If the count is below the limit, it adds the current timestamp and lets the request continue. If the count meets or exceeds the limit, it stops the request. The middleware generates a 429 response with a Retry-After header. That header tells the client how many seconds to wait before trying again.
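The steps above can be sketched with the standard library alone. This is an illustration of the sliding-window idea, not the crate's actual source. SlidingWindow and check are hypothetical names, and timestamps are Durations since an arbitrary start so the example is deterministic. A real server would more likely use std::time::Instant.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Sliding-window log: stores the time of each accepted request per key.
struct SlidingWindow {
    limit: usize,
    window: Duration,
    log: HashMap<String, Vec<Duration>>,
}

impl SlidingWindow {
    fn new(limit: usize, window: Duration) -> Self {
        Self { limit, window, log: HashMap::new() }
    }

    /// Returns Ok(remaining) if the request is allowed, or
    /// Err(retry_after) if the key is over its limit.
    fn check(&mut self, key: &str, now: Duration) -> Result<usize, Duration> {
        let window = self.window;
        let stamps = self.log.entry(key.to_string()).or_default();
        // Discard timestamps that have fallen out of the window.
        stamps.retain(|t| now.saturating_sub(*t) < window);
        if stamps.len() < self.limit {
            stamps.push(now);
            Ok(self.limit - stamps.len())
        } else {
            // The oldest surviving timestamp is the first to leave the
            // window, so it determines how long the client must wait.
            let oldest = stamps.first().copied().unwrap_or(now);
            Err((oldest + window).saturating_sub(now))
        }
    }
}
```

The Err value is what a middleware would round up to whole seconds and put in the Retry-After header. The Ok value maps naturally onto a "remaining requests" header.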

This happens in microseconds. The overhead is a hash map lookup and a vector push. It's negligible compared to database queries or heavy computation. You pay a tiny CPU cost to save your database from crashing.

Realistic config: keys and headers

IP-based limiting works for public APIs, but it fails behind NATs or corporate proxies. Multiple users share one IP, so they share one limit. One user hammers the endpoint, and everyone gets blocked. For authenticated APIs, you want to limit by user identity.

The crate supports custom key extractors. You can pull an API key from a header or a JWT claim. If the key is missing, fall back to IP. Otherwise, all unauthenticated requests share a single bucket, and one noisy client can exhaust the limit for everyone else.

use actix_web::dev::ServiceRequest;
use actix_web_rate_limit::RateLimit;

/// Extracts the rate limit key from the request.
/// Prefers the X-Api-Key header. Falls back to IP if missing.
fn custom_key_extractor(req: &ServiceRequest) -> String {
    req.headers()
        .get("X-Api-Key")
        .and_then(|v| v.to_str().ok())
        .map(|k| k.to_string())
        .unwrap_or_else(|| {
            // Fall back to the peer IP. peer_addr() returns an Option and can be
            // None (for example in unit tests), so avoid unwrap here.
            req.peer_addr()
                .map(|addr| addr.ip().to_string())
                .unwrap_or_else(|| "unknown".to_string())
        })
}

// In your App setup:
let rate_limit = RateLimit::new()
    .limit(100)
    .window(std::time::Duration::from_secs(60))
    .key_extractor(custom_key_extractor);

The key_extractor closure receives the request and must return a String. The crate uses that string as the map key. If you return the wrong type, the compiler rejects you with E0308 (mismatched types). The extractor signature is strict. It must produce a string that uniquely identifies the client.
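The key-selection logic can also live in a plain function that is easy to unit test without an HTTP request. The sketch below is one possible shape. The function name rate_limit_key and the "key:" and "ip:" prefixes are assumptions made for this example. The prefixes keep an API key that happens to look like an IP address from landing in the same bucket as that IP.

```rust
use std::net::IpAddr;

/// Picks the rate limit key: API key first, then peer IP, then a
/// shared "unknown" bucket when neither is available.
fn rate_limit_key(api_key: Option<&str>, peer_ip: Option<IpAddr>) -> String {
    // Treat an empty or whitespace-only header value as missing.
    let api_key = api_key.map(str::trim).filter(|k| !k.is_empty());
    match (api_key, peer_ip) {
        (Some(key), _) => format!("key:{key}"),
        (None, Some(ip)) => format!("ip:{ip}"),
        (None, None) => "unknown".to_string(),
    }
}
```

An extractor would then only need to read the header and the peer address and pass both to this function.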

You can also add custom headers to the response. The crate adds X-RateLimit-Limit and X-RateLimit-Remaining by default. Some clients expect Retry-After. The middleware includes this automatically on 429 responses. Check the crate docs if you need to customize the header names.

The hard way: building with DashMap

If you need logic the crate doesn't support, you can build your own middleware on top of dashmap. This is useful for per-route limits with different windows, or for integrating with a database for persistent tracking.

A custom middleware needs a shared state. DashMap is the standard choice for concurrent hash maps in Rust. It allows multiple readers and writers without a global lock. You wrap the map in an Arc to share it across the server lifecycle.

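use actix_web::{
    body::{BoxBody, MessageBody},
    dev::{ServiceRequest, ServiceResponse},
    middleware::Next,
    web, Error, HttpResponse,
};
use dashmap::DashMap;
use std::sync::Arc;
use std::time::{Duration, Instant};

/// Tracks request timestamps per key. Create it once outside the
/// HttpServer::new closure and register it with App::app_data(web::Data<...>)
/// so every worker shares the same map.
struct RateLimitState {
    // Map from key to a vector of request timestamps.
    requests: Arc<DashMap<String, Vec<Instant>>>,
    limit: usize,
    window: Duration,
}

/// Custom middleware function. Register it with
/// `.wrap(actix_web::middleware::from_fn(rate_limit_middleware))`
/// (from_fn is available in actix-web 4.9 and later).
/// Extractors such as web::Data come before the request argument.
async fn rate_limit_middleware(
    state: web::Data<RateLimitState>,
    req: ServiceRequest,
    next: Next<impl MessageBody + 'static>,
) -> Result<ServiceResponse<BoxBody>, Error> {
    // peer_addr() can be None, so fall back to a shared bucket.
    let key = req
        .peer_addr()
        .map(|addr| addr.ip().to_string())
        .unwrap_or_else(|| "unknown".to_string());
    let now = Instant::now();

    // Scope the entry so its lock is released before any `.await`.
    let allowed = {
        let mut entry = state.requests.entry(key).or_insert_with(Vec::new);

        // Remove timestamps outside the window.
        entry.retain(|t| now.duration_since(*t) <= state.window);

        if entry.len() < state.limit {
            // Record this request.
            entry.push(now);
            true
        } else {
            false
        }
    };

    if !allowed {
        // Limit exceeded. Return 429 immediately.
        let response = HttpResponse::TooManyRequests()
            .insert_header(("Retry-After", state.window.as_secs().to_string()))
            .body("Rate limit exceeded");
        return Ok(req.into_response(response));
    }

    // Call the next handler.
    next.call(req).await.map(|res| res.map_into_boxed_body())
}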

This skeleton shows the complexity. You have to manage the map, clean up old entries, handle concurrency, and generate responses. The retain call drops old timestamps so each vector stays bounded. Without it, the vector grows forever. It does not remove idle keys, so the map itself still needs eviction. The DashMap entry lock ensures thread safety.

Building this yourself is rarely worth it unless you have specific requirements. The crate handles edge cases, cleanup threads, and configuration. Reach for the crate first. Write custom middleware only when the crate hits a wall.

Pitfalls and errors

Rate limiting introduces new failure modes. Memory is the biggest one. Every unique key consumes space. If you limit by IP and face a distributed attack with rotating IPs, your map fills up. The actix-web-rate-limit crate includes a cleanup mechanism that evicts old keys. If you build your own, you must implement eviction. A DashMap with millions of entries will exhaust RAM and crash the server.
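For a hand-rolled limiter, eviction can be a periodic sweep over the map. The sketch below uses a plain HashMap and a hypothetical evict_idle function to show the idea. DashMap exposes a retain method of a similar shape, so the same approach should carry over, but check its documentation for locking behavior before running a sweep on a hot map.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Removes expired timestamps from every key, then drops keys that have
/// nothing left. Returns the number of keys evicted.
fn evict_idle(
    log: &mut HashMap<String, Vec<Duration>>,
    now: Duration,
    window: Duration,
) -> usize {
    let before = log.len();
    log.retain(|_key, stamps| {
        // Keep only timestamps still inside the window.
        stamps.retain(|t| now.saturating_sub(*t) < window);
        // Keep the key only if it still has recent requests.
        !stamps.is_empty()
    });
    before - log.len()
}
```

Running a sweep like this from a background task every window or so keeps the number of keys roughly proportional to the number of recently active clients rather than to every client ever seen.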

Distributed systems break local rate limiting. If you run three server instances behind a load balancer, each instance has its own map. A client can send 10 requests to instance A, 10 to instance B, and 10 to instance C. Each instance thinks the limit is fine. The total is 30. To fix this, you need a shared store like Redis. The middleware checks Redis instead of local memory. This adds latency but ensures consistency across the cluster.
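The shape of a shared-store limiter can be sketched without a Redis client. Everything named here is hypothetical: CounterStore is an illustrative trait, and MemoryStore is an in-memory stand-in for the shared store. The sketch uses a fixed window rather than a sliding one, because a fixed window maps onto a single atomic increment. A Redis-backed implementation could plausibly use the INCR command with an expiry on each key.

```rust
use std::collections::HashMap;

/// Minimal counter-store interface shared by all server instances.
trait CounterStore {
    /// Increments the counter for `key` and returns the new value.
    fn incr(&mut self, key: &str) -> u64;
}

/// In-memory stand-in for the shared store, for illustration only.
#[derive(Default)]
struct MemoryStore {
    counters: HashMap<String, u64>,
}

impl CounterStore for MemoryStore {
    fn incr(&mut self, key: &str) -> u64 {
        let count = self.counters.entry(key.to_string()).or_insert(0);
        *count += 1;
        *count
    }
}

/// Fixed-window check against a shared store. The window index is part
/// of the key, so each window starts from a fresh counter.
fn allow<S: CounterStore>(
    store: &mut S,
    client: &str,
    now_secs: u64,
    window_secs: u64,
    limit: u64,
) -> bool {
    let window = now_secs / window_secs;
    let key = format!("ratelimit:{client}:{window}");
    store.incr(&key) <= limit
}
```

Because every instance increments the same counter, 30 requests spread over three instances still yield only 10 accepted requests per window.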

Compiler errors appear if you mess up the types. If your key extractor returns an Option<String> instead of a String, the compiler emits E0308. The extractor must always return a string. Forgetting to register the middleware state is a different kind of failure: a missing web::Data is not caught at compile time. The extraction fails at runtime and Actix responds with a 500 Internal Server Error. Register the state once with app_data so it lives as long as the server.

Pitfall alert: Rate limiting is a local defense. It won't stop a massive DDoS that saturates your network bandwidth. The packets never reach your application. Pair rate limiting with a WAF or CDN for high-volume attacks. Rate limiting protects your application logic. It doesn't protect your network interface.

When to use what

Use actix-web-rate-limit when you need a reliable, in-memory limiter for a single Actix server. Use a custom DashMap middleware when you need complex logic like per-endpoint limits with different windows, or when you want to avoid an external dependency for a simple prototype. Use Redis-backed rate limiting when you run multiple server instances behind a load balancer and need a shared counter. Reach for WAF or CDN rate limiting when you face high-volume DDoS attacks that would overwhelm your application layer before the code even runs.

Memory is finite. Your rate limiter must evict old keys, or it becomes the denial of service.

Where to go next

Rate limiting is like a bouncer at a club who only lets a certain number of people in per minute. It protects your API from being overwhelmed by too many requests at once, ensuring it stays fast and available for everyone. You use it when you need to prevent abuse or accidental traffic spikes from crashing your server.