The scraper that ate itself
You are building a scraper. You have a list of ten thousand URLs. You write a loop that spawns a task for every link. You hit run. Your CPU spikes to 100%. The target server immediately returns 429 Too Many Requests and blocks your IP. Or worse, your process allocates memory for every response buffer at once, eats all available RAM, and the kernel sends a SIGKILL. You didn't just fail. You caused a denial of service against yourself.
The fix isn't to slow down the code with artificial delays. The fix is to limit how many tasks run at the exact same moment. You want to process as fast as possible, but you need a hard cap on concurrency. That cap keeps the server happy, keeps your memory bounded, and keeps the OS from murdering your process.
The bouncer with keys
A semaphore acts like a bouncer at a club with a fixed number of seats. You have a line of people outside. The bouncer holds a set of keys. Each key represents one seat. When someone wants to enter, they grab a key. If there are keys left, they walk in. If the keys are gone, they stand in line and wait. When someone leaves, they hand the key back to the bouncer, who immediately gives it to the next person in line.
In Rust, the keys are called permits. The line is the async wait queue. The semaphore tracks how many permits are available and hands them out fairly. Tokio's Semaphore is async-aware. When a task needs a permit and none are available, the task suspends. It yields control back to the runtime. The runtime runs other tasks. No threads are blocked. No CPU cycles are wasted spinning. The system stays responsive while the limit is enforced.
Minimal example
use std::sync::Arc;
use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    // Create a semaphore with 3 permits.
    // Only 3 tasks can hold a permit at once.
    // Semaphore is not Clone, so wrap it in an Arc to share it.
    let semaphore = Arc::new(Semaphore::new(3));
    let mut handles = vec![];

    for i in 0..10 {
        // Clone the Arc handle for each task.
        // This shares the state, not the permits.
        // Cloning is cheap: it just increments a reference count.
        let sem = Arc::clone(&semaphore);
        let handle = tokio::spawn(async move {
            // Wait until a permit is available.
            // This suspends the task if no permits exist.
            // The permit is held until `_permit` is dropped.
            let _permit = sem.acquire().await.unwrap();
            // Critical section: limited concurrency.
            println!("Task {} acquired permit", i);
            tokio::time::sleep(tokio::time::Duration::from_millis(500)).await;
            println!("Task {} releasing permit", i);
            // Permit drops here, count increments, next waiter wakes.
        });
        handles.push(handle);
    }

    // Wait for every task to finish.
    for handle in handles {
        handle.await.unwrap();
    }
}
What happens under the hood
When you call Semaphore::new(3), Tokio creates a structure that holds the count of available permits and a queue of waiters. The initial count is three. Semaphore does not implement Clone, so to share it between tasks you wrap it in std::sync::Arc, which places it on the heap behind a reference count. Cloning the Arc does not create a new semaphore. It creates a new handle pointing to the same shared structure. This is cheap. You can clone the handle thousands of times without copying the permit count.
Inside the spawned task, sem.acquire().await checks the count. If the count is greater than zero, the semaphore decrements the count and returns a SemaphorePermit immediately. The task proceeds. If the count is zero, the task registers itself in the waiter queue and suspends. It yields control back to the Tokio runtime. The runtime runs other tasks. When a permit drops, the semaphore increments the count and wakes the next waiter in the queue. That waiter resumes and gets the permit.
The permit itself is a guard object. Holding the permit keeps the count decremented. Dropping the permit increments the count. You don't call a release method. The drop mechanism ensures the permit is released even if the task panics. This prevents leaks where a task crashes but still holds a permit, slowly starving the system.
Wrap the semaphore in an Arc and clone that handle before spawning. The cost is a single atomic increment. The safety is essential.
Realistic pattern
In real code, you usually wrap the logic in a function. You pass the semaphore into the function. This keeps the concurrency control separate from the business logic. You can reuse the function with different limits.
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Semaphore;

/// Processes a single item with concurrency control.
/// Cloning the Arc is cheap and shares the permit state.
async fn process_item(id: u32, semaphore: Arc<Semaphore>) {
    // Acquire permit. Task suspends here if limit is reached.
    // The permit is held for the duration of the work.
    let _permit = semaphore.acquire().await.unwrap();
    println!("Processing item {}", id);
    tokio::time::sleep(Duration::from_millis(100)).await;
    println!("Finished item {}", id);
    // Permit drops, count increments, next waiter wakes.
}

#[tokio::main]
async fn main() {
    // Semaphore is not Clone; the Arc is what gets cloned.
    let semaphore = Arc::new(Semaphore::new(5));
    let mut handles = vec![];

    for id in 0..20 {
        // Clone the Arc handle for the task.
        // This is required because tokio::spawn needs 'static data.
        let sem = Arc::clone(&semaphore);
        let handle = tokio::spawn(async move {
            process_item(id, sem).await;
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.await.unwrap();
    }
}
Dynamic limits
You can change the limit at runtime. This is useful when you want to scale concurrency based on load. If the system is idle, you can add permits. If the system is under pressure, you can remove them.
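use tokio::sync::Semaphore;

#[tokio::main]
async fn main() {
    let semaphore = Semaphore::new(2);
    // Add permits to increase the limit.
    semaphore.add_permits(3);
    println!("Available permits: {}", semaphore.available_permits());
    // Remove permits to decrease the limit.
    // This does not revoke permits that tasks already hold.
    // It only affects future acquisitions.
    let removed = semaphore.forget_permits(2);
    println!("Removed {} permits, {} available", removed, semaphore.available_permits());
}
The add_permits method takes a usize, so it can only add permits. Passing a negative value does not compile. To lower the limit, call forget_permits(n), available in recent Tokio 1.x releases. It removes up to n of the currently available permits and returns how many it actually removed. Removing permits does not affect tasks that already hold a permit. It only reduces the count for future acquisitions. If you ask to remove more permits than are available, only the available ones are removed and the count stops at zero. It never goes negative. On older Tokio versions, you can get the same effect by acquiring a permit and calling forget() on it, which discards the permit instead of returning it.
Use add_permits and forget_permits to adapt to changing conditions. The new count applies to the next acquisition.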
Pitfalls and compiler errors
The most common mistake is dropping the permit before the work finishes. If the permit goes out of scope before the async work runs, for example because it was acquired inside an inner block, it returns to the pool immediately. The concurrency limit vanishes. All tasks run at once. The compiler won't stop you from dropping the permit early. You have to manage the scope yourself. Bind the permit to a variable that lives as long as the critical section. A name like _permit is a real binding and holds the permit until the end of its scope. A bare let _ = sem.acquire().await.unwrap(); binds nothing and drops the permit at the end of that statement. Name the variable explicitly if the scope is complex.
If you pass a reference to the semaphore into tokio::spawn instead of a cloned Arc, the compiler rejects the code. The spawned task must be 'static: it may outlive the function that spawned it, so it cannot borrow local variables. A reference to a local variable dies when the function returns. You get an error like E0597 (does not live long enough) or E0373 (async block may outlive the current function). The fix is to clone the Arc before spawning and move the clone into the task.
If you try to move the same Arc into multiple tasks without cloning, you get E0382 (use of moved value). The first task takes ownership. The second task has nothing. Clone the handle for each task.
The community convention is to keep the permit scope tight. Acquire the permit, do the work, let it drop. Don't store the permit in a struct unless you really need to. Storing permits can lead to leaks where tasks hold permits forever and the system stalls. Also, tokio::sync::Semaphore is fair. Waiters are served in FIFO order. This prevents starvation. If you need unfair scheduling for performance, look elsewhere, but fairness is the default and usually what you want.
Treat the permit like a physical key. If you lose the key, the door stays open.
When to use a semaphore
Use Semaphore when you need to limit the number of concurrent operations but the tasks don't share mutable state. Use Semaphore when an external API throttles you and you need to cap how many requests are in flight; note that it bounds concurrency, not requests per second. Use Semaphore when you want to bound memory usage by limiting how many large allocations happen at once.
Use Mutex when multiple tasks need to read or write the same shared variable safely. Use Mutex when the coordination is about data consistency, not just counting.
Use a channel when you need to feed tasks into a worker pool with backpressure. Use a channel when the producer and consumer are decoupled and you want to buffer work.
Use try_acquire when you want to skip work if the limit is reached instead of waiting. Use try_acquire when the task is optional and you prefer dropping it over blocking.
Drop the permit when the work ends. Holding it longer only starves the queue.