Why you don't want to parse URLs by hand
It's tempting. A URL is just a string with slashes and a few well-known characters, right? You write a quick split('/'), pull out the host, and ship it. Then somebody types https://[::1]:8080/api. Then your service has to handle https://example.com:443/path%20with%20spaces?key=a%26b#frag. Then a bug report comes in about mailto: and javascript:. Now you're maintaining a parser nobody asked for.
URLs look simple. The actual grammar, defined in RFC 3986 and its later updates, with the WHATWG URL Living Standard layered on top, is anything but. There's Punycode for internationalised hostnames, percent-encoding rules that differ by component, the boundary between path and query, and the rules for relative URL resolution. None of it is hard once you've read the specs, but you really don't want to be the person reading them.
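To make the failure mode concrete, here is a minimal sketch of the kind of quick split described above. The helper name naive_host_port is made up for illustration; it uses only the standard library and deliberately does what a hurried hand-written parser tends to do.

```rust
// Hypothetical hand-rolled helper: the kind of "quick split" the text warns about.
// Returns (host, port) by splitting the authority on the first ':'.
fn naive_host_port(raw: &str) -> Option<(&str, Option<&str>)> {
    // Drop the scheme, then keep everything up to the first '/'.
    let after_scheme = raw.split("://").nth(1)?;
    let authority = after_scheme.split('/').next()?;
    // Assumes at most one ':' separating host from port.
    let mut pieces = authority.splitn(2, ':');
    let host = pieces.next()?;
    Some((host, pieces.next()))
}

fn main() {
    // Works for the easy case.
    println!("{:?}", naive_host_port("https://example.com:443/path"));
    // Some(("example.com", Some("443")))

    // Breaks on an IPv6 literal: the first ':' is inside the brackets.
    println!("{:?}", naive_host_port("https://[::1]:8080/api"));
    // Some(("[", Some(":1]:8080")))

    // Breaks on userinfo: the "host" is really the username.
    println!("{:?}", naive_host_port("https://user:pass@example.com/"));
    // Some(("user", Some("pass@example.com")))

    // Gives up entirely on schemes without "://".
    println!("{:?}", naive_host_port("mailto:alice@example.com"));
    // None
}
```

Each of these can be patched one at a time, which is exactly how you end up maintaining a parser.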
The url crate is the answer the Rust ecosystem settled on. It implements the WHATWG URL Standard, exposes a friendly API, and is the URL type behind popular HTTP clients such as reqwest and surf. (hyper is the notable exception: it uses the http crate's Uri type instead.)
Adding it to your project
# Cargo.toml — pin a recent 2.x release.
[dependencies]
url = "2.5"
That's it for the basic case. The crate has a couple of optional features (serde for serialization, expose_internals for low-level access) but you usually don't need them.
Parsing a URL
use url::Url;

fn main() {
    // Url::parse takes anything that's &str. It returns Result because
    // not every string is a valid URL: "" fails, "not a url" fails,
    // "http://" without a host also fails.
    let raw = "https://user:pass@example.com:8080/api/v1/items?id=42&sort=desc#section";
    let url = Url::parse(raw).expect("invalid URL");

    // Each part of the URL is its own method on the Url struct.
    // Notice how host_str() returns Option<&str>: not every URL has
    // a host (think "mailto:alice@example.com" or "data:..." URLs).
    println!("scheme = {}", url.scheme());        // "https"
    println!("username = {:?}", url.username());  // "user"
    println!("password = {:?}", url.password());  // Some("pass")
    println!("host = {:?}", url.host_str());      // Some("example.com")
    println!("port = {:?}", url.port());          // Some(8080)
    println!("path = {}", url.path());            // "/api/v1/items"
    println!("query = {:?}", url.query());        // Some("id=42&sort=desc")
    println!("fragment = {:?}", url.fragment());  // Some("section")
}
The crate handles the edge cases for you: percent-encoding is preserved in path() and query() and decoded by query_pairs(), default ports are normalised away (port() returns None for https on 443), IPv6 addresses are parsed correctly, and internationalised domains are converted to Punycode.
Iterating query parameters
A query string is key=value&key=value. Parsing it byte by byte is a common source of bugs. The crate gives you a real iterator:
use url::Url;

fn main() {
    let url = Url::parse("https://api.example.com/search?q=rust+book&page=3&page=4").unwrap();

    // query_pairs() returns an iterator over (Cow<str>, Cow<str>) pairs.
    // The Cow is there because some values need to be percent-decoded
    // (which produces a new String) and some don't (which can borrow).
    for (key, value) in url.query_pairs() {
        println!("{key} = {value}");
    }
    // Output:
    // q = rust book
    // page = 3
    // page = 4
}
Notice that q=rust+book decoded to rust book. The plus-as-space is part of the application/x-www-form-urlencoded convention that browsers and servers expect. The crate handles it.
If you want the values as a hashmap, write a helper:
use std::collections::HashMap;
use url::Url;

// Collect query params into a HashMap. Repeated keys will overwrite each
// other; if you need all values, use a HashMap<String, Vec<String>> instead.
fn query_map(url: &Url) -> HashMap<String, String> {
    url.query_pairs()
        .map(|(k, v)| (k.into_owned(), v.into_owned()))
        .collect()
}
Building URLs from parts
Don't concatenate URL strings. The number of bugs caused by format!("{base}/{path}") is enormous. Use the crate's join method:
use url::Url;

fn main() {
    // The base URL ends with a slash. That matters!
    // "https://api.example.com/v1/" + "items" = "https://api.example.com/v1/items"
    // "https://api.example.com/v1" + "items" = "https://api.example.com/items"
    // (without the trailing slash, "v1" gets replaced, not appended)
    let base = Url::parse("https://api.example.com/v1/").unwrap();

    // join() does proper relative-URL resolution, the way a browser
    // resolves a link against the page it appears on.
    let users = base.join("users").unwrap();
    let user_42 = users.join("42").unwrap();
    println!("{users}");   // https://api.example.com/v1/users
    println!("{user_42}"); // https://api.example.com/v1/42, not .../v1/users/42!
}
That last result is a head-scratcher: users.join("42") produces https://api.example.com/v1/42, not https://api.example.com/v1/users/42, because users doesn't end in a slash, so 42 replaces the last segment. The fix is to keep the trailing slash: build users with base.join("users/") and then call users.join("42"). Or build paths explicitly with path_segments_mut():
use url::Url;

fn main() {
    let mut url = Url::parse("https://api.example.com/v1").unwrap();

    // path_segments_mut returns an editor for the path. We can push
    // segments one at a time, and the crate handles encoding.
    url.path_segments_mut().unwrap()
        .push("users")
        .push("42")
        .push("posts");
    println!("{url}");
    // https://api.example.com/v1/users/42/posts
}
For query strings, the equivalent is query_pairs_mut:
use url::Url;

fn main() {
    let mut url = Url::parse("https://api.example.com/search").unwrap();

    url.query_pairs_mut()
        .append_pair("q", "rust book") // form-encoded: the space becomes '+'
        .append_pair("page", "1");
    println!("{url}");
    // https://api.example.com/search?q=rust+book&page=1
}
The crate percent-encodes values for you. If you wrote the query string manually with format!("?q={}&page={}", q, p), you'd be open to all kinds of injection bugs and broken-on-special-characters surprises.
Common errors you'll see
RelativeUrlWithoutBase
You called Url::parse on a string that doesn't have a scheme, like "/api/v1/items". Either prepend a base URL, or use base.join("/api/v1/items").
EmptyHost
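A URL like "http://" uses a scheme that requires a host but doesn't supply one, so the parser refuses. Note that "http:///path" is not an example: for http and https the parser skips the extra slash and treats path as the host.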
InvalidPort
The port portion wasn't a number, or was out of range (above 65535). It most often shows up when a colon after the host is followed by something non-numeric, as in "http://example.com:abc/".
These all come from the url::ParseError enum. Match on it if you want different handling per case; otherwise just propagate with ? and your function returns a clean error type.
When to reach for it vs. alternatives
url is the right answer when:
- You're parsing or constructing URLs of arbitrary shape.
- You need spec-correct percent-encoding.
- You care about international domain names.
It's overkill when:
- You're just splitting on / for an internal route table that you control end to end.
- You're working within a higher-level HTTP framework (axum, actix-web, reqwest) that already gives you typed URL pieces.
In the second case, just use what the framework hands you. reqwest::Url is this crate's Url type re-exported, so you're already using it; axum's extractors, such as axum::extract::Path, hand you decoded, typed values directly.
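For the first case, a plain split really is enough. A minimal sketch, assuming a hypothetical internal route table whose paths are fixed, ASCII-only, and never user-supplied:

```rust
// Hypothetical internal router: no scheme, host, query, or percent-encoding
// to worry about, so splitting on '/' is all the parsing required.
fn route(path: &str) -> &'static str {
    let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect();
    match segments.as_slice() {
        ["health"] => "health check",
        ["users", _id] => "user detail",
        ["users", _id, "posts"] => "user posts",
        _ => "not found",
    }
}

fn main() {
    println!("{}", route("/health"));         // health check
    println!("{}", route("/users/42"));       // user detail
    println!("{}", route("/users/42/posts")); // user posts
    println!("{}", route("/nope"));           // not found
}
```

The moment those paths start coming from outside, the assumptions above stop holding and the url crate earns its place again.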