How to Parse URLs in Rust

The problem with string splitting

You are building a CLI tool that fetches data from an API. The user passes a URL on the command line. You need to extract the host to set a header, or the path to build a request. Your first instinct is to split the string by slashes. You grab the part after :// and before the next /. It works for https://example.com/path.

Then the user passes https://example.com. Your split logic crashes because there is no path slash. Or they pass http://user:pass@example.com. Now you have credentials in the host. Or they pass a URL with a query string like https://example.com/search?q=rust&lang=en. Your split logic treats ?q=rust as part of the path. You start writing regex. You handle percent-encoding. You deal with IPv6 addresses in brackets. You realize you are reimplementing the URL specification.

String manipulation breaks the moment the input gets weird. You need a parser that understands the rules, not just a string splitter.
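To make the failure modes concrete, here is a minimal sketch of the split-based approach using only the standard library. The function name naive_host is hypothetical and exists only for illustration. This variant avoids the panic, but it still returns the wrong answer for most of the inputs above.

```rust
/// Naive host extraction (illustrative only): take whatever sits
/// between "://" and the next '/'.
fn naive_host(input: &str) -> Option<&str> {
    let after_scheme = input.split("://").nth(1)?;
    after_scheme.split('/').next()
}

fn main() {
    let inputs = [
        "https://example.com/path",      // the one case that works
        "http://user:pass@example.com/", // credentials leak into the "host"
        "https://example.com?q=rust",    // query string leaks into the "host"
        "http://[::1]:8080/",            // port and brackets are not separated
    ];
    for input in inputs {
        println!("{:<32} -> {:?}", input, naive_host(input));
    }
}
```

Only the first input yields a usable host. Every fix you bolt on for the others is one more piece of the URL specification reimplemented by hand.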

Enter the url crate

The url crate is the de facto standard for handling URLs in Rust. It implements the WHATWG URL Standard, the same specification browsers follow, so your Rust code resolves edge cases the way the standard prescribes rather than the way your split logic happens to behave. The crate gives you a Url struct that holds the parsed components. You can access the scheme, host, path, query, and fragment as structured data. You get validation, normalization, and safe manipulation for free.

Add the dependency to your Cargo.toml:

[dependencies]
url = "2.5"

The core workflow is simple. You call .parse() on a string slice. This returns a Result<Url, ParseError>. If the string is valid, you get a Url. If not, you get an error describing what went wrong. You never touch raw string slicing again.

Minimal example

Here is the basic pattern. You parse a string, handle the result, and access components.

use url::Url;

fn main() {
    // The input string. Could come from a file, CLI arg, or user input.
    let input = "https://example.com/path?query=1";

    // parse() returns Result<Url, ParseError>.
    // We use expect here for the example, but real code should handle the error.
    // Convention: prefer Url::parse(input) for clarity, though input.parse::<Url>() works too.
    let url = input.parse::<Url>().expect("Input must be a valid URL");

    // Access components directly. No string splitting needed.
    println!("Scheme: {}", url.scheme());
    println!("Host: {}", url.host_str().unwrap());
    println!("Path: {}", url.path());
}

The Url struct exposes methods for every part of the URL. scheme() returns the protocol. host_str() returns the host as an Option<&str>, since not every URL has one. path() returns the path. Components that may be absent, such as the port, query, and fragment, come back as Option values, so a missing part shows up in the type instead of as an out-of-bounds slice.

Trust the parser. It knows the spec better than you do.

What happens during parsing

When you call parse, the crate does more than split the string. It validates the structure against the standard. It checks for illegal characters. It ensures the scheme is valid. It verifies the host format. If anything is wrong, it returns a ParseError.

The crate also normalizes the URL. This is a critical detail. If you pass http://example.com, the path becomes /. If you pass http://EXAMPLE.COM, the host becomes lowercase. Dot segments are resolved, so http://example.com/a/../b becomes http://example.com/b, and a default port such as :443 on https is dropped. Not everything is collapsed: the double slash in http://example.com//path is preserved, because the standard treats it as an empty path segment. The Url struct holds the canonical form. This prevents bugs where http://example.com and http://example.com/ are treated as different URLs when they are the same resource.

Normalization also handles percent-encoding. If you pass http://example.com/hello%20world, the path is stored with the encoding, and url.path() returns /hello%20world, still encoded. Characters that are not allowed in a path, such as a literal space, are percent-encoded during parsing. If you need the decoded text of a path, decode it yourself, for example with percent_decode_str from the percent-encoding crate. Query values are different: query_pairs() decodes them for you.

Don't fight normalization. It saves you from edge cases.

Realistic usage: Validation and extraction

In real code, you often need to validate the URL or extract specific parts. Here is a function that checks for HTTPS and extracts the host.

use url::Url;

/// Validates a URL and ensures it uses HTTPS.
/// Returns the host if valid, or an error message.
fn get_https_host(input: &str) -> Result<String, String> {
    // Parse the string into a Url struct.
    // Handle the error gracefully instead of panicking.
    let url = input.parse::<Url>().map_err(|e| format!("Parse error: {}", e))?;

    // Check the scheme. This is a simple string comparison.
    if url.scheme() != "https" {
        return Err("Only HTTPS URLs are allowed".to_string());
    }

    // host_str() returns Option<&str> because file:// URLs have no host.
    // We unwrap here because we know https implies a host,
    // but in robust code, you might want to handle None explicitly.
    Ok(url.host_str().unwrap().to_string())
}

The map_err call transforms the ParseError into a user-friendly string. This is a common pattern when you want to return a simple error type. The function checks the scheme and returns an error if it is not HTTPS. It then extracts the host. Note that host_str() returns an Option. This is because some URLs have no host, for example mailto: addresses or file:/// paths. For https, the parser rejects a missing host, so unwrap is safe here. In a library, you might still return an error instead of unwrapping, so the function never depends on that invariant.

Convention aside: Url::parse is slightly preferred over str::parse::<Url>() in many codebases because it makes the type explicit at the call site. Both compile to the same code. Pick the one that reads better in your context.

Handle the error before you panic. Users will paste weird URLs.

Working with query parameters

Query parameters are a common pain point. You might need to read ?q=rust&lang=en or build a URL with parameters. The url crate provides query_pairs() for this.

use url::Url;

/// Prints all query parameters from a URL.
fn print_params(url: &Url) {
    // query_pairs() returns an iterator over (key, value) pairs.
    // It handles percent-decoding automatically.
    // If there is no query, the iterator is empty.
    for (key, value) in url.query_pairs() {
        println!("{} = {}", key, value);
    }
}

The query_pairs() method returns an iterator. Each item is a (key, value) pair of Cow<str>, borrowed from the URL when no decoding was needed and owned otherwise. The values are percent-decoded, and + is decoded as a space. If the query is ?name=hello%20world, the value is hello world. You can collect these into a vector or map if you need random access.

You can also modify query parameters. The Url struct is mutable. You can set the query string directly.

let mut url = "https://example.com".parse::<Url>().unwrap();

// Set a new query string.
// This replaces any existing query.
url.set_query(Some("q=rust&lang=en"));

println!("{}", url); // https://example.com/?q=rust&lang=en

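The set_query method takes an Option<&str>. Pass None to remove the query. Pass Some(...) to set it. The method does not return a Result: characters that are not allowed in a query are percent-encoded rather than rejected. It also treats the string as already formatted, so it will not escape an & or = inside a value. To build a query from individual keys and values, use query_pairs_mut(), which encodes each pair for you.

Use query_pairs for reading. Use set_query or query_pairs_mut for writing. Never split the query string manually.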

Joining relative URLs

You often have a base URL and a relative path. For example, a base of https://example.com/api/ and a path of users. You want to combine them into https://example.com/api/users. String concatenation is dangerous. What if the base has a trailing slash? What if the path has a leading slash? What if the path contains ../?

The url crate provides join for this.

use url::Url;

fn main() {
    // Parse the base URL.
    let base = "https://example.com/api/".parse::<Url>().unwrap();

    // Join a relative path.
    // join handles slashes, .., and normalization automatically.
    let new_url = base.join("users").unwrap();

    println!("{}", new_url); // https://example.com/api/users

    // Join with a relative path that goes up.
    let up_url = base.join("../status").unwrap();
    println!("{}", up_url); // https://example.com/status
}

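The join method takes a relative URL string and resolves it against the base the way a browser resolves a link. It handles .. segments and normalizes the result. The trailing slash on the base matters: without it, the last path segment is treated as a file name and replaced, so https://example.com/api joined with users gives https://example.com/users. A reference with a leading slash replaces the whole path. join returns a Result because the relative URL might be invalid. This is the safe way to build URLs from parts.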

Use join for relative paths. It handles the slashes for you.

Pitfalls and error handling

Parsing URLs can fail. You need to handle errors properly. The ParseError type has variants for different failure modes. Common errors include EmptyHost, RelativeUrlWithoutBase, and IdnaError.

If you try to parse a relative URL with Url::parse, you get an error. Url::parse expects absolute URLs. If you have a relative URL, you must use join with a base.

let result = "path/to/resource".parse::<Url>();
// This returns Err(ParseError::RelativeUrlWithoutBase).

Another pitfall is host() vs host_str(). The host() method returns Option<Host>. Host is an enum that can be a domain string, an IPv4 address, or an IPv6 address. If you just need the string representation, use host_str(). If you need to distinguish between IP types, use host().

let url = "https://127.0.0.1:8080".parse::<Url>().unwrap();

// host() returns the structured Host type.
match url.host() {
    Some(url::Host::Ipv4(addr)) => println!("IPv4: {}", addr),
    Some(url::Host::Ipv6(addr)) => println!("IPv6: {}", addr),
    Some(url::Host::Domain(_)) => println!("Domain"),
    None => println!("No host"),
}

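The url crate also handles Internationalized Domain Names. If you parse http://münchen.de, the host is converted to its ASCII-compatible (punycode) form, xn--mnchen-3ya.de. That is also what host_str() returns and what you see when you print the URL. If you need to show the Unicode form to a user, convert it back with the idna crate (for example idna::domain_to_unicode). Either way, you never have to encode the host yourself.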

Check the error type before panicking. Relative URLs are common in user input.

Decision matrix

Use url::Url when you need to parse, validate, or manipulate URLs in isolation. Use reqwest::Url when you are already using the reqwest crate for HTTP requests, as it re-exports the same url crate and saves a dependency. Use percent_encoding when you only need to encode or decode a single component like a query value, without parsing a full URL. Use string splitting only when you are certain the format is fixed and simple, and you are willing to maintain regex or split logic for every edge case.

Reach for url unless you have a compelling reason not to. The spec is harder than you think.
