How to Parse XML in Rust

When XML refuses to die

You're integrating with a legacy banking API that only speaks XML. Or you're reading RSS feeds for a dashboard. Or a config file from 2008 refuses to die. XML is the cockroach of data formats: it survives everywhere, it's verbose, and you probably didn't ask for it. Rust's standard library doesn't include an XML parser. You need a crate. The community standard is quick-xml. It's fast, it's event-driven, and it handles the messy reality of XML without allocating a mountain of memory.

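Add the dependency to your Cargo.toml. Pin the version; quick-xml is still pre-1.0 and changes its API between minor releases. The examples here use reader.config_mut(), which was introduced in 0.32.

[dependencies]
quick-xml = "0.36"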

Streaming events, not trees

XML parsers usually fall into two camps. DOM parsers load the entire document into a tree in memory. That's easy to navigate but eats RAM proportional to the file size. If you parse a 500MB XML dump, you need gigabytes of RAM. Event-based parsers stream through the document, firing events as they encounter tags, text, and attributes. You decide what to keep and what to discard. The memory footprint stays flat.

Think of DOM like reading a whole book and photocopying every page onto a wall before you start analyzing it. Think of event-based parsing like reading the book one page at a time and scribbling notes on a notepad. You only keep what matters. quick-xml is event-based. It gives you a Reader that yields Event variants. You loop, match on the event, and extract data. The parser doesn't care about your structure. It just tells you what it found. You write the logic to connect the dots.

Minimal parsing loop

The core of quick-xml is a loop over events. You create a Reader, prepare a buffer, and call read_event_into. The buffer is critical. The parser reuses it to store event data. This avoids allocating a new Vec for every single tag. After you process the event, you call buf.clear(). This resets the length to zero but keeps the capacity. The next event reuses the same memory. This is the "minimum allocation" pattern.

use quick_xml::events::Event;
use quick_xml::Reader;

/// Parse a simple XML string and print tags and text.
fn main() {
    // Raw string literals handle quotes and newlines without escaping.
    let xml = r#"<root><item>value</item></root>"#;
    
    // Create a reader from the string.
    let mut reader = Reader::from_str(xml);
    
    // Buffer for the parser to reuse. Avoids reallocation per event.
    let mut buf = Vec::new();

    loop {
        // Read the next event into the buffer.
        // Returns Result<Event, Error>.
        match reader.read_event_into(&mut buf) {
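            // Tag names are raw bytes (QName has no Display impl); decode before printing.
            Ok(Event::Start(e)) => println!("Tag: {}", String::from_utf8_lossy(e.name().as_ref())),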
            Ok(Event::Text(e)) => println!("Text: {}", e.unescape().unwrap()),
            Ok(Event::Eof) => break,
            Err(e) => panic!("Error: {}", e),
            _ => {}
        }
        // Clear the buffer for the next iteration.
        // Keeps capacity, resets length. Reuses memory.
        buf.clear();
    }
}

The buf.clear() call is mandatory for memory efficiency. read_event_into appends to the buffer, so if you skip the call, the buffer accumulates the bytes of every event. It grows in proportion to the file size, which defeats the purpose of streaming. The community convention is to declare buf outside the loop and clear it inside. Never create a new buffer inside the loop.
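The length-versus-capacity behavior is plain Vec semantics, so you can check it without quick-xml. A minimal sketch, assuming a hypothetical fill_event helper that stands in for the parser appending event bytes to the buffer:

```rust
/// Hypothetical stand-in for the parser: appends one event's raw bytes to `buf`.
fn fill_event(buf: &mut Vec<u8>, bytes: &[u8]) {
    buf.extend_from_slice(bytes);
}

/// Feeds every chunk through one shared buffer and returns the buffer's
/// final length. With `clear_each_time` the length drops back to zero after
/// each event; without it the buffer keeps every byte it has ever seen.
fn final_len(chunks: &[&[u8]], clear_each_time: bool) -> usize {
    let mut buf = Vec::new();
    for chunk in chunks {
        fill_event(&mut buf, chunk);
        if clear_each_time {
            buf.clear(); // length -> 0, capacity untouched
        }
    }
    buf.len()
}

fn main() {
    let chunks: [&[u8]; 3] = [b"<root>", b"<item>", b"value"];
    println!("with clear: {}", final_len(&chunks, true));
    println!("without clear: {}", final_len(&chunks, false));

    // clear() keeps the allocation, so the next event reuses it.
    let mut buf = Vec::with_capacity(64);
    fill_event(&mut buf, b"<root>");
    let capacity_before = buf.capacity();
    buf.clear();
    println!("len: {}, capacity kept: {}", buf.len(), buf.capacity() == capacity_before);
}
```

Three tiny events already leave 17 bytes behind when the clear is skipped. Scale that to a multi-gigabyte feed and the buffer ends up holding most of the file.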

What happens under the hood

When read_event_into runs, the parser scans the XML bytes. It finds <root>. It emits Event::Start. Your match arm prints the tag. The buffer holds the raw bytes of the event. You clear it. The parser moves past >. It finds <item>. Event::Start again. It finds value. Event::Text. It finds </item>. Event::End. You ignore it in the minimal example. It finds </root>. Event::End. It hits the end. Event::Eof. You break.

The Reader tracks its position. It never builds a tree. It just advances a cursor over the input. The Event enum has a variant for everything XML can contain: Start, End, Empty, Text, CData, Comment, Decl, PI, DocType, and Eof. You match on the variants you care about. The _ arm catches the rest. This gives you total control. You can skip sections, extract specific fields, or validate structure on the fly.
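Extracting specific fields usually means carrying a little state between events. Here is a minimal sketch of that pattern. It uses a hypothetical Ev enum as a stand-in for quick-xml's Event so it runs without the crate; the real loop has the same shape, with read_event_into supplying the events.

```rust
/// Hypothetical stand-in for quick-xml's `Event`, reduced to three variants.
enum Ev<'a> {
    Start(&'a str),
    Text(&'a str),
    End(&'a str),
}

/// Collects the text found inside every `<item>` element and ignores the rest.
fn item_texts(events: &[Ev<'_>]) -> Vec<String> {
    let mut inside_item = false; // the only state this machine needs
    let mut found = Vec::new();
    for ev in events {
        match ev {
            Ev::Start(name) if *name == "item" => inside_item = true,
            Ev::End(name) if *name == "item" => inside_item = false,
            Ev::Text(text) if inside_item => found.push((*text).to_owned()),
            _ => {} // everything else is skipped
        }
    }
    found
}

fn main() {
    // Events for <root><item>value</item><note>skip me</note></root>.
    let events = [
        Ev::Start("root"),
        Ev::Start("item"),
        Ev::Text("value"),
        Ev::End("item"),
        Ev::Start("note"),
        Ev::Text("skip me"),
        Ev::End("note"),
        Ev::End("root"),
    ];
    println!("{:?}", item_texts(&events)); // prints ["value"]
}
```

One boolean is enough for a flat document. Nested or repeated structures usually call for a small enum or a stack of open tag names instead.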

Handling real-world XML

Real XML has attributes, namespaces, and whitespace. Pretty-printed XML includes newlines and indentation between tags. Without configuration, quick-xml reports those as text nodes. You'll see Event::Text with just spaces. The Config struct, reached through reader.config_mut(), lets you tune the parser. trim_text(true) strips leading and trailing whitespace from every text node and drops the nodes that end up empty. This is the standard setting for pretty-printed XML.

use quick_xml::events::Event;
use quick_xml::Reader;

/// Parse XML with whitespace trimming and attribute extraction.
fn main() {
    let xml = r#"<root>
        <item id="1">value</item>
    </root>"#;
    
    let mut reader = Reader::from_str(xml);
    // Trim whitespace-only text nodes.
    // Essential for pretty-printed XML.
    reader.config_mut().trim_text(true);
    
    let mut buf = Vec::new();

    loop {
        match reader.read_event_into(&mut buf) {
            Ok(Event::Start(e)) => {
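                // Names are raw bytes; decode them before printing.
                println!("Tag: {}", String::from_utf8_lossy(e.name().local_name().as_ref()));
                // Iterate attributes directly to avoid allocation.
                for attr in e.attributes() {
                    if let Ok(attr) = attr {
                        let key = attr.key.local_name();
                        // unescape_value decodes the bytes and resolves entities.
                        let value = attr.unescape_value().unwrap_or_default();
                        println!("  Attr: {} = {}", String::from_utf8_lossy(key.as_ref()), value);
                    }
                }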
            }
            Ok(Event::Text(e)) => {
                // unescape converts entities like &amp; to &.
                // Returns Result, so handle errors in production.
                if let Ok(text) = e.unescape() {
                    println!("  Text: {}", text);
                }
            }
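            Ok(Event::Eof) => break,
            // Report malformed XML instead of silently ignoring it.
            Err(e) => panic!("Error at position {}: {}", reader.buffer_position(), e),
            _ => {}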
        }
        buf.clear();
    }
}

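The e.attributes() method returns an iterator that parses attributes lazily and yields a Result for each one. Iterating directly avoids allocating a Vec of attributes. The community convention is to iterate attributes in place unless you need to store them. Each Attribute holds its key and value as raw bytes, so call attr.unescape_value() to get a printable string. If you need to keep attributes after the event is gone, copy each key and value into an owned String and collect the pairs into a Vec or HashMap. That allocates. Use it sparingly.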

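Namespaces add another layer. e.name() returns a QName: the raw tag name, including any prefix such as atom in atom:link. Most of the time, you only care about the local name. Call e.name().local_name() to get the tag without the prefix, and e.name().prefix() to inspect the prefix itself. A prefix is not a namespace, though. It is only a document-local alias for a namespace URI. If you need the URI, use NsReader instead of Reader; it tracks xmlns declarations and resolves names for you. XML namespaces are a pain. quick-xml gives you the tools to handle them, but you have to write the checks.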

Parsing from files

Streaming from a file is the most common use case. You don't load the file into memory. You stream it from disk. Use Reader::from_reader with a File wrapped in a BufReader. The parser reads chunks from the file and emits events. The memory usage stays constant.

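use std::fs::File;
use std::io::BufReader;
use quick_xml::events::Event;
use quick_xml::Reader;

/// Stream XML from a file without loading it into memory.
fn main() {
    let file = File::open("data.xml").expect("failed to open data.xml");
    // read_event_into needs a BufRead source, so wrap the File in a BufReader.
    let mut reader = Reader::from_reader(BufReader::new(file));
    let mut buf = Vec::new();

    loop {
        match reader.read_event_into(&mut buf) {
            Ok(Event::Start(e)) => {
                // Tag names are raw bytes; decode them before printing.
                println!("Tag: {}", String::from_utf8_lossy(e.name().as_ref()));
            }
            Ok(Event::Eof) => break,
            // Surface malformed XML and I/O failures instead of ignoring them.
            Err(e) => panic!("Error at position {}: {}", reader.buffer_position(), e),
            _ => {}
        }
        buf.clear();
    }
}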

read_event_into works with any source that implements std::io::BufRead. A byte slice qualifies as is. File, TcpStream, or a custom reader that only implements std::io::Read needs a BufReader wrapper first. Reader::from_file does that wrapping for you. This pattern scales to gigabyte-sized files, with throughput bounded mostly by disk I/O.
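To see why a buffered source keeps memory flat, here is a std-only sketch of chunked consumption. The count_open_brackets function is a hypothetical illustration, not part of quick-xml; it walks the input one buffered chunk at a time, much as a streaming parser does.

```rust
use std::io::{self, BufRead, BufReader, Cursor};

/// Counts `<` bytes by walking the source one buffered chunk at a time.
/// Memory use is bounded by the reader's buffer, not by the input size.
fn count_open_brackets<R: BufRead>(mut source: R) -> io::Result<usize> {
    let mut count = 0;
    loop {
        let chunk = source.fill_buf()?; // borrow the next buffered chunk
        if chunk.is_empty() {
            return Ok(count); // end of input
        }
        count += chunk.iter().filter(|&&b| b == b'<').count();
        let used = chunk.len();
        source.consume(used); // mark the chunk as processed
    }
}

fn main() -> io::Result<()> {
    let xml = "<root><item>value</item></root>";
    // A byte slice already implements BufRead.
    println!("{}", count_open_brackets(xml.as_bytes())?);
    // Other sources get wrapped in BufReader. The Cursor here stands in
    // for a File or TcpStream.
    println!("{}", count_open_brackets(BufReader::new(Cursor::new(xml)))?);
    Ok(())
}
```

Both calls report 4 for this input. Swapping the Cursor for File::open("data.xml")? changes the source but not the loop.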

Pitfalls and compiler errors

If you declare let reader = Reader::from_str(xml); without mut, the compiler rejects you with E0596 (cannot borrow as mutable). The reader advances state, so it must be mutable. Always use let mut reader.

If you forget buf.clear(), the buffer grows. The parser reuses the buffer capacity, but if you don't clear the length, the buffer accumulates data. You'll see memory usage climb linearly with file size. That defeats the purpose of streaming. Add buf.clear() after every successful read.

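The unescape() method returns a Result. XML text can contain entities like &lt; or &nbsp;. quick-xml resolves the predefined entities and numeric character references; an entity it does not recognize produces an error. If you call .unwrap() on unescape and the text contains such an entity, your program panics. In production code, handle that error. Log it, skip the text, or return an error. Don't unwrap in library code.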
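To make the failure mode concrete, here is a simplified stand-in for an unescaper. It is not quick-xml's implementation: unescape_basic is a hypothetical helper that knows only the five predefined XML entities and skips numeric character references. It shows why the call has to be fallible and how a caller can skip bad text instead of panicking.

```rust
/// Hypothetical, simplified unescaper: handles only the five predefined XML
/// entities and reports anything else as an error instead of panicking.
fn unescape_basic(input: &str) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(amp) = rest.find('&') {
        out.push_str(&rest[..amp]);
        let after = &rest[amp + 1..];
        let semi = after
            .find(';')
            .ok_or_else(|| "unterminated entity".to_string())?;
        let replacement = match &after[..semi] {
            "lt" => '<',
            "gt" => '>',
            "amp" => '&',
            "apos" => '\'',
            "quot" => '"',
            other => return Err(format!("unknown entity: &{};", other)),
        };
        out.push(replacement);
        rest = &after[semi + 1..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    for raw in ["a &lt; b", "fish &amp; chips", "non&nbsp;breaking"] {
        // Handle the error per text node; never unwrap on untrusted input.
        match unescape_basic(raw) {
            Ok(text) => println!("ok: {}", text),
            Err(err) => eprintln!("skipped text node: {}", err),
        }
    }
}
```

The first two inputs decode cleanly. The third takes the error branch, which is the case an unwrap would turn into a panic.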

Whitespace handling trips up beginners. If you don't enable trim_text, you get text nodes with newlines and spaces. Your logic might break if you expect only meaningful text. Enable trim_text(true) for pretty-printed XML. For compact XML, you might want to keep whitespace. Choose based on your input.

XML namespaces are another trap. If you compare e.name() directly, you might miss matches because of namespace prefixes: atom:link and link are different byte strings. Use local_name() for tag comparison. QName has no method that returns the namespace itself. When you need to distinguish namespaces, switch to NsReader, which resolves each prefix to its URI. The plain Reader gives you the raw name and leaves the normalization to you.
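The prefix problem is easy to see on raw bytes. A minimal sketch, assuming hypothetical helpers split_qname and is_tag that mirror the idea behind QName's prefix() and local_name() without using the crate:

```rust
/// Splits a raw qualified name such as `atom:link` into (prefix, local name).
/// Hypothetical helper; quick-xml exposes the same split through QName.
fn split_qname(raw: &[u8]) -> (Option<&[u8]>, &[u8]) {
    match raw.iter().position(|&b| b == b':') {
        Some(i) => (Some(&raw[..i]), &raw[i + 1..]),
        None => (None, raw),
    }
}

/// Compares tags by local name only, ignoring whatever prefix the document chose.
fn is_tag(raw: &[u8], wanted: &str) -> bool {
    split_qname(raw).1 == wanted.as_bytes()
}

fn main() {
    println!("{}", is_tag(b"atom:link", "link")); // true: prefix ignored
    println!("{}", is_tag(b"link", "link")); // true: no prefix at all
    println!("{}", is_tag(b"atom:link", "atom:link")); // false: wanted is compared against the local name
}
```

Matching on the local name accepts link elements from every namespace. That is usually fine for a single known format, and it is exactly the case where you need resolved namespaces if two vocabularies share a tag name.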

Decision matrix

Use quick-xml when you need to parse large XML files with minimal memory usage. The event-based API keeps your footprint flat regardless of file size.

Use roxmltree when you prefer a tree-based API and the XML documents are small enough to fit in memory. It builds a read-only tree that you navigate with iterator methods such as children() and descendants(). It does not implement XPath.

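Use serde with quick-xml when you want to deserialize XML directly into Rust structs. quick-xml ships its serde support behind the serialize feature flag. With it enabled, quick_xml::de::from_str deserializes into any type that derives Deserialize. You get the convenience of #[derive(Deserialize)] while keeping the parsing engine efficient.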
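A sketch of the dependency entries for that setup, assuming quick-xml 0.36 and serde 1.x. Adjust the versions to whatever your project pins.

```toml
[dependencies]
# The serde integration lives behind the "serialize" feature.
quick-xml = { version = "0.36", features = ["serialize"] }
# The derive feature provides #[derive(Deserialize)].
serde = { version = "1", features = ["derive"] }
```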

Use xml-rs or older crates only if you are maintaining legacy code. The ecosystem has moved to quick-xml for performance and active maintenance.

XML parsing is a state machine. Embrace the events. Don't fight the stream.

Recap

Parsing XML in Rust means turning structured text into data your program can use. With quick-xml you read the input event by event and pick out the tags and text you need. It's like reading a book where you only care about specific chapters and sentences and skip the rest.