How to Use Memory-Mapped I/O in Rust

Use the memmap2 crate to map files into memory for efficient I/O in Rust.

When the file is too big to load

You're building a log analyzer. The target file is 4GB. Your machine has 8GB of RAM, but you're running a database, a browser, and the editor. You try std::fs::read_to_string, and the process hogs memory, the OS starts swapping, and your fan spins up like a jet engine. You don't need the whole file in RAM. You need the OS to pretend the file is in RAM, handing you pages only when you touch them.

Memory-mapped I/O solves this. It turns a file on disk into a slice of memory. You index into the slice, and the kernel loads chunks from disk on demand. You get random access to a massive file without allocating a massive buffer. The file stays on disk until you touch it, and only the touched parts occupy physical memory.

The window into the warehouse

Think of a file as a warehouse full of boxes. Standard I/O is like carrying boxes out one by one. You call read, the OS grabs a box, copies it to your desk, and you process it. If you want box number 500, you have to walk through boxes 1 through 499 or seek back and forth. Every access involves copying data between the OS buffer and your buffer.
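To make the box-carrying picture concrete, here is a minimal sketch of fetching one fixed-size record with standard I/O. The helper name read_record_at and the 16-byte record size are assumptions made up for illustration; the point is only that every lookup costs a seek, a read, and a copy into your own buffer.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical fixed record size for this sketch.
const RECORD_SIZE: usize = 16;

/// Reads record number `index` by seeking to it and copying it into a
/// caller-owned buffer. Each call costs a seek, a read, and a copy.
fn read_record_at(file: &mut File, index: u64) -> io::Result<[u8; RECORD_SIZE]> {
    let mut buf = [0u8; RECORD_SIZE];
    // Jump to the start of the requested record.
    file.seek(SeekFrom::Start(index * RECORD_SIZE as u64))?;
    // Copy exactly one record out of the OS buffer; fails at end of file.
    file.read_exact(&mut buf)?;
    Ok(buf)
}
```

Calling read_record_at(&mut file, 500) works, but a thousand scattered lookups mean a thousand seek-and-read round trips.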

Memory mapping installs a window into the warehouse. You don't move boxes. You look through the window and grab what you need. The OS manages the window. When you reach for a box that isn't visible, the OS slides that section into view. To your code, it looks like a contiguous array. The magic is lazy loading. You only pay the disk cost for the bytes you actually access. The OS also shares these pages across processes. If two programs map the same file, they share the physical memory. No duplication.

Minimal mapping

Rust doesn't include memory mapping in the standard library. The community standard is the memmap2 crate. Add it to your Cargo.toml:

[dependencies]
memmap2 = "0.9"

Here's the smallest working example. It opens a file, maps it, and reads the first byte.

use memmap2::MmapOptions;
use std::fs::File;

/// Reads the first byte of a file using memory mapping.
fn main() {
    // Open the file in read-only mode.
    let file = File::open("data.bin").unwrap();

    // Map the file into memory.
    // This requires unsafe because another process can modify or truncate
    // the file while it is mapped, which breaks the guarantees Rust
    // assumes for a &[u8].
    let mmap = unsafe {
        MmapOptions::new().map(&file).unwrap()
    };

    // Access data like a slice.
    // Mmap implements Deref<Target=[u8]>, so indexing works directly.
    println!("First byte: {}", mmap[0]);
}

The map function is unsafe. The compiler can't verify that the file won't be truncated or modified by another process while the mapping is active. If the file shrinks, touching the part that vanished is undefined behavior as far as Rust is concerned; on Unix it usually shows up as a SIGBUS. The unsafe block is where you promise that won't happen. Once the mapping exists, Mmap derefs to &[u8], so reading from it needs no further unsafe, but those reads are only as sound as the promise you made when you created it.

Convention aside: use memmap2, not the older memmap crate. The original memmap is no longer maintained; memmap2 is its maintained fork and keeps a very similar API, so switching is usually a one-line change in Cargo.toml plus the import paths.

Keep the unsafe block tight. Creating the mapping is the only step that needs it.

What happens under the hood

When you call MmapOptions::new().map(&file), nothing happens on disk yet. The function invokes a syscall: mmap on Unix or CreateFileMapping plus MapViewOfFile on Windows. The kernel reserves a range of virtual addresses in your process and links them to the file. It doesn't read any data.

The first time you touch mmap[0], the CPU checks its page tables to find the physical memory for that address. The entry is empty. The CPU raises a page fault. This isn't an error; it's a signal to the OS. The OS catches the fault, brings the corresponding page of the file into its page cache if it isn't there already, and points your page table entry at that page. Your code never sees the fault. It just gets the byte.
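The arithmetic behind "only the touched parts occupy memory" can be sketched in a few lines. This is an illustration, not a measurement: the 4096-byte page size and the function names are assumptions, and a real kernel may fault in extra neighboring pages through read-ahead.

```rust
/// Assumed page size for this sketch; real systems report their own value.
const PAGE_SIZE: u64 = 4096;

/// Returns the inclusive range of page indices that an access of `len`
/// bytes starting at `offset` touches, or None for an empty access.
fn pages_touched(offset: u64, len: u64) -> Option<(u64, u64)> {
    if len == 0 {
        return None;
    }
    let first = offset / PAGE_SIZE;
    let last = (offset + len - 1) / PAGE_SIZE;
    Some((first, last))
}

/// Minimum number of bytes that must be resident to serve that access.
fn min_resident_bytes(offset: u64, len: u64) -> u64 {
    match pages_touched(offset, len) {
        Some((first, last)) => (last - first + 1) * PAGE_SIZE,
        None => 0,
    }
}
```

Reading one byte at offset 0 needs a single page. Reading two bytes that straddle offset 4095 and 4096 needs two. Neither depends on how large the file is.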

This is zero-copy I/O. You skip the read syscall and the copy from the kernel's page cache into a user buffer, because your process addresses the page-cache pages directly. For random access patterns, this can be dramatically faster than standard I/O because you avoid repeated syscalls and buffer copies. For sequential reads, memory mapping can be slower. The OS has to manage page tables and might thrash the cache if you stream through a huge file linearly. The page fault overhead adds up when you touch every single page.

Zero-copy is not zero overhead. You trade syscalls and copies for page faults.

Real-world: Parsing binary records

Memory mapping shines when you parse binary formats. You can overlay structs onto the mapped slice, avoiding manual byte parsing. This requires care with alignment and endianness.

use memmap2::MmapOptions;
use std::fs::File;

/// Represents a simple binary record.
/// repr(C) fixes the field order and layout: a 4-byte id, 4 bytes of
/// padding, then an 8-byte value (16 bytes total on typical targets).
#[repr(C)]
struct Record {
    id: u32,
    value: f64,
}

/// Parses records from a memory-mapped file.
fn process_records(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    
    // Map the file read-only.
    let mmap = unsafe {
        MmapOptions::new().map(&file)?
    };

    // Check size and alignment before casting.
    let len = mmap.len();
    let record_size = std::mem::size_of::<Record>();

    if len % record_size != 0 {
        return Err("File size is not a multiple of record size".into());
    }
    if (mmap.as_ptr() as usize) % std::mem::align_of::<Record>() != 0 {
        return Err("Mapping is not aligned for Record".into());
    }

    // SAFETY:
    // 1. We verified the slice length is a multiple of the Record size.
    // 2. We verified the pointer meets Record's alignment. A mapping at
    //    offset 0 starts on a page boundary (usually 4KB), so this holds.
    // 3. Any bit pattern is a valid u32 or f64, so arbitrary bytes cannot
    //    produce an invalid Record.
    // 4. We assume no other process modifies or truncates the file while
    //    it is mapped. Opening it read-only does not enforce this.
    let records: &[Record] = unsafe {
        std::slice::from_raw_parts(mmap.as_ptr() as *const Record, len / record_size)
    };

    for record in records {
        println!("ID: {}, Value: {}", record.id, record.value);
    }

    Ok(())
}

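The cast from &[u8] to &[Record] is unsafe. Rust can't prove the byte slice contains valid Record instances or that the alignment matches. The safety comment lists the invariants you are vouching for: size divisibility, alignment, and a file that doesn't change underneath you. Note that opening the file read-only only restricts your own handle; it does nothing to stop another process from writing to it. A mapping that starts at offset 0 is page-aligned, which satisfies any ordinary struct alignment. If you map from a non-zero offset or cast from the middle of a mapping, verify that ptr.align_offset(std::mem::align_of::<Record>()) returns 0 first. The cast also reads fields in the machine's native byte order, so the file must have been written with the same endianness and the same padding.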
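If you can't guarantee alignment or byte order, you can decode fields explicitly instead of casting. The sketch below is one way to do that, not the only one. It assumes a little-endian file with the same 16-byte layout as Record (id, 4 padding bytes, value); the names RECORD_SIZE, decode_record, and decode_all are hypothetical.

```rust
/// Size of one on-disk record in this sketch: 4-byte id, 4 bytes of
/// padding, 8-byte value. Assumed layout, chosen to match Record.
const RECORD_SIZE: usize = 16;

/// Decodes one little-endian record from the front of `bytes` without any
/// pointer cast. Returns None if fewer than RECORD_SIZE bytes are available.
fn decode_record(bytes: &[u8]) -> Option<(u32, f64)> {
    let chunk = bytes.get(..RECORD_SIZE)?;
    let mut id = [0u8; 4];
    id.copy_from_slice(&chunk[0..4]);
    // Bytes 4..8 are padding and are skipped.
    let mut value = [0u8; 8];
    value.copy_from_slice(&chunk[8..16]);
    Some((u32::from_le_bytes(id), f64::from_le_bytes(value)))
}

/// Iterates over every complete record in a byte slice, such as a mapping.
/// A trailing partial record is ignored.
fn decode_all(bytes: &[u8]) -> impl Iterator<Item = (u32, f64)> + '_ {
    bytes.chunks_exact(RECORD_SIZE).filter_map(decode_record)
}
```

Because Mmap derefs to &[u8], you could pass &mmap[..] straight to decode_all. You pay a small copy per field, and in exchange the code has no unsafe cast and no dependence on the host's endianness.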

Alignment is the silent killer. Check it before you cast.

Mutable mappings and flushing

Read-only mappings are the simple case. Mutable mappings let you write back to the file through the slice. Use MmapMut for this.

use memmap2::MmapOptions;
use std::fs::OpenOptions;

/// Modifies a file in place using a mutable memory map.
fn patch_file(path: &str) -> Result<(), Box<dyn std::error::Error>> {
    // Open with read-write permissions.
    let file = OpenOptions::new().read(true).write(true).open(path)?;

    // Create a mutable mapping.
    let mut mmap = unsafe {
        MmapOptions::new().map_mut(&file)?
    };

    // Modify data directly.
    mmap[0] = 0xFF;

    // Flush changes to disk.
    // Without this, dirty pages may sit in the page cache and be lost
    // if the machine crashes or loses power.
    mmap.flush()?;

    Ok(())
}

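Changes to MmapMut reach the file, but not immediately. The OS keeps modified pages in memory and writes them back on its own schedule. Call flush() when you need the data on disk at a known point. If only your process crashes, the kernel still holds the dirty pages and will normally write them out; if the machine crashes or loses power before that write-back, the modifications are lost. flush() corresponds to msync on Unix. It's expensive, so batch your changes and flush once. There's also flush_async(), which starts the write-back without waiting for it to finish, but flush() is the safe default for durability.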

Flush when durability matters. Until you do, the OS writes back on its own schedule.

Pitfalls and crashes

Memory mapping introduces risks that standard I/O hides.

File truncation. If another process truncates the file while it's mapped, touching a page past the new end of the file raises SIGBUS on Unix. Rust has no way to turn this into a Result or a panic, and the default action for the signal terminates the process. Treat the file size as a contract. If the file can change, memory mapping is a trap. Use file locking or versioning to prevent truncation, and remember that file locks are usually advisory: they only help if every writer honors them.
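One cheap sanity check is to compare the file's current length with the length you mapped before starting a long pass over the data. The sketch below uses a hypothetical helper, file_still_covers, and is best-effort only: the file can still shrink a moment after the check returns, so it detects a broken contract rather than enforcing one.

```rust
use std::fs::File;
use std::io;

/// Returns true if the file on disk is still at least `mapped_len` bytes
/// long. Best-effort only: the file can shrink right after this returns.
fn file_still_covers(file: &File, mapped_len: usize) -> io::Result<bool> {
    // metadata() on an open handle reports the file's current size.
    Ok(file.metadata()?.len() >= mapped_len as u64)
}
```

You would call it with the File you mapped and mmap.len(), and bail out with an error instead of reading if it returns false.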

File lifetime. The Mmap object does not store or borrow the File you passed in, and it doesn't need to. According to the memmap2 documentation, a file-backed mapping remains valid after the File is dropped; on some platforms the crate duplicates the underlying handle to make this hold. Dropping the File early is fine. What ends the mapping is dropping the Mmap itself, which unmaps the memory, so any pointer or slice you derived from it must not outlive it.

Alignment. Casting a byte slice to a slice of structs requires the pointer to meet the struct's alignment. If it doesn't, you get undefined behavior, and the compiler won't stop you inside unsafe. The compiler only guards the boundary: call map, or dereference a raw pointer, outside an unsafe block and you get error E0133. Everything inside the block is your responsibility.
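A small wrapper can put the alignment and size checks in one place so the unsafe cast never runs on a bad slice. This is a sketch under assumptions: Record is redefined here with the same two fields used earlier, and cast_records is a hypothetical helper name. It still relies on the bytes not changing while the returned slice is alive.

```rust
use std::mem::{align_of, size_of};

/// Same layout as the record type used earlier in this article.
#[repr(C)]
struct Record {
    id: u32,
    value: f64,
}

/// Reinterprets `bytes` as a slice of Record, or returns None if the
/// pointer is misaligned or the length is not a whole number of records.
fn cast_records(bytes: &[u8]) -> Option<&[Record]> {
    let misaligned = (bytes.as_ptr() as usize) % align_of::<Record>() != 0;
    let ragged = bytes.len() % size_of::<Record>() != 0;
    if misaligned || ragged {
        return None;
    }
    // SAFETY: alignment and length were checked above, every bit pattern is
    // a valid u32 or f64, and the result borrows `bytes`, so it cannot
    // outlive the underlying buffer.
    Some(unsafe {
        std::slice::from_raw_parts(
            bytes.as_ptr() as *const Record,
            bytes.len() / size_of::<Record>(),
        )
    })
}
```

With a mapping you would write cast_records(&mmap[..]) and handle the None case as a malformed file.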

Sequential performance. Memory mapping isn't always faster. For sequential reads, BufReader with std::fs::File often wins. The OS optimizes sequential reads with read-ahead buffering. Memory mapping forces page table management and can pollute the TLB (Translation Lookaside Buffer). Profile before mapping.
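For the log analyzer from the introduction, a plain streaming pass might look like the sketch below. The function name count_matching_lines is made up for illustration. It reads the file front to back through a BufReader and never holds more than one line plus the reader's buffer in memory.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Counts lines containing `needle` by streaming the file sequentially.
fn count_matching_lines(path: &Path, needle: &str) -> io::Result<usize> {
    let reader = BufReader::new(File::open(path)?);
    let mut count = 0;
    for line in reader.lines() {
        // Each item is an io::Result<String>; propagate read errors.
        if line?.contains(needle) {
            count += 1;
        }
    }
    Ok(count)
}
```

If a single pass like this answers your question, you may not need a mapping at all. Measure both before committing.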

Treat the file size as a contract. If the file can change, memory mapping is a trap.

Decision matrix

Use std::fs::read when the file fits comfortably in RAM and you need the whole contents at once. It's simple, safe, and fast for small files.

Use memmap2 when you need random access to a large file without loading it entirely. Use memmap2 when you're building a database engine, a virtual file system, or a parser that jumps around a binary format. Use memmap2 when you want zero-copy performance and the OS page cache handles the buffering for you.

Use BufReader with std::fs::File when you're streaming data sequentially. Memory mapping adds overhead for sequential reads because the OS has to manage page tables and might thrash the cache if you read linearly through a huge file.

Use MmapMut when you need to write back to the file through the mapping. Use Mmap for read-only access; it rules out accidental writes, and pages that are never dirtied can be dropped by the OS without any write-back.

Map for random access and size. Stream for sequence and safety.

Where to go next