How to Implement a Virtual Machine in Rust

Implementing a virtual machine in Rust involves defining a custom instruction set architecture (ISA), creating a bytecode representation, and writing a loop that fetches, decodes, and executes instructions while managing a stack or register state.

The machine that reads its own rules

You write a program that runs other programs. That sounds recursive, but it is just a machine reading a tape of instructions. Instead of the CPU executing x86 or ARM assembly directly, your Rust code reads a custom list of commands, manipulates its own memory, and produces a result. You control every cycle. You decide what an instruction means. You build the rules.

Building a virtual machine starts with a single loop. The loop fetches an instruction, decodes what it means, executes the corresponding action, and repeats until a stop signal appears. The entire architecture lives inside that cycle. Everything else is just data structures waiting to be processed.

How the stack actually works

Think of a virtual machine as a kitchen assembly line. The bytecode is the recipe card taped to the wall. The stack is a metal tray where ingredients get stacked in order. The instruction pointer is the chef's finger tracking which line of the recipe they are on. When the chef sees a push command, they grab a box and place it on the tray. When they see an add command, they take the top two boxes, combine their contents, and put a new box back on the tray. The machine does not understand math. It only understands moving boxes and following the card.

Rust makes this pattern natural because the language already manages memory safely. You do not need to manually allocate arrays or track pointers. You hand the work to a Vec, and the compiler guarantees the bounds stay valid. The stack grows and shrinks automatically. The instruction pointer marches forward. The match statement routes the traffic.

Build the tape first. The machine follows.

A minimal interpreter

Here is a complete stack machine that supports pushing numbers, adding them, and stopping. The code compiles as written and demonstrates the fetch-decode-execute cycle without any external dependencies.

use std::fmt;

/// Represents the custom instruction set for the stack machine.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instruction {
    Push(u8),
    Add,
    Halt,
}

/// The virtual machine state, holding the stack, bytecode, and instruction pointer.
struct VM {
    stack: Vec<u8>,
    bytecode: Vec<Instruction>,
    ip: usize,
}

impl VM {
    /// Creates a new VM instance with the provided bytecode program.
    fn new(bytecode: Vec<Instruction>) -> Self {
        VM {
            stack: Vec::new(),
            bytecode,
            ip: 0,
        }
    }

    /// Executes the bytecode until Halt is reached or an error occurs.
    fn run(&mut self) -> Result<u8, String> {
        // Loop while the instruction pointer stays within bytecode bounds.
        while self.ip < self.bytecode.len() {
            // Fetch the current instruction by index.
            let instruction = self.bytecode[self.ip];
            
            // Decode and execute based on the instruction variant.
            match instruction {
                Instruction::Push(value) => {
                    // Push loads a literal value onto the stack.
                    self.stack.push(value);
                }
                Instruction::Add => {
                    // Add requires two operands. Pop returns None if empty.
                    let b = self.stack.pop().ok_or("Stack underflow on second operand")?;
                    let a = self.stack.pop().ok_or("Stack underflow on first operand")?;
                    // Push the computed sum back onto the stack.
                    self.stack.push(a + b);
                }
                Instruction::Halt => {
                    // Halt breaks the execution loop immediately.
                    break;
                }
            }
            // Advance the instruction pointer to the next command.
            self.ip += 1;
        }
        
        // Return the top value, or an error if the stack is empty.
        self.stack.pop().ok_or("Empty stack at end of execution")
    }
}

fn main() {
    // Compile a simple program: 2 + 3 = 5
    let bytecode = vec![
        Instruction::Push(2),
        Instruction::Push(3),
        Instruction::Add,
        Instruction::Halt,
    ];

    let mut vm = VM::new(bytecode);
    match vm.run() {
        Ok(result) => println!("Result: {}", result),
        Err(e) => println!("Error: {}", e),
    }
}

Convention aside: always derive Debug and PartialEq on your instruction enum. You will spend hours printing stack states and comparing bytecode during development. The explicit derives save you from writing boilerplate formatters later.

Walking through the cycle

The run method starts with the instruction pointer at zero. The while condition checks that the pointer has not walked past the end of the bytecode vector. If it has, the loop exits and the function returns the top stack value.

Inside the loop, the fetch step grabs self.bytecode[self.ip]. Rust's enum system makes the decode step trivial. The match statement branches on the variant. If it is Push, the value goes onto the stack. If it is Add, two values come off, get added, and the result goes back. If it is Halt, the loop breaks. Every branch ends with self.ip += 1, which moves the pointer forward.

Trace it by hand before you run it. The stack never lies.

Scaling past three instructions

Three instructions teach the pattern. Fifty instructions expose the bottleneck. The match statement in the dispatch loop works fine for a handful of opcodes. It becomes a performance wall when you add multiplication, subtraction, jumps, function calls, and memory loads. The compiler generates a linear chain of comparisons. Each cycle adds branching overhead.

The community standard for larger VMs is a dispatch table. You replace the match with an array of function pointers or a jump table indexed by opcode. You assign each instruction a numeric ID, usually starting at zero. The table maps that ID directly to a closure or function that executes the operation. The CPU jumps straight to the code without comparing variants. You also gain the ability to inline the instruction pointer increment into each handler, which removes the ip += 1 from the hot loop.

Memory management follows a similar scaling curve. The host OS gives you a heap managed by the system allocator. Your VM needs its own heap for guest programs. You cannot safely hand out host Vec pointers to untrusted bytecode. You build a separate memory region, usually a large Vec<u8> or a boxed slice, and implement a simple bump allocator or a mark-and-sweep collector. The VM tracks offsets into that region instead of raw pointers. This isolation prevents a buggy guest program from corrupting the host process.

Convention aside: keep unsafe out of the core dispatch loop. The community calls this the minimum unsafe surface rule. If you need raw pointer arithmetic for the guest heap, wrap it in a small, well-documented helper. The rest of the VM stays in safe Rust.

Isolate the dispatch loop. Everything else is just plumbing.

Where the safety net tears

Virtual machines process untrusted data. That means every index, every pop, and every memory offset must be validated. Rust prevents buffer overflows at compile time, but it cannot stop you from panicking at runtime if you index a vector with a stale pointer. If your bytecode contains a jump instruction that points to index 999 in a ten-instruction program, self.bytecode[999] will panic. The process crashes. The guest program just killed the host.

You handle this by switching from direct indexing to bounds-checked access. Replace self.bytecode[self.ip] with self.bytecode.get(self.ip).ok_or("Jump out of bounds")?. The compiler will reject direct indexing if you try to assign it to a Result without handling the panic path. If you accidentally mix types in your stack, you will hit E0308 (mismatched types) when trying to push a u16 into a Vec<u8>. The compiler catches it early. Runtime panics come from logic errors, not type errors.

Stack underflow is the most common guest bug. A program that calls Add with only one value on the stack will trigger the ok_or branch. You return a clean error string instead of crashing. You log the fault, reset the VM state, and continue. Production VMs often wrap the entire run method in a catch_unwind or use a custom panic handler to guarantee the host survives a guest crash.

Validate the bytecode before it touches the stack. Trust nothing from outside.

Choosing your architecture

Use a stack-based VM when you want simple bytecode generation and predictable memory layout. Stack machines compile easily because you only emit values and operators. The instruction size stays small. The interpreter loop remains straightforward.

Use a register-based VM when your language has heavy arithmetic or tight loops. Register machines pass operands directly to instructions instead of shuffling them through a stack. The bytecode becomes larger, but the execution cycle drops significantly.

Use a dispatch table when your instruction set exceeds twenty opcodes and profiling shows the match statement consuming more than ten percent of CPU time. The table eliminates branching overhead and makes the hot loop branchless.

Use a custom guest allocator when your VM needs to manage heap memory for user programs. The host allocator cannot track guest lifetimes. A separate memory region with offset-based addressing keeps the host process safe from guest corruption.

Use a JIT compiler when interpretation is too slow for your target workload. Just-in-time compilation translates hot bytecode paths into native machine code at runtime. You trade startup time and memory for raw execution speed.

Start simple. Optimize only when the profiler points a finger.

Where to go next