How to measure code coverage

When green tests aren't enough

You just finished a feature. You wrote tests. cargo test turns green. You commit the code and feel safe. Three days later, a user reports a crash. The crash happens when an input is empty. You "definitely" tested that case. You look at your test file. You tested the case where the input has values. You forgot the empty case.

This happens to everyone. Tests verify behavior, but they don't tell you what you missed. You need a way to see exactly which lines of code executed during your test suite. That's what code coverage does. It shows you the blind spots in your tests.

Coverage is a metric, not a guarantee. It tells you what ran, not whether it's correct. You can have 100% coverage and still have a logic bug. But your tests can't catch a bug in a line they never run. Coverage helps you find the lines you forgot to test so you can add tests for them.

What coverage actually measures

Coverage tools run your tests with extra instrumentation. They count how many times each line or branch executes. They produce a report highlighting missed spots. There are two main types of coverage.

Line coverage tracks whether a line of code executed at least once. If a line runs, it's green. If it never runs, it's red. Line coverage is easy to understand but can be misleading. A line with an if statement might run, but only one branch might execute. Line coverage marks the whole line as covered even if the other branch is untested.

Branch coverage tracks decision points. It checks that every outcome of each decision runs at least once, which is weaker than covering every possible path through the code but much stricter than line coverage. For an if statement, branch coverage requires both the if and the else to execute. For a match expression, every arm must run. It catches the cases where you tested the happy path but missed the error handling.
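To make the difference concrete, here is a minimal sketch. The function shipping_fee and its threshold are hypothetical, invented for illustration. Either test on its own executes every line of the function, so line coverage would already read 100%, yet each one exercises only one outcome of the if.

```rust
/// Hypothetical example: orders above 100.0 ship free, others pay a flat fee.
pub fn shipping_fee(order_total: f64) -> f64 {
    // The whole decision sits on one line, so any call marks it as covered.
    if order_total > 100.0 { 0.0 } else { 5.0 }
}

#[cfg(test)]
mod tests {
    use super::*;

    // This test alone executes every line of shipping_fee,
    // but it only exercises the else outcome.
    #[test]
    fn small_order_pays_fee() {
        assert_eq!(shipping_fee(20.0), 5.0);
    }

    // Branch coverage would ask for this second test as well.
    #[test]
    fn large_order_ships_free() {
        assert_eq!(shipping_fee(150.0), 0.0);
    }
}
```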

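Rust's ecosystem relies on external tools for coverage. Cargo doesn't ship with a built-in coverage command, although rustc itself can emit LLVM coverage instrumentation. A popular tool for local development is cargo-tarpaulin. It wraps cargo test and handles the instrumentation and reporting. Under the hood, it either traces the test binary with ptrace (its default engine on x86_64 Linux) or uses LLVM's coverage instrumentation, and in both cases it relies on debug information to map execution back to your source code.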

Running your first coverage report

Install cargo-tarpaulin using cargo install. This adds the cargo tarpaulin subcommand to your toolchain. Run it in your project directory. It will build your tests so that execution can be tracked, run them, and print a summary.

# Install the tool globally for your user.
cargo install cargo-tarpaulin

# Run coverage on the current project.
# The --engine llvm flag selects the LLVM instrumentation engine instead of the platform default.
cargo tarpaulin --engine llvm

The output lists each file with its covered and total line counts, followed by an overall percentage. At the time of writing, tarpaulin reports line coverage only. Files with low coverage stand out immediately. If you want a visual report, add the --out Html flag. This generates tarpaulin-report.html in your project directory. Open the file in your browser to see your source code with color-coded lines.

/// Calculates the average of a slice of numbers.
/// Returns None if the slice is empty.
pub fn average(numbers: &[f64]) -> Option<f64> {
    // This branch checks for empty input.
    // Coverage will show if this line executed.
    if numbers.is_empty() {
        return None;
    }

    // Sum the numbers and divide by the count.
    // Both lines must run for full coverage.
    let sum: f64 = numbers.iter().sum();
    Some(sum / numbers.len() as f64)
}

If you only test this function with [1.0, 2.0], the is_empty check runs but the return None line inside it never does. The coverage report will mark that line as missed. You'll know you need a test for the empty case.
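A minimal sketch of the two tests follows. The function is repeated so the block stands alone, and the test names are only illustrative.

```rust
/// Calculates the average of a slice of numbers.
/// Returns None if the slice is empty.
pub fn average(numbers: &[f64]) -> Option<f64> {
    if numbers.is_empty() {
        return None;
    }
    let sum: f64 = numbers.iter().sum();
    Some(sum / numbers.len() as f64)
}

#[cfg(test)]
mod tests {
    use super::*;

    // Covers the summing path only.
    #[test]
    fn averages_two_values() {
        assert_eq!(average(&[1.0, 2.0]), Some(1.5));
    }

    // Covers the early return that the first test never reaches.
    #[test]
    fn empty_slice_returns_none() {
        assert_eq!(average(&[]), None);
    }
}
```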

How the instrumentation works

When you run cargo tarpaulin with the LLVM engine, it modifies the build process. It adds compiler flags to enable coverage instrumentation. The flag -C instrument-coverage tells rustc to have LLVM insert counters into your binary. These counters increment every time the associated code region executes.

The test binary writes the counter data to a .profraw file when it exits. Tarpaulin reads this file and maps the counts to your source code. It uses debug information to correlate machine instructions with Rust source lines. The result is a precise map of execution.
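As a rough mental model only, instrumentation behaves as if a counter increment had been added at the start of each code region. The sketch below writes those increments by hand. The COUNTERS array, the hit helper, and the classify function are hypothetical, and this is not what LLVM actually emits.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// One counter per code region of classify. Hypothetical layout for illustration.
static COUNTERS: [AtomicU64; 3] = [AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0)];

/// Records one execution of the given region.
fn hit(region: usize) {
    COUNTERS[region].fetch_add(1, Ordering::Relaxed);
}

/// A function "instrumented" by hand.
pub fn classify(n: i32) -> &'static str {
    hit(0); // function entry
    if n < 0 {
        hit(1); // negative branch
        "negative"
    } else {
        hit(2); // non-negative branch
        "non-negative"
    }
}

/// Returns a snapshot of the counters, similar in spirit to the data
/// a coverage tool reads back after the tests finish.
pub fn snapshot() -> [u64; 3] {
    [
        COUNTERS[0].load(Ordering::Relaxed),
        COUNTERS[1].load(Ordering::Relaxed),
        COUNTERS[2].load(Ordering::Relaxed),
    ]
}
```

If the only calls are two invocations with non-negative input, the snapshot is [2, 0, 2]. The zero in the middle corresponds to the red line in a report.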

The LLVM engine is a good default choice. It builds on the same LLVM profiling runtime that profile-guided optimization uses, and it works on platforms where the ptrace engine is unavailable. It is usually faster and generally handles Rust's code generation accurately. A common recommendation is to use --engine llvm unless you have a specific reason not to. The ptrace engine may miss coverage in complex macros or inline functions.

Reading the HTML report

The HTML report is your primary tool for finding gaps. Each source file has its own page. Lines are colored based on execution. Green means the line ran. Red means it never ran. Some lines may be gray. Gray lines are usually comments or blank lines that don't affect execution.

Where the report shows hit counts, use them. The count tells you how many times the line executed, which helps you spot hot paths and rarely used code. A line that ran thousands of times sits on a hot path and may be worth profiling. A line that ran once is probably initialization code.

Look for red lines carefully. Not all red lines need tests. Some code is dead code that you plan to remove. Some code is only reachable in specific configurations. Use your judgment. The goal is to test the code that matters, not to chase 100% coverage for the sake of the metric.

Convention aside: The community often excludes generated files and build scripts from coverage. Use --exclude-files "build.rs" or --exclude-files "src/generated/*" to filter them out. This keeps your metrics focused on hand-written logic. You can also configure these exclusions in Cargo.toml under [package.metadata.tarpaulin] to avoid typing flags every time.

Branch coverage and match arms

Branch coverage is especially important in Rust. Rust encourages pattern matching and error handling. These constructs create many branches. If you don't test all branches, you might miss edge cases.

Consider a function that parses a configuration value. It uses a match expression to handle different cases. If you only test the success case, the error arms remain untested. Branch coverage will flag the missed arms. This forces you to write tests for error conditions.

/// Parses a configuration value from a string.
/// Returns an error if the value is invalid.
pub fn parse_config(value: &str) -> Result<i32, &'static str> {
    // Match creates branches for each arm.
    // Branch coverage requires every arm to run.
    match value {
        "fast" => Ok(1),
        "slow" => Ok(2),
        // This arm is an error case.
        // If you don't test it, coverage will show a gap.
        "invalid" => Err("unknown mode"),
        // The catch-all arm handles unexpected input.
        // This is often missed in tests.
        _ => Err("unexpected value"),
    }
}

If you test "fast" and "slow", the error arms are red. You need tests for "invalid" and an unexpected string like "medium". Branch coverage makes these gaps visible. It ensures your error handling is as robust as your success path.
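A minimal sketch of a test module that reaches all four arms follows. The function is repeated so the block stands alone, and the test names are illustrative.

```rust
/// Parses a configuration value from a string.
pub fn parse_config(value: &str) -> Result<i32, &'static str> {
    match value {
        "fast" => Ok(1),
        "slow" => Ok(2),
        "invalid" => Err("unknown mode"),
        _ => Err("unexpected value"),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // The happy path: covers the first two arms.
    #[test]
    fn parses_known_modes() {
        assert_eq!(parse_config("fast"), Ok(1));
        assert_eq!(parse_config("slow"), Ok(2));
    }

    // Covers the explicit error arm.
    #[test]
    fn rejects_invalid() {
        assert_eq!(parse_config("invalid"), Err("unknown mode"));
    }

    // Covers the catch-all arm with a string no other arm matches.
    #[test]
    fn rejects_unexpected_input() {
        assert_eq!(parse_config("medium"), Err("unexpected value"));
    }
}
```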

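Check what your tool measures before you rely on this. At the time of writing, tarpaulin's README lists line coverage as the only supported metric, so its summary says nothing about branches. LLVM-based tools such as cargo-llvm-cov report region coverage, and branch coverage there is still an unstable, nightly-only option. Where branch data is available, aim for high branch coverage. Line coverage can be high even if you missed critical branches.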

CI integration and LCOV

Local reports are useful for development. Continuous integration pipelines need machine-readable formats. Services like Codecov and Coveralls accept LCOV format. LCOV is a standard text format that describes coverage data. It works across languages and tools.

Generate LCOV output by passing --out Lcov to tarpaulin. This writes lcov.info to your project directory. You can upload this file to your coverage service. Most services provide a CLI tool or GitHub Action to handle the upload.

# Generate LCOV output for CI.
# The --out Lcov flag writes lcov.info.
cargo tarpaulin --engine llvm --out Lcov

# Upload to Codecov using their CLI tool.
# The exact command and flags depend on your CI environment and uploader version.
codecov --file lcov.info

In a GitHub Actions workflow, you can add a step to run tarpaulin and upload the report. This gives you coverage history over time. You can see how coverage changes with each pull request. This helps prevent regressions. If a PR drops coverage, the team can investigate before merging.
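LCOV is line-oriented text, so a coverage gate is easy to sketch. The following assumes the common record types SF: (source file), DA:<line>,<hits> (per-line hit count), and end_of_record. The function names line_coverage_percent and meets_threshold and the 80% threshold are hypothetical. Real pipelines normally rely on the coverage service or on the tool's own threshold option, such as tarpaulin's --fail-under, rather than hand-rolled parsing.

```rust
/// Computes line coverage (0.0 to 100.0) from LCOV text by counting DA records.
/// Returns None when the report contains no usable DA records.
pub fn line_coverage_percent(lcov: &str) -> Option<f64> {
    let mut total = 0u64;
    let mut covered = 0u64;
    for line in lcov.lines() {
        // A DA record looks like "DA:<line number>,<hit count>".
        let Some(rest) = line.trim().strip_prefix("DA:") else {
            continue;
        };
        let mut fields = rest.split(',');
        let _line_number = fields.next();
        // Skip malformed records instead of guessing.
        let Some(hits) = fields.next().and_then(|h| h.trim().parse::<u64>().ok()) else {
            continue;
        };
        total += 1;
        if hits > 0 {
            covered += 1;
        }
    }
    if total == 0 {
        None
    } else {
        Some(covered as f64 * 100.0 / total as f64)
    }
}

/// Returns true when the report reaches at least min_percent line coverage.
/// An empty report counts as failing the threshold.
pub fn meets_threshold(lcov: &str, min_percent: f64) -> bool {
    line_coverage_percent(lcov).map_or(false, |percent| percent >= min_percent)
}
```

For a report with four DA records of which one has a hit count of zero, line_coverage_percent returns Some(75.0), so a check against 80.0 fails.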

Convention aside: CI pipelines often use cargo-llvm-cov instead of cargo-tarpaulin. cargo-llvm-cov is a newer tool focused purely on LLVM source-based coverage. It has a simple interface and works on Linux, macOS, and Windows. If you're setting up a new pipeline, consider it. It can write LCOV directly (cargo llvm-cov --lcov --output-path lcov.info); the upload itself is still handled by your coverage service's CLI or GitHub Action.

Pitfalls and false confidence

Coverage has limitations. Understanding them prevents misuse. The biggest pitfall is confusing coverage with quality. High coverage doesn't mean your code is correct. It means your tests ran the code. You can have 100% coverage and still have bugs.

Example: You write a test that calls a function but asserts nothing about the result, or asserts on the wrong value. The test passes, and every line it touches is marked as covered. The bug remains. Coverage complements assertions; it doesn't replace them. Use coverage to find missing tests, not to validate existing ones.
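Here is a minimal sketch of full line coverage hiding a bug. The is_adult function and its rule are hypothetical, chosen only to show a boundary mistake that the tests execute but never check.

```rust
/// Intended rule: a person is an adult at 18 or older.
/// Bug: > should be >=, so exactly 18 is rejected.
pub fn is_adult(age: u32) -> bool {
    age > 18
}

#[cfg(test)]
mod tests {
    use super::*;

    // Both assertions pass and every line of is_adult executes,
    // so line coverage reads 100% while the boundary bug survives.
    #[test]
    fn checks_adult_and_child() {
        assert!(is_adult(30));
        assert!(!is_adult(10));
    }
}
```

A boundary test such as assert!(is_adult(18)) would fail and expose the bug. Coverage can't suggest that test, because the line it targets is already green.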

Another pitfall is chasing 100%. Some code is hard to test. Error paths might be unreachable in normal operation. Dead code might exist for future features. Forcing 100% coverage can lead to artificial tests that add no value. Focus on the code that matters and exclude what doesn't. Code behind #[cfg(test)] is only compiled for test builds, and tarpaulin has an --ignore-tests option for leaving test functions out of the totals; check cargo tarpaulin --help for the default in your version.
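For code you decide not to measure, tarpaulin's README documents a cfg attribute, #[cfg(not(tarpaulin_include))], that leaves an item out of the results. A minimal sketch follows, with hypothetical function names. Recent compilers may warn about the unknown cfg name unless it is declared in the crate's lint configuration.

```rust
/// Business logic that should be measured.
pub fn clamp_percent(value: i32) -> i32 {
    value.clamp(0, 100)
}

/// Debug helper that is not worth testing.
/// The item still compiles in normal builds; the attribute asks tarpaulin
/// to leave it out of the coverage results.
#[cfg(not(tarpaulin_include))]
pub fn debug_dump(value: i32) -> String {
    format!("value = {value}")
}
```

Excluding an item this way is a visible, reviewable decision, which is usually better than writing a test that exists only to turn a line green.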

Build failures can also interfere. If your code doesn't compile, coverage tools can't run, and you'll get the standard compiler errors. If you have a complex build script, the instrumented build might fail even though a normal build succeeds. Make sure cargo test passes on its own before measuring coverage.

If you try to measure coverage on a binary crate without tests, the tool reports 0% coverage. You won't get a compiler error, but the result is useless. Binary crates often rely on integration tests. Make sure your integration tests are included. Tarpaulin runs all test targets by default. If you have separate test crates, they should be picked up automatically.

Treat the red lines as a map of your blind spots. Fix the gaps that matter. Don't waste time on noise.

Choosing your tool

The Rust ecosystem offers several coverage tools. Pick the one that fits your workflow.

Use cargo-tarpaulin when you want a local tool with a rich HTML report and flexible configuration. It handles workspaces, filtering, and multiple output formats. It's the mature choice for developers who want detailed insights.

Use cargo-llvm-cov when you want a lightweight, fast tool focused on LLVM coverage. It has a clean interface and integrates well with CI. It's the modern choice for teams prioritizing speed and simplicity.

Use grcov when you are building a custom CI pipeline and need to parse raw LLVM data into LCOV format. It's a lower-level tool that gives you control over the conversion process. It's useful when you need to combine coverage from multiple sources.

Use llvm-cov directly when you need fine-grained control over the profiling data. It's the underlying tool used by higher-level wrappers. It's rarely needed for standard projects but useful for debugging coverage issues.

Trust the metric, but verify the tests. Coverage shows you what you missed. It's up to you to write tests that matter.

Where to go next

Code coverage tells you what percentage of your code actually runs when your tests execute. Think of it as a highlighter that marks every line your tests touch, so the unmarked lines show you what is still untested. A good next step is to run cargo tarpaulin --out Html on one of your own projects, open the report, and write a test for the first red line that matters.