When CSVs meet structs
You have a CSV file exported from a spreadsheet. It contains user data. You need to load it into Rust, filter by age, and write the results to a new file. You could split lines by commas and index into strings. That approach collapses the moment a name contains a comma, a field is quoted, or the encoding shifts. You need a parser that understands CSV rules, and you need a way to map those rows to Rust types without writing boilerplate.
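To make the failure concrete, here is a minimal sketch of the naive approach using only the standard library. The function name `naive_split` and the sample row are made up for illustration.

```rust
/// Naive approach: split a line on every comma, ignoring CSV quoting rules.
fn naive_split(line: &str) -> Vec<&str> {
    line.split(',').collect()
}

fn main() {
    // A well-formed CSV row with two fields. The first is quoted because
    // it contains a comma.
    let row = r#""Doe, Jane",34"#;
    let fields = naive_split(row);
    // The quoted name is cut in half, so the row appears to have three fields.
    println!("{} fields: {:?}", fields.len(), fields);
}
```

The row holds two fields, but the split produces three pieces and leaves stray quote characters in the first two.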
Serde handles the mapping between external data formats and Rust values. The csv crate handles the text parsing. Together, they turn rows into structs with zero manual parsing code. Serde generates the conversion logic at compile time. The csv crate streams the file and feeds rows to serde's generated code. You get type safety, error reporting per row, and high performance.
How the pieces fit
Serde is a framework for serializing and deserializing Rust data structures. It works through traits: Serialize for writing data out, and Deserialize for reading data in. You rarely implement these traits by hand. You use derive macros: #[derive(Serialize, Deserialize)]. The macros generate the implementation based on your struct fields.
The csv crate provides Reader and Writer types. Reader parses CSV text according to RFC 4180 rules, handling quoting, escaping, and newlines correctly. Writer produces valid CSV output. When you call reader.deserialize(), the csv crate yields rows and serde converts each row into your target type.
Think of serde as the blueprint for your data. The csv crate is the construction crew that reads the raw materials and builds the structure according to the blueprint. If the materials don't match the blueprint, the crew reports exactly where and why.
Minimal working example
Add the dependencies to your Cargo.toml. The csv crate depends on serde internally, but you need serde in your project to use the derive macros.
[dependencies]
serde = { version = "1.0", features = ["derive"] }
csv = "1.3"
Create a struct that matches your CSV columns. Derive Debug for printing, Deserialize for reading, and Serialize for writing.
use serde::{Deserialize, Serialize};

/// Represents a single row from the CSV file.
#[derive(Debug, Deserialize, Serialize)]
struct Record {
    name: String,
    age: u32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // from_path opens the file and returns a Result.
    // The ? operator propagates file errors to the caller.
    let mut reader = csv::Reader::from_path("data.csv")?;
    // deserialize returns an iterator over Result<Record, csv::Error>.
    // Each row is parsed independently.
    for result in reader.deserialize() {
        let record: Record = result?;
        println!("{:?}", record);
    }
    Ok(())
}
The deserialize method returns an iterator. Each item in the iterator is a Result. This design lets you handle row-level errors without aborting the entire file. The example above applies ? to each item, so it stops at the first bad row. Match on the Result instead and, if row 42 has a bad age, you can log it and continue to row 43.
Don't swallow the error. Log the row number and move on.
What happens under the hood
When csv::Reader::from_path runs, it opens the file and creates an internal buffer. The csv crate is optimized for speed. It reads chunks of data and parses them in place. You don't need to wrap the file in a BufReader. The csv crate already buffers efficiently.
The deserialize method scans the file row by row. For each row, the csv crate extracts the fields and hands them to the Deserialize implementation that serde generated for your struct. If the reader is configured with headers (the default), values are matched to struct fields by header name. If headers are disabled, values are matched by position.
If a field is missing, deserialization reports an error. If a field has the wrong type, deserialization reports an error. If a row has more or fewer fields than the rows before it, the reader itself reports an error unless flexible mode is enabled. Errors tied to a specific record carry its byte offset, line number, and record index, which makes debugging straightforward.
Convention aside: csv and serde are separate projects with different maintainers, but the csv crate builds serde support directly into its Reader and Writer APIs. You'll see this integration in the API design and error messages.
Real-world parsing with headers and errors
Real CSVs rarely match your structs perfectly. Headers might use different names. Fields might be optional. Some rows might be malformed. You need ReaderBuilder to configure the parser and serde attributes to handle mismatches.
use csv::ReaderBuilder;
use serde::Deserialize;

/// User record with flexible field mapping.
#[derive(Debug, Deserialize)]
struct User {
    // The CSV header is "full_name", but the struct field is "name".
    // rename maps the header to the field.
    #[serde(rename = "full_name")]
    name: String,
    // The CSV might have empty ages.
    // Option<u32> turns an empty value into None.
    age: Option<u32>,
    // The CSV also has an "id" column we don't need.
    // Columns with no matching struct field are ignored,
    // so no field is declared for it.
}

/// Processes users from a CSV file with error handling.
fn process_users() -> Result<(), Box<dyn std::error::Error>> {
    // ReaderBuilder allows configuration of parsing behavior.
    // has_headers(true) is the default; it is spelled out here for clarity.
    // flexible(true) allows rows with different numbers of fields.
    let mut reader = ReaderBuilder::new()
        .has_headers(true)
        .flexible(true)
        .from_path("users.csv")?;
    // enumerate adds a zero-based data-row index to the iterator.
    // It is the fallback context when an error carries no position.
    for (i, result) in reader.deserialize::<User>().enumerate() {
        match result {
            Ok(user) => {
                // Process valid user.
                println!("Row {}: {:?}", i + 1, user);
            }
            Err(err) => {
                // Handle row-level error without stopping the loop.
                // err.position() returns an Option holding byte offset,
                // line number, and record index.
                match err.position() {
                    Some(pos) => eprintln!("Error on line {}: {}", pos.line(), err),
                    None => eprintln!("Error on row {}: {}", i + 1, err),
                }
            }
        }
    }
    Ok(())
}
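The rename attribute maps CSV headers to struct fields. This decouples your Rust code from the exact header names in the file. The Option type handles empty values. If the age column is empty, age becomes None instead of failing. Columns with no matching struct field are ignored by default, so you don't need to declare anything for data you don't care about. Note that #[serde(skip)] does something different: it excludes a struct field from deserialization and fills it with its Default value.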
ReaderBuilder::flexible(true) is a safety net. It allows rows with fewer or more fields than expected. Without it, a row with a missing trailing field causes an unequal-lengths error from the reader. With it, the reader accepts the row and leaves the decision to deserialization: a missing trailing value becomes None when the struct field is an Option and still fails for required types.
Check the headers. A mismatched header is the silent killer of CSV parsing.
Pitfalls and compiler errors
Forgetting to derive Deserialize is the most common mistake. If you try to deserialize into a struct without the trait, the compiler rejects the code with E0277, reporting that a Deserialize bound is not satisfied for Record (the exact wording varies between compiler versions). The fix is simple: add #[derive(Deserialize)] to the struct.
Type mismatches cause runtime errors, not compile errors. If the CSV contains "abc" in the age column, the value cannot be parsed as u32. The error is a csv::Error whose kind is ErrorKind::Deserialize. Its message identifies the record, line, and field index and includes the underlying parse failure. Handle this in your loop. Don't unwrap it into a panic.
Header mismatches also cause runtime errors. If your struct expects a field named email but the CSV header is e-mail, serde reports a missing field error. Use rename to fix this. Or use #[serde(default)] to provide a default value when the field is missing.
Empty fields can trip up non-Option types. If a field is empty and the struct expects String, serde sets it to an empty string. If the struct expects u32, serde fails. Use Option<T> for fields that might be empty. Or use #[serde(deserialize_with = "...")] for custom parsing logic.
Convention aside: csv::Error implements std::error::Error. You can propagate it with ? in functions that return Result. The error type also provides position(), which returns an Option<&Position>. When present, the Position exposes the byte offset, line number, and record index. Use this for debugging.
Writing CSV output
Writing CSVs is symmetric to reading. Use csv::Writer and the Serialize derive.
use csv::Writer;
use serde::Serialize;

/// Output record for CSV export.
#[derive(Serialize)]
struct Output {
    id: u32,
    value: String,
}

/// Writes data to a CSV file.
fn write_data() -> Result<(), Box<dyn std::error::Error>> {
    // from_path creates the file, or truncates it if it already exists.
    let mut writer = Writer::from_path("output.csv")?;
    let data = vec![
        Output { id: 1, value: "alpha".into() },
        Output { id: 2, value: "beta".into() },
    ];
    // serialize writes a single record.
    // For structs, the first call also writes a header row from the field names.
    // Quoting and escaping are handled automatically.
    for record in data {
        writer.serialize(record)?;
    }
    // flush pushes buffered data to the file and reports any I/O error.
    // This is important if you want to read the file immediately after.
    writer.flush()?;
    Ok(())
}
The serialize method writes one record. It quotes fields that contain commas, quotes, or newlines. It escapes internal quotes by doubling them. You don't need to handle quoting manually.
flush matters. The writer buffers output for performance, and calling flush pushes that buffer to the underlying file and returns any I/O error as a Result. Dropping the writer at the end of scope also attempts a flush, but an error during that flush is silently ignored, so an explicit flush is the safer choice.
Serde pays for itself the moment your schema changes. Write the struct, not the parser.
Decision matrix
Use csv with serde when your data has a stable schema and you want type safety. Use csv without serde when you need maximum throughput and can work with raw string records. Use ReaderBuilder when your CSV uses a non-standard delimiter, flexible field counts, or custom quote characters. Use manual string splitting only for tiny, trusted files where adding a dependency is unjustified.
Counter-intuitive but true: the csv crate is often faster than hand-rolled parsing because its parser is a heavily optimized state machine and its record types let you reuse buffers instead of allocating per row. Trust the crate.