How does nom's parser combinator approach differ from traditional parser generators like lalrpop?
Parser combinators and parser generators represent fundamentally different approaches to building parsers. Nom builds parsers by composing small functions, while lalrpop generates parser code from a grammar specification. This difference affects everything from error handling to debugging to integration with the rest of your code.
Parser Combinators: Building Parsers from Functions
Nom uses parser combinatorsâfunctions that take input and return parsed results:
use nom::{
IResult,
bytes::complete::tag,
sequence::tuple,
character::complete::digit1,
};
// A parser that matches a literal string
fn parse_hello(input: &str) -> IResult<&str, &str> {
tag("hello")(input)
}
// A parser that matches one or more digits
fn parse_number(input: &str) -> IResult<&str, &str> {
digit1(input)
}
// Composing parsers
fn parse_hello_number(input: &str) -> IResult<&str, (&str, &str)> {
tuple((tag("hello"), digit1))(input)
}Each parser is a regular Rust function. You compose them using combinator functions like tuple, alt, many0, etc.
Parser Generators: Grammar-Based Code Generation
Lalrpop defines a grammar that generates parser code:
// In a .lalrpop file:
grammar;
pub Expr: i32 = {
<l:Expr> "+" <r:Factor> => l + r,
<l:Expr> "-" <r:Factor> => l - r,
Factor,
};
pub Factor: i32 = {
<l:Factor> "*" <r:Term> => l * r,
<l:Factor> "/" <r:Term> => l / r,
Term,
};
pub Term: i32 = {
Num,
"(" <Expr> ")",
};
Num: i32 = <s:r"[0-9]+"> => s.parse().unwrap();Lalrpop processes this grammar file at compile time and generates Rust code for the parser.
Fundamental Architecture Differences
// nom: Parsers are functions composed at runtime
use nom::{
IResult,
bytes::complete::tag,
sequence::preceded,
character::complete::alpha1,
};
fn nom_example() {
// Build parser by combining functions
let parser = preceded(tag("hello "), alpha1);
// Parse at runtime
let result: IResult<&str, &str> = parser("hello world");
assert_eq!(result, Ok(("", "world")));
}
// lalrpop: Parsers are generated from grammar at compile time
mod generated_parser {
// lalrpop generates this module from the .lalrpop file
// The grammar is fixed at compile time
}
fn lalrpop_example() {
use generated_parser::ExprParser;
// Use the generated parser
let result = ExprParser::new().parse("1 + 2 * 3");
assert_eq!(result, Ok(7));
}Nom parsers are constructed dynamically; lalrpop parsers are generated statically from a grammar.
Composability and Reusability
Nom's combinator approach enables flexible composition:
use nom::{
IResult,
bytes::complete::{tag, take_while},
sequence::{tuple, preceded, delimited},
multi::many0,
character::complete::char,
combinator::map,
};
// Small, reusable parsers
fn whitespace(input: &str) -> IResult<&str, &str> {
take_while(|c: char| c.is_whitespace())(input)
}
fn identifier(input: &str) -> IResult<&str, &str> {
take_while(|c: char| c.is_alphanumeric() || c == '_')(input)
}
// Compose them in different ways
fn parse_assignment(input: &str) -> IResult<&str, (&str, &str)> {
let (input, (name, _, _, value)) = tuple((
identifier,
whitespace,
tag("="),
identifier
))(input)?;
Ok((input, (name, value)))
}
// Same parsers, different composition
fn parse_function_call(input: &str) -> IResult<&str, (&str, Vec<&str>)> {
let (input, (name, _, args, _)) = tuple((
identifier,
delimited(char('('), many0(preceded(whitespace, identifier)), char(')')),
whitespace
))(input)?;
Ok((input, (name, args)))
}
// Create custom combinators
fn parenthesized<'a, T>(
inner: impl Fn(&'a str) -> IResult<&'a str, T>
) -> impl Fn(&'a str) -> IResult<&'a str, T> {
move |input| {
delimited(char('('), &inner, char(')'))(input)
}
}Lalrpop grammars are less flexible in composition:
// In .lalrpop file, you define all alternatives in one place
// Reusing parts of a grammar requires duplicating or restructuring
grammar;
// Can't easily compose these in different contexts
Id: String = <s:r"[a-zA-Z_][a-zA-Z0-9_]*"> => s.to_string();
Assignment: (String, String) = {
<name:Id> "=" <value:Id> => (name, value),
};
FunctionCall: (String, Vec<String>) = {
<name:Id> "(" <args:Id*> ")" => {
let args: Vec<String> = args;
(name, args)
},
};Error Handling
Nom uses Rust's Result type with detailed error information:
use nom::{
IResult,
error::{Error, VerboseError,VerboseErrorKind, ErrorKind},
bytes::complete::tag,
character::complete::digit1,
};
fn error_handling() {
// Basic error type
let result: IResult<&str, &str, Error<&str>> = tag("hello")("world");
match result {
Err(nom::Err::Error(e)) => {
println!("Failed at position {}: {:?}", e.input, e.code);
}
_ => {}
}
// Verbose errors with context
use nom::error::context;
fn parse_with_context(input: &str) -> IResult<&str, &str, VerboseError<&str>> {
context("expecting hello", tag("hello"))(input)
}
let result = parse_with_context("world");
match result {
Err(nom::Err::Error(e)) => {
// e contains chain of contexts for debugging
println!("Verbose error: {:?}", e);
}
_ => {}
}
}Lalrpop provides parse errors but with less flexibility:
use lalrpop_util::ParseError;
fn lalrpop_errors() {
// let result = ExprParser::new().parse("1 + + 2");
// match result {
// Err(ParseError::InvalidToken { location }) => { ... }
// Err(ParseError::UnrecognizedEOF { location, expected }) => { ... }
// Err(ParseError::UnrecognizedToken { token, expected }) => { ... }
// Err(ParseError::ExtraToken { token }) => { ... }
// Ok(_) => {}
// }
}Performance Characteristics
// nom: Zero-allocation parsing where possible
use nom::{
IResult,
bytes::complete::take_while,
character::complete::digit1,
};
fn nom_zero_alloc(input: &str) -> IResult<&str, &str> {
// Returns a slice into the input, no allocation
take_while(|c: char| c.is_alphabetic())(input)
}
fn nom_with_allocation(input: &str) -> IResult<&str, String> {
use nom::combinator::map;
map(take_while(|c: char| c.is_alphabetic()), |s: &str| s.to_string())(input)
}
// lalrpop: Generates an LR parser
// - Builds parse tables at compile time
// - Stack-based parsing at runtime
// - Allocates for AST construction
// - Typically efficient for complex grammarsNom gives you control over allocations; lalrpop handles it automatically.
Parsing Strategies: Recursive Descent vs LR
// nom: Recursive descent (LL-style)
use nom::{
IResult,
bytes::complete::tag,
character::complete::digit1,
sequence::tuple,
branch::alt,
};
fn parse_expr(input: &str) -> IResult<&str, i32> {
// Left recursion causes infinite loops in recursive descent
// WRONG:
// fn expr(input: &str) -> IResult<&str, i32> {
// alt((
// tuple((expr, tag("+"), term)), // Infinite recursion!
// term
// ))(input)
// }
// Must rewrite to avoid left recursion
parse_additive(input)
}
fn parse_additive(input: &str) -> IResult<&str, i32> {
// Parse term, then optionally more terms with operators
let (input, mut result) = parse_term(input)?;
let mut input = input;
loop {
let (remaining, _) = nom::character::complete::multispace0(input)?;
if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("+")(remaining) {
let (rem, rhs) = parse_term(rem)?;
result += rhs;
input = rem;
} else if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("-")(remaining) {
let (rem, rhs) = parse_term(rem)?;
result -= rhs;
input = rem;
} else {
break;
}
}
Ok((input, result))
}
fn parse_term(input: &str) -> IResult<&str, i32> {
// Similar pattern for multiplication/division
let (input, mut result) = parse_factor(input)?;
let mut input = input;
loop {
let (remaining, _) = nom::character::complete::multispace0(input)?;
if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("*")(remaining) {
let (rem, rhs) = parse_factor(rem)?;
result *= rhs;
input = rem;
} else if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("/")(remaining) {
let (rem, rhs) = parse_factor(rem)?;
result /= rhs;
input = rem;
} else {
break;
}
}
Ok((input, result))
}
fn parse_factor(input: &str) -> IResult<&str, i32> {
alt((
nom::combinator::map_res(digit1, |s: &str| s.parse::<i32>()),
nom::sequence::delimited(
tag("("),
parse_expr,
tag(")")
)
))(input)
}Lalrpop handles left recursion naturally:
// In .lalrpop - left recursion is fine!
grammar;
pub Expr: i32 = {
<l:Expr> "+" <r:Term> => l + r, // Left recursion works
<l:Expr> "-" <r:Term> => l - r,
Term,
};
pub Term: i32 = {
<l:Term> "*" <r:Factor> => l * r,
<l:Term> "/" <r:Factor> => l / r,
Factor,
};
pub Factor: i32 = {
Num,
"(" <Expr> ")",
};
Num: i32 = <s:r"[0-9]+"> => s.parse().unwrap();Integration with Rust Code
Nom parsers blend naturally with Rust:
use nom::{
IResult,
bytes::complete::{tag, take_while},
character::complete::digit1,
combinator::map_res,
sequence::tuple,
};
#[derive(Debug)]
struct User {
name: String,
age: u32,
}
// Parse directly into your types
fn parse_user(input: &str) -> IResult<&str, User> {
let (input, (name, _, age)) = tuple((
take_while(|c: char| c.is_alphabetic()),
tag(","),
map_res(digit1, |s: &str| s.parse::<u32>())
))(input)?;
Ok((input, User {
name: name.to_string(),
age,
}))
}
// Use anywhere in your code
fn process_input(input: &str) -> Result<User, String> {
parse_user(input)
.map(|(_, user)| user)
.map_err(|e| format!("Parse error: {:?}", e))
}Lalrpop requires generated code and separate grammar files:
// user.lalrpop
grammar;
use super::User;
pub User: User = {
<name:Name> "," <age:Num> => User { name, age },
};
Name: String = <s:r"[a-zA-Z]+"> => s.to_string();
Num: u32 = <s:r"[0-9]+"> => s.parse().unwrap();
// main.rs
mod user; // Generated by lalrpop
fn process_input(input: &str) -> Result<User, String> {
user::UserParser::new()
.parse(input)
.map_err(|e| format!("Parse error: {:?}", e))
}Handling Ambiguity
Nom requires explicit choice ordering:
use nom::{IResult, branch::alt, bytes::complete::tag};
fn ambiguous_example(input: &str) -> IResult<&str, &str> {
// Try alternatives in order; first match wins
alt((
tag("foo"),
tag("foobar"), // Never matches because "foo" matches first
))(input)
}
// Must order by specificity
fn correct_order(input: &str) -> IResult<&str, &str> {
alt((
tag("foobar"), // Try longer match first
tag("foo"),
))(input)
}Lalrpop resolves ambiguity through grammar rules:
// In .lalrpop, precedence is handled explicitly
grammar;
pub Expr: i32 = {
<l:Expr> "+" <r:Expr> => l + r,
<l:Expr> "*" <r:Expr> => l * r,
Num,
};
// Or use precedence annotations
#[precedence(level="1")]
pub Expr: i32 = {
Num,
};
#[precedence(level="2")]
#[assoc(side="left")]
pub Add: i32 = <l:Expr> "+" <r:Expr> => l + r;
#[precedence(level="3")]
#[assoc(side="left")]
pub Mul: i32 = <l:Expr> "*" <r:Expr> => l * r;Streaming vs Complete Input
Nom supports both streaming and complete parsing:
use nom::{
IResult,
bytes::streaming::tag as streaming_tag,
bytes::complete::tag as complete_tag,
};
// Streaming: Input may be incomplete
fn parse_streaming(input: &[u8]) -> IResult<&[u8], &[u8]> {
streaming_tag("hello")(input)
}
// Complete: Input is complete
fn parse_complete(input: &[u8]) -> IResult<&[u8], &[u8]> {
complete_tag("hello")(input)
}
// Streaming returns Needed when more data required
fn streaming_example() {
let partial: &[u8] = b"hel";
match streaming_tag::<_, _, nom::error::Error<&[u8]>>("hello")(partial) {
Err(nom::Err::Incomplete(needed)) => {
println!("Need more data: {:?}", needed);
}
_ => {}
}
}Lalrpop is designed for complete input:
// Lalrpop expects complete input
// No built-in streaming support
// You would need to buffer input externallyTesting and Debugging
Nom parsers are testable as regular functions:
use nom::{IResult, bytes::complete::tag, sequence::tuple};
fn parser(input: &str) -> IResult<&str, (&str, &str)> {
tuple((tag("hello"), tag(" world")))(input)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parser_success() {
assert_eq!(parser("hello world"), Ok(("", ("hello", " world"))));
}
#[test]
fn test_parser_failure() {
assert!(parser("hello there").is_err());
}
#[test]
fn test_parser_partial() {
assert_eq!(parser("hello world!"), Ok(("!", ("hello", " world"))));
}
}Lalrpop tests use the generated parser:
// Tests work with the generated parser interface
#[cfg(test)]
mod tests {
// use super::generated_parser::ExprParser;
#[test]
fn test_expr() {
// assert_eq!(ExprParser::new().parse("1 + 2"), Ok(3));
}
}When to Choose Nom
// Choose nom when:
// 1. You need fine-grained control over parsing
// 2. You're parsing binary formats or streaming data
// 3. You want parsers as composable functions
// 4. You need zero-allocation parsing
// 5. Your grammar is simple or unconventional
// 6. You want to integrate parsers into existing code
use nom::{
IResult,
bytes::complete::{tag, take},
number::complete::{be_u16, be_u32},
sequence::tuple,
};
// Binary protocol parsing - nom excels here
fn parse_header(input: &[u8]) -> IResult<&[u8], (u16, u32, &[u8])> {
tuple((
be_u16, // version
be_u32, // length
take(4usize), // flags
))(input)
}When to Choose Lalrpop
// Choose lalrpop when:
// 1. You have a complex grammar with precedence rules
// 2. You want grammar-based specification
// 3. Left recursion is natural for your language
// 4. You're building a programming language parser
// 5. You prefer declarative grammar definitions
// 6. The grammar is mostly fixed
// Programming language - lalrpop excels here
// .lalrpop file:
grammar;
// pub Program = Statement*;
// pub Statement = Expr ";";
// pub Expr: i32 = {
// <l:Expr> "+" <r:Expr> => l + r,
// <l:Expr> "*" <r:Expr> => l * r,
// "(" <Expr> ")",
// Num,
// };Synthesis
Nom and lalrpop represent two fundamentally different approaches:
Parser Combinators (Nom):
- Parsers are functions composed at runtime
- Recursive descent parsing (LL-style)
- No left recursion; must transform grammars
- Fine-grained control over error handling
- Zero-allocation parsing possible
- Natural integration with Rust code
- Excellent for binary formats and streaming
- Steeper learning curve for complex grammars
Parser Generators (Lalrpop):
- Grammars generate parsers at compile time
- LR parsing (handles left recursion naturally)
- Declarative grammar specification
- Automatic precedence handling
- Separate grammar files
- Generated code interface
- Excellent for programming languages and DSLs
- Clearer grammar structure for complex languages
Choose nom when you need composability, streaming, binary formats, or tight integration with Rust code. Choose lalrpop when you have a complex grammar with precedence rules or prefer declarative grammar specifications.
Both can achieve the same parsing goals, but the development experience differs significantly. Nom feels like writing Rust code; lalrpop feels like writing a grammar specification.
