Loading pageā¦
Rust walkthroughs
Loading pageā¦
nom's parser combinator approach differ from traditional parser generators like lalrpop?Parser combinators and parser generators represent fundamentally different approaches to building parsers. Nom builds parsers by composing small functions, while lalrpop generates parser code from a grammar specification. This difference affects everything from error handling to debugging to integration with the rest of your code.
Nom uses parser combinatorsāfunctions that take input and return parsed results:
use nom::{
IResult,
bytes::complete::tag,
sequence::tuple,
character::complete::digit1,
};
// A parser that matches a literal string
fn parse_hello(input: &str) -> IResult<&str, &str> {
tag("hello")(input)
}
// A parser that matches one or more digits
fn parse_number(input: &str) -> IResult<&str, &str> {
digit1(input)
}
// Composing parsers
fn parse_hello_number(input: &str) -> IResult<&str, (&str, &str)> {
tuple((tag("hello"), digit1))(input)
}Each parser is a regular Rust function. You compose them using combinator functions like tuple, alt, many0, etc.
Lalrpop defines a grammar that generates parser code:
// In a .lalrpop file:
grammar;
pub Expr: i32 = {
<l:Expr> "+" <r:Factor> => l + r,
<l:Expr> "-" <r:Factor> => l - r,
Factor,
};
pub Factor: i32 = {
<l:Factor> "*" <r:Term> => l * r,
<l:Factor> "/" <r:Term> => l / r,
Term,
};
pub Term: i32 = {
Num,
"(" <Expr> ")",
};
Num: i32 = <s:r"[0-9]+"> => s.parse().unwrap();Lalrpop processes this grammar file at compile time and generates Rust code for the parser.
// nom: Parsers are functions composed at runtime
use nom::{
IResult,
bytes::complete::tag,
sequence::preceded,
character::complete::alpha1,
};
fn nom_example() {
// Build parser by combining functions
let parser = preceded(tag("hello "), alpha1);
// Parse at runtime
let result: IResult<&str, &str> = parser("hello world");
assert_eq!(result, Ok(("", "world")));
}
// lalrpop: Parsers are generated from grammar at compile time
mod generated_parser {
// lalrpop generates this module from the .lalrpop file
// The grammar is fixed at compile time
}
fn lalrpop_example() {
use generated_parser::ExprParser;
// Use the generated parser
let result = ExprParser::new().parse("1 + 2 * 3");
assert_eq!(result, Ok(7));
}Nom parsers are constructed dynamically; lalrpop parsers are generated statically from a grammar.
Nom's combinator approach enables flexible composition:
use nom::{
IResult,
bytes::complete::{tag, take_while},
sequence::{tuple, preceded, delimited},
multi::many0,
character::complete::char,
combinator::map,
};
// Small, reusable parsers
fn whitespace(input: &str) -> IResult<&str, &str> {
take_while(|c: char| c.is_whitespace())(input)
}
fn identifier(input: &str) -> IResult<&str, &str> {
take_while(|c: char| c.is_alphanumeric() || c == '_')(input)
}
// Compose them in different ways
fn parse_assignment(input: &str) -> IResult<&str, (&str, &str)> {
let (input, (name, _, _, value)) = tuple((
identifier,
whitespace,
tag("="),
identifier
))(input)?;
Ok((input, (name, value)))
}
// Same parsers, different composition
fn parse_function_call(input: &str) -> IResult<&str, (&str, Vec<&str>)> {
let (input, (name, _, args, _)) = tuple((
identifier,
delimited(char('('), many0(preceded(whitespace, identifier)), char(')')),
whitespace
))(input)?;
Ok((input, (name, args)))
}
// Create custom combinators
fn parenthesized<'a, T>(
inner: impl Fn(&'a str) -> IResult<&'a str, T>
) -> impl Fn(&'a str) -> IResult<&'a str, T> {
move |input| {
delimited(char('('), &inner, char(')'))(input)
}
}Lalrpop grammars are less flexible in composition:
// In .lalrpop file, you define all alternatives in one place
// Reusing parts of a grammar requires duplicating or restructuring
grammar;
// Can't easily compose these in different contexts
Id: String = <s:r"[a-zA-Z_][a-zA-Z0-9_]*"> => s.to_string();
Assignment: (String, String) = {
<name:Id> "=" <value:Id> => (name, value),
};
FunctionCall: (String, Vec<String>) = {
<name:Id> "(" <args:Id*> ")" => {
let args: Vec<String> = args;
(name, args)
},
};Nom uses Rust's Result type with detailed error information:
use nom::{
IResult,
error::{Error, VerboseError,VerboseErrorKind, ErrorKind},
bytes::complete::tag,
character::complete::digit1,
};
fn error_handling() {
// Basic error type
let result: IResult<&str, &str, Error<&str>> = tag("hello")("world");
match result {
Err(nom::Err::Error(e)) => {
println!("Failed at position {}: {:?}", e.input, e.code);
}
_ => {}
}
// Verbose errors with context
use nom::error::context;
fn parse_with_context(input: &str) -> IResult<&str, &str, VerboseError<&str>> {
context("expecting hello", tag("hello"))(input)
}
let result = parse_with_context("world");
match result {
Err(nom::Err::Error(e)) => {
// e contains chain of contexts for debugging
println!("Verbose error: {:?}", e);
}
_ => {}
}
}Lalrpop provides parse errors but with less flexibility:
use lalrpop_util::ParseError;
fn lalrpop_errors() {
// let result = ExprParser::new().parse("1 + + 2");
// match result {
// Err(ParseError::InvalidToken { location }) => { ... }
// Err(ParseError::UnrecognizedEOF { location, expected }) => { ... }
// Err(ParseError::UnrecognizedToken { token, expected }) => { ... }
// Err(ParseError::ExtraToken { token }) => { ... }
// Ok(_) => {}
// }
}// nom: Zero-allocation parsing where possible
use nom::{
IResult,
bytes::complete::take_while,
character::complete::digit1,
};
fn nom_zero_alloc(input: &str) -> IResult<&str, &str> {
// Returns a slice into the input, no allocation
take_while(|c: char| c.is_alphabetic())(input)
}
fn nom_with_allocation(input: &str) -> IResult<&str, String> {
use nom::combinator::map;
map(take_while(|c: char| c.is_alphabetic()), |s: &str| s.to_string())(input)
}
// lalrpop: Generates an LR parser
// - Builds parse tables at compile time
// - Stack-based parsing at runtime
// - Allocates for AST construction
// - Typically efficient for complex grammarsNom gives you control over allocations; lalrpop handles it automatically.
// nom: Recursive descent (LL-style)
use nom::{
IResult,
bytes::complete::tag,
character::complete::digit1,
sequence::tuple,
branch::alt,
};
fn parse_expr(input: &str) -> IResult<&str, i32> {
// Left recursion causes infinite loops in recursive descent
// WRONG:
// fn expr(input: &str) -> IResult<&str, i32> {
// alt((
// tuple((expr, tag("+"), term)), // Infinite recursion!
// term
// ))(input)
// }
// Must rewrite to avoid left recursion
parse_additive(input)
}
fn parse_additive(input: &str) -> IResult<&str, i32> {
// Parse term, then optionally more terms with operators
let (input, mut result) = parse_term(input)?;
let mut input = input;
loop {
let (remaining, _) = nom::character::complete::multispace0(input)?;
if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("+")(remaining) {
let (rem, rhs) = parse_term(rem)?;
result += rhs;
input = rem;
} else if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("-")(remaining) {
let (rem, rhs) = parse_term(rem)?;
result -= rhs;
input = rem;
} else {
break;
}
}
Ok((input, result))
}
fn parse_term(input: &str) -> IResult<&str, i32> {
// Similar pattern for multiplication/division
let (input, mut result) = parse_factor(input)?;
let mut input = input;
loop {
let (remaining, _) = nom::character::complete::multispace0(input)?;
if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("*")(remaining) {
let (rem, rhs) = parse_factor(rem)?;
result *= rhs;
input = rem;
} else if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("/")(remaining) {
let (rem, rhs) = parse_factor(rem)?;
result /= rhs;
input = rem;
} else {
break;
}
}
Ok((input, result))
}
fn parse_factor(input: &str) -> IResult<&str, i32> {
alt((
nom::combinator::map_res(digit1, |s: &str| s.parse::<i32>()),
nom::sequence::delimited(
tag("("),
parse_expr,
tag(")")
)
))(input)
}Lalrpop handles left recursion naturally:
// In .lalrpop - left recursion is fine!
grammar;
pub Expr: i32 = {
<l:Expr> "+" <r:Term> => l + r, // Left recursion works
<l:Expr> "-" <r:Term> => l - r,
Term,
};
pub Term: i32 = {
<l:Term> "*" <r:Factor> => l * r,
<l:Term> "/" <r:Factor> => l / r,
Factor,
};
pub Factor: i32 = {
Num,
"(" <Expr> ")",
};
Num: i32 = <s:r"[0-9]+"> => s.parse().unwrap();Nom parsers blend naturally with Rust:
use nom::{
IResult,
bytes::complete::{tag, take_while},
character::complete::digit1,
combinator::map_res,
sequence::tuple,
};
#[derive(Debug)]
struct User {
name: String,
age: u32,
}
// Parse directly into your types
fn parse_user(input: &str) -> IResult<&str, User> {
let (input, (name, _, age)) = tuple((
take_while(|c: char| c.is_alphabetic()),
tag(","),
map_res(digit1, |s: &str| s.parse::<u32>())
))(input)?;
Ok((input, User {
name: name.to_string(),
age,
}))
}
// Use anywhere in your code
fn process_input(input: &str) -> Result<User, String> {
parse_user(input)
.map(|(_, user)| user)
.map_err(|e| format!("Parse error: {:?}", e))
}Lalrpop requires generated code and separate grammar files:
// user.lalrpop
grammar;
use super::User;
pub User: User = {
<name:Name> "," <age:Num> => User { name, age },
};
Name: String = <s:r"[a-zA-Z]+"> => s.to_string();
Num: u32 = <s:r"[0-9]+"> => s.parse().unwrap();
// main.rs
mod user; // Generated by lalrpop
fn process_input(input: &str) -> Result<User, String> {
user::UserParser::new()
.parse(input)
.map_err(|e| format!("Parse error: {:?}", e))
}Nom requires explicit choice ordering:
use nom::{IResult, branch::alt, bytes::complete::tag};
fn ambiguous_example(input: &str) -> IResult<&str, &str> {
// Try alternatives in order; first match wins
alt((
tag("foo"),
tag("foobar"), // Never matches because "foo" matches first
))(input)
}
// Must order by specificity
fn correct_order(input: &str) -> IResult<&str, &str> {
alt((
tag("foobar"), // Try longer match first
tag("foo"),
))(input)
}Lalrpop resolves ambiguity through grammar rules:
// In .lalrpop, precedence is handled explicitly
grammar;
pub Expr: i32 = {
<l:Expr> "+" <r:Expr> => l + r,
<l:Expr> "*" <r:Expr> => l * r,
Num,
};
// Or use precedence annotations
#[precedence(level="1")]
pub Expr: i32 = {
Num,
};
#[precedence(level="2")]
#[assoc(side="left")]
pub Add: i32 = <l:Expr> "+" <r:Expr> => l + r;
#[precedence(level="3")]
#[assoc(side="left")]
pub Mul: i32 = <l:Expr> "*" <r:Expr> => l * r;Nom supports both streaming and complete parsing:
use nom::{
IResult,
bytes::streaming::tag as streaming_tag,
bytes::complete::tag as complete_tag,
};
// Streaming: Input may be incomplete
fn parse_streaming(input: &[u8]) -> IResult<&[u8], &[u8]> {
streaming_tag("hello")(input)
}
// Complete: Input is complete
fn parse_complete(input: &[u8]) -> IResult<&[u8], &[u8]> {
complete_tag("hello")(input)
}
// Streaming returns Needed when more data required
fn streaming_example() {
let partial: &[u8] = b"hel";
match streaming_tag::<_, _, nom::error::Error<&[u8]>>("hello")(partial) {
Err(nom::Err::Incomplete(needed)) => {
println!("Need more data: {:?}", needed);
}
_ => {}
}
}Lalrpop is designed for complete input:
// Lalrpop expects complete input
// No built-in streaming support
// You would need to buffer input externallyNom parsers are testable as regular functions:
use nom::{IResult, bytes::complete::tag, sequence::tuple};
fn parser(input: &str) -> IResult<&str, (&str, &str)> {
tuple((tag("hello"), tag(" world")))(input)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_parser_success() {
assert_eq!(parser("hello world"), Ok(("", ("hello", " world"))));
}
#[test]
fn test_parser_failure() {
assert!(parser("hello there").is_err());
}
#[test]
fn test_parser_partial() {
assert_eq!(parser("hello world!"), Ok(("!", ("hello", " world"))));
}
}Lalrpop tests use the generated parser:
// Tests work with the generated parser interface
#[cfg(test)]
mod tests {
// use super::generated_parser::ExprParser;
#[test]
fn test_expr() {
// assert_eq!(ExprParser::new().parse("1 + 2"), Ok(3));
}
}// Choose nom when:
// 1. You need fine-grained control over parsing
// 2. You're parsing binary formats or streaming data
// 3. You want parsers as composable functions
// 4. You need zero-allocation parsing
// 5. Your grammar is simple or unconventional
// 6. You want to integrate parsers into existing code
use nom::{
IResult,
bytes::complete::{tag, take},
number::complete::{be_u16, be_u32},
sequence::tuple,
};
// Binary protocol parsing - nom excels here
fn parse_header(input: &[u8]) -> IResult<&[u8], (u16, u32, &[u8])> {
tuple((
be_u16, // version
be_u32, // length
take(4usize), // flags
))(input)
}// Choose lalrpop when:
// 1. You have a complex grammar with precedence rules
// 2. You want grammar-based specification
// 3. Left recursion is natural for your language
// 4. You're building a programming language parser
// 5. You prefer declarative grammar definitions
// 6. The grammar is mostly fixed
// Programming language - lalrpop excels here
// .lalrpop file:
grammar;
// pub Program = Statement*;
// pub Statement = Expr ";";
// pub Expr: i32 = {
// <l:Expr> "+" <r:Expr> => l + r,
// <l:Expr> "*" <r:Expr> => l * r,
// "(" <Expr> ")",
// Num,
// };Nom and lalrpop represent two fundamentally different approaches:
Parser Combinators (Nom):
Parser Generators (Lalrpop):
Choose nom when you need composability, streaming, binary formats, or tight integration with Rust code. Choose lalrpop when you have a complex grammar with precedence rules or prefer declarative grammar specifications.
Both can achieve the same parsing goals, but the development experience differs significantly. Nom feels like writing Rust code; lalrpop feels like writing a grammar specification.