How does `nom`'s parser combinator approach differ from traditional parser generators like `lalrpop`?

Parser combinators and parser generators represent fundamentally different approaches to building parsers. Nom builds parsers by composing small functions, while lalrpop generates parser code from a grammar specification. This difference affects everything from error handling to debugging to integration with the rest of your code.

Parser Combinators: Building Parsers from Functions

Nom uses parser combinators—functions that take input and return parsed results:

use nom::{
    IResult,
    bytes::complete::tag,
    sequence::tuple,
    character::complete::digit1,
};
 
// A parser that matches a literal string
fn parse_hello(input: &str) -> IResult<&str, &str> {
    tag("hello")(input)
}
 
// A parser that matches one or more digits
fn parse_number(input: &str) -> IResult<&str, &str> {
    digit1(input)
}
 
// Composing parsers
fn parse_hello_number(input: &str) -> IResult<&str, (&str, &str)> {
    tuple((tag("hello"), digit1))(input)
}

Each parser is a regular Rust function. You compose them using combinator functions like tuple, alt, many0, etc.

Parser Generators: Grammar-Based Code Generation

Lalrpop defines a grammar that generates parser code:

// In a .lalrpop file:
grammar;
 
pub Expr: i32 = {
    <l:Expr> "+" <r:Factor> => l + r,
    <l:Expr> "-" <r:Factor> => l - r,
    Factor,
};
 
pub Factor: i32 = {
    <l:Factor> "*" <r:Term> => l * r,
    <l:Factor> "/" <r:Term> => l / r,
    Term,
};
 
pub Term: i32 = {
    Num,
    "(" <Expr> ")",
};
 
Num: i32 = <s:r"[0-9]+"> => s.parse().unwrap();

Lalrpop processes this grammar file at compile time and generates Rust code for the parser.

Fundamental Architecture Differences

// nom: Parsers are functions composed at runtime
use nom::{
    IResult,
    bytes::complete::tag,
    sequence::preceded,
    character::complete::alpha1,
};
 
fn nom_example() {
    // Build parser by combining functions
    let parser = preceded(tag("hello "), alpha1);
    
    // Parse at runtime
    let result: IResult<&str, &str> = parser("hello world");
    assert_eq!(result, Ok(("", "world")));
}
 
// lalrpop: Parsers are generated from grammar at compile time
mod generated_parser {
    // lalrpop generates this module from the .lalrpop file
    // The grammar is fixed at compile time
}
 
fn lalrpop_example() {
    use generated_parser::ExprParser;
    
    // Use the generated parser
    let result = ExprParser::new().parse("1 + 2 * 3");
    assert_eq!(result, Ok(7));
}

Nom parsers are constructed dynamically; lalrpop parsers are generated statically from a grammar.

Composability and Reusability

Nom's combinator approach enables flexible composition:

use nom::{
    IResult,
    bytes::complete::{tag, take_while},
    sequence::{tuple, preceded, delimited},
    multi::many0,
    character::complete::char,
    combinator::map,
};
 
// Small, reusable parsers
fn whitespace(input: &str) -> IResult<&str, &str> {
    take_while(|c: char| c.is_whitespace())(input)
}
 
fn identifier(input: &str) -> IResult<&str, &str> {
    take_while(|c: char| c.is_alphanumeric() || c == '_')(input)
}
 
// Compose them in different ways
fn parse_assignment(input: &str) -> IResult<&str, (&str, &str)> {
    let (input, (name, _, _, value)) = tuple((
        identifier,
        whitespace,
        tag("="),
        identifier
    ))(input)?;
    Ok((input, (name, value)))
}
 
// Same parsers, different composition
fn parse_function_call(input: &str) -> IResult<&str, (&str, Vec<&str>)> {
    let (input, (name, _, args, _)) = tuple((
        identifier,
        delimited(char('('), many0(preceded(whitespace, identifier)), char(')')),
        whitespace
    ))(input)?;
    Ok((input, (name, args)))
}
 
// Create custom combinators
fn parenthesized<'a, T>(
    inner: impl Fn(&'a str) -> IResult<&'a str, T>
) -> impl Fn(&'a str) -> IResult<&'a str, T> {
    move |input| {
        delimited(char('('), &inner, char(')'))(input)
    }
}

Lalrpop grammars are less flexible in composition:

// In .lalrpop file, you define all alternatives in one place
// Reusing parts of a grammar requires duplicating or restructuring
 
grammar;
 
// Can't easily compose these in different contexts
Id: String = <s:r"[a-zA-Z_][a-zA-Z0-9_]*"> => s.to_string();
 
Assignment: (String, String) = {
    <name:Id> "=" <value:Id> => (name, value),
};
 
FunctionCall: (String, Vec<String>) = {
    <name:Id> "(" <args:Id*> ")" => {
        let args: Vec<String> = args;
        (name, args)
    },
};

Error Handling

Nom uses Rust's Result type with detailed error information:

use nom::{
    IResult,
    error::{Error, VerboseError,VerboseErrorKind, ErrorKind},
    bytes::complete::tag,
    character::complete::digit1,
};
 
fn error_handling() {
    // Basic error type
    let result: IResult<&str, &str, Error<&str>> = tag("hello")("world");
    match result {
        Err(nom::Err::Error(e)) => {
            println!("Failed at position {}: {:?}", e.input, e.code);
        }
        _ => {}
    }
    
    // Verbose errors with context
    use nom::error::context;
    
    fn parse_with_context(input: &str) -> IResult<&str, &str, VerboseError<&str>> {
        context("expecting hello", tag("hello"))(input)
    }
    
    let result = parse_with_context("world");
    match result {
        Err(nom::Err::Error(e)) => {
            // e contains chain of contexts for debugging
            println!("Verbose error: {:?}", e);
        }
        _ => {}
    }
}

Lalrpop provides parse errors but with less flexibility:

use lalrpop_util::ParseError;
 
fn lalrpop_errors() {
    // let result = ExprParser::new().parse("1 + + 2");
    // match result {
    //     Err(ParseError::InvalidToken { location }) => { ... }
    //     Err(ParseError::UnrecognizedEOF { location, expected }) => { ... }
    //     Err(ParseError::UnrecognizedToken { token, expected }) => { ... }
    //     Err(ParseError::ExtraToken { token }) => { ... }
    //     Ok(_) => {}
    // }
}

Performance Characteristics

// nom: Zero-allocation parsing where possible
use nom::{
    IResult,
    bytes::complete::take_while,
    character::complete::digit1,
};
 
fn nom_zero_alloc(input: &str) -> IResult<&str, &str> {
    // Returns a slice into the input, no allocation
    take_while(|c: char| c.is_alphabetic())(input)
}
 
fn nom_with_allocation(input: &str) -> IResult<&str, String> {
    use nom::combinator::map;
    map(take_while(|c: char| c.is_alphabetic()), |s: &str| s.to_string())(input)
}
 
// lalrpop: Generates an LR parser
// - Builds parse tables at compile time
// - Stack-based parsing at runtime
// - Allocates for AST construction
// - Typically efficient for complex grammars

Nom gives you control over allocations; lalrpop handles it automatically.

Parsing Strategies: Recursive Descent vs LR

// nom: Recursive descent (LL-style)
use nom::{
    IResult,
    bytes::complete::tag,
    character::complete::digit1,
    sequence::tuple,
    branch::alt,
};
 
fn parse_expr(input: &str) -> IResult<&str, i32> {
    // Left recursion causes infinite loops in recursive descent
    // WRONG:
    // fn expr(input: &str) -> IResult<&str, i32> {
    //     alt((
    //         tuple((expr, tag("+"), term)),  // Infinite recursion!
    //         term
    //     ))(input)
    // }
    
    // Must rewrite to avoid left recursion
    parse_additive(input)
}
 
fn parse_additive(input: &str) -> IResult<&str, i32> {
    // Parse term, then optionally more terms with operators
    let (input, mut result) = parse_term(input)?;
    
    let mut input = input;
    loop {
        let (remaining, _) = nom::character::complete::multispace0(input)?;
        if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("+")(remaining) {
            let (rem, rhs) = parse_term(rem)?;
            result += rhs;
            input = rem;
        } else if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("-")(remaining) {
            let (rem, rhs) = parse_term(rem)?;
            result -= rhs;
            input = rem;
        } else {
            break;
        }
    }
    
    Ok((input, result))
}
 
fn parse_term(input: &str) -> IResult<&str, i32> {
    // Similar pattern for multiplication/division
    let (input, mut result) = parse_factor(input)?;
    
    let mut input = input;
    loop {
        let (remaining, _) = nom::character::complete::multispace0(input)?;
        if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("*")(remaining) {
            let (rem, rhs) = parse_factor(rem)?;
            result *= rhs;
            input = rem;
        } else if let Ok((rem, _)) = tag::<_, _, nom::error::Error<&str>>("/")(remaining) {
            let (rem, rhs) = parse_factor(rem)?;
            result /= rhs;
            input = rem;
        } else {
            break;
        }
    }
    
    Ok((input, result))
}
 
fn parse_factor(input: &str) -> IResult<&str, i32> {
    alt((
        nom::combinator::map_res(digit1, |s: &str| s.parse::<i32>()),
        nom::sequence::delimited(
            tag("("),
            parse_expr,
            tag(")")
        )
    ))(input)
}

Lalrpop handles left recursion naturally:

// In .lalrpop - left recursion is fine!
grammar;
 
pub Expr: i32 = {
    <l:Expr> "+" <r:Term> => l + r,  // Left recursion works
    <l:Expr> "-" <r:Term> => l - r,
    Term,
};
 
pub Term: i32 = {
    <l:Term> "*" <r:Factor> => l * r,
    <l:Term> "/" <r:Factor> => l / r,
    Factor,
};
 
pub Factor: i32 = {
    Num,
    "(" <Expr> ")",
};
 
Num: i32 = <s:r"[0-9]+"> => s.parse().unwrap();

Integration with Rust Code

Nom parsers blend naturally with Rust:

use nom::{
    IResult,
    bytes::complete::{tag, take_while},
    character::complete::digit1,
    combinator::map_res,
    sequence::tuple,
};
 
#[derive(Debug)]
struct User {
    name: String,
    age: u32,
}
 
// Parse directly into your types
fn parse_user(input: &str) -> IResult<&str, User> {
    let (input, (name, _, age)) = tuple((
        take_while(|c: char| c.is_alphabetic()),
        tag(","),
        map_res(digit1, |s: &str| s.parse::<u32>())
    ))(input)?;
    
    Ok((input, User {
        name: name.to_string(),
        age,
    }))
}
 
// Use anywhere in your code
fn process_input(input: &str) -> Result<User, String> {
    parse_user(input)
        .map(|(_, user)| user)
        .map_err(|e| format!("Parse error: {:?}", e))
}

Lalrpop requires generated code and separate grammar files:

// user.lalrpop
grammar;
use super::User;
 
pub User: User = {
    <name:Name> "," <age:Num> => User { name, age },
};
 
Name: String = <s:r"[a-zA-Z]+"> => s.to_string();
Num: u32 = <s:r"[0-9]+"> => s.parse().unwrap();
 
// main.rs
mod user;  // Generated by lalrpop
 
fn process_input(input: &str) -> Result<User, String> {
    user::UserParser::new()
        .parse(input)
        .map_err(|e| format!("Parse error: {:?}", e))
}

Handling Ambiguity

Nom requires explicit choice ordering:

use nom::{IResult, branch::alt, bytes::complete::tag};
 
fn ambiguous_example(input: &str) -> IResult<&str, &str> {
    // Try alternatives in order; first match wins
    alt((
        tag("foo"),
        tag("foobar"),  // Never matches because "foo" matches first
    ))(input)
}
 
// Must order by specificity
fn correct_order(input: &str) -> IResult<&str, &str> {
    alt((
        tag("foobar"),  // Try longer match first
        tag("foo"),
    ))(input)
}

Lalrpop resolves ambiguity through grammar rules:

// In .lalrpop, precedence is handled explicitly
grammar;
 
pub Expr: i32 = {
    <l:Expr> "+" <r:Expr> => l + r,
    <l:Expr> "*" <r:Expr> => l * r,
    Num,
};
 
// Or use precedence annotations
#[precedence(level="1")]
pub Expr: i32 = {
    Num,
};
 
#[precedence(level="2")]
#[assoc(side="left")]
pub Add: i32 = <l:Expr> "+" <r:Expr> => l + r;
 
#[precedence(level="3")]
#[assoc(side="left")]
pub Mul: i32 = <l:Expr> "*" <r:Expr> => l * r;

Streaming vs Complete Input

Nom supports both streaming and complete parsing:

use nom::{
    IResult,
    bytes::streaming::tag as streaming_tag,
    bytes::complete::tag as complete_tag,
};
 
// Streaming: Input may be incomplete
fn parse_streaming(input: &[u8]) -> IResult<&[u8], &[u8]> {
    streaming_tag("hello")(input)
}
 
// Complete: Input is complete
fn parse_complete(input: &[u8]) -> IResult<&[u8], &[u8]> {
    complete_tag("hello")(input)
}
 
// Streaming returns Needed when more data required
fn streaming_example() {
    let partial: &[u8] = b"hel";
    match streaming_tag::<_, _, nom::error::Error<&[u8]>>("hello")(partial) {
        Err(nom::Err::Incomplete(needed)) => {
            println!("Need more data: {:?}", needed);
        }
        _ => {}
    }
}

Lalrpop is designed for complete input:

// Lalrpop expects complete input
// No built-in streaming support
// You would need to buffer input externally

Testing and Debugging

Nom parsers are testable as regular functions:

use nom::{IResult, bytes::complete::tag, sequence::tuple};
 
fn parser(input: &str) -> IResult<&str, (&str, &str)> {
    tuple((tag("hello"), tag(" world")))(input)
}
 
#[cfg(test)]
mod tests {
    use super::*;
    
    #[test]
    fn test_parser_success() {
        assert_eq!(parser("hello world"), Ok(("", ("hello", " world"))));
    }
    
    #[test]
    fn test_parser_failure() {
        assert!(parser("hello there").is_err());
    }
    
    #[test]
    fn test_parser_partial() {
        assert_eq!(parser("hello world!"), Ok(("!", ("hello", " world"))));
    }
}

Lalrpop tests use the generated parser:

// Tests work with the generated parser interface
#[cfg(test)]
mod tests {
    // use super::generated_parser::ExprParser;
    
    #[test]
    fn test_expr() {
        // assert_eq!(ExprParser::new().parse("1 + 2"), Ok(3));
    }
}

When to Choose Nom

// Choose nom when:
// 1. You need fine-grained control over parsing
// 2. You're parsing binary formats or streaming data
// 3. You want parsers as composable functions
// 4. You need zero-allocation parsing
// 5. Your grammar is simple or unconventional
// 6. You want to integrate parsers into existing code
 
use nom::{
    IResult,
    bytes::complete::{tag, take},
    number::complete::{be_u16, be_u32},
    sequence::tuple,
};
 
// Binary protocol parsing - nom excels here
fn parse_header(input: &[u8]) -> IResult<&[u8], (u16, u32, &[u8])> {
    tuple((
        be_u16,           // version
        be_u32,           // length
        take(4usize),     // flags
    ))(input)
}

When to Choose Lalrpop

// Choose lalrpop when:
// 1. You have a complex grammar with precedence rules
// 2. You want grammar-based specification
// 3. Left recursion is natural for your language
// 4. You're building a programming language parser
// 5. You prefer declarative grammar definitions
// 6. The grammar is mostly fixed
 
// Programming language - lalrpop excels here
// .lalrpop file:
grammar;
// pub Program = Statement*;
// pub Statement = Expr ";";
// pub Expr: i32 = {
//     <l:Expr> "+" <r:Expr> => l + r,
//     <l:Expr> "*" <r:Expr> => l * r,
//     "(" <Expr> ")",
//     Num,
// };

Synthesis

Nom and lalrpop represent two fundamentally different approaches:

Parser Combinators (Nom):

Parsers are functions composed at runtime
Recursive descent parsing (LL-style)
No left recursion; must transform grammars
Fine-grained control over error handling
Zero-allocation parsing possible
Natural integration with Rust code
Excellent for binary formats and streaming
Steeper learning curve for complex grammars

Parser Generators (Lalrpop):

Grammars generate parsers at compile time
LR parsing (handles left recursion naturally)
Declarative grammar specification
Automatic precedence handling
Separate grammar files
Generated code interface
Excellent for programming languages and DSLs
Clearer grammar structure for complex languages

Choose nom when you need composability, streaming, binary formats, or tight integration with Rust code. Choose lalrpop when you have a complex grammar with precedence rules or prefer declarative grammar specifications.

Both can achieve the same parsing goals, but the development experience differs significantly. Nom feels like writing Rust code; lalrpop feels like writing a grammar specification.

How does nom's parser combinator approach differ from traditional parser generators like lalrpop?