What are the trade-offs between `nom::bytes::complete::take` and `take_until` for consuming input without parsing?

take consumes a fixed number of bytes specified at parse time, while take_until consumes bytes until it finds a specific delimiter pattern. Both extract raw input without interpreting its meaning, but they differ fundamentally in how they determine what to consume: take requires knowing the exact length beforehand, making it suitable for length-prefixed data, while take_until searches for a delimiter, making it ideal for terminated data like strings delimited by specific bytes or patterns.

Basic take Usage

use nom::bytes::complete::take;
use nom::IResult;
 
fn basic_take() {
    let input = b"Hello, World!";
    
    // take(n) consumes exactly n bytes
    let result: IResult<&[u8], &[u8]> = take(5u8)(input);
    //           count ^    (usize or any other type implementing nom's ToUsize, such as u8)
    
    match result {
        Ok((remaining, consumed)) => {
            assert_eq!(consumed, b"Hello");
            assert_eq!(remaining, b", World!");
        }
        Err(e) => panic!("Parse failed: {:?}", e),
    }
}
 
fn take_with_usize() {
    let input: &[u8] = b"abcdefghij";
    
    // Can specify count as usize
    let result: IResult<&[u8], &[u8]> = take(3usize)(input);
    // remaining: "defghij", consumed: "abc"
    
    // Or use another type that implements ToUsize, such as u8
    let result: IResult<&[u8], &[u8]> = take(7u8)(input);
    // remaining: "hij", consumed: "abcdefg"
}

take is straightforward: specify the count, get exactly that many bytes.

Basic take_until Usage

use nom::bytes::complete::take_until;
use nom::IResult;
 
fn basic_take_until() {
    let input = b"Hello, World!";
    
    // take_until searches for a delimiter
    let result: IResult<&[u8], &[u8]> = take_until(",")(input);
    //                          delimiter ^
    
    match result {
        Ok((remaining, consumed)) => {
            assert_eq!(consumed, b"Hello");
            assert_eq!(remaining, b", World!");  // Delimiter is NOT consumed
        }
        Err(e) => panic!("Parse failed: {:?}", e),
    }
}
 
fn take_until_delimiter_not_consumed() {
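    let input: &[u8] = b"key:value\nnext";
    
    // The explicit IResult annotation lets the compiler infer nom's error type
    let result: IResult<&[u8], &[u8]> = take_until(":")(input);
    // consumed: "key"
    // remaining: ":value\nnext"  <- colon is still there
    
    let result: IResult<&[u8], &[u8]> = take_until("\n")(input);
    // consumed: "key:value"
    // remaining: "\nnext"  <- newline is still there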
}

take_until finds a delimiter and stops before it—the delimiter remains in the input.

Key Behavioral Differences

use nom::bytes::complete::{take, take_until};
use nom::IResult;
 
fn comparison() {
    // ┌────────────────────┬─────────────────────┬────────────────────────┐
    // │ Aspect             │ take                │ take_until             │
    // ├────────────────────┼─────────────────────┼────────────────────────┤
    // │ What to consume    │ Fixed byte count    │ Until delimiter found  │
    // │ Input requirement  │ Must have n bytes   │ Must find delimiter    │
    // │ Delimiter handling │ N/A                 │ Delimiter NOT consumed │
    // │ Search behavior    │ None (direct slice) │ Scans for delimiter    │
    // │ Failure mode       │ Not enough input    │ Delimiter not found    │
    // │ Complexity         │ O(1) on byte slices │ O(n) linear scan       │
    // └────────────────────┴─────────────────────┴────────────────────────┘
    
    let input: &[u8] = b"abcdefghij";
    
    // take: O(1) operation on a byte slice
    let result: IResult<&[u8], &[u8]> = take(5usize)(input);  // Just slices
    
    // take_until: O(n) operation
    let result: IResult<&[u8], &[u8]> = take_until("f")(input);  // Scans until 'f' found
}
 
fn failure_modes() {
    // take fails when input is too short
    let input: &[u8] = b"abc";
    let result: IResult<&[u8], &[u8]> = take(5usize)(input);
    assert!(result.is_err());  // Need 5 bytes, only have 3
    
    // take_until fails when delimiter not found
    let input: &[u8] = b"abcdefghij";
    let result: IResult<&[u8], &[u8]> = take_until("xyz")(input);
    assert!(result.is_err());  // 'xyz' never appears
}

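The complexity difference matters for performance-critical parsing: take does constant work on a byte slice, while the cost of take_until grows with the distance to the delimiter, or with the whole input when the delimiter is absent.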

Length-Prefixed Data with take

use nom::bytes::complete::take;
use nom::number::complete::be_u16;
use nom::IResult;
 
fn parse_length_prefixed_string(input: &[u8]) -> IResult<&[u8], &str> {
    // Common pattern: length followed by data
    let (remaining, length) = be_u16(input)?;
    let (remaining, data) = take(length as usize)(remaining)?;
    
    // Convert to string (assuming valid UTF-8 for this example)
    let text = std::str::from_utf8(data).unwrap();
    Ok((remaining, text))
}
 
fn parse_pascal_string(input: &[u8]) -> IResult<&[u8], &str> {
    // Pascal-style string: 1-byte length prefix
    let (remaining, length) = take(1usize)(input)?;
    let length = length[0] as usize;
    
    let (remaining, data) = take(length)(remaining)?;
    Ok((remaining, std::str::from_utf8(data).unwrap()))
}
 
fn parse_netstring(input: &[u8]) -> IResult<&[u8], &[u8]> {
    // Netstring format: "length:data,"
    // Parse the length (simplified - assumes single digit)
    let (remaining, len_byte) = take(1usize)(input)?;
    let length = (len_byte[0] - b'0') as usize;
    
    // Skip the colon
    let (remaining, _) = take(1usize)(remaining)?;
    
    // Take the data
    let (remaining, data) = take(length)(remaining)?;
    
    // Skip the trailing comma
    let (remaining, _) = take(1usize)(remaining)?;
    
    Ok((remaining, data))
}
 
fn length_prefixed_example() {
    // Binary protocol: 2-byte big-endian length + data
    let input = b"\x00\x05HelloWorld";
    
    let (remaining, text) = parse_length_prefixed_string(input).unwrap();
    assert_eq!(text, "Hello");
    assert_eq!(remaining, b"World");
}

take excels when you know the exact length to consume, typically from a preceding length field.

Delimiter-Terminated Data with take_until

use nom::bytes::complete::take_until;
use nom::character::complete::char;
use nom::IResult;
 
fn parse_until_delimiter(input: &[u8]) -> IResult<&[u8], &[u8]> {
    take_until("\x00")(input)  // Null-terminated string
}
 
fn parse_c_string(input: &[u8]) -> IResult<&[u8], &str> {
    let (remaining, content) = take_until("\x00")(input)?;
    let (remaining, _) = char('\0')(remaining)?;  // Consume the null byte
    
    Ok((remaining, std::str::from_utf8(content).unwrap()))
}
 
fn parse_csv_field(input: &[u8]) -> IResult<&[u8], &str> {
    // Field ends at the next comma; a last field that ends only with a newline fails here
    let (remaining, field) = take_until(",")(input)?;
    Ok((remaining, std::str::from_utf8(field).unwrap()))
}
 
fn parse_key_value(input: &[u8]) -> IResult<&[u8], (&str, &str)> {
    // Parse "key=value" format
    let (remaining, key) = take_until("=")(input)?;
    let (remaining, _) = char('=')(remaining)?;
    let (remaining, value) = take_until("\n")(remaining)?;
    let (remaining, _) = char('\n')(remaining)?;
    
    Ok((remaining, (std::str::from_utf8(key).unwrap(), std::str::from_utf8(value).unwrap())))
}
 
fn delimited_example() {
    let input = b"name=Alice\nage=30\n";
    
    let (remaining, (key, value)) = parse_key_value(input).unwrap();
    assert_eq!(key, "name");
    assert_eq!(value, "Alice");
}

take_until excels when data ends at a specific delimiter rather than having a known length.

Combining with Other Parsers

use nom::bytes::complete::{take, take_until};
use nom::character::complete::char;
use nom::IResult;
 
fn parse_record_length_prefixed(input: &[u8]) -> IResult<&[u8], (&str, u32)> {
    // Record format: [2-byte length][name][4-byte id]
    let (remaining, len) = take(2usize)(input)?;
    let len = u16::from_be_bytes([len[0], len[1]]) as usize;
    
    let (remaining, name) = take(len)(remaining)?;
    let name = std::str::from_utf8(name).unwrap();
    
    let (remaining, id_bytes) = take(4usize)(remaining)?;
    let id = u32::from_be_bytes([id_bytes[0], id_bytes[1], id_bytes[2], id_bytes[3]]);
    
    Ok((remaining, (name, id)))
}
 
fn parse_record_delimited(input: &[u8]) -> IResult<&[u8], (&str, &str)> {
    // Record format: name:value\n
    let (remaining, name) = take_until(":")(input)?;
    let (remaining, _) = char(':')(remaining)?;
    let (remaining, value) = take_until("\n")(remaining)?;
    let (remaining, _) = char('\n')(remaining)?;
    
    Ok((remaining, (std::str::from_utf8(name).unwrap(), std::str::from_utf8(value).unwrap())))
}
 
fn parse_http_header_line(input: &[u8]) -> IResult<&[u8], (&str, &str)> {
    // HTTP header: "Name: Value\r\n"
    let (remaining, name) = take_until(":")(input)?;
    let (remaining, _) = char(':')(remaining)?;
    let (remaining, _) = char(' ')(remaining)?;
    let (remaining, value) = take_until("\r\n")(remaining)?;
    let (remaining, _) = take(2usize)(remaining)?;
    
    Ok((remaining, (std::str::from_utf8(name).unwrap(), std::str::from_utf8(value).unwrap())))
}

Both combinators integrate well with other nom parsers for building complex parsers.

Performance Considerations

use nom::bytes::complete::{take, take_until};
use nom::IResult;
 
fn performance_comparison() {
    // ┌────────────┬─────────────────────┬──────────────────────────────────┐
    // │ Operation  │ Complexity          │ When to use                      │
    // ├────────────┼─────────────────────┼──────────────────────────────────┤
    // │ take(n)    │ O(1) on byte slices │ Length known at parse time       │
    // │ take_until │ O(n)                │ Searching for delimiter in input │
    // └────────────┴─────────────────────┴──────────────────────────────────┘
    
    let large_input: &[u8] = &[0u8; 1_000_000];
    
    // take(1000000) - constant time on a byte slice
    // Just a bounds check and a slice split
    let result: IResult<&[u8], &[u8]> = take(1_000_000usize)(large_input);
    
    // take_until("marker") - linear search through the input
    let result: IResult<&[u8], &[u8]> = take_until("marker")(large_input);
    // If "marker" is near the end or not found, scans all 1M bytes
    
    // Recommendation: prefer take when the format provides a length
}
 
fn length_prefix_is_efficient() {
    // Length-prefixed formats are more efficient for parsers
    let input = b"\x00\x05HelloWorldMoreData";
    
    // Take exactly 5 bytes - O(1)
    let result: nom::IResult<&[u8], &[u8]> = take(5usize)(&input[2..]);
    let (remaining, data) = result.unwrap();
    
    // Versus a delimiter-terminated format - O(n)
    // The parser would have to scan the payload for its terminator
}

Use take when you know the length: on byte-slice input it is O(1) regardless of input size.

When take_until Finds the Delimiter at Start

use nom::bytes::complete::{take, take_until};
use nom::IResult;
 
fn empty_consume() {
    // If delimiter is at the very start, take_until returns empty slice
    let input = b":value";
    let result: IResult<&[u8], &[u8]> = take_until(":")(input);
    
    match result {
        Ok((remaining, consumed)) => {
            assert_eq!(consumed, b"");  // Empty - nothing before delimiter
            assert_eq!(remaining, b":value");  // Delimiter is first
        }
        Err(_) => panic!("Should succeed"),
    }
    
    // This is useful for optional content before delimiter
}
 
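fn handling_empty_fields() {
    // Using `?` inside an IResult-returning helper lets nom infer its error type
    fn split_three(input: &[u8]) -> IResult<&[u8], (&[u8], &[u8], &[u8])> {
        let (remaining, field1) = take_until(",")(input)?;
        let (remaining, _) = take(1usize)(remaining)?;  // Skip comma
        let (remaining, field2) = take_until(",")(remaining)?;
        let (remaining, _) = take(1usize)(remaining)?;  // Skip comma
        // The last field ends at a newline, so the input must contain one
        let (remaining, field3) = take_until("\n")(remaining)?;
        Ok((remaining, (field1, field2, field3)))
    }
    
    // CSV line "Alice,,30\n" - empty field between commas
    let (_, (field1, field2, field3)) = split_three(b"Alice,,30\n").unwrap();
    assert_eq!(field1, b"Alice");
    assert_eq!(field2, b"");  // Empty field!
    assert_eq!(field3, b"30");
}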

take_until returns an empty slice when the delimiter immediately follows the current position.

Multi-byte Delimiters

use nom::bytes::complete::{take, take_until};
use nom::IResult;
 
fn multi_byte_delimiter() {
    // take_until works with multi-byte delimiters
    let input: &[u8] = b"Hello, World!";
    
    let result: IResult<&[u8], &[u8]> = take_until(", W")(input);
    // consumed: "Hello"
    // remaining: ", World!"
    
    let input: &[u8] = b"function(arg1, arg2) { body }";
    let result: IResult<&[u8], &[u8]> = take_until("function")(input);
    // consumed: "" (delimiter at start)
    // remaining: "function(arg1, arg2) { body }"
}
 
fn parse_until_marker() {
    // Common pattern: parse until a specific marker
    fn between_markers(input: &[u8]) -> IResult<&[u8], &[u8]> {
        let (remaining, _) = take_until("---START---")(input)?;
        let (remaining, _) = take(11usize)(remaining)?;  // Skip "---START---" (11 bytes)
        take_until("---END---")(remaining)
    }
    
    let (remaining, data) = between_markers(b"---START---data---END---more").unwrap();
    assert_eq!(data, b"data");
    assert_eq!(remaining, b"---END---more");  // End marker is still in the input
}

take_until handles delimiters of any length, searching for the complete pattern.

Error Handling

use nom::bytes::complete::{take, take_until};
use nom::error::{Error, ErrorKind};
 
fn error_examples() {
    // take error: not enough input
    let input: &[u8] = b"abc";
    let result: nom::IResult<&[u8], &[u8]> = take(10usize)(input);
    
    match result {
        Err(nom::Err::Error(Error { input: err_input, code })) => {
            assert_eq!(err_input, b"abc");
            // complete::take reports Eof when the input is too short
            assert_eq!(code, ErrorKind::Eof);
        }
        _ => unreachable!(),
    }
    
    // take_until error: delimiter not found
    let input: &[u8] = b"abcdefghij";
    let result: nom::IResult<&[u8], &[u8]> = take_until("xyz")(input);
    
    match result {
        Err(nom::Err::Error(Error { input: err_input, code })) => {
            // Nothing is consumed on failure: the error carries the original input
            assert_eq!(err_input, b"abcdefghij");
            assert_eq!(code, ErrorKind::TakeUntil);
        }
        _ => unreachable!(),
    }
}
 
fn safe_parsing_with_take_until() {
    fn parse_field(input: &[u8]) -> Option<&str> {
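        // The turbofish pins nom's error type, which `.ok()` alone cannot infer
        take_until::<_, _, Error<&[u8]>>(",")(input)
            .ok()
            // Invalid UTF-8 also yields None instead of panicking
            .and_then(|(_, field)| std::str::from_utf8(field).ok())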
    }
    
    assert_eq!(parse_field(b"hello,world"), Some("hello"));
    assert_eq!(parse_field(b"hello"), None);  // No delimiter found
}

Both combinators return errors when their requirements aren't met—insufficient input for take, missing delimiter for take_until.

Practical Parser Example

use nom::bytes::complete::{take, take_until};
use nom::number::complete::be_u32;
use nom::IResult;
 
// Example: Parse a simple binary protocol
// Format: [4-byte count][count strings]
// Each string: [4-byte length][data]
 
#[derive(Debug)]
struct ProtocolMessage {
    count: u32,
    strings: Vec<String>,
}
 
fn parse_length_prefixed_string(input: &[u8]) -> IResult<&[u8], String> {
    let (remaining, length) = be_u32(input)?;
    let (remaining, data) = take(length as usize)(remaining)?;
    Ok((remaining, String::from_utf8_lossy(data).into_owned()))
}
 
fn parse_protocol_message(input: &[u8]) -> IResult<&[u8], ProtocolMessage> {
    let (remaining, count) = be_u32(input)?;
    
    let mut strings = Vec::new();
    let mut remaining = remaining;
    
    for _ in 0..count {
        let (rem, s) = parse_length_prefixed_string(remaining)?;
        strings.push(s);
        remaining = rem;
    }
    
    Ok((remaining, ProtocolMessage { count, strings }))
}
 
// Example: Parse a text-based protocol
// Format: lines ending with \r\n, empty line terminates
 
fn parse_text_line(input: &[u8]) -> IResult<&[u8], &str> {
    let (remaining, line) = take_until("\r\n")(input)?;
    let (remaining, _) = take(2usize)(remaining)?;  // Consume \r\n
    Ok((remaining, std::str::from_utf8(line).unwrap()))
}
 
fn parse_text_message(input: &[u8]) -> IResult<&[u8], Vec<&str>> {
    let mut lines = Vec::new();
    let mut remaining = input;
    
    loop {
        // Check for terminator (empty line)
        if remaining.starts_with(b"\r\n") {
            remaining = &remaining[2..];
            break;
        }
        
        let (rem, line) = parse_text_line(remaining)?;
        lines.push(line);
        remaining = rem;
    }
    
    Ok((remaining, lines))
}

These examples show typical usage: take for binary formats with length prefixes, take_until for text formats with delimiters.

Trade-offs Summary

use nom::bytes::complete::{take, take_until};
 
fn complete_guide_summary() {
    // ┌────────────────────┬──────────────────────┬──────────────────────┐
    // │ Aspect             │ take                 │ take_until           │
    // ├────────────────────┼──────────────────────┼──────────────────────┤
    // │ Determination      │ Length from argument │ Find delimiter       │
    // │ Complexity         │ O(1) on byte slices  │ O(n)                 │
    // │ Use when           │ Length known         │ Delimiter marks end  │
    // │ Input requirement  │ At least n bytes     │ Contains delimiter   │
    // │ Consumes delimiter │ N/A                  │ No                   │
    // │ Error on failure   │ ErrorKind::Eof       │ ErrorKind::TakeUntil │
    // │ Binary format fit  │ Excellent            │ Poor                 │
    // │ Text format fit    │ Poor                 │ Excellent            │
    // └────────────────────┴──────────────────────┴──────────────────────┘
    
    // Use take when:
    // - Format has length prefix
    // - Fixed-size fields
    // - You know exact count at parse time
    // - Performance matters (O(1))
    
    // Use take_until when:
    // - Data ends with delimiter
    // - Variable-length text fields
    // - Common delimiters (null, newline, comma)
    // - Human-readable formats
}
 
// Key insight:
// - take: "I need exactly N bytes" - parser controls count
// - take_until: "I need everything until this pattern" - input controls count
//
// The choice depends on your data format:
// - Binary protocols often use length prefixes -> use take
// - Text protocols often use delimiters -> use take_until
//
// Remember: take_until doesn't consume the delimiter!
// You typically need another parser to consume it afterward.

Key insight: take and take_until represent two fundamental approaches to consuming input: known-length extraction and delimiter-based extraction. take is O(1) on byte slices and ideal for binary protocols where lengths are known (length-prefixed data, fixed-size fields). take_until is O(n) and ideal for text protocols where data ends at a marker (null-terminated strings, CSV fields, HTTP headers). The critical behavioral difference is that take_until leaves the delimiter in the remaining input, so you must explicitly consume it with another parser. When designing protocols, prefer length prefixes for efficiency; when parsing human-readable formats, delimiters are more practical and take_until is the right tool.