What are the trade-offs between nom::bytes::complete::take and take_until for consuming input without parsing?
take consumes a fixed number of bytes specified at parse time, while take_until consumes bytes until it finds a specific delimiter pattern. Both extract raw input without interpreting its meaning, but they differ fundamentally in how they determine what to consume: take requires knowing the exact length beforehand, making it suitable for length-prefixed data, while take_until searches for a delimiter, making it ideal for terminated data like strings delimited by specific bytes or patterns.
Basic take Usage
use nom::bytes::complete::take;
use nom::IResult;
fn basic_take() {
let input: &[u8] = b"Hello, World!";
// take(n) consumes exactly n bytes
let result: IResult<&[u8], &[u8]> = take(5u8)(input);
// The count can be any type implementing nom's ToUsize (for example u8, u16, u32, or usize)
match result {
Ok((remaining, consumed)) => {
assert_eq!(consumed, b"Hello");
assert_eq!(remaining, b", World!");
}
Err(e) => panic!("Parse failed: {:?}", e),
}
}
fn take_with_usize() {
let input: &[u8] = b"abcdefghij";
// Can specify count as usize
let result: IResult<&[u8], &[u8]> = take(3usize)(input);
// remaining: "defghij", consumed: "abc"
// Or use another type that implements ToUsize
let result: IResult<&[u8], &[u8]> = take(7u8)(input);
// remaining: "hij", consumed: "abcdefg"
}
take is straightforward: specify the count, get exactly that many bytes.
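To make the "exactly n bytes or nothing" behavior concrete, here is a minimal std-only sketch of a fixed-length split. It is a model for illustration, not nom's implementation; the name `take_model` and the `Option` return type are assumptions of this sketch.

```rust
/// Std-only model of a fixed-length take.
/// Returns (remaining, consumed), or None when fewer than `n` bytes exist.
fn take_model(input: &[u8], n: usize) -> Option<(&[u8], &[u8])> {
    if input.len() < n {
        // Not enough input: fail without consuming anything.
        return None;
    }
    // One length check and one split, independent of the content.
    let (consumed, remaining) = input.split_at(n);
    Some((remaining, consumed))
}
```

Calling `take_model(b"abcdefghij", 7)` yields `"abcdefg"` as the consumed part and `"hij"` as the remainder, while asking for more bytes than the input holds returns `None`.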
Basic take_until Usage
use nom::bytes::complete::take_until;
use nom::IResult;
fn basic_take_until() {
let input: &[u8] = b"Hello, World!";
// take_until scans for the delimiter passed as its argument
let result: IResult<&[u8], &[u8]> = take_until(",")(input);
match result {
Ok((remaining, consumed)) => {
assert_eq!(consumed, b"Hello");
assert_eq!(remaining, b", World!"); // Delimiter is NOT consumed
}
Err(e) => panic!("Parse failed: {:?}", e),
}
}
fn take_until_delimiter_not_consumed() {
let input: &[u8] = b"key:value\nnext";
let result: IResult<&[u8], &[u8]> = take_until(":")(input);
// consumed: "key"
// remaining: ":value\nnext" <- colon is still there
let result: IResult<&[u8], &[u8]> = take_until("\n")(input);
// consumed: "key:value"
// remaining: "\nnext" <- newline is still there
}
take_until finds a delimiter and stops before it; the delimiter remains in the input.
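The "stop before the delimiter" rule can be modeled with a plain window search over the slice. This is a std-only sketch under stated assumptions, not nom's code: `take_until_model` is a hypothetical name, and rejecting an empty delimiter is a choice made here only because `windows(0)` would panic.

```rust
/// Std-only model of a delimiter search.
/// Returns (remaining, consumed); `remaining` still starts with `delim`.
fn take_until_model<'a>(input: &'a [u8], delim: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    if delim.is_empty() {
        // Choice of this sketch: an empty delimiter is not supported.
        return None;
    }
    // Slide a window of delim.len() bytes over the input until it matches.
    let pos = input.windows(delim.len()).position(|w| w == delim)?;
    Some((&input[pos..], &input[..pos]))
}
```

Because the remainder begins at the match position rather than after it, a caller has to step over the delimiter separately before parsing the next field.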
Key Behavioral Differences
use nom::bytes::complete::{take, take_until};
use nom::IResult;
fn comparison() {
// +--------------------+---------------------+------------------------+
// | Aspect             | take                | take_until             |
// +--------------------+---------------------+------------------------+
// | What to consume    | Fixed byte count    | Until delimiter found  |
// | Input requirement  | Must have n bytes   | Must find delimiter    |
// | Delimiter handling | N/A                 | Delimiter NOT consumed |
// | Search behavior    | None (direct slice) | Scans for delimiter    |
// | Failure mode       | Not enough input    | Delimiter not found    |
// | Complexity         | O(1)                | O(n) linear scan       |
// +--------------------+---------------------+------------------------+
let input: &[u8] = b"abcdefghij";
// take: O(1) operation
let result: IResult<&[u8], &[u8]> = take(5usize)(input); // Just slices
// take_until: O(n) operation
let result: IResult<&[u8], &[u8]> = take_until("f")(input); // Scans until 'f' found
}
fn failure_modes() {
// take fails when input is too short
let input: &[u8] = b"abc";
let result: IResult<&[u8], &[u8]> = take(5usize)(input);
assert!(result.is_err()); // Need 5 bytes, only have 3
// take_until fails when delimiter not found
let input: &[u8] = b"abcdefghij";
let result: IResult<&[u8], &[u8]> = take_until("xyz")(input);
assert!(result.is_err()); // "xyz" never appears
}
The complexity difference matters for performance-critical parsing: take does a constant amount of work, while the work done by take_until grows with the distance to the delimiter.
Length-Prefixed Data with take
use nom::bytes::complete::take;
use nom::number::complete::be_u16;
use nom::IResult;
fn parse_length_prefixed_string(input: &[u8]) -> IResult<&[u8], &str> {
// Common pattern: length followed by data
let (remaining, length) = be_u16(input)?;
let (remaining, data) = take(length as usize)(remaining)?;
// Convert to string (assuming valid UTF-8 for this example)
let text = std::str::from_utf8(data).unwrap();
Ok((remaining, text))
}
fn parse_pascal_string(input: &[u8]) -> IResult<&[u8], &str> {
// Pascal-style string: 1-byte length prefix
let (remaining, length) = take(1usize)(input)?;
let length = length[0] as usize;
let (remaining, data) = take(length)(remaining)?;
Ok((remaining, std::str::from_utf8(data).unwrap()))
}
fn parse_netstring(input: &[u8]) -> IResult<&[u8], &[u8]> {
// Netstring format: "length:data,"
// Parse the length (simplified - assumes single digit)
let (remaining, len_byte) = take(1usize)(input)?;
let length = (len_byte[0] - b'0') as usize;
// Skip the colon
let (remaining, _) = take(1usize)(remaining)?;
// Take the data
let (remaining, data) = take(length)(remaining)?;
// Skip the trailing comma
let (remaining, _) = take(1usize)(remaining)?;
Ok((remaining, data))
}
fn length_prefixed_example() {
// Binary protocol: 2-byte big-endian length + data
let input = b"\x00\x05HelloWorld";
let (remaining, text) = parse_length_prefixed_string(input).unwrap();
assert_eq!(text, "Hello");
assert_eq!(remaining, b"World");
}
take excels when you know the exact length to consume, typically from a preceding length field.
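The netstring parser above is simplified to a single-digit length. As a hedged illustration of how a multi-digit length could be handled, the std-only sketch below first scans to the colon (a delimiter step) and then takes exactly that many bytes (a length step). The function name and the `Option` return type are assumptions of this sketch, and it does not claim to enforce every rule of the netstring format.

```rust
/// Parses one "<decimal length>:<data>," item from the front of `input`.
/// Returns (remaining, data), or None on malformed or truncated input.
fn parse_netstring_model(input: &[u8]) -> Option<(&[u8], &[u8])> {
    // Delimiter-style step: the length digits run up to the first ':'.
    let colon = input.iter().position(|&b| b == b':')?;
    let digits = &input[..colon];
    if digits.is_empty() || !digits.iter().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // The digits are ASCII, so from_utf8 succeeds; parse() still rejects
    // lengths too large for usize.
    let len: usize = std::str::from_utf8(digits).ok()?.parse().ok()?;

    // Length-style step: take exactly `len` bytes after the colon.
    let rest = &input[colon + 1..];
    if rest.len() < len {
        return None;
    }
    let (data, rest) = rest.split_at(len);

    // The payload must be followed by a literal ','.
    match rest.split_first() {
        Some((&b',', remaining)) => Some((remaining, data)),
        _ => None,
    }
}
```

Note that the payload may itself contain commas or colons; only the length decides where it ends, which is the main advantage of a length prefix over a delimiter.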
Delimiter-Terminated Data with take_until
use nom::bytes::complete::take_until;
use nom::character::complete::char;
use nom::IResult;
fn parse_until_delimiter(input: &[u8]) -> IResult<&[u8], &[u8]> {
take_until("\x00")(input) // Null-terminated string
}
fn parse_c_string(input: &[u8]) -> IResult<&[u8], &str> {
let (remaining, content) = take_until("\x00")(input)?;
let (remaining, _) = char('\0')(remaining)?; // Consume the null byte
Ok((remaining, std::str::from_utf8(content).unwrap()))
}
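fn parse_csv_field(input: &[u8]) -> IResult<&[u8], &str> {
// Field ends at the next comma; take_until matches a single pattern, so a field terminated by a newline instead needs separate handling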
let (remaining, field) = take_until(",")(input)?;
Ok((remaining, std::str::from_utf8(field).unwrap()))
}
fn parse_key_value(input: &[u8]) -> IResult<&[u8], (&str, &str)> {
// Parse "key=value" format
let (remaining, key) = take_until("=")(input)?;
let (remaining, _) = char('=')(remaining)?;
let (remaining, value) = take_until("\n")(remaining)?;
let (remaining, _) = char('\n')(remaining)?;
Ok((remaining, (std::str::from_utf8(key).unwrap(), std::str::from_utf8(value).unwrap())))
}
fn delimited_example() {
let input = b"name=Alice\nage=30\n";
let (remaining, (key, value)) = parse_key_value(input).unwrap();
assert_eq!(key, "name");
assert_eq!(value, "Alice");
}
take_until excels when data ends at a specific delimiter rather than having a known length.
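take_until searches for one pattern, so a field that may end at either a comma or a newline needs a predicate instead. The std-only sketch below models that case; the function name is hypothetical. In nom itself, the predicate- and set-based combinators in `nom::bytes::complete` (such as `take_till` and `is_not`) are the natural place to look, though their behavior when no terminator is present may differ from this model.

```rust
/// Returns (remaining, field). The field runs up to, but does not include,
/// the first ',' or '\n'. None if neither terminator appears.
fn field_until_comma_or_newline(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let pos = input.iter().position(|&b| b == b',' || b == b'\n')?;
    // As with take_until, the terminator stays at the front of `remaining`.
    Some((&input[pos..], &input[..pos]))
}
```

The first byte of the remainder tells the caller which terminator was hit, which is how a row parser can distinguish "more fields follow" from "end of record".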
Combining with Other Parsers
use nom::bytes::complete::{take, take_until};
use nom::character::complete::char;
use nom::IResult;
fn parse_record_length_prefixed(input: &[u8]) -> IResult<&[u8], (&str, u32)> {
// Record format: [2-byte length][name][4-byte id]
let (remaining, len) = take(2usize)(input)?;
let len = u16::from_be_bytes([len[0], len[1]]) as usize;
let (remaining, name) = take(len)(remaining)?;
let name = std::str::from_utf8(name).unwrap();
let (remaining, id_bytes) = take(4usize)(remaining)?;
let id = u32::from_be_bytes([id_bytes[0], id_bytes[1], id_bytes[2], id_bytes[3]]);
Ok((remaining, (name, id)))
}
fn parse_record_delimited(input: &[u8]) -> IResult<&[u8], (&str, &str)> {
// Record format: name:value\n
let (remaining, name) = take_until(":")(input)?;
let (remaining, _) = char(':')(remaining)?;
let (remaining, value) = take_until("\n")(remaining)?;
let (remaining, _) = char('\n')(remaining)?;
Ok((remaining, (std::str::from_utf8(name).unwrap(), std::str::from_utf8(value).unwrap())))
}
fn parse_http_header_line(input: &[u8]) -> IResult<&[u8], (&str, &str)> {
// HTTP header: "Name: Value\r\n"
let (remaining, name) = take_until(":")(input)?;
let (remaining, _) = char(':')(remaining)?;
let (remaining, _) = char(' ')(remaining)?;
let (remaining, value) = take_until("\r\n")(remaining)?;
let (remaining, _) = take(2usize)(remaining)?;
Ok((remaining, (std::str::from_utf8(name).unwrap(), std::str::from_utf8(value).unwrap())))
}
Both combinators integrate well with other nom parsers for building complex parsers.
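The parsers above call `unwrap()` on the UTF-8 conversion, which panics on invalid bytes. The std-only sketch below shows one way to report truncation and invalid UTF-8 as errors instead. `FieldError` and `length_prefixed_str` are hypothetical names for this illustration; in nom, mapping a fallible conversion over a parser's output is usually done with `nom::combinator::map_res`.

```rust
#[derive(Debug, PartialEq)]
enum FieldError {
    /// Fewer bytes were available than the format requires.
    Truncated { needed: usize, available: usize },
    /// The payload was not valid UTF-8.
    InvalidUtf8(std::str::Utf8Error),
}

/// Parses [2-byte big-endian length][data] and returns (remaining, text).
fn length_prefixed_str(input: &[u8]) -> Result<(&[u8], &str), FieldError> {
    if input.len() < 2 {
        return Err(FieldError::Truncated { needed: 2, available: input.len() });
    }
    let len = u16::from_be_bytes([input[0], input[1]]) as usize;
    let body = &input[2..];
    if body.len() < len {
        return Err(FieldError::Truncated { needed: len, available: body.len() });
    }
    let (data, remaining) = body.split_at(len);
    // Surface invalid UTF-8 as an error rather than panicking.
    let text = std::str::from_utf8(data).map_err(FieldError::InvalidUtf8)?;
    Ok((remaining, text))
}
```

This matters most for input that comes from a network or an untrusted file, where a panic in the parser would take the whole process down.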
Performance Considerations
use nom::bytes::complete::{take, take_until};
use nom::IResult;
fn performance_comparison() {
// +------------+------------+----------------------------------+
// | Operation  | Complexity | When to use                      |
// +------------+------------+----------------------------------+
// | take(n)    | O(1)       | Length known at parse time       |
// | take_until | O(n)       | Searching for delimiter in input |
// +------------+------------+----------------------------------+
let large_input: &[u8] = &[0u8; 1_000_000];
// take(1_000_000): constant time on a byte slice
// Just a length check and a slice split
let result: IResult<&[u8], &[u8]> = take(1_000_000usize)(large_input);
// take_until("marker"): linear search through the input
let result: IResult<&[u8], &[u8]> = take_until("marker")(large_input);
// If "marker" is at the end or not found, the search covers all 1M bytes
// Recommendation: use take whenever the length is already known
}
fn length_prefix_is_efficient() {
// Length-prefixed formats are more efficient for parsers
let input: &[u8] = b"\x00\x05HelloWorldMoreData";
// Skip the 2-byte prefix (value 5) and take exactly 5 bytes - O(1)
let result: IResult<&[u8], &[u8]> = take(5usize)(&input[2..]);
let (remaining, data) = result.unwrap();
// Versus searching for a delimiter - O(n)
// A delimited format would have to scan for whatever marks the end of "Hello"
}
Use take when you know the length; on byte slices it is O(1) regardless of input size.
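To see why the delimiter search is linear, the std-only sketch below counts how many window positions a naive scan inspects before it stops. This is a cost model for illustration only; `positions_inspected` is a hypothetical helper, and nom's actual search may be more optimized, although its work still grows with the distance to the delimiter.

```rust
/// Counts the window positions a naive scan looks at before it finds
/// `delim` or runs out of input.
fn positions_inspected(input: &[u8], delim: &[u8]) -> usize {
    if delim.is_empty() {
        return 0; // windows(0) would panic, so treat this as no work
    }
    let mut inspected = 0;
    for window in input.windows(delim.len()) {
        inspected += 1;
        if window == delim {
            break; // found: a real parser would split here
        }
    }
    inspected
}
```

With a 1000-byte input that never contains the 6-byte delimiter, every one of the 995 possible positions is inspected; with the delimiter at index 2, the scan stops after 3. A fixed-length split does the same single length check in both cases.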
When take_until Finds the Delimiter at Start
use nom::bytes::complete::{take, take_until};
use nom::IResult;
fn empty_consume() {
// If delimiter is at the very start, take_until returns empty slice
let input = b":value";
let result: IResult<&[u8], &[u8]> = take_until(":")(input);
match result {
Ok((remaining, consumed)) => {
assert_eq!(consumed, b""); // Empty - nothing before delimiter
assert_eq!(remaining, b":value"); // Delimiter is first
}
Err(_) => panic!("Should succeed"),
}
// This is useful for optional content before delimiter
}
fn handling_empty_fields() -> IResult<&'static [u8], ()> {
// CSV: "Alice,,30\n" - empty field between commas
let input: &[u8] = b"Alice,,30\n";
let (remaining, field1) = take_until(",")(input)?;
assert_eq!(field1, b"Alice");
let (remaining, _) = take(1usize)(remaining)?; // Skip comma
let (remaining, field2) = take_until(",")(remaining)?;
assert_eq!(field2, b""); // Empty field!
let (remaining, _) = take(1usize)(remaining)?; // Skip comma
// The last field ends at the newline, so the input must contain one
let (remaining, field3) = take_until("\n")(remaining)?;
assert_eq!(field3, b"30");
Ok((remaining, ()))
}
take_until returns an empty slice when the delimiter immediately follows the current position.
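The "scan to the comma, then step over it" pattern can be repeated in a loop to split a whole line, including empty fields and a final field that has no trailing delimiter. The following is a std-only sketch with a hypothetical name, not a CSV parser: it ignores quoting and escaping entirely.

```rust
/// Splits one line on ',' and keeps empty fields.
fn split_fields(line: &[u8]) -> Vec<&[u8]> {
    let mut fields = Vec::new();
    let mut rest = line;
    // Repeat: find the next comma, emit what precedes it, step over it.
    while let Some(pos) = rest.iter().position(|&b| b == b',') {
        fields.push(&rest[..pos]);
        rest = &rest[pos + 1..];
    }
    // No comma left: whatever remains is the last field (possibly empty).
    fields.push(rest);
    fields
}
```

For `b"Alice,,30"` this produces three fields, the middle one empty. The explicit final `push` is the part a pure delimiter search cannot do on its own, since the last field is terminated by the end of input rather than by a comma.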
Multi-byte Delimiters
use nom::bytes::complete::{take, take_until};
use nom::IResult;
fn multi_byte_delimiter() {
// take_until works with multi-byte delimiters
let input: &[u8] = b"Hello, World!";
let result: IResult<&[u8], &[u8]> = take_until(", W")(input);
// consumed: "Hello"
// remaining: ", World!"
let input: &[u8] = b"function(arg1, arg2) { body }";
let result: IResult<&[u8], &[u8]> = take_until("function")(input);
// consumed: "" (delimiter at start)
// remaining: "function(arg1, arg2) { body }"
}
fn parse_until_marker() -> IResult<&'static [u8], &'static [u8]> {
// Common pattern: parse until a specific marker
let input: &[u8] = b"---START---data---END---more";
let (remaining, _) = take_until("---START---")(input)?;
let (remaining, _) = take(11usize)(remaining)?; // Skip "---START---" (11 bytes)
let (remaining, data) = take_until("---END---")(remaining)?;
assert_eq!(data, b"data");
Ok((remaining, data))
}
take_until handles delimiters of any length, searching for the complete pattern.
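Extracting the bytes between a start marker and an end marker combines both ideas: search for each marker, then skip it by its known length. The std-only sketch below models this; `find` and `between` are hypothetical helpers for illustration and are not part of nom.

```rust
/// Index of the first occurrence of `needle` in `haystack`, if any.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return None; // choice of this sketch; windows(0) would panic
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Returns (remaining, data), where `data` sits between the first `start`
/// marker and the next `end` marker, and `remaining` follows `end`.
fn between<'a>(input: &'a [u8], start: &[u8], end: &[u8]) -> Option<(&'a [u8], &'a [u8])> {
    // Search for the start marker, then skip it by its known length.
    let start_pos = find(input, start)?;
    let after_start = &input[start_pos + start.len()..];
    // Search again for the end marker in what is left.
    let end_pos = find(after_start, end)?;
    let data = &after_start[..end_pos];
    let remaining = &after_start[end_pos + end.len()..];
    Some((remaining, data))
}
```

Skipping by `start.len()` instead of a hard-coded 11 keeps the marker and the skip distance from drifting apart if the marker text ever changes.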
Error Handling
use nom::bytes::complete::{take, take_until};
use nom::error::{Error, ErrorKind};
fn error_examples() {
// take error: not enough input
let input: &[u8] = b"abc";
let result = take(10usize)(input);
match result {
Err(nom::Err::Error(Error { input: err_input, code })) => {
assert_eq!(err_input, b"abc"); // The error carries the original, unconsumed input
assert_eq!(code, ErrorKind::Eof); // Eof in nom 7; the kind may differ in other versions
}
_ => unreachable!(),
}
// take_until error: delimiter not found
let input: &[u8] = b"abcdefghij";
let result = take_until("xyz")(input);
match result {
Err(nom::Err::Error(Error { input: err_input, code })) => {
// The whole input was searched, but nothing is consumed on failure
assert_eq!(err_input, b"abcdefghij");
assert_eq!(code, ErrorKind::TakeUntil);
}
_ => unreachable!(),
}
}
fn safe_parsing_with_take_until() {
fn parse_field(input: &[u8]) -> Option<&str> {
let result: nom::IResult<&[u8], &[u8]> = take_until(",")(input);
// Treat both a missing delimiter and invalid UTF-8 as "no field"
result.ok().and_then(|(_, field)| std::str::from_utf8(field).ok())
}
assert_eq!(parse_field(b"hello,world"), Some("hello"));
assert_eq!(parse_field(b"hello"), None); // No delimiter found
}
Both combinators return errors when their requirements aren't met: insufficient input for take, a missing delimiter for take_until.
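Because a failed attempt leaves the input untouched, a caller can try one strategy and fall back to another on the same bytes. The std-only sketch below models that with two error variants named after the error kinds above. All names here are hypothetical; in nom this kind of fallback is typically expressed with `nom::branch::alt`.

```rust
#[derive(Debug, PartialEq)]
enum ModelError<'a> {
    /// Fewer bytes than requested; carries the untouched input.
    Eof(&'a [u8]),
    /// Delimiter never appeared; carries the untouched input.
    TakeUntil(&'a [u8]),
}

fn take_or_err(input: &[u8], n: usize) -> Result<(&[u8], &[u8]), ModelError<'_>> {
    if input.len() < n {
        return Err(ModelError::Eof(input));
    }
    let (consumed, remaining) = input.split_at(n);
    Ok((remaining, consumed))
}

fn take_until_or_err<'a>(
    input: &'a [u8],
    delim: &[u8],
) -> Result<(&'a [u8], &'a [u8]), ModelError<'a>> {
    let found = if delim.is_empty() {
        None
    } else {
        input.windows(delim.len()).position(|w| w == delim)
    };
    match found {
        Some(pos) => Ok((&input[pos..], &input[..pos])),
        None => Err(ModelError::TakeUntil(input)),
    }
}

/// Tries a comma-terminated field first, then a fixed 4-byte field.
fn field_or_fixed(input: &[u8]) -> Result<(&[u8], &[u8]), ModelError<'_>> {
    take_until_or_err(input, b",").or_else(|_| take_or_err(input, 4))
}
```

The fallback only works because the first attempt did not advance the input; the second attempt starts from the same position.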
Practical Parser Example
use nom::bytes::complete::{take, take_until};
use nom::number::complete::be_u32;
use nom::IResult;
// Example: Parse a simple binary protocol
// Format: [4-byte count][count strings]
// Each string: [4-byte length][data]
#[derive(Debug)]
struct ProtocolMessage {
count: u32,
strings: Vec<String>,
}
fn parse_length_prefixed_string(input: &[u8]) -> IResult<&[u8], String> {
let (remaining, length) = be_u32(input)?;
let (remaining, data) = take(length as usize)(remaining)?;
Ok((remaining, String::from_utf8_lossy(data).into_owned()))
}
fn parse_protocol_message(input: &[u8]) -> IResult<&[u8], ProtocolMessage> {
let (remaining, count) = be_u32(input)?;
let mut strings = Vec::new();
let mut remaining = remaining;
for _ in 0..count {
let (rem, s) = parse_length_prefixed_string(remaining)?;
strings.push(s);
remaining = rem;
}
Ok((remaining, ProtocolMessage { count, strings }))
}
// Example: Parse a text-based protocol
// Format: lines ending with \r\n, empty line terminates
fn parse_text_line(input: &[u8]) -> IResult<&[u8], &str> {
let (remaining, line) = take_until("\r\n")(input)?;
let (remaining, _) = take(2usize)(remaining)?; // Consume \r\n
Ok((remaining, std::str::from_utf8(line).unwrap()))
}
fn parse_text_message(input: &[u8]) -> IResult<&[u8], Vec<&str>> {
let mut lines = Vec::new();
let mut remaining = input;
loop {
// Check for terminator (empty line)
if remaining.starts_with(b"\r\n") {
remaining = &remaining[2..];
break;
}
let (rem, line) = parse_text_line(remaining)?;
lines.push(line);
remaining = rem;
}
Ok((remaining, lines))
}
These examples show typical usage: take for binary formats with length prefixes, take_until for text formats with delimiters.
Trade-offs Summary
use nom::bytes::complete::{take, take_until};
fn complete_guide_summary() {
// +--------------------+----------------------+---------------------+
// | Aspect             | take                 | take_until          |
// +--------------------+----------------------+---------------------+
// | Determination      | Length from argument | Find delimiter      |
// | Complexity         | O(1)                 | O(n)                |
// | Use when           | Length known         | Delimiter marks end |
// | Input requirement  | At least n bytes     | Contains delimiter  |
// | Consumes delimiter | N/A                  | No                  |
// | Error on failure   | Not enough input     | No delimiter found  |
// | Binary format fit  | Excellent            | Poor                |
// | Text format fit    | Poor                 | Excellent           |
// +--------------------+----------------------+---------------------+
// Use take when:
// - Format has length prefix
// - Fixed-size fields
// - You know exact count at parse time
// - Performance matters (O(1))
// Use take_until when:
// - Data ends with delimiter
// - Variable-length text fields
// - Common delimiters (null, newline, comma)
// - Human-readable formats
}
// Key insight:
// - take: "I need exactly N bytes" - parser controls count
// - take_until: "I need everything until this pattern" - input controls count
//
// The choice depends on your data format:
// - Binary protocols often use length prefixes -> use take
// - Text protocols often use delimiters -> use take_until
//
// Remember: take_until doesn't consume the delimiter!
// You typically need another parser to consume it afterward.
Key insight: take and take_until represent two fundamental approaches to consuming input: known-length extraction vs. delimiter-based extraction. take is O(1) and ideal for binary protocols where lengths are known (length-prefixed data, fixed-size fields). take_until is O(n) and ideal for text protocols where data ends at a marker (null-terminated strings, CSV fields, HTTP headers). The critical behavioral difference is that take_until leaves the delimiter in the remaining input, so you must explicitly consume it with another parser. When designing protocols, prefer length prefixes for efficiency; when parsing human-readable formats, delimiters are more practical and take_until is the right tool.
