What is the difference between regex::Regex::captures and captures_iter for extracting all matches from text?

captures returns the first match as an Option<Captures>, while captures_iter returns an iterator that yields every non-overlapping match in order. Use captures when you need only the first match and its groups, and captures_iter when you need to process every match in the text. Both methods give access to capture groups (the portions of the match defined by parentheses in the pattern); they differ in how many matches they return, not in what each match exposes. Because captures_iter yields results lazily, it holds only one match at a time unless you collect them yourself, which keeps memory use flat on large texts with many matches.

The captures Method: First Match Only

use regex::Regex;
 
fn main() {
    let text = "The dates are 2024-01-15, 2024-03-22, and 2024-12-31.";
    let pattern = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
    
    // captures returns Option<Captures>
    // Only the first match is returned
    if let Some(caps) = pattern.captures(text) {
        // Full match
        println!("Full match: {}", &caps[0]);  // "2024-01-15"
        
        // Individual capture groups
        println!("Year: {}", &caps[1]);   // "2024"
        println!("Month: {}", &caps[2]);  // "01"
        println!("Day: {}", &caps[3]);    // "15"
    } else {
        println!("No match found");
    }
}

captures stops at the first match and returns it wrapped in Some, or None if no match exists.

The captures_iter Method: All Matches

use regex::Regex;
 
fn main() {
    let text = "The dates are 2024-01-15, 2024-03-22, and 2024-12-31.";
    let pattern = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
    
    // captures_iter returns an iterator over all matches
    for caps in pattern.captures_iter(text) {
        println!("Date: {}-{}-{}", &caps[1], &caps[2], &caps[3]);
    }
    // Output:
    // Date: 2024-01-15
    // Date: 2024-03-22
    // Date: 2024-12-31
}

captures_iter yields each match lazily, processing matches as you iterate.

Accessing Capture Groups

use regex::Regex;
 
fn main() {
    let text = "Contact: alice@example.com, bob@example.org";
    let pattern = Regex::new(r"(\w+)@(\w+)\.(\w+)").unwrap();
    
    // Single match - access groups by index
    if let Some(caps) = pattern.captures(text) {
        // caps[0] is always the full match
        println!("Full: {}", &caps[0]);      // "alice@example.com"
        println!("User: {}", &caps[1]);       // "alice"
        println!("Domain: {}", &caps[2]);    // "example"
        println!("TLD: {}", &caps[3]);        // "com"
        
        // Use get() for safe access with Option
        if let Some(m) = caps.get(1) {
            println!("User via get: {}", m.as_str());
        }
        
        // Named captures
        let named_pattern = Regex::new(r"(?P<user>\w+)@(?P<domain>\w+)\.(?P<tld>\w+)").unwrap();
        if let Some(caps) = named_pattern.captures(text) {
            println!("User: {}", &caps["user"]);
            println!("Domain: {}", &caps["domain"]);
        }
    }
}

Both methods provide the same Captures API for accessing groups.

Memory Characteristics

use regex::Regex;
 
fn main() {
    let text = "word ".repeat(1000);  // Many matches
    let pattern = Regex::new(r"word").unwrap();
    
    // captures: Only finds first match, minimal memory
    if let Some(caps) = pattern.captures(&text) {
        println!("Found at: {}", caps.get(0).unwrap().start());
    }
    
    // captures_iter: Yields matches lazily
    // Does NOT collect all matches into memory
    let mut count = 0;
    for _ in pattern.captures_iter(&text) {
        count += 1;
        // Each match is yielded one at a time
        // No allocation for all 1000 matches
    }
    println!("Count: {}", count);
}

captures_iter is memory-efficient because it yields matches on-demand.

Processing All Matches: Practical Example

use regex::Regex;
 
fn main() {
    let log = r#"
        ERROR 2024-01-15: Connection failed
        INFO 2024-01-15: Server started
        ERROR 2024-01-16: Timeout occurred
        WARN 2024-01-16: Low memory
        ERROR 2024-01-17: Disk full
    "#;
    
    // Pattern with named captures
    let pattern = Regex::new(
        r"(?P<level>ERROR|WARN|INFO)\s+(?P<date>\d{4}-\d{2}-\d{2}):\s+(?P<message>.+)"
    ).unwrap();
    
    // Extract all errors using captures_iter
    let errors: Vec<(String, String, String)> = pattern
        .captures_iter(log)
        .filter_map(|caps| {
            let level = caps["level"].to_string();
            if level == "ERROR" {
                Some((
                    level,
                    caps["date"].to_string(),
                    caps["message"].to_string(),
                ))
            } else {
                None
            }
        })
        .collect();
    
    for (level, date, message) in errors {
        println!("[{}] {}: {}", level, date, message);
    }
    // Output:
    // [ERROR] 2024-01-15: Connection failed
    // [ERROR] 2024-01-16: Timeout occurred
    // [ERROR] 2024-01-17: Disk full
}

captures_iter enables functional processing of all matches.

When to Use Each

use regex::Regex;
 
fn main() {
    let text = "Values: 42, 17, 99, 23";
    let number_pattern = Regex::new(r"\d+").unwrap();
    
    // Use captures when:
    // 1. You only need the first match
    
    if let Some(caps) = number_pattern.captures(text) {
        println!("First number: {}", &caps[0]);
    }
    
    // 2. You're checking that a pattern exists and will use its groups
    //    (for a bare existence check, is_match is cheaper)
    if number_pattern.captures(text).is_some() {
        println!("Pattern exists");
    }
    
    // 3. You need capture groups from the first match
    let contact = "Contact: alice@example.com, bob@example.org";
    let email_pattern = Regex::new(r"(\w+)@(\w+)\.(\w+)").unwrap();
    if let Some(caps) = email_pattern.captures(contact) {
        // Process first email's components
        println!("Domain of first email: {}", &caps[2]);  // "example"
    }
    
    // Use captures_iter when:
    // 1. Processing all matches in the text
    
    let numbers: Vec<i32> = number_pattern
        .captures_iter(text)
        .map(|caps| caps[0].parse().unwrap())
        .collect();
    println!("All numbers: {:?}", numbers);
    
    // 2. Need to transform each match
    let doubled: Vec<String> = number_pattern
        .captures_iter(text)
        .map(|caps| {
            let n: i32 = caps[0].parse().unwrap();
            format!("{}", n * 2)
        })
        .collect();
    println!("Doubled: {:?}", doubled);
}

Choose based on whether you need one match or all matches.

Position Information

use regex::Regex;
 
fn main() {
    let text = "abc def abc";
    let pattern = Regex::new(r"abc").unwrap();
    
    // Single match position
    if let Some(caps) = pattern.captures(text) {
        let m = caps.get(0).unwrap();
        println!("First match at {}..{}", m.start(), m.end());
    }
    
    // All match positions
    for caps in pattern.captures_iter(text) {
        let m = caps.get(0).unwrap();
        println!("Match at {}..{}", m.start(), m.end());
    }
    // Output:
    // First match at 0..3
    // Match at 0..3
    // Match at 8..11
    // Note: offsets are byte offsets into text; "def" occupies 4..7
}

Both methods provide position information via Match objects.

Non-Overlapping Matches

use regex::Regex;
 
fn main() {
    let text = "aaa";
    let pattern = Regex::new(r"aa").unwrap();
    
    // captures_iter finds non-overlapping matches
    for caps in pattern.captures_iter(text) {
        println!("Match: {}", &caps[0]);
    }
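    // Output: "Match: aa" (only one match)
    // A second "aa" starting at index 1 would overlap with the first
    
    // To find overlapping matches, you need manual iteration, for example
    // by restarting the search one character after each match start.
    // The regex crate does not support lookahead; a crate such as
    // fancy-regex does.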
}

Both methods find non-overlapping matches; overlapping requires different approaches.

Capture Groups with Quantifiers

use regex::Regex;
 
fn main() {
    // Repeated capture groups only capture the last iteration
    let text = "values: a, b, c";
    let pattern = Regex::new(r"values: (?:(\w),?\s*)+").unwrap();
    
    if let Some(caps) = pattern.captures(text) {
        println!("Full match: {}", &caps[0]);  // "values: a, b, c"
        println!("Capture group: {:?}", caps.get(1).map(|m| m.as_str()));
        // Only "c" is captured - last iteration of the group
    }
    
    // To capture all repeated values, run captures_iter with a per-item
    // pattern over just the list portion (otherwise "values" matches too)
    let list = text.trim_start_matches("values: ");
    let item_pattern = Regex::new(r"(\w+)").unwrap();
    let items: Vec<&str> = item_pattern
        .captures_iter(list)
        .map(|caps| caps.get(1).unwrap().as_str())
        .collect();
    println!("Items: {:?}", items);  // ["a", "b", "c"]
}

A repeated capture group keeps only its last iteration; use captures_iter with a per-item pattern to get every value.

Iterator Adapters

use regex::Regex;
 
fn main() {
    let text = "prices: $10, $20, $30";
    let price_pattern = Regex::new(r"\$(\d+)").unwrap();
    
    // Chain iterator adapters
    let total: i32 = price_pattern
        .captures_iter(text)
        .map(|caps| caps[1].parse::<i32>().unwrap())
        .sum();
    println!("Total: ${}", total);  // $60
    
    // Filter and transform
    let expensive: Vec<i32> = price_pattern
        .captures_iter(text)
        .map(|caps| caps[1].parse::<i32>().unwrap())
        .filter(|&price| price > 15)
        .collect();
    println!("Expensive: {:?}", expensive);  // [20, 30]
}

captures_iter integrates seamlessly with iterator adapters.

Finding vs Capturing

use regex::Regex;
 
fn main() {
    let text = "key1=value1, key2=value2";
    
    // find: Just positions, no capture groups
    for m in Regex::new(r"\w+=\w+").unwrap().find_iter(text) {
        println!("Found: {}", m.as_str());
    }
    
    // captures: Access to capture groups
    let pattern = Regex::new(r"(\w+)=(\w+)").unwrap();
    for caps in pattern.captures_iter(text) {
        println!("Key: {}, Value: {}", &caps[1], &caps[2]);
    }
    
    // Use find/find_iter when you don't need groups
    // Use captures/captures_iter when you need groups
}

Use find/find_iter when you only need positions; captures/captures_iter for groups.

Performance Considerations

use regex::Regex;
 
fn main() {
    let text = "word ".repeat(10000);
    let pattern = Regex::new(r"word").unwrap();
    
    // captures: cost depends on where the first match is
    // Best case: match at start - very fast
    // Worst case: no match - scans entire text
    if pattern.captures(&text).is_some() {
        println!("Found");
    }
    
    // captures_iter: scans the whole text once, O(n) in text length
    // Lazily yields matches; only one Captures is alive per iteration
    let count = pattern.captures_iter(&text).count();
    println!("Count: {}", count);
    
    // For simple existence check:
    // pattern.is_match(text) is fastest (no capture overhead)
    println!("Exists: {}", pattern.is_match(&text));
}

For existence checks, is_match is fastest; for first match with groups, captures; for all matches, captures_iter.

Real-World Example: Parsing Log Files

use regex::Regex;
use std::collections::HashMap;
 
struct LogEntry {
    level: String,
    timestamp: String,
    message: String,
}
 
fn parse_logs(log_text: &str) -> Vec<LogEntry> {
    let pattern = Regex::new(
        r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+(?P<level>ERROR|WARN|INFO|DEBUG):\s+(?P<message>.+)"
    ).unwrap();
    
    pattern
        .captures_iter(log_text)
        .map(|caps| LogEntry {
            level: caps["level"].to_string(),
            timestamp: caps["timestamp"].to_string(),
            message: caps["message"].to_string(),
        })
        .collect()
}
 
fn main() {
    let logs = r#"
        [2024-01-15 10:30:45] ERROR: Connection refused
        [2024-01-15 10:30:46] INFO: Retrying connection
        [2024-01-15 10:30:50] ERROR: Connection timeout
    "#;
    
    let entries = parse_logs(logs);
    
    // Count by level
    let mut level_counts = HashMap::new();
    for entry in &entries {
        *level_counts.entry(&entry.level).or_insert(0) += 1;
    }
    
    println!("Level counts: {:?}", level_counts);
    
    // Filter errors
    let errors: Vec<_> = entries
        .into_iter()
        .filter(|e| e.level == "ERROR")
        .collect();
    
    for error in errors {
        println!("[{}] {}", error.timestamp, error.message);
    }
}

captures_iter enables clean log parsing with named capture groups.

Synthesis

Quick reference:

use regex::Regex;
 
let text = "a1 b2 c3";
let pattern = Regex::new(r"(\w)(\d)").unwrap();
 
// captures: First match only
if let Some(caps) = pattern.captures(text) {
    println!("First: {}{}", &caps[1], &caps[2]);  // "a1"
}
 
// captures_iter: All matches
for caps in pattern.captures_iter(text) {
    println!("Match: {}{}", &caps[1], &caps[2]);
}
// "a1", "b2", "c3"
 
// Key differences:
// - captures: Option<Captures> - single result
// - captures_iter: CaptureMatches, an Iterator over Captures - multiple results
// - Both provide same Captures API for groups
// - captures_iter is lazy (matches are not collected unless you collect them)
// - Use captures when you only need the first match
// - Use captures_iter when processing all matches
 
// Cost, cheapest first:
// 1. is_match() - just checks existence
// 2. find() - position of first match
// 3. captures() - first match with groups
// 4. captures_iter() - all matches with groups

Key insight: captures and captures_iter differ in scope, not capability. Both hand you the same Captures object for the full match and its individual groups. Use captures when you need only the first match and its groups; use captures_iter when you need to process every match. The iterator yields results lazily, so memory stays flat on large texts unless you collect the matches into a vector yourself. For pure existence checks, is_match is fastest; for positions without groups, use find/find_iter instead of captures/captures_iter.