What is the difference between regex::Regex::captures and captures_iter for extracting all matches from text?
captures returns the first match as an Option<Captures>, while captures_iter returns an iterator that yields every non-overlapping match in order. Use captures when you need only the first match, or want to confirm a match and read its groups in one call; use captures_iter when you need to process every match in the text. Both give access to capture groups (the portions of a match delimited by parentheses in the pattern); they differ in how many matches they return, not in what each match exposes. Because captures_iter produces each Captures value lazily, it never has to hold all matches in memory at once, which matters for large texts with many matches.
The captures Method: First Match Only
use regex::Regex;
fn main() {
let text = "The dates are 2024-01-15, 2024-03-22, and 2024-12-31.";
let pattern = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
// captures returns Option<Captures>
// Only the first match is returned
if let Some(caps) = pattern.captures(text) {
// Full match
println!("Full match: {}", &caps[0]); // "2024-01-15"
// Individual capture groups
println!("Year: {}", &caps[1]); // "2024"
println!("Month: {}", &caps[2]); // "01"
println!("Day: {}", &caps[3]); // "15"
} else {
println!("No match found");
}
}
captures stops at the first match and returns it wrapped in Some, or None if no match exists.
The captures_iter Method: All Matches
use regex::Regex;
fn main() {
let text = "The dates are 2024-01-15, 2024-03-22, and 2024-12-31.";
let pattern = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
// captures_iter returns an iterator over all matches
for caps in pattern.captures_iter(text) {
println!("Date: {}-{}-{}", &caps[1], &caps[2], &caps[3]);
}
// Output:
// Date: 2024-01-15
// Date: 2024-03-22
// Date: 2024-12-31
}
captures_iter yields each match lazily, so the search advances only as you iterate.
Accessing Capture Groups
use regex::Regex;
fn main() {
let text = "Contact: alice@example.com, bob@example.org";
let pattern = Regex::new(r"(\w+)@(\w+)\.(\w+)").unwrap();
// Single match - access groups by index
if let Some(caps) = pattern.captures(text) {
// caps[0] is always the full match
println!("Full: {}", &caps[0]); // "alice@example.com"
println!("User: {}", &caps[1]); // "alice"
println!("Domain: {}", &caps[2]); // "example"
println!("TLD: {}", &caps[3]); // "com"
// Use get() for safe access with Option
if let Some(m) = caps.get(1) {
println!("User via get: {}", m.as_str());
}
// Named captures
let named_pattern = Regex::new(r"(?P<user>\w+)@(?P<domain>\w+)\.(?P<tld>\w+)").unwrap();
if let Some(caps) = named_pattern.captures(text) {
println!("User: {}", &caps["user"]);
println!("Domain: {}", &caps["domain"]);
}
}
}
Both methods provide the same Captures API for accessing groups.
Memory Characteristics
use regex::Regex;
fn main() {
let text = "word ".repeat(1000); // Many matches
let pattern = Regex::new(r"word").unwrap();
// captures: Only finds first match, minimal memory
if let Some(caps) = pattern.captures(&text) {
println!("Found at: {}", caps.get(0).unwrap().start());
}
// captures_iter: Yields matches lazily
// Does NOT collect all matches into memory
let mut count = 0;
for _ in pattern.captures_iter(&text) {
count += 1;
// Each Captures value is yielded one at a time and dropped before the next
// The 1000 matches are never held in memory together
}
println!("Count: {}", count);
}
captures_iter is memory-efficient because it yields matches on demand.
Processing All Matches: Practical Example
use regex::Regex;
fn main() {
let log = r#"
ERROR 2024-01-15: Connection failed
INFO 2024-01-15: Server started
ERROR 2024-01-16: Timeout occurred
WARN 2024-01-16: Low memory
ERROR 2024-01-17: Disk full
"#;
// Pattern with named captures
let pattern = Regex::new(
r"(?P<level>ERROR|WARN|INFO)\s+(?P<date>\d{4}-\d{2}-\d{2}):\s+(?P<message>.+)"
).unwrap();
// Extract all errors using captures_iter
let errors: Vec<(String, String, String)> = pattern
.captures_iter(log)
.filter_map(|caps| {
let level = caps["level"].to_string();
if level == "ERROR" {
Some((
level,
caps["date"].to_string(),
caps["message"].to_string(),
))
} else {
None
}
})
.collect();
for (level, date, message) in errors {
println!("[{}] {}: {}", level, date, message);
}
// Output:
// [ERROR] 2024-01-15: Connection failed
// [ERROR] 2024-01-16: Timeout occurred
// [ERROR] 2024-01-17: Disk full
}
captures_iter enables functional processing of all matches.
When to Use Each
use regex::Regex;
fn main() {
let text = "Values: 42, 17, 99, 23";
let number_pattern = Regex::new(r"\d+").unwrap();
// Use captures when:
// 1. You only need the first match
if let Some(caps) = number_pattern.captures(text) {
println!("First number: {}", &caps[0]);
}
// 2. You're checking whether a pattern exists (this works, but is_match is cheaper because it skips capture-group work)
if number_pattern.captures(text).is_some() {
println!("Pattern exists");
}
// 3. You need capture groups from the first match (this text contains no email, so nothing prints here)
let email_pattern = Regex::new(r"(\w+)@(\w+)\.(\w+)").unwrap();
if let Some(caps) = email_pattern.captures(text) {
// Process first email's components
println!("Domain of first email: {}", &caps[2]);
}
// Use captures_iter when:
// 1. Processing all matches in the text
let numbers: Vec<i32> = number_pattern
.captures_iter(text)
.map(|caps| caps[0].parse().unwrap())
.collect();
println!("All numbers: {:?}", numbers);
// 2. Need to transform each match
let doubled: Vec<String> = number_pattern
.captures_iter(text)
.map(|caps| {
let n: i32 = caps[0].parse().unwrap();
format!("{}", n * 2)
})
.collect();
println!("Doubled: {:?}", doubled);
}
Choose based on whether you need one match or all matches.
Position Information
use regex::Regex;
fn main() {
let text = "abc def abc";
let pattern = Regex::new(r"abc").unwrap();
// Single match position
if let Some(caps) = pattern.captures(text) {
let m = caps.get(0).unwrap();
println!("First match at {}..{}", m.start(), m.end());
}
// All match positions
for caps in pattern.captures_iter(text) {
let m = caps.get(0).unwrap();
println!("Match at {}..{}", m.start(), m.end());
}
// Output:
// First match at 0..3
// Match at 0..3
// Match at 8..11
// Note: offsets are byte positions; "def" at 4..7 does not match the pattern
}
Both methods provide position information via Match objects.
Non-Overlapping Matches
use regex::Regex;
fn main() {
let text = "aaa";
let pattern = Regex::new(r"aa").unwrap();
// captures_iter finds non-overlapping matches
for caps in pattern.captures_iter(text) {
println!("Match: {}", &caps[0]);
}
// Output: "Match: aa" (only one match, at 0..2)
// The "aa" at 1..3 would overlap with the first, so it is skipped
// To find overlapping matches, restart the search manually from a later offset
// The regex crate has no lookaround; a crate such as fancy-regex supports lookahead
}
Both methods find non-overlapping matches; overlapping requires different approaches.
Capture Groups with Quantifiers
use regex::Regex;
fn main() {
// Repeated capture groups only capture the last iteration
let text = "values: a, b, c";
let pattern = Regex::new(r"values: (?:(\w),?\s*)+").unwrap();
if let Some(caps) = pattern.captures(text) {
println!("Full match: {}", &caps[0]); // "values: a, b, c"
println!("Capture group: {:?}", caps.get(1).map(|m| m.as_str()));
// Only "c" is captured - last iteration of the group
}
// To capture every value, match the items individually with captures_iter
// Note: run over the whole string, (\w+) also matches the "values" prefix
let item_pattern = Regex::new(r"(\w+)").unwrap();
let items: Vec<&str> = item_pattern
.captures_iter(text)
.map(|caps| caps.get(1).unwrap().as_str())
.collect();
println!("Items: {:?}", items); // ["values", "a", "b", "c"]
}
A repeated capture group keeps only its last iteration; to get every item, match the items individually with captures_iter.
Iterator Adapters
use regex::Regex;
fn main() {
let text = "prices: $10, $20, $30";
let price_pattern = Regex::new(r"\$(\d+)").unwrap();
// Chain iterator adapters
let total: i32 = price_pattern
.captures_iter(text)
.map(|caps| caps[1].parse::<i32>().unwrap())
.sum();
println!("Total: ${}", total); // $60
// Filter and transform
let expensive: Vec<i32> = price_pattern
.captures_iter(text)
.map(|caps| caps[1].parse::<i32>().unwrap())
.filter(|&price| price > 15)
.collect();
println!("Expensive: {:?}", expensive); // [20, 30]
}
captures_iter integrates seamlessly with iterator adapters.
Finding vs Capturing
use regex::Regex;
fn main() {
let text = "key1=value1, key2=value2";
// find: Just positions, no capture groups
for m in Regex::new(r"\w+=\w+").unwrap().find_iter(text) {
println!("Found: {}", m.as_str());
}
// captures: Access to capture groups
let pattern = Regex::new(r"(\w+)=(\w+)").unwrap();
for caps in pattern.captures_iter(text) {
println!("Key: {}, Value: {}", &caps[1], &caps[2]);
}
// Use find/find_iter when you don't need groups
// Use captures/captures_iter when you need groups
}
Use find/find_iter when you only need positions; captures/captures_iter for groups.
Performance Considerations
use regex::Regex;
fn main() {
let text = "word ".repeat(10000);
let pattern = Regex::new(r"word").unwrap();
// captures: stops at the first match, so cost depends on where that match is
// Best case: match at start - returns almost immediately
// Worst case: no match - scans the entire text
if let Some(_caps) = pattern.captures(&text) {
println!("Found");
}
// captures_iter: a full pass costs time proportional to the text length
// Matches are yielded lazily; only the current Captures is alive at a time
let count = pattern.captures_iter(&text).count();
println!("Count: {}", count);
// For simple existence check:
// pattern.is_match(text) is fastest (no capture overhead)
println!("Exists: {}", pattern.is_match(&text));
}
For existence checks, is_match is fastest; for the first match with groups, use captures; for all matches, use captures_iter.
Real-World Example: Parsing Log Files
use regex::Regex;
use std::collections::HashMap;
struct LogEntry {
level: String,
timestamp: String,
message: String,
}
fn parse_logs(log_text: &str) -> Vec<LogEntry> {
let pattern = Regex::new(
r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]\s+(?P<level>ERROR|WARN|INFO|DEBUG):\s+(?P<message>.+)"
).unwrap();
pattern
.captures_iter(log_text)
.map(|caps| LogEntry {
level: caps["level"].to_string(),
timestamp: caps["timestamp"].to_string(),
message: caps["message"].to_string(),
})
.collect()
}
fn main() {
let logs = r#"
[2024-01-15 10:30:45] ERROR: Connection refused
[2024-01-15 10:30:46] INFO: Retrying connection
[2024-01-15 10:30:50] ERROR: Connection timeout
"#;
let entries = parse_logs(logs);
// Count by level
let mut level_counts = HashMap::new();
for entry in &entries {
*level_counts.entry(&entry.level).or_insert(0) += 1;
}
println!("Level counts: {:?}", level_counts);
// Filter errors
let errors: Vec<_> = entries
.into_iter()
.filter(|e| e.level == "ERROR")
.collect();
for error in errors {
println!("[{}] {}", error.timestamp, error.message);
}
}
captures_iter enables clean log parsing with named capture groups.
Synthesis
Quick reference:
use regex::Regex;
fn main() {
let text = "a1 b2 c3";
let pattern = Regex::new(r"(\w)(\d)").unwrap();
// captures: First match only
if let Some(caps) = pattern.captures(text) {
println!("First: {}{}", &caps[1], &caps[2]); // "a1"
}
// captures_iter: All matches
for caps in pattern.captures_iter(text) {
println!("Match: {}{}", &caps[1], &caps[2]);
}
// "a1", "b2", "c3"
// Key differences:
// - captures: Option<Captures> - single result
// - captures_iter: CaptureMatches iterator - multiple results
// - Both provide same Captures API for groups
// - captures_iter is lazy (matches are never collected unless you collect them)
// - Use captures when you only need the first match
// - Use captures_iter when processing all matches
// Performance hierarchy (cheapest first):
// 1. is_match() - fastest, just checks existence
// 2. find() - position of first match
// 3. captures() - first match with groups
// 4. captures_iter() - all matches with groups
}
Key insight: captures and captures_iter differ in scope, not capability. Both provide the same Captures object for accessing the full match and individual groups. Use captures when you need only the first match, or want to confirm a pattern exists and extract its groups in the same call; use captures_iter when you need to process every match. The iterator approach is memory-efficient for large texts with many matches because it yields results lazily rather than collecting all matches into a vector. For pure existence checks without groups, is_match is fastest; for positions without groups, use find/find_iter instead of captures/captures_iter.
