How does regex::CaptureLocations::get differ from Regex::captures_read for low-level capture access?
CaptureLocations::get reads back byte offsets that an earlier match stored, and it needs no haystack reference to do so. Regex::captures_read is the call that performs the match: it takes a mutable CaptureLocations and the haystack, writes every group's offsets into the CaptureLocations, and returns an Option<Match> for the overall match. Together they form the low-level alternative to Regex::captures, and the key distinction lies in borrowing and allocation behavior. CaptureLocations is a reusable allocation that stores match positions as raw byte indices, so you can keep capture information without borrowing the matched string. Regex::captures instead returns a new Captures value for each match, holding both the match locations and a reference to the original string, which enables convenient substring extraction via get() and as_str().
The Captures Type and Its Limitations
use regex::Regex;
fn captures_basics() {
let re = Regex::new(r"(\w+)@(\w+)\.(\w+)").unwrap();
let hay = "Email: test@example.com here";
// captures returns Option<Captures>; unwrap assumes the pattern matches
let caps = re.captures(hay).unwrap();
// Captures holds a reference to the haystack
// This means the haystack must live as long as the Captures
// Extract matched strings easily
let full_match = caps.get(0).unwrap().as_str(); // "test@example.com"
let local_part = caps.get(1).unwrap().as_str(); // "test"
let domain = caps.get(2).unwrap().as_str(); // "example"
let tld = caps.get(3).unwrap().as_str(); // "com"
println!("{}@{}.{}", local_part, domain, tld);
}
The standard Captures type is ergonomic, but it borrows the haystack, so the haystack must outlive it.
CaptureLocations: Low-Level Position Storage
use regex::{Regex, CaptureLocations};
fn capture_locations_basics() {
let re = Regex::new(r"(\w+)@(\w+)\.(\w+)").unwrap();
let mut locs = re.capture_locations(); // Reusable allocation
let hay1 = "test@example.com";
let hay2 = "other@domain.org";
// captures_read stores positions into locs, returns match
let Some(m) = re.captures_read(&mut locs, hay1) else {
return;
};
// locs now holds byte offsets from hay1
// m is the overall match
// Get byte offsets directly - no string reference needed
let full_start = locs.get(0).unwrap().0; // Start byte offset
let full_end = locs.get(0).unwrap().1; // End byte offset
// Get capture group offsets
let local_offsets = locs.get(1).unwrap(); // (0, 4)
let domain_offsets = locs.get(2).unwrap(); // (5, 12)
let tld_offsets = locs.get(3).unwrap(); // (13, 16)
println!("Full match: {}-{}", full_start, full_end);
println!("Capture 1: {:?}", local_offsets);
}
CaptureLocations stores raw byte offsets without holding a string reference.
Key Difference: String Lifetime Independence
use regex::{Regex, CaptureLocations};
fn lifetime_comparison() {
let re = Regex::new(r"(\d+)-(\d+)").unwrap();
let mut locs = re.capture_locations();
// With Captures (standard approach):
// The Captures value borrows the haystack
let hay = "123-456".to_string();
let caps = re.captures(&hay).unwrap();
let matched = caps.get(0).unwrap().as_str();
// matched borrows from hay, so hay must stay alive
// With CaptureLocations:
// Positions are stored as byte offsets
re.captures_read(&mut locs, &hay);
let (start, end) = locs.get(0).unwrap();
// We now have raw offsets - can extract substring later
// But locs doesn't borrow hay
// (Though we still need hay to extract the actual string)
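}
CaptureLocations doesn't hold a reference to the haystack, so the offsets can be stored or passed around without lifetime constraints. The haystack is only needed again when you slice out the text.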
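Because the offsets are plain (usize, usize) values, they can live in a type with no lifetime parameter and be turned back into text only when a haystack is at hand. The following is a minimal sketch of that idea using only the standard library. The `Span` alias and the `resolve` and `resolve_demo` functions are hypothetical names, and the offsets in the demo are written by hand where real code would copy them from locs.get(i).

```rust
/// Byte offsets of one capture group, in the shape `CaptureLocations::get` reports.
/// (`Span` is a hypothetical alias used only for this illustration.)
type Span = Option<(usize, usize)>;

/// Turn stored offsets back into a substring of `hay`.
/// Returns None if the group did not match, or if the offsets do not fit
/// this haystack (out of range or not on a char boundary).
fn resolve(hay: &str, span: Span) -> Option<&str> {
    let (start, end) = span?;
    hay.get(start..end)
}

fn resolve_demo() {
    // Offsets for "123-456" matched by (\d+)-(\d+), written by hand here.
    let spans: Vec<Span> = vec![Some((0, 7)), Some((0, 3)), Some((4, 7))];

    // The haystack can be created (or re-borrowed) after the offsets exist.
    let hay = String::from("123-456");
    assert_eq!(resolve(&hay, spans[1]), Some("123"));
    assert_eq!(resolve(&hay, spans[2]), Some("456"));

    // Applying the offsets to a different, shorter string fails safely.
    assert_eq!(resolve("12", spans[2]), None);
}
```

Using str::get rather than indexing means stale offsets yield None instead of a panic, which matters once offsets and haystacks travel separately.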
Reusing CaptureLocations for Performance
use regex::{Regex, CaptureLocations};
fn reuse_performance() {
let re = Regex::new(r"(\w+)\s*=\s*(\w+)").unwrap();
let mut locs = re.capture_locations();
let lines = [
"name = value",
"key = other",
"a = b",
];
// Reuse the same allocation for each match
// This avoids allocating a new Captures each time
for line in lines {
if let Some(m) = re.captures_read(&mut locs, line) {
// locs is reused, only overwriting the previous positions
let key_offsets = locs.get(1).unwrap();
let val_offsets = locs.get(2).unwrap();
// Extract substrings from the line
let key = &line[key_offsets.0..key_offsets.1];
let val = &line[val_offsets.0..val_offsets.1];
println!("Key: '{}', Value: '{}'", key, val);
}
}
// Compare with allocating new Captures each time:
for line in lines {
if let Some(caps) = re.captures(line) {
// New allocation for each captures call
let key = caps.get(1).unwrap().as_str();
let val = caps.get(2).unwrap().as_str();
println!("Key: '{}', Value: '{}'", key, val);
}
}
}
Reusing CaptureLocations avoids repeated allocations when matching many strings.
captures_read vs captures_read_at vs captures_at
use regex::Regex;
fn captures_read_vs_captures_at() {
let re = Regex::new(r"(\w+)").unwrap();
let mut locs = re.capture_locations();
let hay = "hello world test";
// captures_read: takes a mutable CaptureLocations
// Returns Option<Match> for the overall match
let first = re.captures_read(&mut locs, hay);
// captures_read_at: also takes a start position
// The search begins at that byte offset within the same haystack
let from_six = re.captures_read_at(&mut locs, hay, 6); // Start at "world"
// captures_at: takes a start position but no CaptureLocations
// It returns Option<Captures>, building a new Captures for the match
let caps = re.captures_at(hay, 6);
// captures_read is the most commonly used
// It is equivalent to captures_read_at with start = 0
}
captures_read is the primary method; captures_read_at adds a starting position, and captures_at takes a starting position too but returns a Captures instead of filling a CaptureLocations.
The Match Type Returned by CapturesRead
use regex::{Regex, CaptureLocations};
fn match_type() {
let re = Regex::new(r"(\d+)").unwrap();
let mut locs = re.capture_locations();
let hay = "abc123def";
// captures_read returns Option<Match>
let Some(m) = re.captures_read(&mut locs, hay) else {
return;
};
// Match provides the overall match information
println!("Match start: {}", m.start()); // 3
println!("Match end: {}", m.end()); // 6
println!("Match: {}", m.as_str()); // "123"
// Match doesn't include capture groups
// Capture groups are stored in locs
let group_offsets = locs.get(1).unwrap();
println!("Group 1: {:?}", group_offsets); // (3, 6)
}
captures_read returns a Match for the overall match, while CaptureLocations stores the positions of every group.
Extracting Capture Information from CaptureLocations
use regex::{Regex, CaptureLocations};
fn extracting_from_locs() {
let re = Regex::new(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})").unwrap();
let mut locs = re.capture_locations();
let hay = "Date: 2024-03-15 end";
if let Some(m) = re.captures_read(&mut locs, hay) {
// Method 1: get(index) returns Option<(usize, usize)>
let year_offsets = locs.get(1).unwrap();
let year_str = &hay[year_offsets.0..year_offsets.1];
println!("Year: {}", year_str);
// Method 2: loop over 0..locs.len() to visit every capture slot
// (CaptureLocations has no iter() method)
for i in 0..locs.len() {
if let Some((start, end)) = locs.get(i) {
println!("Capture {}: '{}' at {}-{}", i, &hay[start..end], start, end);
}
}
// Method 3: len() for number of capture slots
println!("Number of captures: {}", locs.len());
// Note: Index 0 is always the full match
// Index 1+ are capture groups
}
}
CaptureLocations exposes get() and len() for reading the stored offsets.
Named Captures with CaptureLocations
use regex::Regex;
fn named_captures() {
let re = Regex::new(r"(?P<key>\w+)\s*=\s*(?P<value>\w+)").unwrap();
let hay = "name = value";
// With standard Captures:
let caps = re.captures(hay).unwrap();
let key = caps.name("key").unwrap().as_str(); // "name"
let val = caps.name("value").unwrap().as_str(); // "value"
// With CaptureLocations:
let mut locs = re.capture_locations();
re.captures_read(&mut locs, hay);
// CaptureLocations has no name lookup, so resolve each name to an index first
// capture_names() yields group names in index order (None for unnamed groups)
let key_idx = re.capture_names().position(|n| n == Some("key")); // Some(1)
let val_idx = re.capture_names().position(|n| n == Some("value")); // Some(2)
if let Some((s, e)) = key_idx.and_then(|i| locs.get(i)) {
println!("Key: '{}'", &hay[s..e]);
}
}
For named captures with CaptureLocations, you need to resolve the name to an index yourself, for example by searching Regex::capture_names().
When to Use Each Approach
use regex::{Regex, CaptureLocations};
fn when_to_use() {
// Use standard Captures when:
// 1. You need ergonomic access to matched strings
// 2. You're matching once or a few times
// 3. You want named capture access
// 4. You want substrings handed back directly as &str slices of the haystack
// Use CaptureLocations when:
// 1. You're matching many strings in a loop (reuse allocation)
// 2. You need to store positions for later processing
// 3. You want to minimize allocations
// 4. You're doing low-level string manipulation
// Example: High-throughput parsing
let re = Regex::new(r"(\d+)\.(\d+)\.(\d+)").unwrap();
let mut locs = re.capture_locations();
// Many matches, reuse locs
for version in &["1.0.0", "2.1.3", "0.0.1"] {
re.captures_read(&mut locs, version);
// Process without allocating Captures
}
// Example: Ergonomic single match
let caps = re.captures("1.2.3").unwrap();
let major = caps.get(1).unwrap().as_str(); // Easy string access
}
Choose based on performance needs and code clarity.
Memory Allocation Comparison
use regex::{Regex, CaptureLocations};
fn allocation_comparison() {
let re = Regex::new(r"(a)(b)(c)(d)(e)").unwrap();
// captures() builds a fresh Captures for each match
// A Captures value carries:
// - Storage for the group offsets
// - A reference to the haystack
// - What it needs to look up groups by name
// CaptureLocations is allocated once
// - Created with capture_locations()
// - Reused across matches
// - Stores only offsets, with no haystack reference
// The exact sizes of both types are implementation details of the
// regex crate and may change between versions, so measure rather
// than rely on a byte count.
// For hot loops with many matches:
let mut locs = re.capture_locations(); // One allocation
for hay in get_many_strings() {
re.captures_read(&mut locs, hay); // No new allocation
}
// vs.
for hay in get_many_strings() {
let _ = re.captures(hay); // New allocation each iteration
}
}
fn get_many_strings() -> Vec<&'static str> {
vec!["abcde", "abcde", "abcde"]
}
Reusing one CaptureLocations avoids a fresh allocation for every match, which is where its advantage for repeated matching comes from.
Low-Level API: captures_read_at
use regex::Regex;
fn captures_read_at_low_level() {
let re = Regex::new(r"(\w+)").unwrap();
let mut locs = re.capture_locations();
let hay = "hello world";
// captures_read_at is the lowest-level capture method on Regex
// Arguments: CaptureLocations, haystack, start position
// Returns: Option<Match> for the overall match
let found = re.captures_read_at(&mut locs, hay, 0);
if found.is_some() {
// locs is populated with positions
// The returned Match covers only the overall match
// locs.get(0) holds the same span
let (start, end) = locs.get(0).unwrap();
println!("Match: '{}'", &hay[start..end]);
}
// Compare with the related methods:
// captures_read is equivalent to captures_read_at with start = 0
// captures_at takes (haystack, start) and returns Option<Captures>
// Neither captures_at nor any other Regex method fills a CaptureLocations and returns a bool
}
captures_read_at is the lowest-level capture method; captures_read is the same call with the start position fixed at 0.
Working with Iter and Slots
use regex::{Regex, CaptureLocations};
fn iter_and_slots() {
let re = Regex::new(r"(a)(b)?(c)").unwrap();
let mut locs = re.capture_locations();
re.captures_read(&mut locs, "ac");
// Not all captures may have matched
// Optional groups can be None
// Visit every capture slot by index (CaptureLocations has no iter())
for i in 0..locs.len() {
match locs.get(i) {
Some((start, end)) => println!("{}: matched {}-{}", i, start, end),
None => println!("{}: did not match", i),
}
}
// Output:
// 0: matched 0-2 (full match "ac")
// 1: matched 0-1 ("a")
// 2: did not match (optional "b" didn't match)
// 3: matched 1-2 ("c")
// get() returns None for unmatched captures
assert!(locs.get(2).is_none());
}
Not all capture groups necessarily participate in a match; get() returns None for an optional group that did not.
Performance Benchmarks
use regex::{Regex, CaptureLocations};
// Simplified performance comparison
fn bench_captures(re: &Regex, haystacks: &[&str]) -> usize {
let mut count = 0;
for hay in haystacks {
if let Some(caps) = re.captures(hay) {
if caps.get(1).is_some() {
count += 1;
}
}
}
count
}
fn bench_captures_read(re: &Regex, haystacks: &[&str]) -> usize {
let mut count = 0;
let mut locs = re.capture_locations();
for hay in haystacks {
if re.captures_read(&mut locs, hay).is_some() {
if locs.get(1).is_some() {
count += 1;
}
}
}
count
}
// In practice (measure on your own workload):
// - captures_read skips the per-match allocation that captures() makes
// - The saving adds up when matching many haystacks in a loop
// - Standard captures() is fine for single matches
// - Reuse of locs avoids allocation churn
For performance-critical code, captures_read with a reused CaptureLocations is usually the better choice.
Complete Example: Version Parsing
use regex::{Regex, CaptureLocations};
fn parse_versions(versions: &[&str]) -> Vec<(u32, u32, u32)> {
let re = Regex::new(r"^(\d+)\.(\d+)\.(\d+)$").unwrap();
let mut locs = re.capture_locations();
let mut results = Vec::new();
for v in versions {
if let Some(_match) = re.captures_read(&mut locs, v) {
// Extract each component
let major: u32 = locs.get(1)
.and_then(|(s, e)| v.get(s..e))
.and_then(|s| s.parse().ok())
.unwrap_or(0);
let minor: u32 = locs.get(2)
.and_then(|(s, e)| v.get(s..e))
.and_then(|s| s.parse().ok())
.unwrap_or(0);
let patch: u32 = locs.get(3)
.and_then(|(s, e)| v.get(s..e))
.and_then(|s| s.parse().ok())
.unwrap_or(0);
results.push((major, minor, patch));
}
}
results
}
fn main_example() {
let versions = ["1.0.0", "2.1.3", "10.20.30", "invalid"];
let parsed = parse_versions(&versions);
for (maj, min, pat) in parsed {
println!("{}.{}.{}", maj, min, pat);
}
}
A realistic use case showing CaptureLocations reuse for parsing structured data. The anchored pattern rejects "invalid", so only three tuples are printed.
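The three near-identical extraction chains in that example could be folded into one helper. The sketch below uses only the standard library and takes the slot that locs.get(i) would return; `parse_component` and `parse_triplet` are hypothetical names. Unlike the unwrap_or(0) version, it reports a component that fails to parse (for example, one that overflows u32) as None instead of silently turning it into 0, which is a deliberate design change rather than a drop-in replacement.

```rust
/// Parse one capture slot of `hay` as a u32.
/// `slot` has the shape returned by `CaptureLocations::get`.
/// (`parse_component` is a hypothetical helper name.)
fn parse_component(hay: &str, slot: Option<(usize, usize)>) -> Option<u32> {
    let (s, e) = slot?;
    hay.get(s..e)?.parse().ok()
}

/// Parse three slots as (major, minor, patch); None if any part is
/// missing or does not fit in a u32.
fn parse_triplet(hay: &str, slots: [Option<(usize, usize)>; 3]) -> Option<(u32, u32, u32)> {
    Some((
        parse_component(hay, slots[0])?,
        parse_component(hay, slots[1])?,
        parse_component(hay, slots[2])?,
    ))
}
```

Inside the loop this would be called as `parse_triplet(v, [locs.get(1), locs.get(2), locs.get(3)])`, pushing the tuple only when the result is Some.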
Synthesis
Quick reference:
use regex::{Regex, CaptureLocations};
fn quick_reference() {
let re = Regex::new(r"(\w+)=(\w+)").unwrap();
// Standard Captures approach
let caps = re.captures("key=value").unwrap();
let key = caps.get(1).unwrap().as_str(); // Easy string access
let val = caps.get(2).unwrap().as_str();
// CaptureLocations approach
let mut locs = re.capture_locations(); // Reusable allocation
re.captures_read(&mut locs, "key=value");
let (s1, e1) = locs.get(1).unwrap(); // Byte offsets
let (s2, e2) = locs.get(2).unwrap();
// Key differences:
// | Aspect           | Captures           | CaptureLocations          |
// |------------------|--------------------|---------------------------|
// | String reference | Yes (borrows hay)  | No                        |
// | Allocation       | Per match          | Reusable                  |
// | Ease of use      | High (as_str())    | Medium (offsets)          |
// | Named captures   | caps.name("x")     | Index via capture_names() |
// | Performance      | Good for single    | Better for loops          |
// Use Captures when:
// - Convenience matters more than performance
// - You need named captures easily
// - Single or few matches
// Use CaptureLocations when:
// - Matching in tight loops
// - Minimizing allocations
// - You need raw byte positions
}
Key insight: Regex::captures_read and CaptureLocations::get work as a pair, and together they are the low-level alternative to Regex::captures. The standard captures() method returns a Captures value that holds both match positions and a reference to the haystack, enabling ergonomic caps.get(1).unwrap().as_str() access. This convenience comes with an allocation per match and a lifetime dependency on the haystack string. In contrast, captures_read() populates a reusable CaptureLocations allocation with raw byte offsets and returns only a Match for the overall match; CaptureLocations::get then reads individual groups back. The offsets are just (usize, usize) pairs: you need the haystack to extract actual strings, but the CaptureLocations itself doesn't hold a reference. This design enables reuse across many matches without allocation churn. For hot parsing loops, calling capture_locations() once and captures_read() inside the loop avoids the per-match allocation of repeated captures() calls; how much that saves depends on the workload, so benchmark before committing to it. The trade-off is verbosity: you work with byte offsets and manual string slicing rather than ready-made string references. Choose Captures for readability and single matches; choose CaptureLocations for performance in loops or when you need position data independent of string lifetimes.
