How does `regex::CaptureLocations::get` differ from `Regex::captures_read` for low-level capture access?

Regex::captures_read and CaptureLocations::get are two halves of the same low-level API rather than alternatives. captures_read runs the search: it takes a mutable CaptureLocations and the haystack, writes the byte offsets of every capture group into the CaptureLocations, and returns an Option<Match> for the overall match. CaptureLocations::get(i) then reads those stored offsets back as an Option<(usize, usize)> without needing the haystack at all. The contrast that matters in practice is with Regex::captures, which returns a Captures value that borrows the haystack and offers convenient substring access through get(i) and name(..). A CaptureLocations holds only offsets, so it can be created once with capture_locations(), reused across many searches, and kept around without keeping the haystack borrowed.

The Captures Type and Its Limitations

use regex::Regex;
 
fn captures_basics() {
    let re = Regex::new(r"(\w+)@(\w+)\.(\w+)").unwrap();
    let hay = "Email: test@example.com here";
    
    // captures returns a Captures value
    let caps = re.captures(hay).unwrap();
    
    // Captures holds a reference to the haystack
    // This means the haystack must live as long as the Captures
    
    // Extract matched strings easily
    let full_match = caps.get(0).unwrap().as_str();  // "test@example.com"
    let local_part = caps.get(1).unwrap().as_str();  // "test"
    let domain = caps.get(2).unwrap().as_str();      // "example"
    let tld = caps.get(3).unwrap().as_str();         // "com"
    
    println!("{}@{}.{}", local_part, domain, tld);
}

The standard Captures type is ergonomic but requires the haystack string to remain alive.

CaptureLocations: Low-Level Position Storage

use regex::{Regex, CaptureLocations};
 
fn capture_locations_basics() {
    let re = Regex::new(r"(\w+)@(\w+)\.(\w+)").unwrap();
    let mut locs = re.capture_locations();  // Reusable allocation
    
    let hay1 = "test@example.com";
    let hay2 = "other@domain.org";
    
    // captures_read stores positions into locs, returns match
    let Some(m) = re.captures_read(&mut locs, hay1) else {
        return;
    };
    
    // locs now holds byte offsets from hay1
    // m is the overall match
    
    // Get byte offsets directly - no string reference needed
    let full_start = locs.get(0).unwrap().0;  // Start byte offset
    let full_end = locs.get(0).unwrap().1;    // End byte offset
    
    // Get capture group offsets
    let local_offsets = locs.get(1).unwrap();  // (0, 4)
    let domain_offsets = locs.get(2).unwrap(); // (5, 12)
    let tld_offsets = locs.get(3).unwrap();    // (13, 16)
    
    println!("Full match: {}-{}", full_start, full_end);
    println!("Capture 1: {:?}", local_offsets);
}

CaptureLocations stores raw byte offsets without holding a string reference.

Key Difference: String Lifetime Independence

use regex::{Regex, CaptureLocations};
 
fn lifetime_comparison() {
    let re = Regex::new(r"(\d+)-(\d+)").unwrap();
    let mut locs = re.capture_locations();
    
    // With Captures (standard approach):
    // The Captures value borrows the haystack
    let hay = "123-456".to_string();
    let caps = re.captures(&hay).unwrap();
    let matched = caps.get(0).unwrap().as_str();
    // matched borrows from hay, so hay must stay alive
    
    // With CaptureLocations:
    // Positions are stored as byte offsets
    re.captures_read(&mut locs, &hay);
    let (start, end) = locs.get(0).unwrap();
    
    // We now have raw offsets - can extract substring later
    // But locs doesn't borrow hay
    // (Though we still need hay to extract the actual string)
}

CaptureLocations doesn't hold a reference to the haystack, giving more flexibility.
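
Because the offsets are plain (usize, usize) pairs, they can be copied into owned storage and resolved against the haystack later. The sketch below shows one way to do that resolution safely. Span and resolve are hypothetical names, not part of the regex crate, and the function deliberately takes the offsets as an argument so it works with whatever CaptureLocations::get returned.

```rust
/// Offsets of the shape returned by `CaptureLocations::get`: a half-open
/// byte range, or `None` when the group did not take part in the match.
type Span = Option<(usize, usize)>;

/// Hypothetical helper: turn a stored span back into a substring.
/// `str::get` returns `None` instead of panicking when the range is out of
/// bounds or does not fall on UTF-8 character boundaries, which guards
/// against resolving a span against the wrong haystack.
fn resolve<'h>(hay: &'h str, span: Span) -> Option<&'h str> {
    let (start, end) = span?;
    hay.get(start..end)
}
```

With the haystack "123-456" from the example above, resolve(hay, Some((0, 3))) yields Some("123"), while a None span or an out-of-range span yields None.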

Reusing CaptureLocations for Performance

use regex::{Regex, CaptureLocations};
 
fn reuse_performance() {
    let re = Regex::new(r"(\w+)\s*=\s*(\w+)").unwrap();
    let mut locs = re.capture_locations();
    
    let lines = [
        "name = value",
        "key = other",
        "a = b",
    ];
    
    // Reuse the same allocation for each match
    // This avoids allocating a new Captures each time
    for line in lines {
        if let Some(m) = re.captures_read(&mut locs, line) {
            // locs is reused, only overwriting the previous positions
            let key_offsets = locs.get(1).unwrap();
            let val_offsets = locs.get(2).unwrap();
            
            // Extract substrings from the line
            let key = &line[key_offsets.0..key_offsets.1];
            let val = &line[val_offsets.0..val_offsets.1];
            
            println!("Key: '{}', Value: '{}'", key, val);
        }
    }
    
    // Compare with allocating new Captures each time:
    for line in lines {
        if let Some(caps) = re.captures(line) {
            // New allocation for each captures call
            let key = caps.get(1).unwrap().as_str();
            let val = caps.get(2).unwrap().as_str();
            println!("Key: '{}', Value: '{}'", key, val);
        }
    }
}

Reusing CaptureLocations avoids repeated allocations when matching many strings.

captures_read vs captures_read_at and captures_at

use regex::Regex;

fn captures_read_vs_captures_at() {
    let re = Regex::new(r"(\w+)").unwrap();
    let mut locs = re.capture_locations();
    let hay = "hello world test";
    
    // captures_read: takes a mutable CaptureLocations
    // Returns Option<Match> for the overall match
    let first = re.captures_read(&mut locs, hay);
    assert_eq!(first.map(|m| m.as_str()), Some("hello"));
    
    // captures_read_at: also takes a start position
    // The search begins at byte 6, but offsets stay relative to the whole haystack
    let second = re.captures_read_at(&mut locs, hay, 6);  // Start at "world"
    assert_eq!(second.map(|m| m.as_str()), Some("world"));
    assert_eq!(locs.get(1), Some((6, 11)));
    
    // captures_at (regex 1.9 and later): takes (haystack, start)
    // Returns Option<Captures>; it does not write into a CaptureLocations
    let caps = re.captures_at(hay, 12).unwrap();
    assert_eq!(&caps[1], "test");
    
    // captures_read is the most commonly used
    // It's equivalent to captures_read_at with start=0
}

captures_read is the primary method; captures_read_at allows specifying a starting position.

The Match Type Returned by CapturesRead

use regex::{Regex, CaptureLocations};
 
fn match_type() {
    let re = Regex::new(r"(\d+)").unwrap();
    let mut locs = re.capture_locations();
    let hay = "abc123def";
    
    // captures_read returns Option<Match>
    let Some(m) = re.captures_read(&mut locs, hay) else {
        return;
    };
    
    // Match provides the overall match information
    println!("Match start: {}", m.start());     // 3
    println!("Match end: {}", m.end());         // 6
    println!("Match: {}", m.as_str());          // "123"
    
    // Match doesn't include capture groups
    // Capture groups are stored in locs
    let group_offsets = locs.get(1).unwrap();
    println!("Group 1: {:?}", group_offsets);   // (3, 6)
}

captures_read returns a Match for the overall match while CaptureLocations stores group positions.

Extracting Capture Information from CaptureLocations

use regex::{Regex, CaptureLocations};
 
fn extracting_from_locs() {
    let re = Regex::new(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})").unwrap();
    let mut locs = re.capture_locations();
    
    let hay = "Date: 2024-03-15 end";
    
    if let Some(m) = re.captures_read(&mut locs, hay) {
        // Method 1: get(index) returns Option<(usize, usize)>
        let year_offsets = locs.get(1).unwrap();
        let year_str = &hay[year_offsets.0..year_offsets.1];
        println!("Year: {}", year_str);
        
        // Method 2: walk every slot by index with len() and get()
        // (CaptureLocations does not provide an iterator)
        for i in 0..locs.len() {
            if let Some((start, end)) = locs.get(i) {
                println!("Capture {}: '{}' at {}-{}", i, &hay[start..end], start, end);
            }
        }
        
        // Method 3: len() for number of capture slots
        println!("Number of captures: {}", locs.len());
        
        // Note: Index 0 is always the full match
        // Index 1+ are capture groups
    }
}

CaptureLocations exposes two accessors, get(i) and len(); walking all groups is done by index.

Named Captures with CaptureLocations

use regex::Regex;
 
fn named_captures() {
    let re = Regex::new(r"(?P<key>\w+)\s*=\s*(?P<value>\w+)").unwrap();
    let hay = "name = value";
    
    // With standard Captures:
    let caps = re.captures(hay).unwrap();
    let key = caps.name("key").unwrap().as_str();  // "name"
    let val = caps.name("value").unwrap().as_str(); // "value"
    
    // With CaptureLocations:
    let mut locs = re.capture_locations();
    re.captures_read(&mut locs, hay);
    
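    // CaptureLocations only understands indices, so resolve the name first.
    // capture_names() yields names in group order; unnamed groups
    // (including index 0, the whole match) appear as None.
    let key_idx = re.capture_names().position(|n| n == Some("key"));    // Some(1)
    let val_idx = re.capture_names().position(|n| n == Some("value"));  // Some(2)
    
    if let Some((s, e)) = key_idx.and_then(|i| locs.get(i)) {
        println!("Key: '{}'", &hay[s..e]);
    }
    if let Some((s, e)) = val_idx.and_then(|i| locs.get(i)) {
        println!("Value: '{}'", &hay[s..e]);
    }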
}

For named captures with CaptureLocations, you need to resolve the name to an index.

When to Use Each Approach

use regex::{Regex, CaptureLocations};
 
fn when_to_use() {
    // Use standard Captures when:
    // 1. You need ergonomic access to matched strings
    // 2. You're matching once or a few times
    // 3. You want named capture access
    // 4. You want substrings handed back directly (they borrow from the haystack)
    
    // Use CaptureLocations when:
    // 1. You're matching many strings in a loop (reuse allocation)
    // 2. You need to store positions for later processing
    // 3. You want to minimize allocations
    // 4. You're doing low-level string manipulation
    
    // Example: High-throughput parsing
    let re = Regex::new(r"(\d+)\.(\d+)\.(\d+)").unwrap();
    let mut locs = re.capture_locations();
    
    // Many matches, reuse locs
    for version in &["1.0.0", "2.1.3", "0.0.1"] {
        re.captures_read(&mut locs, version);
        // Process without allocating Captures
    }
    
    // Example: Ergonomic single match
    let caps = re.captures("1.2.3").unwrap();
    let major = caps.get(1).unwrap().as_str();  // Easy string access
}

Choose based on performance needs and code clarity.

Memory Allocation Comparison

use regex::{Regex, CaptureLocations};
 
fn allocation_comparison() {
    let re = Regex::new(r"(a)(b)(c)(d)(e)").unwrap();
    
    // Regex::captures builds a fresh Captures on every call
    // A Captures value includes:
    // - Its own buffer of capture offsets
    // - A reference to the haystack
    
    // CaptureLocations holds only the offset buffer
    // - Created once with capture_locations()
    // - Reused across matches
    // - Sized by the regex's group count, not by the haystack
    
    // Exact byte sizes are an implementation detail of the regex crate
    // and can change between versions; what matters here is whether the
    // buffer is rebuilt per search or reused.
    
    // For hot loops with many matches:
    let mut locs = re.capture_locations();  // One allocation
    for hay in get_many_strings() {
        re.captures_read(&mut locs, hay);     // No new allocation
    }
    
    // vs.
    for hay in get_many_strings() {
        let _ = re.captures(hay);            // New allocation each iteration
    }
}
 
fn get_many_strings() -> Vec<&'static str> {
    vec!["abcde", "abcde", "abcde"]
}

Reusing one CaptureLocations avoids allocating a fresh offset buffer on every search.

Low-Level API: captures_read_at

use regex::Regex;

fn captures_read_at_low_level() {
    let re = Regex::new(r"(\w+)").unwrap();
    let mut locs = re.capture_locations();
    let hay = "hello world";
    
    // captures_read_at is the most general of the locs-based methods
    // Arguments: CaptureLocations, haystack, start position
    // Returns: Option<Match> for the overall match
    
    let found = re.captures_read_at(&mut locs, hay, 0).is_some();
    
    if found {
        // locs is populated with positions
        // The returned Match can be discarded here;
        // locs.get(0) holds the same overall span
        let (start, end) = locs.get(0).unwrap();
        println!("Match: '{}'", &hay[start..end]);
    }
    
    // Compare with the related methods:
    // captures_read(locs, hay) is captures_read_at(locs, hay, 0)
    // captures_at(hay, start) returns Option<Captures> and does not use locs
}

captures_read_at is the most general form; captures_read is a convenience wrapper that starts at offset 0.

Working with Slots by Index

use regex::Regex;

fn slots_by_index() {
    let re = Regex::new(r"(a)(b)?(c)").unwrap();
    let mut locs = re.capture_locations();
    
    re.captures_read(&mut locs, "ac");
    
    // Not all captures may have matched
    // Optional groups can be None
    
    // len() counts every group (including group 0), matched or not,
    // so 0..locs.len() visits each capture slot
    for i in 0..locs.len() {
        match locs.get(i) {
            Some((start, end)) => println!("{}: matched {}-{}", i, start, end),
            None => println!("{}: did not match", i),
        }
    }
    // Output:
    // 0: matched 0-2  (full match "ac")
    // 1: matched 0-1  ("a")
    // 2: did not match  (optional "b" didn't match)
    // 3: matched 1-2  ("c")
    
    // get() returns None for unmatched captures
    assert!(locs.get(2).is_none());
}

Not all capture groups necessarily match; get() returns None for unmatched optional groups.

Performance Benchmarks

use regex::{Regex, CaptureLocations};
 
// Simplified performance comparison
fn bench_captures(re: &Regex, haystacks: &[&str]) -> usize {
    let mut count = 0;
    for hay in haystacks {
        if let Some(caps) = re.captures(hay) {
            if caps.get(1).is_some() {
                count += 1;
            }
        }
    }
    count
}
 
fn bench_captures_read(re: &Regex, haystacks: &[&str]) -> usize {
    let mut count = 0;
    let mut locs = re.capture_locations();
    for hay in haystacks {
        if re.captures_read(&mut locs, hay).is_some() {
            if locs.get(1).is_some() {
                count += 1;
            }
        }
    }
    count
}
 
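// In practice (measure on your own workload):
// - captures_read skips the per-call Captures allocation
// - The saving matters most in tight loops over many short haystacks
// - Standard captures() is fine for single matches
// - Reuse of locs avoids allocation churn

For performance-critical code, captures_read with a reused CaptureLocations is usually the better choice, but the size of the gain depends on the pattern and input.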
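
Since the difference is workload-dependent, it is worth timing both loops on representative input. The sketch below uses only std::time::Instant; time_it is a hypothetical helper, and a single wall-clock run like this gives only a rough indication compared with a proper benchmarking harness.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: run `f` once and return its result together with
/// the elapsed wall-clock time.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let started = Instant::now();
    let out = f();
    (out, started.elapsed())
}
```

Wrapping each of the two functions above, for example time_it(|| bench_captures(&re, &haystacks)) and time_it(|| bench_captures_read(&re, &haystacks)), should return the same count from both while letting you compare the two durations on your own data.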

Complete Example: Version Parsing

use regex::{Regex, CaptureLocations};
 
fn parse_versions(versions: &[&str]) -> Vec<(u32, u32, u32)> {
    let re = Regex::new(r"^(\d+)\.(\d+)\.(\d+)$").unwrap();
    let mut locs = re.capture_locations();
    
    let mut results = Vec::new();
    
    for v in versions {
        if let Some(_match) = re.captures_read(&mut locs, v) {
            // Extract each component
            let major: u32 = locs.get(1)
                .and_then(|(s, e)| v.get(s..e))
                .and_then(|s| s.parse().ok())
                .unwrap_or(0);
            
            let minor: u32 = locs.get(2)
                .and_then(|(s, e)| v.get(s..e))
                .and_then(|s| s.parse().ok())
                .unwrap_or(0);
            
            let patch: u32 = locs.get(3)
                .and_then(|(s, e)| v.get(s..e))
                .and_then(|s| s.parse().ok())
                .unwrap_or(0);
            
            results.push((major, minor, patch));
        }
    }
    
    results
}
 
fn main_example() {
    let versions = ["1.0.0", "2.1.3", "10.20.30", "invalid"];
    let parsed = parse_versions(&versions);
    
    for (maj, min, pat) in parsed {
        println!("{}.{}.{}", maj, min, pat);
    }
}

A realistic use case showing CaptureLocations reuse for parsing structured data.

Synthesis

Quick reference:

use regex::{Regex, CaptureLocations};
 
fn quick_reference() {
    let re = Regex::new(r"(\w+)=(\w+)").unwrap();
    
    // Standard Captures approach
    let caps = re.captures("key=value").unwrap();
    let key = caps.get(1).unwrap().as_str();    // Easy string access
    let val = caps.get(2).unwrap().as_str();
    
    // CaptureLocations approach
    let mut locs = re.capture_locations();       // Reusable allocation
    re.captures_read(&mut locs, "key=value");
    let (s1, e1) = locs.get(1).unwrap();         // Byte offsets
    let (s2, e2) = locs.get(2).unwrap();
    
    // Key differences:
    // | Aspect           | Captures           | CaptureLocations         |
    // |------------------|--------------------|--------------------------|
    // | String reference | Yes (borrows hay)  | No                       |
    // | Allocation       | Per call           | Reusable                 |
    // | Ease of use      | High (as_str())    | Medium (offsets)         |
    // | Named captures   | caps.name("x")     | Index via capture_names  |
    // | Performance      | Good for single    | Better for loops         |
    
    // Use Captures when:
    // - Convenience matters more than performance
    // - You need named captures easily
    // - Single or few matches
    
    // Use CaptureLocations when:
    // - Matching in tight loops
    // - Minimizing allocations
    // - You need raw byte positions
}

Key insight: CaptureLocations::get and Regex::captures_read are two halves of one low-level workflow, and the real trade-off is between that workflow and Regex::captures. captures() returns a Captures value that holds both the match positions and a reference to the haystack, enabling ergonomic caps.get(1).unwrap().as_str() access. This convenience comes with an allocation per call and a borrow of the haystack for as long as the Captures lives. captures_read(), by contrast, populates a reusable CaptureLocations with raw byte offsets and returns only a Match for the overall match; CaptureLocations::get(i) then reads each group back as an Option<(usize, usize)>. You still need the haystack to turn offsets into strings, but the CaptureLocations itself holds no reference to it, so one buffer can serve many searches without allocation churn.

For hot parsing loops, capture_locations() plus captures_read() can be noticeably faster than repeated captures() calls, though the size of the gain depends on the pattern and input and is worth measuring. The trade-off is verbosity: you work with byte offsets and manual string slicing rather than ready-made string references. Choose Captures for readability and single matches; choose CaptureLocations for performance in loops or when you need position data that does not borrow the haystack.
