What are the trade-offs between `regex::Regex::is_match` and `find` for simple pattern detection?

is_match and find answer different questions. is_match returns a bool and can stop as soon as the engine knows a match exists; find returns the byte offsets of the leftmost match, which is what you need to extract the matched text. The trade-off is between simplicity and capability: is_match is simpler and can be slightly faster when you only need to know whether a pattern occurs, while find supplies the match location that is_match cannot. Both methods run on the same underlying regex engine, but is_match can exit earlier in some cases because it does not need to compute the full match bounds.

Basic is_match Usage

use regex::Regex;
 
fn main() {
    let pattern = Regex::new(r"\d{4}").unwrap();
    
    // is_match returns bool only
    let text = "The year is 2024";
    
    if pattern.is_match(text) {
        println!("Found a 4-digit number");
    } else {
        println!("No 4-digit number found");
    }
    
    // Simple existence check
    let has_email = Regex::new(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}").unwrap();
    let text = "Contact: user@example.com";
    
    if has_email.is_match(text) {
        println!("Text contains an email");
    }
}

is_match provides a simple boolean answer about pattern existence.

Basic find Usage

use regex::Regex;
 
fn main() {
    let pattern = Regex::new(r"\d{4}").unwrap();
    let text = "The years 2023 and 2024";
    
    // find returns Option<Match> with position
    if let Some(mat) = pattern.find(text) {
        println!("Found at {}-{}: {}", mat.start(), mat.end(), &text[mat.start()..mat.end()]);
    }
    
    // Iterate over all matches
    let numbers = Regex::new(r"\d+").unwrap();
    let text = "Numbers: 42, 100, 7";
    
    for mat in numbers.find_iter(text) {
        println!("Found '{}' at {}-{}", &text[mat.start()..mat.end()], mat.start(), mat.end());
    }
}

find returns match positions, enabling content extraction.

Return Type Comparison

use regex::Regex;
 
fn main() {
    let pattern = Regex::new(r"hello").unwrap();
    let text = "hello world";
    
    // is_match: bool
    let exists: bool = pattern.is_match(text);
    println!("is_match returns: {}", exists);
    
    // find: Option<Match>
    let location: Option<regex::Match> = pattern.find(text);
    match location {
        Some(mat) => {
            println!("find returns: Match({}:{})", mat.start(), mat.end());
            println!("Matched text: '{}'", &text[mat.start()..mat.end()]);
        }
        None => println!("find returns: None"),
    }
    
    // Key difference in return types:
    // is_match: bool
    // find: Option<regex::Match>
    // find_iter: regex::Matches, an Iterator<Item = regex::Match>
}

The return types encode what information is available after the operation.

Performance Comparison

use regex::Regex;
use std::time::Instant;
 
fn main() {
    let pattern = Regex::new(r"\d{4}-\d{2}-\d{2}").unwrap();
    let text = "Dates: 2024-01-15, 2024-02-20, 2024-03-25, 2024-04-30";
    
    const ITERATIONS: usize = 100_000;
    
    // Benchmark is_match; black_box keeps the optimizer from discarding the call
    let start = Instant::now();
    for _ in 0..ITERATIONS {
        std::hint::black_box(pattern.is_match(std::hint::black_box(text)));
    }
    let is_match_time = start.elapsed();
    
    // Benchmark find
    let start = Instant::now();
    for _ in 0..ITERATIONS {
        std::hint::black_box(pattern.find(std::hint::black_box(text)));
    }
    let find_time = start.elapsed();
    
    // Benchmark find_iter (counts every match, so it does more work per call)
    let start = Instant::now();
    for _ in 0..ITERATIONS {
        std::hint::black_box(pattern.find_iter(std::hint::black_box(text)).count());
    }
    let find_iter_time = start.elapsed();
    
    println!("is_match: {:?}", is_match_time);
    println!("find: {:?}", find_time);
    println!("find_iter: {:?}", find_iter_time);
    println!("find/is_match ratio: {:.2}x", 
        find_time.as_secs_f64() / is_match_time.as_secs_f64());
}

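is_match can be slightly faster when you don't need match positions, but the gap depends on the pattern, the input, and the regex version. Run the comparison in a release build before relying on it.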

Early Termination Optimization

use regex::Regex;
 
fn main() {
    // is_match can terminate as soon as a match is confirmed
    // find must determine the full match bounds
    
    let pattern = Regex::new(r"a+").unwrap();
    let text = "aaaaa"; // Five 'a' characters
    
    // For is_match:
    // - Engine finds 'a' at position 0
    // - Returns true immediately
    // - May not examine all 'a's
    
    // For find:
    // - Engine finds 'a' at position 0
    // - Must find end of match (position 5)
    // - Returns Match(0, 5)
    
    println!("is_match: {}", pattern.is_match(text));
    if let Some(mat) = pattern.find(text) {
        println!("find: {}-{}", mat.start(), mat.end());
    }
    
    // The difference is more pronounced with:
    // 1. Variable-length patterns (a+, a*)
    // 2. Long matches
    // 3. Complex patterns with alternations
}

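Because is_match never reports offsets, the engine is free to stop at the first point where a match is certain; find has to keep going until it knows where the leftmost match starts and ends.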

When is_match is Sufficient

use regex::Regex;
 
fn main() {
    // Validation: only need yes/no
    let email_pattern = Regex::new(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$").unwrap();
    
    // Closures are used here because nested fn items cannot borrow local variables
    let is_valid_email = |email: &str| email_pattern.is_match(email);
    
    println!("Is valid: {}", is_valid_email("user@example.com"));
    println!("Is valid: {}", is_valid_email("invalid"));
    
    // Feature detection: only need existence
    let has_uppercase = Regex::new(r"[A-Z]").unwrap();
    let has_digit = Regex::new(r"\d").unwrap();
    let has_special = Regex::new(r"[!@#$%^&*]").unwrap();
    
    let check_password_strength = |password: &str| -> usize {
        let mut strength = 0;
        if has_uppercase.is_match(password) { strength += 1; }
        if has_digit.is_match(password) { strength += 1; }
        if has_special.is_match(password) { strength += 1; }
        strength
    };
    
    println!("Strength: {}", check_password_strength("Pass123!"));
    
    // Content filtering: only need to detect
    let spam_patterns = [
        Regex::new(r"buy now").unwrap(),
        Regex::new(r"click here").unwrap(),
        Regex::new(r"free money").unwrap(),
    ];
    
    let is_spam = |text: &str| spam_patterns.iter().any(|p| p.is_match(text));
    
    println!("Is spam: {}", is_spam("click here for free money"));
}

Use is_match when match location adds no value.

When find is Necessary

use regex::Regex;
 
fn main() {
    // Extraction: need the matched content
    let url_pattern = Regex::new(r"https?://[^\s]+").unwrap();
    let text = "Visit https://example.com and http://rust-lang.org";
    
    // Can't do this with is_match
    for mat in url_pattern.find_iter(text) {
        println!("URL: {}", &text[mat.start()..mat.end()]);
    }
    
    // Position-based processing
    let log_pattern = Regex::new(r"\[(ERROR|WARN|INFO)\]").unwrap();
    let log = "[ERROR] Failed to connect [WARN] Retrying [INFO] Connected";
    
    for mat in log_pattern.find_iter(log) {
        let level = &log[mat.start()+1..mat.end()-1];
        println!("Log level at position {}: {}", mat.start(), level);
    }
    
    // Replacement needs location
    let censor_pattern = Regex::new(r"\b(password|secret|key)\b").unwrap();
    let text = "The password is secret123 and key is abc";
    
    let censored = censor_pattern.replace_all(text, "***");
    println!("Censored: {}", censored);
}

Use find when you need to extract or process matched content.

find_iter for Multiple Matches

use regex::Regex;
 
fn main() {
    let word_pattern = Regex::new(r"\b\w+\b").unwrap();
    let text = "The quick brown fox jumps over the lazy dog";
    
    // Count matches
    let count = word_pattern.find_iter(text).count();
    println!("Word count: {}", count);
    
    // Collect all matches
    let words: Vec<&str> = word_pattern
        .find_iter(text)
        .map(|mat| &text[mat.start()..mat.end()])
        .collect();
    println!("Words: {:?}", words);
    
    // Process with positions
    let long_words: Vec<(usize, usize, &str)> = word_pattern
        .find_iter(text)
        .filter(|mat| mat.end() - mat.start() > 3)
        .map(|mat| (mat.start(), mat.end(), &text[mat.start()..mat.end()]))
        .collect();
    println!("Long words: {:?}", long_words);
    
    // Find specific match
    let first_long = word_pattern
        .find_iter(text)
        .find(|mat| mat.end() - mat.start() > 4);
    
    if let Some(mat) = first_long {
        println!("First long word: '{}'", &text[mat.start()..mat.end()]);
    }
}

find_iter enables processing all matches with full position information.

Capturing Groups: find vs captures

use regex::Regex;
 
fn main() {
    let date_pattern = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
    let text = "Date: 2024-01-15";
    
    // find: only gets full match
    if let Some(mat) = date_pattern.find(text) {
        println!("Full match: {}", &text[mat.start()..mat.end()]);
        // Can't access individual groups
    }
    
    // captures: gets groups
    if let Some(caps) = date_pattern.captures(text) {
        println!("Full: {}", &caps[0]);
        println!("Year: {}", &caps[1]);
        println!("Month: {}", &caps[2]);
        println!("Day: {}", &caps[3]);
    }
    
    // When you only need groups, captures is required
    // When you only need to check existence, is_match is sufficient
    // find is the middle ground: full match but no groups
}

For capturing groups, neither is_match nor find is sufficient—use captures.

Memory Allocation

use regex::Regex;
 
fn main() {
    let pattern = Regex::new(r"\d+").unwrap();
    let text = "Numbers: 123, 456, 789";
    
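    // is_match: returns a plain bool
    let exists = pattern.is_match(text);
    
    // find: returns Option<Match> by value; the Match itself is not heap-allocated
    let maybe_match = pattern.find(text);
    // Match holds a reference to the haystack plus start and end offsets
    
    // Match is a small Copy type: larger than a bool,
    // but cheap to return and pass around
    
    // find_iter: yields each Match by value as the iterator advances
    let matches: Vec<_> = pattern.find_iter(text).collect();
    // Only the Vec built by collect() allocates on the heap
    
    // In tight loops, the saving from is_match comes from skipping
    // the bounds computation, not from avoiding allocation
    println!("{} {:?} {}", exists, maybe_match.map(|m| m.as_str()), matches.len());
}

Neither method heap-allocates its return value: is_match returns a bool and find returns a small Match by value. Any cost difference comes from computing the match bounds, not from allocation.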

Short-Circuit Behavior

use regex::Regex;
 
fn main() {
    // Patterns that can match at multiple positions
    let pattern = Regex::new(r"a+").unwrap();
    let text = "aaa bbb aaa";
    
    // is_match: stops at first successful match position
    // Returns true immediately after finding "aaa" at position 0
    
    // find: returns the first match with bounds
    // Must scan to find the end of "aaa" (position 3)
    
    // For anchored patterns, the difference is minimal
    let anchored = Regex::new(r"^a+").unwrap();
    
    // is_match and find do essentially the same work for anchored patterns
    // Both must examine from position 0
    
    // For unanchored patterns on long strings:
    let long_text = "xyz ".repeat(1000) + "aaa";
    
    // is_match must scan until it finds "aaa" at position 4000
    // But it stops as soon as it confirms the match exists
    // find must also compute that the match ends at position 4003
}

The performance gap depends on pattern anchoring and match length.

Anchored Pattern Optimization

use regex::Regex;
 
fn main() {
    // Anchored patterns (^ or $) benefit less from is_match
    let anchored_start = Regex::new(r"^\d+").unwrap();
    let anchored_end = Regex::new(r"\d+$").unwrap();
    let unanchored = Regex::new(r"\d+").unwrap();
    
    let text = "12345 is a number";
    
    // Anchored at start: must check position 0
    // is_match and find do similar work
    println!("Start anchored is_match: {}", anchored_start.is_match(text));
    if let Some(mat) = anchored_start.find(text) {
        println!("Start anchored find: {}-{}", mat.start(), mat.end());
    }
    
    // Anchored at end: must scan to end
    let text2 = "number is 12345";
    println!("End anchored is_match: {}", anchored_end.is_match(text2));
    
    // Unanchored: is_match can exit early
    // find must compute full match bounds
    println!("Unanchored is_match: {}", unanchored.is_match(text));
    if let Some(mat) = unanchored.find(text) {
        println!("Unanchored find: {}-{}", mat.start(), mat.end());
    }
}

Anchored patterns reduce the advantage of is_match over find.

Use Case Decision Tree

use regex::Regex;
 
fn main() {
    // Decision: is_match vs find vs captures
    
    // 1. Do you need to know IF pattern exists?
    //    -> Use is_match
    
    // 2. Do you need to know WHERE pattern exists?
    //    -> Use find
    
    // 3. Do you need to extract parts of the match?
    //    -> Use captures
    
    // 4. Do you need all matches?
    //    -> Use find_iter or captures_iter
    
    // Example decision flow:
    
    // Case A: Filter spam - just need yes/no
    let spam_indicator = Regex::new(r"(buy now|click here|free)").unwrap();
    // Closures are used because nested fn items cannot borrow local variables
    let is_spam = |text: &str| spam_indicator.is_match(text); // is_match sufficient
    
    // Case B: Highlight matches - need positions
    let highlight_pattern = Regex::new(r"\b\w{4,}\b").unwrap();
    let highlight_long_words = |text: &str| -> String {
        // replace_all locates each match internally; $0 is the whole match
        highlight_pattern.replace_all(text, "**$0**").into_owned()
    };
    
    // Case C: Parse structured data - need groups
    let kv_pattern = Regex::new(r"(\w+)=(\w+)").unwrap();
    let parse_key_values = |text: &str| -> Vec<(String, String)> {
        kv_pattern.captures_iter(text)
            .map(|caps| (caps[1].to_string(), caps[2].to_string()))
            .collect()
    };
    
    // Case D: Count occurrences - needs every match, so use find_iter
    let word_pattern = Regex::new(r"\b\w+\b").unwrap();
    let count_words = |text: &str| word_pattern.find_iter(text).count();
    
    // But if you only need an existence check, is_match is cleaner
    let has_any_words = |text: &str| word_pattern.is_match(text);
    
    println!("{}", is_spam("click here to win"));
    println!("{}", highlight_long_words("some long words"));
    println!("{:?}", parse_key_values("a=1 b=2"));
    println!("{} {}", count_words("one two three"), has_any_words("one"));
}

Choose based on what information you need from the match.

Benchmark Patterns

use regex::Regex;
use std::time::Instant;
 
fn main() {
    let patterns = [
        (r"\d{4}-\d{2}-\d{2}", "Date pattern"),
        (r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", "Email pattern"),
        (r"https?://[^\s]+", "URL pattern"),
        (r"\b\w{10,}\b", "Long word"),
    ];
    
    let text = "Contact john.doe+test@example.com on 2024-01-15 at https://example.com/path/to/page for more info about internationalization";
    
    for (pat, name) in patterns {
        let regex = Regex::new(pat).unwrap();
        
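        // black_box keeps the optimizer from discarding the unused results
        let start = Instant::now();
        for _ in 0..100_000 {
            std::hint::black_box(regex.is_match(std::hint::black_box(text)));
        }
        let is_match_time = start.elapsed();
        
        let start = Instant::now();
        for _ in 0..100_000 {
            std::hint::black_box(regex.find(std::hint::black_box(text)));
        }
        let find_time = start.elapsed();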
        
        println!("{}", name);
        println!("  is_match: {:?}", is_match_time);
        println!("  find: {:?}", find_time);
        println!("  ratio: {:.2}x", find_time.as_secs_f64() / is_match_time.as_secs_f64());
    }
}

Benchmarks show the actual performance difference for various patterns.

Synthesis

Method comparison:

Method	Returns	Use Case
`is_match`	`bool`	Existence check only
`find`	`Option<Match>`	Single match with position
`find_iter`	`Iterator<Item = Match>`	All matches with positions
`captures`	`Option<Captures>`	Groups extraction
`captures_iter`	`Iterator<Item = Captures>`	All matches with groups

When to use each:

Scenario	Best Method
Validation (yes/no)	`is_match`
Feature detection	`is_match`
First match location	`find`
All match locations	`find_iter`
Extract matched text	`find` / `find_iter`
Extract named parts	`captures` / `captures_iter`
Replace with references	`replace` / `replace_all` with `$1`-style group references

Performance considerations:

Factor	`is_match`	`find`
Return overhead	`bool` (minimal)	`Match` value (small, not heap-allocated)
Match bounds	Not computed	Computed
Early termination	Possible	Limited
Anchored patterns	Similar work	Similar work
Unanchored patterns	May skip bounds	Must compute bounds

Key insight: The trade-off between is_match and find is primarily about information versus simplicity. is_match answers "does this exist?" with a plain boolean. find answers "where does this exist?" by returning match bounds that enable content extraction, at the cost of computing those bounds; the Match it returns is a small value and is not heap-allocated. The performance difference is typically small for simple patterns, but it can matter in tight loops or when match bounds are expensive to compute (long, variable-length matches), so measure with your own patterns and inputs. The more important consideration is semantic: use is_match when you're implementing boolean logic (validation, filtering, conditional branching), and use find when match positions enable downstream processing (extraction, highlighting, replacement, or position-aware analysis). For capturing groups, neither method suffices; captures provides the necessary group extraction. Both methods share the same matching logic; is_match simply stops earlier in cases where computing match bounds would add unnecessary work.
