What are the trade-offs between regex::Regex::is_match and find for simple pattern detection?
regex::Regex::is_match and find serve different purposes in the regex API: is_match performs a boolean existence check that can stop as soon as a match is confirmed, while find returns the byte offsets of the first match, enabling extraction of the matched content. The trade-off is between simplicity and capability: is_match is simpler and slightly faster when you only need to know whether a pattern exists, but find provides match locations that is_match cannot supply. Both methods use the same underlying regex engine, but is_match can exit earlier in certain cases because it doesn't need to compute the full match bounds.
Basic is_match Usage
use regex::Regex;
fn main() {
let pattern = Regex::new(r"\d{4}").unwrap();
// is_match returns bool only
let text = "The year is 2024";
if pattern.is_match(text) {
println!("Found a 4-digit number");
} else {
println!("No 4-digit number found");
}
// Simple existence check
let has_email = Regex::new(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}").unwrap();
let text = "Contact: user@example.com";
if has_email.is_match(text) {
println!("Text contains an email");
}
}
is_match provides a simple boolean answer about pattern existence.
Basic find Usage
use regex::Regex;
fn main() {
let pattern = Regex::new(r"\d{4}").unwrap();
let text = "The years 2023 and 2024";
// find returns Option<Match> with position
if let Some(mat) = pattern.find(text) {
println!("Found at {}-{}: {}", mat.start(), mat.end(), &text[mat.start()..mat.end()]);
}
// Iterate over all matches
let numbers = Regex::new(r"\d+").unwrap();
let text = "Numbers: 42, 100, 7";
for mat in numbers.find_iter(text) {
println!("Found '{}' at {}-{}", &text[mat.start()..mat.end()], mat.start(), mat.end());
}
}
find returns match positions, enabling content extraction.
Return Type Comparison
use regex::Regex;
fn main() {
let pattern = Regex::new(r"hello").unwrap();
let text = "hello world";
// is_match: bool
let exists: bool = pattern.is_match(text);
println!("is_match returns: {}", exists);
// find: Option<Match>
let location: Option<regex::Match> = pattern.find(text);
match location {
Some(mat) => {
println!("find returns: Match({}:{})", mat.start(), mat.end());
println!("Matched text: '{}'", &text[mat.start()..mat.end()]);
}
None => println!("find returns: None"),
}
// Key difference in return types:
// is_match: bool
// find: Option<regex::Match>
// find_iter: regex::Matches, an Iterator<Item = Match>
}
The return types encode what information is available after the operation.
Performance Comparison
use regex::Regex;
use std::time::Instant;
fn main() {
let pattern = Regex::new(r"\d{4}-\d{2}-\d{2}").unwrap();
let text = "Dates: 2024-01-15, 2024-02-20, 2024-03-25, 2024-04-30";
const ITERATIONS: usize = 100_000;
// Benchmark is_match
let start = Instant::now();
for _ in 0..ITERATIONS {
let _ = pattern.is_match(text);
}
let is_match_time = start.elapsed();
// Benchmark find
let start = Instant::now();
for _ in 0..ITERATIONS {
let _ = pattern.find(text);
}
let find_time = start.elapsed();
// Benchmark find_iter
let start = Instant::now();
for _ in 0..ITERATIONS {
let _ = pattern.find_iter(text).count();
}
let find_iter_time = start.elapsed();
println!("is_match: {:?}", is_match_time);
println!("find: {:?}", find_time);
println!("find_iter: {:?}", find_iter_time);
println!("find/is_match ratio: {:.2}x",
find_time.as_secs_f64() / is_match_time.as_secs_f64());
}
is_match can be slightly faster when you don't need match positions.
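A timing loop that discards its result with `let _ = ...` gives the optimizer room to remove or reorder work, which can skew a comparison this fine-grained. The sketch below shows one way to guard against that with `std::hint::black_box`. The helper name `time_it` is hypothetical, and the workloads are std-only stand-ins (`str::contains` and `str::find`) so the example compiles without any crates; it illustrates the measuring technique, not the regex crate's actual timings.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: run `f` `iterations` times and return the elapsed time.
/// `black_box` marks each result as used, so the call is less likely to be
/// optimized away.
fn time_it<R>(iterations: usize, mut f: impl FnMut() -> R) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(f());
    }
    start.elapsed()
}

fn main() {
    let text = "Dates: 2024-01-15, 2024-02-20";

    // Stand-in workloads using only std: an existence check and a position search.
    let exists = time_it(100_000, || black_box(text).contains("2024"));
    let locate = time_it(100_000, || black_box(text).find("2024"));

    println!("contains: {exists:?}");
    println!("find:     {locate:?}");
}
```

To compare the regex methods themselves, pass `|| pattern.is_match(text)` and `|| pattern.find(text)` as the closures. For numbers you intend to rely on, a dedicated benchmarking harness with warm-up and statistical reporting is a better fit than a hand-rolled loop.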
Early Termination Optimization
use regex::Regex;
fn main() {
// is_match can terminate as soon as a match is confirmed
// find must determine the full match bounds
let pattern = Regex::new(r"a+").unwrap();
let text = "aaaaa"; // Five 'a' characters
// For is_match:
// - Engine finds 'a' at position 0
// - Returns true immediately
// - May not examine all 'a's
// For find:
// - Engine finds 'a' at position 0
// - Must find end of match (position 5)
// - Returns Match(0, 5)
println!("is_match: {}", pattern.is_match(text));
if let Some(mat) = pattern.find(text) {
println!("find: {}-{}", mat.start(), mat.end());
}
// The difference is more pronounced with:
// 1. Variable-length patterns (a+, a*)
// 2. Long matches
// 3. Complex patterns with alternations
}
is_match can skip computing match bounds in some regex engines.
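To make the early-exit idea concrete, here is a hand-written scanner for the single pattern `a+` that counts how many bytes each style of search examines. The functions `exists_a_plus` and `find_a_plus` are hypothetical and exist only for illustration; this is not how the regex crate is implemented, only a sketch of why an existence check can stop sooner than a bounds search.

```rust
/// Existence check for `a+`: stop at the first 'a'.
/// Returns (found, number of bytes examined).
fn exists_a_plus(haystack: &str) -> (bool, usize) {
    for (i, b) in haystack.bytes().enumerate() {
        if b == b'a' {
            // A single 'a' already satisfies `a+`, so there is no need to go on.
            return (true, i + 1);
        }
    }
    (false, haystack.len())
}

/// Bounds search for `a+`: find the first 'a', then walk to the end of the run.
/// Returns (match bounds, number of bytes examined).
fn find_a_plus(haystack: &str) -> (Option<(usize, usize)>, usize) {
    let bytes = haystack.as_bytes();
    let Some(start) = bytes.iter().position(|&b| b == b'a') else {
        return (None, bytes.len());
    };
    let mut end = start;
    while end < bytes.len() && bytes[end] == b'a' {
        end += 1;
    }
    // Everything up to `end` was examined, plus the byte that ended the run, if any.
    let examined = if end < bytes.len() { end + 1 } else { end };
    (Some((start, end)), examined)
}

fn main() {
    let text = "xyz aaaaa!";
    let (found, seen_by_exists) = exists_a_plus(text);
    let (bounds, seen_by_find) = find_a_plus(text);
    println!("exists: {found}, bytes examined: {seen_by_exists}");
    println!("find: {bounds:?}, bytes examined: {seen_by_find}");
}
```

The gap between the two counts grows with the length of the match, which is why long or variable-length matches are where an existence check tends to save the most work.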
When is_match is Sufficient
use regex::Regex;
fn main() {
// Validation: only need yes/no
let email_pattern = Regex::new(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$").unwrap();
// A nested fn cannot capture local variables, so the helper is a closure
let is_valid_email = |email: &str| -> bool {
email_pattern.is_match(email)
};
println!("Is valid: {}", is_valid_email("user@example.com"));
println!("Is valid: {}", is_valid_email("invalid"));
// Feature detection: only need existence
let has_uppercase = Regex::new(r"[A-Z]").unwrap();
let has_digit = Regex::new(r"\d").unwrap();
let has_special = Regex::new(r"[!@#$%^&*]").unwrap();
// Closure so that it can borrow the three patterns defined above
let check_password_strength = |password: &str| -> usize {
let mut strength = 0;
if has_uppercase.is_match(password) { strength += 1; }
if has_digit.is_match(password) { strength += 1; }
if has_special.is_match(password) { strength += 1; }
strength
};
println!("Strength: {}", check_password_strength("Pass123!"));
// Content filtering: only need to detect
let spam_patterns = [
Regex::new(r"buy now").unwrap(),
Regex::new(r"click here").unwrap(),
Regex::new(r"free money").unwrap(),
];
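// Closure so that it can borrow spam_patterns
let is_spam = |text: &str| -> bool {
spam_patterns.iter().any(|p| p.is_match(text))
};
println!("Is spam: {}", is_spam("click here for free money"));
}
Use is_match when match location adds no value.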
When find is Necessary
use regex::Regex;
fn main() {
// Extraction: need the matched content
let url_pattern = Regex::new(r"https?://[^\s]+").unwrap();
let text = "Visit https://example.com and http://rust-lang.org";
// Can't do this with is_match
for mat in url_pattern.find_iter(text) {
println!("URL: {}", &text[mat.start()..mat.end()]);
}
// Position-based processing
let log_pattern = Regex::new(r"\[(ERROR|WARN|INFO)\]").unwrap();
let log = "[ERROR] Failed to connect [WARN] Retrying [INFO] Connected";
for mat in log_pattern.find_iter(log) {
let level = &log[mat.start()+1..mat.end()-1];
println!("Log level at position {}: {}", mat.start(), level);
}
// Replacement needs location
let censor_pattern = Regex::new(r"\b(password|secret|key)\b").unwrap();
let text = "The password is secret123 and key is abc";
let censored = censor_pattern.replace_all(text, "***");
println!("Censored: {}", censored);
}
Use find when you need to extract or process matched content.
find_iter for Multiple Matches
use regex::Regex;
fn main() {
let word_pattern = Regex::new(r"\b\w+\b").unwrap();
let text = "The quick brown fox jumps over the lazy dog";
// Count matches
let count = word_pattern.find_iter(text).count();
println!("Word count: {}", count);
// Collect all matches
let words: Vec<&str> = word_pattern
.find_iter(text)
.map(|mat| &text[mat.start()..mat.end()])
.collect();
println!("Words: {:?}", words);
// Process with positions
let long_words: Vec<(usize, usize, &str)> = word_pattern
.find_iter(text)
.filter(|mat| mat.end() - mat.start() > 3)
.map(|mat| (mat.start(), mat.end(), &text[mat.start()..mat.end()]))
.collect();
println!("Long words: {:?}", long_words);
// Find specific match
let first_long = word_pattern
.find_iter(text)
.find(|mat| mat.end() - mat.start() > 4);
if let Some(mat) = first_long {
println!("First long word: '{}'", &text[mat.start()..mat.end()]);
}
}
find_iter enables processing all matches with full position information.
Capturing Groups: find vs captures
use regex::Regex;
fn main() {
let date_pattern = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();
let text = "Date: 2024-01-15";
// find: only gets full match
if let Some(mat) = date_pattern.find(text) {
println!("Full match: {}", &text[mat.start()..mat.end()]);
// Can't access individual groups
}
// captures: gets groups
if let Some(caps) = date_pattern.captures(text) {
println!("Full: {}", &caps[0]);
println!("Year: {}", &caps[1]);
println!("Month: {}", &caps[2]);
println!("Day: {}", &caps[3]);
}
// When you only need groups, captures is required
// When you only need to check existence, is_match is sufficient
// find is the middle ground: full match but no groups
}
For capturing groups, neither is_match nor find is sufficient; use captures.
Memory Allocation
use regex::Regex;
fn main() {
let pattern = Regex::new(r"\d+").unwrap();
let text = "Numbers: 123, 456, 789";
// is_match: returns a plain bool, so there is no match data to build
let exists = pattern.is_match(text);
println!("exists: {}", exists);
// find: returns Option<Match>
let maybe_match = pattern.find(text);
println!("first match: {:?}", maybe_match.map(|m| m.as_str()));
// Match is a small value: a reference to the haystack plus start and end offsets
// It is returned by value, so producing one does not heap-allocate
// find_iter: yields Match values lazily, one per iteration
let matches: Vec<_> = pattern.find_iter(text).collect();
println!("match count: {}", matches.len());
// Only collecting into the Vec allocates; the Match values themselves do not
// In tight loops, is_match saves the work of computing offsets, not an allocation
}
Neither method heap-allocates its result; is_match simply skips computing the match offsets that find has to return.
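The sketch below shows what a match value of this kind looks like in memory. `SketchMatch` and `find_digits` are hypothetical stand-ins written for this example, assuming a match is represented as a borrowed haystack plus two byte offsets; they are not the regex crate's types. The point is that such a value is a few machine words that can be copied freely and borrows the matched text instead of copying it.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a match value: a borrowed haystack plus two
/// byte offsets. Not the regex crate's actual type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SketchMatch<'h> {
    haystack: &'h str,
    start: usize,
    end: usize,
}

impl<'h> SketchMatch<'h> {
    /// Borrow the matched text from the haystack; the text is not copied.
    fn as_str(&self) -> &'h str {
        &self.haystack[self.start..self.end]
    }
}

/// Find the first run of ASCII digits and return it as a borrowed match value.
fn find_digits(haystack: &str) -> Option<SketchMatch<'_>> {
    let bytes = haystack.as_bytes();
    let start = bytes.iter().position(|b| b.is_ascii_digit())?;
    let len = bytes[start..]
        .iter()
        .take_while(|b| b.is_ascii_digit())
        .count();
    Some(SketchMatch { haystack, start, end: start + len })
}

fn main() {
    let text = "Numbers: 123, 456";
    let m = find_digits(text).expect("text contains digits");
    println!("matched {:?} at {}..{}", m.as_str(), m.start, m.end);
    println!("size of the match value: {} bytes", size_of::<SketchMatch>());
    println!("size of bool: {} bytes", size_of::<bool>());
}
```

A value like this is larger than a bool, but it lives on the stack or in registers, so the cost that matters is the work of finding the end offset, not the storage.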
Short-Circuit Behavior
use regex::Regex;
fn main() {
// Patterns that can match at multiple positions
let pattern = Regex::new(r"a+").unwrap();
let text = "aaa bbb aaa";
// is_match: stops at first successful match position
// Returns true immediately after finding "aaa" at position 0
// find: returns the first match with bounds
// Must scan to find the end of "aaa" (position 3)
// For anchored patterns, the difference is minimal
let anchored = Regex::new(r"^a+").unwrap();
// is_match and find do essentially the same work for anchored patterns
// Both must examine from position 0
// For unanchored patterns on long strings:
let long_text = "xyz ".repeat(1000) + "aaa";
// is_match must scan until it finds "aaa" at position 4000
// But it stops as soon as it confirms the match exists
// find must also compute that the match ends at position 4003
}
The performance gap depends on pattern anchoring and match length.
Anchored Pattern Optimization
use regex::Regex;
fn main() {
// Anchored patterns (^ or $) benefit less from is_match
let anchored_start = Regex::new(r"^\d+").unwrap();
let anchored_end = Regex::new(r"\d+$").unwrap();
let unanchored = Regex::new(r"\d+").unwrap();
let text = "12345 is a number";
// Anchored at start: must check position 0
// is_match and find do similar work
println!("Start anchored is_match: {}", anchored_start.is_match(text));
if let Some(mat) = anchored_start.find(text) {
println!("Start anchored find: {}-{}", mat.start(), mat.end());
}
// Anchored at end: must scan to end
let text2 = "number is 12345";
println!("End anchored is_match: {}", anchored_end.is_match(text2));
// Unanchored: is_match can exit early
// find must compute full match bounds
println!("Unanchored is_match: {}", unanchored.is_match(text));
if let Some(mat) = unanchored.find(text) {
println!("Unanchored find: {}-{}", mat.start(), mat.end());
}
}
Anchored patterns reduce the advantage of is_match over find.
Use Case Decision Tree
use regex::Regex;
fn main() {
// Decision: is_match vs find vs captures
// 1. Do you need to know IF pattern exists?
// -> Use is_match
// 2. Do you need to know WHERE pattern exists?
// -> Use find
// 3. Do you need to extract parts of the match?
// -> Use captures
// 4. Do you need all matches?
// -> Use find_iter or captures_iter
// Example decision flow:
// Case A: Filter spam - just need yes/no
let spam_indicator = Regex::new(r"(buy now|click here|free)").unwrap();
// A nested fn cannot capture local variables, so these helpers are closures
let is_spam = |text: &str| -> bool {
spam_indicator.is_match(text) // is_match sufficient
};
// Case B: Highlight matches - need positions
let highlight_pattern = Regex::new(r"\b\w{4,}\b").unwrap();
let highlight_long_words = |text: &str| -> String {
// replace_all locates each match internally and returns a Cow<str>
highlight_pattern.replace_all(text, "**$0**").into_owned()
};
// Case C: Parse structured data - need groups
let kv_pattern = Regex::new(r"(\w+)=(\w+)").unwrap();
let parse_key_values = |text: &str| -> Vec<(String, String)> {
kv_pattern.captures_iter(text)
.map(|caps| (caps[1].to_string(), caps[2].to_string()))
.collect()
};
// Case D: Count occurrences - needs every match, so use find_iter
let word_pattern = Regex::new(r"\b\w+\b").unwrap();
let count_words = |text: &str| -> usize {
word_pattern.find_iter(text).count()
};
// But if you only need an existence check:
let has_any_words = |text: &str| -> bool {
// is_match is cleaner
word_pattern.is_match(text)
};
println!("spam: {}", is_spam("buy now"));
println!("highlighted: {}", highlight_long_words("some long words"));
println!("pairs: {:?}", parse_key_values("a=1 b=2"));
println!("words: {}, any: {}", count_words("one two"), has_any_words(""));
}
Choose based on what information you need from the match.
Benchmark Patterns
use regex::Regex;
use std::time::Instant;
fn main() {
let patterns = [
(r"\d{4}-\d{2}-\d{2}", "Date pattern"),
(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", "Email pattern"),
(r"https?://[^\s]+", "URL pattern"),
(r"\b\w{10,}\b", "Long word"),
];
let text = "Contact john.doe+test@example.com on 2024-01-15 at https://example.com/path/to/page for more info about internationalization";
for (pat, name) in patterns {
let regex = Regex::new(pat).unwrap();
let start = Instant::now();
for _ in 0..100_000 {
let _ = regex.is_match(text);
}
let is_match_time = start.elapsed();
let start = Instant::now();
for _ in 0..100_000 {
let _ = regex.find(text);
}
let find_time = start.elapsed();
println!("{}", name);
println!(" is_match: {:?}", is_match_time);
println!(" find: {:?}", find_time);
println!(" ratio: {:.2}x", find_time.as_secs_f64() / is_match_time.as_secs_f64());
}
}
Benchmarks show the actual performance difference for various patterns.
Synthesis
Method comparison:
| Method | Returns | Use Case |
|---|---|---|
| is_match | bool | Existence check only |
| find | Option<Match> | Single match with position |
| find_iter | Iterator<Item = Match> | All matches with positions |
| captures | Option<Captures> | Group extraction |
| captures_iter | Iterator<Item = Captures> | All matches with groups |
When to use each:
| Scenario | Best Method |
|---|---|
| Validation (yes/no) | is_match |
| Feature detection | is_match |
| First match location | find |
| All match locations | find_iter |
| Extract matched text | find / find_iter |
| Extract named parts | captures / captures_iter |
| Replace with references | replace with captures |
Performance considerations:
| Factor | is_match | find |
|---|---|---|
| Return overhead | bool (minimal) | Match struct |
| Match bounds | Not computed | Computed |
| Early termination | Possible | Limited |
| Anchored patterns | Similar work | Similar work |
| Unanchored patterns | May skip bounds | Must compute bounds |
Key insight: The trade-off between is_match and find is primarily about information density versus simplicity. is_match answers "does this exist?" with minimal overhead: it returns a plain bool and never has to work out where the match starts or ends. find answers "where does this exist?", returning match bounds that enable content extraction, at the cost of computing those bounds. Neither call heap-allocates its result; a Match is a small value holding a reference to the haystack and two offsets. The performance difference is typically small for simple patterns, but it can be meaningful in tight loops or when match bounds are expensive to compute (long or variable-length matches), so measure with your own patterns and inputs. The more important consideration is semantic: use is_match when you're implementing boolean logic (validation, filtering, conditional branching), and use find when match positions enable downstream processing (extraction, highlighting, replacement, or position-aware analysis). For capturing groups, neither method suffices; captures provides the necessary group extraction. Both methods share the same matching logic; is_match simply stops earlier in cases where computing match bounds would add unnecessary work.
