How does regex::Regex::shortest_match differ from is_match for finding the earliest match position?
shortest_match returns the end position of the earliest (leftmost) match, while is_match returns only a boolean indicating whether any match exists without providing position information. Both methods find the same starting position (the leftmost match), but shortest_match gives you the end position of the shortest possible match at that location, enabling positional analysis that is_match cannot provide.
The is_match Method: Boolean Result Only
use regex::Regex;
fn is_match_example() {
let re = Regex::new(r"\d+").unwrap();
let text = "abc123def456";
// is_match returns true if ANY match exists
let has_match = re.is_match(text);
assert!(has_match); // true, "123" matches
// But you don't know WHERE the match is
// Or what text matched
// Just that something matched
// Common use case: validation
let email_re = Regex::new(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$").unwrap();
let valid = email_re.is_match("user@example.com"); // true
let invalid = email_re.is_match("not-an-email"); // false
}is_match is the simplest check: does the pattern match anywhere? No position, no capture, just yes/no.
The shortest_match Method: Position Information
use regex::Regex;
fn shortest_match_example() {
let re = Regex::new(r"\d+").unwrap();
let text = "abc123def";
// shortest_match returns Option<usize>
// The usize is the END position of the match
let end = re.shortest_match(text);
// The match starts at index 3 (after "abc")
// shortest_match returns index 6 (after "123")
assert_eq!(end, Some(6));
// To get the matched text:
if let Some(end_pos) = re.shortest_match(text) {
// You need to know the start position
// For shortest_match, it's always the first position where match succeeds
// But finding start requires another approach
}
}shortest_match returns the end position of the match, letting you locate where the match occurs.
What "Shortest" Means
use regex::Regex;
fn shortest_meaning() {
let re = Regex::new(r"a+").unwrap();
let text = "aaa bbb";
// The pattern "a+" matches one or more 'a's
// At position 0, possible matches are: "a", "aa", "aaa"
// shortest_match finds the shortest match at the leftmost position
let end = re.shortest_match(text);
assert_eq!(end, Some(1)); // End position after "a" (shortest match)
// Compare to find, which finds the longest match by default:
let longest = re.find(text);
assert_eq!(longest.unwrap().end(), 3); // End position after "aaa" (longest)
}shortest_match finds the shortest possible match at the leftmost position, while find returns the longest match.
Greedy vs Non-Greedy Behavior
use regex::Regex;
fn greedy_comparison() {
let text = "hello world";
// Greedy pattern: matches as much as possible
let greedy = Regex::new(r".+").unwrap();
let greedy_match = greedy.find(text);
assert_eq!(greedy_match.unwrap().range(), 0..11); // Matches all
// shortest_match with greedy pattern still finds shortest at that position
let greedy_shortest = greedy.shortest_match(text);
assert_eq!(greedy_shortest, Some(1)); // Just "h"
// Non-greedy pattern: matches as little as possible
let non_greedy = Regex::new(r".+?").unwrap();
let non_greedy_match = non_greedy.find(text);
assert_eq!(non_greedy_match.unwrap().range(), 0..1); // Just "h"
// shortest_match with non-greedy gives same result
let non_greedy_shortest = non_greedy.shortest_match(text);
assert_eq!(non_greedy_shortest, Some(1)); // Just "h"
}shortest_match effectively gives you non-greedy behavior regardless of the pattern's greediness.
Finding the Match Start Position
use regex::Regex;
fn find_start_position() {
let re = Regex::new(r"\d+").unwrap();
let text = "abc123def";
// shortest_match only gives end position
// To find start, use find or captures
let match_end = re.shortest_match(text); // Some(6)
// Using find for full position info:
if let Some(m) = re.find(text) {
println!("Match: {}..{}", m.start(), m.end()); // 3..6
println!("Text: {}", m.as_str()); // "123"
}
// shortest_match is useful when:
// 1. You only need to know WHERE the match ends
// 2. You want to skip ahead in the text
// 3. You want the shortest match for efficiency
}If you need the start position, use find() instead of shortest_match().
Use Case: Efficient Text Skipping
use regex::Regex;
fn skip_ahead() {
// When processing large text, you might want to skip past matches
let re = Regex::new(r"\s+").unwrap(); // One or more whitespace
let text = "word1 word2\t\tword3";
// Find where whitespace ends
if let Some(end) = re.shortest_match(text) {
// Skip to after first whitespace match
let remaining = &text[end..];
println!("Remaining: {}", remaining); // " word2\t\tword3"
// Note: shortest_match found shortest at leftmost
// For skipping all whitespace, you'd need to iterate
}
}shortest_match is efficient when you only need to know where the match ends to skip ahead.
Use Case: Checking Match Location
use regex::Regex;
fn match_location() {
let re = Regex::new(r"error|warning").unwrap();
let text = "Build completed with error: missing semicolon";
// is_match: Just check if error exists
if re.is_match(text) {
println!("Contains error or warning");
// But WHERE is it?
}
// shortest_match: Find where match ends
if let Some(end) = re.shortest_match(text) {
// Now we know the match ends at position X
// We can extract context around it
println!("Match ends at position: {}", end); // 22
// Extract text before match end
let context = &text[..end];
println!("Context: {}", context); // "Build completed with error"
}
// For full position info, use find:
if let Some(m) = re.find(text) {
println!("Match: {}..{}", m.start(), m.end());
println!("Matched text: {}", m.as_str()); // "error"
}
}shortest_match provides positional context that is_match cannot.
Performance Characteristics
use regex::Regex;
fn performance() {
// is_match: Stops at first match found
// Fastest for boolean check
// shortest_match: Finds match, then finds shortest
// Slightly more work than is_match, but still optimized
// find: Finds match and its complete span
// More work than shortest_match for greedy patterns
let re = Regex::new(r"a+b+c+d+e+f+").unwrap();
let text = "aaaabbbcccdddeeefff";
// All three find the same starting position (0)
// But:
// - is_match: Returns true immediately after finding match
// - shortest_match: Finds "a" at start, returns end position 1
// - find: Finds entire "aaaabbbcccdddeeefff", returns end position 18
let _ = re.is_match(text); // Fastest
let _ = re.shortest_match(text); // Medium (returns 1)
let _ = re.find(text); // Slowest (returns full match)
}Performance hierarchy: is_match < shortest_match < find (fastest to slowest).
Behavior with Alternation Patterns
use regex::Regex;
fn alternation_patterns() {
// Alternation: match first alternative that succeeds
let re = Regex::new(r"abc|a").unwrap();
let text = "abcdef";
// Leftmost position where ANY alternative matches is 0
// At position 0, which alternatives match?
// - "abc" matches "abc"
// - "a" matches "a"
// is_match: true (something matches)
assert!(re.is_match(text));
// shortest_match: end of shortest match at leftmost position
let end = re.shortest_match(text);
// Both alternatives start at 0, but "a" is shorter
assert_eq!(end, Some(1)); // End of "a"
// find: longest match at leftmost position
let m = re.find(text).unwrap();
// "abc" is longer than "a"
assert_eq!(m.end(), 3); // End of "abc"
}With alternation, shortest_match finds the shortest alternative that matches, while find returns the longest.
Early Termination Benefit
use regex::Regex;
fn early_termination() {
// Complex pattern with many possibilities
let re = Regex::new(r"[0-9]+|[a-z]+").unwrap();
let text = "123abc456def";
// shortest_match at position 0:
// - "[0-9]+" matches "123"
// - "[a-z]+" doesn't match at position 0
// Shortest match from "[0-9]+" is "1"
let end = re.shortest_match(text);
assert_eq!(end, Some(1)); // Just "1"
// For validation purposes, this is often sufficient:
// "Does the text start with a number or letter?"
// is_match tells you yes/no
// shortest_match tells you yes/no AND where it ends
}shortest_match can terminate earlier than find for patterns that match long strings.
Shortest Match for Validation with Position
use regex::Regex;
fn validation_with_position() {
// Validate that text starts with expected pattern
let id_re = Regex::new(r"^[A-Z]{2}[0-9]{6}").unwrap();
let text = "AB123456 rest of text";
if id_re.is_match(text) {
// Valid ID format
// But where does the ID end?
}
if let Some(end) = id_re.shortest_match(text) {
// ID ends at position 8
// Now we can parse the rest
let rest = &text[end..];
println!("ID ends at: {}, rest: '{}'", end, rest);
}
// Note: For this pattern, shortest and longest match are the same
// because the pattern is fixed length
}When you need both validation and positional info, shortest_match is more efficient than find.
Iterating with shortest_match
use regex::Regex;
fn iterate_matches() {
let re = Regex::new(r"[a-z]+").unwrap();
let text = "abc123def456ghi";
// shortest_match finds first match's end
let mut pos = 0;
let text_str = text;
while pos < text_str.len() {
if let Some(end) = re.shortest_match(&text_str[pos..]) {
// Found match ending at 'end' relative to slice
println!("Match ends at: {}", pos + end);
pos += end; // Move past this match
} else {
break;
}
// This approach has issues - need actual match start
// Better to use find_iter for iteration
}
// For iteration, use find_iter:
for m in re.find_iter(text_str) {
println!("Match: {}..{} = '{}'", m.start(), m.end(), m.as_str());
}
}For iterating matches, find_iter is more appropriate than shortest_match.
Comparison Summary
use regex::Regex;
fn comparison_summary() {
let re = Regex::new(r"[a-z]+").unwrap();
let text = "abc123";
// is_match: Boolean, no position
let has_match: bool = re.is_match(text);
// Returns: true
// Use case: Validation, filtering
// shortest_match: End position of shortest match
let end: Option<usize> = re.shortest_match(text);
// Returns: Some(1) (shortest match "a" ends at 1)
// Use case: Find where match ends, early termination
// find: Full match span
let m = re.find(text);
// Returns: Some(Match { start: 0, end: 3 })
// Use case: Extract matched text, full position info
// find_iter: All matches (longest at each position)
// captures: Capture groups
// captures_iter: All captures
}| Method | Returns | Position Info | Use Case |
|---|---|---|---|
is_match |
bool |
None | Validation |
shortest_match |
Option<usize> |
End only | Position + early termination |
find |
Option<Match> |
Start + End | Extract matched text |
Synthesis
Quick reference:
use regex::Regex;
fn quick_reference() {
let re = Regex::new(r"\w+").unwrap();
let text = "hello world";
// is_match: Just check existence
if re.is_match(text) {
println!("Text contains word character(s)");
}
// shortest_match: Find where shortest match ends
if let Some(end) = re.shortest_match(text) {
println!("Shortest match ends at position: {}", end); // 1
}
// find: Get complete match info
if let Some(m) = re.find(text) {
println!("Match: {}..{} = '{}'", m.start(), m.end(), m.as_str());
// 0..5 = 'hello' (longest match at leftmost position)
}
}Key insight: is_match and shortest_match both find matches at the same leftmost starting position, but they differ fundamentally in what they return: is_match provides a boolean answer sufficient for validation checks, while shortest_match returns the end position of the shortest possible match at that location. The "shortest" aspect is about match length, not position—shortest_match still finds the leftmost match location, but it returns the minimum end position rather than the maximum. This is particularly useful for patterns with quantifiers like + or *: the pattern a+ could match "a", "aa", or "aaa" at position 0, and shortest_match returns the end after just "a" while find returns the end after matching all consecutive 'a's. Use is_match when you only need existence checking, shortest_match when you need to know where the match ends (especially for skipping ahead), and find when you need the full match span including the matched text.
