What are the implications of using regex::Regex::new vs regex::RegexBuilder for configuring regex compilation options?
regex::Regex::new provides a simple, one-line interface for compiling regex patterns with default options, while regex::RegexBuilder offers fine-grained control over compilation settings, including case sensitivity, multiline mode, Unicode handling, and size limits on the compiled regex. Neither offers an execution time limit: the regex crate guarantees worst-case search time that is linear in the haystack length for a given pattern, so there is no catastrophic backtracking to guard against.
The choice between them depends on whether you need the default behavior or require customization for performance, security, or feature requirements. Regex::new is ideal for simple patterns and quick prototyping where default settings work well. RegexBuilder becomes necessary when you need to tune compilation for large patterns, untrusted patterns, or specific matching behaviors like Unicode handling or line-oriented matching.
Basic Regex::new Usage
use regex::Regex;
fn main() {
// Simple pattern with default options
let re = Regex::new(r"\d{4}").unwrap();
let text = "Year: 2024";
if let Some(m) = re.find(text) {
println!("Found: {}", m.as_str());
}
// Default behavior:
// - Case sensitive
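// - Single line mode (^ and $ match only at the start/end of the haystack)
// - Unicode enabled
// - Default size limit applies (roughly 10 MB for the compiled regex)
// - No time limit option exists; search time is linear in the haystack length
}
Regex::new uses all default settings; the only customization available is through inline flags in the pattern.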
Basic RegexBuilder Usage
use regex::RegexBuilder;
fn main() {
// Same pattern with explicit configuration
let re = RegexBuilder::new(r"\d{4}")
.build()
.unwrap();
// Now with case insensitive matching
let re_casei = RegexBuilder::new(r"hello")
.case_insensitive(true)
.build()
.unwrap();
assert!(re_casei.is_match("HELLO"));
assert!(re_casei.is_match("hello"));
assert!(re_casei.is_match("HeLLo"));
}
RegexBuilder chains configuration methods before building.
Case Sensitivity Control
use regex::{Regex, RegexBuilder};
fn main() {
// Regex::new: case sensitive by default
let re_default = Regex::new(r"hello").unwrap();
assert!(!re_default.is_match("HELLO"));
assert!(re_default.is_match("hello"));
// Regex::new with inline flag
let re_inline = Regex::new(r"(?i)hello").unwrap();
assert!(re_inline.is_match("HELLO"));
assert!(re_inline.is_match("hello"));
// RegexBuilder: explicit case insensitive setting
let re_builder = RegexBuilder::new(r"hello")
.case_insensitive(true)
.build()
.unwrap();
assert!(re_builder.is_match("HELLO"));
assert!(re_builder.is_match("hello"));
// Builder setting is equivalent to inline flag
// but more explicit in configuration code
}
Both inline flags and RegexBuilder can set case insensitivity.
Multiline and Dotall Modes
use regex::{Regex, RegexBuilder};
fn main() {
let text = "line1\nline2\nline3";
// Single line mode (default): ^ and $ match start/end of string
let re_single = Regex::new(r"^line2$").unwrap();
assert!(!re_single.is_match(text)); // No match: ^line2$ doesn't match entire string
// Multiline mode: ^ and $ match start/end of lines
let re_multi = RegexBuilder::new(r"^line2$")
.multi_line(true)
.build()
.unwrap();
assert!(re_multi.is_match(text)); // Matches line2 on its own line
// Dotall mode: . matches newlines
let re_dot = RegexBuilder::new(r"line1.line2")
.dot_matches_new_line(true)
.build()
.unwrap();
assert!(re_dot.is_match(text)); // . now matches \n
// Combining modes
let re_combined = RegexBuilder::new(r"^line1.*line3$")
.multi_line(true)
.dot_matches_new_line(true)
.build()
.unwrap();
assert!(re_combined.is_match(text));
}
multi_line and dot_matches_new_line change anchor and dot behavior.
Size Limits for Security
use regex::{Regex, RegexBuilder};
fn main() {
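// Default size limit: roughly 10 MB for the compiled regex
// This bounds memory use when a pattern expands into a large program
// This pattern is small, so it should compile under the default limit;
// patterns with large counted repetitions or many Unicode classes may not
let huge_pattern = format!("(a|{})", "a".repeat(100));
// Handle both outcomes rather than assuming success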
match Regex::new(&huge_pattern) {
Ok(_) => println!("Compiled with default limit"),
Err(e) => println!("Failed: {}", e),
}
// Increase size limit for complex patterns
let re_large = RegexBuilder::new(&huge_pattern)
.size_limit(100_000_000) // 100MB
.build();
// Or decrease for strict limits
let re_strict = RegexBuilder::new(r"(a+)+")
.size_limit(1_000) // Very strict
.build();
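// A size limit rejects patterns whose compiled form would
// consume excessive memory; it does not limit search time
}
Size limits bound the memory a compiled pattern may use, which guards against denial-of-service through oversized patterns.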
Execution Time: No Built-In Limit
use regex::RegexBuilder;
fn main() {
    // RegexBuilder has no time_limit method, and none is needed to stop
    // catastrophic backtracking: the regex crate does not use unbounded
    // backtracking. Worst-case search time is O(m * n), where m is
    // proportional to the regex size and n to the haystack length.
    let re = RegexBuilder::new(r"(a+)+b")
        .build()
        .unwrap();
    // (a+)+b is a classic trigger for exponential blowup in backtracking
    // engines; here the search finishes in linear time and finds nothing
    let text = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa";
    match re.find(text) {
        Some(_) => println!("Found match"),
        None => println!("No match"),
    }
    // To bound total work:
    // - User-provided patterns: set size_limit
    // - Unknown/untrusted input: cap the haystack length
    // - Server applications that must respond: enforce deadlines
    //   outside the regex crate (for example, per-request timeouts)
}
The regex crate avoids catastrophic backtracking by design; bound search cost through pattern size limits and input length rather than a time limit.
Unicode Handling
use regex::{Regex, RegexBuilder};
fn main() {
// Default: Unicode enabled
let re_unicode = Regex::new(r"\w+").unwrap();
assert!(re_unicode.is_match("日本語")); // Japanese matches \w
// Disable Unicode for ASCII-only matching
let re_ascii = RegexBuilder::new(r"\w+")
.unicode(false)
.build()
.unwrap();
assert!(!re_ascii.is_match("日本語")); // No match
assert!(re_ascii.is_match("hello")); // ASCII matches
// Unicode classes
let re_class = RegexBuilder::new(r"\p{Script=Hiragana}+")
.unicode(true)
.build()
.unwrap();
assert!(re_class.is_match("ひらがな"));
// Matching on arbitrary bytes (&[u8]) uses the separate regex::bytes API
let re_bytes = regex::bytes::RegexBuilder::new(r"\w+")
.unicode(false)
.build()
.unwrap();
assert!(re_bytes.is_match(b"hello"));
}
Unicode settings affect \w, \d, \s, \b, and other character classes.
Engine and DFA Cache Settings
use regex::RegexBuilder;
fn main() {
    // The crate chooses among several internal engines automatically
    // (for example a lazy DFA and a PikeVM); RegexBuilder has no
    // nfa_size_limit option. The defaults are good for most use cases.
    // For very large regexes, tune the lazy DFA's cache size
    let re = RegexBuilder::new(r"[a-z]+\d+[a-z]+")
        .dfa_size_limit(10_000_000) // Approximate DFA cache size in bytes
        .build()
        .unwrap();
    assert!(re.is_match("abc123def"));
    // For patterns with many alternations
    let re_alts = RegexBuilder::new(&format!("{}|{}|{}",
        "pattern1", "pattern2", "pattern3"))
        .build()
        .unwrap();
    assert!(re_alts.is_match("pattern2"));
    // size_limit bounds the compiled regex;
    // dfa_size_limit bounds the cache the lazy DFA uses while searching
}
size_limit and dfa_size_limit together control memory use for complex patterns.
Caching Behavior
use regex::{Regex, RegexBuilder};
fn main() {
// RegexBuilder doesn't cache by default
// Each .build() creates a new compiled regex
let re1 = RegexBuilder::new(r"\d+")
.build()
.unwrap();
let re2 = RegexBuilder::new(r"\d+")
.case_insensitive(true) // Different config
.build()
.unwrap();
// These are separate compiled patterns
// No sharing even though pattern is same
// If you need to reuse, store the compiled Regex:
let digit_matcher = Regex::new(r"\d+").unwrap();
// Reuse digit_matcher throughout your application
}
Each .build() call compiles a new regex; cache at the application level.
When to Use Each Approach
use regex::{Regex, RegexBuilder};
fn main() {
// USE Regex::new WHEN:
// - Default behavior is correct
// - Pattern is simple and trusted
// - Quick prototyping or one-off matching
let re_simple = Regex::new(r"\d{4}-\d{2}-\d{2}").unwrap();
// USE RegexBuilder WHEN:
// - Need case insensitive matching without editing the pattern
// - Processing untrusted patterns (size limits)
// - Need to tune Unicode behavior
// - Want explicit configuration in code
let re_configured = RegexBuilder::new(r"user:(\w+)")
.case_insensitive(true)
.unicode(true)
.build()
.unwrap();
// SECURITY CONTEXT EXAMPLE
// User-provided pattern with a size limit
// (RegexBuilder has no time_limit option)
fn compile_user_pattern(pattern: &str) -> Result<Regex, regex::Error> {
RegexBuilder::new(pattern)
.size_limit(10_000) // Small limit; even a Unicode-aware \w exceeds it
.build()
}
}
Choose based on control needs and the trust level of the pattern.
Inline Flags vs Builder Options
use regex::{Regex, RegexBuilder};
fn main() {
// Inline flags: fixed in the pattern string itself
let re_inline = Regex::new(r"(?i)(?m)^hello$").unwrap();
// Changing a flag means editing the pattern
// RegexBuilder: options supplied as ordinary values at runtime
let case_insensitive = true;
let multiline = true;
let re_builder = RegexBuilder::new(r"^hello$")
.case_insensitive(case_insensitive)
.multi_line(multiline)
.build()
.unwrap();
// Runtime configuration is useful when:
// - Settings come from config files
// - Flags are determined by user preferences
// - You want to separate pattern from options
}
RegexBuilder separates pattern from configuration for cleaner code.
Error Handling Differences
use regex::{Regex, RegexBuilder};
fn main() {
// Both return Result<Regex, Error>
// Error types are the same
match Regex::new(r"[invalid") {
Ok(re) => println!("Compiled: {:?}", re),
Err(e) => println!("Error: {}", e),
}
match RegexBuilder::new(r"[invalid").build() {
Ok(re) => println!("Compiled: {:?}", re),
Err(e) => println!("Error: {}", e),
}
// Both provide the same error information
// Either can fail with a size limit violation (Error::CompiledTooBig);
// there is no time limit error because no time limit option exists
// Pattern with size limit exceeded
match RegexBuilder::new(r"(a+)+b")
.size_limit(100) // Very small
.build()
{
Ok(_) => println!("Compiled"),
Err(e) => println!("Size limit exceeded: {}", e),
}
}
Error handling is identical; both use regex::Error.
Complete Configuration Example
use regex::RegexBuilder;
use std::time::Duration;
fn main() {
let pattern = r"\b\w+@\w+\.\w+\b"; // Email-like pattern
let re = RegexBuilder::new(pattern)
// Matching behavior
.case_insensitive(true) // (?i)
.multi_line(false) // (?m) off
.dot_matches_new_line(false) // (?s) off
.crlf(false) // (?R) off: only \n ends a line (regex 1.9+)
.line_terminator(b'\n') // Line terminator byte, takes a u8 (regex 1.9+)
// Unicode
.unicode(true) // Enable Unicode
// Performance/Security limits
.size_limit(10_000_000) // Max compiled size in bytes
// Advanced
.nest_limit(250) // Nesting depth limit
.octal(false) // Octal escapes off
.build()
.unwrap();
let text = "Contact: user@Example.COM";
if let Some(m) = re.find(text) {
println!("Found: {}", m.as_str());
}
}
RegexBuilder exposes the regex crate's compilation options in a chainable interface. It has no utf8 or time_limit method; to match on input that may not be valid UTF-8, use regex::bytes::RegexBuilder.
Performance Implications
use regex::{Regex, RegexBuilder};
use std::time::Instant;
fn main() {
let text = "a".repeat(1000);
// Compile time: Regex::new vs RegexBuilder::new().build()
// Essentially identical (both compile pattern)
let start = Instant::now();
for _ in 0..100 {
let _ = Regex::new(r"\d+").unwrap();
}
let new_time = start.elapsed();
let start = Instant::now();
for _ in 0..100 {
let _ = RegexBuilder::new(r"\d+").build().unwrap();
}
let builder_time = start.elapsed();
println!("Regex::new: {:?}", new_time);
println!("RegexBuilder: {:?}", builder_time);
// Times are similar - both compile the regex
// MATCH time is affected by configuration, not method used
// case_insensitive(true) may be slightly slower
// unicode(false) shrinks classes like \w and can speed up
// matching when the input is ASCII-only
}
Compile time is similar; match time depends on configuration settings.
Synthesis
Core distinction:
- Regex::new(pattern): compile with all defaults, one-line call
- RegexBuilder::new(pattern).option(value).build(): configure before compilation
Default settings in both:
- Case sensitive
- Single line mode (^/$ match string start/end)
- Unicode enabled
- Default size limit of roughly 10 MB for the compiled regex; no time limit option
- Standard error handling
When Regex::new is sufficient:
- Default behavior is correct
- Pattern is trusted (not user-provided)
- Input is trusted (no malicious strings)
- Quick one-off pattern matching
When RegexBuilder is necessary:
- Case insensitivity without inline flags
- Multiline or dotall mode configuration
- Size limits for security (DoS protection against untrusted patterns)
- Unicode tuning for ASCII-only matching
- Configuration from external sources (config files, user settings)
- Explicit control in production code
Security implications:
- Default settings have reasonable limits for most cases
- Increase limits for complex legitimate patterns
- Decrease limits for untrusted patterns
- No time limit exists or is needed against catastrophic backtracking: search time is linear in the haystack length
- Size limits prevent memory exhaustion
Key insight: The choice is mostly about control. Inline flags like (?i) can achieve the matching-behavior settings through Regex::new, while limits such as size_limit are only available through RegexBuilder. RegexBuilder is valuable when configuration should be separate from the pattern (cleaner code), determined at runtime (config-driven), or needs explicit size limits (untrusted patterns). The performance overhead of using RegexBuilder is negligible; with the same options, the compiled regex behaves identically. Use Regex::new for simplicity when defaults work, and RegexBuilder when you need explicit, configurable, or bounded compilation settings.
