How does criterion::BenchmarkGroup::bench_function isolate benchmarks from external noise?

criterion::BenchmarkGroup::bench_function (like Criterion::bench_function, which most examples below use, it shares the same measurement machinery) isolates benchmarks from external noise through warm-up iterations, repeated sampling, and statistical analysis that accounts for system variability. The framework runs many iterations, collects timing samples, applies statistical methods to detect outliers and estimate true performance, and reports confidence intervals rather than single-point measurements. This approach distinguishes genuine performance differences from random noise caused by CPU frequency scaling, cache state variation, OS scheduling, memory allocator behavior, and other system-level perturbations.

Basic Benchmark Setup

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn fibonacci(n: u64) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}
 
fn bench_fibonacci(c: &mut Criterion) {
    c.bench_function("fib 20", |b| {
        b.iter(|| fibonacci(black_box(20)));
    });
}
 
criterion_group!(benches, bench_fibonacci);
criterion_main!(benches);

bench_function accepts a closure that receives a Bencher for running iterations.
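Since the question names BenchmarkGroup::bench_function specifically, the same benchmark can be expressed through a group; the group form uses the identical statistical machinery, adding only a shared name prefix and per-group configuration:

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn fibonacci(n: u64) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

// BenchmarkGroup::bench_function has the same signature and behavior as
// Criterion::bench_function; results are reported under "fib/20".
fn bench_fibonacci_grouped(c: &mut Criterion) {
    let mut group = c.benchmark_group("fib");
    group.bench_function("20", |b| b.iter(|| fibonacci(black_box(20))));
    group.finish();
}

criterion_group!(benches, bench_fibonacci_grouped);
criterion_main!(benches);
```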

Warm-Up Phase

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn expensive_computation(n: usize) -> usize {
    (0..n).sum()
}
 
fn bench_with_warmup(c: &mut Criterion) {
    // Criterion automatically performs warm-up before measurement,
    // so caches are hot and CPU frequency scaling has settled
    c.bench_function("sum 1000", |b| {
        b.iter(|| expensive_computation(black_box(1000)));
    });
}
 
criterion_group!(benches, bench_with_warmup);
criterion_main!(benches);

Warm-up runs unmeasured iterations before sampling begins, filling caches and letting CPU frequency scaling settle.

Black Box for Preventing Optimization

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn compute(x: u64) -> u64 {
    x * x + x
}
 
fn bench_without_black_box(c: &mut Criterion) {
    // DANGEROUS: Compiler might optimize away the computation
    c.bench_function("compute_naive", |b| {
        b.iter(|| compute(42));
    });
}
 
fn bench_with_black_box(c: &mut Criterion) {
    // CORRECT: black_box prevents optimization
    c.bench_function("compute_correct", |b| {
        b.iter(|| compute(black_box(42)));
    });
}
 
criterion_group!(benches, bench_without_black_box, bench_with_black_box);
criterion_main!(benches);

black_box prevents the compiler from optimizing away computations by hiding values from the optimizer.

Statistical Measurement

use criterion::{criterion_group, criterion_main, Criterion};
 
fn vector_push(n: usize) -> Vec<u64> {
    let mut v = Vec::with_capacity(n);
    for i in 0..n {
        v.push(i as u64);
    }
    v
}
 
fn bench_statistics(c: &mut Criterion) {
    c.bench_function("vector_push", |b| {
        b.iter(|| vector_push(100));
    });
    
    // Criterion automatically:
    // 1. Runs multiple iterations
    // 2. Collects timing samples
    // 3. Calculates statistics
    // 4. Reports confidence intervals
    // 5. Detects outliers
}
 
criterion_group!(benches, bench_statistics);
criterion_main!(benches);

Multiple samples are collected and analyzed statistically, not just timed once.

Iteration Count Selection

use criterion::{criterion_group, criterion_main, Criterion, BatchSize};
 
fn allocation_heavy(n: usize) -> Vec<u8> {
    vec![0u8; n]
}
 
fn bench_batch_size(c: &mut Criterion) {
    // Criterion automatically determines iteration count
    // based on how long each iteration takes
    
    // For fast functions, many iterations per sample
    // For slow functions, fewer iterations per sample
    
    c.bench_function("allocate", |b| {
        b.iter(|| allocation_heavy(1024));
    });
    
    // You can also specify batch size manually:
    c.bench_function("allocate_batch", |b| {
        b.iter_batched(
            || 1024,              // Setup
            |n| allocation_heavy(n), // Routine
            BatchSize::SmallInput,   // Batch size
        );
    });
}
 
criterion_group!(benches, bench_batch_size);
criterion_main!(benches);

Iteration counts are automatically calibrated based on function duration.

Outlier Detection

use criterion::{criterion_group, criterion_main, Criterion};
 
fn noisy_operation() -> u64 {
    // Stand-in for an operation whose runtime varies between calls
    std::hint::black_box(42)
}
 
fn bench_outliers(c: &mut Criterion) {
    c.bench_function("noisy_op", |b| {
        b.iter(|| noisy_operation());
    });
    
    // Criterion output includes:
    // - Outlier detection
    // - Confidence intervals
    // - Standard deviation
    // - Median vs mean comparison
}
 
criterion_group!(benches, bench_outliers);
criterion_main!(benches);

Outliers are identified and reported separately from main statistics.

Confidence Intervals

use criterion::{criterion_group, criterion_main, Criterion};
 
fn process_data(data: &[u8]) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}
 
fn bench_confidence(c: &mut Criterion) {
    let data = vec![0u8; 10000];
    
    c.bench_function("process_data", |b| {
        b.iter(|| process_data(&data));
    });
    
    // Criterion reports:
    // - Mean time with confidence interval
    // - Lower/upper bounds (95% confidence)
    // - Helps distinguish real differences from noise
}
 
criterion_group!(benches, bench_confidence);
criterion_main!(benches);

Confidence intervals show the range where true performance likely falls.

Linear Regression Analysis

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn linear_search(data: &[i32], target: i32) -> Option<usize> {
    data.iter().position(|&x| x == target)
}
 
fn bench_regression(c: &mut Criterion) {
    let data: Vec<i32> = (0..1000).collect();
    
    c.bench_function("linear_search", |b| {
        b.iter(|| {
            let target = black_box(500);
            linear_search(&data, target)
        });
    });
    
    // Criterion uses linear regression to estimate
    // per-iteration time from multiple samples
}
 
criterion_group!(benches, bench_regression);
criterion_main!(benches);

Linear regression separates iteration time from measurement overhead.

Throughput Measurement

use criterion::{criterion_group, criterion_main, Criterion, Throughput};
 
fn copy_bytes(src: &[u8], dst: &mut [u8]) {
    dst.copy_from_slice(src);
}
 
fn bench_throughput(c: &mut Criterion) {
    let size = 1024 * 1024; // 1MB
    let src = vec![0u8; size];
    let mut dst = vec![0u8; size];
    
    let mut group = c.benchmark_group("copy");
    group.throughput(Throughput::Bytes(size as u64));
    group.bench_function("copy_1mb", |b| {
        b.iter(|| copy_bytes(&src, &mut dst));
    });
    group.finish();
    
    // Throughput is set on the group, not the Bencher
    // Reports throughput (e.g. GiB/s) alongside time
    // Useful for I/O and memory-bandwidth benchmarks
}
 
criterion_group!(benches, bench_throughput);
criterion_main!(benches);

Throughput converts timing results into bytes/elements per second.

Comparing Implementations

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn sum_iter(data: &[u64]) -> u64 {
    data.iter().sum()
}
 
fn sum_loop(data: &[u64]) -> u64 {
    let mut total = 0;
    for &x in data {
        total += x;
    }
    total
}
 
fn bench_comparison(c: &mut Criterion) {
    let data: Vec<u64> = (0..10000).collect();
    
    let mut group = c.benchmark_group("sum_methods");
    
    group.bench_function("iter", |b| {
        b.iter(|| sum_iter(black_box(&data)));
    });
    
    group.bench_function("loop", |b| {
        b.iter(|| sum_loop(black_box(&data)));
    });
    
    group.finish();
    
    // Comparison shows relative performance
    // Criterion highlights significant differences
}
 
criterion_group!(benches, bench_comparison);
criterion_main!(benches);

Benchmark groups enable direct comparison between implementations.

Setup and Teardown with iter_batched

use criterion::{criterion_group, criterion_main, Criterion, BatchSize};
 
fn sort_vector(v: &mut Vec<i32>) {
    v.sort();
}
 
fn bench_batched(c: &mut Criterion) {
    c.bench_function("sort", |b| {
        b.iter_batched(
            || (0..1000).rev().collect::<Vec<i32>>(), // Setup: create new vector each time
            |mut v| {
                sort_vector(&mut v);
                v
            },
            BatchSize::SmallInput,
        );
    });
    
    // iter_batched separates setup from measured code
    // Setup is not included in timing
}
 
criterion_group!(benches, bench_batched);
criterion_main!(benches);

iter_batched runs setup before each measured iteration, excluding it from timing.

Comparing to Previous Runs

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn optimized_function(n: usize) -> usize {
    (0..n).filter(|x| x % 2 == 0).sum()
}
 
fn bench_regression_detection(c: &mut Criterion) {
    c.bench_function("optimized", |b| {
        b.iter(|| optimized_function(black_box(1000)));
    });
    
    // Criterion saves baseline measurements
    // Subsequent runs compare against baseline
    // Detects performance regressions automatically
}
 
criterion_group!(benches, bench_regression_detection);
criterion_main!(benches);
 
// Run with: cargo bench -- --save-baseline main
// Compare with: cargo bench -- --baseline main

Baselines enable detecting performance regressions across code changes.

Sampling Strategy

use criterion::{criterion_group, criterion_main, Criterion};
 
fn sample_count_demo(c: &mut Criterion) {
    // Criterion uses multiple samples per benchmark
    // Each sample contains multiple iterations
    // Samples are spread across the run
    
    // Default: 100 samples
    // Can be configured:
    
    let mut group = c.benchmark_group("configured");
    
    // Set custom sample size and warm-up time
    group.sample_size(50) // Number of samples
        .measurement_time(std::time::Duration::from_secs(5))
        .warm_up_time(std::time::Duration::from_millis(100));
    
    group.bench_function("custom_sampled", |b| {
        b.iter(|| {
            // Some computation
            (0..1000).sum::<u64>()
        });
    });
    
    group.finish();
}
 
criterion_group!(benches, sample_count_demo);
criterion_main!(benches);

Sample count and timing can be configured for specific measurement needs.
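The same knobs can also be set harness-wide instead of per group, using the config form of the criterion_group! macro; a minimal sketch:

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn quick_bench(c: &mut Criterion) {
    c.bench_function("sum_1000", |b| b.iter(|| (0..1000).sum::<u64>()));
}

// The `config` form applies these settings to every benchmark
// in the group, rather than configuring each group in code.
criterion_group! {
    name = benches;
    config = Criterion::default()
        .sample_size(50)
        .measurement_time(std::time::Duration::from_secs(5));
    targets = quick_bench
}
criterion_main!(benches);
```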

Noise Sources and Mitigation

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
// Sources of noise that Criterion mitigates:
// 1. CPU frequency scaling (warm-up stabilizes)
// 2. Cache state (repeated iterations)
// 3. Memory allocator variability (batch size)
// 4. OS scheduling (statistical analysis)
// 5. Thermal throttling (confidence intervals)
 
fn memory_intensive(n: usize) -> Vec<u64> {
    (0..n).collect()
}
 
fn bench_noise_handling(c: &mut Criterion) {
    // Criterion handles noise through:
    // - Warm-up iterations
    // - Multiple samples
    // - Statistical analysis
    // - Outlier detection
    
    c.bench_function("memory_ops", |b| {
        b.iter(|| memory_intensive(black_box(10000)));
    });
}
 
criterion_group!(benches, bench_noise_handling);
criterion_main!(benches);

Multiple noise sources are handled through warm-up, sampling, and statistics.

Benchmarking Async Code

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use tokio::runtime::Runtime;
 
async fn async_compute(n: u64) -> u64 {
    let mut sum = 0;
    for i in 0..n {
        sum += i;
    }
    sum
}
 
fn bench_async(c: &mut Criterion) {
    let rt = Runtime::new().unwrap();
    
    c.bench_function("async_compute", |b| {
        b.to_async(&rt).iter(|| async_compute(black_box(1000)));
    });
    
    // to_async handles async benchmarks
    // Uses Tokio runtime for execution
}
 
criterion_group!(benches, bench_async);
criterion_main!(benches);

to_async enables benchmarking async functions with a runtime.

Profiling Integration

use criterion::{criterion_group, criterion_main, Criterion};
use criterion::profiler::Profiler;
 
// Custom profiler can be integrated for detailed analysis
struct MyProfiler;
 
impl Profiler for MyProfiler {
    fn start_profiling(&mut self, _benchmark_id: &str, _benchmark_dir: &std::path::Path) {
        // Start profiling tool
    }
    
    fn stop_profiling(&mut self, _benchmark_id: &str, _benchmark_dir: &std::path::Path) {
        // Stop profiling tool
    }
}
 
fn bench_with_profiler(c: &mut Criterion) {
    // Profiler integration available for advanced analysis
    c.bench_function("profiled", |b| {
        b.iter(|| (0..1000).sum::<u64>());
    });
}
 
criterion_group!(benches, bench_with_profiler);
criterion_main!(benches);

Profiler integration enables detailed analysis with external tools.
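To actually run such a profiler, it must be installed on the Criterion instance; a hedged sketch of the wiring (MyProfiler is the same placeholder as above, with the tool hooks left empty):

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use criterion::profiler::Profiler;

struct MyProfiler;

impl Profiler for MyProfiler {
    fn start_profiling(&mut self, _benchmark_id: &str, _benchmark_dir: &std::path::Path) {
        // Start external profiling tool here
    }
    fn stop_profiling(&mut self, _benchmark_id: &str, _benchmark_dir: &std::path::Path) {
        // Stop external profiling tool here
    }
}

fn bench(c: &mut Criterion) {
    c.bench_function("profiled", |b| b.iter(|| (0..1000).sum::<u64>()));
}

// with_profiler installs the profiler; it is activated when the benchmark
// is invoked with `cargo bench -- --profile-time 5`.
criterion_group! {
    name = benches;
    config = Criterion::default().with_profiler(MyProfiler);
    targets = bench
}
criterion_main!(benches);
```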

Input Generation

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn process_input(input: &[u8]) -> u64 {
    input.iter().map(|&b| b as u64).sum()
}
 
fn bench_input_generation(c: &mut Criterion) {
    // Input generation excluded from timing
    c.bench_function("process_input", |b| {
        b.iter_batched(
            || (0..1000).map(|i| i as u8).collect::<Vec<u8>>(),
            |input| process_input(&input),
            BatchSize::SmallInput,
        );
    });
    
    // Alternative: pre-generate input once
    let data: Vec<u8> = (0..1000).map(|i| i as u8).collect();
    c.bench_function("process_input_cached", |b| {
        b.iter(|| process_input(black_box(&data)));
    });
}
 
criterion_group!(benches, bench_input_generation);
criterion_main!(benches);

Pre-generate inputs or use iter_batched to exclude input generation from timing.

Parameterized Benchmarks

use criterion::{black_box, criterion_group, criterion_main, Criterion, BenchmarkId};
 
fn binary_search(data: &[u32], target: u32) -> Option<usize> {
    data.binary_search(&target).ok()
}
 
fn bench_parameterized(c: &mut Criterion) {
    let mut group = c.benchmark_group("binary_search");
    
    let sizes = [100, 1000, 10000, 100000];
    
    for size in sizes {
        let data: Vec<u32> = (0..size).collect();
        let target = size / 2;
        
        group.bench_with_input(
            BenchmarkId::new("size", size),
            &(&data, target),
            |b, &(data, target)| {
                b.iter(|| binary_search(black_box(data), black_box(target)));
            },
        );
    }
    
    group.finish();
    
    // Produces separate results for each parameter value
}
 
criterion_group!(benches, bench_parameterized);
criterion_main!(benches);

BenchmarkId creates parameterized benchmarks for comparing across input sizes.

Synthesis

Noise isolation techniques:

Warm-up iterations: fill caches, stabilize CPU frequency
Multiple samples: collect a statistical distribution
Confidence intervals: show the likely range of true performance
Outlier detection: identify and report anomalies
Linear regression: separate per-iteration time from overhead

Benchmark phases:

Warm-up: many iterations, not timed
Measurement: timed samples collected
Analysis: statistics calculated
Reporting: results formatted

Key methods:

bench_function: simple benchmarks
bench_with_input: parameterized benchmarks
iter: basic iteration timing
iter_batched: setup/teardown separation
to_async: async function benchmarking

Output interpretation:

Mean: average iteration time
Median: middle value, robust to outliers
Std dev: variability measure
Confidence interval: range likely containing the true value (95% by default)
Outliers: extreme values detected
Slope: per-iteration time estimated by regression
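These metrics appear directly in criterion's console report. An illustrative report (the numbers are invented; the layout follows criterion's default output):

```
fib 20                  time:   [26.029 µs 26.251 µs 26.505 µs]
                        change: [-2.1432% -0.9126% +0.3712%] (p = 0.18 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
```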

Best practices:

// Prevent optimization
b.iter(|| compute(black_box(input)));
 
// Separate setup
b.iter_batched(setup, routine, BatchSize::SmallInput);
 
// Group related benchmarks
let mut group = c.benchmark_group("name");
group.bench_function("a", ...);
group.bench_function("b", ...);
group.finish();
 
// Report throughput for I/O (set on a group, not the Bencher)
group.throughput(Throughput::Bytes(n));
group.bench_function("read", |b| b.iter(|| read_bytes()));

Key insight: bench_function provides statistically rigorous benchmarking through a multi-phase approach: warm-up iterations stabilize the execution environment (filling caches and allowing CPU frequency to settle), measurement collects multiple samples of multiple iterations each, and statistical analysis computes confidence intervals that distinguish genuine performance differences from noise. The framework uses linear regression on sample data to estimate per-iteration time, automatically calibrates iteration counts based on function duration (fast functions get more iterations per sample, slow functions fewer), and reports both point estimates and uncertainty ranges. black_box prevents the compiler from optimizing away benchmarked code, while iter_batched keeps setup out of the measured region. The resulting confidence intervals and outlier detection help developers distinguish real performance changes from measurement noise, making it possible to detect even small (5-10%) performance differences reliably across runs.