How does criterion::BatchSize control sample sizes for statistically significant benchmarking?

criterion::BatchSize controls how many iterations of the timed routine share a single batch of pre-generated inputs when you use iter_batched or iter_batched_ref. Criterion calls the setup closure once per iteration to build a batch of inputs, times the routine across the whole batch, and divides the elapsed time by the batch length to get per-iteration figures. This lets it measure operations that individually fall below timer resolution while keeping setup cost out of the measurement. Despite the names, the SmallInput and LargeInput variants describe the size of the setup output, not the batch: small inputs allow large batches (many values held in memory at once), while large inputs force smaller batches. Longer batches amortize timing overhead over more iterations but hold more input values in memory; shorter batches hold less memory but pay more timing overhead per iteration.
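
The mechanics can be sketched outside Criterion. This is an illustrative stand-in, not Criterion's actual implementation: the `time_batched` helper and its signature are invented for the sketch.

```rust
use std::time::Instant;

// Illustrative only: mimics what iter_batched does conceptually.
// `setup` builds one input per iteration; only `routine` is timed.
fn time_batched<I, O>(
    batch_size: usize,
    setup: impl Fn() -> I,
    routine: impl Fn(I) -> O,
) -> f64 {
    // Build the whole batch of inputs up front (untimed)
    let inputs: Vec<I> = (0..batch_size).map(|_| setup()).collect();
    let start = Instant::now();
    let outputs: Vec<O> = inputs.into_iter().map(&routine).collect();
    let elapsed = start.elapsed();
    drop(outputs); // outputs are dropped after timing, as Criterion does
    // Per-iteration time: total elapsed divided by batch length
    elapsed.as_secs_f64() / batch_size as f64
}

fn main() {
    let per_iter = time_batched(10_000, || 21u64, |n| n.wrapping_mul(2));
    assert!(per_iter >= 0.0);
    println!("~{:.2} ns per iteration", per_iter * 1e9);
}
```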

Default BatchSize Behavior

use criterion::{Criterion, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // With plain `iter`, BatchSize is not involved: Criterion
    // automatically scales the iteration count of each sample
    c.bench_function("default_batch", |b| {
        b.iter(|| {
            // Simple computation
            (0..1000).fold(0, |acc, x| acc + x)
        })
    });
    
    c.final_summary();
}

With b.iter, Criterion automatically chooses how many iterations each sample runs, based on a warm-up estimate of iteration time; BatchSize only applies to the iter_batched family.

BatchSize::SmallInput Configuration

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // SmallInput: the setup output is small, so Criterion uses large batches
    c.bench_function("small_batch", |b| {
        b.iter_batched(
            || Vec::<u8>::with_capacity(1024),
            |v| {
                // Routine: timed; setup builds a fresh Vec for each iteration
                v.len()
            },
            BatchSize::SmallInput,  // large batches (~1/10 of the sample's iterations)
        )
    });
    
    c.final_summary();
}

BatchSize::SmallInput indicates the setup output is small enough to hold many copies in memory, so Criterion uses large batches (roughly one tenth of a sample's iterations per batch); the API docs suggest it as the default when in doubt.

BatchSize::LargeInput Configuration

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // LargeInput: the setup output is big, so Criterion keeps fewer
    // inputs alive at once by using smaller batches
    c.bench_function("large_input", |b| {
        b.iter_batched(
            || vec![0u8; 1024 * 1024],  // 1 MiB input per iteration
            |v| v.len(),
            BatchSize::LargeInput,  // smaller batches (~1/1000 of the sample's iterations)
        )
    });
    
    c.final_summary();
}

BatchSize::LargeInput indicates the setup output is large or expensive to hold, so Criterion uses smaller batches (roughly one thousandth of a sample's iterations per batch).

Per-Iteration Overhead Measurement

use criterion::{Criterion, BatchSize, black_box};
 
fn fast_operation() -> u64 {
    // Extremely fast - may be sub-nanosecond
    black_box(42)
}
 
fn main() {
    let mut c = Criterion::default();
    
    // An individual call here is far faster than the timer's
    // resolution (typically tens of nanoseconds or worse)
    
    // A long batch amortizes timing overhead; the () input is tiny,
    // so SmallInput's large batches are appropriate
    c.bench_function("fast_batched", |b| {
        b.iter_batched(
            || (),
            |_| fast_operation(),
            BatchSize::SmallInput,
        )
    });
    });
    
    // PerIteration times batches of one: setup runs before every
    // iteration, which is too coarse for a sub-nanosecond routine
    c.bench_function("fast_per_iteration", |b| {
        b.iter_batched(
            || (),
            |_| fast_operation(),
            BatchSize::PerIteration,  // a batch length of exactly one
        )
    });
    
    c.final_summary();
}

Large batches help measure operations faster than timer resolution.

iter_batched for Setup Cost Isolation

use criterion::{Criterion, BatchSize, black_box};
use std::collections::HashMap;
 
fn main() {
    let mut c = Criterion::default();
    
    // Setup cost is excluded from timing
    c.bench_function("hashmap_lookup", |b| {
        b.iter_batched(
            || {
                // Setup: runs once per iteration, outside the timed region
                let mut map = HashMap::new();
                for i in 0..1000 {
                    map.insert(i, i * 2);
                }
                (map, 500)  // Return both map and lookup key
            },
            |(map, key)| {
                // Routine: timed; called once per freshly built input
                map.get(&key).copied()  // copy out; the owned map is dropped after
            },
            BatchSize::LargeInput,  // each input is a 1000-entry map
        )
    });
    
    c.final_summary();
}

iter_batched isolates setup cost; only the routine is timed.

iter_batched_ref for Mutating Operations

use criterion::{Criterion, BatchSize, black_box};
use std::collections::VecDeque;
 
fn main() {
    let mut c = Criterion::default();
    
    // Setup builds a fresh deque for every iteration
    c.bench_function("vecdeque_push", |b| {
        b.iter_batched_ref(
            || VecDeque::with_capacity(1000),
            |deque| {
                // Routine can mutate the deque
                deque.push_back(42);
                deque.len()
            },
            BatchSize::LargeInput,  // each deque reserves 1000 slots
        )
    });
    
    // Compare with iter_batched, which passes the input by value
    c.bench_function("vecdeque_peek", |b| {
        b.iter_batched(
            || {
                let mut deque = VecDeque::new();
                deque.push_back(42);
                deque
            },
            |deque| {
                deque.front().copied()  // copy out; the owned deque is dropped after
            },
            BatchSize::Small,
        )
    });
    
    c.final_summary();
}

iter_batched_ref hands the routine a mutable reference to each freshly built input, so the routine can mutate state without consuming it.

NumIterations for Exact Control

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // Exact number of iterations per batch
    c.bench_function("exact_iterations", |b| {
        b.iter_batched(
            || 0u64,
            |n| n.wrapping_add(1),
            BatchSize::NumIterations(1000),  // exactly 1000 iterations per batch
        )
    });
    
    // Very large for extremely fast operations
    c.bench_function("very_fast", |b| {
        b.iter_batched(
            || (),
            |_| black_box(1 + 1),
            BatchSize::NumIterations(1_000_000),  // one million iterations per batch
        )
    });
    
    c.final_summary();
}

BatchSize::NumIterations(n) runs exactly n iterations in each batch; Criterion still decides how many batches make up a sample. The docs generally recommend the explicit-count variants only when the input-size-based ones don't fit.

Sample Count and Statistical Reliability

use criterion::{Criterion, BatchSize};
 
fn main() {
    let mut c = Criterion::default()
        .sample_size(100);  // Number of samples
    
    // sample_size sets how many samples Criterion collects; it then
    // scales each sample's iteration count to fit the measurement time
    
    c.bench_function("with_sample_size", |b| {
        b.iter_batched(
            || 0u64,
            |n| n + 1,
            BatchSize::SmallInput,
        )
    });
    
    // More samples give tighter confidence intervals,
    // but also a longer benchmark run

    let mut c2 = Criterion::default()
        .sample_size(1000);  // 10x more samples

    c2.bench_function("more_samples", |b| {
        b.iter_batched(
            || 0u64,
            |n| n + 1,
            BatchSize::SmallInput,
        )
    });

    c2.final_summary();
}

Sample count affects statistical confidence; batch size affects per-sample iteration count.

Timing Resolution and Batch Size

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    // Timer resolution is limited: typically tens of nanoseconds
    // to ~1μs, depending on the platform clock
    
    let mut c = Criterion::default();
    
    // Operation taking ~1ns per iteration
    c.bench_function("nanosecond_op", |b| {
        b.iter_batched(
            || (),
            |_| {
                // This is too fast to measure directly
                black_box(1u64.wrapping_add(1))
            },
            BatchSize::NumIterations(10_000),  // Need 10,000 iterations
            // 10,000 * 1ns = 10μs, measurable above timer resolution
        )
    });
    
    // Operation taking ~1μs per iteration
    c.bench_function("microsecond_op", |b| {
        b.iter_batched(
            || (),
            |_| {
                let v: Vec<u8> = (0..100).collect();
                v.len()
            },
            BatchSize::SmallInput,  // ~1μs per call is already measurable
            // even modest batches land far above timer resolution
        )
    });
    
    c.final_summary();
}

Fast operations need larger batch sizes to exceed timer resolution.
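
To see why batching matters on your own machine, you can estimate the effective granularity of the monotonic clock. This standalone sketch is not part of Criterion; it simply spins until `Instant` reports a nonzero elapsed time:

```rust
use std::time::Instant;

fn main() {
    // Find the smallest nonzero interval Instant reports:
    // read the clock repeatedly until it advances
    let mut min_tick = u128::MAX;
    for _ in 0..1000 {
        let start = Instant::now();
        let mut elapsed = start.elapsed();
        while elapsed.as_nanos() == 0 {
            elapsed = start.elapsed();
        }
        min_tick = min_tick.min(elapsed.as_nanos());
    }
    assert!(min_tick > 0);
    // Any batch whose total runtime is near this figure is mostly noise
    println!("approximate timer granularity: {} ns", min_tick);
}
```

Per-batch runtimes should sit well above this granularity for the division by batch length to yield meaningful per-iteration times.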

Memory Allocation Benchmarks

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // Allocate with setup to measure just the allocation
    c.bench_function("allocate_1kb", |b| {
        b.iter_batched(
            || (),
            |_| Vec::<u8>::with_capacity(1024),
            BatchSize::SmallInput,  // many 1 KiB vectors held until the batch ends
        )
    });
    
    // For allocations, want enough iterations for stable measurement
    // But not so many that allocator overhead accumulates
    
    c.bench_function("allocate_1mb", |b| {
        b.iter_batched(
            || (),
            |_| Vec::<u8>::with_capacity(1024 * 1024),
            BatchSize::PerIteration,  // avoid holding many 1 MiB buffers at once
        )
    });
    
    c.final_summary();
}

Routine outputs are collected and dropped outside the timed region once the batch ends, so prefer shorter batches when each iteration allocates a lot of memory.

Comparing Batch Size Effects

use criterion::{Criterion, BatchSize, black_box};
 
fn fibonacci(n: u64) -> u64 {
    if n <= 1 {
        n
    } else {
        fibonacci(n - 1) + fibonacci(n - 2)
    }
}
 
fn main() {
    let mut c = Criterion::default();
    
    // A slow routine is measurable at any batch length
    c.bench_function("fib_slow_small", |b| {
        b.iter_batched(
            || 20u64,
            |n| fibonacci(n),
            BatchSize::SmallInput,  // the u64 input is tiny; fibonacci(20) is slow enough
        )
    });
    
    // PerIteration for very slow operations
    c.bench_function("fib_slow_single", |b| {
        b.iter_batched(
            || 25u64,
            |n| fibonacci(n),
            BatchSize::PerIteration,  // one iteration per batch
        )
    });
    
    // For fast calls, force long batches to accumulate measurable time
    c.bench_function("fib_fast_big_batch", |b| {
        b.iter_batched(
            || 10u64,
            |n| fibonacci(n),
            BatchSize::NumIterations(10_000),  // fibonacci(10) is very fast
        )
    });
    
    c.final_summary();
}

Match batching to operation speed: slow routines tolerate any batch length, while fast routines need long batches to accumulate measurable time.

Warmup and Batch Size

use criterion::{Criterion, BatchSize};
 
fn main() {
    let mut c = Criterion::default()
        .warm_up_time(std::time::Duration::from_secs(1))
        .measurement_time(std::time::Duration::from_secs(3));
    
    // Warm-up runs iterations to stabilize CPU caches, branch predictors,
    // and frequency scaling, and to estimate per-iteration time
    
    c.bench_function("with_warmup", |b| {
        b.iter_batched(
            || Vec::<u64>::with_capacity(1000),
            |v| v.len(),
            BatchSize::SmallInput,
        )
    });
    
    // After warm-up, the measurement phase collects sample_size samples,
    // scaling per-sample iteration counts to fit measurement_time;
    // BatchSize then controls how each sample is split into batches
    
    c.final_summary();
}

Warm-up runs before measurement and gives Criterion the timing estimate it uses to plan per-sample iteration counts.

Throughput Measurements

use criterion::{BatchSize, Criterion, Throughput};

fn main() {
    let mut c = Criterion::default();

    // Throughput is configured on a benchmark group, not on the Bencher
    let mut group = c.benchmark_group("throughput");

    // 1 KiB processed per iteration
    group.throughput(Throughput::Bytes(1024));
    group.bench_function("process_bytes", |b| {
        b.iter_batched(
            || vec![0u8; 1024],
            |data| data.iter().map(|&byte| byte as u64).sum::<u64>(),
            BatchSize::SmallInput,
        )
    });

    // 100 elements processed per iteration; Criterion can then
    // report elements/second alongside time per iteration
    group.throughput(Throughput::Elements(100));
    group.bench_function("process_items", |b| {
        b.iter_batched(
            || (0..100u32).collect::<Vec<_>>(),
            |items| items.len(),
            BatchSize::SmallInput,
        )
    });

    group.finish();
    c.final_summary();
}

Throughput metrics combined with batch size give meaningful throughput measurements.

Common BatchSize Mistakes

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // MISTAKE 1: Too small batch for fast operation
    // c.bench_function("too_small", |b| {
    //     b.iter_batched(
    //         || (),
    //         |_| black_box(1 + 1),
    //         BatchSize::PerIteration,  // Timer can't measure ~1ns
    //     )
    // });
    
    // FIX: Use larger batch for fast operations
    c.bench_function("correct_batch", |b| {
        b.iter_batched(
            || (),
            |_| black_box(1 + 1),
            BatchSize::NumIterations(100_000),
        )
    });
    
    // MISTAKE 2: Setup cost included in timing (using iter instead of iter_batched)
    // c.bench_function("setup_included", |b| {
    //     b.iter(|| {
    //         let mut v = Vec::new();  // Setup inside iter!
    //         for i in 0..1000 { v.push(i); }
    //         v.len()
    //     })
    // });
    
    // FIX: Use iter_batched to exclude setup
    c.bench_function("setup_excluded", |b| {
        b.iter_batched(
            || (0..1000).collect::<Vec<_>>(),
            |v| v.len(),
            BatchSize::SmallInput,
        )
    });
    
    c.final_summary();
}

Common pitfalls: batch too small for operation speed, setup inside timed region.

BatchSize Variants Summary

use criterion::BatchSize;

fn main() {
    // PerIteration: a batch length of one; setup runs before every iteration
    let _per_iter = BatchSize::PerIteration;
    // Use for: inputs too expensive to hold in quantity, or strict isolation

    // SmallInput: setup output is small; large batches (~1/10 of iterations)
    let _small = BatchSize::SmallInput;
    // Use for: cheap inputs; the recommended default when in doubt

    // LargeInput: setup output is large; smaller batches (~1/1000 of iterations)
    let _large = BatchSize::LargeInput;
    // Use for: big allocations or inputs that are costly to keep in memory

    // NumBatches(n): exactly n batches per sample
    let _batches = BatchSize::NumBatches(10);

    // NumIterations(n): exactly n iterations per batch
    let _exact = BatchSize::NumIterations(5000);
    // Use for: precise control; generally a last resort
}

Choose the variant from how expensive the setup output is to create and hold in memory; reach for the explicit-count variants only when the automatic ones don't fit.
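
The mapping from variant to batch length can be sketched as follows. The divisors 10 and 1000 are assumptions drawn from the crate's internal heuristics and may differ between versions; the enum here is a local stand-in for illustration, not the real criterion::BatchSize:

```rust
// Hypothetical re-implementation of the variant -> batch-length mapping,
// for illustration only; the real logic is internal to the criterion crate.
#[derive(Clone, Copy)]
enum BatchSize {
    SmallInput,
    LargeInput,
    PerIteration,
    NumBatches(u64),
    NumIterations(u64),
}

// Given the total iterations planned for a sample, how long is each batch?
fn iters_per_batch(size: BatchSize, iters: u64) -> u64 {
    match size {
        BatchSize::SmallInput => iters.div_ceil(10),    // large batches
        BatchSize::LargeInput => iters.div_ceil(1000),  // smaller batches
        BatchSize::PerIteration => 1,                   // one iteration each
        BatchSize::NumBatches(n) => iters.div_ceil(n),  // fixed batch count
        BatchSize::NumIterations(n) => n,               // fixed batch length
    }
}

fn main() {
    assert_eq!(iters_per_batch(BatchSize::SmallInput, 10_000), 1_000);
    assert_eq!(iters_per_batch(BatchSize::LargeInput, 10_000), 10);
    assert_eq!(iters_per_batch(BatchSize::PerIteration, 10_000), 1);
    assert_eq!(iters_per_batch(BatchSize::NumIterations(500), 10_000), 500);
}
```

Note how SmallInput yields the longest batches: the names describe the input's size, and small inputs are what permit long batches.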

Synthesis

BatchSize variants:

Variant           Iterations per batch   Use case
PerIteration      1                      Expensive or bulky inputs
SmallInput        ~iters / 10            Small, cheap inputs (default choice)
LargeInput        ~iters / 1000          Large or costly inputs
NumBatches(n)     ~iters / n             Fixed number of batches per sample
NumIterations(n)  exactly n              Fixed batch length (use sparingly)

Batch length effects:

Batch length     Timing overhead per iteration   Inputs held in memory
Shorter batches  Higher                          Fewer
Longer batches   Lower                           More

Key insight: criterion::BatchSize controls the trade-off between timing accuracy and memory use when benchmarking with pre-generated inputs. Each batch runs the routine once per input; Criterion measures the batch's total time and divides by the batch length, which makes it possible to measure operations faster than timer resolution (sub-nanosecond routines need thousands of iterations to accumulate measurable time) and amortizes per-measurement overhead. Despite the names, SmallInput and LargeInput describe the setup output rather than the batch: small inputs permit long batches, large inputs force shorter ones, PerIteration isolates every iteration in its own batch, and NumBatches/NumIterations give exact control. Combined with iter_batched and iter_batched_ref, this isolates setup from the measured routine: setup runs once per iteration outside the timed region, only the routine is timed, and outputs are dropped after measurement. Prefer SmallInput unless the inputs are expensive to hold in memory, and reach for explicit iteration counts only when a very fast routine needs more accumulated time above timer resolution.