How does criterion::BatchSize control sample sizes and what impact does it have on benchmark reliability?

criterion::BatchSize controls how many iterations of iter_batched (or iter_batched_ref) share a single timed batch, which affects measurement precision, memory usage, and benchmark duration. Each batch is timed as a unit, so per-batch overhead (timer reads, loop control) is amortized across every iteration in the batch: larger batches mean less overhead and less noise, but they also require Criterion to hold more pre-generated setup values in memory at once. The variants encode this trade-off: BatchSize::SmallInput (the recommended default) uses large batches because small setup values are cheap to store; BatchSize::LargeInput uses smaller batches to bound memory; BatchSize::PerIteration times one iteration at a time, which maximizes overhead and should be reserved for inputs that cannot be batched; and NumIterations(n)/NumBatches(n) give exact manual control. Plain b.iter takes no BatchSize at all, since Criterion sizes those samples automatically.

Default Behavior with b.iter

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn fibonacci(n: u64) -> u64 {
    if n < 2 {
        n
    } else {
        fibonacci(n - 1) + fibonacci(n - 2)
    }
}
 
fn bench_fibonacci(c: &mut Criterion) {
    // b.iter takes no BatchSize: Criterion automatically scales the
    // per-sample iteration counts to fill the measurement time
    c.bench_function("fibonacci_20", |b| {
        b.iter(|| fibonacci(black_box(20)));
    });
}
 
criterion_group!(benches, bench_fibonacci);
criterion_main!(benches);

With b.iter there is no batch setup: Criterion calibrates during warm-up and then chooses the per-sample iteration counts itself. BatchSize only comes into play with iter_batched and iter_batched_ref.
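The automatic sizing can be sketched as follows. This is only an illustration of Criterion's linear sampling idea (sample i runs roughly d * i iterations, with d picked so the run fits the measurement window); the exact calibration is an internal detail, and the function name here is hypothetical.

```rust
// Hypothetical sketch of Criterion's linear sampling scheme: sample i
// runs d * i iterations, with d chosen so the whole run roughly fits
// the measurement window. The real calibration logic is internal to
// criterion; this only illustrates the idea.
fn linear_iteration_counts(est_ns_per_iter: f64, measurement_ns: f64, samples: u64) -> Vec<u64> {
    // Total iterations across samples 1..=n is d * n(n+1)/2
    let weight = (samples * (samples + 1) / 2) as f64;
    let d = (measurement_ns / (est_ns_per_iter * weight)).ceil().max(1.0) as u64;
    (1..=samples).map(|i| d * i).collect()
}

fn main() {
    // e.g. a ~10 ns operation, a 5 s measurement window, 100 samples
    let counts = linear_iteration_counts(10.0, 5e9, 100);
    assert_eq!(counts.len(), 100);
    assert_eq!(counts[99], 100 * counts[0]); // iteration counts grow linearly
    println!("first sample: {} iters, last sample: {} iters", counts[0], counts[99]);
}
```

The point of the linear ramp is that later samples run more iterations, which lets Criterion detect per-iteration cost via regression rather than trusting any single timing.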

Explicit BatchSize Values

use criterion::{black_box, BatchSize, criterion_group, criterion_main, Criterion};
 
fn bench_with_explicit_batch_size(c: &mut Criterion) {
    // SmallInput (the recommended default): setup values are small,
    // so Criterion can pre-generate many and use large batches
    c.bench_function("small_input", |b| {
        b.iter_batched(
            || 0u64,
            |n| fibonacci(black_box(n + 20)),
            BatchSize::SmallInput,
        );
    });
    
    // LargeInput: setup values are large, so Criterion uses smaller
    // batches to bound the memory held by pre-generated inputs
    c.bench_function("large_input", |b| {
        b.iter_batched(
            || 0u64,
            |n| fibonacci(black_box(n + 20)),
            BatchSize::LargeInput,
        );
    });
    
    // NumIterations: exactly this many iterations per batch
    c.bench_function("exact_batch_100", |b| {
        b.iter_batched(
            || 0u64,
            |n| fibonacci(black_box(n + 20)),
            BatchSize::NumIterations(100),
        );
    });
}
 
fn fibonacci(n: u64) -> u64 {
    if n < 2 { n } else { fibonacci(n - 1) + fibonacci(n - 2) }
}
 
criterion_group!(benches, bench_with_explicit_batch_size);
criterion_main!(benches);

Explicit variants give you control over the batch-size trade-off between overhead and memory.

BatchSize Variants

use criterion::BatchSize;
 
fn main() {
    // BatchSize::SmallInput: the recommended default
    // - Setup values are small, so many fit in memory at once
    // - Criterion uses large batches (roughly a tenth of the
    //   sample's iterations per batch)
    // - Least measurement overhead
    
    // BatchSize::LargeInput: big setup values
    // - Fewer pre-generated inputs held at once
    // - Smaller batches (roughly a thousandth of the iterations)
    // - Slightly more overhead, bounded memory
    
    // BatchSize::PerIteration: batch size of one
    // - Setup runs immediately before every timed call
    // - Highest measurement overhead; acceptable only for slow
    //   routines or inputs that cannot coexist in memory
    
    // BatchSize::NumIterations(n): exactly n iterations per batch
    // - Full manual control
    // - May need tuning for your workload
    
    // BatchSize::NumBatches(n): split each sample into n batches
    // - Controls how many timed batches per sample
}

Each variant serves different measurement needs.
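The variants reduce to a simple mapping from a sample's total iteration count to a batch size. The sketch below mirrors the rounded-up division that criterion's documentation describes; the divisors (10 and 1000) are implementation details rather than API guarantees, and the local BatchSize enum is a stand-in for the real one.

```rust
// Stand-in for criterion::BatchSize, to show the mapping in isolation.
// The divisors model the documented behavior; they are not a stable
// API guarantee.
#[derive(Clone, Copy)]
enum BatchSize {
    SmallInput,
    LargeInput,
    PerIteration,
    NumBatches(u64),
    NumIterations(u64),
}

// Given the total iterations in a sample, how many run per batch?
fn iters_per_batch(size: BatchSize, iters: u64) -> u64 {
    match size {
        BatchSize::SmallInput => iters.div_ceil(10),
        BatchSize::LargeInput => iters.div_ceil(1000),
        BatchSize::PerIteration => 1,
        BatchSize::NumBatches(n) => iters.div_ceil(n),
        BatchSize::NumIterations(n) => n,
    }
}

fn main() {
    assert_eq!(iters_per_batch(BatchSize::SmallInput, 10_000), 1_000);
    assert_eq!(iters_per_batch(BatchSize::LargeInput, 10_000), 10);
    assert_eq!(iters_per_batch(BatchSize::PerIteration, 10_000), 1);
    assert_eq!(iters_per_batch(BatchSize::NumBatches(4), 10_000), 2_500);
    assert_eq!(iters_per_batch(BatchSize::NumIterations(100), 10_000), 100);
}
```

Notice that SmallInput produces the largest batches: the name describes the size of the setup value, not the batch.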

iter vs iter_batched

use criterion::{black_box, BatchSize, criterion_group, criterion_main, Criterion};
 
fn bench_iteration_styles(c: &mut Criterion) {
    // iter: times whole runs of iterations as a single unit;
    // there is no setup and no BatchSize involved
    c.bench_function("iter_style", |b| {
        b.iter(|| {
            // This runs once per iteration
            let mut sum = 0u64;
            for i in 0..100 {
                sum += i;
            }
            black_box(sum)
        });
    });
    
    // iter_batched: setup runs once per iteration but outside the
    // timed region, so only the routine is measured
    c.bench_function("batched_style", |b| {
        b.iter_batched(
            || {
                // Setup: untimed, runs once per iteration
                Vec::<u64>::with_capacity(1000)
            },
            |mut vec| {
                // Routine: the only part that is timed
                vec.push(42);
                black_box(vec.len())
            },
            BatchSize::SmallInput,
        );
    });
    
    // iter_batched_ref: same, but the routine takes &mut to the setup
    // value, keeping its drop out of the timed region as well
    c.bench_function("batched_ref_style", |b| {
        b.iter_batched_ref(
            || Vec::<u64>::with_capacity(1000),
            |vec| {
                vec.push(42);
                black_box(vec.len())
            },
            BatchSize::SmallInput,
        );
    });
}
 
criterion_group!(benches, bench_iteration_styles);
criterion_main!(benches);

iter_batched keeps setup (and, with iter_batched_ref, the drop of the setup value) out of the timed measurement.

Why Batch Size Matters

use criterion::{BatchSize, criterion_group, criterion_main, Criterion};
 
fn fast_operation() -> u64 {
    42
}
 
fn slow_operation() -> u64 {
    std::thread::sleep(std::time::Duration::from_micros(100));
    42
}
 
fn bench_timing_impact(c: &mut Criterion) {
    let mut group = c.benchmark_group("timing_impact");
    
    // Fast operation, batch size of one
    // Problem: per-batch timing overhead dominates the measurement
    group.bench_function("fast_per_iteration", |b| {
        b.iter_batched(
            || (),
            |_| fast_operation(),
            BatchSize::PerIteration,
        );
    });
    
    // Fast operation, large batches
    // Better: the overhead is amortized across many iterations
    group.bench_function("fast_small_input", |b| {
        b.iter_batched(
            || (),
            |_| fast_operation(),
            BatchSize::SmallInput,
        );
    });
    
    // Slow operation, batch size of one
    // Fine: the overhead is negligible relative to the operation time
    group.bench_function("slow_per_iteration", |b| {
        b.iter_batched(
            || (),
            |_| slow_operation(),
            BatchSize::PerIteration,
        );
    });
    
    group.finish();
}
 
criterion_group!(benches, bench_timing_impact);
criterion_main!(benches);

Fast operations need large batches to amortize overhead; for slow operations, even a batch size of one is fine.

Measurement Overhead

use criterion::{black_box, BatchSize, criterion_group, criterion_main, Criterion};
 
// Very fast operation: just addition
fn add(a: u64, b: u64) -> u64 {
    a + b
}
 
fn bench_overhead(c: &mut Criterion) {
    // Batch size of one: per-batch overhead is fully visible
    // Every measurement carries the cost of two timer reads
    c.bench_function("add_per_iteration", |b| {
        b.iter_batched(
            || (1u64, 2u64),
            |(a, b)| add(black_box(a), black_box(b)),
            BatchSize::PerIteration,
        );
    });
    
    // Large batches: overhead distributed across iterations
    // The per-iteration estimate is far more accurate
    c.bench_function("add_small_input", |b| {
        b.iter_batched(
            || (1u64, 2u64),
            |(a, b)| add(black_box(a), black_box(b)),
            BatchSize::SmallInput,
        );
    });
    
    // NumIterations: a fixed batch size
    // Useful when you want identical batching on every machine
    c.bench_function("add_exact_10000", |b| {
        b.iter_batched(
            || (1u64, 2u64),
            |(a, b)| add(black_box(a), black_box(b)),
            BatchSize::NumIterations(10000),
        );
    });
}
}
 
criterion_group!(benches, bench_overhead);
criterion_main!(benches);

Measurement overhead (timer reads, loop control) is amortized across batch iterations.
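The effect can be seen outside criterion with a bare std::time::Instant: two timer reads bracket a batch of a million additions, so their cost is shared by every iteration. This is a sketch of the idea, not criterion's actual measurement loop.

```rust
use std::hint::black_box;
use std::time::Instant;

// Sketch of batched timing with a plain Instant (not criterion's own
// loop): one start/stop pair brackets the whole batch, so the timer
// overhead is amortized across every iteration.
fn timed_batch(batch: u64) -> (u64, f64) {
    let start = Instant::now();
    let mut sum = 0u64;
    for i in 0..batch {
        sum = sum.wrapping_add(black_box(i));
    }
    let ns_per_iter = start.elapsed().as_nanos() as f64 / batch as f64;
    (sum, ns_per_iter)
}

fn main() {
    let (sum, ns_per_iter) = timed_batch(1_000_000);
    // The work itself is deterministic even though the timing is not
    assert_eq!(sum, 499_999_500_000);
    println!("~{ns_per_iter:.3} ns per iteration (timer overhead amortized)");
}
```

Timing each addition individually with the same Instant calls would mostly measure the clock, not the addition.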

Throughput Measurement

use criterion::{black_box, BatchSize, Throughput, criterion_group, criterion_main, Criterion};
 
fn process_data(data: &[u8]) -> u64 {
    data.iter().fold(0u64, |acc, &b| acc + b as u64)
}
 
fn bench_throughput(c: &mut Criterion) {
    let data = vec![0u8; 1024];
    
    let mut group = c.benchmark_group("throughput");
    
    // Set throughput for meaningful bytes/sec metrics
    group.throughput(Throughput::Bytes(1024));
    
    // Batch size affects measurement precision; the throughput number
    // is still derived from the mean time per iteration
    group.bench_function("process_1kb", |b| {
        b.iter_batched(
            || data.clone(),
            |d| process_data(black_box(&d)),
            BatchSize::SmallInput,
        );
    });
    
    group.finish();
}
 
criterion_group!(benches, bench_throughput);
criterion_main!(benches);

Throughput settings make Criterion report bytes/sec or elements/sec, computed from the mean iteration time.

Sample Count and Confidence

use criterion::{black_box, BatchSize, criterion_group, criterion_main, Criterion};
 
fn bench_sample_count(c: &mut Criterion) {
    // Default: 100 samples per benchmark
    // More samples tighten the confidence interval but take longer
    let mut group = c.benchmark_group("sample_demo");
    
    // Many samples: tighter statistics, longer run
    group.sample_size(1000);
    group.bench_function("many_samples", |b| {
        b.iter_batched(
            || 0u64,
            |n| n + 1,
            BatchSize::SmallInput,
        );
    });
    
    // Few samples: faster run, wider confidence interval
    // (10 is the minimum Criterion accepts)
    group.sample_size(10);
    group.bench_function("few_samples", |b| {
        b.iter_batched(
            || 0u64,
            |n| n + 1,
            BatchSize::SmallInput,
        );
    });
    
    group.finish();
}
 
criterion_group!(benches, bench_sample_count);
criterion_main!(benches);

Sample size and batch size together determine measurement precision.

Warm-up Phase

use criterion::{black_box, BatchSize, criterion_group, criterion_main, Criterion};
 
fn bench_warmup(c: &mut Criterion) {
    let mut group = c.benchmark_group("warmup_demo");
    
    // Warm-up: initial iterations before measurement
    // Default: 3 seconds of warm-up
    
    // Custom warm-up time
    group.warm_up_time(std::time::Duration::from_millis(100));
    
    // Measurement time
    group.measurement_time(std::time::Duration::from_secs(5));
    
    group.bench_function("with_warmup", |b| {
        b.iter_batched(
            || Vec::<u64>::new(),
            |mut v| {
                v.push(42);
                black_box(v.len())
            },
            BatchSize::SmallInput,
        );
    });
    
    group.finish();
}
 
criterion_group!(benches, bench_warmup);
criterion_main!(benches);

Warm-up ensures the function is "hot" before measurement begins.

Handling Setup Costs

use criterion::{black_box, BatchSize, criterion_group, criterion_main, Criterion};
 
fn expensive_setup() -> Vec<u64> {
    // Expensive setup that should NOT be measured
    (0..10000).collect()
}
 
fn operation(data: &[u64]) -> u64 {
    // This is what we're measuring
    data.iter().sum()
}
 
fn bench_with_setup(c: &mut Criterion) {
    // WRONG: setup is included in the measurement
    c.bench_function("wrong_setup", |b| {
        b.iter(|| operation(&expensive_setup()));
    });
    
    // CORRECT: setup still runs once per iteration, but outside the
    // timed region, so only the operation is measured
    c.bench_function("correct_setup", |b| {
        b.iter_batched(
            || expensive_setup(),    // Setup: not measured
            |data| operation(&data), // Operation: measured
            BatchSize::LargeInput,   // 10,000-element Vecs: bound memory
        );
    });
    
    // If the input can be shared across iterations, build it once and
    // borrow it; then the expensive setup runs a single time
    let shared = expensive_setup();
    c.bench_function("shared_setup", |b| {
        b.iter(|| operation(black_box(&shared)));
    });
}
 
criterion_group!(benches, bench_with_setup);
criterion_main!(benches);

iter_batched keeps setup out of the timed region, though the setup closure still runs once per iteration.

Comparison Measurement

use criterion::{black_box, BatchSize, criterion_group, criterion_main, Criterion, Throughput};
 
fn linear_search(data: &[u64], target: u64) -> Option<usize> {
    data.iter().position(|&x| x == target)
}
 
fn binary_search(data: &[u64], target: u64) -> Option<usize> {
    data.binary_search(&target).ok()
}
 
fn bench_comparison(c: &mut Criterion) {
    let data: Vec<u64> = (0..10000).collect();
    
    let mut group = c.benchmark_group("search_comparison");
    group.throughput(Throughput::Elements(1));
    
    // Use same batch size for fair comparison
    group.bench_function("linear", |b| {
        b.iter_batched(
            || (&data, 5000u64),
            |(d, t)| linear_search(black_box(d), black_box(t)),
            BatchSize::NumIterations(1000),
        );
    });
    
    group.bench_function("binary", |b| {
        b.iter_batched(
            || (&data, 5000u64),
            |(d, t)| binary_search(black_box(d), black_box(t)),
            BatchSize::NumIterations(1000),
        );
    });
    
    group.finish();
}
 
criterion_group!(benches, bench_comparison);
criterion_main!(benches);

Use consistent batch sizes when comparing algorithms.

Statistical Reliability

use criterion::{black_box, BatchSize, criterion_group, criterion_main, Criterion};
 
fn bench_statistical(c: &mut Criterion) {
    let mut group = c.benchmark_group("statistical_demo");
    
    // Many samples + large batches (SmallInput) = most reliable, slowest
    group.sample_size(100);
    group.bench_function("reliable", |b| {
        b.iter_batched(
            || 0u64,
            |n| n.wrapping_add(1),
            BatchSize::SmallInput,
        );
    });
    
    // Few samples + batch size of one = least reliable, fastest to run
    group.sample_size(10);
    group.bench_function("fast", |b| {
        b.iter_batched(
            || 0u64,
            |n| n.wrapping_add(1),
            BatchSize::PerIteration,
        );
    });
    
    group.finish();
}
 
criterion_group!(benches, bench_statistical);
criterion_main!(benches);

Statistical confidence increases with more samples and larger batches.

Common Pitfalls

use criterion::{black_box, BatchSize, criterion_group, criterion_main, Criterion};
 
fn bench_pitfalls(c: &mut Criterion) {
    // PITFALL 1: batch size of one for a very fast operation
    c.bench_function("too_small", |b| {
        b.iter_batched(
            || (),
            |_| black_box(1 + 1),    // Very fast operation
            BatchSize::PerIteration, // Timing overhead dominates
        );
    });
    
    // FIX: let Criterion use large batches for fast operations
    c.bench_function("appropriate_batch", |b| {
        b.iter_batched(
            || (),
            |_| black_box(1 + 1),
            BatchSize::SmallInput, // Overhead amortized
        );
    });
    
    // PITFALL 2: forcing a huge batch on a slow operation
    // 1000 iterations of 10 ms each means 10-second batches,
    // inflating run time without improving precision
    c.bench_function("too_large", |b| {
        b.iter_batched(
            || (),
            |_| {
                std::thread::sleep(std::time::Duration::from_millis(10));
                black_box(1)
            },
            BatchSize::NumIterations(1000), // Unnecessarily large
        );
    });
    
    // FIX: batch size of one is fine when the routine dwarfs the overhead
    c.bench_function("appropriate_slow", |b| {
        b.iter_batched(
            || (),
            |_| {
                std::thread::sleep(std::time::Duration::from_millis(10));
                black_box(1)
            },
            BatchSize::PerIteration, // Appropriate for slow ops
        );
    });
    
    // PITFALL 3: different harness styles across a comparison
    let mut group = c.benchmark_group("comparison");
    
    group.bench_function("method_a", |b| {
        b.iter(|| fast_operation()); // No per-iteration setup machinery
    });
    
    group.bench_function("method_b", |b| {
        b.iter_batched(
            || (),
            |_| fast_operation(),
            BatchSize::PerIteration, // Adds per-batch overhead!
        );
    });
    
    // FIX: use the same iteration style and batch size on both sides
    group.finish();
}
 
fn fast_operation() -> u64 { 42 }
 
criterion_group!(benches, bench_pitfalls);
criterion_main!(benches);

Avoid these common mistakes when configuring batch sizes.

Automatic Iteration Counts

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn bench_auto_selection(c: &mut Criterion) {
    // With plain b.iter there is no BatchSize; Criterion sizes the
    // samples itself:
    // 1. Run the warm-up phase
    // 2. Estimate the time per iteration
    // 3. Choose per-sample iteration counts that fit the configured
    //    measurement time
    
    c.bench_function("auto_iters", |b| {
        b.iter(|| {
            // Criterion times runs of this closure and scales the
            // iteration counts automatically
            let mut sum = 0u64;
            for i in 0..100 {
                sum = sum.wrapping_add(i);
            }
            black_box(sum)
        });
    });
    
    // Reach for iter_batched plus an explicit BatchSize only when you
    // need per-iteration setup:
    // - small setup values: SmallInput (the recommended default)
    // - large setup values: LargeInput
    // - values that cannot be batched: PerIteration
}
 
criterion_group!(benches, bench_auto_selection);
criterion_main!(benches);

Criterion's automatic sizing works well for most benchmarks.

Memory Allocation Patterns

use criterion::{black_box, BatchSize, criterion_group, criterion_main, Criterion};
 
fn bench_memory(c: &mut Criterion) {
    // When benchmark allocates, batch size affects measurement
    
    let mut group = c.benchmark_group("allocation");
    
    // Allocation happens per iteration
    group.bench_function("alloc_per_iter", |b| {
        b.iter(|| {
            let v: Vec<u64> = (0..1000).collect();
            black_box(v.len())
        });
    });
    
    // Allocation happens in setup, outside the timed region
    group.bench_function("alloc_in_setup", |b| {
        b.iter_batched(
            || Vec::<u64>::with_capacity(1000),
            |mut v| {
                for i in 0..1000 {
                    v.push(i);
                }
                black_box(v.len())
            },
            BatchSize::SmallInput,
        );
    });
    
    // If allocation is part of what you're measuring, use iter
    // If allocation is setup, use iter_batched
    
    group.finish();
}
 
criterion_group!(benches, bench_memory);
criterion_main!(benches);

Choose iter or iter_batched based on what you want to measure.

Real-World Example

use criterion::{black_box, BatchSize, criterion_group, criterion_main, Criterion, Throughput};
use std::collections::HashMap;
 
fn insert_into_map(map: &mut HashMap<u64, u64>, key: u64, value: u64) {
    map.insert(key, value);
}
 
fn lookup_from_map(map: &HashMap<u64, u64>, key: u64) -> Option<u64> {
    map.get(&key).copied()
}
 
fn bench_hashmap(c: &mut Criterion) {
    let mut group = c.benchmark_group("hashmap_ops");
    group.throughput(Throughput::Elements(1));
    
    // Insert benchmark
    group.bench_function("insert", |b| {
        b.iter_batched(
            || {
                // Setup: a fresh map for each iteration, untimed
                HashMap::with_capacity(1000)
            },
            |mut map| {
                // Measured operation
                for i in 0..100 {
                    insert_into_map(&mut map, i, i * 2);
                }
                black_box(map.len())
            },
            BatchSize::LargeInput, // Preallocated maps are big; bound memory
        );
    });
    
    // Lookup benchmark
    let populated_map: HashMap<u64, u64> = (0..1000).map(|i| (i, i * 2)).collect();
    
    group.bench_function("lookup", |b| {
        b.iter_batched(
            || &populated_map, // Setup: just a cheap reference
            |map| {
                // Measured operation
                lookup_from_map(map, black_box(500))
            },
            BatchSize::SmallInput, // Large batches suit this fast lookup
        );
    });
    
    group.finish();
}
 
criterion_group!(benches, bench_hashmap);
criterion_main!(benches);

Real-world benchmarks often need different batch sizes for different operations.

Comparison Table

Variant           Iterations per Batch     Use Case                         Overhead
SmallInput        Large (~iters / 10)      Small inputs (default)           Lowest
LargeInput        Smaller (~iters / 1000)  Large inputs, bounded memory     Low
PerIteration      1                        Inputs that cannot be batched    Highest
NumIterations(n)  Exactly n                Manual control, reproducibility  Depends on n
NumBatches(n)     ~iters / n               Manual control of batch count    Depends on n

Synthesis

BatchSize purpose:

  • Controls how many iterations share one timed batch in iter_batched
  • Amortizes per-batch timing overhead across those iterations
  • Bounds the memory held by pre-generated setup values
  • Affects statistical precision and benchmark duration

BatchSize::SmallInput:

  • The recommended default for iter_batched
  • Uses large batches, so measurement overhead is lowest
  • Appropriate whenever setup values are small
  • Works well for typical operations

When to use LargeInput:

  • Setup values are large (big buffers, collections)
  • Memory-constrained environments
  • Accepts slightly more overhead in exchange for bounded memory

When to use PerIteration:

  • Setup values are extremely large or hold external resources
  • The routine cannot tolerate batched inputs
  • Only reasonable for slow routines, where the per-batch overhead
    is negligible

When to use NumIterations or NumBatches:

  • Reproducing the exact same batching across machines
  • CI/CD pipelines and performance regression testing
  • Situations where the named variants pick a poor size

Key insight: Batch size is about distributing measurement overhead across multiple iterations. The timer has finite precision and cost; measuring a 1-nanosecond operation when a timer read costs on the order of 100 nanoseconds produces mostly noise. By timing 10,000 iterations as one batch, that overhead is amortized to roughly 0.01 nanoseconds per iteration. Fast operations need large batches; slow operations don't. The trade-off is per-batch overhead versus the memory held by pre-generated inputs. BatchSize::SmallInput handles this well for most iter_batched benchmarks, and plain b.iter avoids per-iteration setup entirely, but understanding the underlying mechanism helps when precision matters or when debugging noisy benchmarks.
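The amortization arithmetic above can be made concrete with a tiny helper (the function name and the 100 ns timer-read cost are illustrative assumptions, not measured constants):

```rust
// The amortization arithmetic made concrete: a fixed per-batch
// overhead divided by the batch size is the overhead attributed to
// each iteration. The 100 ns figure is an illustrative assumption.
fn overhead_per_iteration_ns(batch_overhead_ns: f64, batch_size: u64) -> f64 {
    batch_overhead_ns / batch_size as f64
}

fn main() {
    // One iteration per batch: the full 100 ns lands on that iteration
    assert!((overhead_per_iteration_ns(100.0, 1) - 100.0).abs() < 1e-9);
    // 10,000 per batch: ~0.01 ns each, far below a 1 ns signal
    assert!((overhead_per_iteration_ns(100.0, 10_000) - 0.01).abs() < 1e-9);
    println!("amortized: {} ns/iter", overhead_per_iteration_ns(100.0, 10_000));
}
```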