How does criterion::BatchSize affect benchmarking setup and teardown measurements?

criterion::BatchSize controls how many iterations Criterion runs between starting and stopping the timer when you use Bencher::iter_batched (or iter_batched_ref). With those methods, the setup closure always runs before the timer starts and outputs are dropped after it stops, so setup and teardown of outputs are never part of the measurement. What BatchSize changes is the trade-off around that guarantee: Criterion times a whole batch of iterations and divides by the batch size, so larger batches amortize the cost of starting and stopping the timer (important for very fast routines) but require every input and output in the batch to be alive in memory at once. BatchSize::SmallInput, the recommended default for cheap inputs, uses large batches; BatchSize::LargeInput uses small batches to limit memory; BatchSize::PerIteration times one iteration at a time; and BatchSize::NumBatches(n) and BatchSize::NumIterations(n) give explicit control over batch count and batch size.
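Criterion's batching loop is worth seeing in miniature. The sketch below is a simplified std-only model of what iter_batched does, not criterion's actual code; it shows that setup runs once per ITERATION before the timer starts, and that outputs are dropped after it stops.

```rust
use std::time::Instant;

// Simplified model of iter_batched's measurement loop (not the real
// criterion implementation; illustrative only).
fn iter_batched_model<I, O>(
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
    batch_size: u64,
    total_iters: u64,
) -> f64 {
    let mut total_secs = 0.0;
    let mut remaining = total_iters;
    while remaining > 0 {
        let n = batch_size.min(remaining);
        // Setup: called once per iteration, OUTSIDE the timed region
        let inputs: Vec<I> = (0..n).map(|_| setup()).collect();
        let start = Instant::now();
        // Only the routine calls are timed
        let outputs: Vec<O> = inputs.into_iter().map(&mut routine).collect();
        total_secs += start.elapsed().as_secs_f64();
        drop(outputs); // Teardown of outputs: after the timer stops
        remaining -= n;
    }
    total_secs / total_iters as f64 // per-iteration estimate
}

fn main() {
    let mut setup_calls = 0u64;
    let per_iter = iter_batched_model(
        || { setup_calls += 1; vec![1u64; 100] },
        |v| v.iter().sum::<u64>(),
        25,  // batch size
        100, // total iterations
    );
    // Setup ran once per iteration, regardless of batch size
    assert_eq!(setup_calls, 100);
    assert!(per_iter >= 0.0);
    println!("setup_calls = {setup_calls}");
}
```

Note that the batch size only changes how often the timer starts and stops and how many inputs are alive at once; it never changes how many times setup runs.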

Basic Benchmarking Without BatchSize

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn benchmark_basic(c: &mut Criterion) {
    c.bench_function("addition", |bencher| {
        bencher.iter(|| {
            // This is measured
            let x = black_box(1 + 1);
            x
        })
    });
}
 
fn benchmark_with_setup(c: &mut Criterion) {
    c.bench_function("with_setup", |bencher| {
        bencher.iter(|| {
            // Setup - THIS IS MEASURED!
            let data: Vec<u64> = (0..1000).collect();
            
            // The actual operation we want to measure
            let sum: u64 = data.iter().sum();
            black_box(sum)
        })
    });
}
 
criterion_group!(benches, benchmark_basic, benchmark_with_setup);
criterion_main!(benches);

Without explicit handling, setup inside iter() is included in measurements.

The Problem: Setup Skews Measurements

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn problem_demonstration(c: &mut Criterion) {
    // Benchmark 1: Setup included in measurement
    c.bench_function("setup_included", |bencher| {
        bencher.iter(|| {
            // Setup creates 10,000 element vector EVERY iteration
            let data: Vec<u64> = (0..10_000).collect();
            
            // What we actually want to measure
            let sum: u64 = data.iter().sum();
            black_box(sum)
        })
    });
    
    // The measured time includes:
    // 1. Vector allocation
    // 2. Iterator creation and collection
    // 3. The sum operation
    // But we only wanted to measure #3!
}
 
criterion_group!(benches, problem_demonstration);
criterion_main!(benches);

Including setup distorts the measurement of the actual operation.
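The same distortion is easy to reproduce without criterion at all. This sketch times one sum each way with std::time::Instant, using a deliberately slow stand-in setup so the difference is unmistakable:

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

// Times the same sum twice: once with setup inside the timed region
// (as iter() would measure it) and once with setup hoisted out
// (as iter_batched() measures it). Returns (included, excluded).
fn time_setup_both_ways() -> (Duration, Duration) {
    let slow_setup = || {
        sleep(Duration::from_millis(50)); // stand-in for expensive setup
        vec![1u64; 1000]
    };

    // Setup inside the timed region
    let start = Instant::now();
    let data = slow_setup();
    let _sum: u64 = data.iter().sum();
    let included = start.elapsed();

    // Setup outside the timed region
    let data = slow_setup();
    let start = Instant::now();
    let _sum: u64 = data.iter().sum();
    let excluded = start.elapsed();

    (included, excluded)
}

fn main() {
    let (included, excluded) = time_setup_both_ways();
    // The 50 ms sleep dominates only when setup is inside the timing
    assert!(included >= Duration::from_millis(50));
    assert!(excluded < Duration::from_millis(50));
    println!("included = {included:?}, excluded = {excluded:?}");
}
```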

iter_batched for Proper Setup Handling

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn iter_batched_basic(c: &mut Criterion) {
    c.bench_function("proper_setup", |bencher| {
        // iter_batched separates setup from measurement
        bencher.iter_batched(
            // Setup routine - NOT measured
            || {
                let data: Vec<u64> = (0..10_000).collect();
                data
            },
            // Measurement routine - this IS measured
            |data| {
                let sum: u64 = data.iter().sum();
                black_box(sum)
            },
            // BatchSize hints how costly each input is to hold in memory
            BatchSize::SmallInput,
        )
    });
}
 
criterion_group!(benches, iter_batched_basic);
criterion_main!(benches);

iter_batched separates setup from the timed portion.

BatchSize Variants

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn batch_size_variants(c: &mut Criterion) {
    // SmallInput: the INPUT is cheap to hold, so Criterion uses large
    // batches and the timer starts/stops rarely
    c.bench_function("small_input", |bencher| {
        bencher.iter_batched(
            || vec![1u64; 100],
            |data| data.iter().sum::<u64>(),
            BatchSize::SmallInput,
        )
    });
    
    // LargeInput: the INPUT is expensive to hold, so Criterion uses
    // smaller batches to limit how many inputs are alive at once
    c.bench_function("large_input", |bencher| {
        bencher.iter_batched(
            || vec![1u64; 100],
            |data| data.iter().sum::<u64>(),
            BatchSize::LargeInput,
        )
    });
    
    // PerIteration: one iteration per batch; minimum memory,
    // maximum timer overhead
    c.bench_function("per_iteration", |bencher| {
        bencher.iter_batched(
            || vec![1u64; 100],
            |data| data.iter().sum::<u64>(),
            BatchSize::PerIteration,
        )
    });
    
    // NumBatches: split each sample's iterations into exactly n batches
    c.bench_function("num_batches", |bencher| {
        bencher.iter_batched(
            || vec![1u64; 100],
            |data| data.iter().sum::<u64>(),
            BatchSize::NumBatches(10),  // 10 batches per sample
        )
    });
    
    // NumIterations: exactly n iterations per batch
    c.bench_function("num_iterations", |bencher| {
        bencher.iter_batched(
            || vec![1u64; 100],
            |data| data.iter().sum::<u64>(),
            BatchSize::NumIterations(50),
        )
    });
}
 
criterion_group!(benches, batch_size_variants);
criterion_main!(benches);

Each variant trades timer overhead against memory held during a batch.

How BatchSize Affects Measurement

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn measurement_explanation(c: &mut Criterion) {
    // Criterion's measurement model for iter_batched:
    // 1. Call setup once per iteration to build a batch of inputs (untimed)
    // 2. Start the timer and run the routine over every input in the batch
    // 3. Stop the timer, then drop the collected outputs (untimed)
    // 4. Per-iteration time = batch time / batch size
    
    // With BatchSize::SmallInput (large batches):
    // - The timer starts/stops rarely, so its overhead is well amortized
    // - Many inputs and outputs are alive in memory at once
    // - Good default for cheap inputs, even with very fast routines
    
    // With BatchSize::LargeInput (small batches):
    // - Fewer inputs are alive at once, limiting memory use
    // - Timer overhead is spread over fewer iterations
    // - Use when each input is expensive to hold
    
    let mut group = c.benchmark_group("batch_comparison");
    
    group.bench_function("small", |bencher| {
        bencher.iter_batched(
            || 0u64,
            |x| black_box(x + 1),
            BatchSize::SmallInput,
        )
    });
    
    group.bench_function("large", |bencher| {
        bencher.iter_batched(
            || 0u64,
            |x| black_box(x + 1),
            BatchSize::LargeInput,
        )
    });
    
    group.finish();
}
 
criterion_group!(benches, measurement_explanation);
criterion_main!(benches);

Batch size trades timer overhead against memory; it never decides whether setup is timed.
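The amortization effect can be shown with plain arithmetic. Assume, hypothetically, a fixed cost per timer start/stop pair and a true per-call routine cost; a sample of N iterations split into B batches then reports a per-iteration estimate of routine cost plus B/N times the timer cost:

```rust
// Hypothetical model: measured per-iteration time is the true routine
// cost plus the timer start/stop cost spread over the batch.
// The constants below are illustrative, not criterion measurements.
fn measured_per_iter(routine_ns: f64, timer_ns: f64, iters: u64, batches: u64) -> f64 {
    routine_ns + (batches as f64 * timer_ns) / iters as f64
}

fn main() {
    let (t_r, t_o, n) = (10.0, 30.0, 10_000u64);

    // PerIteration: one batch per iteration, so timer cost is fully included
    let per_iteration = measured_per_iter(t_r, t_o, n, n);

    // SmallInput-like: ~10 batches per sample, timer cost nearly vanishes
    let small_input = measured_per_iter(t_r, t_o, n, 10);

    assert_eq!(per_iteration, 40.0);            // 10 + 30
    assert!((small_input - 10.03).abs() < 1e-9); // 10 + 300/10000
    println!("per_iteration = {per_iteration} ns, small_input = {small_input} ns");
}
```

For a 10 ns routine, per-iteration timing quadruples the reported number, while ten large batches distort it by a fraction of a percent.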

Visualizing Batch Size Impact

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn visualize_batch_impact(c: &mut Criterion) {
    // Simulate expensive setup
    let mut group = c.benchmark_group("expensive_setup");
    
    // Setup takes ~100µs, the operation takes ~1µs
    group.bench_function("small_input_expensive_setup", |bencher| {
        bencher.iter_batched(
            || {
                // Expensive setup: runs once per iteration, untimed
                std::thread::sleep(std::time::Duration::from_micros(100));
                vec![1u64; 1000]
            },
            |data| {
                // Fast operation: the only thing measured
                data.iter().sum::<u64>()
            },
            BatchSize::SmallInput,
        )
    });
    
    group.bench_function("large_input_expensive_setup", |bencher| {
        bencher.iter_batched(
            || {
                std::thread::sleep(std::time::Duration::from_micros(100));
                vec![1u64; 1000]
            },
            |data| data.iter().sum::<u64>(),
            BatchSize::LargeInput,
        )
    });
    
    group.finish();
    
    // Setup runs once per iteration in BOTH benchmarks and never enters
    // the measurement, so both report similar per-iteration times.
    // The expensive setup does stretch the benchmark's total wall-clock
    // time, and SmallInput holds more inputs in memory at once.
}
 
criterion_group!(benches, visualize_batch_impact);
criterion_main!(benches);

Expensive setup lengthens a benchmark's wall-clock time, not its measurements.

PerIteration vs Batched

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn per_iteration_vs_batched(c: &mut Criterion) {
    let mut group = c.benchmark_group("per_iter_comparison");
    
    // PerIteration: the timer starts and stops around every single call
    group.bench_function("per_iteration", |bencher| {
        bencher.iter_batched(
            || {
                // Setup runs once per iteration, as in EVERY variant,
                // but is never inside the timed region
                let mut data = vec![0u64; 100];
                for i in 0..100 {
                    data[i] = i as u64;
                }
                data
            },
            |data| data.iter().sum::<u64>(),
            BatchSize::PerIteration,
        )
    });
    
    // SmallInput: one timer start/stop wraps a large batch of iterations
    group.bench_function("small_input", |bencher| {
        bencher.iter_batched(
            || {
                // Also runs once per iteration; only the timer placement differs
                let mut data = vec![0u64; 100];
                for i in 0..100 {
                    data[i] = i as u64;
                }
                data
            },
            |data| data.iter().sum::<u64>(),
            BatchSize::SmallInput,
        )
    });
    
    // Key difference:
    // PerIteration: timer overhead is paid on every call and can dominate
    //               the measurement of a fast routine
    // SmallInput/LargeInput: timer overhead is amortized across the batch
    
    group.finish();
}
 
criterion_group!(benches, per_iteration_vs_batched);
criterion_main!(benches);

PerIteration pays timer overhead on every call; larger batches amortize it.

Teardown Behavior

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn teardown_example(c: &mut Criterion) {
    // iter_batched moves each input INTO the routine, so the input is
    // dropped inside the timed region unless the routine returns it.
    // Outputs, by contrast, are always dropped after the timer stops.
    
    // Drop of `data` NOT measured: returning it makes it the output,
    // and outputs are dropped after measurement
    c.bench_function("with_teardown", |bencher| {
        bencher.iter_batched(
            || {
                // Setup: create expensive resource
                vec![1u64; 1000]
            },
            |mut data| {
                // Measurement: modify the data
                data.push(42);
                black_box(data.iter().sum::<u64>());
                data  // Returned, so dropped AFTER the timer stops
            },
            BatchSize::SmallInput,
        )
    });
    
    // Drop of `data` IS measured: it goes out of scope inside the routine
    c.bench_function("with_measured_teardown", |bencher| {
        bencher.iter_batched(
            || {
                vec![1u64; 1000]
            },
            |data| {
                let sum: u64 = data.iter().sum();
                black_box(sum)
                // `data` is dropped here, inside the timed region
            },
            BatchSize::SmallInput,
        )
    });
    
    // iter_batched_ref passes `&mut I` instead of `I`, so inputs are
    // dropped after timing without having to return them
}
 
criterion_group!(benches, teardown_example);
criterion_main!(benches);

Outputs are dropped after measurement; inputs are dropped inside it unless the routine returns them.

Benchmarking I/O Operations

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
use std::io::{Cursor, Read, Write};
 
fn io_benchmark(c: &mut Criterion) {
    // Setup creates fresh state for each iteration
    c.bench_function("read_from_cursor", |bencher| {
        bencher.iter_batched(
            || {
                // Setup: create cursor with data
                let data = vec![0u8; 1024];
                Cursor::new(data)
            },
            |mut cursor| {
                // Measurement: read from cursor
                let mut buf = [0u8; 512];
                cursor.read_exact(&mut buf).unwrap();
                black_box(buf)
            },
            BatchSize::SmallInput,
        )
    });
    
    // For file I/O, ensure each iteration gets fresh state
    c.bench_function("write_to_cursor", |bencher| {
        bencher.iter_batched(
            || {
                // Setup: fresh cursor for writing
                Cursor::new(Vec::with_capacity(1024))
            },
            |mut cursor| {
                // Measurement: write to cursor
                let data = [1u8; 512];
                cursor.write_all(&data).unwrap();
                black_box(cursor)
            },
            BatchSize::SmallInput,
        )
    });
}
 
criterion_group!(benches, io_benchmark);
criterion_main!(benches);

I/O benchmarks benefit from fresh setup state on every iteration.

NumBatches for Control

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn num_batches_control(c: &mut Criterion) {
    // NumBatches(n) splits each sample's iterations into exactly n batches.
    // It does NOT change how often setup runs: setup is still called once
    // per iteration. It changes where the timer starts and stops and how
    // many inputs are alive at once.
    
    c.bench_function("explicit_10_batches", |bencher| {
        bencher.iter_batched(
            || vec![1u64; 100],  // Still runs once per iteration
            |data| {
                data.iter().sum::<u64>()
            },
            BatchSize::NumBatches(10),  // 10 batches per sample
        )
    });
    
    // Useful when:
    // 1. SmallInput's batches would hold too much in memory,
    //    but LargeInput's batches are smaller than necessary
    // 2. You want a consistent batch count across benchmarks
    // 3. You are debugging timer-overhead effects
    
    c.bench_function("explicit_100_batches", |bencher| {
        bencher.iter_batched(
            || vec![1u64; 100],
            |data| data.iter().sum::<u64>(),
            BatchSize::NumBatches(100),  // 100 batches per sample
        )
    });
}
 
criterion_group!(benches, num_batches_control);
criterion_main!(benches);

NumBatches pins the number of batches per sample; setup frequency is unchanged.

Choosing the Right BatchSize

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn choosing_batch_size(c: &mut Criterion) {
    let mut group = c.benchmark_group("batch_size_guide");
    
    // Small, cheap input: use SmallInput (the usual default)
    // Large batches amortize timer overhead, which matters most
    // when the routine itself is very fast
    group.bench_function("small_cheap_input", |bencher| {
        bencher.iter_batched(
            || 1u64,
            |x| black_box(x + 1),
            BatchSize::SmallInput,
        )
    });
    
    // Large, memory-hungry input: use LargeInput
    // Smaller batches keep fewer inputs alive at once
    group.bench_function("large_input", |bencher| {
        bencher.iter_batched(
            || vec![0u8; 1 << 20],  // 1 MiB per input
            |data| black_box(data.len()),
            BatchSize::LargeInput,
        )
    });
    
    // Input that must not accumulate at all: use PerIteration
    // One timer start/stop per call; acceptable when the routine is
    // slow enough to dwarf that overhead
    group.bench_function("per_iteration_slow_op", |bencher| {
        bencher.iter_batched(
            || vec![0u8; 1 << 24],  // 16 MiB; don't hold a batch of these
            |data| {
                std::thread::sleep(std::time::Duration::from_micros(100));
                black_box(data.len())
            },
            BatchSize::PerIteration,
        )
    });
    
    // Neither preset fits the memory budget: use NumBatches(n)
    // or NumIterations(n) for manual tuning
    group.bench_function("manual_num_batches", |bencher| {
        bencher.iter_batched(
            || (0..1000).collect::<Vec<u64>>(),
            |data| data.iter().sum::<u64>(),
            BatchSize::NumBatches(10),
        )
    });
    
    group.finish();
}
 
criterion_group!(benches, choosing_batch_size);
criterion_main!(benches);

Choose batch size based on the input's memory footprint, not the routine's speed.

Common Pitfalls

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn common_pitfalls(c: &mut Criterion) {
    // Pitfall 1: Forgetting that setup isn't measured
    c.bench_function("pitfall_unmeasured_setup", |bencher| {
        bencher.iter_batched(
            || {
                // This setup isn't in the measurement
                // If it's part of the real workload, benchmark is misleading
                let mut data = vec![0u64; 1_000_000];
                data.fill(42);
                data
            },
            |data| data.iter().sum::<u64>(),
            BatchSize::SmallInput,
        )
    });
    
    // Pitfall 2: Batch size too small for fast operations
    c.bench_function("pitfall_small_batch_fast_op", |bencher| {
        bencher.iter_batched(
            || 1u64,
            |x| black_box(x + 1),
            BatchSize::PerIteration,  // Too much overhead for such fast op
        )
    });
    
    // Correct: use iter() for trivial operations
    c.bench_function("correct_fast_op", |bencher| {
        bencher.iter(|| black_box(1u64 + 1))
    });
    
    // Pitfall 3: Mutating state that persists across iterations with iter()
    let mut shared = vec![1u64; 100];
    c.bench_function("pitfall_state_reuse", |bencher| {
        bencher.iter(|| {
            // The vector grows on every iteration, so later iterations
            // measure a different (larger) workload
            shared.push(1);
            black_box(shared.iter().sum::<u64>())
        })
    });
    
    // Correct: iter_batched's setup provides fresh state for each iteration
    c.bench_function("correct_fresh_state", |bencher| {
        bencher.iter_batched(
            || vec![1u64; 100],  // Fresh vector every iteration
            |mut data| {
                data.push(1);  // Safe: never observed by a later iteration
                data.iter().sum::<u64>()
            },
            BatchSize::SmallInput,
        )
    });
}
 
criterion_group!(benches, common_pitfalls);
criterion_main!(benches);

Match the iter method and batch size to the workload, and rely on setup for fresh state.

Summary Table

BatchSize          Iterations per batch   Best for
PerIteration       1                      Inputs that must not accumulate
SmallInput         ~iters/10              Small, cheap inputs (good default)
LargeInput         ~iters/1000            Large or memory-hungry inputs
NumBatches(n)      ~iters/n               Manual control of batch count
NumIterations(n)   n                      Manual control of batch size
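The table above can be sketched as a small function. The divisors mirror criterion's documented behavior at the time of writing (round-up division by 10 and 1000); treat them as approximate and verify against the version you use:

```rust
// Model of BatchSize -> iterations per batch. The exact divisors are an
// assumption based on criterion's current behavior; check your version.
#[derive(Clone, Copy)]
enum BatchSizeModel {
    SmallInput,
    LargeInput,
    PerIteration,
    NumBatches(u64),
    NumIterations(u64),
}

fn iters_per_batch(b: BatchSizeModel, iters: u64) -> u64 {
    match b {
        BatchSizeModel::SmallInput => (iters + 9) / 10,       // ~10 batches
        BatchSizeModel::LargeInput => (iters + 999) / 1000,   // ~1000 batches
        BatchSizeModel::PerIteration => 1,
        BatchSizeModel::NumBatches(n) => (iters + n - 1) / n, // n batches
        BatchSizeModel::NumIterations(n) => n,                // fixed size
    }
}

fn main() {
    assert_eq!(iters_per_batch(BatchSizeModel::SmallInput, 10_000), 1_000);
    assert_eq!(iters_per_batch(BatchSizeModel::LargeInput, 10_000), 10);
    assert_eq!(iters_per_batch(BatchSizeModel::PerIteration, 10_000), 1);
    assert_eq!(iters_per_batch(BatchSizeModel::NumBatches(50), 10_000), 200);
    println!("ok");
}
```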

Synthesis

BatchSize controls the trade-off between timer overhead and memory held during a batch:

How it works:

  • Criterion times batches of iterations, not individual iterations
  • Per-iteration time = batch time / batch size
  • Setup runs once per iteration, always before the timer starts; it is never measured
  • Outputs are dropped after the timer stops; inputs are dropped inside the routine unless returned
  • Larger batches mean fewer timer starts/stops but more live inputs and outputs

Choosing BatchSize:

  • Small, cheap inputs: Use SmallInput, the recommended default
  • Large or memory-hungry inputs: Use LargeInput to limit live inputs
  • Inputs that must not accumulate at all: Use PerIteration and accept the timer overhead
  • Anything in between: Use NumBatches(n) or NumIterations(n) for manual control

Key insight: BatchSize never decides whether setup or teardown is timed; iter_batched excludes setup and output drops by construction. It decides how many iterations share one timer start/stop (which matters for very fast routines) and how many inputs and outputs are alive at once (which matters for large inputs). If setup cost is genuinely part of the workload you want to measure, keep it inside iter; if it is only scaffolding, iter_batched with a BatchSize matched to the input's memory cost gives clean measurements.