What is the purpose of criterion::BatchSize for controlling iteration counts in memory-intensive benchmarks?

BatchSize tells Criterion how many calls to the benchmarked routine to group into each timed batch when using iter_batched or iter_batched_ref. Criterion uses this to balance statistical reliability against measurement overhead: batching too few calls lets timer overhead dominate fast routines, while batching too many wastes time and memory on pre-generated inputs. For memory-intensive benchmarks, proper BatchSize configuration becomes critical because the harness pre-generates every input in a batch before timing begins, so large batches of large inputs can exhaust memory and distort timing. The BatchSize::SmallInput and BatchSize::LargeInput variants provide hints to Criterion's heuristics, while BatchSize::PerIteration runs the setup before every single call for operations requiring fresh input state.

The Benchmark Loop Structure

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_example(c: &mut Criterion) {
    c.bench_function("example", |b| {
        // The bench function runs this setup once
        let data = vec![0u8; 1_000_000];  // 1MB of data
        
        b.iter(|| {
            // This closure runs many times per sample
            // With plain iter, Criterion picks the count itself; BatchSize
            // applies to the batched APIs (iter_batched / iter_batched_ref)
            let result = process_data(&data);
            black_box(result);
        });
    });
}
 
fn process_data(data: &[u8]) -> usize {
    data.len()
}
 
criterion_group!(benches, benchmark_example);
criterion_main!(benches);

Criterion times the measured routine in batches of many calls, then collects many such samples to build statistical confidence.

Default Iteration Behavior

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn benchmark_default(c: &mut Criterion) {
    c.bench_function("default_iterations", |b| {
        let data = vec![0u8; 1000];
        
        b.iter(|| {
            // Criterion automatically determines iteration count
            // It tries to run enough iterations to get reliable timing
            // But has no idea about your memory requirements
            process(&data)
        });
    });
}
 
fn process(data: &[u8]) -> usize {
    // Widen before summing to avoid u8 overflow on non-zero data
    data.iter().map(|&b| b as usize).sum()
}
 
criterion_group!(benches, benchmark_default);
criterion_main!(benches);

By default, Criterion uses heuristics based on measured time to determine iteration counts.
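Those heuristics can also be steered directly without touching BatchSize. A minimal sketch (function names are illustrative) of the group-level knobs Criterion exposes for this:

```rust
use std::time::Duration;
use criterion::{criterion_group, criterion_main, Criterion};

fn benchmark_tuned(c: &mut Criterion) {
    c.bench_function("tuned", |b| b.iter(|| 2u64 + 2));
}

// warm_up_time bounds the calibration phase that sizes the iteration count,
// measurement_time bounds the total measurement phase, and sample_size sets
// how many samples are collected.
fn configured() -> Criterion {
    Criterion::default()
        .warm_up_time(Duration::from_millis(500))
        .measurement_time(Duration::from_secs(2))
        .sample_size(60)
}

criterion_group! {
    name = benches;
    config = configured();
    targets = benchmark_tuned
}
criterion_main!(benches);
```

These settings bound how long Criterion measures; the per-batch iteration count is still derived from the measured speed of the routine.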

Why BatchSize Matters for Memory-Intensive Operations

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_memory_intensive(c: &mut Criterion) {
    // Problem: Memory-intensive operations have special considerations
    
    c.bench_function("large_allocation", |b| {
        b.iter(|| {
            // Each iteration allocates 100MB
            let large: Vec<u8> = vec![0u8; 100_000_000];
            black_box(large)
        });
    });
    
    // Issues with default behavior:
    // 1. Criterion may run too many iterations, exhausting memory
    // 2. Memory allocation overhead may dominate the timing
    // 3. Allocator state and memory pressure affect subsequent iterations
    
    // BatchSize helps control this
}
 
criterion_group!(benches, benchmark_memory_intensive);
criterion_main!(benches);

Memory-intensive benchmarks need careful iteration control to avoid resource exhaustion and measurement distortion.

BatchSize::SmallInput for Quick Operations

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_small(c: &mut Criterion) {
    // Small operations: very fast, need many iterations
    c.bench_function("small_operation", |b| {
        b.iter_batched(
            || 1u64,  // Input
            |x| x * 2,  // Operation
            BatchSize::SmallInput,  // Hint: input is small and cheap; batch many calls
        );
    });
    
    // BatchSize::SmallInput tells Criterion:
    // - Group many routine calls into each timed batch
    // - Measurement overhead is significant relative to the operation
    // - Per-input memory cost is negligible
    
    // Useful for:
    // - Simple arithmetic
    // - Small data structure operations
    // - Function call overhead measurement
}
 
criterion_group!(benches, benchmark_small);
criterion_main!(benches);

SmallInput hints that many calls per timed batch are needed for reliable measurement.

BatchSize::LargeInput for Slow Operations

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_large(c: &mut Criterion) {
    // Large operations: slower, fewer iterations needed
    c.bench_function("large_operation", |b| {
        b.iter_batched(
            || vec![0u8; 10_000_000],  // 10MB
            |data| {
                // Expensive operation on large data
                data.into_iter().filter(|&x| x > 0).count()
            },
            BatchSize::LargeInput,  // Hint: input is large; batch few calls
        );
    });
    
    // BatchSize::LargeInput tells Criterion:
    // - Group fewer routine calls into each timed batch
    // - Operation time dominates measurement overhead
    // - Holding many pre-generated inputs at once would be costly
    
    // Useful for:
    // - File I/O
    // - Network operations
    // - Large data processing
    // - Expensive computations
}
 
criterion_group!(benches, benchmark_large);
criterion_main!(benches);

LargeInput hints that fewer calls per batch are sufficient since the operation itself is slow.

iter_batched for Per-Iteration Setup

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_batched(c: &mut Criterion) {
    // iter_batched separates setup from measurement
    
    c.bench_function("batched_example", |b| {
        b.iter_batched(
            // Setup: Runs once per iteration, NOT timed
            || {
                // Create fresh input for each iteration
                vec![0u8; 1000]
            },
            // Routine: Runs multiple times per batch, IS timed
            |data| {
                process_data(data)
            },
            // BatchSize: How many times routine runs per setup call
            BatchSize::SmallInput,
        );
    });
    
    // Key insight:
    // - Setup runs once per iteration, before timing starts
    // - The routine calls for a whole batch are timed together
    // - The batch time is divided by the batch size
}
 
fn process_data(data: Vec<u8>) -> usize {
    data.len()
}
 
criterion_group!(benches, benchmark_batched);
criterion_main!(benches);

iter_batched allows setup code to run outside the timed region.
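To illustrate why untimed per-iteration setup matters, a hedged sketch of the classic sort benchmark: sorting mutates its input, so each call must receive unsorted data (the reversed range here is an illustrative stand-in for realistic input):

```rust
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

fn benchmark_sort(c: &mut Criterion) {
    c.bench_function("sort_unsorted", |b| {
        b.iter_batched(
            // Untimed setup: a fresh reversed Vec per iteration, so the
            // routine never re-sorts data a previous call already sorted
            || (0..10_000u64).rev().collect::<Vec<_>>(),
            |mut v| {
                v.sort();
                v  // return the Vec so its drop is also untimed
            },
            BatchSize::LargeInput,
        );
    });
}

criterion_group!(benches, benchmark_sort);
criterion_main!(benches);
```

Without the batched API, the first call would sort a reversed Vec and every later call an already-sorted one, measuring two different workloads.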

BatchSize Variants

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_variants(c: &mut Criterion) {
    // BatchSize::SmallInput: many calls per batch (for small, cheap inputs)
    c.bench_function("small", |b| {
        b.iter_batched(
            || 1u64,
            |x| x + 1,
            BatchSize::SmallInput,
        );
    });
    
    // BatchSize::LargeInput: few calls per batch (for large or costly inputs)
    c.bench_function("large", |b| {
        b.iter_batched(
            || vec![0u64; 1_000_000],
            |v| v.into_iter().sum::<u64>(),
            BatchSize::LargeInput,
        );
    });
    
    // BatchSize::PerIteration: Setup runs for EACH iteration
    c.bench_function("per_iteration", |b| {
        b.iter_batched(
            || vec![0u8; 100],  // Fresh vec for each measured iteration
            |v| v.len(),
            BatchSize::PerIteration,
        );
    });
    
    // NumIterations(n) and NumBatches(n) give explicit numeric control
    // when the heuristic hints are not precise enough
}
 
criterion_group!(benches, benchmark_variants);
criterion_main!(benches);

Different variants produce different batch sizes based on expected input size and operation cost.
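Beyond the heuristic hints, Criterion also offers numeric variants; a sketch of both (benchmark names are illustrative):

```rust
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

fn benchmark_numeric(c: &mut Criterion) {
    // Exactly 50 routine calls per batch, bypassing the heuristics
    c.bench_function("num_iterations", |b| {
        b.iter_batched(
            || vec![0u8; 1_000],
            |v| v.len(),
            BatchSize::NumIterations(50),
        );
    });

    // Exactly 10 batches; Criterion spreads the iteration count across them
    c.bench_function("num_batches", |b| {
        b.iter_batched(
            || vec![0u8; 1_000],
            |v| v.len(),
            BatchSize::NumBatches(10),
        );
    });
}

criterion_group!(benches, benchmark_numeric);
criterion_main!(benches);
```

NumIterations is the variant to reach for when memory arithmetic must be exact, since batch size directly bounds how many inputs are alive at once.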

Per-Iteration Setup for Memory Operations

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_per_iteration(c: &mut Criterion) {
    // When each iteration needs fresh data, use PerIteration
    
    c.bench_function("per_iter_setup", |b| {
        b.iter_batched(
            // Setup: Creates fresh Vec for each iteration
            || Vec::with_capacity(1000),
            // Routine: Uses the Vec
            |mut v| {
                for i in 0..100 {
                    v.push(i);
                }
                v.len()
            },
            BatchSize::PerIteration,
        );
    });
    
    // PerIteration ensures:
    // 1. Setup runs before EACH measured routine call
    // 2. Each routine call gets fresh state
    // 3. Memory from previous iteration is freed
    // 4. No accumulated state between iterations
    
    // This is crucial for:
    // - Operations that mutate input
    // - Benchmarks needing fresh allocation each time
    // - Avoiding memory pressure from accumulated allocations
}
 
criterion_group!(benches, benchmark_per_iteration);
criterion_main!(benches);

PerIteration ensures setup runs before each measured iteration.
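Another sketch along the same lines (names illustrative): draining a map empties it, so each measured call needs a freshly populated map, and PerIteration keeps only one alive at a time:

```rust
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};
use std::collections::HashMap;

fn benchmark_drain(c: &mut Criterion) {
    c.bench_function("hashmap_drain", |b| {
        b.iter_batched(
            // Untimed: build a freshly populated map; drain() empties it,
            // so a reused map would measure draining nothing
            || (0..1_000u32).map(|k| (k, k)).collect::<HashMap<_, _>>(),
            |mut m| m.drain().count(),
            BatchSize::PerIteration,
        );
    });
}

criterion_group!(benches, benchmark_drain);
criterion_main!(benches);
```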

Memory Pressure and BatchSize

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_memory_pressure(c: &mut Criterion) {
    // Without proper BatchSize, memory pressure distorts results
    
    // BAD: Default may run too many iterations
    c.bench_function("memory_pressure_bad", |b| {
        b.iter(|| {
            // Each call allocates 10MB
            let data = vec![0u8; 10_000_000];
            process(&data)
        });
    });
    // Problem: Criterion may run 1000+ iterations
    // = 10GB+ of allocations during measurement
    // Memory pressure affects timing
    
    // GOOD: Use LargeInput to reduce calls per batch
    c.bench_function("memory_pressure_good", |b| {
        b.iter_batched(
            || vec![0u8; 10_000_000],
            |data| process(&data),
            BatchSize::LargeInput,  // Fewer inputs held at once, less memory pressure
        );
    });
    
    // BETTER: PerIteration keeps only one 10MB input alive at a time
    c.bench_function("memory_pressure_best", |b| {
        b.iter_batched(
            || Vec::with_capacity(10_000_000),
            |mut v| {
                v.extend(std::iter::repeat(0).take(10_000_000));
                process(&v)
            },
            BatchSize::PerIteration,
        );
    });
}
 
fn process(data: &[u8]) -> usize {
    data.len()
}
 
criterion_group!(benches, benchmark_memory_pressure);
criterion_main!(benches);

Controlling iteration count prevents memory pressure from distorting measurements.

iter_batched_ref for Borrowed Input

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_batched_ref(c: &mut Criterion) {
    // iter_batched_ref passes input by reference
    
    c.bench_function("batched_ref", |b| {
        b.iter_batched_ref(
            || vec![0u64; 1000],  // Setup creates a fresh Vec per iteration
            |v| {  // Receives &mut Vec<u64>, not Vec<u64>
                v.iter().sum::<u64>()  // Borrows; the Vec drops outside timing
            },
            BatchSize::SmallInput,
        );
    });
    
    // When you don't need ownership:
    // - iter_batched_ref passes &mut T instead of T
    // - Setup still runs once per iteration, untimed; the input's
    //   destructor also runs outside the timed region
    // - The routine can read or mutate in place without consuming the input
}
 
criterion_group!(benches, benchmark_batched_ref);
criterion_main!(benches);

iter_batched_ref hands the routine a mutable reference to the input, useful when ownership isn't required.
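Because the routine receives a mutable reference, iter_batched_ref also suits in-place mutation; a sketch (benchmark name illustrative) using a reversed Vec as stand-in input:

```rust
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

fn benchmark_sort_in_place(c: &mut Criterion) {
    c.bench_function("sort_in_place", |b| {
        b.iter_batched_ref(
            // Fresh worst-case (reversed) input per iteration, built untimed
            || (0..10_000u64).rev().collect::<Vec<_>>(),
            // &mut Vec<u64>: sort in place without taking ownership; the
            // Vec is dropped outside the timed region
            |v| v.sort(),
            BatchSize::LargeInput,
        );
    });
}

criterion_group!(benches, benchmark_sort_in_place);
criterion_main!(benches);
```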

Controlling Iteration Count Explicitly

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_explicit_iterations(c: &mut Criterion) {
    // Sometimes you need exact control over iteration count
    
    // BatchSize doesn't give exact control, but you can:
    
    // 1. Use iter_batched with appropriate BatchSize variant
    c.bench_function("controlled_batch", |b| {
        b.iter_batched(
            || 1u64,
            |x| x + 1,
            BatchSize::Small,  // More iterations per batch
        );
    });
    
    // 2. Use sample_size to control overall sample count
    // (normally set via criterion_group! config; shown inline here)
    let mut criterion = Criterion::default()
        .sample_size(100);  // Fewer samples for slow operations
    
    criterion.bench_function("controlled_samples", |b| {
        b.iter(|| expensive_operation());
    });
}
 
fn expensive_operation() -> u64 {
    // Simulated expensive operation
    std::hint::black_box(1000)
}
 
criterion_group!(benches, benchmark_explicit_iterations);
criterion_main!(benches);

For fine-grained control, combine BatchSize with sample_size configuration.

BatchSize::NumIterations for Exact Control

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_exact_iterations(c: &mut Criterion) {
    // For exact control, combine BatchSize::NumIterations(n) with
    // sample_size and measurement_time
    
    let mut criterion = Criterion::default()
        .sample_size(10)  // 10 samples
        .measurement_time(std::time::Duration::from_secs(1));
    
    // With NumIterations(n), you control calls per batch exactly
    // With sample_size, you control the number of samples
    
    // Combined approach for memory-intensive benchmarks:
    criterion.bench_function("exact_control", |b| {
        b.iter_batched(
            || vec![0u8; 1_000_000],
            |data| {
                // Routine runs exactly 10 times per batch
                data.len()
            },
            BatchSize::NumIterations(10),  // Exact batch size
        );
    });
    
    // Peak memory per batch:
    // = 10 pre-generated inputs * 1MB = ~10MB held at once
    // Total allocated across sample_size=10 samples = ~100MB
}
 
criterion_group!(benches, benchmark_exact_iterations);
criterion_main!(benches);

Combining configuration options provides precise control over resource usage.

Comparison of BatchSize Variants

use criterion::BatchSize;
 
fn main() {
    // BatchSize::SmallInput
    // - For small, cheap inputs and fast operations (< 1µs)
    // - Groups many routine calls per batch
    // - Reduces measurement overhead percentage
    
    // BatchSize::LargeInput
    // - For large inputs or slow operations (> 10ms)
    // - Groups few routine calls per batch
    // - Avoids holding many pre-generated inputs at once
    
    // BatchSize::PerIteration
    // - Setup runs before EACH routine call (batch size 1)
    // - Fresh input for each measurement
    // - Essential for stateful benchmarks
    
    // BatchSize::NumIterations(n)
    // - Exactly n routine calls per batch
    // - Use when the heuristics pick an unsuitable batch size
    
    // BatchSize::NumBatches(n)
    // - Exactly n batches per sample
    
    println!("Choose BatchSize based on operation speed and memory requirements");
}

Choose based on operation speed and whether you need fresh input per iteration.

Practical Example: File I/O Benchmark

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
use std::io::Write;
use tempfile::NamedTempFile;
 
fn benchmark_file_io(c: &mut Criterion) {
    // File I/O is slow and memory-intensive
    
    c.bench_function("file_write", |b| {
        b.iter_batched(
            // Setup: Create a fresh file and buffer for each iteration
            || {
                let file = NamedTempFile::new().unwrap();
                let data = vec![0u8; 1_000_000];  // 1MB
                (file, data)
            },
            // Routine: Write data to file
            |(mut file, data)| {
                file.write_all(&data).unwrap();
                file.flush().unwrap();
                file
            },
            BatchSize::LargeInput,  // Slow routine, large input: few calls per batch
        );
    });
    
    // Key choices:
    // - BatchSize::LargeInput: file writes are slow, so few calls per batch
    // - Setup creates a fresh file per iteration: no state accumulation
    // - Input size controls memory usage per iteration
}
 
criterion_group!(benches, benchmark_file_io);
criterion_main!(benches);

File I/O benchmarks benefit from LargeInput (or PerIteration) due to inherent slowness.

Practical Example: Collection Operations

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_collection(c: &mut Criterion) {
    // Benchmark that modifies collection state
    
    c.bench_function("vec_push", |b| {
        b.iter_batched(
            // Setup: Fresh Vec for each iteration
            || Vec::with_capacity(1000),
            // Routine: Push elements
            |mut v| {
                for i in 0..100 {
                    v.push(black_box(i));
                }
                v.len()
            },
            // PerIteration: only one fresh Vec alive at a time; the
            // routine consumes and mutates it
            BatchSize::PerIteration,
        );
    });
    
    // If the Vec lived outside the timed closure (plain iter):
    // - First iteration: push to empty Vec
    // - Second iteration: push to non-empty Vec (wrong!)
    // - Capacity might grow differently
    // iter_batched avoids this by running setup for every iteration
    
    c.bench_function("vec_sum", |b| {
        b.iter_batched_ref(
            // Setup: Creates the Vec (once per iteration, untimed)
            || (0..1000).collect::<Vec<u64>>(),
            // Routine: Read-only operation
            |v| v.iter().sum::<u64>(),
            // SmallInput: Fast operation, many iterations
            BatchSize::SmallInput,
        );
    });
    // Read-only routine: borrowing avoids moving the Vec, and its
    // drop stays outside the timed region
}
 
criterion_group!(benches, benchmark_collection);
criterion_main!(benches);

Use PerIteration when only one fresh input should be alive at a time, as with mutating, memory-heavy routines; SmallInput or LargeInput batch inputs and suit read-only operations.

Memory Allocator Effects

use criterion::{black_box, criterion_group, criterion_main, Criterion, BatchSize};
 
fn benchmark_allocator_effects(c: &mut Criterion) {
    // Memory allocation has significant timing variability
    
    // Problem: Allocator state affects timing
    c.bench_function("allocation_variability", |b| {
        b.iter(|| {
            let v: Vec<u32> = (0..100_000).collect();
            black_box(v);
        });
    });
    
    // Per-iteration setup ensures consistent allocator state
    c.bench_function("allocation_consistent", |b| {
        b.iter_batched(
            || (),
            |_| {
                let v: Vec<u32> = (0..100_000).collect();
                black_box(v);
            },
            BatchSize::PerIteration,  // Fresh allocator state each time
        );
    });
    
    // Note: This doesn't eliminate allocator variability
    // But it reduces state carryover between iterations
    
    // Alternative: Pre-allocate in setup
    c.bench_function("allocation_preallocated", |b| {
        b.iter_batched(
            || vec![0u8; 100_000],  // Allocation in setup (not timed)
            |mut v| {
                // Use pre-allocated Vec
                v[0] = 1;
                v.len()
            },
            BatchSize::PerIteration,
        );
    });
}
 
criterion_group!(benches, benchmark_allocator_effects);
criterion_main!(benches);

BatchSize::PerIteration helps isolate allocator effects from measured operations.

Synthesis

Key concepts:

Variant            Calls per Batch      Use Case
SmallInput         Many                 Fast operations, small inputs (< 1µs)
LargeInput         Few                  Slow operations, large inputs (> 10ms)
PerIteration       1 (setup per call)   Mutable input, fresh state
NumIterations(n)   Exactly n            Explicit control over batch size
NumBatches(n)      iters split n ways   Explicit control over batch count

Why BatchSize matters for memory-intensive benchmarks:

  1. Memory pressure: a batch of pre-generated inputs can exhaust memory, adding allocator overhead and swapping
  2. Allocator state: carryover between iterations distorts timing
  3. Measurement overhead: fast operations need large batches; slow operations need small ones
  4. Fresh state: PerIteration keeps exactly one input alive per measurement
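The memory-pressure point can be made concrete with back-of-envelope arithmetic; peak_batch_memory is a hypothetical helper for illustration, not a Criterion API:

```rust
// A batch of n iterations pre-generates n inputs before timing starts,
// so all n are alive at once (hypothetical helper for rough estimation)
fn peak_batch_memory(batch_iterations: u64, input_bytes: u64) -> u64 {
    batch_iterations * input_bytes
}

fn main() {
    // 100 pre-generated 10 MB inputs => ~1 GB held simultaneously
    assert_eq!(peak_batch_memory(100, 10_000_000), 1_000_000_000);
    // PerIteration (batch of 1) holds only one input at a time
    assert_eq!(peak_batch_memory(1, 10_000_000), 10_000_000);
    println!("ok");
}
```

This is why LargeInput, PerIteration, or NumIterations(n) matter for big inputs: the batch size, not the total iteration count, bounds peak memory.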

Choosing the right BatchSize:

  • Fast operations (simple arithmetic, small data): SmallInput
  • Slow operations (file I/O, large data): LargeInput
  • Mutating operations (modifying input): PerIteration
  • Read-only operations over borrowed input: iter_batched_ref with SmallInput or LargeInput

Common patterns:

// Fast, read-only: SmallInput
b.iter_batched_ref(|| data, |d| d.len(), BatchSize::SmallInput);
 
// Slow, read-only: LargeInput  
b.iter_batched_ref(|| large_data, |d| process(d), BatchSize::LargeInput);
 
// Mutates input: PerIteration
b.iter_batched(|| Vec::new(), |mut v| { v.push(1); v }, BatchSize::PerIteration);
 
// Memory allocation: PerIteration or pre-allocate in setup
b.iter_batched(|| vec![0; N], |v| process(v), BatchSize::PerIteration);

Key insight: BatchSize controls how many routine calls are grouped into each timed batch and how many inputs are pre-generated at once, directly affecting memory usage patterns in benchmarks. For memory-intensive operations, proper BatchSize prevents resource exhaustion, reduces allocator noise, and ensures each iteration starts with consistent state. The choice between SmallInput, LargeInput, and PerIteration depends on operation speed, input size, and whether memory pressure must be controlled. Using iter_batched or iter_batched_ref with an appropriate BatchSize produces more reliable measurements than iter alone, especially when memory allocation or state mutation is involved.