How does criterion::BatchSize differ from SmallInput for controlling benchmark iteration costs?

BatchSize is an enum you pass to Criterion's iter_batched and iter_batched_ref methods, and SmallInput is one of its variants, so the comparison is really "the enum versus its most common variant." Under every variant, Criterion calls your setup closure once per iteration, outside the timed region, so each iteration gets a fresh, independently created input whose creation cost is never included in the measurement. What the variants differ in is batch size: how many inputs Criterion creates ahead of each timed run, which trades memory in flight against timing granularity. BatchSize::SmallInput is the recommended preset for inputs and outputs small enough that many of them can sit in memory at once; the other variants (LargeInput, PerIteration, NumBatches, NumIterations) give coarser or fully explicit control over that trade-off.

Understanding Per-Iteration Overhead

use criterion::{Criterion, black_box};
 
fn main() {
    // When benchmarking very fast operations:
    // - Individual iterations are too fast to measure accurately
    // - Criterion runs many iterations per sample
    // - Setup/teardown between iterations adds overhead
 
    // The problem: if input creation is part of each iteration,
    // the benchmark measures operation + input creation, not just operation
 
    // Example: measuring hash function
    let input = "hello world".to_string();
    
    // Fast operation: compute hash
    Criterion::default()
        .bench_function("hash_fast", |b| {
            b.iter(|| {
                // This is too fast to measure individually
                let hash = black_box(&input).len().wrapping_mul(31);
                black_box(hash)
            });
        });
}

For very fast operations, Criterion times many iterations together; anything executed inside the timed closure, including input creation, becomes part of the measurement.

The BatchSize Enum

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    // BatchSize controls how many inputs Criterion creates ahead of
    // each timed batch. The setup closure always runs once per
    // iteration, outside the timed region, for every variant.
    
    // SmallInput: large batches
    // - Best for: small, cheap-to-create inputs and outputs
    // - Input is fresh each iteration
    // - Setup runs outside the timed region, so it is never measured
    
    // Other BatchSize variants:
    // - BatchSize::LargeInput: smaller batches, for big inputs/outputs
    // - BatchSize::PerIteration: one input at a time (batch size 1)
    // - BatchSize::NumBatches(n): exactly n batches per sample
    // - BatchSize::NumIterations(n): exactly n iterations per batch
    
    let mut c = Criterion::default();
    
    c.bench_function("with_batch_size", |b| {
        let input = vec![1, 2, 3, 4, 5];
        
        b.iter_batched(
            || input.clone(),           // Setup: create fresh input
            |data| {                    // Benchmark routine
                black_box(data.iter().sum::<i32>())
            },
            BatchSize::SmallInput,       // Use SmallInput for vec cloning
        );
    });
}

BatchSize is an enum with several batching strategies; SmallInput is the variant tuned for inputs and outputs that are cheap to create and hold in bulk.

When to Use SmallInput

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    // SmallInput is appropriate when:
    // 1. Input type is small (few bytes)
    // 2. Clone/Copy is very cheap
    // 3. You need fresh input each iteration
    
    let mut c = Criterion::default();
    
    // Example 1: Integer input (Copy)
    c.bench_function("fib_small_input", |b| {
        b.iter_batched(
            || 42u64,                    // Setup: trivial Copy
            |n| black_box(fib(n)),       // Benchmark
            BatchSize::SmallInput,       // Perfect for Copy types
        );
    });
    
    // Example 2: Small string slice (no allocation)
    c.bench_function("parse_small_string", |b| {
        b.iter_batched(
            || "test_string",            // Setup: &'static str, no cost
            |s| black_box(s.len()),      // Benchmark
            BatchSize::SmallInput,       // Appropriate
        );
    });
    
    // Example 3: Small array (Copy)
    c.bench_function("sum_array", |b| {
        b.iter_batched(
            || [1i32, 2, 3, 4, 5],       // Setup: array on stack
            |arr| black_box(arr.iter().sum::<i32>()),
            BatchSize::SmallInput,       // Array is Copy
        );
    });
}
 
fn fib(n: u64) -> u64 {
    if n <= 1 { n } else { fib(n - 1) + fib(n - 2) }
}

Use SmallInput for small, cheaply created inputs; Criterion materializes a whole batch of them before each timed run, so each input should be cheap to create and hold.

BatchSize Variants Comparison

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // BatchSize::SmallInput
    // - Setup runs once per iteration, outside the timed region
    // - Large batches: many inputs are created up front per batch
    // - Input is fresh every time
    // - Best for: Copy types, small Clone types
    
    c.bench_function("small_input", |b| {
        let input = 42i32;
        b.iter_batched(
            || input,
            |n| black_box(n * 2),
            BatchSize::SmallInput,
        );
    });
    
    // BatchSize::PerIteration
    // - Batch size of one: setup runs right before each iteration
    // - Setup is still NOT timed, but measuring single iterations
    //   adds timing overhead, so avoid it for very fast routines
    // - Use when only one input can exist at a time
    
    c.bench_function("per_iteration", |b| {
        b.iter_batched(
            || vec![1, 2, 3],            // Setup, untimed
            |v| black_box(v.len()),
            BatchSize::PerIteration,
        );
    });
    
    // BatchSize::NumBatches(n)
    // - Exactly n batches per sample; each batch pre-creates
    //   iterations/n inputs before the timed run
    // - Every iteration still gets a fresh input
    // - Larger n means smaller batches and less memory in flight
    
    c.bench_function("num_batches", |b| {
        b.iter_batched(
            || vec![1, 2, 3, 4, 5],
            |v| black_box(v.iter().sum::<i32>()),
            BatchSize::NumBatches(10),   // 10 batches total
        );
    });
    
    // BatchSize::NumIterations(n)
    // - Exactly n iterations (and inputs) per batch
    // - Direct control over how many inputs exist at once
    // - Note: no variant reuses a single input across iterations;
    //   for that, capture the input and call b.iter instead
    
    c.bench_function("num_iterations", |b| {
        b.iter_batched(
            || vec![1, 2, 3, 4, 5],
            |v| black_box(v.capacity()),
            BatchSize::NumIterations(100), // 100 inputs per batch
        );
    });
}

Each variant trades memory in flight against timing granularity; none of them includes the setup closure in the measurement.

Measuring Input Creation Overhead

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    // The fundamental problem: input creation might be expensive
    
    let mut c = Criterion::default();
    
    // Scenario: parsing JSON from string
    // Input creation: allocate string (potentially expensive)
    // Operation: parse the string
    
    // WRONG: iter() doesn't separate input creation
    c.bench_function("parse_json_wrong", |b| {
        b.iter(|| {
            let input = r#"{"key": "value"}"#.to_string();
            black_box(input.len())
        });
        // Measures: allocation + len(), not just len()
    });
    
    // CORRECT with SmallInput: measure len() only
    c.bench_function("parse_json_small", |b| {
        b.iter_batched(
            || r#"{"key": "value"}"#,    // Setup (cheap: &'static str)
            |s| black_box(s.len()),       // Just measure this
            BatchSize::SmallInput,
        );
    });
    
    // When input creation IS part of what you want to measure,
    // no BatchSize variant helps: put the setup inside b.iter instead
    c.bench_function("alloc_and_len", |b| {
        b.iter(|| {
            let s = "hello".repeat(1000); // Expensive setup, timed
            black_box(s.len())            // Operation, timed
        });
    });
}

iter_batched keeps setup out of the measurement under every BatchSize variant; to measure setup as well, move it inside b.iter.

Practical Example: Vector Operations

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // Benchmark sorting vectors of different sizes
    
    // Small vectors: SmallInput is appropriate
    c.bench_function("sort_small_vec", |b| {
        b.iter_batched(
            || vec![3, 1, 4, 1, 5, 9, 2, 6],  // Small vec, cheap clone
            |mut v| {
                v.sort();
                black_box(v)
            },
            BatchSize::SmallInput,           // Clone each iteration
        );
    });
    
    // Medium vectors: SmallInput would hold many copies at once;
    // LargeInput uses smaller batches
    c.bench_function("sort_medium_vec", |b| {
        b.iter_batched(
            || (0..1000).rev().collect::<Vec<i32>>(),
            |mut v| {
                v.sort();
                black_box(v)
            },
            BatchSize::LargeInput,           // Smaller batches, less memory
        );
    });
    
    // Large vectors: keep only one input alive at a time
    c.bench_function("sort_large_vec", |b| {
        b.iter_batched(
            || (0..100_000).rev().collect::<Vec<i32>>(),
            |mut v| {
                v.sort();
                black_box(v)
            },
            BatchSize::PerIteration,         // One 100k-element vec at a time
        );
    });
}

Choose the batching strategy based on how much memory a batch of fresh inputs occupies.

When SmallInput is NOT Appropriate

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // DON'T use SmallInput when:
    
    // 1. The input is large: a SmallInput batch pre-creates many of them
    c.bench_function("bad_small_input", |b| {
        b.iter_batched(
            || {
                // Large input: one million elements
                (0..1_000_000).collect::<Vec<i32>>()
            },
            |v| black_box(v.len()),
            // BAD: BatchSize::SmallInput, // Many 1M-element vecs alive at once!
            BatchSize::LargeInput,         // Better: much smaller batches
        );
    });
    
    // 2. Clone changes semantics (e.g., File handles)
    // File doesn't implement Clone, so this won't compile anyway
    
    // 3. You want to measure setup too: no BatchSize variant does
    //    this, so put the setup inside b.iter instead
    c.bench_function("setup_included", |b| {
        b.iter(|| {
            // This setup is part of real-world usage, so time it
            let mut v = Vec::new();
            for i in 0..100 {
                v.push(i);
            }
            black_box(v.len())
        });
    });
    
    // 4. The input doesn't need a fresh copy each time: skip
    //    iter_batched entirely and borrow one input in b.iter
    c.bench_function("read_only", |b| {
        let v = vec![1, 2, 3, 4, 5];
        b.iter(|| black_box(v.iter().sum::<i32>())); // Read-only, reused
    });
}

Large inputs, setup measurement, and read-only operations need different strategies.

How Setup Stays Out of the Measurement

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    // Setup is excluded by construction, not by subtraction
    
    // When you call iter_batched (with any BatchSize):
    // 1. Criterion calls setup once per iteration, filling a batch of inputs
    // 2. Criterion starts the timer, runs the routine over the whole
    //    batch, and stops the timer
    // 3. Result: only the routine's time is recorded
    
    // BatchSize just decides how big those batches are:
    // - SmallInput: large batches, fine when inputs/outputs are small
    // - LargeInput: smaller batches for memory-heavy inputs
    // - PerIteration: batch of one, which adds timing overhead
    
    let mut c = Criterion::default();
    
    // Example: the clone in setup is never timed
    c.bench_function("clone_untimed", |b| {
        let data = vec![1, 2, 3, 4, 5];
        
        b.iter_batched(
            || data.clone(),               // Setup: runs outside the timer
            |v| black_box(v.iter().sum::<i32>()), // Only this is timed
            BatchSize::SmallInput,
        );
    });
    
    // PerIteration times the same routine one iteration at a time;
    // the clone is still untimed, only the batch size changes:
    c.bench_function("clone_untimed_per_iter", |b| {
        let data = vec![1, 2, 3, 4, 5];
        
        b.iter_batched(
            || data.clone(),
            |v| black_box(v.iter().sum::<i32>()),
            BatchSize::PerIteration,
        );
    });
}

Under every BatchSize variant the setup closure runs outside the timed region; the variants differ only in how many iterations share each timed batch.

Working with iter_batched_ref

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // iter_batched: takes ownership of input
    c.bench_function("owned_input", |b| {
        b.iter_batched(
            || vec![1, 2, 3],
            |v| black_box(v.into_iter().sum::<i32>()), // Takes ownership
            BatchSize::SmallInput,
        );
    });
    
    // iter_batched_ref: passes a mutable reference
    c.bench_function("ref_input", |b| {
        b.iter_batched_ref(
            || vec![1, 2, 3],
            |v| {
                // v is &mut Vec<i32>; the Vec itself is dropped
                // outside the timed region
                black_box(v.iter().sum::<i32>())
            },
            BatchSize::SmallInput,
        );
    });
    
    // Both use the same BatchSize parameter
    // The choice is about whether you need ownership
    
    // Use iter_batched when:
    // - The routine consumes the input (takes ownership)
    // - It is acceptable that dropping the input inside the routine
    //   is counted in the timed region
    
    // Use iter_batched_ref when:
    // - The routine only needs a (mutable) borrow
    // - You want the input's drop kept out of the measurement
}

Both take a BatchSize; the difference is whether the routine receives the input by value or as a mutable reference.

BatchSize and Throughput

use criterion::{Criterion, BatchSize, black_box, Throughput};
 
fn main() {
    let mut c = Criterion::default();
    
    // BatchSize controls input batching; Throughput tells Criterion
    // how much data each iteration processes, so reports can show
    // throughput (e.g. bytes/sec) as well as time
    
    // Throughput is set on a benchmark group, not on the Bencher:
    let mut group = c.benchmark_group("throughput");
    group.throughput(Throughput::Bytes(1000)); // Each iteration: 1000 bytes
    
    group.bench_function("sum_bytes", |b| {
        b.iter_batched(
            || vec![0u8; 1000],
            |v| {
                // Operation processes 1000 bytes
                black_box(v.iter().map(|&x| x as u64).sum::<u64>())
            },
            BatchSize::SmallInput,
        );
    });
    group.finish();
    
    // Because setup runs outside the timed region under every
    // BatchSize variant, the reported bytes/sec reflects only the
    // routine; Throughput scales the report, it does not change
    // what is measured
}

Throughput controls how Criterion scales its report; BatchSize does not change which code is timed.

Common Patterns and Best Practices

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // Pattern 1: Copy types always use SmallInput
    c.bench_function("copy_types", |b| {
        b.iter_batched(
            || 42i64,                     // Copy: no overhead
            |n| black_box(n.wrapping_mul(31)),
            BatchSize::SmallInput,
        );
    });
    
    // Pattern 2: Small Vec/String: SmallInput is fine
    c.bench_function("small_allocations", |b| {
        b.iter_batched(
            || "hello".to_string(),      // Small allocation
            |s| black_box(s.len()),
            BatchSize::SmallInput,
        );
    });
    
    // Pattern 3: Large allocations: LargeInput (smaller batches)
    c.bench_function("large_allocations", |b| {
        b.iter_batched(
            || (0..100_000).collect::<Vec<_>>(),
            |v| black_box(v.len()),
            BatchSize::LargeInput,
        );
    });
    
    // Pattern 4: Setup IS the benchmark: time it inside b.iter
    c.bench_function("setup_matters", |b| {
        b.iter(|| {
            // Real-world: allocation is part of the cost, so it is timed
            let mut v = Vec::with_capacity(100);
            v.extend(0..100);
            black_box(v)
        });
    });
    
    // Pattern 5: Reuse one input when the routine is read-only
    c.bench_function("reuse_input", |b| {
        let input = (0..1000).collect::<Vec<_>>();
        b.iter(|| black_box(input.len())); // Borrow the same input each time
    });
}

Match the batching strategy to your input characteristics and what you're measuring.

Comparison Summary

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    // Summary table:
    
    // BatchSize::SmallInput
    // - Fresh input per iteration; setup runs outside the timed region
    // - Large batches: roughly a tenth of the inputs alive at once
    // - Best for: small types, Copy types
    
    // BatchSize::LargeInput
    // - Same scheme, much smaller batches
    // - Best for: inputs or outputs that are expensive to hold in bulk
    
    // BatchSize::PerIteration
    // - Batch of one: a single input alive at a time
    // - Adds per-iteration timing overhead; avoid for very fast routines
    
    // BatchSize::NumBatches(n) / BatchSize::NumIterations(n)
    // - Explicit control: n batches per sample / n iterations per batch
    
    // Choosing based on input size:
    // - Copy types and small values: SmallInput
    // - Large, memory-heavy values: LargeInput
    // - Huge values where only one can exist: PerIteration
    
    // Choosing based on measurement goal:
    // - Exclude setup: any iter_batched variant (setup is never timed)
    // - Include setup: put it inside b.iter
    // - Reuse one input: borrow it inside b.iter
}

Synthesis

Quick reference:

use criterion::{Criterion, BatchSize, black_box};
 
fn main() {
    let mut c = Criterion::default();
    
    // SmallInput: fresh input each iteration; setup is never timed
    c.bench_function("small_input", |b| {
        b.iter_batched(
            || 42i32,                     // Copy: free
            |n| black_box(n * 2),         // Only this is measured
            BatchSize::SmallInput,
        );
    });
    
    // PerIteration: one input alive at a time; setup still untimed
    c.bench_function("per_iteration", |b| {
        b.iter_batched(
            || vec![1, 2, 3],             // Setup (untimed)
            |v| black_box(v.len()),        // Operation (timed)
            BatchSize::PerIteration,
        );
    });
    
    // LargeInput: smaller batches for memory-heavy inputs
    c.bench_function("large_input", |b| {
        b.iter_batched(
            || (0..10000).collect::<Vec<_>>(), // Large input
            |v| black_box(v.len()),
            BatchSize::LargeInput,             // Fewer inputs alive at once
        );
    });
    
    // Decision tree:
    // 1. Is the input small or Copy? -> SmallInput
    // 2. Expensive to hold many inputs at once? -> LargeInput
    // 3. Can only one input exist at a time? -> PerIteration
    // 4. Is setup part of the real-world cost? -> time it inside b.iter
    // 5. Can one input be reused safely? -> borrow it inside b.iter
}

Key insight: BatchSize is an enum that controls how many inputs Criterion prepares ahead of each timed batch when you use iter_batched or iter_batched_ref; SmallInput is the variant, and recommended default, for inputs and outputs small enough to hold in bulk. Under every variant the setup closure runs once per iteration, outside the timed region, so input creation is never part of the measurement; the variants only trade memory in flight against timing granularity. Use SmallInput for small or Copy inputs, LargeInput when holding a large batch of inputs would be expensive, PerIteration when only one input can exist at a time, and NumBatches or NumIterations for explicit control. When setup should be measured, put it inside b.iter; when one input can be reused safely, borrow it inside b.iter instead of batching at all.