How does criterion::black_box prevent compiler optimizations from skewing benchmark results?

criterion::black_box is an identity function that forces the compiler to treat its argument as used and its return value as unknown, while emitting little or no code of its own. This prevents the compiler from optimizing away computations whose results would otherwise be unused, and from reusing a precomputed value where the benchmark should recompute it on each iteration. Depending on the version, it is implemented with a compiler intrinsic, inline assembly, or a volatile read, all of which the optimizer must treat as opaque. The standard library offers the same function as std::hint::black_box (stable since Rust 1.66), which the std-only examples below use. Without black_box, the compiler may eliminate benchmarked code entirely, hoist invariant computations out of loops, or constant-fold expressions that should be measured at run time.

The Optimization Problem

fn main() {
    let x = 42;
    let y = x * 2;
    // y is never used
    // Compiler will likely eliminate both x and y
}

Unused values get optimized away by the compiler.

Benchmark Without Black Box

fn fibonacci(n: u64) -> u64 {
    if n < 2 {
        n
    } else {
        fibonacci(n - 1) + fibonacci(n - 2)
    }
}
 
fn main() {
    let start = std::time::Instant::now();
    
    for _ in 0..1000 {
        fibonacci(20);  // Result is discarded
    }
    
    println!("Time: {:?}", start.elapsed());
}

The compiler may eliminate fibonacci(20) calls since results are unused.

Using Black Box

fn fibonacci(n: u64) -> u64 {
    if n < 2 {
        n
    } else {
        fibonacci(n - 1) + fibonacci(n - 2)
    }
}
 
use std::hint::black_box;
 
fn main() {
    let start = std::time::Instant::now();
    
    for _ in 0..1000 {
        let result = fibonacci(20);
        black_box(result);  // Forces result to be "used"
    }
    
    println!("Time: {:?}", start.elapsed());
}

black_box prevents the compiler from eliminating the computation.

What Black Box Does

use std::hint::black_box;
 
fn main() {
    let x = 42;
    
    // black_box returns its input unchanged
    let y = black_box(x);
    
    // But the compiler cannot assume anything about:
    // - Whether x was read
    // - What the value of y is
    // - Whether the read had side effects
    
    println!("y = {}", y);  // y is still 42
}

black_box is an identity function that the compiler cannot optimize through.

Implementation in std::hint

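// Conceptual sketch, not the actual std::hint source
#[inline]
pub fn black_box<T>(dummy: T) -> T {
    // std::hint::black_box forwards to a compiler intrinsic, so the
    // optimizer itself is told to treat the value as opaque.
    //
    // Hand-written versions (for example, in benchmarking crates that
    // predate the stable std function) have used techniques such as:
    //
    // 1. An empty inline assembly block that takes the value
    //    (or a pointer to it) as an operand
    //
    // 2. A volatile read of the value, followed by mem::forget
    //    unsafe { ptr::read_volatile(&dummy) }
    
    // The key: compiler cannot optimize through it
    dummy
}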

The implementation uses techniques opaque to the optimizer.
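
As a minimal sketch of the volatile-read approach mentioned above, the function below copies the value out through a volatile read. The name volatile_black_box is hypothetical, and this is not the std::hint implementation; it only illustrates why the optimizer cannot predict the returned value.

```rust
use std::mem;
use std::ptr;

/// Hypothetical volatile-based black box (illustration only).
pub fn volatile_black_box<T>(dummy: T) -> T {
    unsafe {
        // A volatile read must be performed as written, so the optimizer
        // cannot assume it knows the value that comes back.
        let ret = ptr::read_volatile(&dummy);
        // `dummy` was bitwise-copied into `ret`; forget the original so
        // its destructor does not run a second time.
        mem::forget(dummy);
        ret
    }
}

fn main() {
    // Works for non-Copy types too, because the original is forgotten.
    let v = volatile_black_box(vec![1u8, 2, 3]);
    println!("{:?} {}", v, volatile_black_box(42u64));
}
```

Unlike an intrinsic, this version forces the value through memory, so it can cost a store and a load per call.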

Constant Folding Prevention

use std::hint::black_box;
 
fn compute(x: u64) -> u64 {
    x * x + x
}
 
fn main() {
    // Without black_box:
    // let result = compute(10);
    // Compiler may fold this to: 110 (constant)
    
    // With black_box:
    let input = black_box(10);
    let result = compute(input);
    
    // Compiler cannot assume input == 10
    // Must actually call compute
    println!("Result: {}", result);
}

black_box prevents constant folding through its input.

Loop Invariant Code Motion

use std::hint::black_box;
 
fn expensive_computation(x: u64) -> u64 {
    x * x + x
}
 
fn main() {
    // Without black_box, the compiler may effectively rewrite the loop as:
    // let invariant = expensive_computation(100);
    // for _ in 0..1000 {
    //     consume(invariant);  // Computed once, outside the loop
    // }
    
    // With black_box:
    for _ in 0..1000 {
        let result = expensive_computation(black_box(100));
        black_box(result);  // Computed each iteration
    }
}

black_box prevents hoisting computations out of loops.

Dead Code Elimination

use std::hint::black_box;
 
fn allocate_buffer(size: usize) -> Vec<u8> {
    Vec::with_capacity(size)
}
 
fn main() {
    // Without black_box:
    // let buffer = allocate_buffer(1024);
    // Buffer never used, so the allocation may be eliminated
    
    // With black_box:
    let buffer = allocate_buffer(1024);
    black_box(&buffer);  // Must allocate
}

black_box keeps the allocation from being optimized away.

Criterion's Use of Black Box

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn fibonacci(n: u64) -> u64 {
    if n < 2 {
        n
    } else {
        fibonacci(n - 1) + fibonacci(n - 2)
    }
}
 
fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("fibonacci 20", |b| {
        b.iter(|| {
            // black_box ensures fibonacci is called each iteration
            // and its result is not optimized away
            fibonacci(black_box(20))
        });
    });
}
 
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

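Criterion's Bencher::iter passes the closure's return value through black_box, so a returned result is not optimized away. Inputs are not covered, which is why the argument is wrapped explicitly.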
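
To illustrate how a harness can keep results alive on the caller's behalf, here is a simplified sketch of a timing loop. The name time_routine is hypothetical and is not part of Criterion's API; Criterion's real measurement loop does considerably more (warm-up, sampling, statistics).

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical, simplified stand-in for a benchmark harness loop.
fn time_routine<O, R: FnMut() -> O>(iters: u64, mut routine: R) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // The harness black-boxes whatever the routine returns,
        // so the caller does not have to.
        black_box(routine());
    }
    start.elapsed()
}

fn fibonacci(n: u64) -> u64 {
    if n < 2 {
        n
    } else {
        fibonacci(n - 1) + fibonacci(n - 2)
    }
}

fn main() {
    // The input still needs its own black_box; the harness only sees the output.
    let elapsed = time_routine(100, || fibonacci(black_box(20)));
    println!("100 iterations took {:?}", elapsed);
}
```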

Input and Output Black Boxing

use criterion::{black_box, Criterion};
 
fn process(data: &[u8]) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}
 
fn benchmark(c: &mut Criterion) {
    let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    
    c.bench_function("process", |b| {
        b.iter(|| {
            // Black box input: compiler cannot assume data contents
            // Output: returned from the closure, where iter black-boxes it
            process(black_box(&data))
        });
    });
}

Black box both input and output for accurate measurement.

Black Box on Inputs vs Outputs

use std::hint::black_box;
 
fn compute(x: u64) -> u64 {
    x * 2
}
 
fn main() {
    // Black box input: prevents constant propagation into function
    let input = black_box(42);
    let result = compute(input);
    
    // Black box output: prevents elimination of computation
    let result = compute(42);
    black_box(result);
    
    // Both: maximum protection
    let result = compute(black_box(42));
    black_box(result);
}

Apply to both sides for complete optimization protection.
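
When both sides need protection at many call sites, the pattern can be folded into a small helper. This is only a sketch; opaque_call is a hypothetical name, not a function from std or Criterion.

```rust
use std::hint::black_box;

/// Hypothetical helper: hides the input from the optimizer
/// and keeps the output alive.
fn opaque_call<I, O>(f: impl Fn(I) -> O, input: I) -> O {
    black_box(f(black_box(input)))
}

fn compute(x: u64) -> u64 {
    x * 2
}

fn main() {
    // black_box is an identity function, so the result is unchanged.
    let result = opaque_call(compute, 42);
    println!("{}", result);
}
```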

What Black Box Does NOT Do

use std::hint::black_box;
 
fn main() {
    // black_box does NOT:
    
    // 1. Add delays or synchronization
    let x = black_box(42);
    
    // 2. Prevent CPU-level optimizations
    // CPU may still cache, prefetch, etc.
    
    // 3. Add memory barriers
    // No synchronization with other threads
    
    // 4. Protect unrelated code from optimization
    // Only values that flow into or out of the call are affected
    
    // 5. Make timing more predictable
    // Timing still varies due to CPU behavior
}

black_box only constrains compiler optimizations, not CPU optimizations, and the standard library documents it as best-effort rather than guaranteed.

Performance Overhead

use std::hint::black_box;
use std::time::Instant;
 
fn main() {
    let iterations = 10_000_000u64;
    
    // Without black_box
    let start = Instant::now();
    for i in 0..iterations {
        let _ = i + 1;
    }
    println!("Without black_box: {:?}", start.elapsed());
    
    // With black_box
    let start = Instant::now();
    for i in 0..iterations {
        black_box(i + 1);
    }
    println!("With black_box: {:?}", start.elapsed());
    
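    // The first loop has no observable effect and is likely removed
    // entirely, so it can report near-zero time. The second loop must
    // produce every i + 1, but black_box itself adds minimal overhead.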
}

black_box adds negligible runtime cost; it's an optimization barrier.

Multiple Black Box Calls

use std::hint::black_box;
 
fn complex_computation(a: u64, b: u64, c: u64) -> u64 {
    a * b + b * c + a * c
}
 
fn main() {
    // Each parameter needs protection
    let result = complex_computation(
        black_box(10),
        black_box(20),
        black_box(30),
    );
    black_box(result);
}

Wrap each input that the compiler could otherwise treat as a known constant.

Black Box in Idiomatic Benchmarks

use criterion::{black_box, criterion_group, criterion_main, Criterion};
 
fn sort_slice(data: &mut [u64]) {
    data.sort();
}
 
fn benchmark_sort(c: &mut Criterion) {
    let data: Vec<u64> = (0..1000).rev().collect();
    
    c.bench_function("sort 1000", |b| {
        // Need to reset data each iteration
        // Otherwise sorting already-sorted data is much faster
        b.iter_batched(
            || data.clone(),  // Setup: fresh data each time
            |mut data| {
                sort_slice(black_box(&mut data));
                black_box(data);  // Ensure result is used
            },
            criterion::BatchSize::SmallInput,
        );
    });
}
 
criterion_group!(benches, benchmark_sort);
criterion_main!(benches);

Complex benchmarks may need setup/teardown with black boxing.

Volatile vs Black Box

use std::hint::black_box;
use std::ptr;
 
fn main() {
    let x = 42;
    
    // read_volatile: performs the memory read every time
    // Compiler cannot cache or elide it
    let v = unsafe { ptr::read_volatile(&x) };
    
    // black_box: opaque to compiler
    // Compiler cannot assume anything about it
    let b = black_box(x);
    
    // Difference:
    // - volatile: memory read each time (for hardware)
    // - black_box: compiler barrier only (for optimization)
    
    // Use volatile for memory-mapped I/O
    // Use black_box for benchmarks
}

Volatile and black box serve different purposes.

Common Pitfalls

use std::hint::black_box;
 
fn main() {
    let data = vec![1, 2, 3, 4, 5];
    
    // SUBTLE: black_box(&data) keeps `data` from being discarded,
    black_box(&data);
    // but computations done on `data` afterwards are not protected
    
    // WRONG: black_box around a closure that is never called
    black_box(|| {
        for i in 0..1000 {
            let _ = i + 1;  // Never runs; only the closure value is "used"
        }
    });
    
    // CORRECT: black_box on values inside computation
    for i in 0..1000 {
        black_box(i + 1);
    }
}

Black box must be placed where computation actually happens.
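
The closure pitfall can be checked directly. The sketch below counts how often the closure body runs; closure_demo is a hypothetical name used only for this illustration.

```rust
use std::cell::Cell;
use std::hint::black_box;

/// Returns (runs after wrapping, runs after calling, computed total).
fn closure_demo() -> (u32, u32, u64) {
    let runs = Cell::new(0u32);
    let work = || {
        runs.set(runs.get() + 1);
        // black_box inside the loop keeps each element opaque.
        (0..1000u64).map(|i| black_box(i) + 1).sum::<u64>()
    };

    // Passing the closure to black_box does not call it.
    let work = black_box(work);
    let runs_after_wrapping = runs.get();

    // The work only happens once the closure is actually invoked.
    let total = black_box(work());
    (runs_after_wrapping, runs.get(), total)
}

fn main() {
    let (before, after, total) = closure_demo();
    println!("runs before call: {before}, after call: {after}, total: {total}");
}
```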

Comparison Table

Technique         | Prevents                    | Adds               | Use Case
black_box         | Compiler elimination        | Nothing            | Benchmarks
volatile          | Compiler caching            | Memory read        | Hardware I/O
std::mem::forget  | Drop code                   | Nothing            | Resource management
#[must_use]       | Silently discarded results  | A compiler warning | API design

Synthesis

black_box solves the fundamental problem that compilers optimize unused code away, and benchmarks intentionally run code for its side effect (time) rather than its output:

Why it works: black_box is an identity function that the compiler cannot see through. It relies on a compiler intrinsic, inline assembly, or volatile operations whose effects the optimizer is not allowed to reason about, which creates an opaque barrier. The compiler must assume the value could be read, could be any value, and could have side effects, even though none of these is true.

What it prevents: Dead code elimination (removing computations with unused results), constant folding (replacing runtime computations with compile-time constants), and loop invariant code motion (hoisting computations out of loops). These are precisely the optimizations that make code faster but benchmarks meaningless.

What it doesn't prevent: CPU-level optimizations like caching, prefetching, branch prediction, or out-of-order execution. Black box affects the compiler, not the hardware. Benchmark timing still varies due to CPU behavior.

Key insight: The purpose of black_box is to ensure the benchmark measures what you intend. The compiler is your adversary here: it wants to make code faster by eliminating work. In normal code, this is good. In benchmarks, it means measuring nothing. black_box tells the compiler "this value matters" at negligible runtime cost, so that measurements reflect real computation time.