How does `rand::seq::IteratorRandom` enable random sampling from iterators without collecting?

rand::seq::IteratorRandom provides the choose and choose_multiple methods that extend iterators with random sampling capabilities. The key insight is that these methods use reservoir sampling algorithms that can select random elements in a single pass through the iterator, without needing to collect all elements into a container first. The choose method returns a single random element using Algorithm R with reservoir size 1, while choose_multiple uses a more general reservoir sampling algorithm to select multiple elements uniformly at random. This is memory-efficient for large or infinite iterators since it only needs to store the sampled elements, not the entire iterator.

Basic Random Element Selection

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    let numbers = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    
    // Choose a single random element
    let chosen = numbers.iter().choose(&mut thread_rng());
    println!("Random element: {:?}", chosen);
    
    // Choose multiple random elements
    let chosen_multiple: Vec<_> = numbers.iter().choose_multiple(&mut thread_rng(), 3);
    println!("Random 3 elements: {:?}", chosen_multiple);
}

The choose method returns Option<&T> since the iterator might be empty.

Sampling Without Collecting

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    // Infinite iterator - cannot collect
    let random_from_infinite = (0..).choose(&mut thread_rng());
    println!("Random from infinite: {:?}", random_from_infinite);
    
    // Large range - expensive to collect
    let random_from_large = (1..1_000_000).choose(&mut thread_rng());
    println!("Random from large range: {:?}", random_from_large);
    
    // Iterator that generates values on demand
    let generated = (0..).filter(|x| x % 7 == 0).take(1000);
    let random_multiple: Vec<_> = generated.choose_multiple(&mut thread_rng(), 5);
    println!("Random 5 multiples of 7: {:?}", random_multiple);
}

These methods work on any iterator, regardless of whether it can be collected.

Reservoir Sampling Algorithm

use rand::seq::IteratorRandom;
use rand::thread_rng;
use rand::rngs::ThreadRng;
 
fn manual_reservoir_sample<T>(mut iter: impl Iterator<Item = T>, rng: &mut ThreadRng) -> Option<T> {
    // This is what choose() does internally
    let mut result: Option<T> = None;
    let mut count = 0;
    
    for item in iter {
        count += 1;
        if count == 1 {
            result = Some(item);
        } else {
            // Replace with probability 1/count
            let r = rand::random::<usize>() % count;
            if r == 0 {
                result = Some(item);
            }
        }
    }
    
    result
}
 
fn main() {
    let data = vec![10, 20, 30, 40, 50];
    
    // Using IteratorRandom
    let chosen1 = data.iter().choose(&mut thread_rng());
    
    // Manual reservoir sampling (equivalent)
    let chosen2 = manual_reservoir_sample(data.iter(), &mut thread_rng());
    
    println!("IteratorRandom: {:?}", chosen1);
    println!("Manual reservoir: {:?}", chosen2);
}

Each element has equal probability of being selected: 1/n where n is total elements.

Memory Efficiency Comparison

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    // APPROACH 1: Collect then sample (memory intensive)
    fn collect_then_sample() -> Option<i32> {
        let collected: Vec<i32> = (1..1_000_000_000).collect();
        collected.into_iter().choose(&mut thread_rng())
    }
    
    // APPROACH 2: Sample directly (memory efficient)
    fn sample_directly() -> Option<i32> {
        (1..1_000_000_000).choose(&mut thread_rng())
    }
    
    // Approach 1 allocates ~8GB of memory
    // Approach 2 allocates essentially nothing
    
    // Time both approaches
    use std::time::Instant;
    
    // let start = Instant::now();
    // let _ = collect_then_sample(); // Would use 8GB RAM
    // println!("Collect approach: {:?}", start.elapsed());
    
    let start = Instant::now();
    let _ = sample_directly();
    println!("Direct sample: {:?}", start.elapsed());
}

Direct sampling uses O(1) memory regardless of iterator size.

Choosing Multiple Elements

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    let data = vec!['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'];
    
    // Choose exactly 3 elements
    let chosen: Vec<_> = data.iter().choose_multiple(&mut thread_rng(), 3);
    println!("3 elements: {:?}", chosen);
    
    // Choosing more than available returns all elements
    let chosen: Vec<_> = data.iter().choose_multiple(&mut thread_rng(), 20);
    println!("20 from 8 elements: {:?}", chosen.len()); // Returns 8
    
    // Empty iterator returns empty vector
    let empty: Vec<i32> = std::iter::empty().choose_multiple(&mut thread_rng(), 5);
    println!("Empty: {:?}", empty);
}

choose_multiple returns at most n elements if fewer are available.

Weighted Random Selection

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    let items = vec!["common", "uncommon", "rare", "legendary"];
    let weights = [60, 25, 12, 3]; // Probability weights
    
    // Weighted sampling - more likely to pick lower indices
    let mut rng = thread_rng();
    for _ in 0..10 {
        let chosen = items.iter()
            .choose_weighted(&mut rng, |item| {
                let idx = items.iter().position(|x| x == item).unwrap();
                weights[idx]
            });
        println!("Weighted choice: {:?}", chosen);
    }
    
    // Choose multiple with weights
    let chosen_multiple: Vec<_> = items.iter()
        .choose_multiple_weighted(&mut rng, 2, |item| {
            let idx = items.iter().position(|x| x == item).unwrap();
            weights[idx]
        });
    println!("Multiple weighted: {:?}", chosen_multiple);
}

Weighted sampling allows non-uniform probability distributions.

Choose Multiple Reservoir Algorithm

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn manual_reservoir_multiple<T>(mut iter: impl Iterator<Item = T>, n: usize, rng: &mut ThreadRng) -> Vec<T> {
    // This is similar to what choose_multiple does internally
    let mut reservoir: Vec<T> = Vec::with_capacity(n);
    let mut count = 0;
    
    // Fill reservoir with first n elements
    for _ in 0..n {
        if let Some(item) = iter.next() {
            reservoir.push(item);
            count += 1;
        } else {
            break;
        }
    }
    
    // Replace elements with decreasing probability
    use rand::Rng;
    for item in iter {
        count += 1;
        let r = rng.gen_range(0..count);
        if r < n {
            reservoir[r] = item;
        }
    }
    
    reservoir
}
 
fn main() {
    let data: Vec<i32> = (0..100).collect();
    let mut rng = thread_rng();
    
    // Using IteratorRandom
    let chosen1: Vec<_> = data.iter().choose_multiple(&mut rng, 5);
    
    // Manual reservoir
    let chosen2 = manual_reservoir_multiple(data.iter(), 5, &mut rng);
    
    println!("IteratorRandom: {:?}", chosen1);
    println!("Manual reservoir: {:?}", chosen2);
}

Each element has equal probability of being in the result: min(n, len)/len.

Infinite Iterator Handling

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    // Infinite iterator with choose
    let infinite = std::iter::repeat_with(|| rand::random::<i32>());
    if let Some(val) = infinite.choose(&mut thread_rng()) {
        println!("Random from infinite: {}", val);
    }
    
    // BUT: choose_multiple with infinite iterator is dangerous
    // This would never terminate:
    // let chosen: Vec<_> = std::iter::repeat(1).choose_multiple(&mut rng, 5);
    // Because reservoir sampling requires knowing total count
    
    // Safe: finite take from infinite
    let chosen: Vec<_> = std::iter::repeat(1)
        .take(1_000_000)
        .choose_multiple(&mut thread_rng(), 5);
    println!("5 from bounded infinite: {:?}", chosen);
}

choose_multiple may not terminate on infinite iterators.

Stream Processing Example

use rand::seq::IteratorRandom;
use rand::thread_rng;
use std::io::{self, BufRead};
 
fn main() {
    // Process lines from stdin without loading all into memory
    let stdin = io::stdin();
    let mut rng = thread_rng();
    
    // Sample random line from potentially huge input
    if let Some(random_line) = stdin.lock().lines()
        .filter_map(|l| l.ok())
        .choose(&mut rng)
    {
        println!("Random line: {}", random_line);
    }
}

This works even for files larger than available memory.

Database Query Sampling

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
// Hypothetical database streaming scenario
struct DatabaseRow {
    id: i32,
    data: String,
}
 
fn stream_rows() -> impl Iterator<Item = DatabaseRow> {
    // In reality, this would stream from a database
    (0..1_000_000).map(|id| DatabaseRow {
        id,
        data: format!("data_{}", id),
    })
}
 
fn main() {
    let mut rng = thread_rng();
    
    // Sample without loading all rows into memory
    let sampled: Vec<_> = stream_rows()
        .choose_multiple(&mut rng, 100);
    
    println!("Sampled {} rows", sampled.len());
    
    // Each row has equal probability
    for row in &sampled {
        println!("Row {}: {}", row.id, row.data);
    }
}

Database cursors can be sampled without full result set materialization.

Performance Characteristics

use rand::seq::IteratorRandom;
use rand::thread_rng;
use std::time::Instant;
 
fn main() {
    let sizes = [1_000, 10_000, 100_000, 1_000_000];
    
    for size in sizes {
        let start = Instant::now();
        let _ = (0..size).choose(&mut thread_rng());
        let choose_time = start.elapsed();
        
        let start = Instant::now();
        let _: Vec<_> = (0..size).choose_multiple(&mut thread_rng(), 10);
        let choose_multiple_time = start.elapsed();
        
        println!("Size {}: choose={:?}, choose_multiple={:?}", 
                 size, choose_time, choose_multiple_time);
    }
}

Both methods are O(n) time complexity but O(1) or O(k) memory.

Common Pitfalls

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    // PITFALL 1: Iterator is consumed
    let data = vec![1, 2, 3, 4, 5];
    let mut iter = data.iter();
    let _ = iter.choose(&mut thread_rng());
    // iter is now exhausted
    // let _ = iter.choose(&mut thread_rng()); // Returns None
    
    // PITFALL 2: Clone if you need to reuse
    let data = vec![1, 2, 3, 4, 5];
    let mut rng = thread_rng();
    for _ in 0..3 {
        let chosen = data.iter().choose(&mut rng);
        println!("Choice: {:?}", chosen);
    }
    
    // PITFALL 3: choose_multiple may not terminate on infinite
    // This would hang:
    // let _: Vec<_> = std::iter::repeat(1).choose_multiple(&mut rng, 5);
    
    // PITFALL 4: Empty iterator returns None/empty
    let empty: Vec<i32> = vec![];
    assert!(empty.iter().choose(&mut rng).is_none());
    let chosen: Vec<_> = empty.iter().choose_multiple(&mut rng, 5);
    assert!(chosen.is_empty());
}

These edge cases are important to handle in production code.

Using with Custom Types

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
#[derive(Debug, Clone)]
struct User {
    id: u32,
    name: String,
    active: bool,
}
 
impl User {
    fn generate_users(n: u32) -> impl Iterator<Item = User> {
        (0..n).map(|id| User {
            id,
            name: format!("user_{}", id),
            active: id % 2 == 0,
        })
    }
}
 
fn main() {
    let mut rng = thread_rng();
    
    // Sample from user stream
    let sampled: Vec<_> = User::generate_users(1_000)
        .filter(|u| u.active) // Only active users
        .choose_multiple(&mut rng, 10);
    
    println!("Sampled {} active users", sampled.len());
    
    // Weighted by user ID (lower IDs more likely)
    let chosen = User::generate_users(100)
        .choose_weighted(&mut rng, |u| 100 - u.id);
    
    if let Some(user) = chosen {
        println!("Weighted choice: {:?}", user);
    }
}

Custom types work seamlessly with the trait.

Combining with Other Iterator Methods

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    let data = (0..100).collect::<Vec<_>>();
    
    // Chain multiple operations before sampling
    let chosen = data.iter()
        .filter(|x| *x % 3 == 0)    // Only multiples of 3
        .map(|x| x * 2)              // Double them
        .choose(&mut thread_rng());
    println!("Random filtered: {:?}", chosen);
    
    // Sample after grouping (hypothetical)
    let grouped_samples: Vec<_> = (0..20)
        .collect::<Vec<_>>()
        .chunks(5)
        .flat_map(|chunk| {
            chunk.iter().choose_multiple(&mut thread_rng(), 2)
        })
        .collect();
    println!("Samples from chunks: {:?}", grouped_samples);
}
 
// Helper trait for chunks (not in std)
trait Chunks<T> {
    fn chunks(&self, size: usize) -> Vec<Vec<&T>>;
}
 
impl<T> Chunks<T> for Vec<T> {
    fn chunks(&self, size: usize) -> Vec<Vec<&T>> {
        self.as_slice().chunks(size).map(|c| c.iter().collect()).collect()
    }
}

Sampling integrates naturally with iterator pipelines.

Synthesis

Core mechanism:

IteratorRandom extends Iterator with choose and choose_multiple
Uses reservoir sampling for O(1) or O(k) memory
Works in single pass without collecting

Key methods:

choose: Returns single random element as Option<T>
choose_multiple: Returns Vec<T> of k elements
choose_weighted: Weighted random selection
choose_multiple_weighted: Weighted multiple selection

Algorithm details:

Single element: Store first, replace with probability 1/n
Multiple elements: Fill reservoir, replace with probability k/n
All elements have equal probability of selection

When to use:

Large datasets that don't fit in memory
Streaming data sources
Database query results
Infinite or generated iterators
When you need random sample without allocation overhead

Key insight: The genius of IteratorRandom is bringing reservoir sampling to the iterator interface directly. Rather than requiring users to understand and implement the algorithm themselves, it exposes simple choose and choose_multiple methods that work on any iterator. This enables memory-efficient random sampling from sources that would be impractical to collect—files larger than RAM, database cursors, network streams, or even infinite generators. The single-pass nature means it works in O(n) time with O(1) or O(k) memory, making it suitable for production workloads where collecting billions of elements is not an option.

How does rand::seq::IteratorRandom enable random sampling from iterators without collecting?