How does rand::seq::IteratorRandom enable random sampling from iterators without collecting?

rand::seq::IteratorRandom provides the choose and choose_multiple methods that extend iterators with random sampling capabilities. The key insight is that these methods use reservoir sampling algorithms that can select random elements in a single pass through the iterator, without needing to collect all elements into a container first. The choose method returns a single random element using Algorithm R with reservoir size 1, while choose_multiple uses a more general reservoir sampling algorithm to select multiple elements uniformly at random. This is memory-efficient for large or infinite iterators since it only needs to store the sampled elements, not the entire iterator.

Basic Random Element Selection

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    let numbers = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    
    // Choose a single random element
    let chosen = numbers.iter().choose(&mut thread_rng());
    println!("Random element: {:?}", chosen);
    
    // Choose multiple random elements
    let chosen_multiple: Vec<_> = numbers.iter().choose_multiple(&mut thread_rng(), 3);
    println!("Random 3 elements: {:?}", chosen_multiple);
}

The choose method returns Option<&T> since the iterator might be empty.

Sampling Without Collecting

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    // Infinite iterator - cannot collect
    let random_from_infinite = (0..).choose(&mut thread_rng());
    println!("Random from infinite: {:?}", random_from_infinite);
    
    // Large range - expensive to collect
    let random_from_large = (1..1_000_000).choose(&mut thread_rng());
    println!("Random from large range: {:?}", random_from_large);
    
    // Iterator that generates values on demand
    let generated = (0..).filter(|x| x % 7 == 0).take(1000);
    let random_multiple: Vec<_> = generated.choose_multiple(&mut thread_rng(), 5);
    println!("Random 5 multiples of 7: {:?}", random_multiple);
}

These methods work on any iterator, regardless of whether it can be collected.

Reservoir Sampling Algorithm

use rand::seq::IteratorRandom;
use rand::thread_rng;
use rand::rngs::ThreadRng;
 
fn manual_reservoir_sample<T>(mut iter: impl Iterator<Item = T>, rng: &mut ThreadRng) -> Option<T> {
    // This is what choose() does internally
    let mut result: Option<T> = None;
    let mut count = 0;
    
    for item in iter {
        count += 1;
        if count == 1 {
            result = Some(item);
        } else {
            // Replace with probability 1/count
            let r = rand::random::<usize>() % count;
            if r == 0 {
                result = Some(item);
            }
        }
    }
    
    result
}
 
fn main() {
    let data = vec![10, 20, 30, 40, 50];
    
    // Using IteratorRandom
    let chosen1 = data.iter().choose(&mut thread_rng());
    
    // Manual reservoir sampling (equivalent)
    let chosen2 = manual_reservoir_sample(data.iter(), &mut thread_rng());
    
    println!("IteratorRandom: {:?}", chosen1);
    println!("Manual reservoir: {:?}", chosen2);
}

Each element has equal probability of being selected: 1/n where n is total elements.

Memory Efficiency Comparison

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    // APPROACH 1: Collect then sample (memory intensive)
    fn collect_then_sample() -> Option<i32> {
        let collected: Vec<i32> = (1..1_000_000_000).collect();
        collected.into_iter().choose(&mut thread_rng())
    }
    
    // APPROACH 2: Sample directly (memory efficient)
    fn sample_directly() -> Option<i32> {
        (1..1_000_000_000).choose(&mut thread_rng())
    }
    
    // Approach 1 allocates ~8GB of memory
    // Approach 2 allocates essentially nothing
    
    // Time both approaches
    use std::time::Instant;
    
    // let start = Instant::now();
    // let _ = collect_then_sample(); // Would use 8GB RAM
    // println!("Collect approach: {:?}", start.elapsed());
    
    let start = Instant::now();
    let _ = sample_directly();
    println!("Direct sample: {:?}", start.elapsed());
}

Direct sampling uses O(1) memory regardless of iterator size.

Choosing Multiple Elements

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    let data = vec!['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'];
    
    // Choose exactly 3 elements
    let chosen: Vec<_> = data.iter().choose_multiple(&mut thread_rng(), 3);
    println!("3 elements: {:?}", chosen);
    
    // Choosing more than available returns all elements
    let chosen: Vec<_> = data.iter().choose_multiple(&mut thread_rng(), 20);
    println!("20 from 8 elements: {:?}", chosen.len()); // Returns 8
    
    // Empty iterator returns empty vector
    let empty: Vec<i32> = std::iter::empty().choose_multiple(&mut thread_rng(), 5);
    println!("Empty: {:?}", empty);
}

choose_multiple returns at most n elements if fewer are available.

Weighted Random Selection

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    let items = vec!["common", "uncommon", "rare", "legendary"];
    let weights = [60, 25, 12, 3]; // Probability weights
    
    // Weighted sampling - more likely to pick lower indices
    let mut rng = thread_rng();
    for _ in 0..10 {
        let chosen = items.iter()
            .choose_weighted(&mut rng, |item| {
                let idx = items.iter().position(|x| x == item).unwrap();
                weights[idx]
            });
        println!("Weighted choice: {:?}", chosen);
    }
    
    // Choose multiple with weights
    let chosen_multiple: Vec<_> = items.iter()
        .choose_multiple_weighted(&mut rng, 2, |item| {
            let idx = items.iter().position(|x| x == item).unwrap();
            weights[idx]
        });
    println!("Multiple weighted: {:?}", chosen_multiple);
}

Weighted sampling allows non-uniform probability distributions.

Choose Multiple Reservoir Algorithm

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn manual_reservoir_multiple<T>(mut iter: impl Iterator<Item = T>, n: usize, rng: &mut ThreadRng) -> Vec<T> {
    // This is similar to what choose_multiple does internally
    let mut reservoir: Vec<T> = Vec::with_capacity(n);
    let mut count = 0;
    
    // Fill reservoir with first n elements
    for _ in 0..n {
        if let Some(item) = iter.next() {
            reservoir.push(item);
            count += 1;
        } else {
            break;
        }
    }
    
    // Replace elements with decreasing probability
    use rand::Rng;
    for item in iter {
        count += 1;
        let r = rng.gen_range(0..count);
        if r < n {
            reservoir[r] = item;
        }
    }
    
    reservoir
}
 
fn main() {
    let data: Vec<i32> = (0..100).collect();
    let mut rng = thread_rng();
    
    // Using IteratorRandom
    let chosen1: Vec<_> = data.iter().choose_multiple(&mut rng, 5);
    
    // Manual reservoir
    let chosen2 = manual_reservoir_multiple(data.iter(), 5, &mut rng);
    
    println!("IteratorRandom: {:?}", chosen1);
    println!("Manual reservoir: {:?}", chosen2);
}

Each element has equal probability of being in the result: min(n, len)/len.

Infinite Iterator Handling

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    // Infinite iterator with choose
    let infinite = std::iter::repeat_with(|| rand::random::<i32>());
    if let Some(val) = infinite.choose(&mut thread_rng()) {
        println!("Random from infinite: {}", val);
    }
    
    // BUT: choose_multiple with infinite iterator is dangerous
    // This would never terminate:
    // let chosen: Vec<_> = std::iter::repeat(1).choose_multiple(&mut rng, 5);
    // Because reservoir sampling requires knowing total count
    
    // Safe: finite take from infinite
    let chosen: Vec<_> = std::iter::repeat(1)
        .take(1_000_000)
        .choose_multiple(&mut thread_rng(), 5);
    println!("5 from bounded infinite: {:?}", chosen);
}

choose_multiple may not terminate on infinite iterators.

Stream Processing Example

use rand::seq::IteratorRandom;
use rand::thread_rng;
use std::io::{self, BufRead};
 
fn main() {
    // Process lines from stdin without loading all into memory
    let stdin = io::stdin();
    let mut rng = thread_rng();
    
    // Sample random line from potentially huge input
    if let Some(random_line) = stdin.lock().lines()
        .filter_map(|l| l.ok())
        .choose(&mut rng)
    {
        println!("Random line: {}", random_line);
    }
}

This works even for files larger than available memory.

Database Query Sampling

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
// Hypothetical database streaming scenario
struct DatabaseRow {
    id: i32,
    data: String,
}
 
fn stream_rows() -> impl Iterator<Item = DatabaseRow> {
    // In reality, this would stream from a database
    (0..1_000_000).map(|id| DatabaseRow {
        id,
        data: format!("data_{}", id),
    })
}
 
fn main() {
    let mut rng = thread_rng();
    
    // Sample without loading all rows into memory
    let sampled: Vec<_> = stream_rows()
        .choose_multiple(&mut rng, 100);
    
    println!("Sampled {} rows", sampled.len());
    
    // Each row has equal probability
    for row in &sampled {
        println!("Row {}: {}", row.id, row.data);
    }
}

Database cursors can be sampled without full result set materialization.

Performance Characteristics

use rand::seq::IteratorRandom;
use rand::thread_rng;
use std::time::Instant;
 
fn main() {
    let sizes = [1_000, 10_000, 100_000, 1_000_000];
    
    for size in sizes {
        let start = Instant::now();
        let _ = (0..size).choose(&mut thread_rng());
        let choose_time = start.elapsed();
        
        let start = Instant::now();
        let _: Vec<_> = (0..size).choose_multiple(&mut thread_rng(), 10);
        let choose_multiple_time = start.elapsed();
        
        println!("Size {}: choose={:?}, choose_multiple={:?}", 
                 size, choose_time, choose_multiple_time);
    }
}

Both methods are O(n) time complexity but O(1) or O(k) memory.

Common Pitfalls

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    // PITFALL 1: Iterator is consumed
    let data = vec![1, 2, 3, 4, 5];
    let mut iter = data.iter();
    let _ = iter.choose(&mut thread_rng());
    // iter is now exhausted
    // let _ = iter.choose(&mut thread_rng()); // Returns None
    
    // PITFALL 2: Clone if you need to reuse
    let data = vec![1, 2, 3, 4, 5];
    let mut rng = thread_rng();
    for _ in 0..3 {
        let chosen = data.iter().choose(&mut rng);
        println!("Choice: {:?}", chosen);
    }
    
    // PITFALL 3: choose_multiple may not terminate on infinite
    // This would hang:
    // let _: Vec<_> = std::iter::repeat(1).choose_multiple(&mut rng, 5);
    
    // PITFALL 4: Empty iterator returns None/empty
    let empty: Vec<i32> = vec![];
    assert!(empty.iter().choose(&mut rng).is_none());
    let chosen: Vec<_> = empty.iter().choose_multiple(&mut rng, 5);
    assert!(chosen.is_empty());
}

These edge cases are important to handle in production code.

Using with Custom Types

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
#[derive(Debug, Clone)]
struct User {
    id: u32,
    name: String,
    active: bool,
}
 
impl User {
    fn generate_users(n: u32) -> impl Iterator<Item = User> {
        (0..n).map(|id| User {
            id,
            name: format!("user_{}", id),
            active: id % 2 == 0,
        })
    }
}
 
fn main() {
    let mut rng = thread_rng();
    
    // Sample from user stream
    let sampled: Vec<_> = User::generate_users(1_000)
        .filter(|u| u.active) // Only active users
        .choose_multiple(&mut rng, 10);
    
    println!("Sampled {} active users", sampled.len());
    
    // Weighted by user ID (lower IDs more likely)
    let chosen = User::generate_users(100)
        .choose_weighted(&mut rng, |u| 100 - u.id);
    
    if let Some(user) = chosen {
        println!("Weighted choice: {:?}", user);
    }
}

Custom types work seamlessly with the trait.

Combining with Other Iterator Methods

use rand::seq::IteratorRandom;
use rand::thread_rng;
 
fn main() {
    let data = (0..100).collect::<Vec<_>>();
    
    // Chain multiple operations before sampling
    let chosen = data.iter()
        .filter(|x| *x % 3 == 0)    // Only multiples of 3
        .map(|x| x * 2)              // Double them
        .choose(&mut thread_rng());
    println!("Random filtered: {:?}", chosen);
    
    // Sample after grouping (hypothetical)
    let grouped_samples: Vec<_> = (0..20)
        .collect::<Vec<_>>()
        .chunks(5)
        .flat_map(|chunk| {
            chunk.iter().choose_multiple(&mut thread_rng(), 2)
        })
        .collect();
    println!("Samples from chunks: {:?}", grouped_samples);
}
 
// Helper trait for chunks (not in std)
trait Chunks<T> {
    fn chunks(&self, size: usize) -> Vec<Vec<&T>>;
}
 
impl<T> Chunks<T> for Vec<T> {
    fn chunks(&self, size: usize) -> Vec<Vec<&T>> {
        self.as_slice().chunks(size).map(|c| c.iter().collect()).collect()
    }
}

Sampling integrates naturally with iterator pipelines.

Synthesis

Core mechanism:

  • IteratorRandom extends Iterator with choose and choose_multiple
  • Uses reservoir sampling for O(1) or O(k) memory
  • Works in single pass without collecting

Key methods:

  • choose: Returns single random element as Option<T>
  • choose_multiple: Returns Vec<T> of k elements
  • choose_weighted: Weighted random selection
  • choose_multiple_weighted: Weighted multiple selection

Algorithm details:

  • Single element: Store first, replace with probability 1/n
  • Multiple elements: Fill reservoir, replace with probability k/n
  • All elements have equal probability of selection

When to use:

  • Large datasets that don't fit in memory
  • Streaming data sources
  • Database query results
  • Infinite or generated iterators
  • When you need random sample without allocation overhead

Key insight: The genius of IteratorRandom is bringing reservoir sampling to the iterator interface directly. Rather than requiring users to understand and implement the algorithm themselves, it exposes simple choose and choose_multiple methods that work on any iterator. This enables memory-efficient random sampling from sources that would be impractical to collect—files larger than RAM, database cursors, network streams, or even infinite generators. The single-pass nature means it works in O(n) time with O(1) or O(k) memory, making it suitable for production workloads where collecting billions of elements is not an option.