Loading pageā¦
Rust walkthroughs
Loading pageā¦
rand::seq::IteratorRandom enable random sampling from iterators without collecting?rand::seq::IteratorRandom provides the choose and choose_multiple methods that extend iterators with random sampling capabilities. The key insight is that these methods use reservoir sampling algorithms that can select random elements in a single pass through the iterator, without needing to collect all elements into a container first. The choose method returns a single random element using Algorithm R with reservoir size 1, while choose_multiple uses a more general reservoir sampling algorithm to select multiple elements uniformly at random. This is memory-efficient for large or infinite iterators since it only needs to store the sampled elements, not the entire iterator.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn main() {
let numbers = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
// Choose a single random element
let chosen = numbers.iter().choose(&mut thread_rng());
println!("Random element: {:?}", chosen);
// Choose multiple random elements
let chosen_multiple: Vec<_> = numbers.iter().choose_multiple(&mut thread_rng(), 3);
println!("Random 3 elements: {:?}", chosen_multiple);
}The choose method returns Option<&T> since the iterator might be empty.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn main() {
// Infinite iterator - cannot collect
let random_from_infinite = (0..).choose(&mut thread_rng());
println!("Random from infinite: {:?}", random_from_infinite);
// Large range - expensive to collect
let random_from_large = (1..1_000_000).choose(&mut thread_rng());
println!("Random from large range: {:?}", random_from_large);
// Iterator that generates values on demand
let generated = (0..).filter(|x| x % 7 == 0).take(1000);
let random_multiple: Vec<_> = generated.choose_multiple(&mut thread_rng(), 5);
println!("Random 5 multiples of 7: {:?}", random_multiple);
}These methods work on any iterator, regardless of whether it can be collected.
use rand::seq::IteratorRandom;
use rand::thread_rng;
use rand::rngs::ThreadRng;
fn manual_reservoir_sample<T>(mut iter: impl Iterator<Item = T>, rng: &mut ThreadRng) -> Option<T> {
// This is what choose() does internally
let mut result: Option<T> = None;
let mut count = 0;
for item in iter {
count += 1;
if count == 1 {
result = Some(item);
} else {
// Replace with probability 1/count
let r = rand::random::<usize>() % count;
if r == 0 {
result = Some(item);
}
}
}
result
}
fn main() {
let data = vec![10, 20, 30, 40, 50];
// Using IteratorRandom
let chosen1 = data.iter().choose(&mut thread_rng());
// Manual reservoir sampling (equivalent)
let chosen2 = manual_reservoir_sample(data.iter(), &mut thread_rng());
println!("IteratorRandom: {:?}", chosen1);
println!("Manual reservoir: {:?}", chosen2);
}Each element has equal probability of being selected: 1/n where n is total elements.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn main() {
// APPROACH 1: Collect then sample (memory intensive)
fn collect_then_sample() -> Option<i32> {
let collected: Vec<i32> = (1..1_000_000_000).collect();
collected.into_iter().choose(&mut thread_rng())
}
// APPROACH 2: Sample directly (memory efficient)
fn sample_directly() -> Option<i32> {
(1..1_000_000_000).choose(&mut thread_rng())
}
// Approach 1 allocates ~8GB of memory
// Approach 2 allocates essentially nothing
// Time both approaches
use std::time::Instant;
// let start = Instant::now();
// let _ = collect_then_sample(); // Would use 8GB RAM
// println!("Collect approach: {:?}", start.elapsed());
let start = Instant::now();
let _ = sample_directly();
println!("Direct sample: {:?}", start.elapsed());
}Direct sampling uses O(1) memory regardless of iterator size.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn main() {
let data = vec!['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'];
// Choose exactly 3 elements
let chosen: Vec<_> = data.iter().choose_multiple(&mut thread_rng(), 3);
println!("3 elements: {:?}", chosen);
// Choosing more than available returns all elements
let chosen: Vec<_> = data.iter().choose_multiple(&mut thread_rng(), 20);
println!("20 from 8 elements: {:?}", chosen.len()); // Returns 8
// Empty iterator returns empty vector
let empty: Vec<i32> = std::iter::empty().choose_multiple(&mut thread_rng(), 5);
println!("Empty: {:?}", empty);
}choose_multiple returns at most n elements if fewer are available.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn main() {
let items = vec!["common", "uncommon", "rare", "legendary"];
let weights = [60, 25, 12, 3]; // Probability weights
// Weighted sampling - more likely to pick lower indices
let mut rng = thread_rng();
for _ in 0..10 {
let chosen = items.iter()
.choose_weighted(&mut rng, |item| {
let idx = items.iter().position(|x| x == item).unwrap();
weights[idx]
});
println!("Weighted choice: {:?}", chosen);
}
// Choose multiple with weights
let chosen_multiple: Vec<_> = items.iter()
.choose_multiple_weighted(&mut rng, 2, |item| {
let idx = items.iter().position(|x| x == item).unwrap();
weights[idx]
});
println!("Multiple weighted: {:?}", chosen_multiple);
}Weighted sampling allows non-uniform probability distributions.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn manual_reservoir_multiple<T>(mut iter: impl Iterator<Item = T>, n: usize, rng: &mut ThreadRng) -> Vec<T> {
// This is similar to what choose_multiple does internally
let mut reservoir: Vec<T> = Vec::with_capacity(n);
let mut count = 0;
// Fill reservoir with first n elements
for _ in 0..n {
if let Some(item) = iter.next() {
reservoir.push(item);
count += 1;
} else {
break;
}
}
// Replace elements with decreasing probability
use rand::Rng;
for item in iter {
count += 1;
let r = rng.gen_range(0..count);
if r < n {
reservoir[r] = item;
}
}
reservoir
}
fn main() {
let data: Vec<i32> = (0..100).collect();
let mut rng = thread_rng();
// Using IteratorRandom
let chosen1: Vec<_> = data.iter().choose_multiple(&mut rng, 5);
// Manual reservoir
let chosen2 = manual_reservoir_multiple(data.iter(), 5, &mut rng);
println!("IteratorRandom: {:?}", chosen1);
println!("Manual reservoir: {:?}", chosen2);
}Each element has equal probability of being in the result: min(n, len)/len.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn main() {
// Infinite iterator with choose
let infinite = std::iter::repeat_with(|| rand::random::<i32>());
if let Some(val) = infinite.choose(&mut thread_rng()) {
println!("Random from infinite: {}", val);
}
// BUT: choose_multiple with infinite iterator is dangerous
// This would never terminate:
// let chosen: Vec<_> = std::iter::repeat(1).choose_multiple(&mut rng, 5);
// Because reservoir sampling requires knowing total count
// Safe: finite take from infinite
let chosen: Vec<_> = std::iter::repeat(1)
.take(1_000_000)
.choose_multiple(&mut thread_rng(), 5);
println!("5 from bounded infinite: {:?}", chosen);
}choose_multiple may not terminate on infinite iterators.
use rand::seq::IteratorRandom;
use rand::thread_rng;
use std::io::{self, BufRead};
fn main() {
// Process lines from stdin without loading all into memory
let stdin = io::stdin();
let mut rng = thread_rng();
// Sample random line from potentially huge input
if let Some(random_line) = stdin.lock().lines()
.filter_map(|l| l.ok())
.choose(&mut rng)
{
println!("Random line: {}", random_line);
}
}This works even for files larger than available memory.
use rand::seq::IteratorRandom;
use rand::thread_rng;
// Hypothetical database streaming scenario
struct DatabaseRow {
id: i32,
data: String,
}
fn stream_rows() -> impl Iterator<Item = DatabaseRow> {
// In reality, this would stream from a database
(0..1_000_000).map(|id| DatabaseRow {
id,
data: format!("data_{}", id),
})
}
fn main() {
let mut rng = thread_rng();
// Sample without loading all rows into memory
let sampled: Vec<_> = stream_rows()
.choose_multiple(&mut rng, 100);
println!("Sampled {} rows", sampled.len());
// Each row has equal probability
for row in &sampled {
println!("Row {}: {}", row.id, row.data);
}
}Database cursors can be sampled without full result set materialization.
use rand::seq::IteratorRandom;
use rand::thread_rng;
use std::time::Instant;
fn main() {
let sizes = [1_000, 10_000, 100_000, 1_000_000];
for size in sizes {
let start = Instant::now();
let _ = (0..size).choose(&mut thread_rng());
let choose_time = start.elapsed();
let start = Instant::now();
let _: Vec<_> = (0..size).choose_multiple(&mut thread_rng(), 10);
let choose_multiple_time = start.elapsed();
println!("Size {}: choose={:?}, choose_multiple={:?}",
size, choose_time, choose_multiple_time);
}
}Both methods are O(n) time complexity but O(1) or O(k) memory.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn main() {
// PITFALL 1: Iterator is consumed
let data = vec![1, 2, 3, 4, 5];
let mut iter = data.iter();
let _ = iter.choose(&mut thread_rng());
// iter is now exhausted
// let _ = iter.choose(&mut thread_rng()); // Returns None
// PITFALL 2: Clone if you need to reuse
let data = vec![1, 2, 3, 4, 5];
let mut rng = thread_rng();
for _ in 0..3 {
let chosen = data.iter().choose(&mut rng);
println!("Choice: {:?}", chosen);
}
// PITFALL 3: choose_multiple may not terminate on infinite
// This would hang:
// let _: Vec<_> = std::iter::repeat(1).choose_multiple(&mut rng, 5);
// PITFALL 4: Empty iterator returns None/empty
let empty: Vec<i32> = vec![];
assert!(empty.iter().choose(&mut rng).is_none());
let chosen: Vec<_> = empty.iter().choose_multiple(&mut rng, 5);
assert!(chosen.is_empty());
}These edge cases are important to handle in production code.
use rand::seq::IteratorRandom;
use rand::thread_rng;
#[derive(Debug, Clone)]
struct User {
id: u32,
name: String,
active: bool,
}
impl User {
fn generate_users(n: u32) -> impl Iterator<Item = User> {
(0..n).map(|id| User {
id,
name: format!("user_{}", id),
active: id % 2 == 0,
})
}
}
fn main() {
let mut rng = thread_rng();
// Sample from user stream
let sampled: Vec<_> = User::generate_users(1_000)
.filter(|u| u.active) // Only active users
.choose_multiple(&mut rng, 10);
println!("Sampled {} active users", sampled.len());
// Weighted by user ID (lower IDs more likely)
let chosen = User::generate_users(100)
.choose_weighted(&mut rng, |u| 100 - u.id);
if let Some(user) = chosen {
println!("Weighted choice: {:?}", user);
}
}Custom types work seamlessly with the trait.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn main() {
let data = (0..100).collect::<Vec<_>>();
// Chain multiple operations before sampling
let chosen = data.iter()
.filter(|x| *x % 3 == 0) // Only multiples of 3
.map(|x| x * 2) // Double them
.choose(&mut thread_rng());
println!("Random filtered: {:?}", chosen);
// Sample after grouping (hypothetical)
let grouped_samples: Vec<_> = (0..20)
.collect::<Vec<_>>()
.chunks(5)
.flat_map(|chunk| {
chunk.iter().choose_multiple(&mut thread_rng(), 2)
})
.collect();
println!("Samples from chunks: {:?}", grouped_samples);
}
// Helper trait for chunks (not in std)
trait Chunks<T> {
fn chunks(&self, size: usize) -> Vec<Vec<&T>>;
}
impl<T> Chunks<T> for Vec<T> {
fn chunks(&self, size: usize) -> Vec<Vec<&T>> {
self.as_slice().chunks(size).map(|c| c.iter().collect()).collect()
}
}Sampling integrates naturally with iterator pipelines.
Core mechanism:
IteratorRandom extends Iterator with choose and choose_multipleKey methods:
choose: Returns single random element as Option<T>choose_multiple: Returns Vec<T> of k elementschoose_weighted: Weighted random selectionchoose_multiple_weighted: Weighted multiple selectionAlgorithm details:
When to use:
Key insight: The genius of IteratorRandom is bringing reservoir sampling to the iterator interface directly. Rather than requiring users to understand and implement the algorithm themselves, it exposes simple choose and choose_multiple methods that work on any iterator. This enables memory-efficient random sampling from sources that would be impractical to collectāfiles larger than RAM, database cursors, network streams, or even infinite generators. The single-pass nature means it works in O(n) time with O(1) or O(k) memory, making it suitable for production workloads where collecting billions of elements is not an option.