Loading page…
Rust walkthroughs
Loading page…
rand::seq::IteratorRandom::choose uniformly sample from iterators of unknown length?IteratorRandom::choose uses reservoir sampling with a single-element reservoir—the algorithm maintains a probability of selecting each element inversely proportional to the number of elements seen so far, ensuring that after processing n elements, each has exactly 1/n probability of being selected. The key insight is that on encountering the i-th element (1-indexed), the algorithm selects it with probability 1/i, replacing the currently held element; this ensures uniform selection because the probability of any earlier element j surviving is the product of not being replaced at each subsequent step, which mathematically equals 1/n for all elements.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn basic_choose() {
let items = vec
![1, 2, 3, 4, 5];
// Choose a random element from the iterator
let chosen = items.iter().choose(&mut thread_rng());
// Result is Option<&item>
// Some(item) if iterator was non-empty
// None if iterator was empty
assert!(chosen.is_some());
}choose returns an Option because the iterator might be empty.
// Problem: Choose uniformly from an iterator
// - You don't know the length in advance
// - You can only iterate once (stream)
// - You can't store all elements (possibly infinite)
// Naive approach (requires knowing length):
fn naive_choose<T>(items: &[T]) -> Option<&T> {
if items.is_empty() {
return None;
}
let index = rand::thread_rng().gen_range(0..items.len());
Some(&items[index])
}
// This works for slices, but not for iterators of unknown length
// The challenge: How to choose uniformly without knowing length?Iterators don't always support len() or size_hint() may be inaccurate.
use rand::Rng;
fn reservoir_sampling_explanation() {
// Algorithm for choosing from unknown-length stream:
// 1. Store first element (i=1)
// 2. For each subsequent element i (i=2, 3, ...):
// - Select it with probability 1/i
// - If selected, replace current choice
// Example walkthrough:
// Elements: [A, B, C, D]
// i=1: Store A. Choice = A.
// i=2: Select B with probability 1/2.
// If selected (50%): Choice = B
// If not (50%): Choice = A
// i=3: Select C with probability 1/3.
// If selected (33%): Choice = C
// If not (67%): Choice stays (A or B)
// i=4: Select D with probability 1/4.
// If selected (25%): Choice = D
// If not (75%): Choice stays
// After 4 elements, each has probability 1/4:
// P(A) = 1 × (1-1/2) × (1-1/3) × (1-1/4) = 1/4
// P(B) = 1/2 × (1-1/3) × (1-1/4) = 1/4
// P(C) = 1/3 × (1-1/4) = 1/4
// P(D) = 1/4 = 1/4
}Each element has equal probability because selection chance decreases as more elements are seen.
use rand::Rng;
fn choose_implementation<I, R>(mut iter: I, mut rng: R) -> Option<I::Item>
where
I: Iterator,
R: Rng,
{
// Step 1: Take the first element as initial choice
let mut chosen = iter.next()?;
// Step 2: Process remaining elements
for (i, element) in iter.enumerate() {
// i is 0-indexed, we want 1-indexed count
// We've seen i+2 elements now (initial + i more + current)
let count = i + 2; // Number of elements seen including this one
// Select this element with probability 1/count
if rng.gen_range(0..count) == 0 {
chosen = element;
}
}
Some(chosen)
}
// This is essentially what IteratorRandom::choose doesThe algorithm stores one element and potentially replaces it on each iteration.
fn proof_of_uniformity() {
// Claim: After processing n elements, each has probability 1/n
// For the first element:
// P(surviving) = P(not replaced at step 2) × P(not replaced at step 3) × ... × P(not replaced at step n)
// = (1 - 1/2) × (1 - 1/3) × ... × (1 - 1/n)
// = 1/2 × 2/3 × 3/4 × ... × (n-1)/n
// = 1/n (telescoping product)
// For element at position k:
// P(being chosen at step k) = 1/k
// P(surviving after) = (1 - 1/(k+1)) × ... × (1 - 1/n) = k/n
// P(final) = 1/k × k/n = 1/n
// For the last element:
// P(being chosen) = 1/n (directly)
// Result: All n elements have probability 1/n
}The telescoping product ensures uniform probability for all elements.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn choose_multiple_example() {
let items = vec
![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
// Choose 3 elements uniformly
let chosen: Vec<_> = items.iter().choose_multiple(&mut thread_rng(), 3);
// Returns exactly 3 elements (or fewer if iterator is shorter)
assert_eq!(chosen.len(), 3);
// Each element has equal probability of being included
// Uses reservoir sampling with 3-element reservoir
}choose_multiple extends the algorithm to select multiple elements.
use rand::Rng;
fn reservoir_multiple<I, R>(mut iter: I, mut rng: R, k: usize) -> Vec<I::Item>
where
I: Iterator,
R: Rng,
{
let mut reservoir: Vec<I::Item> = Vec::with_capacity(k);
// Fill reservoir with first k elements
for _ in 0..k {
if let Some(item) = iter.next() {
reservoir.push(item);
} else {
return reservoir; // Fewer than k elements
}
}
// Process remaining elements
for (i, element) in iter.enumerate() {
let count = k + i + 1; // Elements seen so far
// Random index in [0, count)
let j = rng.gen_range(0..count);
// If j < k, replace reservoir[j] with this element
if j < k {
reservoir[j] = element;
}
}
reservoir
}
// Each of the n elements has probability k/n of being in final reservoir
// Each subset of size k is equally likelyFor k samples, the algorithm maintains a k-element reservoir.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn empty_iterator() {
let empty: Vec<i32> = vec
![];
let result = empty.iter().choose(&mut thread_rng());
// Returns None for empty iterator
assert!(result.is_none());
// choose_multiple also handles empty:
let multiple: Vec<_> = empty.iter().choose_multiple(&mut thread_rng(), 3);
assert!(multiple.is_empty());
}choose returns None when the iterator produces no elements.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn size_hint_optimization() {
// IteratorRandom::choose may use size_hint for optimization
// If size_hint is exact and small, can use direct random index
// For iterators with known size:
let vec = vec
![1, 2, 3, 4, 5];
// size_hint returns (5, Some(5))
// Can choose: random index in [0, 5), return that element
// No need for reservoir sampling
// For iterators with unknown size:
let iter = (0..).filter(|x| x % 2 == 0).take(100);
// size_hint may be inaccurate
// Must use reservoir sampling
}When size_hint provides exact size, a simpler algorithm may be used.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn infinite_iterators() {
// Works with infinite iterators!
// Process elements one at a time, maintain single choice
// This would process forever without .take():
// let chosen = (0..).choose(&mut thread_rng()); // Never returns
// With .take(), processes limited elements:
let chosen = (0..1_000_000).choose(&mut thread_rng());
assert!(chosen.is_some());
// Works because algorithm is online:
// - Processes one element at a time
// - Stores only the current choice
// - Memory: O(1)
}The algorithm is online—processing elements one at a time with constant memory.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn memory_efficiency() {
// choose stores only:
// 1. Current chosen element (or None)
// 2. A counter
// Memory: O(1)
// Even for iterators yielding large items:
let large_items: Vec<Vec<u8>> = (0..1000)
.map(|i| vec
![0u8; 1_000_000]) // 1MB per item
.collect();
// choose only stores ONE item at a time
let chosen = large_items.iter().choose(&mut thread_rng());
// Not storing all items, just current choice
// Works with streaming data that can't fit in memory
}O(1) memory means the algorithm works with arbitrarily large streams.
use rand::seq::IteratorRandom;
use rand::{thread_rng, Rng};
fn comparison() {
let data = vec
![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
// Approach 1: collect then choose (requires storing all)
let all: Vec<_> = data.iter().collect();
let index = thread_rng().gen_range(0..all.len());
let chosen1 = &all[index];
// Memory: O(n) - must store all elements
// Approach 2: choose directly (constant memory)
let chosen2 = data.iter().choose(&mut thread_rng());
// Memory: O(1) - stores only one element
// Both give uniform distribution
// Approach 2 works on streams of unknown length
// When to use approach 1:
// - You already have all data in memory
// - You need random access multiple times
// When to use approach 2:
// - Data is streaming
// - Don't want to store all elements
// - Single pass over data
}choose is essential when you can't or don't want to store all elements.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn size_classes() {
// Size known exactly (slice, Vec, etc.):
let vec = vec
![1, 2, 3, 4, 5];
// May use fast path: random index
// Size bounded (range, etc.):
let range_iter = 0..100;
// size_hint gives exact size
// Fast path possible
// Size unknown (filter, etc.):
let filtered = vec.iter().filter(|x| *x % 2 == 0);
// size_hint may be (0, None)
// Must use reservoir sampling
// Size potentially infinite:
let infinite = std::iter::repeat(1).take(100);
// Works fine with reservoir sampling
// (if infinite and no take, would run forever)
}The algorithm adapts to different iterator characteristics.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn weighted_choice() {
// choose gives uniform distribution
// For weighted choice, use choose_weighted:
let items = vec
![
("common", 10),
("rare", 1),
("legendary", 0.1),
];
// choose_weighted uses weights:
let result = items.iter()
.choose_weighted(&mut thread_rng(), |item| item.1);
// Or choose_weighted_multiple for multiple selections:
let results: Vec<_> = items.iter()
.choose_weighted_multiple(&mut thread_rng(), 2, |item| item.1);
}For non-uniform selection, use choose_weighted instead of choose.
use rand::seq::IteratorRandom;
fn performance() {
// Time complexity: O(n)
// Must examine every element exactly once
// Cannot skip elements (unknown length)
// Space complexity: O(1)
// Single element storage
// Constant regardless of iterator size
// Random bits needed: O(n)
// One random number per element (roughly)
// For exact-size iterators:
// May use O(1) random bits (single index)
// Depends on implementation details
}Time is linear in iterator length, but memory is constant.
use rand::seq::IteratorRandom;
use rand::{thread_rng, SeedableRng, rngs::StdRng};
fn thread_safety() {
// thread_rng() is thread-local
// Each thread has its own RNG
// For reproducible results across runs:
let mut rng = StdRng::seed_from_u64(42);
let result1 = vec
![1, 2, 3].iter().choose(&mut rng);
let mut rng2 = StdRng::seed_from_u64(42);
let result2 = vec
![1, 2, 3].iter().choose(&mut rng2);
// result1 and result2 are identical
// For parallel usage:
// Each thread needs its own Rng
// thread_rng() provides one per thread
}Each thread should have its own RNG for parallel random selection.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn log_sampling() {
// Sample one log line uniformly from a potentially huge log
fn sample_log_line(filename: &str) -> Option<String> {
use std::fs::File;
use std::io::{BufRead, BufReader};
let file = File::open(filename).ok()?;
let reader = BufReader::new(file);
// Process lines one at a time
// No need to load entire file into memory
reader.lines()
.filter_map(|line| line.ok())
.choose(&mut thread_rng())
}
// Works for files larger than memory
// Only one line is retained at any time
}Sampling from large files without loading them entirely.
use rand::seq::IteratorRandom;
use rand::thread_rng;
fn analytics_reservoir() {
// Maintain a sample of recent events
struct ReservoirSample<T> {
sample: Vec<T>,
count: usize, // Total events seen
}
impl<T> ReservoirSample<T> {
fn with_capacity(k: usize) -> Self {
ReservoirSample {
sample: Vec::with_capacity(k),
count: 0,
}
}
fn add(&mut self, item: T, rng: &mut impl rand::Rng) {
self.count += 1;
if self.sample.len() < self.sample.capacity() {
self.sample.push(item);
} else {
// Replace random element with probability k/count
let j = rng.gen_range(0..self.count);
if j < self.sample.len() {
self.sample[j] = item;
}
}
}
}
// This is essentially choose_multiple's algorithm
}Streaming analytics use reservoir sampling for unbiased samples.
The core algorithm:
// For single element:
// 1. Store first element
// 2. For each subsequent element i (1-indexed):
// - Replace current choice with probability 1/i
// Result: Each of n elements has probability 1/n
// For k elements:
// 1. Store first k elements
// 2. For each subsequent element i (k+1-indexed):
// - Generate random index j in [0, i)
// - If j < k, replace reservoir[j] with this element
// Result: Each element has probability k/n of being selectedWhy it works:
// The probability of element at position k being in final result:
// P(chosen at k) × P(not replaced after)
// = 1/k × (k/(k+1) × (k+1)/(k+2) × ... × (n-1)/n)
// = 1/k × k/n
// = 1/n
// The telescoping product (k/(k+1) × (k+1)/(k+2) × ...) cancels
// All intermediate terms cancel, leaving k/nKey properties:
// 1. Uniform: Each element has equal probability
// 2. Online: Process elements one at a time
// 3. Memory O(1): Store only current choice (or k for multiple)
// 4. Time O(n): Examine each element once
// 5. Works for unknown length: No need for size in advanceWhen to use:
// Use choose when:
// - Iterator length unknown
// - Single-pass processing required
// - Memory constrained
// - Streaming data
// Use collect + random index when:
// - Data already in memory
// - Need multiple random selections
// - Size known exactlyKey insight: IteratorRandom::choose implements reservoir sampling to uniformly select from iterators of unknown length by maintaining a probability of 1/i for selecting the i-th element, ensuring that after n elements, each has exactly 1/n probability of being the final choice. The algorithm is online (single pass), memory-efficient (O(1) storage), and works for any iterator—even infinite ones (with a take). For multiple elements, choose_multiple extends the same logic using a k-element reservoir where each new element has probability k/i of entering the reservoir by replacing a random existing element.