How does rayon::slice::ParallelSlice::par_chunks handle uneven chunk sizes for parallel processing?

par_chunks divides a slice into chunks of the specified size and processes them in parallel; when the slice length isn't evenly divisible by the chunk size, the final chunk is smaller. This "remainder chunk" contains all remaining elements and is processed just like the full-size chunks: Rayon doesn't discard elements or pad the chunk. The chunk size is a maximum, not a fixed size, so a slice of 10 elements with chunk size 3 produces four chunks: three chunks of 3 elements each and one remainder chunk of 1 element. This design ensures every element is processed regardless of divisibility, but it requires your parallel closure to handle chunks of varying sizes.
```rust
use rayon::prelude::*;

fn main() {
    let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

    // Split into chunks of 3:
    // 10 elements / 3 per chunk = 3 full chunks + 1 remainder
    // Chunks: [1,2,3], [4,5,6], [7,8,9], [10]
    data.par_chunks(3).for_each(|chunk| {
        println!("Chunk: {:?}", chunk);
    });

    // Output (order may vary due to parallelism):
    // Chunk: [1, 2, 3]
    // Chunk: [4, 5, 6]
    // Chunk: [7, 8, 9]
    // Chunk: [10]
}
```

The final chunk contains the remaining elements when the division isn't even.
```rust
use rayon::prelude::*;

fn main() {
    // Length 8, chunk size 4: perfect division
    // Chunks: [0,1,2,3], [4,5,6,7] - both size 4
    let data: Vec<i32> = (0..8).collect();
    println!("8 elements, chunk size 4:");
    data.par_chunks(4).for_each(|chunk| {
        println!("  Chunk size {}: {:?}", chunk.len(), chunk);
    });

    // Length 10, chunk size 4: remainder
    // Chunks: [0,1,2,3], [4,5,6,7], [8,9] - last is smaller
    let data: Vec<i32> = (0..10).collect();
    println!("\n10 elements, chunk size 4:");
    data.par_chunks(4).for_each(|chunk| {
        println!("  Chunk size {}: {:?}", chunk.len(), chunk);
    });

    // Length 3, chunk size 4: single small chunk
    // Chunks: [0,1,2] - the entire slice is one chunk
    let data: Vec<i32> = (0..3).collect();
    println!("\n3 elements, chunk size 4:");
    data.par_chunks(4).for_each(|chunk| {
        println!("  Chunk size {}: {:?}", chunk.len(), chunk);
    });

    // Length 0, chunk size 4: no chunks
    let data: Vec<i32> = vec![];
    println!("\n0 elements, chunk size 4:");
    data.par_chunks(4).for_each(|chunk| {
        println!("  Chunk: {:?}", chunk);
    });
    println!("  (no output above - an empty slice produces no chunks)");
}
```

Chunk sizes vary: full-size chunks followed by a potentially smaller remainder.
```rust
use rayon::prelude::*;

fn main() {
    let data: Vec<i32> = (0..10).collect();

    // CORRECT: iterate over the chunk, whatever its size.
    // The last chunk may have 1-2 elements instead of 3
    let sum: i32 = data.par_chunks(3).map(|chunk| {
        chunk.iter().sum::<i32>()
    }).sum();
    println!("Sum: {}", sum);

    // WRONG: assuming a fixed chunk size.
    // Don't do this - it panics on the smaller last chunk:
    // let wrong: i32 = data.par_chunks(3).map(|chunk| {
    //     chunk[0] + chunk[1] + chunk[2] // PANIC on last chunk!
    // }).sum();

    // If you need indexed access, use get() for bounds checking
    let first_elements: Vec<i32> = data.par_chunks(3)
        .filter_map(|chunk| chunk.get(0).copied())
        .collect();
    println!("First elements: {:?}", first_elements);
}
```

Your parallel closure must handle chunks of varying sizes safely.
```rust
use rayon::prelude::*;

fn main() {
    let data: Vec<i32> = (0..10).collect();

    // Sequential chunks - guaranteed order
    println!("Sequential chunks (ordered):");
    for chunk in data.chunks(3) {
        println!("  {:?}", chunk);
    }

    // Parallel chunks - order depends on parallel execution
    println!("\nParallel chunks (unordered):");
    data.par_chunks(3).for_each(|chunk| {
        // Execution order is NOT guaranteed
        println!("  {:?}", chunk);
    });

    // To get ordered results, collect instead of printing
    let results: Vec<i32> = data.par_chunks(3)
        .map(|chunk| chunk.iter().sum())
        .collect(); // Preserves chunk order
    println!("\nOrdered results: {:?}", results);
}
```

par_chunks processes in parallel; the order of execution is not guaranteed.
```rust
use rayon::prelude::*;

fn main() {
    let mut data: Vec<i32> = (0..10).collect();

    // par_chunks_mut allows modifying elements in place
    data.par_chunks_mut(3).for_each(|chunk| {
        for elem in chunk.iter_mut() {
            *elem *= 2;
        }
    });
    println!("Doubled: {:?}", data);
    // The remainder chunk is also modified:
    // Input:  [0,1,2,3,4,5,6,7,8,9]
    // Chunks: [0,1,2], [3,4,5], [6,7,8], [9]
    // After:  [0,2,4,6,8,10,12,14,16,18]

    // Example: normalize each chunk by its maximum
    let mut data: Vec<f64> = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0];
    data.par_chunks_mut(3).for_each(|chunk| {
        if !chunk.is_empty() {
            // f64 is not Ord, so fold with f64::max instead of iter().max()
            let max_val = chunk.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
            for elem in chunk.iter_mut() {
                *elem /= max_val;
            }
        }
    });
    println!("Normalized per chunk: {:?}", data);
}
```

par_chunks_mut divides the slice into mutable chunks for in-place modification.
```rust
use rayon::prelude::*;

fn main() {
    let data: Vec<i32> = (0..1000).collect();

    // SMALL CHUNK SIZE: more chunks, more overhead.
    // Each chunk is a separate parallel task;
    // too small = excessive scheduling overhead
    let small: Vec<i32> = data.par_chunks(1)
        .map(|chunk| chunk.iter().sum())
        .collect();

    // LARGE CHUNK SIZE: fewer chunks, less parallelism.
    // If chunk size >= data length, there is only one chunk
    // and no parallelism benefit
    let large: Vec<i32> = data.par_chunks(1000)
        .map(|chunk| chunk.iter().sum())
        .collect();

    // BALANCED: enough chunks for parallelism,
    // big enough to amortize the scheduling overhead
    let balanced: Vec<i32> = data.par_chunks(100)
        .map(|chunk| chunk.iter().sum())
        .collect();
    // 10 chunks - good for parallel execution

    // GENERAL GUIDANCE:
    // - More CPU cores = smaller chunks pay off
    // - Expensive work per element = smaller chunks are fine
    // - Cheap work per element = use larger chunks
    // - Aim for at least as many chunks as cores
    println!("Small (chunk=1): {} results", small.len());
    println!("Large (chunk=1000): {} results", large.len());
    println!("Balanced (chunk=100): {} results", balanced.len());
}
```

Choose chunk size based on workload and available parallelism.
```rust
use rayon::prelude::*;
use std::sync::Mutex;

fn main() {
    // Configure the global pool before its first use
    rayon::ThreadPoolBuilder::new()
        .num_threads(4)
        .build_global()
        .unwrap();

    let data: Vec<i32> = (0..20).collect();

    // Record which worker thread processed each chunk.
    // A plain Mutex suffices; for_each only borrows it
    let results: Mutex<Vec<(Option<usize>, Vec<i32>)>> = Mutex::new(Vec::new());

    data.par_chunks(3).for_each(|chunk| {
        // Different threads may process different chunks;
        // Rayon's work-stealing scheduler distributes them
        let thread = rayon::current_thread_index();
        results.lock().unwrap().push((thread, chunk.to_vec()));
    });

    println!("Processed chunks: {}", results.lock().unwrap().len());

    // Rayon distributes work based on:
    // 1. Number of available threads
    // 2. Current thread workload
    // 3. Work-stealing between threads
    // 4. Chunk count relative to thread count
}
```

Rayon's work-stealing scheduler distributes chunks across available threads.
```rust
use rayon::prelude::*;

fn main() {
    let data: Vec<i32> = (0..10).collect();
    let chunk_size = 3;

    // par_chunks_exact yields only full-sized chunks; leftover elements
    // are silently excluded - no panic, no padding. Here the last
    // element (9) never appears in any chunk
    data.par_chunks_exact(chunk_size).for_each(|chunk| {
        println!("Exact chunk: {:?}", chunk);
    });

    // The remainder must be handled separately
    let remainder_start = (data.len() / chunk_size) * chunk_size;
    let remainder = &data[remainder_start..];
    if !remainder.is_empty() {
        println!("Remainder: {:?}", remainder);
    }

    // Alternatively, use par_chunks and handle variable sizes
    data.par_chunks(chunk_size).for_each(|chunk| {
        // chunk can be size 3 OR smaller for the last one
        println!("Chunk size {}: {:?}", chunk.len(), chunk);
    });
}
```

par_chunks_exact guarantees uniform size but excludes remainder elements.
```rust
use rayon::prelude::*;

fn main() {
    let data: Vec<i32> = (0..10).collect();
    let chunk_size = 3;

    // PATTERN 1: process everything with par_chunks.
    // Simple, but the closure must handle variable sizes
    let sum: i32 = data.par_chunks(chunk_size)
        .map(|chunk| chunk.iter().sum::<i32>())
        .sum();
    println!("Total sum: {}", sum);

    // PATTERN 2: process full chunks and remainder separately
    let full_chunks_count = data.len() / chunk_size;
    let remainder_len = data.len() % chunk_size;
    let full_chunks_sum: i32 = data.par_chunks_exact(chunk_size)
        .map(|chunk| chunk.iter().sum::<i32>())
        .sum();
    let remainder_sum: i32 = if remainder_len > 0 {
        data[full_chunks_count * chunk_size..].iter().sum()
    } else {
        0
    };
    println!("Full chunks: {}, Remainder: {}", full_chunks_sum, remainder_sum);

    // PATTERN 3: pad so the length divides evenly
    let padded_len = ((data.len() + chunk_size - 1) / chunk_size) * chunk_size;
    let mut padded_data = data.clone();
    padded_data.resize(padded_len, 0); // Pad with zeros

    // Now all chunks are equal size
    let padded_sum: i32 = padded_data.par_chunks_exact(chunk_size)
        .map(|chunk| chunk.iter().sum::<i32>())
        .sum();
    println!("Padded sum: {}", padded_sum);
    // Note: padding changes the results of some operations (e.g. min, mean)
}
```

Choose your remainder handling strategy based on your computation requirements.
```rust
use rayon::prelude::*;

fn main() {
    // Process a matrix stored in row-major order
    let width = 10;
    let height = 7; // Not divisible by typical chunk sizes
    let matrix: Vec<f64> = (0..width * height)
        .map(|i| i as f64)
        .collect();

    // Process rows in parallel: each chunk is exactly one row (width elements)
    let row_sums: Vec<f64> = matrix.par_chunks(width)
        .map(|row| row.iter().sum())
        .collect();
    println!("Row sums: {:?}", row_sums);
    // 7 rows -> 7 chunks, all the same size because width divides
    // the total length evenly

    // More realistic: process blocks of rows
    let rows_per_chunk = 2; // 7 rows = 3 blocks of 2 rows + 1 remainder row
    let block_sums: Vec<f64> = (0..height)
        .collect::<Vec<_>>()
        .par_chunks(rows_per_chunk)
        .map(|row_indices| {
            row_indices.iter()
                .flat_map(|&row_idx| {
                    let start = row_idx * width;
                    &matrix[start..start + width]
                })
                .sum()
        })
        .collect();
    println!("Block sums: {:?}", block_sums);
    // The last block has fewer rows
}
```

Row-based chunking often produces even chunks; block-based chunking may have remainders.
```rust
use rayon::prelude::*;

fn main() {
    // Simulate processing records in batches
    let records: Vec<String> = (0..23)
        .map(|i| format!("Record {}", i))
        .collect();
    let batch_size = 5;

    // Process in batches; the remainder batch is included automatically
    let batch_results: Vec<usize> = records.par_chunks(batch_size)
        .map(|batch| {
            println!("Processing batch of {} records", batch.len());
            batch.len() // Each batch returns its count
        })
        .collect();
    println!("Batch sizes: {:?}", batch_results);
    // Output: [5, 5, 5, 5, 3] - the last batch is smaller

    // All records are processed, including the remainder
    let total: usize = batch_results.iter().sum();
    println!("Total processed: {}", total);
    assert_eq!(total, records.len());
}
```

Real-world batch processing often has uneven final batches.
```rust
use rayon::prelude::*;

fn main() {
    let data: Vec<i32> = (0..100).collect();

    // Sum using chunk-based processing.
    // Remainder handling is automatic - all elements are included
    let sum: i32 = data.par_chunks(7)
        .map(|chunk| chunk.iter().sum::<i32>())
        .sum();
    println!("Sum: {}", sum);
    assert_eq!(sum, (0..100).sum());

    // Find max using chunk-based processing.
    // filter_map unwraps each chunk's Option<i32> maximum
    let max: Option<i32> = data.par_chunks(13)
        .filter_map(|chunk| chunk.iter().max().copied())
        .max();
    println!("Max: {:?}", max);
    assert_eq!(max, Some(99));

    // Count matches across chunks
    let even_count: usize = data.par_chunks(10)
        .map(|chunk| chunk.iter().filter(|&&x| x % 2 == 0).count())
        .sum();
    println!("Even numbers: {}", even_count);

    // All elements are processed regardless of chunk size:
    // 7 and 13 don't divide 100 evenly, but the results are still correct
}
```

Reduction operations correctly handle the remainder chunk.
Chunk size behavior:

| Slice Length | Chunk Size | Chunks Produced | Last Chunk Size |
|--------------|------------|-----------------|-----------------|
| 10 | 3 | 4 | 1 (remainder) |
| 10 | 4 | 3 | 2 (remainder) |
| 10 | 5 | 2 | 5 (exact) |
| 10 | 10 | 1 | 10 (entire slice) |
| 10 | 20 | 1 | 10 (single chunk) |
| 0 | 5 | 0 | N/A (empty) |
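The chunking arithmetic above is identical for sequential chunks and par_chunks, so the table rows can be checked quickly with the standard library's chunks, without a thread pool:

```rust
fn main() {
    // Sequential `chunks` uses the same chunking arithmetic as
    // `par_chunks`, so it can verify the table rows above
    let data: Vec<i32> = (0..10).collect();

    assert_eq!(data.chunks(3).count(), 4);
    assert_eq!(data.chunks(3).last().unwrap().len(), 1);
    assert_eq!(data.chunks(4).count(), 3);
    assert_eq!(data.chunks(4).last().unwrap().len(), 2);
    assert_eq!(data.chunks(5).count(), 2);
    assert_eq!(data.chunks(10).count(), 1);
    assert_eq!(data.chunks(20).count(), 1); // single chunk of all 10 elements

    let empty: Vec<i32> = vec![];
    assert_eq!(empty.chunks(5).count(), 0); // empty slice: no chunks

    println!("all table rows verified");
}
```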
Key behaviors:

| Aspect | Behavior |
|--------|----------|
| Remainder | Always processed; may be smaller than chunk_size |
| Order | Execution order undefined; result order preserved with collect |
| Empty slice | Produces no chunks |
| Single-element slice | One chunk of size 1 |
| Chunk size >= length | Single chunk containing the entire slice |
Chunk size selection factors:

| Factor | Recommendation |
|--------|----------------|
| Work per element | More work per element = smaller chunks OK |
| Number of cores | More cores = smaller chunks beneficial |
| Overhead concern | Larger chunks reduce scheduling overhead |
| Load balancing | Smaller chunks enable better work-stealing |
Paradigm comparison:
| Method | Guarantees | Use Case |
|--------|------------|----------|
| par_chunks(n) | All elements processed, last may be smaller | General parallel processing |
| par_chunks_exact(n) | Only full chunks, excludes remainder | When remainder handled separately |
| par_chunks_mut(n) | Mutable access to all elements | In-place modification |
| par_chunks_exact_mut(n) | Mutable full chunks only | In-place with separate remainder |
Key insight: par_chunks handles uneven division by including a final "remainder chunk" that contains all remaining elements, ensuring complete coverage of the input slice. The chunk size you specify is a maximum, not a guarantee—your parallel closures must handle chunks of varying sizes, typically by iterating over the chunk rather than assuming a fixed number of elements. This design prioritizes completeness over uniformity: every element is processed exactly once, but the final chunk may be smaller. For cases where you need guaranteed equal-sized chunks, par_chunks_exact excludes the remainder, requiring you to handle those elements separately. The choice of chunk size affects parallelism granularity: smaller chunks create more parallel tasks (more overhead, better load balancing), while larger chunks create fewer tasks (less overhead, potentially unbalanced). A reasonable starting point is to have at least as many chunks as available threads, adjusting based on the computational cost of processing each element.