How do I work with Rayon for Data Parallelism in Rust?

Walkthrough

Rayon is a data-parallelism library that makes it easy to convert sequential computations into parallel ones. It provides a work-stealing thread pool and parallel iterators that automatically distribute work across multiple CPU cores. The key insight is that, for pure and associative operations, Rayon produces the same result as sequential execution, just faster.
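Before any of the examples below will compile, the crate has to be declared as a dependency. With a recent Cargo (1.62+, or the cargo-edit plugin) one command suffices:

```shell
# Add the latest compatible rayon release to Cargo.toml
cargo add rayon
```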

Key concepts:

  • Parallel iterators — .par_iter() instead of .iter()
  • Work stealing — Threads steal tasks from each other for load balancing
  • Data parallelism — Same operation on different data elements
  • Join — Run two closures in parallel
  • Scope — Create scoped parallel tasks

When to use Rayon:

  • CPU-bound computations on collections
  • Data processing pipelines
  • Map/filter/reduce operations on large datasets
  • Image processing, simulations, scientific computing

When NOT to use Rayon:

  • I/O-bound operations (use async instead)
  • Very small collections (overhead > benefit)
  • Operations with side effects
  • When the order of side effects matters (collect preserves element order, but execution order is nondeterministic)

Code Examples

Basic Parallel Iteration

use rayon::prelude::*;
 
fn main() {
    let data = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    
    // Sequential
    let sum: i32 = data.iter().sum();
    println!("Sequential sum: {}", sum);
    
    // Parallel (just add par_)
    let parallel_sum: i32 = data.par_iter().sum();
    println!("Parallel sum: {}", parallel_sum);
}

Parallel Map

use rayon::prelude::*;
 
fn main() {
    let data = vec![1, 2, 3, 4, 5];
    
    // Parallel map
    let squares: Vec<i32> = data.par_iter()
        .map(|x| x * x)
        .collect();
    
    println!("Squares: {:?}", squares);
    
    // Parallel map with expensive computation
    let results: Vec<i32> = data.par_iter()
        .map(|x| {
            // Simulate expensive work
            (1..=*x).sum()
        })
        .collect();
    
    println!("Results: {:?}", results);
}

Parallel Filter

use rayon::prelude::*;
 
fn main() {
    let data: Vec<i32> = (1..=100).collect();
    
    // Parallel filter
    let evens: Vec<i32> = data.par_iter()
        .filter(|x| *x % 2 == 0)
        .cloned()
        .collect();
    
    println!("Even numbers: {} found", evens.len());
}

Parallel Filter Map

use rayon::prelude::*;
 
fn main() {
    let data = vec!["1", "2", "three", "4", "five", "6"];
    
    // Parse strings to numbers, skip failures
    let numbers: Vec<i32> = data.par_iter()
        .filter_map(|s| s.parse().ok())
        .collect();
    
    println!("Parsed numbers: {:?}", numbers);
}

Parallel Reduce

use rayon::prelude::*;
 
fn main() {
    let data: Vec<i32> = (1..=100).collect();
    
    // Parallel reduce (sum); the first closure supplies the identity
    // element for an associative combining operation
    let sum = data.par_iter().copied().reduce(|| 0, |a, b| a + b);
    println!("Sum: {}", sum);
    
    // Parallel reduce (max)
    let max = data.par_iter().copied().reduce(|| i32::MIN, |a, b| a.max(b));
    println!("Max: {}", max);
    
    // Parallel reduce (product)
    let product: i32 = data.par_iter().copied().take(5).reduce(|| 1, |a, b| a * b);
    println!("Product of first 5: {}", product);
}

Parallel Fold

use rayon::prelude::*;
 
fn main() {
    let data: Vec<i32> = (1..=10).collect();
    
    // fold alone yields one partial accumulator per work chunk;
    // finish with sum() (or reduce()) to combine the partials
    let sum: i32 = data.par_iter().fold(|| 0, |acc, x| acc + x).sum();
    println!("Sum: {}", sum);
    
    // Fold then reduce explicitly
    let total = data.par_iter()
        .fold(|| 0, |acc, x| acc + x)
        .reduce(|| 0, |a, b| a + b);
    println!("Total: {}", total);
}

Parallel For Each

use rayon::prelude::*;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
 
fn main() {
    let data: Vec<i32> = (1..=100).collect();
    let counter = Arc::new(AtomicUsize::new(0));
    
    // Parallel for_each with a shared atomic counter
    data.par_iter().for_each(|_x| {
        // Count each processed item
        counter.fetch_add(1, Ordering::Relaxed);
    });
    });
    
    println!("Processed {} items", counter.load(Ordering::Relaxed));
}

Parallel Find and Any

use rayon::prelude::*;
 
fn main() {
    let data: Vec<i32> = (1..=1000).collect();
    
    // Parallel find_any (returns *some* match, not necessarily the first)
    let found = data.par_iter().find_any(|x| **x > 500);
    println!("Found: {:?}", found);
    
    // Parallel find_first (matches sequential find's semantics)
    let first = data.par_iter().find_first(|x| **x > 500);
    println!("First > 500: {:?}", first);
    
    // Parallel any
    let has_large = data.par_iter().any(|x| *x > 900);
    println!("Has number > 900: {}", has_large);
    
    // Parallel all
    let all_positive = data.par_iter().all(|x| *x > 0);
    println!("All positive: {}", all_positive);
}

Parallel Sort

use rayon::prelude::*;
 
fn main() {
    let mut data = vec![5, 3, 8, 1, 9, 2, 7, 4, 6, 0];
    
    // Parallel stable sort
    data.par_sort();
    println!("Sorted: {:?}", data);
    
    // Parallel unstable sort (usually faster, allocates less)
    let mut data2 = vec![5, 3, 8, 1, 9, 2, 7, 4, 6, 0];
    data2.par_sort_unstable();
    println!("Sorted unstable: {:?}", data2);
    
    // Parallel sort with a custom comparator
    let mut pairs = vec![(2, 'b'), (1, 'a'), (3, 'c')];
    pairs.par_sort_by(|a, b| a.0.cmp(&b.0));
    println!("Sorted pairs: {:?}", pairs);
}

Parallel Partition

use rayon::prelude::*;
 
fn main() {
    let data: Vec<i32> = (1..=10).collect();
    
    // Partition into two vectors
    let (evens, odds): (Vec<i32>, Vec<i32>) = data.par_iter()
        .partition(|x| *x % 2 == 0);
    
    println!("Evens: {:?}", evens);
    println!("Odds: {:?}", odds);
}

Join for Fork-Join Parallelism

use rayon::prelude::*;
 
fn main() {
    let data1 = vec![1, 2, 3, 4, 5];
    let data2 = vec![6, 7, 8, 9, 10];
    
    // Run two computations in parallel
    let (sum1, sum2) = rayon::join(
        || data1.iter().sum::<i32>(),
        || data2.iter().sum::<i32>()
    );
    
    println!("Sum 1: {}, Sum 2: {}", sum1, sum2);
}

Recursive Parallelism

use rayon::prelude::*;
 
fn quick_sort<T: Ord + Send>(mut data: Vec<T>) -> Vec<T> {
    if data.len() <= 1 {
        return data;
    }
    
    let pivot = data.swap_remove(0);
    let (less, greater): (Vec<T>, Vec<T>) = data
        .into_iter()
        .partition(|x| *x < pivot);
    
    // Sort the two partitions in parallel
    let (sorted_less, sorted_greater) = rayon::join(
        || quick_sort(less),
        || quick_sort(greater)
    );
    
    let mut result = sorted_less;
    result.push(pivot);
    result.extend(sorted_greater);
    result
}
 
fn main() {
    let data = vec![5, 3, 8, 1, 9, 2, 7, 4, 6, 0];
    let sorted = quick_sort(data);
    println!("Sorted: {:?}", sorted);
}

Scoped Threads

use rayon::prelude::*;
 
fn main() {
    let mut data = vec![1, 2, 3, 4, 5];
    
    // Scoped parallel mutation: split the slice first, because two
    // spawned closures cannot both capture `&mut data`
    let (left, right) = data.split_at_mut(1);
    rayon::scope(|s| {
        s.spawn(move |_| left[0] = 10);
        s.spawn(move |_| right[0] = 20);
    });
    
    println!("Modified data: {:?}", data);
}

Thread Pool Configuration

use rayon::{ThreadPool, ThreadPoolBuilder};
 
fn main() {
    // Create a custom thread pool with four threads
    let pool: ThreadPool = ThreadPoolBuilder::new()
        .num_threads(4)
        .build()
        .unwrap();
    
    let data: Vec<i32> = (1..=100).collect();
    
    // Run parallel work inside the custom pool
    let sum: i32 = pool.install(|| {
        data.par_iter().sum()
    });
    
    println!("Sum: {}", sum);
}

Current Thread Index

use rayon::prelude::*;
use std::collections::HashMap;
use std::sync::Mutex;
 
fn main() {
    let data: Vec<i32> = (1..=1000).collect();
    let per_thread: Mutex<HashMap<usize, usize>> = Mutex::new(HashMap::new());
    
    data.par_iter().for_each(|_| {
        let thread_idx = rayon::current_thread_index().unwrap();
        *per_thread.lock().unwrap().entry(thread_idx).or_insert(0) += 1;
    });
    
    println!("Work per thread: {:?}", per_thread.into_inner().unwrap());
}

Flat Map Parallel

use rayon::prelude::*;
 
fn main() {
    let data = vec![[1, 2], [3, 4], [5, 6]];
    
    // Flatten nested structures in parallel; the closure must return
    // something that is itself convertible to a parallel iterator
    let flat: Vec<i32> = data.par_iter()
        .flat_map(|arr| arr.par_iter().copied())
        .collect();
    
    println!("Flattened: {:?}", flat);
}

Parallel Zip

use rayon::prelude::*;
 
fn main() {
    let a = vec![1, 2, 3, 4, 5];
    let b = vec![10, 20, 30, 40, 50];
    
    // Zip two parallel iterators element-wise
    let sums: Vec<i32> = a.par_iter()
        .zip(b.par_iter())
        .map(|(x, y)| x + y)
        .collect();
    
    println!("Sums: {:?}", sums);
}

Parallel Chains

use rayon::prelude::*;
 
fn main() {
    let a = vec![1, 2, 3];
    let b = vec![4, 5, 6];
    
    // Chain parallel iterators end to end
    let combined: Vec<i32> = a.par_iter()
        .chain(b.par_iter())
        .cloned()
        .collect();
    
    println!("Combined: {:?}", combined);
}

Image Processing Example

use rayon::prelude::*;
 
struct Pixel {
    r: u8,
    g: u8,
    b: u8,
}
 
impl Pixel {
    fn grayscale(&self) -> u8 {
        ((self.r as u16 + self.g as u16 + self.b as u16) / 3) as u8
    }
    
    fn invert(&self) -> Self {
        Pixel {
            r: 255 - self.r,
            g: 255 - self.g,
            b: 255 - self.b,
        }
    }
}
 
fn main() {
    let mut image: Vec<Pixel> = (0..100_000)
        .map(|i| Pixel { r: i as u8, g: (i * 2) as u8, b: (i * 3) as u8 })
        .collect();
    
    // Parallel in-place grayscale conversion
    image.par_iter_mut().for_each(|pixel| {
        let gray = pixel.grayscale();
        pixel.r = gray;
        pixel.g = gray;
        pixel.b = gray;
    });
    
    println!("Processed {} pixels", image.len());
}

Matrix Operations

use rayon::prelude::*;
 
type Matrix = Vec<Vec<f64>>;
 
fn matrix_multiply(a: &Matrix, b: &Matrix) -> Matrix {
    let n = a.len();    // rows of a
    let m = b[0].len(); // columns of b
    let p = b.len();    // inner dimension (rows of b == columns of a)
    
    // Parallelize over the rows of the output matrix
    (0..n).into_par_iter()
        .map(|i| {
            (0..m).map(|j| {
                (0..p).map(|k| a[i][k] * b[k][j]).sum()
            }).collect()
        })
        .collect()
}
 
fn main() {
    let a = vec![
        vec![1.0, 2.0],
        vec![3.0, 4.0],
    ];
    
    let b = vec![
        vec![5.0, 6.0],
        vec![7.0, 8.0],
    ];
    
    let result = matrix_multiply(&a, &b);
    println!("Result: {:?}", result);
}

Word Frequency Counter

use rayon::prelude::*;
use std::collections::HashMap;
 
fn main() {
    let text = "the quick brown fox jumps over the lazy dog the fox was quick";
    let words: Vec<&str> = text.split_whitespace().collect();
    
    // Parallel word-frequency counting: fold builds one map per
    // work chunk, then reduce merges the partial maps
    let freq: HashMap<&str, usize> = words.par_iter()
        .fold(
            || HashMap::new(),
            |mut map, word| {
                *map.entry(*word).or_insert(0) += 1;
                map
            }
        )
        .reduce(
            || HashMap::new(),
            |mut a, b| {
                for (word, count) in b {
                    *a.entry(word).or_insert(0) += count;
                }
                a
            }
        );
    
    println!("Word frequencies: {:?}", freq);
}

Benchmark Comparison

use rayon::prelude::*;
use std::time::Instant;
 
fn main() {
    // Use i64: the sum of 1..=10_000_000 overflows i32
    let data: Vec<i64> = (1..=10_000_000).collect();
    
    // Sequential
    let start = Instant::now();
    let seq_sum: i64 = data.iter().sum();
    let seq_time = start.elapsed();
    
    // Parallel
    let start = Instant::now();
    let par_sum: i64 = data.par_iter().sum();
    let par_time = start.elapsed();
    
    println!("Sequential sum: {} ({:?})", seq_sum, seq_time);
    println!("Parallel sum: {} ({:?})", par_sum, par_time);
    println!("Speedup: {:.2}x", seq_time.as_secs_f64() / par_time.as_secs_f64());
}

Collect into Different Types

use rayon::prelude::*;
use std::collections::{HashMap, HashSet};
 
fn main() {
    let data = vec![1, 2, 3, 4, 5, 1, 2, 3];
    
    // Collect into Vec
    let vec: Vec<i32> = data.par_iter().cloned().collect();
    println!("Vec: {:?}", vec);
    
    // Collect into HashSet
    let set: HashSet<i32> = data.par_iter().cloned().collect();
    println!("Set: {:?}", set);
    
    // Collect into HashMap
    let map: HashMap<i32, i32> = data.par_iter()
        .map(|x| (*x, x * x))
        .collect();
    println!("Map: {:?}", map);
}

Summary

Rayon Key Imports:

use rayon::prelude::*;  // Parallel iterator traits
use rayon::join;       // Fork-join parallelism
use rayon::scope;      // Scoped threads

Sequential to Parallel Conversion:

Sequential      Parallel
.iter()         .par_iter()
.iter_mut()     .par_iter_mut()
.into_iter()    .into_par_iter()

Parallel Iterator Methods:

Method        Description
map           Transform each element
filter        Keep matching elements
filter_map    Filter and map combined
fold          Build per-chunk partial accumulators
reduce        Combine all elements with an identity
for_each      Execute an action on each element
collect       Collect into a container
find_any      Find any matching element
any           Check if any element matches
all           Check if all elements match
par_sort      Sort a slice in parallel

Key Functions:

Function               Description
rayon::join(f1, f2)    Run two closures in parallel
rayon::scope(f)        Create a scope for spawning tasks
Scope::spawn(f)        Spawn a task inside a scope

Performance Considerations:

Factor             Recommendation
Collection size    Best for large collections (roughly > 1000 elements)
Work per item      Expensive per-item operations benefit most
Overhead           Small collections may run slower in parallel
Memory             Parallel execution can use more memory

Key Points:

  • Just add par_ to convert iterators to parallel
  • Work-stealing thread pool for load balancing
  • join for fork-join parallelism
  • scope for spawning tasks with borrowed data
  • Perfect for CPU-bound computations
  • Avoid I/O operations in parallel iterators
  • Results match sequential output for pure, associative operations
  • Custom thread pools for fine control