How do I parallelize computations in Rust?

Walkthrough

Parallelizing code can dramatically improve performance on multi-core systems, but hand-written threading adds complexity and invites bugs such as data races and deadlocks. Rayon provides data parallelism through a small API that distributes work across the available CPU cores using a work-stealing thread pool.

Rayon's key concepts:

  1. par_iter() and par_iter_mut(): parallel iterators that replace sequential iteration
  2. Work-stealing scheduler: idle threads take pending work from busy ones, balancing load automatically
  3. No data races: closures passed to parallel iterators must be Send + Sync, so unsynchronized shared mutation is rejected at compile time
  4. Global thread pool: created on first use and sized to the number of logical CPUs by default

The beauty of Rayon is that converting sequential code to parallel often requires only changing .iter() to .par_iter().

Code Example

# Cargo.toml
[dependencies]
rayon = "1.10"

// src/main.rs
use rayon::prelude::*;
use std::time::Instant;
 
fn main() {
    // ===== Basic Parallel Iteration =====
    // i64 is required here: the total (500_000_500_000) does not fit in an i32
    let data: Vec<i64> = (1..=1_000_000).collect();

    // Sequential sum
    let start = Instant::now();
    let seq_sum: i64 = data.iter().sum();
    println!("Sequential sum: {} ({:?})", seq_sum, start.elapsed());

    // Parallel sum: just change iter() to par_iter()
    let start = Instant::now();
    let par_sum: i64 = data.par_iter().sum();
    println!("Parallel sum: {} ({:?})", par_sum, start.elapsed());
 
    // ===== Transform and Collect =====
    let numbers: Vec<i32> = (1..=10).collect();
    
    // Apply operation in parallel
    let squared: Vec<i32> = numbers.par_iter().map(|&x| x * x).collect();
    println!("Squared: {:?}", squared);
 
    // ===== Filter and Map Combined =====
    let evens_squared: Vec<i32> = (1..=20)
        .into_par_iter()
        .filter(|&x| x % 2 == 0)
        .map(|x| x * x)
        .collect();
    println!("Even numbers squared: {:?}", evens_squared);
 
    // ===== Parallel Reduce =====
    let product: i32 = (1..=5).into_par_iter().reduce(|| 1, |a, b| a * b);
    println!("Product 1-5: {}", product);
 
    // ===== Find in Parallel =====
    let data = vec!["apple", "banana", "cherry", "date", "elderberry"];
    
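    // find_any returns whichever match is found first, not necessarily the leftmost one;
    // use find_first when the position matters
    let found = data.par_iter().find_any(|&&s| s.starts_with('c'));
    println!("Found: {:?}", found); // Some("cherry"), the only match here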
 
    // Check if any/all elements match
    let has_long = data.par_iter().any(|s| s.len() > 8);
    let all_short = data.par_iter().all(|s| s.len() < 15);
    println!("Has long word: {}, All short: {}", has_long, all_short);
}

Parallel Processing with Mutable Data

use rayon::prelude::*;
 
fn main() {
    // Parallel iteration with mutation
    let mut values: Vec<i32> = (1..=100).collect();
    
    values.par_iter_mut().for_each(|x| {
        *x = (*x) * (*x); // Square each value in place
    });
    
    println!("First 5 squared values: {:?}", &values[..5]);
 
    // Parallel sorting (stable; par_sort_unstable is usually faster when stability is not needed)
    let mut unsorted: Vec<i32> = (1..=1000).rev().collect();
    unsorted.par_sort();
    println!("Sorted first 5: {:?}", &unsorted[..5]);
}

Real-World Example: Parallel File Processing

use rayon::prelude::*;
use std::fs;
 
fn main() -> std::io::Result<()> {
    let files: Vec<String> = vec![
        "src/main.rs".to_string(),
        "src/lib.rs".to_string(),
        "Cargo.toml".to_string(),
    ];
 
    // Process files in parallel; files that cannot be read are silently skipped by .ok()
    let line_counts: Vec<(String, usize)> = files
        .par_iter()
        .filter_map(|path| {
            fs::read_to_string(path)
                .ok()
                .map(|content| (path.clone(), content.lines().count()))
        })
        .collect();
 
    for (file, count) in line_counts {
        println!("{}: {} lines", file, count);
    }
 
    Ok(())
}

Parallel Fold

use rayon::prelude::*;
 
fn main() {
    let data: Vec<i32> = (1..=1000).collect();
    
    // fold takes an identity closure and an accumulation function;
    // reduce then merges the partial results
    let sum_of_squares: i32 = data
        .par_iter()
        .fold(|| 0, |acc, &x| acc + x * x)  // Accumulate within each chunk of work
        .reduce(|| 0, |a, b| a + b);         // Combine the per-chunk partial sums
    
    println!("Sum of squares: {}", sum_of_squares);
}

Controlling Parallelism

use rayon::prelude::*;
use rayon::ThreadPoolBuilder;
 
fn main() {
    // Configure a custom thread pool
    let pool = ThreadPoolBuilder::new()
        .num_threads(4)  // Limit to 4 threads
        .build()
        .unwrap();
    
    // Use the pool for parallel work
    let result: i32 = pool.install(|| {
        (1..=1000).into_par_iter().sum()
    });
    
    println!("Sum with 4 threads: {}", result);
}

Summary

  • Replace .iter() with .par_iter() to parallelize most iterator chains
  • Use .into_par_iter() to consume collections in parallel
  • par_iter_mut() enables in-place parallel modification
  • Operations like map, filter, reduce, for_each all work on parallel iterators
  • par_sort() provides parallel sorting for mutable slices
  • Rayon's work-stealing scheduler automatically balances work across threads
  • Use ThreadPoolBuilder to customize thread count and configuration
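  • No manual synchronization is needed for independent per-element work; unsynchronized shared mutation is rejected at compile time
  • Ideal for CPU-bound operations on large datasets; less beneficial for small collections, where scheduling overhead can outweigh the gain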