How does dashmap::DashMap::shards expose internal segments for advanced concurrent access patterns?
shards() returns a slice of the internal locked segments that make up the DashMap, allowing direct access to individual shards for operations the standard API doesn't support: shard-aware iteration, manual lock management, or custom concurrent access patterns. It is an escape hatch from DashMap's automatic sharding abstraction, giving power users direct control over the underlying segments. Note that shards() is part of dashmap's raw API (enabled via the `raw-api` cargo feature), and the concrete shard type varies across dashmap versions (recent releases wrap each shard in a cache-padded, crate-internal reader-writer lock whose read() returns a guard directly rather than a Result). The examples below model each shard as a std::sync::RwLock<HashMap<K, V>> for clarity.
Basic DashMap Sharding Model
use dashmap::DashMap;
fn basic_sharding() {
// DashMap splits data across multiple internal segments (shards)
// Each shard is a RwLock<HashMap<K, V>>
// The number of shards is determined at creation
let map: DashMap<u32, String> = DashMap::new(); // Default shard count is derived from the CPU count
// The map internally hashes keys to determine which shard contains the entry
// Operations on different shards can proceed in parallel
// Operations on the same shard are serialized by the RwLock
println!("Number of shards: {}", map.shards().len());
// Typically prints something like 8, 16, 32, etc. depending on CPU count
}
DashMap achieves concurrency by partitioning data across multiple locked segments.
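The model above can be sketched with std types alone. This toy `ShardedMap` is a hypothetical type, not part of dashmap: it hashes each key and masks the hash down to one of a power-of-two number of `RwLock<HashMap>` shards. dashmap's real index derivation differs in detail, so treat this as a conceptual sketch only.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::BuildHasher;
use std::sync::RwLock;

// Toy sharded map: one RwLock<HashMap> per shard, hash-routed keys.
struct ShardedMap {
    hasher: RandomState,
    shards: Vec<RwLock<HashMap<u32, String>>>,
}

impl ShardedMap {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count.is_power_of_two());
        ShardedMap {
            hasher: RandomState::new(),
            shards: (0..shard_count).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    // Mask the hash down to a shard index (simplified; dashmap's
    // actual scheme derives the index differently).
    fn shard_index(&self, key: u32) -> usize {
        self.hasher.hash_one(key) as usize & (self.shards.len() - 1)
    }

    fn insert(&self, key: u32, value: String) {
        // Only this key's shard is write-locked; other shards stay free.
        self.shards[self.shard_index(key)].write().unwrap().insert(key, value);
    }

    fn get(&self, key: u32) -> Option<String> {
        self.shards[self.shard_index(key)].read().unwrap().get(&key).cloned()
    }

    fn len(&self) -> usize {
        self.shards.iter().map(|s| s.read().unwrap().len()).sum()
    }
}

fn main() {
    let map = ShardedMap::new(8);
    for i in 0..100 {
        map.insert(i, format!("value_{}", i));
    }
    assert_eq!(map.len(), 100);
    assert_eq!(map.get(42).as_deref(), Some("value_42"));
}
```

Because each shard has its own lock, inserts to keys that hash to different shards never contend, which is the whole point of the design.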
Accessing the Shards
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn access_shards() {
let map: DashMap<u32, String> = DashMap::new();
// Insert some data
for i in 0..100 {
map.insert(i, format!("value_{}", i));
}
// shards() returns &[RwLock<HashMap<K, V>>]
let shards: &[RwLock<HashMap<u32, String>>] = map.shards();
// Each shard is an independent HashMap protected by a RwLock
for (i, shard) in shards.iter().enumerate() {
let read_guard = shard.read().unwrap();
println!("Shard {} has {} entries", i, read_guard.len());
}
// Total entries should equal the map size
let total: usize = shards.iter()
.map(|s| s.read().unwrap().len())
.sum();
assert_eq!(total, map.len());
}
shards() provides direct access to the underlying RwLock<HashMap> segments.
Shard-Aware Operations
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn shard_aware_operations() {
let map: DashMap<String, u32> = DashMap::new();
// Populate the map
for i in 0..1000 {
map.insert(format!("key_{}", i), i);
}
// Process entries shard-by-shard
// This can be more efficient for certain operations
fn process_shard<K, V>(
shard: &RwLock<HashMap<K, V>>,
process_fn: impl Fn(&K, &V)
) {
let guard = shard.read().unwrap();
for (k, v) in guard.iter() {
process_fn(k, v);
}
}
// Process each shard with read access
for shard in map.shards() {
process_shard(shard, |k, v| {
// Process each entry in this shard
});
}
// This approach holds locks for shorter periods
// compared to iterating through the entire map at once
}
Shard-by-shard processing can reduce lock contention for certain operations.
Manual Lock Control
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn manual_lock_control() {
let map: DashMap<u32, Vec<u32>> = DashMap::new();
// Initialize some data
for i in 0..10 {
map.insert(i, Vec::new());
}
// Suppose we want to do complex operations on specific entries
// while minimizing lock scope
// Standard DashMap approach - each operation acquires/releases locks
for i in 0..10 {
map.entry(i).and_modify(|v| v.push(i));
}
// Manual shard approach - batch operations within same shard
// First, determine which shard a key belongs to
// (with the raw-api feature, dashmap exposes determine_map() for this;
// the modulo below is an illustrative approximation, not dashmap's
// actual bit-mixing scheme)
fn determine_shard<K>(map: &DashMap<K, Vec<u32>>, key: &K) -> usize
where K: std::hash::Hash + Eq
{
use std::hash::BuildHasher;
let hash = map.hasher().hash_one(key);
let shard_count = map.shards().len();
hash as usize % shard_count
}
// Group operations by shard to batch them
let shards = map.shards();
// This allows for custom batching strategies
// where you hold a write lock on a single shard
// and perform multiple operations
for shard in shards {
let mut write_guard = shard.write().unwrap();
// Now we have exclusive access to this shard
// Can perform multiple operations without re-acquiring
for (_key, value) in write_guard.iter_mut() {
// Modify multiple entries in this shard
value.push(999);
}
}
}
Direct shard access allows holding locks across multiple operations.
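When a custom operation needs write locks on two shards at once (for example, migrating an entry between shards of a hand-rolled sharded map), acquiring the locks in a fixed global order prevents deadlock. A std-only sketch; `move_key` is a hypothetical helper, not a dashmap API:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Move an entry from one shard to another. Two threads doing this
// concurrently could deadlock if each locked its "from" shard first;
// locking the lower-indexed shard first, unconditionally, makes a
// wait cycle impossible.
fn move_key(
    shards: &[RwLock<HashMap<u32, u32>>],
    from: usize,
    to: usize,
    key: u32,
) -> bool {
    if from == to {
        return shards[from].read().unwrap().contains_key(&key);
    }
    let (lo, hi) = (from.min(to), from.max(to));
    // Always acquire in increasing index order.
    let mut lo_guard = shards[lo].write().unwrap();
    let mut hi_guard = shards[hi].write().unwrap();
    let (src, dst) = if from < to {
        (&mut *lo_guard, &mut *hi_guard)
    } else {
        (&mut *hi_guard, &mut *lo_guard)
    };
    match src.remove(&key) {
        Some(v) => {
            dst.insert(key, v);
            true
        }
        None => false,
    }
}

fn main() {
    let shards: Vec<RwLock<HashMap<u32, u32>>> =
        (0..4).map(|_| RwLock::new(HashMap::new())).collect();
    shards[0].write().unwrap().insert(5, 50);
    assert!(move_key(&shards, 0, 3, 5));
    assert_eq!(shards[3].read().unwrap().get(&5), Some(&50));
}
```

The same ordering discipline applies any time code built on shards() holds more than one shard lock at a time.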
Read-Only Shard Access
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn read_only_access() {
let map: DashMap<u32, String> = DashMap::new();
// Populate
for i in 0..100 {
map.insert(i, format!("value_{}", i));
}
// Read-only iteration over shards
// Acquiring read locks allows concurrent reads
fn find_in_shards<K, V>(
map: &DashMap<K, V>,
predicate: impl Fn(&K, &V) -> bool
) -> Option<(K, V)>
where
K: Clone + std::hash::Hash + Eq,
V: Clone,
{
for shard in map.shards() {
let read_guard = shard.read().unwrap();
for (k, v) in read_guard.iter() {
if predicate(k, v) {
return Some((k.clone(), v.clone()));
}
}
}
None
}
// This approach allows other readers to access different shards
// while we're searching
}
Read locks on shards allow concurrent reads across different shards.
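The point above, that a read guard on one shard leaves every other shard fully available even to other threads, can be shown with plain std types; `read_two_shards` is a hypothetical demo helper:

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

// While this thread holds a read guard on shard 0, a second thread
// acquires shard 1 without blocking: reads on different shards are
// completely independent.
fn read_two_shards(shards: &[RwLock<HashMap<u32, u32>>]) -> (Option<u32>, Option<u32>) {
    let shard0_guard = shards[0].read().unwrap();
    let from_other_thread = thread::scope(|s| {
        s.spawn(|| shards[1].read().unwrap().get(&2).copied())
            .join()
            .unwrap()
    });
    (shard0_guard.get(&1).copied(), from_other_thread)
}

fn main() {
    let shards = vec![
        RwLock::new(HashMap::from([(1, 10)])),
        RwLock::new(HashMap::from([(2, 20)])),
    ];
    assert_eq!(read_two_shards(&shards), (Some(10), Some(20)));
}
```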
Write-Only Shard Access
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn write_only_access() {
let map: DashMap<String, u32> = DashMap::new();
// Batch insert into specific shard
// This is useful when you know the distribution
let shards = map.shards();
// Write to first shard exclusively
{
let mut first_shard = shards[0].write().unwrap();
// This shard is now locked for writing
// Other shards can still be read/written by other threads
first_shard.insert("key_a".to_string(), 1);
first_shard.insert("key_b".to_string(), 2);
// Lock released at end of scope
}
// Note: The actual shard a key belongs to depends on hash
// So inserting into a specific shard directly may not match
// where DashMap would place that key normally
// More useful: iterate and modify specific entries
for shard in shards {
let mut write_guard = shard.write().unwrap();
write_guard.retain(|_k, v| {
// Keep only entries with even values
*v % 2 == 0
});
}
}
Write locks provide exclusive access to a single shard while other shards remain accessible.
Concurrent Batch Processing
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
use std::thread;
fn concurrent_batch_processing() {
let map: DashMap<u32, Vec<u32>> = DashMap::new();
// Initialize
for i in 0..100 {
map.insert(i, (0..10).collect());
}
// Process shards in parallel using rayon or threads
let shards = map.shards();
// Each thread can take a different shard
thread::scope(|s| {
for shard in shards {
s.spawn(move || {
let mut write_guard = shard.write().unwrap();
for (_key, values) in write_guard.iter_mut() {
// Process entries in this shard
for v in values.iter_mut() {
*v *= 2;
}
}
});
}
});
// Each shard is processed independently
// Maximum concurrency = number of shards
// Verify
let total: u32 = map.iter().map(|entry| entry.value().iter().sum::<u32>()).sum();
println!("Total after processing: {}", total);
}
Processing shards in parallel maximizes concurrency.
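The pattern above can be shown runnable with std types alone: one scoped thread per shard, each taking write access to exactly one shard, so the threads never contend. `double_all` is a hypothetical helper:

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

// Std-only sketch of per-shard parallelism: each scoped thread owns
// write access to exactly one shard for its whole run, so no two
// threads ever wait on the same lock.
fn double_all(shards: &[RwLock<HashMap<u32, u32>>]) {
    thread::scope(|s| {
        for shard in shards {
            s.spawn(move || {
                let mut guard = shard.write().unwrap();
                for (_key, value) in guard.iter_mut() {
                    *value *= 2;
                }
            });
        }
    }); // scope waits for every thread before returning
}

fn main() {
    let shards: Vec<RwLock<HashMap<u32, u32>>> =
        (0..4).map(|i| RwLock::new(HashMap::from([(i, i)]))).collect();
    double_all(&shards);
    assert_eq!(shards[3].read().unwrap().get(&3), Some(&6));
}
```

This is the same fan-out shape as the dashmap example, with maximum concurrency equal to the shard count.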
Custom Shard Count
use dashmap::DashMap;
fn custom_shard_count() {
// Default: derived from the CPU count, rounded to a power of two
let default_map: DashMap<u32, u32> = DashMap::new();
// With capacity (pre-allocate space)
let with_capacity: DashMap<u32, u32> = DashMap::with_capacity(10000);
// With shard count (control concurrency granularity)
let with_shards: DashMap<u32, u32> = DashMap::with_shard_amount(32);
// Both capacity and shard count
let custom: DashMap<u32, u32> = DashMap::with_capacity_and_shard_amount(10000, 64);
println!("Default shards: {}", default_map.shards().len());
println!("Custom shards: {}", with_shards.shards().len());
// Higher shard count = more potential concurrency
// But also more memory overhead and larger lock table
// Lower shard count = less memory overhead
// But more contention per shard
// For read-heavy workloads, more shards can help
// For write-heavy workloads, balance shard count with typical batch size
}
Shard count affects concurrency granularity and memory overhead.
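As a sketch of how such a default can be derived (the multiplier and rounding here are assumptions about the general shape, not dashmap's exact formula; `suggested_shard_amount` is a hypothetical helper):

```rust
use std::thread;

// Derive a shard count from available parallelism: a small multiple
// of the core count, rounded up to a power of two. with_shard_amount
// requires a power of two greater than one, so the rounding matters.
fn suggested_shard_amount() -> usize {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    (cores * 4).next_power_of_two()
}

fn main() {
    let n = suggested_shard_amount();
    assert!(n.is_power_of_two());
    assert!(n >= 4);
    println!("suggested shard count: {}", n);
}
```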
Shard Statistics
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn shard_statistics() {
let map: DashMap<String, u32> = DashMap::new();
// Insert many entries
for i in 0..1000 {
let key = format!("key_{:04}", i);
map.insert(key, i);
}
// Analyze distribution across shards
let shards = map.shards();
let shard_sizes: Vec<usize> = shards.iter()
.map(|s| s.read().unwrap().len())
.collect();
let total: usize = shard_sizes.iter().sum();
let max = shard_sizes.iter().max().unwrap();
let min = shard_sizes.iter().min().unwrap();
let avg = total as f64 / shards.len() as f64;
println!("Total entries: {}", total);
println!("Shard count: {}", shards.len());
println!("Min shard size: {}", min);
println!("Max shard size: {}", max);
println!("Avg shard size: {:.2}", avg);
// Check for imbalance
let variance: f64 = shard_sizes.iter()
.map(|&s| (s as f64 - avg).powi(2))
.sum::<f64>() / shards.len() as f64;
println!("Size variance: {:.2}", variance);
// Good hash distribution should give roughly equal sizes
}
Analyzing shard distribution helps identify hash quality issues.
Advanced: Rehashing One Shard
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn rehashing() {
let map: DashMap<u32, String> = DashMap::with_shard_amount(4);
// Insert some data
for i in 0..100 {
map.insert(i, format!("value_{}", i));
}
// DashMap handles rehashing internally
// But you can inspect individual shard state
let shards = map.shards();
for (i, shard) in shards.iter().enumerate() {
let guard = shard.read().unwrap();
println!("Shard {}: capacity={}, len={}",
i, guard.capacity(), guard.len());
}
// After many insertions, individual shards may have different capacities
// Each HashMap resizes independently based on its load
// Note: You cannot manually rehash - DashMap manages this
// But you can observe the state through shards()
}
Each shard resizes independently based on its own load factor.
Combining with Rayon
use dashmap::DashMap;
use rayon::prelude::*;
fn rayon_integration() {
let map: DashMap<u32, u32> = DashMap::new();
// Populate
for i in 0..1000 {
map.insert(i, i);
}
// Use rayon to process shards in parallel
let shards = map.shards();
// RwLock<HashMap> is Send + Sync, so the shard slice can feed
// rayon's parallel iterator directly
let total: usize = shards
.par_iter()
.map(|shard| {
let guard = shard.read().unwrap();
guard.values().filter(|&&v| v > 500).count()
})
.sum();
println!("Entries > 500: {}", total);
// Each rayon task holds at most one shard lock at a time,
// so there is no lock-ordering hazard in this pattern
// Cross-check against DashMap's built-in iteration
let count = map.iter().filter(|entry| *entry.value() > 500).count();
assert_eq!(count, total);
}
Be careful when combining direct shard access with parallel iterators.
Memory Layout
use dashmap::DashMap;
use std::mem;
fn memory_layout() {
// Each DashMap is essentially:
// struct DashMap<K, V> {
// shards: Box<[RwLock<HashMap<K, V>>]>,
// // ... other fields
// }
// Each RwLock<HashMap<K, V>> has:
// - A reader-writer lock
// - A HashMap with its own bucket array
let map: DashMap<u32, u32> = DashMap::with_shard_amount(16);
println!("Shard count: {}", map.shards().len());
println!("RwLock size: {}", mem::size_of::<std::sync::RwLock<std::collections::HashMap<u32, u32>>>());
// Memory overhead:
// - Each shard has its own RwLock
// - Each HashMap has its own bucket array
// - Total overhead ~= shards * (RwLock + HashMap overhead)
// For small maps, overhead of many shards can be significant
// For large maps, per-shard HashMap overhead is amortized
}
More shards means more lock overhead but better concurrency potential.
When to Use shards()
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn when_to_use_shards() {
// Standard DashMap API is sufficient for most cases:
// - map.insert(), map.remove(), map.get()
// - map.entry().or_insert()
// - map.iter(), map.iter_mut()
// Use shards() for advanced cases:
// 1. Shard statistics and monitoring
let map: DashMap<u32, u32> = DashMap::new();
let max_shard_size = map.shards().iter()
.map(|s| s.read().unwrap().len())
.max()
.unwrap();
// 2. Custom iteration with controlled locking
// Process each shard separately to minimize lock hold time
// 3. Batch operations on same shard
// Multiple operations while holding single write lock
// 4. Read-heavy workloads where you want to
// control exactly when locks are acquired/released
// 5. Debugging hash distribution
// Check if keys are evenly distributed across shards
// For most use cases, prefer the standard DashMap API
// It's safer and more ergonomic
}
Use shards() when you need control beyond the standard API.
Synthesis
Quick reference:
| Method | Returns | Use Case |
|---|---|---|
| map.shards() | &[RwLock<HashMap<K, V>>] | Direct shard access |
| map.shards_mut() | &mut [RwLock<HashMap<K, V>>] | Mutable shard access |
| map.shards().len() | usize | Number of shards |
Shard access patterns:
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn patterns() {
let map: DashMap<String, u32> = DashMap::new();
// Pattern 1: Statistics across shards
let stats: Vec<usize> = map.shards().iter()
.map(|s| s.read().unwrap().len())
.collect();
// Pattern 2: Batch write to one shard
{
let shard = &map.shards()[0];
let mut guard = shard.write().unwrap();
// Multiple operations while holding lock
guard.insert("a".to_string(), 1);
guard.insert("b".to_string(), 2);
}
// Pattern 3: Read-heavy scan
for shard in map.shards() {
let guard = shard.read().unwrap();
// Other threads can read from other shards
for (k, v) in guard.iter() {
// Process entry
}
}
// Pattern 4: Process shards in parallel
// (Careful with lock ordering)
}
Key insight: DashMap::shards() exposes the internal array of locked HashMap segments, providing an escape hatch from DashMap's automatic sharding abstraction. Each shard behaves like a RwLock<HashMap<K, V>>, allowing familiar read-write access patterns. The primary use cases are: (1) monitoring shard statistics to verify hash distribution quality, (2) batching operations on entries within the same shard to amortize lock acquisition costs, (3) shard-by-shard iteration to minimize lock hold time, and (4) implementing custom concurrent access patterns not supported by DashMap's standard API. However, direct shard access requires careful attention to lock ordering to prevent deadlocks: when acquiring multiple shards, always acquire them in a consistent order (e.g., increasing index). The number of shards (controlled by with_shard_amount()) determines the maximum concurrency potential: more shards enable more parallel operations but increase memory overhead and lock management complexity. For most use cases, DashMap's standard API (insert, get, remove, entry) is safer and more ergonomic; shards() is for power users who need fine-grained control over concurrent access patterns.
