How does dashmap::DashMap::shards expose internal segments for advanced concurrent access patterns?
shards() returns a slice of the internal locked segments that make up the DashMap, allowing direct access to individual shards for operations the standard API doesn't support: shard-aware iteration, manual lock management, or custom concurrent access patterns. It is an escape hatch from DashMap's automatic sharding abstraction, giving power users direct control over the underlying segments. Note that shards() is part of dashmap's raw API (enabled via the `raw-api` cargo feature), and the concrete shard type varies across dashmap versions (recent releases wrap each shard in a cache-padded, crate-internal reader-writer lock whose read() returns a guard directly rather than a Result). The examples below model each shard as a std::sync::RwLock<HashMap<K, V>> for clarity.
Basic DashMap Sharding Model
use dashmap::DashMap;
fn basic_sharding() {
// DashMap splits data across multiple internal segments (shards)
// Each shard is a RwLock<HashMap<K, V>>
// The number of shards is determined at creation
let map: DashMap<u32, String> = DashMap::new(); // Default shard count is derived from the CPU count
// The map internally hashes keys to determine which shard contains the entry
// Operations on different shards can proceed in parallel
// Operations on the same shard are serialized by the RwLock
println!("Number of shards: {}", map.shards().len());
// Typically prints something like 8, 16, 32, etc. depending on CPU count
}
DashMap achieves concurrency by partitioning data across multiple locked segments.
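The model above can be sketched with std types alone. This toy `ShardedMap` is a hypothetical type, not part of dashmap: it hashes each key and masks the hash down to one of a power-of-two number of `RwLock<HashMap>` shards. dashmap's real index derivation differs in detail, so treat this as a conceptual sketch only.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::BuildHasher;
use std::sync::RwLock;

// Toy sharded map: one RwLock<HashMap> per shard, hash-routed keys.
struct ShardedMap {
    hasher: RandomState,
    shards: Vec<RwLock<HashMap<u32, String>>>,
}

impl ShardedMap {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count.is_power_of_two());
        ShardedMap {
            hasher: RandomState::new(),
            shards: (0..shard_count).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    // Mask the hash down to a shard index (simplified; dashmap's
    // actual scheme derives the index differently).
    fn shard_index(&self, key: u32) -> usize {
        self.hasher.hash_one(key) as usize & (self.shards.len() - 1)
    }

    fn insert(&self, key: u32, value: String) {
        // Only this key's shard is write-locked; other shards stay free.
        self.shards[self.shard_index(key)].write().unwrap().insert(key, value);
    }

    fn get(&self, key: u32) -> Option<String> {
        self.shards[self.shard_index(key)].read().unwrap().get(&key).cloned()
    }

    fn len(&self) -> usize {
        self.shards.iter().map(|s| s.read().unwrap().len()).sum()
    }
}

fn main() {
    let map = ShardedMap::new(8);
    for i in 0..100 {
        map.insert(i, format!("value_{}", i));
    }
    assert_eq!(map.len(), 100);
    assert_eq!(map.get(42).as_deref(), Some("value_42"));
}
```

Because each shard has its own lock, inserts to keys that hash to different shards never contend, which is the whole point of the design.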
Accessing the Shards
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn access_shards() {
let map: DashMap<u32, String> = DashMap::new();
// Insert some data
for i in 0..100 {
map.insert(i, format!("value_{}", i));
}
// shards() returns &[RwLock<HashMap<K, V>>]
let shards: &[RwLock<HashMap<u32, String>>] = map.shards();
// Each shard is an independent HashMap protected by a RwLock
for (i, shard) in shards.iter().enumerate() {
let read_guard = shard.read().unwrap();
println!("Shard {} has {} entries", i, read_guard.len());
}
// Total entries should equal the map size
let total: usize = shards.iter()
.map(|s| s.read().unwrap().len())
.sum();
assert_eq!(total, map.len());
}
shards() provides direct access to the underlying RwLock<HashMap> segments.
Shard-Aware Operations
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn shard_aware_operations() {
let map: DashMap<String, u32> = DashMap::new();
// Populate the map
for i in 0..1000 {
map.insert(format!("key_{}", i), i);
}
// Process entries shard-by-shard
// This can be more efficient for certain operations
fn process_shard<K, V>(
shard: &RwLock<HashMap<K, V>>,
process_fn: impl Fn(&K, &V)
) {
let guard = shard.read().unwrap();
for (k, v) in guard.iter() {
process_fn(k, v);
}
}
// Process each shard with read access
for shard in map.shards() {
process_shard(shard, |k, v| {
// Process each entry in this shard
});
}
// This approach holds locks for shorter periods
// compared to iterating through the entire map at once
}
Shard-by-shard processing can reduce lock contention for certain operations.
Manual Lock Control
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn manual_lock_control() {
let map: DashMap<u32, Vec<u32>> = DashMap::new();
// Initialize some data
for i in 0..10 {
map.insert(i, Vec::new());
}
// Suppose we want to do complex operations on specific entries
// while minimizing lock scope
// Standard DashMap approach - each operation acquires/releases locks
for i in 0..10 {
map.entry(i).and_modify(|v| v.push(i));
}
// Manual shard approach - batch operations within same shard
// First, determine which shard a key belongs to
// (with the raw-api feature, dashmap exposes determine_map() for this;
// the modulo below is an illustrative approximation, not dashmap's
// actual bit-mixing scheme)
fn determine_shard<K>(map: &DashMap<K, Vec<u32>>, key: &K) -> usize
where K: std::hash::Hash + Eq
{
use std::hash::BuildHasher;
let hash = map.hasher().hash_one(key);
let shard_count = map.shards().len();
hash as usize % shard_count
}
// Group operations by shard to batch them
let shards = map.shards();
// This allows for custom batching strategies
// where you hold a write lock on a single shard
// and perform multiple operations
for shard in shards {
let mut write_guard = shard.write().unwrap();
// Now we have exclusive access to this shard
// Can perform multiple operations without re-acquiring
for (_key, value) in write_guard.iter_mut() {
// Modify multiple entries in this shard
value.push(999);
}
}
}
Direct shard access allows holding locks across multiple operations.
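When a custom operation needs write locks on two shards at once (for example, migrating an entry between shards of a hand-rolled sharded map), acquiring the locks in a fixed global order prevents deadlock. A std-only sketch; `move_key` is a hypothetical helper, not a dashmap API:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Move an entry from one shard to another. Two threads doing this
// concurrently could deadlock if each locked its "from" shard first;
// locking the lower-indexed shard first, unconditionally, makes a
// wait cycle impossible.
fn move_key(
    shards: &[RwLock<HashMap<u32, u32>>],
    from: usize,
    to: usize,
    key: u32,
) -> bool {
    if from == to {
        return shards[from].read().unwrap().contains_key(&key);
    }
    let (lo, hi) = (from.min(to), from.max(to));
    // Always acquire in increasing index order.
    let mut lo_guard = shards[lo].write().unwrap();
    let mut hi_guard = shards[hi].write().unwrap();
    let (src, dst) = if from < to {
        (&mut *lo_guard, &mut *hi_guard)
    } else {
        (&mut *hi_guard, &mut *lo_guard)
    };
    match src.remove(&key) {
        Some(v) => {
            dst.insert(key, v);
            true
        }
        None => false,
    }
}

fn main() {
    let shards: Vec<RwLock<HashMap<u32, u32>>> =
        (0..4).map(|_| RwLock::new(HashMap::new())).collect();
    shards[0].write().unwrap().insert(5, 50);
    assert!(move_key(&shards, 0, 3, 5));
    assert_eq!(shards[3].read().unwrap().get(&5), Some(&50));
}
```

The same ordering discipline applies any time code built on shards() holds more than one shard lock at a time.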
Read-Only Shard Access
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn read_only_access() {
let map: DashMap<u32, String> = DashMap::new();
// Populate
for i in 0..100 {
map.insert(i, format!("value_{}", i));
}
// Read-only iteration over shards
// Acquiring read locks allows concurrent reads
fn find_in_shards<K, V>(
map: &DashMap<K, V>,
predicate: impl Fn(&K, &V) -> bool
) -> Option<(K, V)>
where
K: Clone + std::hash::Hash + Eq,
V: Clone,
{
for shard in map.shards() {
let read_guard = shard.read().unwrap();
for (k, v) in read_guard.iter() {
if predicate(k, v) {
return Some((k.clone(), v.clone()));
}
}
}
None
}
// This approach allows other readers to access different shards
// while we're searching
}
Read locks on shards allow concurrent reads across different shards.
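The point above, that a read guard on one shard leaves every other shard fully available even to other threads, can be shown with plain std types; `read_two_shards` is a hypothetical demo helper:

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

// While this thread holds a read guard on shard 0, a second thread
// acquires shard 1 without blocking: reads on different shards are
// completely independent.
fn read_two_shards(shards: &[RwLock<HashMap<u32, u32>>]) -> (Option<u32>, Option<u32>) {
    let shard0_guard = shards[0].read().unwrap();
    let from_other_thread = thread::scope(|s| {
        s.spawn(|| shards[1].read().unwrap().get(&2).copied())
            .join()
            .unwrap()
    });
    (shard0_guard.get(&1).copied(), from_other_thread)
}

fn main() {
    let shards = vec![
        RwLock::new(HashMap::from([(1, 10)])),
        RwLock::new(HashMap::from([(2, 20)])),
    ];
    assert_eq!(read_two_shards(&shards), (Some(10), Some(20)));
}
```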
Write-Only Shard Access
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn write_only_access() {
let map: DashMap<String, u32> = DashMap::new();
// Batch insert into specific shard
// This is useful when you know the distribution
let shards = map.shards();
// Write to first shard exclusively
{
let mut first_shard = shards[0].write().unwrap();
// This shard is now locked for writing
// Other shards can still be read/written by other threads
first_shard.insert("key_a".to_string(), 1);
first_shard.insert("key_b".to_string(), 2);
// Lock released at end of scope
}
// Note: The actual shard a key belongs to depends on hash
// So inserting into a specific shard directly may not match
// where DashMap would place that key normally
// More useful: iterate and modify specific entries
for shard in shards {
let mut write_guard = shard.write().unwrap();
write_guard.retain(|_k, v| {
// Keep only entries with even values
*v % 2 == 0
});
}
}
Write locks provide exclusive access to a single shard while other shards remain accessible.
Concurrent Batch Processing
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
use std::thread;
fn concurrent_batch_processing() {
let map: DashMap<u32, Vec<u32>> = DashMap::new();
// Initialize
for i in 0..100 {
map.insert(i, (0..10).collect());
}
// Process shards in parallel using rayon or threads
let shards = map.shards();
// Each thread can take a different shard
thread::scope(|s| {
for shard in shards {
s.spawn(move || {
let mut write_guard = shard.write().unwrap();
for (_key, values) in write_guard.iter_mut() {
// Process entries in this shard
for v in values.iter_mut() {
*v *= 2;
}
}
});
}
});
// Each shard is processed independently
// Maximum concurrency = number of shards
// Verify
let total: u32 = map.iter().map(|entry| entry.value().iter().sum::<u32>()).sum();
println!("Total after processing: {}", total);
}
Processing shards in parallel maximizes concurrency.
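The pattern above can be shown runnable with std types alone: one scoped thread per shard, each taking write access to exactly one shard, so the threads never contend. `double_all` is a hypothetical helper:

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

// Std-only sketch of per-shard parallelism: each scoped thread owns
// write access to exactly one shard for its whole run, so no two
// threads ever wait on the same lock.
fn double_all(shards: &[RwLock<HashMap<u32, u32>>]) {
    thread::scope(|s| {
        for shard in shards {
            s.spawn(move || {
                let mut guard = shard.write().unwrap();
                for (_key, value) in guard.iter_mut() {
                    *value *= 2;
                }
            });
        }
    }); // scope waits for every thread before returning
}

fn main() {
    let shards: Vec<RwLock<HashMap<u32, u32>>> =
        (0..4).map(|i| RwLock::new(HashMap::from([(i, i)]))).collect();
    double_all(&shards);
    assert_eq!(shards[3].read().unwrap().get(&3), Some(&6));
}
```

This is the same fan-out shape as the dashmap example, with maximum concurrency equal to the shard count.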
Custom Shard Count
use dashmap::DashMap;
fn custom_shard_count() {
// Default: derived from the CPU count, rounded to a power of two
let default_map: DashMap<u32, u32> = DashMap::new();
// With capacity (pre-allocate space)
let with_capacity: DashMap<u32, u32> = DashMap::with_capacity(10000);
// With shard count (control concurrency granularity)
let with_shards: DashMap<u32, u32> = DashMap::with_shard_amount(32);
// Both capacity and shard count
let custom: DashMap<u32, u32> = DashMap::with_capacity_and_shard_amount(10000, 64);
println!("Default shards: {}", default_map.shards().len());
println!("Custom shards: {}", with_shards.shards().len());
// Higher shard count = more potential concurrency
// But also more memory overhead and larger lock table
// Lower shard count = less memory overhead
// But more contention per shard
// For read-heavy workloads, more shards can help
// For write-heavy workloads, balance shard count with typical batch size
}
Shard count affects concurrency granularity and memory overhead.
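As a sketch of how such a default can be derived (the multiplier and rounding here are assumptions about the general shape, not dashmap's exact formula; `suggested_shard_amount` is a hypothetical helper):

```rust
use std::thread;

// Derive a shard count from available parallelism: a small multiple
// of the core count, rounded up to a power of two. with_shard_amount
// requires a power of two greater than one, so the rounding matters.
fn suggested_shard_amount() -> usize {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    (cores * 4).next_power_of_two()
}

fn main() {
    let n = suggested_shard_amount();
    assert!(n.is_power_of_two());
    assert!(n >= 4);
    println!("suggested shard count: {}", n);
}
```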
Shard Statistics
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn shard_statistics() {
let map: DashMap<String, u32> = DashMap::new();
// Insert many entries
for i in 0..1000 {
let key = format!("key_{:04}", i);
map.insert(key, i);
}
// Analyze distribution across shards
let shards = map.shards();
let shard_sizes: Vec<usize> = shards.iter()
.map(|s| s.read().unwrap().len())
.collect();
let total: usize = shard_sizes.iter().sum();
let max = shard_sizes.iter().max().unwrap();
let min = shard_sizes.iter().min().unwrap();
let avg = total as f64 / shards.len() as f64;
println!("Total entries: {}", total);
println!("Shard count: {}", shards.len());
println!("Min shard size: {}", min);
println!("Max shard size: {}", max);
println!("Avg shard size: {:.2}", avg);
// Check for imbalance
let variance: f64 = shard_sizes.iter()
.map(|&s| (s as f64 - avg).powi(2))
.sum::<f64>() / shards.len() as f64;
println!("Size variance: {:.2}", variance);
// Good hash distribution should give roughly equal sizes
}
Analyzing shard distribution helps identify hash quality issues.
Advanced: Rehashing One Shard
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn rehashing() {
let map: DashMap<u32, String> = DashMap::with_shard_amount(4);
// Insert some data
for i in 0..100 {
map.insert(i, format!("value_{}", i));
}
// DashMap handles rehashing internally
// But you can inspect individual shard state
let shards = map.shards();
for (i, shard) in shards.iter().enumerate() {
let guard = shard.read().unwrap();
println!("Shard {}: capacity={}, len={}",
i, guard.capacity(), guard.len());
}
// After many insertions, individual shards may have different capacities
// Each HashMap resizes independently based on its load
// Note: You cannot manually rehash - DashMap manages this
// But you can observe the state through shards()
}
Each shard resizes independently based on its own load factor.
Combining with Rayon
use dashmap::DashMap;
use rayon::prelude::*;
fn rayon_integration() {
let map: DashMap<u32, u32> = DashMap::new();
// Populate
for i in 0..1000 {
map.insert(i, i);
}
// Use rayon to process shards in parallel
let shards = map.shards();
// RwLock<HashMap> is Send + Sync, so the shard slice can feed
// rayon's parallel iterator directly
let total: usize = shards
.par_iter()
.map(|shard| {
let guard = shard.read().unwrap();
guard.values().filter(|&&v| v > 500).count()
})
.sum();
println!("Entries > 500: {}", total);
// Each rayon task holds at most one shard lock at a time,
// so there is no lock-ordering hazard in this pattern
// Cross-check against DashMap's built-in iteration
let count = map.iter().filter(|entry| *entry.value() > 500).count();
assert_eq!(count, total);
}
Be careful when combining direct shard access with parallel iterators.
Memory Layout
use dashmap::DashMap;
use std::mem;
fn memory_layout() {
// Each DashMap is essentially:
// struct DashMap<K, V> {
// shards: Box<[RwLock<HashMap<K, V>>]>,
// // ... other fields
// }
// Each RwLock<HashMap<K, V>> has:
// - A reader-writer lock
// - A HashMap with its own bucket array
let map: DashMap<u32, u32> = DashMap::with_shard_amount(16);
println!("Shard count: {}", map.shards().len());
println!("RwLock size: {}", mem::size_of::<std::sync::RwLock<std::collections::HashMap<u32, u32>>>());
// Memory overhead:
// - Each shard has its own RwLock
// - Each HashMap has its own bucket array
// - Total overhead ~= shards * (RwLock + HashMap overhead)
// For small maps, overhead of many shards can be significant
// For large maps, per-shard HashMap overhead is amortized
}
More shards means more lock overhead but better concurrency potential.
When to Use shards()
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn when_to_use_shards() {
// Standard DashMap API is sufficient for most cases:
// - map.insert(), map.remove(), map.get()
// - map.entry().or_insert()
// - map.iter(), map.iter_mut()
// Use shards() for advanced cases:
// 1. Shard statistics and monitoring
let map: DashMap<u32, u32> = DashMap::new();
let max_shard_size = map.shards().iter()
.map(|s| s.read().unwrap().len())
.max()
.unwrap();
// 2. Custom iteration with controlled locking
// Process each shard separately to minimize lock hold time
// 3. Batch operations on same shard
// Multiple operations while holding single write lock
// 4. Read-heavy workloads where you want to
// control exactly when locks are acquired/released
// 5. Debugging hash distribution
// Check if keys are evenly distributed across shards
// For most use cases, prefer the standard DashMap API
// It's safer and more ergonomic
}
Use shards() when you need control beyond the standard API.
Synthesis
Quick reference:
| Method | Returns | Use Case |
|---|---|---|
| map.shards() | &[RwLock<HashMap<K, V>>] | Direct shard access |
| map.shards_mut() | &mut [RwLock<HashMap<K, V>>] | Mutable shard access |
| map.shards().len() | usize | Number of shards |
Shard access patterns:
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
fn patterns() {
let map: DashMap<String, u32> = DashMap::new();
// Pattern 1: Statistics across shards
let stats: Vec<usize> = map.shards().iter()
.map(|s| s.read().unwrap().len())
.collect();
// Pattern 2: Batch write to one shard
{
let shard = &map.shards()[0];
let mut guard = shard.write().unwrap();
// Multiple operations while holding lock
guard.insert("a".to_string(), 1);
guard.insert("b".to_string(), 2);
}
// Pattern 3: Read-heavy scan
for shard in map.shards() {
let guard = shard.read().unwrap();
// Other threads can read from other shards
for (k, v) in guard.iter() {
// Process entry
}
}
// Pattern 4: Process shards in parallel
// (Careful with lock ordering)
}
Key insight: DashMap::shards() exposes the internal array of locked HashMap segments, providing an escape hatch from DashMap's automatic sharding abstraction. Each shard behaves like a RwLock<HashMap<K, V>>, allowing familiar read-write access patterns. The primary use cases are: (1) monitoring shard statistics to verify hash distribution quality, (2) batching operations on entries within the same shard to amortize lock acquisition costs, (3) shard-by-shard iteration to minimize lock hold time, and (4) implementing custom concurrent access patterns not supported by DashMap's standard API. However, direct shard access requires careful attention to lock ordering to prevent deadlocks: when acquiring multiple shards, always acquire them in a consistent order (e.g., increasing index). The number of shards (controlled by with_shard_amount()) determines the maximum concurrency potential: more shards enable more parallel operations but increase memory overhead and lock management complexity. For most use cases, DashMap's standard API (insert, get, remove, entry) is safer and more ergonomic; shards() is for power users who need fine-grained control over concurrent access patterns.
