Rust walkthroughs
dashmap::DashMap sharding strategy vs RwLock<HashMap>?

Concurrent access to shared maps requires synchronization, and the choice of strategy dramatically affects performance under contention. DashMap uses sharding to reduce lock contention, while RwLock<HashMap> uses a single lock for the entire map. Understanding their trade-offs helps you choose the right approach for your workload.
The straightforward approach wraps a HashMap in an RwLock:
use std::collections::HashMap;
use std::sync::RwLock;
struct SharedMap<K, V> {
map: RwLock<HashMap<K, V>>,
}
impl<K, V> SharedMap<K, V>
where
K: Eq + std::hash::Hash + Clone,
V: Clone,
{
fn get(&self, key: &K) -> Option<V> {
let map = self.map.read().unwrap();
map.get(key).cloned()
}
fn insert(&self, key: K, value: V) -> Option<V> {
let mut map = self.map.write().unwrap();
map.insert(key, value)
}
}
Every operation acquires either a read lock or a write lock on the single RwLock. While multiple readers can proceed concurrently, any writer blocks all readers and other writers.
DashMap divides the map into multiple independent shards, each with its own lock:
use dashmap::DashMap;
let map: DashMap<String, i32> = DashMap::new();
// Each shard has its own lock
// Operations only lock the relevant shard
map.insert("key1".to_string(), 1); // Locks shard N
map.insert("key2".to_string(), 2); // Locks shard M (possibly different)
// Reads also only lock one shard
if let Some(value) = map.get("key1") {
println!("Value: {}", value);
}
In current dashmap versions the shard count defaults to a power of two derived from available parallelism (roughly four times the core count), allowing concurrent operations on different shards to proceed without interference.
Consider a scenario with 16 concurrent writers:
use std::sync::{Arc, RwLock};
use std::collections::HashMap;
use dashmap::DashMap;
use std::thread;
// RwLock<HashMap>: All 16 writers serialize on single lock
let map: Arc<RwLock<HashMap<u64, u64>>> = Arc::new(RwLock::new(HashMap::new()));
let handles: Vec<_> = (0..16)
.map(|i| {
let map = Arc::clone(&map);
thread::spawn(move || {
for j in 0..1000 {
let mut map = map.write().unwrap(); // Blocks all other writers
map.insert(i * 1000 + j, j);
}
})
})
.collect();
// DashMap: Writers distribute across shards
let map: Arc<DashMap<u64, u64>> = Arc::new(DashMap::new());
let handles: Vec<_> = (0..16)
.map(|i| {
let map = Arc::clone(&map);
thread::spawn(move || {
for j in 0..1000 {
map.insert(i * 1000 + j, j); // Only locks one shard
}
})
})
.collect();
With RwLock<HashMap>, all 16 threads compete for a single lock, creating a serialization bottleneck. With DashMap, threads writing to different shards proceed concurrently.
DashMap trades memory for concurrency:
use dashmap::DashMap;
use std::collections::HashMap;
use std::mem::size_of;
// Each shard has its own HashMap with separate allocation overhead
// Default shard count: a power of two derived from available parallelism
// Memory overhead per shard:
// - HashMap structure (~48 bytes on 64-bit)
// - Lock/RwLock structure
// - Separate allocation for buckets
// Total overhead ≈ shards * (HashMap overhead + lock overhead + allocation)
RwLock<HashMap> has a single allocation and lock overhead. For small maps, DashMap's overhead can be significant:
// Small map with high shard count
let small_map: DashMap<u64, u64> = DashMap::new();
// If shards = 16 and you store 4 entries...
// You might have 16 shards with ~empty HashMaps, each pre-allocating buckets
// RwLock is more memory-efficient for small maps
let small_map: RwLock<HashMap<u64, u64>> = RwLock::new(HashMap::new());
The trade-offs differ based on read/write patterns:
use std::sync::{Arc, RwLock};
use std::collections::HashMap;
use dashmap::DashMap;
// RwLock allows concurrent reads
let map: Arc<RwLock<HashMap<u64, u64>>> = /* ... */;
// Multiple threads can hold read locks simultaneously
// Good for read-heavy workloads
// DashMap also allows concurrent reads (within same shard)
let map: Arc<DashMap<u64, u64>> = /* ... */;
// Reads lock only one shard
// Other shards can be read/written concurrently
For pure read-heavy workloads, RwLock<HashMap> performs well because multiple readers don't block each other. However, any write completely blocks all readers.
// RwLock: Writers block everything
let map: Arc<RwLock<HashMap<u64, u64>>> = /* ... */;
// Each write lock blocks all readers and other writers
// Severe contention under write-heavy load
// DashMap: Writers only block operations on the same shard
let map: Arc<DashMap<u64, u64>> = /* ... */;
// Writes to different shards proceed concurrently
// Much better throughput under write-heavy load
DashMap excels in write-heavy scenarios where RwLock<HashMap> would serialize all operations.
Sharding has a weakness: hot keys that hash to the same shard:
use dashmap::DashMap;
use std::sync::Arc;
use std::thread;
let map: Arc<DashMap<u64, u64>> = Arc::new(DashMap::new());
// If many threads access the same key, they all contend on one shard
let handles: Vec<_> = (0..16)
.map(|_| {
let map = Arc::clone(&map);
thread::spawn(move || {
for _ in 0..1000 {
// All threads hit the same key
// All contend on the same shard's lock
*map.entry(42).or_insert(0) += 1;
}
})
})
.collect();
With RwLock<HashMap>, hot keys don't cause additional contention beyond what's already present from the single lock.
Iterating over a DashMap requires locking each shard:
use dashmap::DashMap;
let map: DashMap<u64, u64> = /* ... */;
// Iteration locks shards one at a time
for entry in map.iter() {
println!("{}: {}", entry.key(), entry.value());
// Each shard is locked as the iterator reaches it
// Other operations on unlocked shards can proceed
}
This is different from RwLock<HashMap>, where iteration requires a single lock:
use std::sync::RwLock;
use std::collections::HashMap;
let map: RwLock<HashMap<u64, u64>> = /* ... */;
// Iteration holds one lock for the entire duration
let map = map.read().unwrap();
for (key, value) in map.iter() {
println!("{}: {}", key, value);
// No other operations can proceed during iteration
}
DashMap provides an entry API for atomic per-key operations:
use dashmap::DashMap;
let map = DashMap::new();
// Atomic update: get or insert
map.entry("key".to_string())
.or_insert_with(|| expensive_computation());
// Atomic modification
map.entry("counter".to_string())
.and_modify(|v| *v += 1)
.or_insert(1);
// With RwLock<HashMap>, you'd need a write lock for these operations
With RwLock<HashMap>, atomic operations require a write lock:
use std::sync::RwLock;
use std::collections::HashMap;
let map: RwLock<HashMap<String, i32>> = RwLock::new(HashMap::new());
// Requires write lock
{
let mut map = map.write().unwrap();
map.entry("counter".to_string())
.and_modify(|v| *v += 1)
.or_insert(1);
}
DashMap allows customizing the shard count:
use dashmap::DashMap;
// Default: a power of two derived from available parallelism
let default_map: DashMap<u64, u64> = DashMap::new();
// With shard count (must be power of 2)
let custom_map: DashMap<u64, u64> = DashMap::with_shard_amount(64);
// Fewer shards for small maps to reduce overhead
let small_map: DashMap<u64, u64> = DashMap::with_shard_amount(4);
Choosing the right shard count is a trade-off: more shards cut contention under parallel access, but each shard adds fixed memory overhead.
Here's a conceptual benchmark comparing the two approaches:
use std::sync::{Arc, RwLock};
use std::collections::HashMap;
use dashmap::DashMap;
use std::thread;
use std::time::Instant;
fn benchmark_rwlock(threads: usize, ops_per_thread: usize) -> u64 {
let map: Arc<RwLock<HashMap<u64, u64>>> = Arc::new(RwLock::new(HashMap::new()));
let start = Instant::now();
let handles: Vec<_> = (0..threads)
.map(|i| {
let map = Arc::clone(&map);
thread::spawn(move || {
for j in 0..ops_per_thread {
let key = (i as u64) * 1000000 + j as u64;
let mut map = map.write().unwrap();
map.insert(key, j as u64);
}
})
})
.collect();
for h in handles {
h.join().unwrap();
}
start.elapsed().as_millis() as u64
}
fn benchmark_dashmap(threads: usize, ops_per_thread: usize) -> u64 {
let map: Arc<DashMap<u64, u64>> = Arc::new(DashMap::new());
let start = Instant::now();
let handles: Vec<_> = (0..threads)
.map(|i| {
let map = Arc::clone(&map);
thread::spawn(move || {
for j in 0..ops_per_thread {
let key = (i as u64) * 1000000 + j as u64;
map.insert(key, j as u64);
}
})
})
.collect();
for h in handles {
h.join().unwrap();
}
start.elapsed().as_millis() as u64
}
// Typical results (write-heavy, uniform key distribution):
// threads=1: RwLock ~= DashMap (no contention)
// threads=4: RwLock ~2-3x slower than DashMap
// threads=16: RwLock ~5-10x slower than DashMap
// threads=32: RwLock ~10-20x slower than DashMap
RwLock<HashMap> is appropriate when:
// 1. Small maps with low contention
struct Config {
values: RwLock<HashMap<String, String>>,
}
// 2. Read-heavy workloads with infrequent writes
struct Cache {
data: RwLock<HashMap<String, Arc<CachedValue>>>,
}
// 3. You need to iterate over the entire map while blocking all modifications
let map = map.read().unwrap();
let total: u64 = map.values().sum(); // Consistent snapshot
// 4. Memory is constrained
// DashMap has higher overhead for small maps
DashMap is appropriate when:
// 1. High write contention from multiple threads
struct Counters {
values: DashMap<String, AtomicU64>,
}
// 2. Large maps accessed by many threads
struct SessionStore {
sessions: DashMap<SessionId, Session>,
}
// 3. You need atomic entry operations
map.entry(key)
.or_insert_with(|| compute_value());
// 4. You need concurrent access without blocking all readers for writes
// Reading shard A doesn't block writing to shard B
Sometimes a combination works best:
use dashmap::DashMap;
use std::sync::RwLock;
use std::collections::HashMap;
// Sharded map for high-contention keys
struct AdaptiveMap<K, V> {
hot: DashMap<K, V>, // High-contention keys
cold: RwLock<HashMap<K, V>>, // Low-contention keys
}
// Or use RwLock<HashMap> for small working sets that move between threads
struct ThreadLocalCache {
local: std::cell::RefCell<HashMap<Key, Value>>,
shared: DashMap<Key, Value>,
}The choice between DashMap and RwLock<HashMap> depends on your contention pattern:
DashMap excels when you have many concurrent writers or mixed read/write workloads with uniform key distribution. The sharding strategy allows operations on different keys to proceed in parallel, providing near-linear scaling up to the shard count.
RwLock<HashMap> is better for small maps, read-heavy workloads with infrequent writes, or when you need consistent snapshots during iteration. Its simpler structure has lower memory overhead and no hot-key penalty.
The key insight is that sharding trades memory and complexity for concurrency. If your workload doesn't benefit from concurrent access to different keys, whether because contention is low or keys aren't well-distributed, the trade-off isn't worthwhile. But for high-contention scenarios with diverse keys, DashMap's sharding provides substantial throughput improvements over a single lock.