How does indexmap::IndexMap maintain insertion order while still providing O(1) lookups?
IndexMap combines the best of two data structure worlds: the O(1) lookup performance of hash tables and the insertion-order iteration of sequences. This hybrid structure achieves its properties through a dual-indexing design that maintains both a hash table for fast lookups and a dense array for ordered iteration.
The Problem: HashMap's Unordered Nature
Standard HashMap provides O(1) lookups but makes no guarantees about iteration order:
use std::collections::HashMap;
let mut map = HashMap::new();
map.insert("first", 1);
map.insert("second", 2);
map.insert("third", 3);
// Iteration order is unpredictable
for (key, value) in &map {
println!("{}: {}", key, value);
}
// Output order varies based on hash values and internal layout

The internal structure of HashMap organizes entries by hash buckets, not insertion order. Rehashing during growth can completely reorder the internal layout.
The IndexMap Solution: Dual Indices
IndexMap maintains two parallel data structures:
// Conceptual representation
struct IndexMap<K, V> {
entries: Vec<Entry<K, V>>, // Dense array for ordered iteration
indices: RawTable<usize>, // Hash table mapping hash -> index
}
struct Entry<K, V> {
hash: u64, // Cached hash for fast lookup
key: K,
value: V,
}

The entries vector stores key-value pairs in insertion order, enabling ordered iteration. The indices hash table maps hash values to indices in the entries vector, enabling O(1) lookup.
Lookup Operation: Hash to Index to Entry
When you look up a key, IndexMap uses a two-step process:
use indexmap::IndexMap;
let mut map = IndexMap::new();
map.insert("apple", 1);
map.insert("banana", 2);
map.insert("cherry", 3);
// Lookup: "banana"
let value = map.get("banana");

The lookup proceeds as follows:
- Hash the key: compute hash("banana")
- Find the index: look up the hash in indices to get the position in entries
- Access the entry: use the index to directly access entries[position]
// Simplified lookup implementation
fn get(&self, key: &K) -> Option<&V> {
let hash = self.hash(key);
// Step 1: O(1) hash table lookup
let index = self.indices.get(hash, |&idx| {
// Verify the key matches (handle collisions)
self.entries[idx].key == *key
})?;
// Step 2: O(1) array access
Some(&self.entries[index].value)
}

Both steps are O(1), so the overall lookup runs in O(1) expected time.
Insertion Operation: Append and Index
Insertion adds entries at the end of the vector and creates a hash table entry:
use indexmap::IndexMap;
let mut map = IndexMap::new();
map.insert("first", 1); // entries: [("first", 1)]
// indices: { hash("first") -> 0 }
map.insert("second", 2); // entries: [("first", 1), ("second", 2)]
// indices: { hash("first") -> 0, hash("second") -> 1 }

The process:
- Compute the hash of the key
- Check if the key already exists (hash table lookup)
- If new, append to entries and store the index in indices
- If existing, update the value in place (preserving position)
// Simplified insertion
fn insert(&mut self, key: K, value: V) -> Option<V> {
let hash = self.hash(&key);
// Check if key exists
if let Some(index) = self.indices.get_mut(hash, |&idx| {
self.entries[idx].key == key
}) {
// Key exists: update value, return old
let old = std::mem::replace(&mut self.entries[*index].value, value);
return Some(old);
}
// New key: append and index
let index = self.entries.len();
self.entries.push(Entry { hash, key, value });
self.indices.insert(hash, index, |idx| {
self.entries[*idx].hash
});
None
}

Ordered Iteration: Direct Vector Traversal
Iteration simply walks the entries vector:
use indexmap::IndexMap;
let mut map = IndexMap::new();
map.insert("a", 1);
map.insert("b", 2);
map.insert("c", 3);
// Iteration is in insertion order
for (key, value) in &map {
println!("{}: {}", key, value);
}
// Output: a: 1, b: 2, c: 3 (always)

The iterator implementation:
// Iter iterates over the entries vector
impl<'a, K, V> Iterator for Iter<'a, K, V> {
type Item = (&'a K, &'a V);
fn next(&mut self) -> Option<Self::Item> {
self.iter.next().map(|entry| (&entry.key, &entry.value))
}
}

This is cache-friendly: entries are stored contiguously in memory, providing excellent locality for iteration.
The Challenge of Removal: Maintaining Order
Removal is where the design becomes clever. Simply removing an entry from entries would leave a gap, and shifting all subsequent entries would be O(n). Instead, IndexMap uses swap removal with index updates:
use indexmap::IndexMap;
let mut map = IndexMap::new();
map.insert("a", 1); // index 0
map.insert("b", 2); // index 1
map.insert("c", 3); // index 2
map.insert("d", 4); // index 3
map.swap_remove("b"); // Remove entry at index 1 via swap removal
// After swap removal:
// entries: [("a", 1), ("d", 4), ("c", 3)]
// indices: { hash("a") -> 0, hash("d") -> 1, hash("c") -> 2 }

The process:
- Find the entry's index via the hash table and remove its hash table slot
- Swap the entry with the last entry in the vector
- Update the hash table index of the swapped entry to point at its new position
- Pop the last entry (which is now the one being removed)
// Simplified swap removal
fn remove(&mut self, key: &K) -> Option<V> {
let hash = self.hash(key);
// Find index of entry to remove
let index = self.indices.remove(hash, |&idx| {
self.entries[idx].key == *key
})?;
// If not the last entry, swap with last
let last = self.entries.len() - 1;
if index != last {
// Swap entries
self.entries.swap(index, last);
// Update index of the swapped entry
let swapped_hash = self.entries[index].hash;
self.indices.get_mut(swapped_hash, |&idx| idx == last)
.map(|idx| *idx = index);
}
// Pop the removed entry
let entry = self.entries.pop()?;
Some(entry.value)
}

This approach keeps removal at O(1) average time, but it changes the relative order of the remaining elements: the last entry takes the removed entry's position.
Preserving Order with Shift Removal
For cases where removal must preserve order, IndexMap provides shift_remove:
use indexmap::IndexMap;
let mut map = IndexMap::new();
map.insert("a", 1);
map.insert("b", 2);
map.insert("c", 3);
map.shift_remove("b"); // O(n) but preserves order
// entries: [("a", 1), ("c", 3)]
// Order preserved: a still before c

This shifts all subsequent entries left and updates each of their hash table indices, which is O(n) but maintains strict insertion order.
Hash Collision Handling
Like any hash table, IndexMap handles collisions through probing. The indices table stores indices, and when multiple keys hash to the same bucket, the lookup probes until it finds the matching key:
use indexmap::IndexMap;
let mut map = IndexMap::new();
// These might hash to the same bucket (depending on hash function)
map.insert("key1", 1);
map.insert("key2", 2); // Potentially same bucket as key1
// Lookup probes until it finds the right key
let value = map.get("key2"); // Works correctly despite collision

The hash table uses open addressing with short probe sequences (recent indexmap versions build on hashbrown's SwissTable design), maintaining O(1) expected lookup time.
Memory Layout Benefits
The dense entries vector provides cache-friendly iteration:
use indexmap::IndexMap;
let map: IndexMap<String, i32> = /* ... */;
// Cache-friendly: sequential memory access
let sum: i32 = map.values().sum();
// Compare to HashMap: scattered memory access
use std::collections::HashMap;
let map: HashMap<String, i32> = /* ... */;
let sum: i32 = map.values().sum(); // Potentially more cache misses

Standard HashMap stores entries in a table interleaved with empty slots, in hash order, so iteration skips around the allocation. IndexMap's dense vector keeps live entries contiguous with no gaps.
Practical Usage Examples
Ordered Configuration
use indexmap::IndexMap;
let mut config = IndexMap::new();
config.insert("database_url", "localhost:5432");
config.insert("pool_size", "10");
config.insert("timeout", "30");
config.insert("retry_count", "3");
// Configuration is applied in the order specified
for (key, value) in &config {
apply_config(key, value);
}

Preserving JSON Object Order
use indexmap::IndexMap;
use serde_json::{Value, to_string_pretty};
// With serde_json's "preserve_order" feature enabled, JSON objects
// are backed by IndexMap, so key order survives round-trips
let mut obj: IndexMap<String, Value> = IndexMap::new();
obj.insert("name".to_string(), "example".into());
obj.insert("version".to_string(), "1.0".into());
obj.insert("dependencies".to_string(), Value::Object(serde_json::Map::new()));
// Output preserves insertion order
let json = to_string_pretty(&obj)?;

Ordered Command Processing
use indexmap::IndexMap;
struct CommandRegistry {
commands: IndexMap<String, Command>,
}
impl CommandRegistry {
fn execute_all(&self, context: &mut Context) {
// Commands execute in registration order
for (name, cmd) in &self.commands {
println!("Executing: {}", name);
cmd.execute(context);
}
}
}

Comparison with Other Structures
| Operation | IndexMap | HashMap | BTreeMap | Vec<(K, V)> |
|---|---|---|---|---|
| Lookup | O(1) | O(1) | O(log n) | O(n) |
| Insert | O(1) | O(1) | O(log n) | O(1) |
| Remove (swap) | O(1) | O(1) | O(log n) | O(n) |
| Remove (shift) | O(n) | N/A | O(log n) | O(n) |
| Ordered iteration | Yes | No | Yes (sorted) | Yes |
| Memory locality | High | Medium | Medium | High |
Performance Trade-offs
The dual structure incurs some overhead:
use indexmap::IndexMap;
use std::collections::HashMap;
// IndexMap uses more memory: both indices and entries
// HashMap uses less: single table with entries
// Lookup comparison (conceptual):
// IndexMap: hash -> indices -> entries -> value (two indirections)
// HashMap: hash -> bucket -> value (one indirection)

The extra indirection means IndexMap lookups are slightly slower than HashMap lookups, but still O(1). The trade-off is worthwhile when you need ordered iteration.
Index-Based API
IndexMap exposes indices that can be used for efficient access:
use indexmap::IndexMap;
let mut map = IndexMap::new();
let idx_a = map.insert_full("a", 1).0; // Returns (index, old_value)
let idx_b = map.insert_full("b", 2).0;
// Direct index access
let value = map[idx_a]; // No hash computation needed
// Get index for a key
if let Some(idx) = map.get_index_of("b") {
let (key, value) = map.get_index(idx).unwrap();
println!("Index {}: {} = {}", idx, key, value);
}
// Remove by index
map.swap_remove_index(idx_a);

This is useful when you frequently access specific entries without wanting to recompute hashes.
Synthesis
IndexMap achieves its dual guarantees through parallel data structures: a hash table (indices) for O(1) key-to-index mapping, and a dense vector (entries) for ordered storage. Lookups traverse both structures but each step is O(1), preserving hash table performance. Insertion appends to the vector and indexes in the hash table. Removal uses swap-remove to maintain O(1) performance, with an optional shift-remove for strict order preservation.
The design demonstrates that hash table and array properties aren't mutually exclusive: you can have O(1) lookups and ordered iteration by maintaining two indices into the same data. The cost is slightly higher memory usage and an extra indirection on lookups, but for many applications (configuration, serialization, command processing) the benefits of ordered iteration outweigh these costs.
