What are the trade-offs between indexmap::IndexSet and HashSet for ordered unique collections?
indexmap::IndexSet preserves insertion order while maintaining uniqueness, providing O(1) lookup and O(n) iteration in insertion order, at the cost of additional memory overhead and slightly slower operations compared to std::collections::HashSet, which provides O(1) average lookup but iterates in an unspecified order determined by the internal hash table layout. The fundamental trade-off is order preservation versus raw performance: IndexSet uses a hash map with a parallel index vector to maintain insertion order, while HashSet uses a pure hash table that prioritizes memory efficiency and operation speed over predictable iteration. Choose IndexSet when you need deterministic iteration order, stable serialization, or position-based access; choose HashSet when you need maximum performance, minimal memory, and don't care about iteration order.
Basic Usage Comparison
use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
fn basic_usage() {
// HashSet: iteration order is unspecified
let mut std_set: StdHashSet<i32> = StdHashSet::new();
std_set.insert(3);
std_set.insert(1);
std_set.insert(2);
// Iteration order: unpredictable (depends on hashes)
// IndexSet: iteration preserves insertion order
let mut index_set: IndexSet<i32> = IndexSet::new();
index_set.insert(3);
index_set.insert(1);
index_set.insert(2);
// Iteration order: always [3, 1, 2]
// Both provide O(1) average lookup
assert!(std_set.contains(&1));
assert!(index_set.contains(&1));
}IndexSet guarantees insertion order; HashSet does not.
Iteration Order Guarantees
use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
fn iteration_order() {
// HashSet: order depends on hash values and internal state
let mut std_set: StdHashSet<&str> = StdHashSet::new();
std_set.insert("first");
std_set.insert("second");
std_set.insert("third");
// Order is implementation-defined and may vary between runs
for item in &std_set {
println!("{}", item); // Unpredictable order
}
// IndexSet: insertion order is preserved
let mut index_set: IndexSet<&str> = IndexSet::new();
index_set.insert("first");
index_set.insert("second");
index_set.insert("third");
// Order is always: "first", "second", "third"
for item in &index_set {
println!("{}", item); // Predictable order
}
// This matters for:
// - Serialization (JSON arrays with predictable order)
// - Testing (deterministic assertions)
// - UI rendering (consistent display order)
// - Configuration (preserve user-specified order)
}Deterministic iteration order is essential for reproducible behavior.
Performance Characteristics
use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
fn performance_comparison() {
// HashSet performance:
// - Insert: O(1) average
// - Lookup: O(1) average
// - Remove: O(1) average
// - Iteration: O(n)
// - Memory: Minimal (just hash table entries)
// IndexSet performance:
// - Insert: O(1) average (slightly slower due to index maintenance)
// - Lookup: O(1) average (slightly slower due to indirect indexing)
// - Remove: O(1) average for swap_remove, O(n) for shift_remove
// - Iteration: O(n) in insertion order
// - Memory: Extra overhead for order tracking
// HashSet is faster for pure set operations:
let mut std_set: StdHashSet<i32> = StdHashSet::new();
std_set.insert(42); // Slightly faster
// IndexSet has overhead for order maintenance:
let mut index_set: IndexSet<i32> = IndexSet::new();
index_set.insert(42); // Slightly slower, but maintains order
}HashSet has better raw performance; IndexSet has slower but still O(1) operations.
Memory Overhead
use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
use std::mem::size_of;
fn memory_comparison() {
// HashSet memory layout:
// - Hash table with buckets
// - Each entry: hash + key
// - No additional overhead for ordering
// IndexSet memory layout:
// - Hash table: maps key hash -> index
// - Dense index array: stores (key, hash) pairs in order
// - Additional bookkeeping for indices
// For 1000 i32 elements:
// HashSet: ~16-24 bytes per element (varies by load factor)
// IndexSet: ~24-32 bytes per element (extra index overhead)
// The overhead comes from:
// 1. Parallel arrays for keys and hashes
// 2. Index mapping for O(1) position lookup
// 3. Additional pointer indirection
// Memory trade-off:
// - HashSet: Lower memory, unpredictable iteration
// - IndexSet: Higher memory, predictable iteration
}IndexSet has 20-50% more memory overhead than HashSet.
Position-Based Access
use indexmap::IndexSet;
fn position_access() {
let mut set: IndexSet<&str> = IndexSet::new();
set.insert("first");
set.insert("second");
set.insert("third");
// IndexSet provides position-based access
// HashSet does not support this
// Get element at index
assert_eq!(set.get_index(0), Some(&"first"));
assert_eq!(set.get_index(1), Some(&"second"));
assert_eq!(set.get_index(2), Some(&"third"));
assert_eq!(set.get_index(3), None); // Out of bounds
// Find index of element
assert_eq!(set.get_index_of(&"second"), Some(1));
assert_eq!(set.get_index_of(&"nonexistent"), None);
// Replace element at index
set.insert(0usize, "replaced"); // Insert at index 0
// Can't do this with HashSet
}IndexSet allows treating the set as a list with position indices.
Ordered Set Operations
use indexmap::IndexSet;
fn ordered_operations() {
let mut set: IndexSet<i32> = IndexSet::new();
set.insert(1);
set.insert(2);
set.insert(3);
// Insert at specific position
set.insert(1usize, 10); // Insert 10 at index 1
// Now: [1, 10, 2, 3]
// Remove by index
set.swap_remove_index(1); // Remove at index 1, swap last element
// Now: [1, 2, 3] (last element swapped in)
// Remove by index with shift
set.shift_remove_index(1); // Remove at index 1, shift remaining
// Now: [1, 3] (preserves order of remaining elements)
// HashSet only supports remove by value
// No position-based operations available
}IndexSet supports position-based insertions and removals.
Serialization Determinism
use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
use serde::{Serialize, Deserialize};
#[derive(Serialize, Deserialize)]
struct Config {
// HashSet: JSON order is unpredictable
allowed_users: StdHashSet<String>,
// IndexSet: JSON order is deterministic
ordered_users: IndexSet<String>,
}
fn serialization() {
let mut config = Config {
allowed_users: StdHashSet::new(),
ordered_users: IndexSet::new(),
};
config.allowed_users.insert("alice".to_string());
config.allowed_users.insert("bob".to_string());
config.ordered_users.insert("alice".to_string());
config.ordered_users.insert("bob".to_string());
// HashSet serialization:
// {"allowed_users":["bob","alice"]} // Order varies
// IndexSet serialization:
// {"ordered_users":["alice","bob"]} // Order preserved
// Deterministic serialization is important for:
// - Configuration files (diff-friendly)
// - API responses (consistent for testing)
// - Hash verification (same content = same output)
}IndexSet produces deterministic serialized output; HashSet does not.
Swap Remove vs Shift Remove
use indexmap::IndexSet;
fn removal_strategies() {
let mut set: IndexSet<i32> = IndexSet::new();
set.insert(1);
set.insert(2);
set.insert(3);
set.insert(4);
// [1, 2, 3, 4]
// swap_remove_index: O(1) but changes order
set.swap_remove_index(1); // Remove element at index 1 (value 2)
// [1, 4, 3] - last element (4) swapped into position
// shift_remove_index: O(n) but preserves order
let mut set2: IndexSet<i32> = IndexSet::new();
set2.insert(1);
set2.insert(2);
set2.insert(3);
set2.insert(4);
set2.shift_remove_index(1); // Remove element at index 1
// [1, 3, 4] - remaining elements shifted
// Trade-off:
// - swap_remove: Fast, but breaks iteration order
// - shift_remove: Preserves order, but O(n) cost
// HashSet doesn't have this distinction - all removals are O(1)
}IndexSet offers two removal strategies with different trade-offs.
Use Cases for IndexSet
use indexmap::IndexSet;
fn index_set_use_cases() {
// Use IndexSet when you need:
// 1. Deterministic iteration order
let mut ordered: IndexSet<&str> = IndexSet::new();
ordered.insert("first");
ordered.insert("second");
// Always iterates: first, second
// 2. Position-based access
ordered.insert(0, "new-first"); // Insert at position
let first = ordered.get_index(0); // Get by position
// 3. Serialization with predictable order
// JSON: ["first", "second", "third"]
// Consistent across runs
// 4. Preserve user-specified order
// Configuration: user's priority order
// Processing: operation order matters
// 5. Index-based iteration
for (idx, item) in ordered.iter().enumerate() {
println!("{}: {}", idx, item);
}
// 6. Set operations while maintaining order
let mut a: IndexSet<i32> = IndexSet::new();
let mut b: IndexSet<i32> = IndexSet::new();
// Union, intersection preserve left operand's order
}Choose IndexSet when order matters or you need position access.
Use Cases for HashSet
use std::collections::HashSet;
fn hash_set_use_cases() {
// Use HashSet when you need:
// 1. Maximum performance
let mut set: HashSet<i32> = HashSet::new();
set.insert(42); // Slightly faster than IndexSet
// 2. Minimal memory usage
// Lower overhead, better cache locality
// 3. Order doesn't matter
// Just need uniqueness and fast lookup
// 4. Large collections
// Memory savings compound at scale
// 5. Pure set operations
// Membership testing, union, intersection
// Don't care about iteration order
// 6. Standard library ecosystem
// HashSet is in std, no external dependency
// Example: Deduplication
let items = vec![1, 2, 2, 3, 3, 3];
let unique: HashSet<i32> = items.into_iter().collect();
// Order doesn't matter, just need uniqueness
// Example: Cache/memoization
// Don't care about order, just fast lookup
let mut cache: HashSet<String> = HashSet::new();
cache.insert("cached_result".to_string());
}Choose HashSet for performance-critical code where order doesn't matter.
Set Operations Comparison
use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
fn set_operations() {
// Both support standard set operations
// HashSet
let mut a: StdHashSet<i32> = StdHashSet::from([1, 2, 3]);
let b: StdHashSet<i32> = StdHashSet::from([2, 3, 4]);
// Union
let union: StdHashSet<i32> = a.union(&b).copied().collect();
// Order unspecified
// Intersection
let intersection: StdHashSet<i32> = a.intersection(&b).copied().collect();
// Order unspecified
// IndexSet
let mut c: IndexSet<i32> = IndexSet::from([1, 2, 3]);
let d: IndexSet<i32> = IndexSet::from([2, 3, 4]);
// Union preserves order of left operand, then right
let union: IndexSet<i32> = c.union(&d).copied().collect();
// [1, 2, 3, 4] - c's order, then d's new elements
// Intersection preserves left operand's order
let intersection: IndexSet<i32> = c.intersection(&d).copied().collect();
// [2, 3] - c's order for common elements
}Set operations on IndexSet preserve order; HashSet does not.
API Differences
use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
fn api_comparison() {
// Common operations (both have):
// - insert(value) -> bool
// - contains(value) -> bool
// - remove(value) -> bool
// - len() -> usize
// - is_empty() -> bool
// - iter() -> Iterator
// - union, intersection, difference, symmetric_difference
// IndexSet additional operations:
let mut set: IndexSet<i32> = IndexSet::new();
// Position-based access
set.get_index(0); // Option<&T>
set.get_index_of(&value); // Option<usize>
// Position-based modification
set.insert(0usize, value); // Insert at position
set.swap_remove_index(0); // O(1) remove, changes order
set.shift_remove_index(0); // O(n) remove, preserves order
// Replace at index
set.replace_index(0, new_value); // Returns old value
// Move element to different position
set.move_index(0, 2); // Move from index 0 to 2
// Index-based iteration
for (idx, value) in set.iter().enumerate() {
// idx is guaranteed to be sequential
}
// HashSet has none of these position-based operations
}IndexSet provides position-based operations that HashSet lacks.
Hash Table Implementation
use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
fn implementation_details() {
// HashSet implementation:
// - Raw hash table with buckets
// - Each bucket contains hash + key
// - Positions determined by hash function
// - Iteration walks bucket array (order from hash distribution)
// IndexSet implementation:
// - Hash table maps hash -> index in entries array
// - Dense entries array stores (key, hash) in insertion order
// - Lookup: hash -> index -> entry
// - Iteration: walk entries array (insertion order)
// Why IndexSet is slower:
// - Extra indirection for lookups
// - Two data structures to maintain
// - More cache misses (indirect access)
// Why IndexSet preserves order:
// - Entries array is append-only (except removals)
// - Order is explicit, not determined by hash
// Memory layout:
// HashSet: [hash|key] entries in buckets
// IndexSet: hash_table[hash->index] + entries[(key,hash)]
}The implementation differences explain the performance trade-offs.
Benchmarks and When to Switch
use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
fn decision_guide() {
// Switch from HashSet to IndexSet when:
// 1. Tests fail due to non-deterministic iteration
// for item in &set { assert!(...) } // Flaky with HashSet
// 2. Serialization needs to be reproducible
// JSON should be identical across runs
// 3. You need position-based operations
// set.get_index(i) or set.move_index(from, to)
// 4. Order is part of the data model
// User-specified order, priority order, etc.
// Keep using HashSet when:
// 1. Order is irrelevant
// Just need uniqueness and lookup
// 2. Memory is constrained
// Large collections, embedded systems
// 3. Performance is critical
// Hot paths, tight loops
// 4. Standard library preference
// No external dependencies
// Rule of thumb:
// - Default to HashSet
// - Switch to IndexSet when order matters
}Default to HashSet, switch to IndexSet when ordering requirements emerge.
Synthesis
Performance trade-offs:
// HashSet:
// - Insert: O(1) average (fastest)
// - Lookup: O(1) average (fastest)
// - Remove: O(1) average
// - Iteration: O(n) in hash order (unpredictable)
// - Memory: Minimal overhead
// IndexSet:
// - Insert: O(1) average (slightly slower)
// - Lookup: O(1) average (slightly slower)
// - Remove: O(1) swap, O(n) shift
// - Iteration: O(n) in insertion order (predictable)
// - Memory: Additional overhead for order trackingOrder guarantees:
// HashSet: No order guarantees
// - Iteration order depends on hash values
// - May change between runs, Rust versions
// - Not suitable for serialization requiring determinism
// IndexSet: Insertion order preserved
// - First inserted element is at index 0
// - Iteration order is insertion order
// - Suitable for deterministic serializationAPI differences:
// HashSet:
// - Standard set operations
// - No position-based access
// - No index operations
// IndexSet:
// - Standard set operations
// - get_index(idx): Position-based access
// - get_index_of(&val): Reverse lookup
// - insert(idx, val): Insert at position
// - swap_remove_index(idx): Fast removal
// - shift_remove_index(idx): Order-preserving removal
// - move_index(from, to): Reorder elementsKey insight: indexmap::IndexSet and std::collections::HashSet serve the same fundamental purposeâunique element storage with O(1) lookupâbut differ in their approach to ordering. HashSet is a pure hash table that prioritizes performance and memory efficiency, accepting that iteration order is implementation-defined and may vary. IndexSet is a hybrid data structure combining a hash table with a dense array to maintain insertion order, paying a performance and memory cost for this guarantee. The choice is architectural: if your code relies on iteration order, position access, or deterministic serialization, use IndexSet. If you're computing set membership without caring about order, HashSet is the better choice. A common pattern is to start with HashSet and migrate to IndexSet when ordering requirements become apparentâfor example, when serialization produces non-deterministic JSON or when tests become flaky due to iteration order.
