What are the trade-offs between `indexmap::IndexSet` and `HashSet` for ordered unique collections?

indexmap::IndexSet preserves insertion order while maintaining uniqueness, providing O(1) lookup and O(n) iteration in insertion order, at the cost of additional memory overhead and slightly slower operations compared to std::collections::HashSet, which provides O(1) average lookup but iterates in an unspecified order determined by the internal hash table layout. The fundamental trade-off is order preservation versus raw performance: IndexSet uses a hash map with a parallel index vector to maintain insertion order, while HashSet uses a pure hash table that prioritizes memory efficiency and operation speed over predictable iteration. Choose IndexSet when you need deterministic iteration order, stable serialization, or position-based access; choose HashSet when you need maximum performance, minimal memory, and don't care about iteration order.

Basic Usage Comparison

use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
 
fn basic_usage() {
    // HashSet: iteration order is unspecified
    let mut std_set: StdHashSet<i32> = StdHashSet::new();
    std_set.insert(3);
    std_set.insert(1);
    std_set.insert(2);
    // Iteration order: unpredictable (depends on hashes)
    
    // IndexSet: iteration preserves insertion order
    let mut index_set: IndexSet<i32> = IndexSet::new();
    index_set.insert(3);
    index_set.insert(1);
    index_set.insert(2);
    // Iteration order: always [3, 1, 2]
    
    // Both provide O(1) average lookup
    assert!(std_set.contains(&1));
    assert!(index_set.contains(&1));
}

IndexSet guarantees insertion order; HashSet does not.

Iteration Order Guarantees

use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
 
fn iteration_order() {
    // HashSet: order depends on hash values and internal state
    let mut std_set: StdHashSet<&str> = StdHashSet::new();
    std_set.insert("first");
    std_set.insert("second");
    std_set.insert("third");
    
    // Order is implementation-defined and may vary between runs
    for item in &std_set {
        println!("{}", item);  // Unpredictable order
    }
    
    // IndexSet: insertion order is preserved
    let mut index_set: IndexSet<&str> = IndexSet::new();
    index_set.insert("first");
    index_set.insert("second");
    index_set.insert("third");
    
    // Order is always: "first", "second", "third"
    for item in &index_set {
        println!("{}", item);  // Predictable order
    }
    
    // This matters for:
    // - Serialization (JSON arrays with predictable order)
    // - Testing (deterministic assertions)
    // - UI rendering (consistent display order)
    // - Configuration (preserve user-specified order)
}

Deterministic iteration order is essential for reproducible behavior.

Performance Characteristics

use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
 
fn performance_comparison() {
    // HashSet performance:
    // - Insert: O(1) average
    // - Lookup: O(1) average
    // - Remove: O(1) average
    // - Iteration: O(n)
    // - Memory: Minimal (just hash table entries)
    
    // IndexSet performance:
    // - Insert: O(1) average (slightly slower due to index maintenance)
    // - Lookup: O(1) average (slightly slower due to indirect indexing)
    // - Remove: O(1) average for swap_remove, O(n) for shift_remove
    // - Iteration: O(n) in insertion order
    // - Memory: Extra overhead for order tracking
    
    // HashSet is faster for pure set operations:
    let mut std_set: StdHashSet<i32> = StdHashSet::new();
    std_set.insert(42);  // Slightly faster
    
    // IndexSet has overhead for order maintenance:
    let mut index_set: IndexSet<i32> = IndexSet::new();
    index_set.insert(42);  // Slightly slower, but maintains order
}

HashSet has better raw performance; IndexSet has slower but still O(1) operations.

Memory Overhead

use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
use std::mem::size_of;
 
fn memory_comparison() {
    // HashSet memory layout:
    // - Hash table with buckets
    // - Each entry: hash + key
    // - No additional overhead for ordering
    
    // IndexSet memory layout:
    // - Hash table: maps key hash -> index
    // - Dense index array: stores (key, hash) pairs in order
    // - Additional bookkeeping for indices
    
    // For 1000 i32 elements:
    // HashSet: ~16-24 bytes per element (varies by load factor)
    // IndexSet: ~24-32 bytes per element (extra index overhead)
    
    // The overhead comes from:
    // 1. Parallel arrays for keys and hashes
    // 2. Index mapping for O(1) position lookup
    // 3. Additional pointer indirection
    
    // Memory trade-off:
    // - HashSet: Lower memory, unpredictable iteration
    // - IndexSet: Higher memory, predictable iteration
}

IndexSet has 20-50% more memory overhead than HashSet.

Position-Based Access

use indexmap::IndexSet;
 
fn position_access() {
    let mut set: IndexSet<&str> = IndexSet::new();
    set.insert("first");
    set.insert("second");
    set.insert("third");
    
    // IndexSet provides position-based access
    // HashSet does not support this
    
    // Get element at index
    assert_eq!(set.get_index(0), Some(&"first"));
    assert_eq!(set.get_index(1), Some(&"second"));
    assert_eq!(set.get_index(2), Some(&"third"));
    assert_eq!(set.get_index(3), None);  // Out of bounds
    
    // Find index of element
    assert_eq!(set.get_index_of(&"second"), Some(1));
    assert_eq!(set.get_index_of(&"nonexistent"), None);
    
    // Replace element at index
    set.insert(0usize, "replaced");  // Insert at index 0
    // Can't do this with HashSet
}

IndexSet allows treating the set as a list with position indices.

Ordered Set Operations

use indexmap::IndexSet;
 
fn ordered_operations() {
    let mut set: IndexSet<i32> = IndexSet::new();
    set.insert(1);
    set.insert(2);
    set.insert(3);
    
    // Insert at specific position
    set.insert(1usize, 10);  // Insert 10 at index 1
    // Now: [1, 10, 2, 3]
    
    // Remove by index
    set.swap_remove_index(1);  // Remove at index 1, swap last element
    // Now: [1, 2, 3] (last element swapped in)
    
    // Remove by index with shift
    set.shift_remove_index(1);  // Remove at index 1, shift remaining
    // Now: [1, 3] (preserves order of remaining elements)
    
    // HashSet only supports remove by value
    // No position-based operations available
}

IndexSet supports position-based insertions and removals.

Serialization Determinism

use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
use serde::{Serialize, Deserialize};
 
#[derive(Serialize, Deserialize)]
struct Config {
    // HashSet: JSON order is unpredictable
    allowed_users: StdHashSet<String>,
    
    // IndexSet: JSON order is deterministic
    ordered_users: IndexSet<String>,
}
 
fn serialization() {
    let mut config = Config {
        allowed_users: StdHashSet::new(),
        ordered_users: IndexSet::new(),
    };
    
    config.allowed_users.insert("alice".to_string());
    config.allowed_users.insert("bob".to_string());
    
    config.ordered_users.insert("alice".to_string());
    config.ordered_users.insert("bob".to_string());
    
    // HashSet serialization:
    // {"allowed_users":["bob","alice"]}  // Order varies
    
    // IndexSet serialization:
    // {"ordered_users":["alice","bob"]}  // Order preserved
    
    // Deterministic serialization is important for:
    // - Configuration files (diff-friendly)
    // - API responses (consistent for testing)
    // - Hash verification (same content = same output)
}

IndexSet produces deterministic serialized output; HashSet does not.

Swap Remove vs Shift Remove

use indexmap::IndexSet;
 
fn removal_strategies() {
    let mut set: IndexSet<i32> = IndexSet::new();
    set.insert(1);
    set.insert(2);
    set.insert(3);
    set.insert(4);
    // [1, 2, 3, 4]
    
    // swap_remove_index: O(1) but changes order
    set.swap_remove_index(1);  // Remove element at index 1 (value 2)
    // [1, 4, 3] - last element (4) swapped into position
    
    // shift_remove_index: O(n) but preserves order
    let mut set2: IndexSet<i32> = IndexSet::new();
    set2.insert(1);
    set2.insert(2);
    set2.insert(3);
    set2.insert(4);
    
    set2.shift_remove_index(1);  // Remove element at index 1
    // [1, 3, 4] - remaining elements shifted
    
    // Trade-off:
    // - swap_remove: Fast, but breaks iteration order
    // - shift_remove: Preserves order, but O(n) cost
    
    // HashSet doesn't have this distinction - all removals are O(1)
}

IndexSet offers two removal strategies with different trade-offs.

Use Cases for IndexSet

use indexmap::IndexSet;
 
fn index_set_use_cases() {
    // Use IndexSet when you need:
    
    // 1. Deterministic iteration order
    let mut ordered: IndexSet<&str> = IndexSet::new();
    ordered.insert("first");
    ordered.insert("second");
    // Always iterates: first, second
    
    // 2. Position-based access
    ordered.insert(0, "new-first");  // Insert at position
    let first = ordered.get_index(0);  // Get by position
    
    // 3. Serialization with predictable order
    // JSON: ["first", "second", "third"]
    // Consistent across runs
    
    // 4. Preserve user-specified order
    // Configuration: user's priority order
    // Processing: operation order matters
    
    // 5. Index-based iteration
    for (idx, item) in ordered.iter().enumerate() {
        println!("{}: {}", idx, item);
    }
    
    // 6. Set operations while maintaining order
    let mut a: IndexSet<i32> = IndexSet::new();
    let mut b: IndexSet<i32> = IndexSet::new();
    // Union, intersection preserve left operand's order
}

Choose IndexSet when order matters or you need position access.

Use Cases for HashSet

use std::collections::HashSet;
 
fn hash_set_use_cases() {
    // Use HashSet when you need:
    
    // 1. Maximum performance
    let mut set: HashSet<i32> = HashSet::new();
    set.insert(42);  // Slightly faster than IndexSet
    
    // 2. Minimal memory usage
    // Lower overhead, better cache locality
    
    // 3. Order doesn't matter
    // Just need uniqueness and fast lookup
    
    // 4. Large collections
    // Memory savings compound at scale
    
    // 5. Pure set operations
    // Membership testing, union, intersection
    // Don't care about iteration order
    
    // 6. Standard library ecosystem
    // HashSet is in std, no external dependency
    
    // Example: Deduplication
    let items = vec![1, 2, 2, 3, 3, 3];
    let unique: HashSet<i32> = items.into_iter().collect();
    // Order doesn't matter, just need uniqueness
    
    // Example: Cache/memoization
    // Don't care about order, just fast lookup
    let mut cache: HashSet<String> = HashSet::new();
    cache.insert("cached_result".to_string());
}

Choose HashSet for performance-critical code where order doesn't matter.

Set Operations Comparison

use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
 
fn set_operations() {
    // Both support standard set operations
    
    // HashSet
    let mut a: StdHashSet<i32> = StdHashSet::from([1, 2, 3]);
    let b: StdHashSet<i32> = StdHashSet::from([2, 3, 4]);
    
    // Union
    let union: StdHashSet<i32> = a.union(&b).copied().collect();
    // Order unspecified
    
    // Intersection
    let intersection: StdHashSet<i32> = a.intersection(&b).copied().collect();
    // Order unspecified
    
    // IndexSet
    let mut c: IndexSet<i32> = IndexSet::from([1, 2, 3]);
    let d: IndexSet<i32> = IndexSet::from([2, 3, 4]);
    
    // Union preserves order of left operand, then right
    let union: IndexSet<i32> = c.union(&d).copied().collect();
    // [1, 2, 3, 4] - c's order, then d's new elements
    
    // Intersection preserves left operand's order
    let intersection: IndexSet<i32> = c.intersection(&d).copied().collect();
    // [2, 3] - c's order for common elements
}

Set operations on IndexSet preserve order; HashSet does not.

API Differences

use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
 
fn api_comparison() {
    // Common operations (both have):
    // - insert(value) -> bool
    // - contains(value) -> bool
    // - remove(value) -> bool
    // - len() -> usize
    // - is_empty() -> bool
    // - iter() -> Iterator
    // - union, intersection, difference, symmetric_difference
    
    // IndexSet additional operations:
    let mut set: IndexSet<i32> = IndexSet::new();
    
    // Position-based access
    set.get_index(0);           // Option<&T>
    set.get_index_of(&value);   // Option<usize>
    
    // Position-based modification
    set.insert(0usize, value);   // Insert at position
    set.swap_remove_index(0);    // O(1) remove, changes order
    set.shift_remove_index(0);   // O(n) remove, preserves order
    
    // Replace at index
    set.replace_index(0, new_value);  // Returns old value
    
    // Move element to different position
    set.move_index(0, 2);       // Move from index 0 to 2
    
    // Index-based iteration
    for (idx, value) in set.iter().enumerate() {
        // idx is guaranteed to be sequential
    }
    
    // HashSet has none of these position-based operations
}

IndexSet provides position-based operations that HashSet lacks.

Hash Table Implementation

use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
 
fn implementation_details() {
    // HashSet implementation:
    // - Raw hash table with buckets
    // - Each bucket contains hash + key
    // - Positions determined by hash function
    // - Iteration walks bucket array (order from hash distribution)
    
    // IndexSet implementation:
    // - Hash table maps hash -> index in entries array
    // - Dense entries array stores (key, hash) in insertion order
    // - Lookup: hash -> index -> entry
    // - Iteration: walk entries array (insertion order)
    
    // Why IndexSet is slower:
    // - Extra indirection for lookups
    // - Two data structures to maintain
    // - More cache misses (indirect access)
    
    // Why IndexSet preserves order:
    // - Entries array is append-only (except removals)
    // - Order is explicit, not determined by hash
    
    // Memory layout:
    // HashSet: [hash|key] entries in buckets
    // IndexSet: hash_table[hash->index] + entries[(key,hash)]
}

The implementation differences explain the performance trade-offs.

Benchmarks and When to Switch

use std::collections::HashSet as StdHashSet;
use indexmap::IndexSet;
 
fn decision_guide() {
    // Switch from HashSet to IndexSet when:
    
    // 1. Tests fail due to non-deterministic iteration
    // for item in &set { assert!(...) }  // Flaky with HashSet
    
    // 2. Serialization needs to be reproducible
    // JSON should be identical across runs
    
    // 3. You need position-based operations
    // set.get_index(i) or set.move_index(from, to)
    
    // 4. Order is part of the data model
    // User-specified order, priority order, etc.
    
    // Keep using HashSet when:
    
    // 1. Order is irrelevant
    // Just need uniqueness and lookup
    
    // 2. Memory is constrained
    // Large collections, embedded systems
    
    // 3. Performance is critical
    // Hot paths, tight loops
    
    // 4. Standard library preference
    // No external dependencies
    
    // Rule of thumb:
    // - Default to HashSet
    // - Switch to IndexSet when order matters
}

Default to HashSet, switch to IndexSet when ordering requirements emerge.

Synthesis

Performance trade-offs:

// HashSet:
// - Insert: O(1) average (fastest)
// - Lookup: O(1) average (fastest)
// - Remove: O(1) average
// - Iteration: O(n) in hash order (unpredictable)
// - Memory: Minimal overhead
 
// IndexSet:
// - Insert: O(1) average (slightly slower)
// - Lookup: O(1) average (slightly slower)
// - Remove: O(1) swap, O(n) shift
// - Iteration: O(n) in insertion order (predictable)
// - Memory: Additional overhead for order tracking

Order guarantees:

// HashSet: No order guarantees
// - Iteration order depends on hash values
// - May change between runs, Rust versions
// - Not suitable for serialization requiring determinism
 
// IndexSet: Insertion order preserved
// - First inserted element is at index 0
// - Iteration order is insertion order
// - Suitable for deterministic serialization

API differences:

// HashSet:
// - Standard set operations
// - No position-based access
// - No index operations
 
// IndexSet:
// - Standard set operations
// - get_index(idx): Position-based access
// - get_index_of(&val): Reverse lookup
// - insert(idx, val): Insert at position
// - swap_remove_index(idx): Fast removal
// - shift_remove_index(idx): Order-preserving removal
// - move_index(from, to): Reorder elements

Key insight: indexmap::IndexSet and std::collections::HashSet serve the same fundamental purpose—unique element storage with O(1) lookup—but differ in their approach to ordering. HashSet is a pure hash table that prioritizes performance and memory efficiency, accepting that iteration order is implementation-defined and may vary. IndexSet is a hybrid data structure combining a hash table with a dense array to maintain insertion order, paying a performance and memory cost for this guarantee. The choice is architectural: if your code relies on iteration order, position access, or deterministic serialization, use IndexSet. If you're computing set membership without caring about order, HashSet is the better choice. A common pattern is to start with HashSet and migrate to IndexSet when ordering requirements become apparent—for example, when serialization produces non-deterministic JSON or when tests become flaky due to iteration order.

What are the trade-offs between indexmap::IndexSet and HashSet for ordered unique collections?