What are the trade-offs between uuid::Uuid::new_v4 and new_v5 for generating UUIDs with different collision guarantees?

new_v4 generates random UUIDs with 122 random bits, offering probabilistic uniqueness with no coordination required, while new_v5 derives a UUID deterministically from a namespace and a name by hashing them with SHA-1, so the same inputs always reproduce the same result. The key trade-off is randomness versus determinism: new_v4 produces a different UUID on every call (so a lost UUID cannot be regenerated), while new_v5 always produces the same UUID for the same namespace and name (enabling deterministic ID generation). For collision resistance, both are sufficient in practice. v4 has 2^122 possible values, making collisions astronomically unlikely. v5 maps identical inputs to identical UUIDs by design, and distinct names within a namespace collide only if their truncated SHA-1 hashes coincide, which is comparably unlikely for non-adversarial input. v5 does, however, require careful namespace management across systems.

Basic UUID Generation

// Requires the uuid crate with the "v4" and "v5" features enabled
use uuid::Uuid;
 
fn main() {
    // v4: Random UUID
    let v4_1 = Uuid::new_v4();
    let v4_2 = Uuid::new_v4();
    
    println!("v4 #1: {}", v4_1);
    println!("v4 #2: {}", v4_2);
    
    // Each call draws fresh random bits, so two results differ in practice
    assert_ne!(v4_1, v4_2);
    
    // v5: Deterministic UUID from namespace and name
    // The standard namespaces are associated constants on Uuid
    let namespace = Uuid::NAMESPACE_DNS;
    let name = "example.com";
    
    // new_v5 takes the namespace by reference and the name as bytes
    let v5_1 = Uuid::new_v5(&namespace, name.as_bytes());
    let v5_2 = Uuid::new_v5(&namespace, name.as_bytes());
    
    println!("v5 #1: {}", v5_1);
    println!("v5 #2: {}", v5_2);
    
    // Same inputs always produce the same UUID
    assert_eq!(v5_1, v5_2);
}

v4 is random; v5 is deterministic based on input.

Collision Guarantees

use uuid::Uuid;
 
fn main() {
    // v4 collision analysis
    let v4_uuids: Vec<Uuid> = (0..1_000_000).map(|_| Uuid::new_v4()).collect();
    
    // Probability of collision with n random v4 UUIDs:
    // P ≈ n² / (2 × 2^122) ≈ n² / 2^123
    // For 1 million UUIDs: P ≈ 10^12 / 2^123 ≈ 10^12 / 10^37 ≈ 10^-25
    // Essentially zero for practical purposes
    
    let unique_count = v4_uuids.iter().collect::<std::collections::HashSet<_>>().len();
    println!("Unique v4 UUIDs: {}/1000000", unique_count);
    
    // v5 collision analysis
    // Within same namespace: same name = same UUID (guaranteed)
    // Different names produce different UUIDs with very high probability,
    // because the UUID is a truncated SHA-1 hash rather than a unique counter
    
    let namespace = Uuid::NAMESPACE_DNS;
    
    let uuid_a = Uuid::new_v5(&namespace, "example-a.com".as_bytes());
    let uuid_b = Uuid::new_v5(&namespace, "example-b.com".as_bytes());
    
    // Different names produce different UUIDs
    assert_ne!(uuid_a, uuid_b);
    
    // Same name in different namespaces produces different UUIDs
    let uuid_dns = Uuid::new_v5(&Uuid::NAMESPACE_DNS, "example.com".as_bytes());
    let uuid_url = Uuid::new_v5(&Uuid::NAMESPACE_URL, "example.com".as_bytes());
    
    assert_ne!(uuid_dns, uuid_url);
    println!("Different namespaces: {} vs {}", uuid_dns, uuid_url);
}

v4 collisions are astronomically unlikely; v5 always maps identical names in a namespace to the same UUID, and distinct names to distinct UUIDs with similarly high probability.
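The estimate in the comments above follows the birthday approximation P ≈ n² / 2^123. As a rough sketch using only the standard library, the arithmetic can be checked as follows. The function names `collision_probability` and `count_for_probability` are illustrative and are not part of the uuid crate.

```rust
/// Number of random bits in a v4 UUID (128 minus 4 version and 2 variant bits).
const RANDOM_BITS: i32 = 122;

/// Birthday approximation: P ≈ n² / (2 × 2^122). Only valid while P is small.
fn collision_probability(n: f64) -> f64 {
    n * n / (2.0 * 2f64.powi(RANDOM_BITS))
}

/// Approximate number of UUIDs needed to reach collision probability `p`,
/// from n ≈ sqrt(2 × 2^122 × ln(1 / (1 - p))).
fn count_for_probability(p: f64) -> f64 {
    (2.0 * 2f64.powi(RANDOM_BITS) * (1.0 / (1.0 - p)).ln()).sqrt()
}

fn main() {
    println!("P(collision) for 1e6 UUIDs:  {:e}", collision_probability(1e6));
    println!("P(collision) for 1e12 UUIDs: {:e}", collision_probability(1e12));
    println!("UUIDs for a 50% chance:      {:e}", count_for_probability(0.5));
}
```

Under this approximation, a million v4 UUIDs collide with probability on the order of 10^-25, and a collision only becomes likely after roughly 10^18 UUIDs have been generated.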

Deterministic ID Generation

use uuid::Uuid;
 
fn main() {
    // v5 enables deterministic ID generation
    // Same input always produces the same output
    
    fn get_user_id(email: &str) -> Uuid {
        Uuid::new_v5(&Uuid::NAMESPACE_DNS, email.as_bytes())
    }
    
    // Same email always produces the same UUID
    let id1 = get_user_id("alice@example.com");
    let id2 = get_user_id("alice@example.com");
    
    assert_eq!(id1, id2);
    println!("User ID for alice: {}", id1);
    
    // Different emails produce different IDs
    let bob_id = get_user_id("bob@example.com");
    assert_ne!(id1, bob_id);
    println!("User ID for bob: {}", bob_id);
    
    // This enables:
    // 1. Generating IDs without storing them
    // 2. Reproducible IDs across systems
    // 3. No coordination needed for ID generation
}

v5 allows generating consistent IDs without storing a mapping.
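Because v5 hashes the exact bytes of the name, inputs that a person would consider the same (differing only in case or surrounding whitespace) yield different UUIDs. A minimal sketch of normalizing before hashing follows. `normalize_email` is a hypothetical helper, and lowercasing the whole address is an assumed policy that may not suit every mail system.

```rust
/// Hypothetical helper: canonicalize an email before using it as a v5 name.
/// Assumed policy: trim surrounding whitespace and lowercase everything.
fn normalize_email(email: &str) -> String {
    email.trim().to_lowercase()
}

fn main() {
    let a = normalize_email("  Alice@Example.COM ");
    let b = normalize_email("alice@example.com");

    // After normalization both spellings give identical name bytes,
    // so hashing them with v5 would give the same UUID.
    assert_eq!(a.as_bytes(), b.as_bytes());
    println!("normalized name: {}", a);
}
```

The normalized bytes would then be passed as the name argument to new_v5, so every system applying the same policy derives the same ID.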

When to Use Each Version

use uuid::Uuid;
 
fn main() {
    // Use v4 when:
    // 1. You need truly random IDs with no pattern
    // 2. IDs should not be guessable
    // 3. No need to regenerate the same ID
    // 4. Privacy matters (the ID carries no trace of any input)
    
    fn create_session_token() -> Uuid {
        Uuid::new_v4()  // Random, unpredictable
    }
    
    fn create_api_key() -> Uuid {
        Uuid::new_v4()  // Should be unique each time
    }
    
    // Use v5 when:
    // 1. You need deterministic ID generation
    // 2. Same input should produce same ID
    // 3. ID generation without storage
    // 4. Cross-system consistency required
    
    fn get_resource_id(resource_type: &str, resource_key: &str) -> Uuid {
        let name = format!("{}:{}", resource_type, resource_key);
        Uuid::new_v5(&Uuid::NAMESPACE_DNS, name.as_bytes())
    }
    
    fn get_document_id(doc_url: &str) -> Uuid {
        Uuid::new_v5(&Uuid::NAMESPACE_URL, doc_url.as_bytes())
    }
    
    // v5 IDs are reproducible across systems
    let id1 = get_document_id("https://example.com/doc/123");
    let id2 = get_document_id("https://example.com/doc/123");
    assert_eq!(id1, id2);
    println!("Document ID: {}", id1);
    
    // v4 for session tokens
    let session = create_session_token();
    println!("Session: {}", session);
}

Choose v4 for randomness and privacy; v5 for determinism and reproducibility.
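When a v5 name is assembled from several fields, as with the `format!("{}:{}", ...)` call above, a plain delimiter can make distinct field pairs produce the same name (and therefore the same UUID) if a field may itself contain the delimiter. One possible way to avoid this, sketched with a hypothetical `encode_name` helper, is to length-prefix each field:

```rust
/// Hypothetical helper: build an unambiguous v5 name from several fields by
/// prefixing each field with its byte length as a big-endian u64.
fn encode_name(fields: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for field in fields {
        out.extend_from_slice(&(field.len() as u64).to_be_bytes());
        out.extend_from_slice(field.as_bytes());
    }
    out
}

fn main() {
    // With a ':' delimiter these two field pairs are indistinguishable.
    assert_eq!(format!("{}:{}", "a:b", "c"), format!("{}:{}", "a", "b:c"));

    // With length prefixes the encoded names differ.
    assert_ne!(encode_name(&["a:b", "c"]), encode_name(&["a", "b:c"]));
    println!("{:?}", encode_name(&["user", "42"]));
}
```

If the fields are known never to contain the delimiter, the simpler `format!` approach is fine. Whichever encoding is chosen must stay fixed, because changing it changes every derived ID.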

Namespace Isolation

use uuid::Uuid;
 
fn main() {
    // v5 namespaces prevent collisions across different contexts
    // The same name in different namespaces produces different UUIDs
    
    let name = "user-12345";
    
    // Standard namespaces
    let dns_uuid = Uuid::new_v5(&Uuid::NAMESPACE_DNS, name.as_bytes());
    let url_uuid = Uuid::new_v5(&Uuid::NAMESPACE_URL, name.as_bytes());
    let oid_uuid = Uuid::new_v5(&Uuid::NAMESPACE_OID, name.as_bytes());
    let x500_uuid = Uuid::new_v5(&Uuid::NAMESPACE_X500, name.as_bytes());
    
    println!("DNS namespace: {}", dns_uuid);
    println!("URL namespace: {}", url_uuid);
    println!("OID namespace: {}", oid_uuid);
    println!("X500 namespace: {}", x500_uuid);
    
    // All different despite same name
    assert_ne!(dns_uuid, url_uuid);
    assert_ne!(url_uuid, oid_uuid);
    
    // Custom namespaces for your application
    // Generate once, then hard-code or store the value; a fresh namespace
    // on every run would change every ID derived from it
    let app_namespace = Uuid::new_v4();
    
    let user_id = Uuid::new_v5(&app_namespace, "user-12345".as_bytes());
    let order_id = Uuid::new_v5(&app_namespace, "order-12345".as_bytes());
    
    println!("App namespace: {}", app_namespace);
    println!("User ID: {}", user_id);
    println!("Order ID: {}", order_id);
    
    // Use custom namespaces to isolate ID spaces
    let ns_a = Uuid::parse_str("00000000-0000-0000-0000-000000000001").unwrap();
    let ns_b = Uuid::parse_str("00000000-0000-0000-0000-000000000002").unwrap();
    
    let id_a = Uuid::new_v5(&ns_a, name.as_bytes());
    let id_b = Uuid::new_v5(&ns_b, name.as_bytes());
    
    assert_ne!(id_a, id_b);
}

Namespaces prevent collisions across different contexts and applications.

Performance Comparison

use std::hint::black_box;
use std::time::Instant;
use uuid::Uuid;
 
fn main() {
    // v4: Fills the UUID from a random number generator
    // Cost depends on the configured random source
    
    let start = Instant::now();
    let v4_count = 100_000;
    for _ in 0..v4_count {
        // black_box keeps the optimizer from discarding the result
        let _ = black_box(Uuid::new_v4());
    }
    let v4_duration = start.elapsed();
    println!("v4: {:?} for {} UUIDs", v4_duration, v4_count);
    
    // v5: Uses SHA-1 hash (internally)
    // Note: this loop also measures the format! allocation for each name
    
    let start = Instant::now();
    let v5_count = 100_000;
    let namespace = Uuid::NAMESPACE_DNS;
    for i in 0..v5_count {
        let _ = black_box(Uuid::new_v5(&namespace, format!("name-{}", i).as_bytes()));
    }
    let v5_duration = start.elapsed();
    println!("v5: {:?} for {} UUIDs", v5_duration, v5_count);
    
    // Both are extremely fast for practical purposes
    // The difference is negligible for most applications
    
    // Key insight: v5 performance depends on name length
    // Longer names = more hashing work
    
    let short_name = "a";
    let long_name = "a".repeat(1000);
    
    let start = Instant::now();
    for _ in 0..10_000 {
        let _ = black_box(Uuid::new_v5(&namespace, short_name.as_bytes()));
    }
    let short_duration = start.elapsed();
    
    let start = Instant::now();
    for _ in 0..10_000 {
        let _ = black_box(Uuid::new_v5(&namespace, long_name.as_bytes()));
    }
    let long_duration = start.elapsed();
    
    println!("Short name: {:?}", short_duration);
    println!("Long name: {:?}", long_duration);
}

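v4 cost is dominated by the random source and v5 cost by SHA-1 over the name; both are fast enough for most applications, so measure on your own target if the difference matters.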

Security Considerations

use uuid::Uuid;
 
fn main() {
    // v4: Random, unpredictable
    // - Cannot be guessed when drawn from a cryptographically secure source
    // - Carries no input that could be recovered
    // - Good for security-sensitive tokens
    
    let session_token = Uuid::new_v4();
    println!("Random session token: {}", session_token);
    // An attacker cannot guess this or derive anything from it
    
    // v5: Deterministic, potentially guessable
    // - If namespace and name are known, UUID is predictable
    // - SHA-1 is one-way, but a low-entropy name can be recovered by
    //   hashing candidate names under a known namespace and comparing
    
    let user_email = "alice@example.com";
    let predictable_id = Uuid::new_v5(&Uuid::NAMESPACE_DNS, user_email.as_bytes());
    println!("Predictable user ID: {}", predictable_id);
    
    // Anyone who knows the email and namespace can regenerate this UUID
    // This is intentional - but may not be desired for all use cases
    
    // Don't use v5 for:
    // - Session tokens (guessable)
    // - API keys (guessable)
    // - Password reset tokens (guessable)
    
    // Use v5 for:
    // - Content-addressable IDs
    // - Deterministic entity IDs
    // - Cross-system references
    
    // Example: v5 for deduplication is safe
    fn dedup_id(content: &str) -> Uuid {
        Uuid::new_v5(&Uuid::NAMESPACE_DNS, content.as_bytes())
    }
    
    // Same content = same ID, useful for deduplication
    let content = "Hello, World!";
    let id1 = dedup_id(content);
    let id2 = dedup_id(content);
    assert_eq!(id1, id2);
}

v4 is unpredictable; v5 is deterministic and potentially guessable.

Practical Example: Content Addressing

use uuid::Uuid;
 
fn main() {
    // v5 is ideal for content-addressable storage
    
    fn content_id(content: &[u8]) -> Uuid {
        Uuid::new_v5(&Uuid::NAMESPACE_DNS, content)
    }
    
    // Same content always gets the same ID
    let content1 = b"Hello, World!";
    let content2 = b"Hello, World!";
    let content3 = b"Different content";
    
    let id1 = content_id(content1);
    let id2 = content_id(content2);
    let id3 = content_id(content3);
    
    assert_eq!(id1, id2);
    assert_ne!(id1, id3);
    
    println!("Content ID 1: {}", id1);
    println!("Content ID 2: {}", id2);
    println!("Content ID 3: {}", id3);
    
    // This enables:
    // 1. Deduplication without storing mappings
    // 2. Verification that content hasn't changed
    // 3. Consistent IDs across distributed systems
    
    // Store content by ID
    let mut store: std::collections::HashMap<Uuid, Vec<u8>> = std::collections::HashMap::new();
    
    let content = b"Important data".to_vec();
    let id = content_id(&content);
    store.insert(id, content);
    
    // Later, retrieve by regenerating ID from content
    let lookup_content = b"Important data".to_vec();
    let lookup_id = content_id(&lookup_content);
    
    if let Some(stored) = store.get(&lookup_id) {
        println!("Found content: {:?}", stored);
    }
}

v5 enables content-addressable storage without coordination.

Practical Example: Distributed Systems

use uuid::Uuid;
 
fn main() {
    // In distributed systems, v5 eliminates coordination for ID generation
    
    // Scenario: Multiple servers need to generate consistent IDs for users
    
    // Approach with v4: Need coordination or storage
    // Problem: Independent servers would generate different IDs for the same user
    
    // Approach with v5: No coordination needed beyond sharing the namespace
    let app_namespace = Uuid::parse_str("f47ac10b-58cc-4372-a567-0e02b2c3d479").unwrap();
    
    fn generate_user_id(namespace: Uuid, email: &str) -> Uuid {
        Uuid::new_v5(&namespace, email.as_bytes())
    }
    
    // Server A
    let id_server_a = generate_user_id(app_namespace, "alice@example.com");
    
    // Server B (different machine, no communication)
    let id_server_b = generate_user_id(app_namespace, "alice@example.com");
    
    // Both servers generate the same ID without any coordination
    assert_eq!(id_server_a, id_server_b);
    println!("Both servers agree: {}", id_server_a);
    
    // This works across:
    // - Multiple servers
    // - Offline systems
    // - Different processes
    // - Any system that knows the namespace
    
    // Namespace must be shared and fixed
    // Consider storing namespace in config or deriving from app ID
}

v5 enables coordination-free ID generation across distributed systems.

Migration Path

use uuid::Uuid;
 
fn main() {
    // Migrating from v4 to v5 or vice versa
    
    // If using v4 and need to switch to v5:
    // - v5 IDs will be different from v4 IDs
    // - Need migration strategy or dual-write period
    
    // v4 example
    let v4_id = Uuid::new_v4();
    
    // There is no equivalent v5 for an existing v4 ID:
    // a v4 UUID has no input from which a v5 UUID could be derived
    
    // Strategy 1: Keep both during migration
    struct Entity {
        legacy_v4_id: Uuid,  // Keep for backward compatibility
        primary_v5_id: Uuid,  // New deterministic ID
    }
    
    // Strategy 2: Use v5 from start if determinism might be needed
    fn entity_id(entity_type: &str, key: &str) -> Uuid {
        // Placeholder namespace: substitute your application's own fixed UUID
        let namespace = Uuid::parse_str("f47ac10b-58cc-4372-a567-0e02b2c3d479").unwrap();
        Uuid::new_v5(&namespace, format!("{}:{}", entity_type, key).as_bytes())
    }
    
    // Strategy 3: Hybrid approach
    fn generate_id(needs_determinism: bool, namespace: Uuid, name: &[u8]) -> Uuid {
        if needs_determinism {
            Uuid::new_v5(&namespace, name)
        } else {
            Uuid::new_v4()
        }
    }
}

Choose the right version upfront when possible; migration requires planning.
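For Strategy 1 above, callers may still hold legacy IDs during the transition. The following is one possible sketch of a lookup that accepts either ID. It uses `u128` as a stand-in for a UUID so that it needs only the standard library, and `IdIndex` with its methods is a hypothetical type, not something the uuid crate provides.

```rust
use std::collections::HashMap;

/// Stand-in for a UUID so the sketch needs only the standard library.
type Id = u128;

/// Hypothetical index that resolves either the legacy (v4) or the new (v5) ID
/// to the new ID while both are in circulation.
struct IdIndex {
    legacy_to_new: HashMap<Id, Id>,
}

impl IdIndex {
    fn new() -> Self {
        IdIndex { legacy_to_new: HashMap::new() }
    }

    /// Record that `legacy` and `new` refer to the same entity.
    fn link(&mut self, legacy: Id, new: Id) {
        self.legacy_to_new.insert(legacy, new);
    }

    /// Map any known legacy ID to its replacement; pass other IDs through.
    fn resolve(&self, id: Id) -> Id {
        self.legacy_to_new.get(&id).copied().unwrap_or(id)
    }
}

fn main() {
    let mut index = IdIndex::new();
    index.link(0xAAAA, 0xBBBB);

    assert_eq!(index.resolve(0xAAAA), 0xBBBB); // legacy ID is translated
    assert_eq!(index.resolve(0xBBBB), 0xBBBB); // new ID passes through
}
```

Once no caller presents legacy IDs any longer, the index and the legacy field can be dropped.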

Version Identification

use uuid::Uuid;
 
fn main() {
    // UUIDs encode their version in the structure
    // Version is stored in bits 48-51 (the "version" field)
    
    let v4 = Uuid::new_v4();
    let v5 = Uuid::new_v5(&Uuid::NAMESPACE_DNS, b"example");
    
    // Check version
    println!("v4 version: {:?}", v4.get_version());  // Some(Random)
    println!("v5 version: {:?}", v5.get_version());  // Some(Sha1)
    
    // The version is encoded in the UUID itself
    // v4: Version 4 (random)
    // v5: Version 5 (SHA-1)
    
    // This allows identifying how a UUID was generated
    match v4.get_version() {
        Some(uuid::Version::Random) => println!("This is a v4 UUID"),
        Some(uuid::Version::Sha1) => println!("This is a v5 UUID"),
        Some(uuid::Version::Md5) => println!("This is a v3 UUID"),
        Some(uuid::Version::Mac) => println!("This is a v1 UUID"),
        _ => println!("Unknown version"),
    }
    
    match v5.get_version() {
        Some(uuid::Version::Sha1) => println!("This is a v5 UUID"),
        _ => println!("Not v5"),
    }
    
    // Get the version number (4 or 5)
    let v4_version_num = v4.get_version_num();
    let v5_version_num = v5.get_version_num();
    
    println!("v4 version number: {}", v4_version_num);  // 4
    println!("v5 version number: {}", v5_version_num);  // 5
}

UUIDs carry their version; you can determine how they were generated.
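To make the bit layout concrete, here is a small standard-library-only sketch that reads the version straight from the canonical text form, where bits 48-51 appear as the first hex digit of the third group. `version_from_str` is an illustrative helper, not a uuid crate function; in real code `get_version_num` already does this.

```rust
/// Illustrative helper: read the version digit from a hyphenated UUID string.
/// Returns None if the string lacks a third group or its first character
/// is not a hex digit. It does not otherwise validate the UUID.
fn version_from_str(uuid: &str) -> Option<u32> {
    let third_group = uuid.split('-').nth(2)?;
    third_group.chars().next()?.to_digit(16)
}

fn main() {
    // A v4 UUID: the third group starts with 4.
    assert_eq!(version_from_str("f47ac10b-58cc-4372-a567-0e02b2c3d479"), Some(4));
    // The standard DNS namespace UUID is itself a v1 UUID.
    assert_eq!(version_from_str("6ba7b810-9dad-11d1-80b4-00c04fd430c8"), Some(1));
    // Input without a third group yields None.
    assert_eq!(version_from_str("not a uuid"), None);
}
```

The same digit is the high nibble of byte 6 in the 16-byte binary form.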

Complete Comparison Example

use uuid::Uuid;
 
fn main() {
    // Comprehensive comparison
    
    // Creation
    let v4 = Uuid::new_v4();
    let v5 = Uuid::new_v5(&Uuid::NAMESPACE_DNS, b"example.com");
    
    println!("v4: {}", v4);
    println!("v5: {}", v5);
    
    // Uniqueness guarantees
    println!("\nUniqueness:");
    println!("v4: Probabilistic (2^122 possibilities, collisions virtually impossible)");
    println!("v5: Deterministic (same input = same output; distinct names collide only with negligible probability)");
    
    // Reproducibility
    println!("\nReproducibility:");
    let v4_a = Uuid::new_v4();
    let v4_b = Uuid::new_v4();
    println!("v4 same twice? {}", v4_a == v4_b);  // false
    
    let v5_a = Uuid::new_v5(&Uuid::NAMESPACE_DNS, b"test");
    let v5_b = Uuid::new_v5(&Uuid::NAMESPACE_DNS, b"test");
    println!("v5 same twice? {}", v5_a == v5_b);  // true
    
    // Use cases
    println!("\nUse v4 for:");
    println!("  - Session tokens");
    println!("  - API keys");
    println!("  - One-time tokens");
    println!("  - Anonymous IDs");
    println!("  - When unpredictability matters");
    
    println!("\nUse v5 for:");
    println!("  - Content-addressable IDs");
    println!("  - Deterministic entity IDs");
    println!("  - Cross-system consistency");
    println!("  - Deduplication keys");
    println!("  - When you need to regenerate IDs");
}

Synthesis

Quick reference:

use uuid::Uuid;
 
fn main() {
    // v4: Random UUID
    let random_id = Uuid::new_v4();
    // - 122 random bits
    // - Every call produces different UUID
    // - Collision probability: ~10^-25 for 1 million UUIDs
    // - Use for: tokens, keys, anonymous IDs
    // - Security: unpredictable
     
    // v5: Deterministic UUID
    let deterministic_id = Uuid::new_v5(&Uuid::NAMESPACE_DNS, b"name");
    // - SHA-1 hash of namespace + name
    // - Same inputs always produce same UUID
    // - Collision: distinct names collide only with negligible probability
    // - Use for: entity IDs, content addressing
    // - Security: predictable if inputs known
    println!("{} {}", random_id, deterministic_id);
     
    // Standard namespaces (associated constants on Uuid)
    let _dns = Uuid::NAMESPACE_DNS;    // Domain names
    let _url = Uuid::NAMESPACE_URL;    // URLs
    let _oid = Uuid::NAMESPACE_OID;    // ISO OIDs
    let _x500 = Uuid::NAMESPACE_X500;  // X.500 DNs
     
    // Custom namespace (generate once, use everywhere)
    let _my_namespace = Uuid::new_v4();
    // Store this and use consistently across all systems
     
    // Key differences:
    // 1. v4 is random; v5 is deterministic
    // 2. v4 can't be regenerated; v5 can be
    // 3. v4 is unpredictable; v5 is predictable
    // 4. v4 needs no inputs; v5 needs namespace and name
    // 5. v4 cost comes from the random source; v5 cost from SHA-1 over the name
}

Key insight: The fundamental trade-off between v4 and v5 is randomness versus determinism. v4 provides strong probabilistic uniqueness through randomness: each UUID carries 122 random bits, so any two UUIDs collide with probability of about 1 in 2^122. This makes v4 ideal for tokens, keys, and any scenario where unpredictability is valuable. However, v4 UUIDs cannot be regenerated; if you lose the UUID, you cannot recover it.

v5 provides a deterministic mapping within a namespace. The same namespace and name always produce the same UUID, enabling reproducible ID generation without coordination, while distinct names collide only if their truncated SHA-1 hashes coincide. This is powerful for distributed systems, content addressing, and any scenario where you need to generate the same ID from the same input. The trade-off is predictability: anyone who knows your namespace and naming scheme can generate the same UUIDs. For sensitive tokens, use v4. For entity IDs, content hashes, and distributed systems, v5's determinism is often the right choice.