What are the trade-offs between uuid::Uuid::new_v4 and new_v5 for generating UUIDs with different collision guarantees?
UUID v4 is built from 122 random bits, so its uniqueness is probabilistic and nothing about the value can be reproduced. UUID v5 is built by hashing a namespace UUID together with a name using SHA-1, so the same inputs always produce the same UUID. The collision guarantees differ accordingly: for v4, any two UUIDs collide with probability of about 1 in 2^122; for v5, identical (namespace, name) pairs map to the same UUID by design, and distinct pairs collide only if their truncated SHA-1 digests match. The choice between them depends on whether you need deterministic, reproducible IDs or unpredictable ones.
Understanding UUID v4
use uuid::Uuid;
fn main() {
// v4 UUIDs are generated from random bytes
let id1 = Uuid::new_v4();
let id2 = Uuid::new_v4();
println!("UUID 1: {}", id1);
println!("UUID 2: {}", id2);
// Each call produces a different UUID
assert_ne!(id1, id2);
// v4 characteristics:
// - Random bits everywhere except version/variant
// - Probability of collision: ~1 in 2^122 (astronomically small)
// - No determinism - same program generates different UUIDs each run
// - Requires a secure random number generator
}
v4 generates random, unpredictable UUIDs with no relationship between calls.
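The "random bits everywhere except version/variant" point can be made concrete with a small standard-library sketch. It follows the RFC 4122 bit layout and is not the uuid crate's own code; the helper names `stamp_v4` and `free_bit_count` are made up for illustration.

```rust
/// Stamp the version (4) and RFC 4122 variant bits onto 16 bytes.
/// In a real v4 UUID the input bytes would come from a secure RNG.
fn stamp_v4(mut bytes: [u8; 16]) -> [u8; 16] {
    bytes[6] = (bytes[6] & 0x0f) | 0x40; // high nibble of byte 6 = version 4
    bytes[8] = (bytes[8] & 0x3f) | 0x80; // top two bits of byte 8 = 0b10
    bytes
}

/// Count the bits that still depend on the input after stamping,
/// by comparing the all-ones and all-zeros extremes.
fn free_bit_count() -> u32 {
    let ones = stamp_v4([0xff; 16]);
    let zeros = stamp_v4([0x00; 16]);
    ones.iter()
        .zip(zeros.iter())
        .map(|(a, b)| (a ^ b).count_ones())
        .sum()
}

fn main() {
    println!("bits left to the random source: {}", free_bit_count());
}
```

Four bits are fixed by the version and two by the variant, which is where the 122 in the collision estimates comes from.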
Understanding UUID v5
use uuid::Uuid;
fn main() {
// v5 UUIDs are generated from namespace + name
let namespace = &Uuid::NAMESPACE_URL; // Predefined namespace (new_v5 takes &Uuid)
let name = "https://example.com/users/123";
let id1 = Uuid::new_v5(namespace, name.as_bytes());
let id2 = Uuid::new_v5(namespace, name.as_bytes());
println!("UUID 1: {}", id1);
println!("UUID 2: {}", id2);
// Same inputs always produce the same UUID
assert_eq!(id1, id2);
// Different name produces different UUID
let different_name = "https://example.com/users/456";
let id3 = Uuid::new_v5(namespace, different_name.as_bytes());
assert_ne!(id1, id3);
}
v5 produces deterministic UUIDs based on namespace and name.
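To show where the determinism comes from, here is a standard-library sketch of the v5 construction as RFC 4122 describes it: hash the namespace bytes followed by the name with SHA-1, keep the first 16 bytes, then stamp the version and variant. It is an illustration under those assumptions, not the uuid crate's implementation; `sha1`, `uuid_v5`, `hex`, and `to_hyphenated` are hypothetical helper names, and real code should rely on the crate.

```rust
/// Minimal SHA-1 (FIPS 180-4), for illustration only.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    // Padding: a 0x80 byte, zeros up to 56 mod 64, then the bit length (big-endian u64).
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&((data.len() as u64) * 8).to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 words from the block, expanded to 80.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for i in 0..80 {
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let t = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(w[i]);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        h[0] = h[0].wrapping_add(a);
        h[1] = h[1].wrapping_add(b);
        h[2] = h[2].wrapping_add(c);
        h[3] = h[3].wrapping_add(d);
        h[4] = h[4].wrapping_add(e);
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// v5 construction: SHA-1(namespace || name), truncated to 16 bytes,
/// with the version (5) and variant bits stamped in.
fn uuid_v5(namespace: &[u8; 16], name: &[u8]) -> [u8; 16] {
    let mut input = namespace.to_vec();
    input.extend_from_slice(name);
    let digest = sha1(&input);
    let mut bytes = [0u8; 16];
    bytes.copy_from_slice(&digest[..16]); // the last 4 digest bytes are dropped
    bytes[6] = (bytes[6] & 0x0f) | 0x50;
    bytes[8] = (bytes[8] & 0x3f) | 0x80;
    bytes
}

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn to_hyphenated(bytes: &[u8; 16]) -> String {
    let h = hex(bytes);
    format!("{}-{}-{}-{}-{}", &h[0..8], &h[8..12], &h[12..16], &h[16..20], &h[20..32])
}

/// The standard DNS namespace, 6ba7b810-9dad-11d1-80b4-00c04fd430c8.
const NAMESPACE_DNS: [u8; 16] = [
    0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1, 0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
];

fn main() {
    let first = uuid_v5(&NAMESPACE_DNS, b"python.org");
    let second = uuid_v5(&NAMESPACE_DNS, b"python.org");
    assert_eq!(first, second); // no randomness anywhere in the pipeline
    println!("{}", to_hyphenated(&first));
}
```

Nothing in this pipeline depends on time, machine, or random state, which is why any system that agrees on the namespace and the exact name bytes arrives at the same UUID.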
Collision Guarantees Compared
use uuid::Uuid;
fn main() {
// v4 Collision Analysis:
// - 122 random bits (6 bits used for version/variant)
// - Collision probability after generating n UUIDs:
// P ≈ n² / (2 × 2^122)
// - After 1 billion UUIDs: P ≈ 10^18 / 2^123 ≈ 10^-19
// - Birthday paradox: need ~2^61 UUIDs (about 2.7 × 10^18) for 50% collision chance
// - Practical: collisions are astronomically unlikely
// Generate many v4 UUIDs - collisions are virtually impossible
let v4_uuids: Vec<Uuid> = (0..1_000_000)
.map(|_| Uuid::new_v4())
.collect();
// v5 Collision Analysis:
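// - Deterministic: same (namespace, name) = same UUID, by design
// - Distinct inputs collide only if their hashes match after truncation
// - SHA-1 produces 160 bits; the UUID keeps the first 128 and overwrites
//   6 of them with version/variant, leaving 122 hash bits
// - Accidental collisions are therefore about as unlikely as for v4
// - Known SHA-1 collisions exist but require intentional construction,
//   so do not rely on v5 uniqueness for attacker-chosen names
let namespace = &Uuid::NAMESPACE_DNS; // new_v5 takes the namespace by reference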
let names: Vec<&str> = vec!["user1", "user2", "user3", "user4", "user5"];
let v5_uuids: Vec<Uuid> = names.iter()
.map(|name| Uuid::new_v5(namespace, name.as_bytes()))
.collect();
// Regenerating same names gives same UUIDs
let regenerated: Vec<Uuid> = names.iter()
.map(|name| Uuid::new_v5(namespace, name.as_bytes()))
.collect();
assert_eq!(v5_uuids, regenerated);
}
v4 uniqueness is probabilistic; v5 maps each (namespace, name) pair to one fixed UUID, and distinct pairs collide only if their truncated SHA-1 digests match.
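The birthday-bound arithmetic behind these figures can be checked with a few lines of standard-library Rust. This is a sketch using the usual approximation p ≈ 1 - exp(-n² / (2 × 2^bits)); the function names `collision_probability` and `count_for_probability` are illustrative.

```rust
/// Approximate probability of at least one collision among `n` values drawn
/// uniformly from 2^bits possibilities: p ≈ 1 - exp(-n² / (2 × 2^bits)).
fn collision_probability(n: f64, bits: i32) -> f64 {
    let x = n * n / (2.0 * 2f64.powi(bits));
    -(-x).exp_m1() // 1 - e^(-x), computed accurately for tiny x
}

/// Inverse: how many values are needed to reach collision probability `p`.
fn count_for_probability(p: f64, bits: i32) -> f64 {
    let x = -(-p).ln_1p(); // -ln(1 - p)
    (2.0 * 2f64.powi(bits) * x).sqrt()
}

fn main() {
    // 122 is the number of random bits in a v4 UUID.
    println!("1 billion UUIDs:  p = {:e}", collision_probability(1e9, 122));
    println!("1 trillion UUIDs: p = {:e}", collision_probability(1e12, 122));
    println!("50% chance needs: n = {:e}", count_for_probability(0.5, 122));
}
```

The same functions apply to accidental v5 collisions between distinct names, since a v5 UUID also carries 122 hash-derived bits; they say nothing about deliberately constructed SHA-1 collisions.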
When to Use v4
use uuid::Uuid;
// Use v4 when:
// 1. You need truly random identifiers
// 2. Different runs should produce different results
// 3. No natural namespace/name available
// 4. Security requires unpredictability
struct Session {
id: Uuid,
user_id: u64,
created_at: std::time::Instant,
}
impl Session {
fn new(user_id: u64) -> Self {
Session {
id: Uuid::new_v4(), // Random session ID
user_id,
created_at: std::time::Instant::now(),
}
}
}
// Security-sensitive IDs should be unpredictable
fn generate_api_key() -> String {
Uuid::new_v4().to_string()
}
// Anonymous objects without natural identity
struct UploadedFile {
id: Uuid,
filename: String,
data: Vec<u8>,
}
impl UploadedFile {
fn new(filename: String, data: Vec<u8>) -> Self {
UploadedFile {
id: Uuid::new_v4(), // Random ID for uploaded file
filename,
data,
}
}
}
v4 is ideal for random, unpredictable identifiers.
When to Use v5
use uuid::Uuid;
// Use v5 when:
// 1. Deterministic generation needed
// 2. Same entity should get same UUID across systems
// 3. Natural namespace/name available
// 4. Reproducible results required
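// Entity ID based on external identifier.
// NAMESPACE_URL is a convenience here; an email address is not a URL, so a
// dedicated application namespace would be a cleaner choice.
fn user_uuid_from_email(email: &str) -> Uuid {
Uuid::new_v5(&Uuid::NAMESPACE_URL, email.as_bytes())
}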
// Same email always produces same UUID
fn main() {
let email = "user@example.com";
let id1 = user_uuid_from_email(email);
let id2 = user_uuid_from_email(email);
assert_eq!(id1, id2); // Deterministic
// Useful for:
// - Generating consistent IDs across distributed systems
// - Reproducible identifiers for external resources
// - Content-addressable storage
}
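// Content-addressed storage (NAMESPACE_OID is a stand-in namespace here).
// Protects against accidental collisions only: SHA-1 does not resist crafted ones.
fn content_uuid(content: &[u8]) -> Uuid {
Uuid::new_v5(&Uuid::NAMESPACE_OID, content)
}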
// Cache key that persists across restarts
struct CacheKey {
resource_type: &'static str,
resource_id: String,
}
impl CacheKey {
fn to_uuid(&self) -> Uuid {
let name = format!("{}:{}", self.resource_type, self.resource_id);
Uuid::new_v5(&Uuid::NAMESPACE_DNS, name.as_bytes())
}
}
v5 is ideal for deterministic, reproducible identifiers.
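Because v5 hashes the exact name bytes, how the name is built matters as much as the namespace. The sketch below, using only the standard library, shows two pitfalls: joining parts with a separator can be ambiguous, and superficially different spellings of the same email produce different names. `canonical_name` and `normalize_email` are hypothetical helpers, and lowercasing the whole address is an assumption that may not suit every mail system.

```rust
/// Hypothetical helper: length-prefix each part so that two different part
/// lists can never serialize to the same bytes.
fn canonical_name(parts: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for part in parts {
        out.extend_from_slice(&(part.len() as u32).to_be_bytes());
        out.extend_from_slice(part.as_bytes());
    }
    out
}

/// Hypothetical helper: trim and lowercase before hashing, so that
/// " User@Example.COM " and "user@example.com" yield the same name.
fn normalize_email(email: &str) -> String {
    email.trim().to_lowercase()
}

fn main() {
    // Separator-based joining maps two different keys to the same name...
    let naive_a = format!("{}:{}", "a:b", "c");
    let naive_b = format!("{}:{}", "a", "b:c");
    assert_eq!(naive_a, naive_b);

    // ...while the length-prefixed form keeps them apart.
    assert_ne!(canonical_name(&["a:b", "c"]), canonical_name(&["a", "b:c"]));

    println!("{}", normalize_email(" User@Example.COM "));
}
```

The resulting bytes would then be passed as the `name` argument of `Uuid::new_v5`. Whatever encoding is chosen has to be frozen, since changing it changes every derived UUID.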
Predefined Namespaces
use uuid::Uuid;
fn main() {
// UUID library provides standard namespaces
let namespaces = [
Uuid::NAMESPACE_DNS, // For DNS names
Uuid::NAMESPACE_URL, // For URLs
Uuid::NAMESPACE_OID, // For ISO OIDs
Uuid::NAMESPACE_X500, // For X.500 DNs
];
// Each namespace ensures uniqueness scope
let name = "example.com";
// Same name in different namespaces = different UUIDs
let dns_uuid = Uuid::new_v5(&Uuid::NAMESPACE_DNS, name.as_bytes());
let url_uuid = Uuid::new_v5(&Uuid::NAMESPACE_URL, name.as_bytes());
assert_ne!(dns_uuid, url_uuid);
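// Custom namespace for your application. It must be identical on every run,
// so derive it deterministically (as here) or generate it once and hard-code it.
// Calling Uuid::new_v4() at startup would change every derived v5 UUID.
// "myapp.example.com" is a placeholder name.
let my_namespace = Uuid::new_v5(&Uuid::NAMESPACE_DNS, b"myapp.example.com");
println!("My namespace: {}", my_namespace);
// Use the same namespace consistently
let user_uuid = Uuid::new_v5(&my_namespace, b"user@example.com");
let order_uuid = Uuid::new_v5(&my_namespace, b"order-123");
}
Namespaces scope the uniqueness guarantee of v5 UUIDs.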
Performance Comparison
use uuid::Uuid;
use std::time::Instant;
fn main() {
const COUNT: usize = 100_000;
// v4: Requires random number generation
let start = Instant::now();
let v4_uuids: Vec<Uuid> = (0..COUNT).map(|_| Uuid::new_v4()).collect();
let v4_duration = start.elapsed();
println!("v4: {:?} for {} UUIDs", v4_duration, COUNT);
// v5: Requires SHA-1 hashing
let namespace = &Uuid::NAMESPACE_DNS; // new_v5 takes the namespace by reference
let start = Instant::now();
let v5_uuids: Vec<Uuid> = (0..COUNT)
.map(|i| Uuid::new_v5(namespace, format!("name-{}", i).as_bytes()))
.collect();
let v5_duration = start.elapsed();
println!("v5: {:?} for {} UUIDs", v5_duration, COUNT);
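// Which one is faster depends on the setup, so measure:
// - v4: one call into the random number generator + bit manipulation
//   (by default this goes to the OS; the crate's fast-rng feature speeds it up)
// - v5: SHA-1 over namespace + name + bit manipulation (cost grows with name length)
// - The v5 loop above also pays for the format! allocations
//
// v5 can also save work elsewhere:
// - No need to store the UUID, regenerate when needed
// - Same result across systems without coordination
println!("v5 took {:.1}x the time of v4", v5_duration.as_nanos() as f64 / v4_duration.as_nanos() as f64);
}
Generation cost is small for both and depends on configuration; v5's real benefit is determinism.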
Distributed Systems Considerations
use uuid::Uuid;
// v4 in distributed systems
// - Each node generates independently
// - No coordination needed
// - Statistically unique (very high probability)
// - No determinism across nodes
// v5 in distributed systems
// - Same UUID generated anywhere with same inputs
// - No coordination needed for ID generation
// - Deterministic across all nodes
// - Perfect for external resource IDs
struct Order {
// v4 for random internal ID
id: Uuid,
// v5 for consistent external reference
external_ref: Uuid,
}
impl Order {
fn new(user_id: u64, order_number: u64) -> Self {
Order {
id: Uuid::new_v4(), // Random internal ID
external_ref: Uuid::new_v5(
&Uuid::NAMESPACE_OID,
format!("order:{}:{}", user_id, order_number).as_bytes()
), // Deterministic external reference
}
}
// External reference can be regenerated without storage
fn get_external_ref(user_id: u64, order_number: u64) -> Uuid {
Uuid::new_v5(
&Uuid::NAMESPACE_OID,
format!("order:{}:{}", user_id, order_number).as_bytes()
)
}
}
fn main() {
// Same order in different systems
let ref1 = Order::get_external_ref(1, 100);
let ref2 = Order::get_external_ref(1, 100);
assert_eq!(ref1, ref2); // Same UUID everywhere
}
v5 eliminates coordination overhead for consistent identifiers.
Migration and Version Detection
use uuid::Uuid;
fn main() {
// Both v4 and v5 can be identified by their version
let v4 = Uuid::new_v4();
let v5 = Uuid::new_v5(&Uuid::NAMESPACE_DNS, b"example");
// Get version
assert_eq!(v4.get_version(), Some(uuid::Version::Random));
assert_eq!(v5.get_version(), Some(uuid::Version::Sha1));
// Get variant
assert!(matches!(v4.get_variant(), uuid::Variant::RFC4122));
assert!(matches!(v5.get_variant(), uuid::Variant::RFC4122));
// Parse and check version
let parsed = Uuid::parse_str("550e8400-e29b-41d4-a716-446655440000").unwrap();
match parsed.get_version() {
Some(uuid::Version::Random) => println!("This is a v4 UUID"),
Some(uuid::Version::Sha1) => println!("This is a v5 UUID"),
Some(uuid::Version::Md5) => println!("This is a v3 UUID"),
Some(uuid::Version::Mac) => println!("This is a v1 UUID"),
Some(uuid::Version::Dce) => println!("This is a v2 UUID"),
_ => println!("Unknown version"),
}
}
UUID version is embedded in the format itself.
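To see what "embedded in the format" means, the version and variant can be read straight off the hyphenated text without any library. The following is a standard-library sketch; `version_of` and `is_rfc4122_variant` are illustrative names, and real code should use `get_version` and `get_variant`, which also validate the rest of the string.

```rust
/// Read the version nibble from a hyphenated UUID string.
/// Layout: xxxxxxxx-xxxx-Vxxx-Nxxx-xxxxxxxxxxxx, with V at byte index 14.
fn version_of(uuid: &str) -> Option<u32> {
    let bytes = uuid.as_bytes();
    if bytes.len() != 36 {
        return None;
    }
    (bytes[14] as char).to_digit(16)
}

/// Check the RFC 4122 variant: the top two bits of N (byte index 19)
/// must be 0b10, so the hex digit is 8, 9, a, or b.
fn is_rfc4122_variant(uuid: &str) -> bool {
    let bytes = uuid.as_bytes();
    bytes.len() == 36 && matches!((bytes[19] as char).to_digit(16), Some(8..=11))
}

fn main() {
    let sample = "550e8400-e29b-41d4-a716-446655440000";
    println!("version: {:?}", version_of(sample));
    println!("RFC 4122 variant: {}", is_rfc4122_variant(sample));
}
```

The version nibble says how a UUID claims to have been generated; it does not prove it, since anyone can write any digit in that position.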
Security Considerations
use uuid::Uuid;
fn main() {
// v4 Security:
// - Uses cryptographically secure random number generator
// - UUIDs are unpredictable
// - Safe for security-sensitive identifiers
// - Cannot be guessed or reproduced
// v5 Security:
// - Deterministic: same inputs = same output
// - If attacker knows namespace and name, they can generate the UUID
// - Not suitable for security tokens or one-time identifiers
// - Safe for public identifiers (resource IDs, etc.)
// DON'T use v5 for:
// - Password reset tokens (predictable)
// - Session IDs (can be regenerated)
// - API keys (guessable if inputs known)
// DO use v5 for:
// - User IDs derived from email (public)
// - Resource IDs derived from URLs (public)
// - Consistent identifiers across systems
// Example: Safe v5 usage
fn get_user_id(email: &str) -> Uuid {
// Public information, deterministic ID is fine
Uuid::new_v5(&Uuid::NAMESPACE_URL, email.as_bytes())
}
// Example: Dangerous v5 usage
fn generate_password_reset_token(user_id: u64) -> Uuid {
// DON'T DO THIS - predictable if user_id is known
Uuid::new_v5(&Uuid::NAMESPACE_OID, user_id.to_string().as_bytes())
// Instead, use v4:
// Uuid::new_v4()
}
}
v4 is unpredictable; v5 is deterministic and potentially guessable.
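One way to quantify "guessable" is to compare search spaces: a v5 token is only as hard to guess as its inputs. The sketch below assumes, purely for illustration, that the attacker knows the namespace and that the name is a numeric user ID below 10 million; `derived_token_entropy_bits` is a hypothetical helper.

```rust
/// Bits of guessing entropy when a token is derived deterministically from
/// one of `candidates` equally likely inputs.
fn derived_token_entropy_bits(candidates: u64) -> f64 {
    (candidates as f64).log2()
}

/// Random bits in a v4 UUID (128 minus 6 version/variant bits).
const V4_RANDOM_BITS: f64 = 122.0;

fn main() {
    // Assumption for this example: user IDs are integers below 10 million.
    let v5_bits = derived_token_entropy_bits(10_000_000);
    println!("v5 token derived from a user ID: {:.1} bits", v5_bits);
    println!("v4 token: {:.0} bits", V4_RANDOM_BITS);
}
```

A few dozen bits can be enumerated quickly, so the 128-bit appearance of a v5 UUID offers no protection when its inputs are predictable.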
Hybrid Approaches
use uuid::Uuid;
// Combine both for different use cases
struct User {
// v5 for consistent external ID based on email
external_id: Uuid,
// v4 for internal session/token
session_id: Uuid,
}
impl User {
fn new(email: &str) -> Self {
User {
external_id: Uuid::new_v5(&Uuid::NAMESPACE_URL, email.as_bytes()),
session_id: Uuid::new_v4(),
}
}
}
// Use v5 for deduplication
fn process_document(doc_url: &str, _content: &[u8]) -> Uuid {
// v5 ensures same URL always maps to same UUID
// Allows checking if already processed
Uuid::new_v5(&Uuid::NAMESPACE_URL, doc_url.as_bytes())
}
// Use v4 for transient operations
fn create_request_id() -> Uuid {
Uuid::new_v4()
}
fn main() {
// Document processing with deduplication
let url = "https://example.com/doc1.pdf";
let id1 = process_document(url, &[]);
let id2 = process_document(url, &[]);
assert_eq!(id1, id2); // Same URL = same ID
// Request tracking with random IDs
let req1 = create_request_id();
let req2 = create_request_id();
assert_ne!(req1, req2); // Always different
}
Use v4 and v5 for different aspects of the same system.
Synthesis
Quick reference:
use uuid::Uuid;
// UUID v4: Random
let id = Uuid::new_v4();
// - Randomly generated
// - Probabilistic uniqueness (collision ~1/2^122)
// - Unpredictable
// - Different each run
// - Use for: sessions, tokens, random IDs
// UUID v5: Hash-based
let id = Uuid::new_v5(&namespace, name.as_bytes());
// - Deterministic: same (namespace, name) = same UUID
// - Different names: protected by 122 bits of truncated SHA-1 output
// - Use for: external IDs, content addressing, consistent IDs
// Predefined namespaces
Uuid::NAMESPACE_DNS // DNS names
Uuid::NAMESPACE_URL // URLs
Uuid::NAMESPACE_OID // ISO OIDs
Uuid::NAMESPACE_X500 // X.500 DNs
// Decision factors:
// 1. Need determinism? → v5
// 2. Need unpredictability? → v4
// 3. Cross-system consistency? → v5
// 4. Security-sensitive token? → v4
// 5. Content-addressable? → v5
// 6. No natural identifier? → v4
// Collision probability:
// v4: P(collision) ≈ n² / (2 × 2^122)
// After 1 trillion UUIDs: P ≈ 10^-13
// v5: identical (namespace, name) pairs map to the same UUID by design
// Different names: truncated SHA-1 (160-bit digest → 122 usable bits)
Key insight: The fundamental trade-off is determinism versus unpredictability. v4 produces a fresh random UUID on every call, which suits security tokens, session IDs, and any identifier that must not be guessable. v5 produces the same UUID for the same namespace and name, which suits content-addressable storage, consistent identifiers across distributed systems, and cases where you want to regenerate a UUID instead of storing it. Accidental collisions are astronomically unlikely for both: v4 draws 122 random bits, and v5 keeps 122 bits of a SHA-1 digest. Choose v5 when you need reproducibility or want to derive IDs from existing data; choose v4 for everything else.
