What are the trade-offs between zstd::bulk::compress and streaming compression for large data sets?

zstd::bulk::compress compresses the entire input in a single operation, so both the complete input and the compressed output must fit in memory. Streaming compression instead feeds data incrementally through an encoder that maintains state across calls, giving constant memory usage regardless of input size, at the cost of more code complexity and no parallel compression of chunks within a single stream. The choice depends on data size, memory constraints, and whether you have all data upfront or need to process it as it arrives.

Bulk Compression Basics

use zstd::bulk::compress;
 
fn bulk_compression() -> Result<(), Box<dyn std::error::Error>> {
    // Bulk compression: entire data in memory
    let data = b"Hello, World! This is some text to compress.";
    
    // Single call compresses everything
    let compressed = compress(data, 3)?;  // level 3
    
    println!("Original: {} bytes", data.len());
    println!("Compressed: {} bytes", compressed.len());
    
    // Requires entire input in memory
    // Requires output buffer sized for worst case
    // Returns compressed data directly
    
    Ok(())
}

Bulk compression is simple: pass all data, receive compressed output.

Streaming Compression Basics

use zstd::stream::write::Encoder;
use std::io::Write;
 
fn streaming_compression() -> Result<(), Box<dyn std::error::Error>> {
    // Streaming: process data incrementally
    let mut buffer = Vec::new();
    
    // Create encoder that maintains compression state
    let mut encoder = Encoder::new(&mut buffer, 3)?;
    
    // Write in chunks - state is maintained
    encoder.write_all(b"Hello, World!")?;
    encoder.write_all(b" This is some text")?;
    encoder.write_all(b" to compress.")?;
    
    // Must finalize to complete compression
    encoder.finish()?;
    
    println!("Compressed: {} bytes", buffer.len());
    
    // Works with arbitrary-sized input
    // Constant memory for encoder state
    // Requires explicit finish/flush
    
    Ok(())
}

Streaming compression processes data incrementally through an encoder.

Memory Requirements

use zstd::bulk::compress;
use zstd::stream::write::Encoder;
use std::io::Write;
 
fn memory_requirements() -> Result<(), Box<dyn std::error::Error>> {
    // Bulk compression memory:
    // - Input must fit in memory
    // - Output must fit in memory
    // - Plus temporary working memory
    
    let large_data = vec![0u8; 100_000_000];  // 100 MB input
    let compressed = compress(&large_data, 3)?;
    
    // Memory usage: ~200+ MB during compression
    // (input + output + working memory)
    
    // Streaming compression memory:
    // - Only chunk being processed in memory
    // - Encoder state (small, constant)
    // - Output buffer grows incrementally
    
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 3)?;
    
    // Process in 1 MB chunks
    for chunk in large_data.chunks(1_000_000) {
        encoder.write_all(chunk)?;
    }
    encoder.finish()?;
    
    // Memory usage: ~1 MB chunk + encoder state + output
    // Constant regardless of total input size
    
    Ok(())
}

Bulk requires all data in memory; streaming uses constant memory.

API Complexity Comparison

use zstd::bulk::compress;
use zstd::stream::write::Encoder;
use std::io::Write;
 
fn api_comparison() -> Result<(), Box<dyn std::error::Error>> {
    // Bulk compression: Simple, one function call
    let data = b"simple data";
    let compressed = compress(data, 3)?;
    
    // Streaming compression: More complex
    let mut output = Vec::new();
    {
        let mut encoder = Encoder::new(&mut output, 3)?;
        encoder.write_all(data)?;
        encoder.finish()?;  // Must call finish!
    }  // Must handle encoder lifetime
    
    // Streaming requires:
    // - Creating encoder
    // - Writing in chunks
    // - Calling finish()
    // - Handling encoder lifetime
    // - Managing output writer
    
    // Bulk requires:
    // - One function call
    
    Ok(())
}

Bulk compression has a simpler API; streaming requires more boilerplate.

Data Availability Patterns

use zstd::bulk::compress;
use zstd::stream::write::Encoder;
use std::io::Write;
 
fn data_patterns() -> Result<(), Box<dyn std::error::Error>> {
    // Pattern 1: All data available immediately
    let complete_data = read_entire_file("data.bin")?;
    let compressed = compress(&complete_data, 3)?;
    // Bulk compression is ideal
    
    // Pattern 2: Data arrives in chunks
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 3)?;
    
    for chunk in read_chunks_from_network()? {
        encoder.write_all(&chunk)?;
        // Can compress before all data arrives
    }
    encoder.finish()?;
    // Streaming compression is required
    
    // Pattern 3: Unknown total size
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 3)?;
    
    while let Some(chunk) = read_next_chunk()? {
        encoder.write_all(&chunk)?;
    }
    encoder.finish()?;
    // Streaming handles unknown sizes
    
    Ok(())
}
 
# fn read_entire_file(_path: &str) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
#     Ok(vec![0u8; 1000])
# }
# fn read_chunks_from_network() -> Result<Vec<Vec<u8>>, Box<dyn std::error::Error>> {
#     Ok(vec![vec![0u8; 100], vec![0u8; 100]])
# }
# fn read_next_chunk() -> Result<Option<Vec<u8>>, Box<dyn std::error::Error>> {
#     Ok(Some(vec![0u8; 100]))
# }

Use streaming when data arrives incrementally or size is unknown.

Compression Ratio

use zstd::bulk::compress;
use zstd::stream::write::Encoder;
use std::io::Write;
 
fn compression_ratio() -> Result<(), Box<dyn std::error::Error>> {
    // Same data, same level should produce similar results
    let data = b"Hello, World! Hello, World! Hello, World!";
    
    let bulk_compressed = compress(data, 3)?;
    
    let mut stream_output = Vec::new();
    let mut encoder = Encoder::new(&mut stream_output, 3)?;
    encoder.write_all(data)?;
    encoder.finish()?;
    
    // Bulk and streaming produce similar compression ratios
    // Differences are minimal (a few bytes for framing)
    
    println!("Bulk: {} bytes", bulk_compressed.len());
    println!("Streaming: {} bytes", stream_output.len());
    
    // Note: Streaming may have slightly larger output due to
    // chunk boundaries, but difference is typically negligible
    
    Ok(())
}

Both approaches produce similar compression ratios for the same data.

Performance Characteristics

use zstd::bulk::compress;
use zstd::stream::write::Encoder;
use std::io::Write;
use std::time::Instant;
 
fn performance_comparison() -> Result<(), Box<dyn std::error::Error>> {
    let data = vec![0u8; 10_000_000];  // 10 MB
    
    // Bulk compression: Often faster for moderate sizes
    let start = Instant::now();
    let bulk_compressed = compress(&data, 3)?;
    let bulk_duration = start.elapsed();
    
    // Streaming compression: May be slower due to chunk overhead
    let start = Instant::now();
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 3)?;
    
    // Single write (equivalent to bulk)
    encoder.write_all(&data)?;
    encoder.finish()?;
    let stream_duration = start.elapsed();
    
    println!("Bulk: {:?}", bulk_duration);
    println!("Streaming: {:?}", stream_duration);
    
    // Bulk can use optimized paths for known-size input
    // Streaming has function call overhead per chunk
    
    // For large data and small chunks, streaming overhead adds up
    // For large chunks, difference is minimal
    
    Ok(())
}

Bulk may be faster for moderate sizes; streaming has per-chunk overhead.

When to Use Bulk Compression

use zstd::bulk::{compress, decompress};
 
fn bulk_use_cases() -> Result<(), Box<dyn std::error::Error>> {
    // Use bulk compression when:
    
    // 1. Data fits comfortably in memory
    let small_data = b"small data";
    let compressed = compress(small_data, 3)?;
    
    // 2. All data available upfront
    let file_contents = std::fs::read("input.bin")?;
    let compressed = compress(&file_contents, 3)?;
    
    // 3. Simple API preferred
    let data = vec![0u8; 1000];
    let level = 3;
    let compressed = compress(&data, level)?;
    let decompressed = decompress(&compressed, data.len())?;
    
    // 4. Memory is not constrained
    // Desktop app with plenty of RAM
    
    // 5. Compression speed matters more than memory
    // Bulk has less overhead
    
    // 6. Known maximum output size
    // Can allocate output buffer appropriately
    
    Ok(())
}
 

Use bulk when data fits in memory and you have all data upfront.

When to Use Streaming Compression

use zstd::stream::write::Encoder;
use std::io::Write;
 
fn streaming_use_cases() -> Result<(), Box<dyn std::error::Error>> {
    // Use streaming compression when:
    
    // 1. Data is larger than available memory
    let _large_file = std::fs::File::open("huge_file.bin")?;
    // Process without loading the entire file
    
    // 2. Data arrives incrementally
    // Network streams, chunked uploads
    
    // 3. Unknown total size
    // Streaming from a source until EOF
    
    // 4. Memory-constrained environments
    // Embedded systems, WebAssembly
    
    // 5. Processing pipeline
    // Read -> Transform -> Compress -> Write
    
    // 6. Real-time compression
    // Compress as data is generated
    
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 3)?;
    
    // Simulate streaming data source
    for _ in 0..1000 {
        let chunk = generate_chunk();
        encoder.write_all(&chunk)?;
        // Process next chunk without storing all chunks
    }
    
    encoder.finish()?;
    
    Ok(())
}
 
# fn generate_chunk() -> Vec<u8> { vec![0u8; 1024] }

Use streaming when memory is limited or data arrives incrementally.

File Compression Example

use zstd::bulk::compress;
use zstd::stream::write::Encoder;
use std::io::{Read, Write};
use std::fs::File;
 
fn compress_file_bulk(input_path: &str, output_path: &str) -> Result<(), Box<dyn std::error::Error>> {
    // Bulk: Load entire file, then compress
    let mut input = File::open(input_path)?;
    let mut data = Vec::new();
    input.read_to_end(&mut data)?;
    
    let compressed = compress(&data, 3)?;
    
    let mut output = File::create(output_path)?;
    output.write_all(&compressed)?;
    
    // Memory: input_size + compressed_size + working_memory
    // Simple: read, compress, write
    
    Ok(())
}
 
fn compress_file_streaming(input_path: &str, output_path: &str) -> Result<(), Box<dyn std::error::Error>> {
    // Streaming: Process in chunks
    let mut input = File::open(input_path)?;
    let output = File::create(output_path)?;
    let mut encoder = Encoder::new(output, 3)?;
    
    let mut buffer = [0u8; 8192];
    loop {
        let bytes_read = input.read(&mut buffer)?;
        if bytes_read == 0 {
            break;
        }
        encoder.write_all(&buffer[..bytes_read])?;
    }
    
    encoder.finish()?;
    
    // Memory: chunk_size + encoder_state
    // Works with arbitrary file sizes
    
    Ok(())
}

Streaming enables constant-memory file compression regardless of file size.

Decompression Considerations

use zstd::bulk::decompress;
use zstd::stream::write::Decoder;
use std::io::Write;
 
fn decompression_comparison() -> Result<(), Box<dyn std::error::Error>> {
    let compressed = compress_data();  // Some compressed data
    
    // Bulk decompression: Must know max size
    let decompressed = decompress(&compressed, 10_000_000)?;
    // Requires knowing or estimating maximum decompressed size
    
    // Streaming decompression: No size limit needed
    let mut output = Vec::new();
    let mut decoder = Decoder::new(&mut output)?;
    decoder.write_all(&compressed)?;
    decoder.flush()?;  // frames are self-terminating; flush drains buffered output
    
    // Streaming decompression handles unknown sizes automatically
    
    // Bulk requires:
    // - Knowing maximum decompressed size
    // - Allocating buffer for that size
    
    // Streaming requires:
    // - Managing decoder lifetime
    // - Flushing buffered output
    
    Ok(())
}
 
# fn compress_data() -> Vec<u8> { zstd::bulk::compress(b"test data", 3).unwrap() }

Bulk decompression requires knowing the maximum size; streaming handles unknown sizes.

Compression Levels

use zstd::bulk::compress;
use zstd::stream::write::Encoder;
use std::io::Write;
 
fn compression_levels() -> Result<(), Box<dyn std::error::Error>> {
    let data = b"Hello, World!";
    
    // Both support levels 1-22 (higher = better compression, slower)
    
    // Bulk with level
    let fast = compress(data, 1)?;      // Fast, larger output
    let balanced = compress(data, 3)?;   // Balanced
    let best = compress(data, 19)?;       // Slow, smallest output
    
    // Streaming with level
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 19)?;
    encoder.write_all(data)?;
    encoder.finish()?;
    
    // Streaming has same level options
    // Level affects compression ratio and speed similarly
    
    // Higher levels use more working memory during compression
    // This affects both bulk and streaming
    
    Ok(())
}

Both approaches support the full range of zstd compression levels.

Dictionary Compression

use zstd::bulk::{compress, decompress};
use zstd::stream::write::Encoder;
use std::io::Write;
 
fn dictionary_compression() -> Result<(), Box<dyn std::error::Error>> {
    // Pre-trained dictionary for specific data types
    let dictionary = b"sample dictionary data for training";
    
    let data = b"Hello, World!";
    
    // Bulk with dictionary: via a compressor context holding the dictionary
    let mut compressor = zstd::bulk::Compressor::with_dictionary(3, dictionary)?;
    let compressed = compressor.compress(data)?;
    
    // Streaming with dictionary
    let mut output = Vec::new();
    let mut encoder = Encoder::with_dictionary(&mut output, 3, dictionary)?;
    encoder.write_all(data)?;
    encoder.finish()?;
    
    // Dictionaries improve compression for small, similar data
    // Both bulk and streaming support dictionaries
    
    Ok(())
}

Both approaches can use dictionaries for improved compression on small, similar data.

Error Handling

use zstd::bulk::compress;
use zstd::stream::write::Encoder;
use std::io::Write;
 
fn error_handling() -> Result<(), Box<dyn std::error::Error>> {
    // Bulk errors: Single point of failure
    let data = b"test";
    let compressed = compress(data, 3)?;
    // Error can only occur during compress call
    
    // Streaming errors: Can occur at multiple points
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 3)?;
    
    // Error on creation
    // (if writer fails)
    
    // Error on write
    encoder.write_all(data)?;
    
    // Error on finish
    encoder.finish()?;
    
    // Streaming has more error points
    // Requires handling errors at each stage
    
    Ok(())
}

Bulk has a single error point; streaming has multiple potential failure points.

Parallel Processing

use zstd::bulk::compress;
use rayon::prelude::*;
 
fn parallel_bulk() -> Result<(), Box<dyn std::error::Error>> {
    // Bulk compression can be parallelized across chunks
    let chunks: Vec<Vec<u8>> = vec![
        vec![0u8; 1000],
        vec![1u8; 1000],
        vec![2u8; 1000],
    ];
    
    // Compress each chunk independently
    let compressed_chunks: Vec<Vec<u8>> = chunks
        .par_iter()
        .map(|chunk| compress(chunk, 3))
        .collect::<Result<_, _>>()?;
    
    // Each compression is independent
    // Can use all CPU cores
    
    // Note: Compressing chunks independently means
    // compression ratio is worse than single stream
    // No dictionary sharing between chunks
    
    Ok(())
}
 
// Streaming compression within a single stream
// cannot be parallelized - each chunk depends on previous
// State is maintained across writes

Bulk can parallelize independent chunks; streaming maintains state and must be sequential.

Choosing Between Approaches

use zstd::bulk::compress;
use zstd::stream::write::Encoder;
use std::io::Write;
 
fn choose_approach() -> Result<(), Box<dyn std::error::Error>> {
    // Decision factors:
    
    // Data size: Small -> Bulk
    let small = b"small data";
    let _ = compress(small, 3)?;
    
    // Data size: Large/Unknown -> Streaming
    // Use streaming for files > available memory
    
    // Memory constraint: Tight -> Streaming
    // Use streaming for embedded/WebAssembly
    
    // Data availability: All at once -> Bulk
    let all_data = collect_all_data();
    let _ = compress(&all_data, 3)?;
    
    // Data availability: Incremental -> Streaming
    // Use streaming for network data, pipelines
    
    // Code simplicity: Bulk is simpler
    // Prefer bulk when memory allows
    
    // Compression quality: Similar for both
    // Difference is negligible
    
    Ok(())
}
 
# fn collect_all_data() -> Vec<u8> { vec![0u8; 1000] }

Choose based on data size, memory constraints, and data availability.

Synthesis

Quick reference:

Aspect            Bulk Compression         Streaming Compression
Memory            O(input + output)        O(chunk + state)
Data required     All upfront              Incremental
API complexity    Simple (one call)        Complex (encoder lifecycle)
Performance       Often faster             Per-chunk overhead
Parallelizable    Yes (independent)        No (stateful)
Unknown sizes     Not supported            Supported

When to use bulk compression:

use zstd::bulk::compress;
 
fn bulk_cases() -> Result<(), Box<dyn std::error::Error>> {
    // Data fits in memory
    let data = std::fs::read("file.bin")?;
    let compressed = compress(&data, 3)?;
    
    // All data available
    // Memory not constrained
    // Prefer simple API
    
    Ok(())
}

When to use streaming compression:

use zstd::stream::write::Encoder;
use std::io::Write;
 
fn streaming_cases() -> Result<(), Box<dyn std::error::Error>> {
    // Data larger than memory
    // Data arrives incrementally
    // Unknown total size
    // Memory-constrained environment
    
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 3)?;
    
    while let Some(chunk) = read_chunk()? {
        encoder.write_all(&chunk)?;
    }
    encoder.finish()?;
    
    Ok(())
}
 
# fn read_chunk() -> Result<Option<Vec<u8>>, Box<dyn std::error::Error>> {
#     Ok(None)
# }

Key insight: zstd::bulk::compress and streaming compression represent two ends of a spectrum trading simplicity for memory efficiency. Bulk compression is simpler—pass all data, get compressed output—but requires the entire input and output to fit in memory simultaneously. Streaming compression is more complex—create encoder, write chunks, finish—but uses constant memory proportional to chunk size plus a small encoder state, enabling compression of arbitrarily large data. Use bulk when data fits comfortably in memory and simplicity matters. Use streaming when processing large files, network streams, or data that arrives incrementally. Streaming also handles decompression of unknown-sized data, while bulk decompression requires specifying the maximum output size upfront. The compression ratio is similar for both approaches; the difference is in memory usage and API complexity. Bulk compression can be parallelized across independent chunks, while streaming maintains state across writes and must be processed sequentially. The choice depends primarily on memory constraints and data availability patterns, not compression quality.