What are the trade-offs between zstd::bulk::compress and stream::Encoder for one-shot vs streaming compression?

zstd::bulk::compress performs one-shot compression on complete data: the entire input must be in memory, but the API is a single call and the compressor sees the full input at once, which can help the compression ratio. stream::Encoder processes data incrementally in chunks, enabling compression of arbitrarily large inputs with bounded memory, at the cost of a more involved API and slightly more per-byte overhead. The bulk API is ideal for small to medium-sized data that fits comfortably in memory; the streaming API is necessary for large files, network streams, or situations where memory usage must be constrained. Both produce valid zstd output, but they represent fundamentally different approaches to managing input and output.

Basic bulk::compress Usage

use zstd::bulk::compress;
 
fn bulk_example() -> Result<(), Box<dyn std::error::Error>> {
    let data = b"Hello, World! This is some data to compress.";
    
    // One-shot compression: entire input must be in memory
    let compressed = compress(data, 3)?;  // compression level 3
    
    println!("Original size: {}", data.len());
    println!("Compressed size: {}", compressed.len());
    
    // Decompress with bulk API
    let decompressed = zstd::bulk::decompress(&compressed, data.len())?;
    assert_eq!(data.to_vec(), decompressed);
    
    Ok(())
}

bulk::compress takes complete data and returns complete compressed output.

Basic stream::Encoder Usage

use zstd::stream::Encoder;
use std::io::Write;
 
fn stream_example() -> Result<(), Box<dyn std::error::Error>> {
    let data = b"Hello, World! This is some data to compress.";
    let mut output = Vec::new();
    
    // Create encoder that writes to output
    let mut encoder = Encoder::new(&mut output, 3)?;
    
    // Write data in chunks (streaming)
    encoder.write_all(data)?;
    
    // Finalize compression
    encoder.finish()?;
    
    println!("Original size: {}", data.len());
    println!("Compressed size: {}", output.len());
    
    // Decompress with streaming API
    let decompressed = zstd::stream::decode_all(&output[..])?;
    assert_eq!(data.to_vec(), decompressed);
    
    Ok(())
}

stream::Encoder processes data incrementally, writing compressed output as data arrives.

Memory Usage Comparison

use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
 
fn memory_comparison() {
    // bulk::compress memory usage:
    // - Input: must be entirely in memory
    // - Output: must fit entirely in memory
    // - Working memory: zstd context (few MB)
    // - Total: ~input_size + output_size + context
    
    let _large_data = vec![0u8; 100_000_000];  // 100 MB
    
    // bulk requires all 100 MB in memory
    // plus compressed output (say 50 MB)
    // plus working memory
    
    // stream::Encoder memory usage:
    // - Input: can process in small chunks
    // - Output: can write to stream/pipe
    // - Working memory: bounded buffer (typically few MB)
    // - Total: bounded, regardless of data size
    
    // Streaming can process 100 GB file with only MB of memory
    // Bulk would need 100 GB + compressed size in memory
}

Bulk requires all data in memory; streaming has bounded memory usage.

Processing Large Files

use zstd::stream::Encoder;
use std::fs::File;
use std::io::{BufReader, BufWriter, Read, Write};
 
fn compress_large_file() -> Result<(), Box<dyn std::error::Error>> {
    // Stream processing: read from file, write to file
    // Never loads entire file into memory
    
    let input = File::open("large_input.bin")?;
    let output = File::create("compressed.zst")?;
    
    let mut reader = BufReader::new(input);
    let mut writer = BufWriter::new(output);
    
    // Encoder writes to output as it compresses
    let mut encoder = Encoder::new(&mut writer, 3)?;
    
    // Process in chunks
    let mut buffer = [0u8; 8192];
    
    loop {
        let bytes_read = reader.read(&mut buffer)?;
        if bytes_read == 0 {
            break;
        }
        encoder.write_all(&buffer[..bytes_read])?;
    }
    
    // Finalize
    encoder.finish()?;
    writer.flush()?;
    
    // This works for files of any size
    // Memory usage is bounded by buffer size (8KB)
    
    Ok(())
}

Streaming handles arbitrarily large files with fixed memory.

Bulk API for Known-Size Data

use zstd::bulk::compress;
 
fn compress_known_size(data: &[u8]) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
    // When data size is known and manageable:
    // - Bulk is simpler
    // - Bulk can be faster (no chunk overhead)
    // - Bulk may compress better (full context)
    
    // Good for:
    // - In-memory data structures
    // - Small files (< 100 MB typical)
    // - Network packets
    // - Database records
    
    let compressed = compress(data, 3)?;
    Ok(compressed)
}

Bulk is simpler and often faster for data that fits in memory.

Streaming for Network Protocols

use zstd::stream::Encoder;
use std::io::Write;
 
fn network_streaming() -> Result<(), Box<dyn std::error::Error>> {
    // Simulated network stream
    let mut network_buffer = Vec::new();
    
    // Create encoder that writes to network
    let mut encoder = Encoder::new(&mut network_buffer, 3)?;
    
    // Process data as it arrives from network
    // (simulated here as a finite queue of received chunks)
    let mut incoming = vec![b"chunk of data".to_vec(), b"another chunk".to_vec()];
    
    fn receive_data(queue: &mut Vec<Vec<u8>>) -> Option<Vec<u8>> {
        // Simulated network receive; returns None once the stream ends
        queue.pop()
    }
    
    while let Some(chunk) = receive_data(&mut incoming) {
        // Compress each chunk as it arrives
        // No need to buffer all chunks first
        encoder.write_all(&chunk)?;
        
        // Could flush to send compressed data
        // encoder.flush()?;
    }
    
    // Finalize
    encoder.finish()?;
    
    // network_buffer contains compressed data
    // Could send as we go or at the end
    
    Ok(())
}

Streaming compresses data as it arrives, without buffering everything.

Compression Context Reuse

use zstd::bulk::{compress, Compressor};
 
fn context_reuse_bulk() -> Result<(), Box<dyn std::error::Error>> {
    // For multiple compressions, reuse context
    // This avoids re-initializing compression state
    
    let mut compressor = Compressor::new(3)?;
    
    // Reuse for multiple items
    let data1 = b"First item";
    let data2 = b"Second item";
    
    let compressed1 = compressor.compress(data1)?;
    let compressed2 = compressor.compress(data2)?;
    
    // Context reuse is more efficient than:
    let _ = compress(data1, 3)?;
    let _ = compress(data2, 3)?;
    // Which creates new context each time
    
    Ok(())
}

Compressor reuses context for multiple bulk compressions.

Streaming Context: Encoder

use zstd::stream::Encoder;
use std::io::Write;
 
fn streaming_context() -> Result<(), Box<dyn std::error::Error>> {
    // Encoder maintains context across writes
    // Later data can reference earlier data
    
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 3)?;
    
    // First chunk
    encoder.write_all(b"Hello, World!")?;
    
    // Second chunk can reference first chunk
    // Compression context spans writes
    encoder.write_all(b"Hello, again!")?;  // "Hello" likely compressed well
    
    // Third chunk
    encoder.write_all(b"Hello, final!")?;
    
    encoder.finish()?;
    
    // All chunks share context
    // Compression ratio benefits from full data context
    // Even though processed incrementally
    
    Ok(())
}

Streaming maintains compression context across chunks for good ratios.

Compression Ratio Trade-offs

use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
 
fn compression_ratio_comparison() -> Result<(), Box<dyn std::error::Error>> {
    // Repeated data to compress
    let data: Vec<u8> = (0..1000).flat_map(|_| b"Hello, World! ").copied().collect();
    
    // Bulk compression: full context available
    let bulk_compressed = compress(&data, 3)?;
    
    // Streaming compression: same context maintained
    let mut stream_output = Vec::new();
    let mut encoder = Encoder::new(&mut stream_output, 3)?;
    
    // Feed the data in small chunks; the encoder's context is
    // maintained across writes, so chunking does not reset state
    let chunk_size = 100;
    for chunk in data.chunks(chunk_size) {
        encoder.write_all(chunk)?;
    }
    encoder.finish()?;
    
    // Results are typically similar
    // Bulk might be slightly smaller due to internal optimization
    // Difference is usually small (few percent at most)
    
    println!("Original: {} bytes", data.len());
    println!("Bulk: {} bytes", bulk_compressed.len());
    println!("Stream: {} bytes", stream_output.len());
    
    Ok(())
}

Both achieve similar compression ratios; bulk may be slightly better.

Performance Characteristics

use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
use std::time::Instant;
 
fn performance_comparison() -> Result<(), Box<dyn std::error::Error>> {
    let data: Vec<u8> = (0..1_000_000).map(|i| (i % 256) as u8).collect();
    
    // Bulk compression
    let start = Instant::now();
    let bulk_compressed = compress(&data, 3)?;
    let bulk_time = start.elapsed();
    
    // Streaming compression
    let start = Instant::now();
    let mut stream_output = Vec::new();
    let mut encoder = Encoder::new(&mut stream_output, 3)?;
    encoder.write_all(&data)?;
    encoder.finish()?;
    let stream_time = start.elapsed();
    
    // Bulk is typically faster because:
    // - Single allocation for output
    // - No chunk handling overhead
    // - Internal optimizations for complete data
    
    // Streaming has overhead from:
    // - Chunk boundary handling
    // - Incremental output buffering
    // - More API calls
    
    println!("Bulk: {} bytes in {:?}", bulk_compressed.len(), bulk_time);
    println!("Stream: {} bytes in {:?}", stream_output.len(), stream_time);
    
    Ok(())
}

Bulk is typically faster; streaming has more overhead per byte.

Streaming with Flush

use zstd::stream::Encoder;
use std::io::Write;
 
fn streaming_with_flush() -> Result<(), Box<dyn std::error::Error>> {
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 3)?;
    
    // Write first chunk
    encoder.write_all(b"First chunk")?;
    
    // Flush sends compressed data immediately
    // Useful for:
    // - Sending data over network
    // - Processing pipelined data
    // - Real-time compression
    
    encoder.flush()?;
    // Compressed data for "First chunk" is now in output
    
    // Write second chunk
    encoder.write_all(b"Second chunk")?;
    
    encoder.finish()?;
    
    // output contains all compressed data
    
    // Note: flush() may reduce compression ratio
    // because it forces zstd to output partial blocks
    
    Ok(())
}

flush() outputs compressed data early, useful for streaming but may reduce compression ratio.

Decompression Comparison

use zstd::bulk::decompress;
use zstd::stream::read::Decoder;
use std::io::Read;
 
fn decompression_comparison() -> Result<(), Box<dyn std::error::Error>> {
    let original = b"Hello, World!".repeat(100);
    let compressed = zstd::bulk::compress(&original, 3)?;
    
    // Bulk decompression: need to know size
    // Must know or guess the original size
    let decompressed_bulk = decompress(&compressed, original.len())?;
    
    // Streaming decompression: don't need size
    let mut decoder = Decoder::new(&compressed[..])?;
    let mut decompressed_stream = Vec::new();
    decoder.read_to_end(&mut decompressed_stream)?;
    
    // Streaming is useful when:
    // - Original size unknown
    // - Processing compressed stream
    // - Memory constrained
    
    assert_eq!(decompressed_bulk, original);
    assert_eq!(decompressed_stream, original);
    
    Ok(())
}

Streaming decompression doesn't require knowing the original size.

When to Use Bulk

use zstd::bulk::compress;
 
fn when_to_use_bulk() -> Result<(), Box<dyn std::error::Error>> {
    // Use bulk when:
    
    // 1. Data fits comfortably in memory
    let small_data = vec![0u8; 1024];
    let _ = compress(&small_data, 3)?;
    
    // 2. Simpler API preferred
    // Bulk: one function call
    // Stream: create encoder, write, finish
    
    // 3. Data size is known
    // Bulk needs complete input at start
    
    // 4. Maximum compression ratio desired
    // Bulk may compress slightly better
    
    // 5. Processing small to medium files
    // Typical threshold: < 100 MB
    // Depends on available memory
    
    // 6. In-memory data structures
    let config = serde_json::to_vec(&Config::default())?;
    let compressed_config = compress(&config, 3)?;
    
    Ok(())
}
 
#[derive(serde::Serialize)]
struct Config {
    setting: String,
}
 
impl Default for Config {
    fn default() -> Self {
        Config { setting: "default".to_string() }
    }
}

Use bulk for in-memory data, simple APIs, and better compression.

When to Use Streaming

use zstd::stream::Encoder;
use std::io::{Read, Write};
 
fn when_to_use_streaming() -> Result<(), Box<dyn std::error::Error>> {
    // Use streaming when:
    
    // 1. Data doesn't fit in memory
    // Processing 10 GB file on 4 GB machine
    
    // 2. Data arrives incrementally
    // Network streams, pipes, real-time data
    
    // 3. Memory must be bounded
    // Embedded systems, constrained environments
    
    // 4. Processing large files
    // File-to-file compression
    
    // 5. Pipelining compression
    // Compress while reading/writing
    
    // 6. Unknown data size
    // Streaming doesn't need size upfront
    
    // Example: Compress any size file
    fn compress_file<R: Read, W: Write>(
        reader: &mut R,
        writer: &mut W,
    ) -> Result<(), std::io::Error> {
        let mut encoder = Encoder::new(writer, 3)?;
        std::io::copy(reader, &mut encoder)?;
        encoder.finish()?;
        Ok(())
    }
    
    Ok(())
}

Use streaming for large files, network data, and bounded memory.

Compression Levels

use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
 
fn compression_levels() -> Result<(), Box<dyn std::error::Error>> {
    let data = b"Hello, World!".repeat(100);
    
    // Both APIs support compression levels 1-22
    // Level 1: fastest, lowest compression
    // Level 3: default balance
    // Level 22: slowest, highest compression
    
    // Bulk with different levels
    let fast = compress(&data, 1)?;
    let balanced = compress(&data, 3)?;
    let slow = compress(&data, 22)?;
    
    // Streaming with different levels
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 22)?;
    encoder.write_all(&data)?;
    encoder.finish()?;
    
    // Higher levels:
    // - Better compression ratio
    // - More memory usage during compression
    // - Slower compression time
    // - Same decompression speed
    
    println!("Level 1: {} bytes", fast.len());
    println!("Level 3: {} bytes", balanced.len());
    println!("Level 22: {} bytes", slow.len());
    
    Ok(())
}

Both APIs support the same compression level range.

Memory for Compression Levels

use zstd::stream::Encoder;
use std::io::Write;
 
fn level_memory() -> Result<(), Box<dyn std::error::Error>> {
    // Higher compression levels use more memory
    // This affects both bulk and streaming
    
    // Memory usage scales with level:
    // Level 1: ~few MB
    // Level 3: ~few MB (default)
    // Level 19: ~tens of MB
    // Level 22: ~hundreds of MB
    
    // Streaming still has bounded memory
    // But higher levels have larger working buffers
    
    let mut output = Vec::new();
    
    // High level needs more memory for window
    let mut encoder = Encoder::new(&mut output, 19)?;
    encoder.write_all(b"data")?;
    encoder.finish()?;
    
    // For streaming: memory bounded but larger
    // For bulk: memory scales with data size + level
    
    Ok(())
}

Higher compression levels use more memory in both APIs.

Error Handling Differences

use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
 
fn error_handling() -> Result<(), Box<dyn std::error::Error>> {
    // Bulk: single Result
    // Either entire compression succeeds or fails
    let data = b"test data";
    let compressed = compress(data, 3)?;
    
    // If compression fails, you get an error
    // No partial output
    
    // Streaming: can fail at any write
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 3)?;
    
    // Errors can occur during:
    // - Encoder creation (e.g., invalid level)
    // - write_all (e.g., writer error)
    // - finish (e.g., final block error)
    
    encoder.write_all(data)?;
    encoder.finish()?;
    
    // If write fails, you may have partial output
    // Handle each operation separately
    
    Ok(())
}

Bulk returns single result; streaming can fail at multiple points.

Streaming to Multiple Outputs

use zstd::stream::Encoder;
use std::io::Write;
 
fn multiple_outputs() -> Result<(), Box<dyn std::error::Error>> {
    // Streaming allows writing to various destinations
    
    // Write to Vec
    let mut vec_output = Vec::new();
    let mut encoder1 = Encoder::new(&mut vec_output, 3)?;
    encoder1.write_all(b"data")?;
    encoder1.finish()?;
    
    // Write to file
    let file = std::fs::File::create("output.zst")?;
    let mut encoder2 = Encoder::new(file, 3)?;
    encoder2.write_all(b"data")?;
    encoder2.finish()?;
    
    // Write to network (anything implementing Write)
    // let stream = TcpStream::connect("...")?;
    // let mut encoder3 = Encoder::new(stream, 3)?;
    // encoder3.write_all(b"data")?;
    // encoder3.finish()?;
    
    // Bulk output always goes to Vec
    // Then you can write Vec anywhere
    
    Ok(())
}

Streaming can write to any Write implementation.

Real-World Example: File Compression Utility

use zstd::stream::Encoder;
use std::fs::File;
use std::io::{BufReader, BufWriter};
use std::path::Path;
 
fn compress_file_streaming(
    input_path: &Path,
    output_path: &Path,
    level: i32,
) -> Result<(), Box<dyn std::error::Error>> {
    // Works for files of any size
    // Memory usage bounded by buffer sizes
    
    let input = File::open(input_path)?;
    let output = File::create(output_path)?;
    
    let mut reader = BufReader::new(input);
    let writer = BufWriter::new(output);
    
    let mut encoder = Encoder::new(writer, level)?;
    std::io::copy(&mut reader, &mut encoder)?;
    encoder.finish()?;
    
    Ok(())
}
 
fn compress_file_bulk(
    input_path: &Path,
    output_path: &Path,
    level: i32,
) -> Result<(), Box<dyn std::error::Error>> {
    // Simpler but loads entire file into memory
    
    let data = std::fs::read(input_path)?;
    let compressed = zstd::bulk::compress(&data, level)?;
    std::fs::write(output_path, &compressed)?;
    
    Ok(())
}
 
// Usage choice:
// - For small files: bulk (simpler)
// - For large files: streaming (bounded memory)
// - For unknown size: streaming
// - For network: streaming

Streaming is essential for production file compression utilities.

API Complexity Comparison

use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
 
fn api_comparison() -> Result<(), Box<dyn std::error::Error>> {
    let data = b"Hello, World!";
    
    // Bulk: Simple, one-shot
    let compressed = compress(data, 3)?;
    // One function call, returns Result<Vec<u8>>
    
    // Streaming: Multiple steps
    let mut output = Vec::new();
    let mut encoder = Encoder::new(&mut output, 3)?;  // Create
    encoder.write_all(data)?;                          // Write
    encoder.finish()?;                                  // Finalize
    // Three operations, output in separate buffer
    
    // Bulk is simpler but less flexible
    // Streaming is more complex but handles any size
    
    Ok(())
}

Bulk API is simpler; streaming requires multiple operations.

Synthesis

Key trade-offs:

| Aspect | bulk::compress | stream::Encoder |
|---|---|---|
| Memory usage | Entire data + output | Bounded (few MB) |
| Data size | Must fit in memory | Any size |
| API complexity | Simple (one call) | More complex (create/write/finish) |
| Compression ratio | Slightly better | Similar (with context) |
| Performance | Faster (less overhead) | Slower (more overhead) |
| Use case | In-memory data, small files | Large files, streams, bounded memory |

When to use each:

// Use bulk::compress when:
// - Data fits comfortably in memory
// - Simpler API is preferred
// - Slightly better compression ratio matters
// - Processing small files (< 100 MB typical)
// - In-memory structures (configs, database records)
 
let compressed = zstd::bulk::compress(&data, 3)?;
 
// Use stream::Encoder when:
// - Data doesn't fit in memory
// - Processing large files
// - Memory usage must be bounded
// - Data arrives incrementally (network, pipe)
// - Data size is unknown
 
let mut encoder = zstd::stream::Encoder::new(&mut output, 3)?;
encoder.write_all(&data)?;
encoder.finish()?;

Key insight: Both APIs produce valid zstd output and achieve similar compression ratios, since streaming maintains context across writes. The choice is primarily about memory management and API preference. bulk::compress is simpler and slightly faster for data that fits in memory, while stream::Encoder is necessary for arbitrarily large data or constrained memory environments. The streaming API's overhead is the price of bounded memory usage: it maintains compression context across chunks but must handle chunk boundaries and incremental output buffering. For production systems processing user data or files, streaming is the safer choice because it prevents out-of-memory errors when data size is unknown.