What are the trade-offs between zstd::bulk and zstd::stream compression APIs for different use cases?

The zstd::bulk API compresses entire buffers in memory with optimal compression ratios and simpler code, while zstd::stream compresses data incrementally through a streaming interface with constant memory overhead regardless of input size. Use bulk when you have the entire input available and it fits comfortably in memory—this yields the best compression ratios because zstd can analyze the full dataset for patterns. Use stream when processing large files, network data, or any scenario where holding the entire input or output in memory is impractical. The streaming API maintains compression state across chunks, so you get proper zstd compression with chunk-by-chunk processing.

Bulk Compression: Simple and Optimal

use zstd::bulk::{compress, decompress};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Input with internal repetition so compression has something to exploit.
    let data = b"Hello, World! This is some text to compress. \
                 Hello, World! This is some text to compress.";

    // Bulk compression hands the whole buffer to zstd in one call.
    let compressed = compress(data, 3)?; // level 3 is zstd's default trade-off

    println!("Original: {} bytes", data.len());
    println!("Compressed: {} bytes", compressed.len());

    // Bulk decompression needs a capacity bound for the output buffer,
    // so the entire result also lives in memory.
    let restored = decompress(&compressed, data.len())?;

    assert_eq!(restored, data.to_vec());

    Ok(())
}

bulk requires all data in memory but provides optimal compression with a simple API.

Stream Compression: Constant Memory

use zstd::stream::{Compressor, Decompressor};
use std::io::{self, Read, Cursor};
 
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = b"Hello, World! This is some text to compress. \
                 Hello, World! This is some text to compress.";
    
    // Streaming compression with output buffer
    let mut output = Vec::new();
    {
        let mut compressor = Compressor::new(&mut output, 3)?;
        
        // Can compress in chunks
        compressor.compress(data)?;
        
        // Must finish to write final data
        compressor.finish()?;
    }
    
    println!("Original: {} bytes", data.len());
    println!("Compressed: {} bytes", output.len());
    
    // Streaming decompression
    let mut decompressed = Vec::new();
    {
        let input = Cursor::new(&output);
        let mut decompressor = Decompressor::new(input)?;
        
        // Read decompressed data in chunks
        decompressor.decompress(&mut decompressed, data.len())?;
    }
    
    assert_eq!(data.to_vec(), decompressed);
    
    Ok(())
}

stream processes data incrementally with fixed memory overhead regardless of data size.

Memory Usage Comparison

use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Simulate a large dataset: 10 MB of cycling byte values.
    let large_data: Vec<u8> = (0..=255).cycle().take(10_000_000).collect();

    // Bulk: requires entire input AND output in memory
    // Memory = input (10 MB) + compressed output (~few MB) + compression context
    let compressed_bulk = compress(&large_data, 3)?;
    println!("Bulk compressed: {} bytes", compressed_bulk.len());
    // Peak memory: ~15+ MB

    // Stream: requires only fixed buffers + compression context
    // Memory = context (few MB) + small buffers
    let mut stream_output = Vec::new();
    {
        let mut encoder = Encoder::new(&mut stream_output, 3)?;

        // Process in 64 KB chunks; state carries across writes.
        for chunk in large_data.chunks(65536) {
            encoder.write_all(chunk)?;
        }
        encoder.finish()?;
    }

    println!("Stream compressed: {} bytes", stream_output.len());

    // Both produce similar compression,
    // but streaming keeps a constant working set regardless of input size.
    // (Here the whole input is in a Vec anyway; in real use it would arrive
    // chunk by chunk from a file or socket.)

    Ok(())
}

stream uses O(1) memory relative to input; bulk uses O(n).

Streaming File Compression

use zstd::stream::Compressor;
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
 
fn compress_file(input_path: &str, output_path: &str, level: i32) -> io::Result<()> {
    let input = File::open(input_path)?;
    let output = File::create(output_path)?;
    
    let reader = BufReader::new(input);
    let writer = BufWriter::new(output);
    
    let mut compressor = Compressor::new(writer, level)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    
    // Stream through file without loading entire file
    let mut buffer = [0u8; 65536];  // 64 KB buffer
    let mut total_read = 0;
    
    // Can't use BufReader directly with Compressor::compress
    // Need to read chunks manually
    for chunk in reader.bytes().chunks(65536) {
        let chunk: Result<Vec<_>, _> = chunk.collect();
        let chunk = chunk?;
        compressor.compress(&chunk)?;
        total_read += chunk.len();
    }
    
    compressor.finish()
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    
    Ok(())
}
 
fn main() -> io::Result<()> {
    // Compress a large file with constant memory
    compress_file("large_input.txt", "compressed.zst", 3)?;
    
    // This works for files of any size
    // Bulk API would require loading entire file into memory first
    
    Ok(())
}

Streaming enables processing files larger than available memory.

Compression Level Trade-offs

use zstd::bulk::compress;
 
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = b"Repeated pattern: "].repeat(1000);
    let data = data.concat();
    
    // Test different compression levels
    for level in 1..=5 {
        let compressed = compress(&data, level)?;
        let ratio = compressed.len() as f64 / data.len() as f64;
        println!(
            "Level {}: {} bytes -> {} bytes ({:.1}% of original)",
            level,
            data.len(),
            compressed.len(),
            ratio * 100.0
        );
    }
    
    // Level 1: Fastest compression, larger output
    // Level 3: Default, good balance
    // Level 19+: Maximum compression, slowest
    
    // Higher levels:
    // - Better compression ratio
    // - Slower compression time
    // - Same decompression speed
    // - More memory during compression
    
    Ok(())
}

Compression level affects ratio, speed, and memory use during compression.

Streaming with Read/Write Traits

use zstd::stream::{Compressor, Decompressor};
use std::io::{self, Read, Write, Cursor};
 
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = b"Hello, streaming world!";
    
    // Compress to in-memory buffer
    let mut compressed = Vec::new();
    {
        let mut compressor = Compressor::new(&mut compressed, 3)?;
        compressor.compress(data)?;
        compressor.finish()?;
    }
    
    // Decompress by reading from stream
    let mut decompressed = Vec::new();
    {
        let cursor = Cursor::new(&compressed);
        let mut decompressor = Decompressor::new(cursor)?;
        
        // Read decompressed data
        decompressor.read_to_end(&mut decompressed)?;
    }
    
    assert_eq!(data.to_vec(), decompressed);
    
    Ok(())
}

The streaming decoder implements the standard Read trait, enabling standard I/O patterns.

Chaining with Other Stream Processors

use zstd::stream::Compressor;
use std::io::{self, Cursor};
 
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = b"Process through multiple stages";
    
    // First stage: zstd compression
    let mut compressed = Vec::new();
    {
        let mut compressor = Compressor::new(&mut compressed, 3)?;
        compressor.compress(data)?;
        compressor.finish()?;
    }
    
    // Second stage: could chain with encryption, base64, etc.
    let base64_encoded = base64::engine::general_purpose::STANDARD.encode(&compressed);
    
    println!("Encoded length: {} bytes", base64_encoded.len());
    
    // Processing pipeline:
    // data -> compress -> encode -> transmit
    // receive -> decode -> decompress -> data
    
    // Decode
    let compressed = base64::engine::general_purpose::STANDARD.decode(&base64_encoded)?;
    
    // Decompress
    use zstd::stream::Decompressor;
    let cursor = Cursor::new(&compressed);
    let mut decompressor = Decompressor::new(cursor)?;
    let mut decompressed = Vec::new();
    decompressor.read_to_end(&mut decompressed)?;
    
    assert_eq!(data.to_vec(), decompressed);
    
    Ok(())
}

Streaming integrates with other data processing stages.

Decompression with Unknown Size

use zstd::bulk::compress;
use zstd::stream::Decoder;
use std::io::{Cursor, Read};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let original = b"Data to compress for size test";
    let compressed = compress(original, 3)?;

    // Bulk decompression requires a capacity bound for the output buffer,
    // which is problematic if you don't know the original size.

    // Stream decompression doesn't require knowing the size upfront:
    // Decoder implements Read, so the output grows as bytes are produced.
    let mut decoder = Decoder::new(Cursor::new(&compressed))?;

    let mut decompressed = Vec::new();
    decoder.read_to_end(&mut decompressed)?;

    assert_eq!(original.to_vec(), decompressed);

    // Useful when:
    // - Original size is unknown
    // - Maximum size is known but actual size varies
    // - Processing untrusted compressed data

    Ok(())
}

Streaming decompression doesn't require pre-declaring output size.

Compression Context Reuse

use zstd::bulk::{Compressor, Decompressor};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A bulk Compressor owns a reusable compression context; the level is
    // fixed at construction, not passed per call. Reuse avoids re-allocating
    // the context for every operation.
    let mut compressor = Compressor::new(3)?;

    let data1 = b"First dataset";
    let data2 = b"Second dataset";

    // Compress multiple datasets reusing the same context.
    let compressed1 = compressor.compress(data1)?;
    let compressed2 = compressor.compress(data2)?;

    // Similarly for decompression; the capacity argument bounds the output.
    let mut decompressor = Decompressor::new()?;

    let decompressed1 = decompressor.decompress(&compressed1, data1.len())?;
    let decompressed2 = decompressor.decompress(&compressed2, data2.len())?;

    assert_eq!(data1.to_vec(), decompressed1);
    assert_eq!(data2.to_vec(), decompressed2);

    Ok(())
}

Bulk Compressor/Decompressor types allow reusing context across operations.

When to Use Each Approach

// Use zstd::bulk when:
// 1. Entire input fits in memory
// 2. You want simplest possible API
// 3. Maximum compression ratio matters
// 4. Processing speed is priority over memory
 
// Use zstd::stream when:
// 1. Input is large or streaming
// 2. Memory is constrained
// 3. Processing files that don't fit in RAM
// 4. Network data without buffering entire stream
// 5. Integrating with other streaming components
 
// Example: In-memory cache entry
fn compress_cache_entry(data: &[u8]) -> Vec<u8> {
    // Data is in memory anyway, use bulk
    zstd::bulk::compress(data, 3).unwrap()
}
 
// Example: File archiver
fn compress_file_streaming(input: impl Read, output: impl Write) -> io::Result<()> {
    // File may be large, use streaming
    let mut compressor = zstd::stream::Compressor::new(output, 3)
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    
    let mut buffer = [0u8; 65536];
    // Process file in chunks...
    
    compressor.finish()
        .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
    Ok(())
}

Choose based on data size, memory constraints, and integration requirements.

Performance Characteristics

use zstd::bulk::compress;
use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1 MB of cycling byte values as the benchmark input.
    let data: Vec<u8> = (0..=255).cycle().take(1_000_000).collect();

    // Time one bulk compression per level and report size and duration.
    for level in [1, 3, 9, 19] {
        let started = Instant::now();
        let compressed = compress(&data, level)?;
        let elapsed = started.elapsed();

        let percent_of_original = compressed.len() as f64 / data.len() as f64 * 100.0;
        println!(
            "Level {:2}: {:7} bytes ({:5.1}% of original) in {:?}",
            level,
            compressed.len(),
            percent_of_original,
            elapsed
        );
    }

    // Higher levels:
    // - Take longer to compress
    // - Produce smaller output
    // - Same decompression speed
    // - Use more memory during compression

    Ok(())
}

Compression level trades time and memory for compression ratio.

Error Handling

use zstd::bulk::compress;
use zstd::stream::Compressor;
use std::io::Cursor;
 
fn main() {
    // Bulk compression errors
    match compress(b"test", 999) {  // Invalid level
        Ok(_) => println!("Success"),
        Err(e) => println!("Error: {}", e),
    }
    
    // Streaming compression errors
    let result: Result<(), _> = {
        let output = Vec::new();
        let mut compressor = Compressor::new(output, 3).unwrap();
        
        // Errors propagate through stream
        compressor.compress(b"data").unwrap();
        compressor.finish()
    };
    
    // Both APIs return Result with zstd::Error
    // Common errors:
    // - Invalid compression level
    // - Corrupted input (decompression)
    // - Insufficient output buffer (bulk decompression)
}

Both APIs return Result with descriptive error types.

Real-World Example: File Archiver

use zstd::stream::Encoder;
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Stream-compress the file at `input_path` into a zstd frame at
/// `output_path` with the given compression `level`.
///
/// Memory use is bounded by one 64 KB chunk buffer plus the zstd
/// compression context, independent of file size.
fn compress_file<P: AsRef<Path>>(
    input_path: P,
    output_path: P,
    level: i32,
) -> io::Result<()> {
    let input = File::open(input_path)?;
    let output = File::create(output_path)?;

    // Use buffering for I/O efficiency.
    let mut reader = BufReader::new(input);
    let writer = BufWriter::new(output);

    // Stream compression; Encoder already reports io::Error, so no
    // error-type conversion is needed.
    let mut encoder = Encoder::new(writer, level)?;

    let mut buffer = vec![0u8; 65536];

    loop {
        let bytes_read = reader.read(&mut buffer)?;
        if bytes_read == 0 {
            break; // EOF
        }
        encoder.write_all(&buffer[..bytes_read])?;
    }

    // finish() writes the frame epilogue and hands back the BufWriter;
    // flush it explicitly so buffered bytes reach the file and errors
    // aren't swallowed by Drop.
    encoder.finish()?.flush()?;

    Ok(())
}

fn main() -> io::Result<()> {
    // Compress a large file
    compress_file("large_input.txt", "compressed.zst", 3)?;

    // Memory usage: ~64 KB buffer + compression context
    // Works for files of any size

    Ok(())
}

Streaming compression handles arbitrarily large files with bounded memory.

Synthesis

Quick reference:

// Bulk API: Simple, optimal compression
use zstd::bulk::{compress, decompress};

let data = b"input data";
let compressed = compress(data, 3)?;
let decompressed = decompress(&compressed, data.len())?; // capacity bound required

// Use bulk when:
// - Data fits in memory
// - Want simplest API
// - Best compression ratio
// - Known input/output sizes

// Stream API: Constant memory, streaming
use zstd::stream::{Decoder, Encoder};
use std::io::Write;

let mut output = Vec::new();
let mut encoder = Encoder::new(&mut output, 3)?;
encoder.write_all(data)?;
encoder.finish()?;

// Use stream when:
// - Large files or streaming data
// - Memory constrained
// - Integrating with other I/O
// - Unknown output size (decompression)

Key insight: The bulk and stream APIs serve different needs along the memory vs. simplicity trade-off. bulk is simpler and gives marginally better compression ratios because zstd can see the entire dataset at once, but requires all data in memory. stream processes data incrementally with fixed memory overhead, making it essential for large files or streaming data, while still maintaining proper zstd compression across chunk boundaries. For most applications, if the data comfortably fits in memory, use bulk for its simpler API. If you're processing files, network streams, or data larger than a few hundred megabytes, use stream to keep memory usage bounded.