What are the trade-offs between zstd::stream::Encoder and block::Encoder for different compression scenarios?
stream::Encoder processes data incrementally through a streaming interface, maintaining internal state and buffers so it can handle arbitrary-length input. block::Encoder compresses self-contained chunks without persistent state, trading streaming flexibility for lower per-operation overhead and more predictable memory usage. The streaming encoder is ideal for unknown or large data sizes; the block encoder suits known-size data or scenarios requiring explicit control over compression boundaries.
Streaming vs Block Compression Models
use std::io::Write;
// Zstd provides two compression paradigms:
//
// Streaming (stream::Encoder):
// - Maintains internal state between operations
// - Processes data incrementally
// - Handles arbitrary-length input
// - Buffers data internally
// - Suitable for files, network streams, unknown sizes
//
// Block (block::Encoder):
// - Stateless per operation
// - Compresses complete chunks
// - No internal buffering beyond operation
// - Predictable memory usage
// - Suitable for known-size data, chunked protocols
// The key difference: streaming maintains context across writes;
// block compression is atomic per operation.
The streaming model treats compression as a continuous process; the block model treats it as discrete operations.
stream::Encoder: Incremental Compression
use std::io::{self, Write};
use zstd::stream::Encoder;
fn streaming_basics() -> io::Result<()> {
let mut output = Vec::new();
// Create a streaming encoder wrapping the output
let mut encoder = Encoder::new(&mut output, 3)?;
// Write data incrementally - encoder buffers internally
encoder.write_all(b"First chunk of data")?;
encoder.write_all(b"Second chunk of data")?;
encoder.write_all(b"Third chunk")?;
// Must call finish() to flush and finalize
// This consumes the encoder and returns the output writer
encoder.finish()?;
// output now contains compressed data
println!("Compressed {} bytes", output.len());
Ok(())
}
fn streaming_large_file() -> io::Result<()> {
// Streaming is ideal for large or unknown-size data
use std::fs::File;
let mut input = File::open("large_file.txt")?;
let output = File::create("large_file.txt.zst")?;
// Stream from file to file without loading it all into memory
let mut encoder = Encoder::new(output, 3)?;
// io::copy handles chunking internally with a fixed-size buffer
io::copy(&mut input, &mut encoder)?;
encoder.finish()?;
Ok(())
}
Streaming encoders maintain state across multiple write operations, enabling compression of arbitrarily large data.
block::Encoder: Atomic Compression
use zstd::block::Compressor;
// Note: the crate exposes the block API as Compressor/Decompressor;
// its operations return std::io::Result rather than a dedicated error type.
fn block_basics() -> std::io::Result<()> {
let data = b"Hello, World! This is a test of block compression.";
// Create a compressor with compression level
let mut compressor = Compressor::new();
// Compress entire data in one operation
let compressed = compressor.compress(data, 3)?;
println!("Original: {} bytes", data.len());
println!("Compressed: {} bytes", compressed.len());
// Each compress() call is independent
// No state carried between calls
Ok(())
}
fn multiple_blocks() -> std::io::Result<()> {
let mut compressor = Compressor::new();
// Each block is compressed independently
let block1 = b"First block of data";
let block2 = b"Second block of different data";
let compressed1 = compressor.compress(block1, 3)?;
let compressed2 = compressor.compress(block2, 3)?;
// Important: These are INDEPENDENT compressed blocks
// Decompression must know block boundaries
// No cross-block context for better compression
// Decompress each separately
let mut decompressor = zstd::block::Decompressor::new();
let decompressed1 = decompressor.decompress(&compressed1, block1.len())?;
let decompressed2 = decompressor.decompress(&compressed2, block2.len())?;
assert_eq!(decompressed1.as_slice(), block1);
assert_eq!(decompressed2.as_slice(), block2);
Ok(())
}
Block compression operates on complete chunks, with no context shared between operations.
Memory Usage Characteristics
// ┌────────────────────┬──────────────────────────┬─────────────────────────┐
// │ Aspect             │ stream::Encoder          │ block::Encoder          │
// ├────────────────────┼──────────────────────────┼─────────────────────────┤
// │ Internal buffers   │ Yes, maintains buffers   │ Minimal per-operation   │
// │ Memory overhead    │ O(window_size)           │ O(1) per operation      │
// │ Peak memory        │ Proportional to window   │ Proportional to block   │
// │ Predictability     │ Depends on flush pattern │ Exact per call          │
// │ Long-running state │ Persistent context       │ None                    │
// └────────────────────┴──────────────────────────┴─────────────────────────┘
fn memory_comparison() {
// Streaming encoder maintains:
// - Internal write buffer (accumulates until threshold)
// - Compression context (dictionary, tables)
// - Window buffer for back-references
// Memory usage grows with:
// - Window size (determines match distance)
// - Compression level (higher = more tables)
// - Pending unflushed data
// Block encoder maintains:
// - Only temporary buffers during compress()
// - No persistent state between calls
// Memory usage is:
// - Proportional to input block size
// - Freed immediately after compress()
}
fn configure_streaming_memory() -> std::io::Result<()> {
use zstd::stream::Encoder;
use zstd::stream::raw::CParameter::*;
// Control memory usage with parameters
let mut output = Vec::new();
let mut encoder = Encoder::new(&mut output, 3)?;
// Lower window size = less memory but worse compression
// encoder.set_parameter(WindowLog(15))?; // 32KB window
// Lower hash/log sizes for memory-constrained environments
// encoder.set_parameter(HashLog(12))?;
// encoder.set_parameter(ChainLog(12))?;
Ok(())
}
Streaming maintains persistent memory; block frees after each operation.
Compression Context and Efficiency
use std::io::Write;
use zstd::stream::Encoder;
use zstd::block::Compressor;
fn context_benefits() -> std::io::Result<()> {
// Streaming encoder maintains compression context
// This allows back-references across write calls
let mut stream_output = Vec::new();
let mut encoder = Encoder::new(&mut stream_output, 3)?;
// Data with repetition across chunks
let chunk1 = b"Hello World Hello World Hello ";
let chunk2 = b"World Hello World Hello World";
encoder.write_all(chunk1)?;
encoder.write_all(chunk2)?;
encoder.finish()?;
// Streaming can reference "Hello World" from chunk1
// when compressing chunk2, achieving better compression
// Block compression cannot do this:
let mut compressor = Compressor::new();
let compressed1 = compressor.compress(chunk1, 3)?;
let compressed2 = compressor.compress(chunk2, 3)?;
// Each block compressed independently
// chunk2 cannot reference chunk1's content
// Total compressed size is often larger with blocks
Ok(())
}
fn context_limitation() {
// Block compression limitation:
// If data contains repetition across blocks,
// each block must encode repeats independently
let repeated = b"abcabcabcabc"; // Pattern repeats
// Splitting into blocks:
let block1 = &repeated[..6]; // "abcabc"
let block2 = &repeated[6..]; // "abcabc"
// Block compression: both blocks compress similarly
// Streaming: second half references first half
// Streaming achieves better compression ratio when:
// - Data has patterns spanning chunk boundaries
// - Similar content appears throughout stream
}
Streaming maintains context for cross-chunk references; block compression cannot reference other blocks.
API and Usage Patterns
use std::io::{self, Write, Read};
use zstd::stream::{Encoder, Decoder};
fn stream_api_pattern() -> io::Result<()> {
// stream::Encoder implements io::Write
// This integrates with Rust's IO ecosystem
let mut output = Vec::new();
let mut encoder = Encoder::new(&mut output, 3)?;
// Write partial data
encoder.write_all(b"data")?;
// Flush to ensure data is written (but not finalized)
encoder.flush()?;
// Continue writing
encoder.write_all(b"more data")?;
// Finish consumes encoder, returns inner writer
let mut output = encoder.finish()?;
// Can now use output for other purposes
// Decompression is also streaming
let mut decoder = Decoder::new(&output[..])?;
let mut decompressed = Vec::new();
decoder.read_to_end(&mut decompressed)?;
Ok(())
}
fn block_api_pattern() -> std::io::Result<()> {
// Block API is simpler but less flexible
let mut compressor = zstd::block::Compressor::new();
// Must provide complete data upfront
let data = b"complete data to compress";
let compressed = compressor.compress(data, 3)?;
// Decompression requires knowing original size
let mut decompressor = zstd::block::Decompressor::new();
let decompressed = decompressor.decompress(&compressed, data.len())?;
// Note: decompress() needs the ORIGINAL (uncompressed) size
// This is a key difference from streaming decompression
Ok(())
}
fn streaming_read_pattern() -> io::Result<()> {
// Streaming supports io::Read for decompression
// (placeholder bytes here - real input must be valid zstd data,
// or construction/reading will return an error)
let compressed_data: &[u8] = b"some compressed data";
let mut decoder = Decoder::new(compressed_data)?;
// Can read incrementally
let mut buffer = [0u8; 1024];
let bytes_read = decoder.read(&mut buffer)?;
// Continue reading
let more_read = decoder.read(&mut buffer)?;
Ok(())
}
Streaming implements io::Write/io::Read; block uses simple compress/decompress methods.
When to Use Each Approach
// ┌─────────────────────────────────────────────────────────────────────┐
// │ Use stream::Encoder when:                                           │
// │ - Data size is unknown or very large                                │
// │ - Processing files or network streams                               │
// │ - Want integration with io::Write ecosystem                         │
// │ - Compression ratio matters (cross-chunk context)                   │
// │ - Data is naturally streaming (logs, events)                        │
// ├─────────────────────────────────────────────────────────────────────┤
// │ Use block::Encoder when:                                            │
// │ - Data size is known and bounded                                    │
// │ - Memory usage must be predictable                                  │
// │ - Need to compress independent chunks                               │
// │ - Want to control compression boundaries                            │
// │ - Implementing chunked protocols (size prefix + compressed data)    │
// │ - Parallel compression of multiple blocks                           │
// └─────────────────────────────────────────────────────────────────────┘
fn file_compression() -> std::io::Result<()> {
// File compression: Use streaming
// - Unknown or large size
// - Streaming read/write
// - Good compression ratio needed
use std::fs::File;
use zstd::stream::Encoder;
let mut input = File::open("input.txt")?;
let output = File::create("input.txt.zst")?;
let mut encoder = Encoder::new(output, 3)?;
// io::copy streams the whole file through the encoder in chunks
std::io::copy(&mut input, &mut encoder)?;
encoder.finish()?;
Ok(())
}
fn message_protocol() -> Result<(), Box<dyn std::error::Error>> {
// Message protocol: Use block
// - Known message sizes
// - Independent messages
// - Size prefix for framing
let messages: Vec<&[u8]> = vec![
b"Message one",
b"Message two",
b"Message three",
];
let mut compressor = zstd::block::Compressor::new();
let mut compressed_messages = Vec::new();
for msg in messages {
// Compress each message independently
let compressed = compressor.compress(msg, 3)?;
// Frame: [size: u32][compressed data]
compressed_messages.extend_from_slice(&(compressed.len() as u32).to_le_bytes());
compressed_messages.extend_from_slice(&compressed);
}
// Each message can be decompressed independently
// No context needed between messages
Ok(())
}
fn parallel_compression() -> std::io::Result<()> {
// Block compression enables parallelism
// Each block can be compressed independently
use std::thread;
let chunks: Vec<&[u8]> = vec![
b"chunk one data",
b"chunk two data",
b"chunk three data",
];
// Compress in parallel
let handles: Vec<_> = chunks
.into_iter()
.map(|chunk| {
thread::spawn(move || {
let mut compressor = zstd::block::Compressor::new();
compressor.compress(chunk, 3)
})
})
.collect();
let compressed: Vec<_> = handles
.into_iter()
.map(|h| h.join().unwrap())
.collect();
// All compressed independently
// Streaming cannot do this (sequential dependency)
Ok(())
}
Choose streaming for files and streams; choose block for protocols and parallelism.
Performance Characteristics
use std::io::Write;
use std::time::Instant;
fn performance_comparison() -> std::io::Result<()> {
let data = vec![0u8; 1_000_000]; // 1MB of zeros
// Block compression: simple, predictable
let start = Instant::now();
let mut compressor = zstd::block::Compressor::new();
let compressed_block = compressor.compress(&data, 3)?;
let block_duration = start.elapsed();
// Streaming compression: overhead for setup, context
let start = Instant::now();
let mut output = Vec::new();
let mut encoder = zstd::stream::Encoder::new(&mut output, 3)?;
encoder.write_all(&data)?;
encoder.finish()?;
let stream_duration = start.elapsed();
// For single-chunk known-size data:
// - Block is often faster (less overhead)
// - Similar compression ratio
// But streaming wins for:
// - Multiple small writes (block overhead per call)
// - Large data (block needs entire data in memory)
println!("Block: {:?}", block_duration);
println!("Stream: {:?}", stream_duration);
Ok(())
}
fn compression_ratio_comparison() -> std::io::Result<()> {
// Compression ratio depends on context sharing
let repeating_data: Vec<u8> = (0..1000)
.flat_map(|_| b"repeating pattern".iter().copied())
.collect();
// Split into chunks for comparison
let chunk_size = 5000;
// Block: each chunk compressed independently
let mut compressor = zstd::block::Compressor::new();
let block_total: usize = repeating_data
.chunks(chunk_size)
.map(|chunk| compressor.compress(chunk, 3).unwrap().len())
.sum();
// Stream: context shared across chunks
let mut stream_output = Vec::new();
let mut encoder = zstd::stream::Encoder::new(&mut stream_output, 3)?;
for chunk in repeating_data.chunks(chunk_size) {
encoder.write_all(chunk)?;
}
encoder.finish()?;
let stream_total = stream_output.len();
// Stream compression typically smaller because:
// - Later chunks can reference earlier content
// - Dictionary/context built across chunks
println!("Block total: {}", block_total);
println!("Stream total: {}", stream_total);
Ok(())
}
Block is faster for single chunks; streaming achieves better ratios for repeated patterns.
Practical Example: Chunked File Format
use std::io;
// A chunked file format using block compression
// Each chunk: [original_size: u32][compressed_size: u32][compressed_data]
const CHUNK_SIZE: usize = 64 * 1024; // 64KB chunks
fn compress_chunked(input: &[u8]) -> io::Result<Vec<u8>> {
let mut compressor = zstd::block::Compressor::new();
let mut output = Vec::new();
for chunk in input.chunks(CHUNK_SIZE) {
let compressed = compressor.compress(chunk, 3)?;
// Write header: original size, compressed size (little-endian u32s)
output.extend_from_slice(&(chunk.len() as u32).to_le_bytes());
output.extend_from_slice(&(compressed.len() as u32).to_le_bytes());
// Write compressed data
output.extend_from_slice(&compressed);
}
Ok(output)
}
fn decompress_chunked(input: &[u8]) -> io::Result<Vec<u8>> {
let mut decompressor = zstd::block::Decompressor::new();
let mut output = Vec::new();
let mut pos = 0;
while pos < input.len() {
// Read header
let original_size = u32::from_le_bytes([input[pos], input[pos+1], input[pos+2], input[pos+3]]) as usize;
pos += 4;
let compressed_size = u32::from_le_bytes([input[pos], input[pos+1], input[pos+2], input[pos+3]]) as usize;
pos += 4;
// Decompress
let compressed = &input[pos..pos + compressed_size];
pos += compressed_size;
let decompressed = decompressor.decompress(compressed, original_size)?;
output.extend_from_slice(&decompressed);
}
Ok(output)
}
// This chunked approach allows:
// - Parallel compression (each chunk independent)
// - Parallel decompression (each chunk independent)
// - Known memory bounds (chunk size limit)
// - Random access to chunks (with index)
Block compression enables chunked formats with parallelism and bounded memory.
Practical Example: Streaming Log Compression
use std::io::{self, Write};
use std::fs::File;
struct LogWriter {
// Encoder carries a dictionary lifetime parameter; 'static when no borrowed dictionary is used
encoder: zstd::stream::Encoder<'static, File>,
}
impl LogWriter {
fn new(path: &str) -> io::Result<Self> {
let file = File::create(path)?;
let encoder = zstd::stream::Encoder::new(file, 3)?;
Ok(Self { encoder })
}
fn write_log(&mut self, level: &str, message: &str) -> io::Result<()> {
// Log lines come incrementally - perfect for streaming
writeln!(self.encoder, "[{}] {}", level, message)
}
fn close(self) -> io::Result<()> {
self.encoder.finish()?;
Ok(())
}
}
fn logging_example() -> io::Result<()> {
let mut logger = LogWriter::new("app.log.zst")?;
// Streaming handles this incremental writing naturally
// Block compression would need to buffer or split logs
logger.write_log("INFO", "Application started")?;
logger.write_log("DEBUG", "Processing request")?;
logger.write_log("INFO", "Request completed")?;
logger.write_log("WARN", "Cache miss for key")?;
logger.write_log("ERROR", "Database connection failed")?;
logger.close()?;
Ok(())
}
Streaming compression suits incremental data like logs; block requires complete chunks.
Complete Summary
use zstd::stream::Encoder;
use zstd::block::Compressor;
fn complete_summary() {
// ┌────────────────────┬──────────────────────────┬─────────────────────────┐
// │ Aspect             │ stream::Encoder          │ block::Encoder          │
// ├────────────────────┼──────────────────────────┼─────────────────────────┤
// │ API style          │ io::Write, streaming     │ compress/decompress     │
// │ Input requirement  │ Incremental writes       │ Complete buffer         │
// │ Output access      │ After finish()           │ Immediate               │
// │ Memory pattern     │ Persistent context       │ Freed per operation     │
// │ Context sharing    │ Across writes            │ None                    │
// │ Compression ratio  │ Better (context)         │ Good per chunk          │
// │ Parallelism        │ Sequential only          │ Parallel possible       │
// │ Use case           │ Streams, files           │ Messages, chunks        │
// └────────────────────┴──────────────────────────┴─────────────────────────┘
// Choose stream::Encoder when:
// 1. Unknown or unbounded data size
// 2. Processing files, network streams
// 3. Want best compression ratio
// 4. Data naturally streams (logs, events)
// 5. Integration with io::Write ecosystem
// Choose block::Encoder when:
// 1. Known, bounded data size
// 2. Memory must be predictable
// 3. Need independent chunks
// 4. Want parallel compression
// 5. Implementing message protocols
// 6. Need explicit control over boundaries
}
// Key insight:
// stream::Encoder is for continuous data where context and incremental
// processing matterβfiles, streams, logs. It maintains state between
// writes, enabling cross-chunk references for better compression.
// block::Encoder is for discrete data where independence and control
// matterβmessages, chunks, fixed-size records. It has no persistent
// state, enabling parallel compression and predictable memory usage.
// The streaming API integrates with io::Write; the block API is simpler
// but requires knowing the uncompressed size for decompression.
// Choose streaming for compression ratio and integration; choose block
// for parallelism and predictability.
Key insight: stream::Encoder maintains compression context across writes, achieving better ratios for data with patterns spanning chunk boundaries, while block::Encoder compresses each chunk independently, enabling parallelism and predictable memory. Streaming is the right choice for files, logs, and network streams where data arrives incrementally and compression ratio matters. Block compression suits message protocols, chunked formats, and scenarios requiring parallel compression or explicit control over compression boundaries. The API difference reflects this: streaming implements io::Write for incremental writing, while block requires complete input upfront and returns complete output immediately.
