What are the trade-offs between zstd::stream::Encoder and block::Encoder for different compression scenarios?
stream::Encoder processes data incrementally through a streaming interface, maintaining internal state and buffers so it can handle arbitrary-length input. block::Encoder compresses self-contained chunks without persistent state, trading streaming flexibility for lower per-operation overhead and more predictable memory usage. The streaming encoder is ideal for unknown or large data sizes; the block encoder suits known-size data or scenarios requiring explicit control over compression boundaries.
Streaming vs Block Compression Models
use std::io::Write;
// Zstd provides two compression paradigms:
//
// Streaming (stream::Encoder):
// - Maintains internal state between operations
// - Processes data incrementally
// - Handles arbitrary-length input
// - Buffers data internally
// - Suitable for files, network streams, unknown sizes
//
// Block (block::Encoder):
// - Stateless per operation
// - Compresses complete chunks
// - No internal buffering beyond operation
// - Predictable memory usage
// - Suitable for known-size data, chunked protocols
// The key difference: streaming maintains context across writes;
// block compression is atomic per operation.
The streaming model treats compression as a continuous process; the block model treats it as discrete operations.
stream::Encoder: Incremental Compression
use std::io::{self, Write};
use zstd::stream::Encoder;
fn streaming_basics() -> io::Result<()> {
let mut output = Vec::new();
// Create a streaming encoder wrapping the output
let mut encoder = Encoder::new(&mut output, 3)?;
// Write data incrementally - encoder buffers internally
encoder.write_all(b"First chunk of data")?;
encoder.write_all(b"Second chunk of data")?;
encoder.write_all(b"Third chunk")?;
// Must call finish() to flush and finalize
// This consumes the encoder and returns the output writer
encoder.finish()?;
// output now contains compressed data
println!("Compressed {} bytes", output.len());
Ok(())
}
fn streaming_large_file() -> io::Result<()> {
// Streaming is ideal for large or unknown-size data
use std::fs::File;
let mut input = File::open("large_file.txt")?;
let output = File::create("large_file.txt.zst")?;
// Stream from file to file without loading it all into memory
let mut encoder = Encoder::new(output, 3)?;
// io::copy handles chunking internally with a fixed-size buffer
io::copy(&mut input, &mut encoder)?;
encoder.finish()?;
Ok(())
}
Streaming encoders maintain state across multiple write operations, enabling compression of arbitrarily large data.
block::Encoder: Atomic Compression
use zstd::block::Compressor;
// Note: the crate exposes the block API as Compressor/Decompressor;
// its operations return std::io::Result rather than a dedicated error type.
fn block_basics() -> std::io::Result<()> {
let data = b"Hello, World! This is a test of block compression.";
// Create a compressor with compression level
let mut compressor = Compressor::new();
// Compress entire data in one operation
let compressed = compressor.compress(data, 3)?;
println!("Original: {} bytes", data.len());
println!("Compressed: {} bytes", compressed.len());
// Each compress() call is independent
// No state carried between calls
Ok(())
}
fn multiple_blocks() -> std::io::Result<()> {
let mut compressor = Compressor::new();
// Each block is compressed independently
let block1 = b"First block of data";
let block2 = b"Second block of different data";
let compressed1 = compressor.compress(block1, 3)?;
let compressed2 = compressor.compress(block2, 3)?;
// Important: These are INDEPENDENT compressed blocks
// Decompression must know block boundaries
// No cross-block context for better compression
// Decompress each separately
let mut decompressor = zstd::block::Decompressor::new();
let decompressed1 = decompressor.decompress(&compressed1, block1.len())?;
let decompressed2 = decompressor.decompress(&compressed2, block2.len())?;
assert_eq!(decompressed1.as_slice(), block1);
assert_eq!(decompressed2.as_slice(), block2);
Ok(())
}
Block compression operates on complete chunks, with no context shared between operations.
Memory Usage Characteristics
// ┌────────────────────┬──────────────────────────┬─────────────────────────┐
// │ Aspect             │ stream::Encoder          │ block::Encoder          │
// ├────────────────────┼──────────────────────────┼─────────────────────────┤
// │ Internal buffers   │ Yes, maintains buffers   │ Minimal per-operation   │
// │ Memory overhead    │ O(window_size)           │ O(1) per operation      │
// │ Peak memory        │ Proportional to window   │ Proportional to block   │
// │ Predictability     │ Depends on flush pattern │ Exact per call          │
// │ Long-running state │ Persistent context       │ None                    │
// └────────────────────┴──────────────────────────┴─────────────────────────┘
fn memory_comparison() {
// Streaming encoder maintains:
// - Internal write buffer (accumulates until threshold)
// - Compression context (dictionary, tables)
// - Window buffer for back-references
// Memory usage grows with:
// - Window size (determines match distance)
// - Compression level (higher = more tables)
// - Pending unflushed data
// Block encoder maintains:
// - Only temporary buffers during compress()
// - No persistent state between calls
// Memory usage is:
// - Proportional to input block size
// - Freed immediately after compress()
}
fn configure_streaming_memory() -> std::io::Result<()> {
use zstd::stream::Encoder;
use zstd::stream::raw::CParameter::*;
// Control memory usage with parameters
let mut output = Vec::new();
let mut encoder = Encoder::new(&mut output, 3)?;
// Lower window size = less memory but worse compression
// encoder.set_parameter(WindowLog(15))?; // 32KB window
// Lower hash/log sizes for memory-constrained environments
// encoder.set_parameter(HashLog(12))?;
// encoder.set_parameter(ChainLog(12))?;
Ok(())
}
Streaming maintains persistent memory; block frees after each operation.
Compression Context and Efficiency
use std::io::Write;
use zstd::stream::Encoder;
use zstd::block::Compressor;
fn context_benefits() -> std::io::Result<()> {
// Streaming encoder maintains compression context
// This allows back-references across write calls
let mut stream_output = Vec::new();
let mut encoder = Encoder::new(&mut stream_output, 3)?;
// Data with repetition across chunks
let chunk1 = b"Hello World Hello World Hello ";
let chunk2 = b"World Hello World Hello World";
encoder.write_all(chunk1)?;
encoder.write_all(chunk2)?;
encoder.finish()?;
// Streaming can reference "Hello World" from chunk1
// when compressing chunk2, achieving better compression
// Block compression cannot do this:
let mut compressor = Compressor::new();
let compressed1 = compressor.compress(chunk1, 3)?;
let compressed2 = compressor.compress(chunk2, 3)?;
// Each block compressed independently
// chunk2 cannot reference chunk1's content
// Total compressed size is often larger with blocks
Ok(())
}
fn context_limitation() {
// Block compression limitation:
// If data contains repetition across blocks,
// each block must encode repeats independently
let repeated = b"abcabcabcabc"; // Pattern repeats
// Splitting into blocks:
let block1 = &repeated[..6]; // "abcabc"
let block2 = &repeated[6..]; // "abcabc"
// Block compression: both blocks compress similarly
// Streaming: second half references first half
// Streaming achieves better compression ratio when:
// - Data has patterns spanning chunk boundaries
// - Similar content appears throughout stream
}
Streaming maintains context for cross-chunk references; block compression cannot reference other blocks.
API and Usage Patterns
use std::io::{self, Write, Read};
use zstd::stream::{Encoder, Decoder};
fn stream_api_pattern() -> io::Result<()> {
// stream::Encoder implements io::Write
// This integrates with Rust's IO ecosystem
let mut output = Vec::new();
let mut encoder = Encoder::new(&mut output, 3)?;
// Write partial data
encoder.write_all(b"data")?;
// Flush to ensure data is written (but not finalized)
encoder.flush()?;
// Continue writing
encoder.write_all(b"more data")?;
// Finish consumes encoder, returns inner writer
let mut output = encoder.finish()?;
// Can now use output for other purposes
// Decompression is also streaming
let mut decoder = Decoder::new(&output[..])?;
let mut decompressed = Vec::new();
decoder.read_to_end(&mut decompressed)?;
Ok(())
}
fn block_api_pattern() -> std::io::Result<()> {
// Block API is simpler but less flexible
let mut compressor = zstd::block::Compressor::new();
// Must provide complete data upfront
let data = b"complete data to compress";
let compressed = compressor.compress(data, 3)?;
// Decompression requires knowing original size
let mut decompressor = zstd::block::Decompressor::new();
let decompressed = decompressor.decompress(&compressed, data.len())?;
// Note: decompress() needs the ORIGINAL (uncompressed) size
// This is a key difference from streaming decompression
Ok(())
}
fn streaming_read_pattern() -> io::Result<()> {
// Streaming supports io::Read for decompression
// (placeholder bytes here - real input must be valid zstd data,
// or construction/reading will return an error)
let compressed_data: &[u8] = b"some compressed data";
let mut decoder = Decoder::new(compressed_data)?;
// Can read incrementally
let mut buffer = [0u8; 1024];
let bytes_read = decoder.read(&mut buffer)?;
// Continue reading
let more_read = decoder.read(&mut buffer)?;
Ok(())
}
Streaming implements io::Write/io::Read; block uses simple compress/decompress methods.
When to Use Each Approach
// ┌─────────────────────────────────────────────────────────────────────┐
// │ Use stream::Encoder when:                                           │
// │ - Data size is unknown or very large                                │
// │ - Processing files or network streams                               │
// │ - Want integration with io::Write ecosystem                         │
// │ - Compression ratio matters (cross-chunk context)                   │
// │ - Data is naturally streaming (logs, events)                        │
// ├─────────────────────────────────────────────────────────────────────┤
// │ Use block::Encoder when:                                            │
// │ - Data size is known and bounded                                    │
// │ - Memory usage must be predictable                                  │
// │ - Need to compress independent chunks                               │
// │ - Want to control compression boundaries                            │
// │ - Implementing chunked protocols (size prefix + compressed data)    │
// │ - Parallel compression of multiple blocks                           │
// └─────────────────────────────────────────────────────────────────────┘
fn file_compression() -> std::io::Result<()> {
// File compression: Use streaming
// - Unknown or large size
// - Streaming read/write
// - Good compression ratio needed
use std::fs::File;
use zstd::stream::Encoder;
let mut input = File::open("input.txt")?;
let output = File::create("input.txt.zst")?;
let mut encoder = Encoder::new(output, 3)?;
// io::copy streams the whole file through the encoder in chunks
std::io::copy(&mut input, &mut encoder)?;
encoder.finish()?;
Ok(())
}
fn message_protocol() -> Result<(), Box<dyn std::error::Error>> {
// Message protocol: Use block
// - Known message sizes
// - Independent messages
// - Size prefix for framing
let messages: Vec<&[u8]> = vec![
b"Message one",
b"Message two",
b"Message three",
];
let mut compressor = zstd::block::Compressor::new();
let mut compressed_messages = Vec::new();
for msg in messages {
// Compress each message independently
let compressed = compressor.compress(msg, 3)?;
// Frame: [size: u32][compressed data]
compressed_messages.extend_from_slice(&(compressed.len() as u32).to_le_bytes());
compressed_messages.extend_from_slice(&compressed);
}
// Each message can be decompressed independently
// No context needed between messages
Ok(())
}
fn parallel_compression() -> std::io::Result<()> {
// Block compression enables parallelism
// Each block can be compressed independently
use std::thread;
let chunks: Vec<&[u8]> = vec![
b"chunk one data",
b"chunk two data",
b"chunk three data",
];
// Compress in parallel
let handles: Vec<_> = chunks
.into_iter()
.map(|chunk| {
thread::spawn(move || {
let mut compressor = zstd::block::Compressor::new();
compressor.compress(chunk, 3)
})
})
.collect();
let compressed: Vec<_> = handles
.into_iter()
.map(|h| h.join().unwrap())
.collect();
// All compressed independently
// Streaming cannot do this (sequential dependency)
Ok(())
}
Choose streaming for files and streams; choose block for protocols and parallelism.
Performance Characteristics
use std::io::Write;
use std::time::Instant;
fn performance_comparison() -> std::io::Result<()> {
let data = vec![0u8; 1_000_000]; // 1MB of zeros
// Block compression: simple, predictable
let start = Instant::now();
let mut compressor = zstd::block::Compressor::new();
let compressed_block = compressor.compress(&data, 3)?;
let block_duration = start.elapsed();
// Streaming compression: overhead for setup, context
let start = Instant::now();
let mut output = Vec::new();
let mut encoder = zstd::stream::Encoder::new(&mut output, 3)?;
encoder.write_all(&data)?;
encoder.finish()?;
let stream_duration = start.elapsed();
// For single-chunk known-size data:
// - Block is often faster (less overhead)
// - Similar compression ratio
// But streaming wins for:
// - Multiple small writes (block overhead per call)
// - Large data (block needs entire data in memory)
println!("Block: {:?}", block_duration);
println!("Stream: {:?}", stream_duration);
Ok(())
}
fn compression_ratio_comparison() -> std::io::Result<()> {
// Compression ratio depends on context sharing
let repeating_data: Vec<u8> = (0..1000)
.flat_map(|_| b"repeating pattern".iter().copied())
.collect();
// Split into chunks for comparison
let chunk_size = 5000;
// Block: each chunk compressed independently
let mut compressor = zstd::block::Compressor::new();
let block_total: usize = repeating_data
.chunks(chunk_size)
.map(|chunk| compressor.compress(chunk, 3).unwrap().len())
.sum();
// Stream: context shared across chunks
let mut stream_output = Vec::new();
let mut encoder = zstd::stream::Encoder::new(&mut stream_output, 3)?;
for chunk in repeating_data.chunks(chunk_size) {
encoder.write_all(chunk)?;
}
encoder.finish()?;
let stream_total = stream_output.len();
// Stream compression typically smaller because:
// - Later chunks can reference earlier content
// - Dictionary/context built across chunks
println!("Block total: {}", block_total);
println!("Stream total: {}", stream_total);
Ok(())
}
Block is faster for single chunks; streaming achieves better ratios for repeated patterns.
Practical Example: Chunked File Format
use std::io;
// A chunked file format using block compression
// Each chunk: [original_size: u32][compressed_size: u32][compressed_data]
const CHUNK_SIZE: usize = 64 * 1024; // 64KB chunks
fn compress_chunked(input: &[u8]) -> io::Result<Vec<u8>> {
let mut compressor = zstd::block::Compressor::new();
let mut output = Vec::new();
for chunk in input.chunks(CHUNK_SIZE) {
let compressed = compressor.compress(chunk, 3)?;
// Write header: original size, compressed size (little-endian u32s)
output.extend_from_slice(&(chunk.len() as u32).to_le_bytes());
output.extend_from_slice(&(compressed.len() as u32).to_le_bytes());
// Write compressed data
output.extend_from_slice(&compressed);
}
Ok(output)
}
fn decompress_chunked(input: &[u8]) -> io::Result<Vec<u8>> {
let mut decompressor = zstd::block::Decompressor::new();
let mut output = Vec::new();
let mut pos = 0;
while pos < input.len() {
// Read header
let original_size = u32::from_le_bytes([input[pos], input[pos+1], input[pos+2], input[pos+3]]) as usize;
pos += 4;
let compressed_size = u32::from_le_bytes([input[pos], input[pos+1], input[pos+2], input[pos+3]]) as usize;
pos += 4;
// Decompress
let compressed = &input[pos..pos + compressed_size];
pos += compressed_size;
let decompressed = decompressor.decompress(compressed, original_size)?;
output.extend_from_slice(&decompressed);
}
Ok(output)
}
// This chunked approach allows:
// - Parallel compression (each chunk independent)
// - Parallel decompression (each chunk independent)
// - Known memory bounds (chunk size limit)
// - Random access to chunks (with index)
Block compression enables chunked formats with parallelism and bounded memory.
Practical Example: Streaming Log Compression
use std::io::{self, Write};
use std::fs::File;
struct LogWriter {
// Encoder carries a dictionary lifetime parameter; 'static when no borrowed dictionary is used
encoder: zstd::stream::Encoder<'static, File>,
}
impl LogWriter {
fn new(path: &str) -> io::Result<Self> {
let file = File::create(path)?;
let encoder = zstd::stream::Encoder::new(file, 3)?;
Ok(Self { encoder })
}
fn write_log(&mut self, level: &str, message: &str) -> io::Result<()> {
// Log lines come incrementally - perfect for streaming
writeln!(self.encoder, "[{}] {}", level, message)
}
fn close(self) -> io::Result<()> {
self.encoder.finish()?;
Ok(())
}
}
fn logging_example() -> io::Result<()> {
let mut logger = LogWriter::new("app.log.zst")?;
// Streaming handles this incremental writing naturally
// Block compression would need to buffer or split logs
logger.write_log("INFO", "Application started")?;
logger.write_log("DEBUG", "Processing request")?;
logger.write_log("INFO", "Request completed")?;
logger.write_log("WARN", "Cache miss for key")?;
logger.write_log("ERROR", "Database connection failed")?;
logger.close()?;
Ok(())
}
Streaming compression suits incremental data like logs; block requires complete chunks.
Complete Summary
use zstd::stream::Encoder;
use zstd::block::Compressor;
fn complete_summary() {
// ┌────────────────────┬──────────────────────────┬─────────────────────────┐
// │ Aspect             │ stream::Encoder          │ block::Encoder          │
// ├────────────────────┼──────────────────────────┼─────────────────────────┤
// │ API style          │ io::Write, streaming     │ compress/decompress     │
// │ Input requirement  │ Incremental writes       │ Complete buffer         │
// │ Output access      │ After finish()           │ Immediate               │
// │ Memory pattern     │ Persistent context       │ Freed per operation     │
// │ Context sharing    │ Across writes            │ None                    │
// │ Compression ratio  │ Better (context)         │ Good per chunk          │
// │ Parallelism        │ Sequential only          │ Parallel possible       │
// │ Use case           │ Streams, files           │ Messages, chunks        │
// └────────────────────┴──────────────────────────┴─────────────────────────┘
// Choose stream::Encoder when:
// 1. Unknown or unbounded data size
// 2. Processing files, network streams
// 3. Want best compression ratio
// 4. Data naturally streams (logs, events)
// 5. Integration with io::Write ecosystem
// Choose block::Encoder when:
// 1. Known, bounded data size
// 2. Memory must be predictable
// 3. Need independent chunks
// 4. Want parallel compression
// 5. Implementing message protocols
// 6. Need explicit control over boundaries
}
// Key insight:
// stream::Encoder is for continuous data where context and incremental
// processing matterβfiles, streams, logs. It maintains state between
// writes, enabling cross-chunk references for better compression.
// block::Encoder is for discrete data where independence and control
// matterβmessages, chunks, fixed-size records. It has no persistent
// state, enabling parallel compression and predictable memory usage.
// The streaming API integrates with io::Write; the block API is simpler
// but requires knowing the uncompressed size for decompression.
// Choose streaming for compression ratio and integration; choose block
// for parallelism and predictability.
Key insight: stream::Encoder maintains compression context across writes, achieving better ratios for data with patterns spanning chunk boundaries, while block::Encoder compresses each chunk independently, enabling parallelism and predictable memory. Streaming is the right choice for files, logs, and network streams where data arrives incrementally and compression ratio matters. Block compression suits message protocols, chunked formats, and scenarios requiring parallel compression or explicit control over compression boundaries. The API difference reflects this: streaming implements io::Write for incremental writing, while block requires complete input upfront and returns complete output immediately.
