Rust walkthroughs
How do zstd::bulk::compress and stream::Encoder compare for one-shot vs. streaming compression?
zstd::bulk::compress performs one-shot compression on complete data: the entire input must be in memory, but the API is simpler and compression ratios can benefit from full-context analysis. stream::Encoder processes data incrementally in chunks, enabling compression of arbitrarily large data with bounded memory usage, at the cost of slightly lower compression efficiency and a more involved API. The bulk API is ideal for small to medium-sized data that fits comfortably in memory; the streaming API is necessary for large files, network streams, or situations where memory usage must be constrained. Both produce valid zstd output, but they represent fundamentally different approaches to the compression problem.
use zstd::bulk::compress;
fn bulk_example() -> Result<(), Box<dyn std::error::Error>> {
let data = b"Hello, World! This is some data to compress.";
// One-shot compression: entire input must be in memory
let compressed = compress(data, 3)?; // compression level 3
println!("Original size: {}", data.len());
println!("Compressed size: {}", compressed.len());
// Decompress with bulk API
let decompressed = zstd::bulk::decompress(&compressed, data.len())?;
assert_eq!(data.to_vec(), decompressed);
Ok(())
}
bulk::compress takes complete data and returns complete compressed output.
use zstd::stream::Encoder;
use std::io::Write;
fn stream_example() -> Result<(), Box<dyn std::error::Error>> {
let data = b"Hello, World! This is some data to compress.";
let mut output = Vec::new();
// Create encoder that writes to output
let mut encoder = Encoder::new(&mut output, 3)?;
// Write data in chunks (streaming)
encoder.write_all(data)?;
// Finalize compression
encoder.finish()?;
println!("Original size: {}", data.len());
println!("Compressed size: {}", output.len());
// Decompress with streaming API
let decompressed = zstd::stream::decode_all(&output[..])?;
assert_eq!(data.to_vec(), decompressed);
Ok(())
}
stream::Encoder processes data incrementally, writing compressed output as data arrives.
use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
fn memory_comparison() {
// bulk::compress memory usage:
// - Input: must be entirely in memory
// - Output: must fit entirely in memory
// - Working memory: zstd context (few MB)
// - Total: ~input_size + output_size + context
let large_data = vec![0u8; 100_000_000]; // 100 MB
// bulk requires all 100 MB in memory
// plus compressed output (say 50 MB)
// plus working memory
// stream::Encoder memory usage:
// - Input: can process in small chunks
// - Output: can write to stream/pipe
// - Working memory: bounded buffer (typically few MB)
// - Total: bounded, regardless of data size
// Streaming can process 100 GB file with only MB of memory
// Bulk would need 100 GB + compressed size in memory
}
Bulk requires all data in memory; streaming has bounded memory usage.
use zstd::stream::Encoder;
use std::fs::File;
use std::io::{BufReader, BufWriter, Read, Write};
fn compress_large_file() -> Result<(), Box<dyn std::error::Error>> {
// Stream processing: read from file, write to file
// Never loads entire file into memory
let input = File::open("large_input.bin")?;
let output = File::create("compressed.zst")?;
let mut reader = BufReader::new(input);
let mut writer = BufWriter::new(output);
// Encoder writes to output as it compresses
let mut encoder = Encoder::new(&mut writer, 3)?;
// Process in chunks
let mut buffer = [0u8; 8192];
loop {
let bytes_read = reader.read(&mut buffer)?;
if bytes_read == 0 {
break;
}
encoder.write_all(&buffer[..bytes_read])?;
}
// Finalize
encoder.finish()?;
writer.flush()?;
// This works for files of any size
// Memory usage is bounded by buffer size (8KB)
Ok(())
}
Streaming handles arbitrarily large files with fixed memory.
use zstd::bulk::compress;
fn compress_known_size(data: &[u8]) -> Result<Vec<u8>, Box<dyn std::error::Error>> {
// When data size is known and manageable:
// - Bulk is simpler
// - Bulk can be faster (no chunk overhead)
// - Bulk may compress better (full context)
// Good for:
// - In-memory data structures
// - Small files (< 100 MB typical)
// - Network packets
// - Database records
let compressed = compress(data, 3)?;
Ok(compressed)
}
Bulk is simpler and often faster for data that fits in memory.
use zstd::stream::Encoder;
use std::io::Write;
fn network_streaming() -> Result<(), Box<dyn std::error::Error>> {
// Simulated network stream
let mut network_buffer = Vec::new();
// Create encoder that writes to network
let mut encoder = Encoder::new(&mut network_buffer, 3)?;
// Process data as it arrives from network
// Simulated network receive: yields three chunks, then None
let mut remaining = 3;
let mut receive_data = move || -> Option<Vec<u8>> {
if remaining == 0 {
return None;
}
remaining -= 1;
Some(b"chunk of data".to_vec())
};
while let Some(chunk) = receive_data() {
// Compress each chunk as it arrives
// No need to buffer all chunks first
encoder.write_all(&chunk)?;
// Could flush to send compressed data early
// encoder.flush()?;
}
// Finalize
encoder.finish()?;
// network_buffer contains compressed data
// Could send as we go or at the end
Ok(())
}
Streaming compresses data as it arrives, without buffering everything.
use zstd::bulk::{compress, Compressor};
fn context_reuse_bulk() -> Result<(), Box<dyn std::error::Error>> {
// For multiple compressions, reuse context
// This avoids re-initializing compression state
let mut compressor = Compressor::new(3)?;
// Reuse for multiple items
let data1 = b"First item";
let data2 = b"Second item";
let compressed1 = compressor.compress(data1)?;
let compressed2 = compressor.compress(data2)?;
// Context reuse is more efficient than:
let _ = compress(data1, 3)?;
let _ = compress(data2, 3)?;
// Which creates new context each time
Ok(())
}
Compressor reuses context for multiple bulk compressions.
use zstd::stream::Encoder;
use std::io::Write;
fn streaming_context() -> Result<(), Box<dyn std::error::Error>> {
// Encoder maintains context across writes
// Later data can reference earlier data
let mut output = Vec::new();
let mut encoder = Encoder::new(&mut output, 3)?;
// First chunk
encoder.write_all(b"Hello, World!")?;
// Second chunk can reference first chunk
// Compression context spans writes
encoder.write_all(b"Hello, again!")?; // "Hello" likely compressed well
// Third chunk
encoder.write_all(b"Hello, final!")?;
encoder.finish()?;
// All chunks share context
// Compression ratio benefits from full data context
// Even though processed incrementally
Ok(())
}
Streaming maintains compression context across chunks for good ratios.
use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
fn compression_ratio_comparison() -> Result<(), Box<dyn std::error::Error>> {
// Repeated data to compress
let data: Vec<u8> = (0..1000).flat_map(|_| b"Hello, World! ").copied().collect();
// Bulk compression: full context available
let bulk_compressed = compress(&data, 3)?;
// Streaming compression: same context maintained
let mut stream_output = Vec::new();
let mut encoder = Encoder::new(&mut stream_output, 3)?;
// But if we reset encoder between chunks...
let chunk_size = 100;
for chunk in data.chunks(chunk_size) {
encoder.write_all(chunk)?;
// Context maintained across writes
}
encoder.finish()?;
// Results are typically similar
// Bulk might be slightly smaller due to internal optimization
// Difference is usually small (few percent at most)
println!("Original: {} bytes", data.len());
println!("Bulk: {} bytes", bulk_compressed.len());
println!("Stream: {} bytes", stream_output.len());
Ok(())
}
Both achieve similar compression ratios; bulk may be slightly better.
use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
use std::time::Instant;
fn performance_comparison() -> Result<(), Box<dyn std::error::Error>> {
let data: Vec<u8> = (0..1_000_000).map(|i| (i % 256) as u8).collect();
// Bulk compression
let start = Instant::now();
let bulk_compressed = compress(&data, 3)?;
let bulk_time = start.elapsed();
// Streaming compression
let start = Instant::now();
let mut stream_output = Vec::new();
let mut encoder = Encoder::new(&mut stream_output, 3)?;
encoder.write_all(&data)?;
encoder.finish()?;
let stream_time = start.elapsed();
// Bulk is typically faster because:
// - Single allocation for output
// - No chunk handling overhead
// - Internal optimizations for complete data
// Streaming has overhead from:
// - Chunk boundary handling
// - Incremental output buffering
// - More API calls
println!("Bulk: {} bytes in {:?}", bulk_compressed.len(), bulk_time);
println!("Stream: {} bytes in {:?}", stream_output.len(), stream_time);
Ok(())
}
Bulk is typically faster; streaming has more overhead per byte.
use zstd::stream::Encoder;
use std::io::Write;
fn streaming_with_flush() -> Result<(), Box<dyn std::error::Error>> {
let mut output = Vec::new();
let mut encoder = Encoder::new(&mut output, 3)?;
// Write first chunk
encoder.write_all(b"First chunk")?;
// Flush sends compressed data immediately
// Useful for:
// - Sending data over network
// - Processing pipelined data
// - Real-time compression
encoder.flush()?;
// Compressed data for "First chunk" is now in output
// Write second chunk
encoder.write_all(b"Second chunk")?;
encoder.finish()?;
// output contains all compressed data
// Note: flush() may reduce compression ratio
// because it forces zstd to output partial blocks
Ok(())
}
flush() outputs compressed data early, useful for streaming but may reduce compression ratio.
use zstd::bulk::decompress;
use zstd::stream::read::Decoder;
use std::io::Read;
fn decompression_comparison() -> Result<(), Box<dyn std::error::Error>> {
let original = b"Hello, World!".repeat(100);
let compressed = zstd::bulk::compress(&original, 3)?;
// Bulk decompression: need to know size
// Must know or guess the original size
let decompressed_bulk = decompress(&compressed, original.len())?;
// Streaming decompression: don't need size
let mut decoder = Decoder::new(&compressed[..])?;
let mut decompressed_stream = Vec::new();
decoder.read_to_end(&mut decompressed_stream)?;
// Streaming is useful when:
// - Original size unknown
// - Processing compressed stream
// - Memory constrained
assert_eq!(decompressed_bulk, original);
assert_eq!(decompressed_stream, original);
Ok(())
}
Streaming decompression doesn't require knowing the original size.
use zstd::bulk::compress;
fn when_to_use_bulk() -> Result<(), Box<dyn std::error::Error>> {
// Use bulk when:
// 1. Data fits comfortably in memory
let small_data = vec![0u8; 1024];
let _ = compress(&small_data, 3)?;
// 2. Simpler API preferred
// Bulk: one function call
// Stream: create encoder, write, finish
// 3. Data size is known
// Bulk needs complete input at start
// 4. Maximum compression ratio desired
// Bulk may compress slightly better
// 5. Processing small to medium files
// Typical threshold: < 100 MB
// Depends on available memory
// 6. In-memory data structures
let config = serde_json::to_vec(&Config::default())?;
let compressed_config = compress(&config, 3)?;
Ok(())
}
#[derive(serde::Serialize)]
struct Config {
setting: String,
}
impl Default for Config {
fn default() -> Self {
Config { setting: "default".to_string() }
}
}
Use bulk for in-memory data, simple APIs, and better compression.
use zstd::stream::Encoder;
use std::io::{Read, Write};
fn when_to_use_streaming() -> Result<(), Box<dyn std::error::Error>> {
// Use streaming when:
// 1. Data doesn't fit in memory
// Processing 10 GB file on 4 GB machine
// 2. Data arrives incrementally
// Network streams, pipes, real-time data
// 3. Memory must be bounded
// Embedded systems, constrained environments
// 4. Processing large files
// File-to-file compression
// 5. Pipelining compression
// Compress while reading/writing
// 6. Unknown data size
// Streaming doesn't need size upfront
// Example: Compress any size file
fn compress_file<R: Read, W: Write>(
reader: &mut R,
writer: &mut W,
) -> Result<(), std::io::Error> {
let mut encoder = Encoder::new(writer, 3)?;
std::io::copy(reader, &mut encoder)?;
encoder.finish()?;
Ok(())
}
Ok(())
}
Use streaming for large files, network data, and bounded memory.
use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
fn compression_levels() -> Result<(), Box<dyn std::error::Error>> {
let data = b"Hello, World!".repeat(100);
// Both APIs support compression levels 1-22
// Level 1: fastest, lowest compression
// Level 3: default balance
// Level 22: slowest, highest compression
// Bulk with different levels
let fast = compress(&data, 1)?;
let balanced = compress(&data, 3)?;
let slow = compress(&data, 22)?;
// Streaming with different levels
let mut output = Vec::new();
let mut encoder = Encoder::new(&mut output, 22)?;
encoder.write_all(&data)?;
encoder.finish()?;
// Higher levels:
// - Better compression ratio
// - More memory usage during compression
// - Slower compression time
// - Same decompression speed
println!("Level 1: {} bytes", fast.len());
println!("Level 3: {} bytes", balanced.len());
println!("Level 22: {} bytes", slow.len());
Ok(())
}
Both APIs support the same compression level range.
use zstd::stream::Encoder;
use std::io::Write;
fn level_memory() -> Result<(), Box<dyn std::error::Error>> {
// Higher compression levels use more memory
// This affects both bulk and streaming
// Memory usage scales with level:
// Level 1: ~few MB
// Level 3: ~few MB (default)
// Level 19: ~tens of MB
// Level 22: ~hundreds of MB
// Streaming still has bounded memory
// But higher levels have larger working buffers
let mut output = Vec::new();
// High level needs more memory for window
let mut encoder = Encoder::new(&mut output, 19)?;
encoder.write_all(b"data")?;
encoder.finish()?;
// For streaming: memory bounded but larger
// For bulk: memory scales with data size + level
Ok(())
}
Higher compression levels use more memory in both APIs.
use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
fn error_handling() -> Result<(), Box<dyn std::error::Error>> {
// Bulk: single Result
// Either entire compression succeeds or fails
let data = b"test data";
let compressed = compress(data, 3)?;
// If compression fails, you get an error
// No partial output
// Streaming: can fail at any write
let mut output = Vec::new();
let mut encoder = Encoder::new(&mut output, 3)?;
// Errors can occur during:
// - Encoder creation (e.g., invalid level)
// - write_all (e.g., writer error)
// - finish (e.g., final block error)
encoder.write_all(data)?;
encoder.finish()?;
// If write fails, you may have partial output
// Handle each operation separately
Ok(())
}
Bulk returns a single result; streaming can fail at multiple points.
use zstd::stream::Encoder;
use std::io::Write;
fn multiple_outputs() -> Result<(), Box<dyn std::error::Error>> {
// Streaming allows writing to various destinations
// Write to Vec
let mut vec_output = Vec::new();
let mut encoder1 = Encoder::new(&mut vec_output, 3)?;
encoder1.write_all(b"data")?;
encoder1.finish()?;
// Write to file
let file = std::fs::File::create("output.zst")?;
let mut encoder2 = Encoder::new(file, 3)?;
encoder2.write_all(b"data")?;
encoder2.finish()?;
// Write to network (anything implementing Write)
// let stream = TcpStream::connect("...")?;
// let mut encoder3 = Encoder::new(stream, 3)?;
// encoder3.write_all(b"data")?;
// encoder3.finish()?;
// Bulk output always goes to Vec
// Then you can write Vec anywhere
Ok(())
}
Streaming can write to any Write implementation.
use zstd::stream::Encoder;
use std::fs::File;
use std::io::{BufReader, BufWriter, Write};
use std::path::Path;
fn compress_file_streaming(
input_path: &Path,
output_path: &Path,
level: i32,
) -> Result<(), Box<dyn std::error::Error>> {
// Works for files of any size
// Memory usage bounded by buffer sizes
let input = File::open(input_path)?;
let output = File::create(output_path)?;
let mut reader = BufReader::new(input);
let writer = BufWriter::new(output);
let mut encoder = Encoder::new(writer, level)?;
std::io::copy(&mut reader, &mut encoder)?;
// finish() returns the inner writer; flush it so all bytes reach disk
encoder.finish()?.flush()?;
Ok(())
}
fn compress_file_bulk(
input_path: &Path,
output_path: &Path,
level: i32,
) -> Result<(), Box<dyn std::error::Error>> {
// Simpler but loads entire file into memory
let data = std::fs::read(input_path)?;
let compressed = zstd::bulk::compress(&data, level)?;
std::fs::write(output_path, &compressed)?;
Ok(())
}
// Usage choice:
// - For small files: bulk (simpler)
// - For large files: streaming (bounded memory)
// - For unknown size: streaming
// - For network: streaming
Streaming is essential for production file compression utilities.
use zstd::bulk::compress;
use zstd::stream::Encoder;
use std::io::Write;
fn api_comparison() -> Result<(), Box<dyn std::error::Error>> {
let data = b"Hello, World!";
// Bulk: Simple, one-shot
let compressed = compress(data, 3)?;
// One function call, returns Result<Vec<u8>>
// Streaming: Multiple steps
let mut output = Vec::new();
let mut encoder = Encoder::new(&mut output, 3)?; // Create
encoder.write_all(data)?; // Write
encoder.finish()?; // Finalize
// Three operations, output in separate buffer
// Bulk is simpler but less flexible
// Streaming is more complex but handles any size
Ok(())
}
Bulk API is simpler; streaming requires multiple operations.
Key trade-offs:
| Aspect | bulk::compress | stream::Encoder |
|--------|----------------|-----------------|
| Memory usage | Entire data + output | Bounded (few MB) |
| Data size | Must fit in memory | Any size |
| API complexity | Simple (one call) | More complex (create/write/finish) |
| Compression ratio | Slightly better | Similar (with context) |
| Performance | Faster (less overhead) | Slower (more overhead) |
| Use case | In-memory data, small files | Large files, streams, bounded memory |
When to use each:
// Use bulk::compress when:
// - Data fits comfortably in memory
// - Simpler API is preferred
// - Slightly better compression ratio matters
// - Processing small files (< 100 MB typical)
// - In-memory structures (configs, database records)
let compressed = zstd::bulk::compress(&data, 3)?;
// Use stream::Encoder when:
// - Data doesn't fit in memory
// - Processing large files
// - Memory usage must be bounded
// - Data arrives incrementally (network, pipe)
// - Data size is unknown
let mut encoder = zstd::stream::Encoder::new(&mut output, 3)?;
encoder.write_all(&data)?;
encoder.finish()?;
Key insight: Both APIs produce valid zstd output and achieve similar compression ratios (streaming maintains context across writes). The choice is primarily about memory management and API preference. bulk::compress is simpler and slightly faster for data that fits in memory, while stream::Encoder is necessary for arbitrarily large data or constrained memory environments. The streaming API's overhead is the price of bounded memory usage: it maintains compression context across chunks but must handle chunk boundaries and incremental output buffering. For production systems processing user data or files, streaming is the safer choice because it prevents out-of-memory errors when data size is unknown.