What are the trade-offs between zstd::bulk and stream compression APIs for different data sizes?

zstd::bulk provides all-at-once compression where the entire input must fit in memory and the output is produced as a single contiguous buffer, while zstd::stream offers incremental processing that compresses data in chunks, enabling constant memory usage regardless of input size. The bulk API is simpler and faster for small to medium data—compressing a string, a file that fits in memory, or a network response—because it avoids the overhead of maintaining compression state across calls and can optimize based on knowing the total size upfront. The stream API is essential for large files, network streams, or any scenario where holding all data in memory is impractical, trading some overhead for the ability to process gigabytes of data with a fixed memory budget. The choice between them hinges on data size, memory constraints, and whether the data is already available or arriving incrementally.

Bulk Compression Basics

use zstd::bulk;
 
fn main() {
    let data = b"Hello, World! This is some data to compress.";
    
    // Bulk compression: all data at once
    let compressed = bulk::compress(data, 3).unwrap();
    
    println!("Original size: {} bytes", data.len());
    println!("Compressed size: {} bytes", compressed.len());
    println!("Compression ratio: {:.2}x", 
        data.len() as f64 / compressed.len() as f64);
    
    // Decompression also bulk
    let decompressed = bulk::decompress(&compressed, data.len()).unwrap();
    assert_eq!(data.to_vec(), decompressed);
}

bulk::compress takes the entire input and returns the complete compressed output.
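
The bulk module also exposes Compressor and Decompressor structs that hold a reusable compression context, which helps when compressing many small payloads in a row. A minimal sketch of reusing one compressor (the message values here are just placeholders):

use zstd::bulk::Compressor;
 
fn main() -> std::io::Result<()> {
    // One context, reused across calls, avoids repeated setup cost
    let mut compressor = Compressor::new(3)?;
    
    let messages: [&[u8]; 3] = [b"first message", b"second message", b"third message"];
    for msg in messages {
        let compressed = compressor.compress(msg)?;
        println!("{} -> {} bytes", msg.len(), compressed.len());
    }
    Ok(())
}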

Stream Compression Basics

use zstd::stream::{copy_encode, copy_decode};
use std::io::Cursor;
 
fn main() {
    let data = b"Hello, World! This is some data to compress.";
    
    // Stream compression through Read/Write traits
    let mut input = Cursor::new(data);
    let mut output = Vec::new();
    
    copy_encode(&mut input, &mut output, 3).unwrap();
    
    println!("Original size: {} bytes", data.len());
    println!("Compressed size: {} bytes", output.len());
    
    // Stream decompression
    let mut decompressed = Vec::new();
    copy_decode(&mut Cursor::new(&output), &mut decompressed).unwrap();
    assert_eq!(data.to_vec(), decompressed);
}

stream::copy_encode pipes data from a Read source to a Write destination, processing it incrementally.
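
The stream module also ships encode_all and decode_all convenience functions that read any Read source to the end and return a Vec<u8>, a middle ground between the two styles. A small sketch:

use zstd::stream::{encode_all, decode_all};
 
fn main() -> std::io::Result<()> {
    let data = b"Hello, World! This is some data to compress.";
    
    // Reads the whole source, returns compressed bytes as a Vec
    let compressed = encode_all(&data[..], 3)?;
    
    // Inverse operation; no original size required
    let decompressed = decode_all(&compressed[..])?;
    assert_eq!(data.to_vec(), decompressed);
    Ok(())
}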

Memory Usage Comparison

use zstd::bulk;
use zstd::stream::Encoder;
use std::io::Write;
 
fn main() {
    // Create test data
    let data = vec![0u8; 1024 * 1024];  // 1MB
    
    // Bulk: allocates output buffer sized for worst case
    // Worst case is slightly larger than input for incompressible data
    let compressed_bulk = bulk::compress(&data, 3).unwrap();
    println!("Bulk compressed: {} bytes", compressed_bulk.len());
    // Memory: input buffer + output buffer (~2MB total at peak)
    
    // Stream: fixed buffer size regardless of input
    let mut encoder = Encoder::new(Vec::new(), 3).unwrap();
    // Process in chunks
    for chunk in data.chunks(8192) {
        encoder.write_all(chunk).unwrap();
    }
    let compressed_stream = encoder.finish().unwrap();
    println!("Stream compressed: {} bytes", compressed_stream.len());
    // Memory: small fixed buffers (~8KB chunk + internal state)
}

Bulk requires memory proportional to input size; stream uses fixed buffers.
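
The same fixed-memory property holds for decompression. A sketch using a stream Decoder to pull a large frame through an 8 KB buffer, processing each chunk instead of accumulating the whole output:

use zstd::stream::Decoder;
use std::io::Read;
 
fn main() -> std::io::Result<()> {
    let compressed = zstd::bulk::compress(&vec![0u8; 1024 * 1024], 3)?;
    
    // Decoder wraps any Read source; chunked reads bound peak memory
    let mut decoder = Decoder::new(&compressed[..])?;
    let mut buffer = [0u8; 8192];
    let mut total = 0;
    loop {
        let n = decoder.read(&mut buffer)?;
        if n == 0 {
            break;
        }
        total += n;  // process the chunk here instead of storing it
    }
    println!("Decompressed {} bytes in 8 KB chunks", total);
    Ok(())
}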

Bulk API Simplicity

use zstd::bulk;
 
fn compress_data(data: &[u8]) -> Vec<u8> {
    // Single function call
    bulk::compress(data, 3).unwrap()
}
 
fn decompress_data(compressed: &[u8], original_size: usize) -> Vec<u8> {
    // Single function call, but need to know original size
    bulk::decompress(compressed, original_size).unwrap()
}
 
fn main() {
    let original = b"Simple text to compress";
    
    let compressed = compress_data(original);
    let decompressed = decompress_data(&compressed, original.len());
    
    assert_eq!(original.to_vec(), decompressed);
    println!("Bulk API: straightforward for small data");
}

The bulk API is a single function call with no state management.

Stream API Flexibility

use zstd::stream::Encoder;
use std::io::Write;
 
fn main() {
    let mut encoder = Encoder::new(Vec::new(), 3).unwrap();
    
    // Can write data incrementally from multiple sources
    encoder.write_all(b"First chunk ").unwrap();
    encoder.write_all(b"second chunk ").unwrap();
    encoder.write_all(b"third chunk").unwrap();
    
    // Finalize the stream
    let compressed = encoder.finish().unwrap();
    
    println!("Compressed {} bytes from 3 chunks", compressed.len());
    
    // Decompress to verify (capacity must cover the 36 original bytes)
    let decompressed = zstd::bulk::decompress(&compressed, 36).unwrap();
    assert_eq!(b"First chunk second chunk third chunk", &decompressed[..]);
}

Stream encoder accepts data in multiple writes, useful for progressive data generation.
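
One stream-specific pitfall is forgetting finish(), which leaves the frame truncated and undecodable. The encoder's auto_finish() wrapper finalizes the frame when it is dropped; a small sketch (note that an error from the implicit finish is not observable this way):

use zstd::stream::Encoder;
use std::io::Write;
 
fn main() -> std::io::Result<()> {
    let mut output = Vec::new();
    {
        // The wrapper calls finish() on drop
        let mut encoder = Encoder::new(&mut output, 3)?.auto_finish();
        encoder.write_all(b"data written through auto_finish")?;
    }  // frame finalized here
    println!("Compressed {} bytes", output.len());
    Ok(())
}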

File Compression with Bulk

use zstd::bulk;
use std::fs;
 
fn main() {
    // Bulk compression for files that fit in memory
    let content = fs::read("large_file.txt")
        .expect("Failed to read file");
    
    // Compress entire file at once
    let compressed = bulk::compress(&content, 3)
        .expect("Compression failed");
    
    println!("Original: {} bytes", content.len());
    println!("Compressed: {} bytes", compressed.len());
    
    // Write compressed data
    fs::write("large_file.txt.zst", &compressed)
        .expect("Failed to write compressed file");
    
    // Simple but requires entire file in memory
}

Bulk compression is simple for files that comfortably fit in RAM.

File Compression with Stream

use zstd::stream::Encoder;
use std::fs::File;
use std::io::{BufReader, BufWriter, Read, Write};
 
fn compress_file_streaming(input_path: &str, output_path: &str) -> std::io::Result<()> {
    let input = File::open(input_path)?;
    let output = File::create(output_path)?;
    
    let mut reader = BufReader::new(input);
    let mut encoder = Encoder::new(BufWriter::new(output), 3)?;
    
    // Process in fixed-size chunks
    let mut buffer = [0u8; 8192];
    loop {
        let bytes_read = reader.read(&mut buffer)?;
        if bytes_read == 0 {
            break;
        }
        encoder.write_all(&buffer[..bytes_read])?;
    }
    
    encoder.finish()?;
    Ok(())
}
 
fn main() {
    compress_file_streaming("large_file.txt", "large_file.txt.zst")
        .expect("Compression failed");
    
    println!("Streamed compression: constant memory usage");
}

Stream compression processes files of any size with fixed memory.
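
The reverse direction is symmetric: wrap the compressed file in a stream Decoder and let std::io::copy drive fixed-size transfers. A sketch under the same file-name assumptions as above:

use zstd::stream::Decoder;
use std::fs::File;
use std::io::BufWriter;
 
fn decompress_file_streaming(input_path: &str, output_path: &str) -> std::io::Result<()> {
    // Decoder::new adds its own internal buffering around the file
    let mut decoder = Decoder::new(File::open(input_path)?)?;
    let mut writer = BufWriter::new(File::create(output_path)?);
    
    // io::copy moves data through a fixed-size internal buffer
    std::io::copy(&mut decoder, &mut writer)?;
    Ok(())
}
 
fn main() {
    decompress_file_streaming("large_file.txt.zst", "large_file.txt")
        .expect("Decompression failed");
}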

Known Size Advantage for Bulk

use zstd::bulk;
 
fn main() {
    let data = b"Data with known size";
    
    // Bulk compression can optimize because the total input size is known upfront
    let compressed = bulk::compress(data, 3).unwrap();
    
    // Decompression requires knowing original size
    // This is a limitation: you must track sizes separately
    let decompressed = bulk::decompress(&compressed, data.len()).unwrap();
    
    // For unknown sizes, stream decompression is needed
    println!("Bulk decompress requires knowing original size: {} bytes", data.len());
}

Bulk decompression requires knowing the original size; streams don't.
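
A common workaround is to store the original length next to the compressed bytes. The sketch below uses a hypothetical container format, an 8-byte little-endian length prefix followed by the zstd frame; the framing is this example's own convention, not part of zstd:

use zstd::bulk;
 
// Hypothetical format: u64 little-endian original length, then the frame
fn pack(data: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut out = (data.len() as u64).to_le_bytes().to_vec();
    out.extend(bulk::compress(data, 3)?);
    Ok(out)
}
 
fn unpack(packed: &[u8]) -> std::io::Result<Vec<u8>> {
    let (prefix, frame) = packed.split_at(8);
    let original_len = u64::from_le_bytes(prefix.try_into().unwrap()) as usize;
    bulk::decompress(frame, original_len)
}
 
fn main() -> std::io::Result<()> {
    let data = b"Data with known size";
    let packed = pack(data)?;
    assert_eq!(unpack(&packed)?, data.to_vec());
    Ok(())
}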

Stream Decompression for Unknown Sizes

use zstd::stream::copy_decode;
use std::io::Cursor;
 
fn main() {
    let original = b"Data with unknown size after compression";
    let compressed = zstd::bulk::compress(original, 3).unwrap();
    
    // Stream decompression doesn't need original size
    let mut decompressed = Vec::new();
    copy_decode(&mut Cursor::new(&compressed), &mut decompressed).unwrap();
    
    assert_eq!(original.to_vec(), decompressed);
    println!("Stream decompress: no size needed, got {} bytes", decompressed.len());
}

Stream decompression discovers the output size automatically.
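
The capacity argument on the bulk side is a hard upper bound, which makes the contrast concrete: if the guess is too small, bulk decompression fails where stream decompression would simply keep growing the output. A sketch:

use zstd::bulk;
 
fn main() {
    let original = b"Data with unknown size after compression";
    let compressed = bulk::compress(original, 3).unwrap();
    
    // Capacity one byte short of the real size: bulk errors out
    let too_small = bulk::decompress(&compressed, original.len() - 1);
    assert!(too_small.is_err());
    
    // Exact (or larger) capacity succeeds
    let ok = bulk::decompress(&compressed, original.len()).unwrap();
    assert_eq!(original.to_vec(), ok);
}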

Performance Characteristics

use zstd::bulk;
use zstd::stream::copy_encode;
use std::io::Cursor;
use std::time::Instant;
 
fn main() {
    let sizes = [1024, 10 * 1024, 100 * 1024, 1024 * 1024];
    
    for size in sizes {
        let data: Vec<u8> = (0..size).map(|i| (i % 256) as u8).collect();
        
        // Bulk compression
        let start = Instant::now();
        let bulk_compressed = bulk::compress(&data, 3).unwrap();
        let bulk_time = start.elapsed();
        
        // Stream compression
        let start = Instant::now();
        let mut output = Vec::new();
        copy_encode(&mut Cursor::new(&data), &mut output, 3).unwrap();
        let stream_time = start.elapsed();
        
        println!("Size: {} bytes", size);
        println!("  Bulk:   {:?}, {} bytes output", bulk_time, bulk_compressed.len());
        println!("  Stream: {:?}, {} bytes output", stream_time, output.len());
        println!();
    }
}

Bulk is typically faster for small inputs; the stream API's setup and buffering overhead becomes negligible relative to total work as input size grows.
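
For hot loops over many payloads, allocation cost can rival compression cost. The bulk module's compress_to_buffer writes into a caller-provided buffer sized once from the worst-case bound; a sketch (compress_bound comes from the zstd_safe layer that the crate re-exports):

use zstd::bulk;
 
fn main() -> std::io::Result<()> {
    let data: Vec<u8> = (0..4096).map(|i| (i % 256) as u8).collect();
    
    // One worst-case-sized buffer, reused across calls
    let mut buffer = vec![0u8; zstd::zstd_safe::compress_bound(data.len())];
    
    for level in [1, 3, 9] {
        let written = bulk::compress_to_buffer(&data, &mut buffer[..], level)?;
        println!("Level {}: {} bytes", level, written);
    }
    Ok(())
}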

Network Stream Compression

use zstd::stream::Encoder;
use std::io::Write;
 
struct NetworkStream {
    // Simulated network output
    buffer: Vec<u8>,
}
 
impl Write for NetworkStream {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        // In reality, this would send over network
        self.buffer.extend_from_slice(buf);
        Ok(buf.len())
    }
    
    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}
 
fn main() {
    let network = NetworkStream { buffer: Vec::new() };
    let mut encoder = Encoder::new(network, 3).unwrap();
    
    // Simulate streaming data to network
    let chunks: [&[u8]; 3] = [
        b"Log entry 1: Application started\n",
        b"Log entry 2: User logged in\n",
        b"Log entry 3: Request processed\n",
    ];
    
    for chunk in chunks {
        encoder.write_all(chunk).unwrap();
        // Data could be sent over network here
    }
    
    let network = encoder.finish().unwrap();
    println!("Compressed {} bytes to network stream", network.buffer.len());
    
    // Bulk API couldn't handle this without buffering all log entries
}

Stream compression enables real-time network compression without buffering.
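
When latency matters, you cannot wait for finish() before the receiver sees anything. Encoder implements Write, and flush() pushes buffered input through the compressor so the peer can decode what has been sent so far, at some cost in ratio. A sketch:

use zstd::stream::Encoder;
use std::io::Write;
 
fn main() -> std::io::Result<()> {
    let mut encoder = Encoder::new(Vec::new(), 3)?;
    
    encoder.write_all(b"Log entry 1: Application started\n")?;
    // Make everything written so far decodable on the other end
    encoder.flush()?;
    println!("{} compressed bytes available after flush", encoder.get_ref().len());
    
    encoder.write_all(b"Log entry 2: User logged in\n")?;
    let compressed = encoder.finish()?;
    println!("{} bytes total", compressed.len());
    Ok(())
}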

Compression Level Impact

use zstd::bulk;
use std::time::Instant;
 
fn main() {
    let data: Vec<u8> = (0..1024 * 1024).map(|i| (i % 256) as u8).collect();
    
    let levels = [1, 3, 9, 19];
    
    for level in levels {
        let start = Instant::now();
        let compressed = bulk::compress(&data, level).unwrap();
        let duration = start.elapsed();
        
        println!("Level {}: {:?}, {} bytes ({:.1}% of original)",
            level,
            duration,
            compressed.len(),
            100.0 * compressed.len() as f64 / data.len() as f64
        );
    }
    
    // Higher levels give better compression but take longer
    // Bulk API benefits from knowing size at all levels
}

Compression level affects both APIs similarly; bulk may benefit more from size knowledge.
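
The crate can report the supported level range at runtime, and level 0 is shorthand for the default; zstd also accepts negative "fast" levels that trade ratio for speed. A small sketch:

fn main() {
    // Range of levels supported by the linked zstd library
    println!("Supported levels: {:?}", zstd::compression_level_range());
    println!("Default level: {}", zstd::DEFAULT_COMPRESSION_LEVEL);
    
    let data = vec![0u8; 64 * 1024];
    // Level 0 means "use the default" in zstd
    let compressed = zstd::bulk::compress(&data, 0).unwrap();
    println!("Level 0 (default): {} bytes", compressed.len());
}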

Using Dictionary Compression

use zstd::bulk::{Compressor, Decompressor};
use zstd::dict::from_samples;
 
fn main() {
    // Training data (real dictionary training needs many samples;
    // a set this small may fail with a training error in practice)
    let samples: Vec<&[u8]> = vec![
        b"user_id:123,name:Alice,age:30",
        b"user_id:456,name:Bob,age:25",
        b"user_id:789,name:Charlie,age:35",
    ];
    
    // Train dictionary (bulk API)
    let dictionary = from_samples(&samples, 1024).unwrap();
    
    // Bulk compressor/decompressor contexts bound to the dictionary
    let mut compressor = Compressor::with_dictionary(3, &dictionary).unwrap();
    let mut decompressor = Decompressor::with_dictionary(&dictionary).unwrap();
    
    // New data to compress
    let new_data = b"user_id:999,name:David,age:40";
    
    // Bulk compress with dictionary
    let compressed = compressor.compress(new_data).unwrap();
    
    // Decompress with dictionary (capacity argument, as with bulk::decompress)
    let decompressed = decompressor.decompress(&compressed, new_data.len()).unwrap();
    
    println!("Original: {} bytes", new_data.len());
    println!("Compressed with dict: {} bytes", compressed.len());
    println!("Decompressed: {}", String::from_utf8_lossy(&decompressed));
}

Dictionary compression improves ratio for similar data; both bulk and stream support it.

Stream with Dictionary

use zstd::stream::{Encoder, Decoder};
use zstd::dict::from_samples;
use std::io::{Cursor, Read, Write};
 
fn main() {
    // As above, real dictionary training wants many more samples
    let samples: Vec<&[u8]> = vec![
        b"sample data 1",
        b"sample data 2",
        b"sample data 3",
    ];
    let dictionary = from_samples(&samples, 256).unwrap();
    
    // Stream compression with dictionary (takes the raw dictionary bytes)
    let mut encoder = Encoder::with_dictionary(Vec::new(), 3, &dictionary).unwrap();
    encoder.write_all(b"new data to compress").unwrap();
    let compressed = encoder.finish().unwrap();
    
    // Stream decompression with the same dictionary
    let mut decoder = Decoder::with_dictionary(Cursor::new(&compressed), &dictionary).unwrap();
    let mut decompressed = Vec::new();
    decoder.read_to_end(&mut decompressed).unwrap();
    
    println!("Stream + dictionary: {} bytes", compressed.len());
}

Dictionary compression works with streams for similar benefit on streaming data.

Choosing Based on Data Size

use zstd::bulk;
use zstd::stream::Encoder;
use std::io::Write;
 
fn compress_small_data(data: &[u8]) -> Vec<u8> {
    // Bulk is simpler and faster for small data
    bulk::compress(data, 3).unwrap()
}
 
fn compress_large_data<R: std::io::Read, W: std::io::Write>(
    reader: &mut R,
    writer: &mut W,
) -> std::io::Result<()> {
    // Stream for large data or streaming sources
    let mut encoder = Encoder::new(writer, 3)?;
    std::io::copy(reader, &mut encoder)?;
    encoder.finish()?;
    Ok(())
}
 
fn main() {
    // Small data: bulk
    let small = b"Small string";
    let compressed = compress_small_data(small);
    println!("Small data: {} -> {} bytes", small.len(), compressed.len());
    
    // Large or streaming data: stream
    let large_data = vec![0u8; 10 * 1024 * 1024];  // 10MB
    let mut reader = large_data.as_slice();
    let mut output = Vec::new();
    compress_large_data(&mut reader, &mut output).unwrap();
    println!("Large data: {} -> {} bytes", large_data.len(), output.len());
}

Use bulk for data that fits comfortably in memory; stream for large or streaming data.

Hybrid Approach for Known-Size Large Files

use zstd::stream::Encoder;
use std::fs::File;
use std::io::{BufReader, BufWriter, Read, Write};
 
fn compress_file_optimized(input_path: &str, output_path: &str) -> std::io::Result<()> {
    let input = File::open(input_path)?;
    let file_size = input.metadata()?.len();
    
    // For small files, bulk is simpler: read, compress, write
    if file_size < 1024 * 1024 {
        let data = std::fs::read(input_path)?;
        let compressed = zstd::bulk::compress(&data, 3)?;
        std::fs::write(output_path, compressed)?;
        return Ok(());
    }
    
    // For large files, use streaming
    let output = File::create(output_path)?;
    let mut reader = BufReader::new(input);
    let mut encoder = Encoder::new(BufWriter::new(output), 3)?;
    
    let mut buffer = vec![0u8; 64 * 1024];
    loop {
        let bytes_read = reader.read(&mut buffer)?;
        if bytes_read == 0 {
            break;
        }
        encoder.write_all(&buffer[..bytes_read])?;
    }
    
    encoder.finish()?;
    Ok(())
}
 
fn main() {
    println!("Hybrid approach: bulk for small, stream for large");
}

A hybrid approach chooses the API based on data size.

Memory Budget Considerations

use zstd::bulk;
use zstd::stream::Encoder;
use std::io::Write;
 
struct MemoryBudget {
    available_bytes: usize,
}
 
impl MemoryBudget {
    fn can_use_bulk(&self, data_size: usize) -> bool {
        // Bulk needs roughly 2x data size (input + worst-case output)
        let bulk_requirement = data_size * 2;
        bulk_requirement < self.available_bytes
    }
    
    fn recommended_buffer_size(&self) -> usize {
        // For streaming, use small fraction of budget
        (self.available_bytes / 4).min(64 * 1024)
    }
}
 
fn main() {
    let budget = MemoryBudget { available_bytes: 1024 * 1024 };  // 1MB
    
    let small_data = vec![0u8; 100 * 1024];   // 100KB
    let large_data = vec![0u8; 10 * 1024 * 1024];  // 10MB
    
    println!("Can use bulk for 100KB: {}", budget.can_use_bulk(small_data.len()));
    println!("Can use bulk for 10MB: {}", budget.can_use_bulk(large_data.len()));
    println!("Recommended stream buffer: {} bytes", budget.recommended_buffer_size());
    
    if budget.can_use_bulk(small_data.len()) {
        let compressed = bulk::compress(&small_data, 3).unwrap();
        println!("Used bulk: {} -> {} bytes", small_data.len(), compressed.len());
    }
}

Consider memory constraints when choosing between bulk and stream.

Synthesis

API comparison:

Aspect           zstd::bulk                  zstd::stream
Input            Single slice &[u8]          Read trait
Output           Single Vec<u8>              Write trait
Memory           Proportional to data size   Fixed buffer size
Simplicity       Single function call        State management
Use case         Small/medium data           Large data, streams
Size knowledge   Required for decompress     Not required

Memory usage patterns:

Data size   Bulk memory   Stream memory
1 KB        ~2 KB         ~8 KB (buffer)
1 MB        ~2 MB         ~8 KB (buffer)
1 GB        ~2 GB         ~8 KB (buffer)

When to use each:

Scenario                             Recommended API
In-memory strings, small files       Bulk
Files < 100 MB (typical)             Bulk
Files > available RAM                Stream
Network sockets                      Stream
Real-time log compression            Stream
Single compress call                 Bulk
Data arriving in chunks              Stream
Unknown original size (decompress)   Stream

Key insight: The choice between zstd::bulk and zstd::stream reflects a fundamental trade-off between simplicity and scalability. The bulk API is optimal when all data fits comfortably in memory—it's a single function call, has no state to manage, and can leverage knowing the total size for optimization. The stream API, while requiring more setup with Encoder/Decoder structs and Read/Write trait implementations, provides the essential property of constant memory usage regardless of input size. For a 10 GB file, bulk compression would require at least 20 GB of RAM (input buffer plus worst-case output buffer), while stream compression might use only 64 KB of buffer plus internal state. The stream API also handles naturally streaming sources—network sockets, pipes, real-time log output—where the bulk API would require buffering everything first. The decompression asymmetry matters too: bulk decompression requires knowing the original size, which must be stored alongside the compressed data, while stream decompression discovers the output size as it processes. In practice, many applications use a hybrid approach: bulk for small data (under a threshold like 1-10 MB) and stream for larger files, getting the simplicity of bulk where memory isn't a concern and the scalability of stream where it is.