Rust walkthroughs
How do the zstd::bulk and zstd::stream compression APIs compare for different data sizes?

zstd::bulk provides all-at-once compression where the entire input must fit in memory and the output is produced as a single contiguous buffer, while zstd::stream offers incremental processing that compresses data in chunks, enabling constant memory usage regardless of input size. The bulk API is simpler and faster for small to medium data (compressing a string, a file that fits in memory, or a network response) because it avoids the overhead of maintaining compression state across calls and can optimize based on knowing the total size upfront. The stream API is essential for large files, network streams, or any scenario where holding all data in memory is impractical, trading some overhead for the ability to process gigabytes of data with a fixed memory budget. The choice between them hinges on data size, memory constraints, and whether the data is already available or arriving incrementally.
```rust
use zstd::bulk;

fn main() {
    let data = b"Hello, World! This is some data to compress.";

    // Bulk compression: all data at once
    let compressed = bulk::compress(data, 3).unwrap();
    println!("Original size: {} bytes", data.len());
    println!("Compressed size: {} bytes", compressed.len());
    println!(
        "Compression ratio: {:.2}x",
        data.len() as f64 / compressed.len() as f64
    );

    // Decompression is also bulk; the second argument is a capacity hint
    let decompressed = bulk::decompress(&compressed, data.len()).unwrap();
    assert_eq!(data.to_vec(), decompressed);
}
```

`bulk::compress` takes the entire input and returns the complete compressed output.
```rust
use zstd::stream::{copy_decode, copy_encode};
use std::io::Cursor;

fn main() {
    let data = b"Hello, World! This is some data to compress.";

    // Stream compression through Read/Write traits
    let mut input = Cursor::new(data);
    let mut output = Vec::new();
    copy_encode(&mut input, &mut output, 3).unwrap();
    println!("Original size: {} bytes", data.len());
    println!("Compressed size: {} bytes", output.len());

    // Stream decompression
    let mut decompressed = Vec::new();
    copy_decode(&mut Cursor::new(&output), &mut decompressed).unwrap();
    assert_eq!(data.to_vec(), decompressed);
}
```

`stream::copy_encode` and `copy_decode` use the Read and Write traits for incremental processing.
```rust
use zstd::bulk;
use zstd::stream::Encoder;
use std::io::Write;

fn main() {
    // Create test data
    let data = vec![0u8; 1024 * 1024]; // 1 MB

    // Bulk: allocates an output buffer sized for the worst case,
    // which is slightly larger than the input for incompressible data
    let compressed_bulk = bulk::compress(&data, 3).unwrap();
    println!("Bulk compressed: {} bytes", compressed_bulk.len());
    // Memory: input buffer + output buffer (~2 MB total at peak)

    // Stream: fixed buffer size regardless of input
    let mut encoder = Encoder::new(Vec::new(), 3).unwrap();
    // Process in chunks
    for chunk in data.chunks(8192) {
        encoder.write_all(chunk).unwrap();
    }
    let compressed_stream = encoder.finish().unwrap();
    println!("Stream compressed: {} bytes", compressed_stream.len());
    // Memory: small fixed buffers (~8 KB chunk + internal state)
}
```

Bulk requires memory proportional to input size; stream uses fixed buffers.
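The worst-case comment above can be quantified. The C library exposes this bound as `ZSTD_compressBound`; below is a stdlib-only sketch of its documented formula (input plus input/256, plus a small margin for inputs under 128 KB; the exact constants are an assumption tied to the zstd version):

```rust
/// Worst-case compressed size, mirroring zstd's ZSTD_COMPRESSBOUND macro:
/// src + src/256, plus extra margin when the input is under 128 KB.
fn compress_bound(src_size: usize) -> usize {
    let margin = if src_size < 128 * 1024 {
        ((128 * 1024) - src_size) >> 11
    } else {
        0
    };
    src_size + (src_size >> 8) + margin
}

fn main() {
    // Peak bulk memory is roughly input size + this bound
    for size in [1024usize, 1024 * 1024, 1024 * 1024 * 1024] {
        println!("{:>10} bytes -> worst case {:>10} bytes", size, compress_bound(size));
    }
}
```

Even for incompressible input the output exceeds the input by well under 1%, so budgeting roughly 2x the input for bulk compression is a safe estimate.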
```rust
use zstd::bulk;

fn compress_data(data: &[u8]) -> Vec<u8> {
    // Single function call
    bulk::compress(data, 3).unwrap()
}

fn decompress_data(compressed: &[u8], original_size: usize) -> Vec<u8> {
    // Single function call, but the original size must be known
    bulk::decompress(compressed, original_size).unwrap()
}

fn main() {
    let original = b"Simple text to compress";
    let compressed = compress_data(original);
    let decompressed = decompress_data(&compressed, original.len());
    assert_eq!(original.to_vec(), decompressed);
    println!("Bulk API: straightforward for small data");
}
```

The bulk API is a single function call with no state management.
```rust
use zstd::stream::Encoder;
use std::io::Write;

fn main() {
    let mut encoder = Encoder::new(Vec::new(), 3).unwrap();

    // Data can be written incrementally from multiple sources
    encoder.write_all(b"First chunk ").unwrap();
    encoder.write_all(b"second chunk ").unwrap();
    encoder.write_all(b"third chunk").unwrap();

    // Finalize the stream
    let compressed = encoder.finish().unwrap();
    println!("Compressed {} bytes from 3 chunks", compressed.len());

    // Decompress to verify (the original data is 36 bytes)
    let decompressed = zstd::bulk::decompress(&compressed, 36).unwrap();
    assert_eq!(b"First chunk second chunk third chunk", &decompressed[..]);
}
```

The stream encoder accepts data across multiple writes, which is useful for progressively generated data.
```rust
use zstd::bulk;
use std::fs;

fn main() {
    // Bulk compression for files that fit in memory
    let content = fs::read("large_file.txt")
        .expect("Failed to read file");

    // Compress the entire file at once
    let compressed = bulk::compress(&content, 3)
        .expect("Compression failed");
    println!("Original: {} bytes", content.len());
    println!("Compressed: {} bytes", compressed.len());

    // Write the compressed data
    fs::write("large_file.txt.zst", &compressed)
        .expect("Failed to write compressed file");
    // Simple, but requires the entire file in memory
}
```

Bulk compression is simple for files that comfortably fit in RAM.
```rust
use zstd::stream::Encoder;
use std::fs::File;
use std::io::{BufReader, BufWriter, Read, Write};

fn compress_file_streaming(input_path: &str, output_path: &str) -> std::io::Result<()> {
    let input = File::open(input_path)?;
    let output = File::create(output_path)?;
    let mut reader = BufReader::new(input);
    let mut encoder = Encoder::new(BufWriter::new(output), 3)?;

    // Process in fixed-size chunks
    let mut buffer = [0u8; 8192];
    loop {
        let bytes_read = reader.read(&mut buffer)?;
        if bytes_read == 0 {
            break;
        }
        encoder.write_all(&buffer[..bytes_read])?;
    }
    encoder.finish()?;
    Ok(())
}

fn main() {
    compress_file_streaming("large_file.txt", "large_file.txt.zst")
        .expect("Compression failed");
    println!("Streamed compression: constant memory usage");
}
```

Stream compression processes files of any size with fixed memory.
```rust
use zstd::bulk;

fn main() {
    let data = b"Data with known size";

    // Bulk compression can use the total input size for optimization
    let compressed = bulk::compress(data, 3).unwrap();

    // Decompression requires knowing the original size; this is a
    // limitation, since sizes must be tracked separately
    let decompressed = bulk::decompress(&compressed, data.len()).unwrap();
    assert_eq!(data.to_vec(), decompressed);

    // For unknown sizes, stream decompression is needed
    println!("Bulk decompress requires knowing the original size: {} bytes", data.len());
}
```

Bulk decompression requires knowing the original size; streams don't.
```rust
use zstd::stream::copy_decode;
use std::io::Cursor;

fn main() {
    let original = b"Data with unknown size after compression";
    let compressed = zstd::bulk::compress(original, 3).unwrap();

    // Stream decompression doesn't need the original size
    let mut decompressed = Vec::new();
    copy_decode(&mut Cursor::new(&compressed), &mut decompressed).unwrap();
    assert_eq!(original.to_vec(), decompressed);
    println!("Stream decompress: no size needed, got {} bytes", decompressed.len());
}
```

Stream decompression discovers the output size automatically.
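When bulk decompression must be used but the size isn't known at the call site, a common pattern is to store the original length next to the compressed bytes. Here is a stdlib-only sketch of such framing (the 8-byte little-endian prefix is an ad-hoc format chosen for illustration, not something the zstd crate defines; the payload would come from `bulk::compress`):

```rust
/// Prepend the original (uncompressed) length as an 8-byte little-endian prefix.
fn frame(original_len: u64, compressed: &[u8]) -> Vec<u8> {
    let mut framed = Vec::with_capacity(8 + compressed.len());
    framed.extend_from_slice(&original_len.to_le_bytes());
    framed.extend_from_slice(compressed);
    framed
}

/// Split a framed buffer back into (original_len, compressed payload).
fn unframe(framed: &[u8]) -> Option<(u64, &[u8])> {
    if framed.len() < 8 {
        return None;
    }
    let mut header = [0u8; 8];
    header.copy_from_slice(&framed[..8]);
    Some((u64::from_le_bytes(header), &framed[8..]))
}

fn main() {
    // The payload here stands in for bulk::compress output
    let framed = frame(36, b"...compressed bytes...");
    let (len, payload) = unframe(&framed).unwrap();
    println!("original size: {}, payload: {} bytes", len, payload.len());
}
```

The recovered length can then be passed as the capacity argument to `bulk::decompress`.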
```rust
use zstd::bulk;
use zstd::stream::copy_encode;
use std::io::Cursor;
use std::time::Instant;

fn main() {
    let sizes = [1024, 10 * 1024, 100 * 1024, 1024 * 1024];
    for size in sizes {
        let data: Vec<u8> = (0..size).map(|i| (i % 256) as u8).collect();

        // Bulk compression
        let start = Instant::now();
        let bulk_compressed = bulk::compress(&data, 3).unwrap();
        let bulk_time = start.elapsed();

        // Stream compression
        let start = Instant::now();
        let mut output = Vec::new();
        copy_encode(&mut Cursor::new(&data), &mut output, 3).unwrap();
        let stream_time = start.elapsed();

        println!("Size: {} bytes", size);
        println!("  Bulk:   {:?}, {} bytes output", bulk_time, bulk_compressed.len());
        println!("  Stream: {:?}, {} bytes output", stream_time, output.len());
        println!();
    }
}
```

Bulk is typically faster for small data; the relative overhead depends on data size.
```rust
use zstd::stream::Encoder;
use std::io::Write;

struct NetworkStream {
    // Simulated network output
    buffer: Vec<u8>,
}

impl Write for NetworkStream {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        // In reality, this would send over the network
        self.buffer.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

fn main() {
    let network = NetworkStream { buffer: Vec::new() };
    let mut encoder = Encoder::new(network, 3).unwrap();

    // Simulate streaming data to the network; the explicit element type
    // is needed because the byte-string literals have different lengths
    let chunks: [&[u8]; 3] = [
        b"Log entry 1: Application started\n",
        b"Log entry 2: User logged in\n",
        b"Log entry 3: Request processed\n",
    ];
    for chunk in chunks {
        encoder.write_all(chunk).unwrap();
        // Data could be sent over the network here
    }
    let network = encoder.finish().unwrap();
    println!("Compressed {} bytes to network stream", network.buffer.len());
    // The bulk API couldn't handle this without buffering all log entries
}
```

Stream compression enables real-time network compression without buffering.
```rust
use zstd::bulk;
use std::time::Instant;

fn main() {
    let data: Vec<u8> = (0..1024 * 1024).map(|i| (i % 256) as u8).collect();
    let levels = [1, 3, 9, 19];
    for level in levels {
        let start = Instant::now();
        let compressed = bulk::compress(&data, level).unwrap();
        let duration = start.elapsed();
        println!(
            "Level {}: {:?}, {} bytes ({:.1}% of original)",
            level,
            duration,
            compressed.len(),
            100.0 * compressed.len() as f64 / data.len() as f64
        );
    }
    // Higher levels give better compression but take longer;
    // bulk benefits from knowing the total size at all levels
}
```

Compression level affects both APIs similarly; bulk may benefit more from knowing the total size.
```rust
use zstd::bulk::{Compressor, Decompressor};
use zstd::dict::from_samples;

fn main() {
    // Training data: dictionary training needs many samples to succeed,
    // so generate a batch of records sharing a common structure
    let samples: Vec<Vec<u8>> = (0..100)
        .map(|i| format!("user_id:{},name:User{},age:{}", i, i, 20 + i % 50).into_bytes())
        .collect();

    // Train a dictionary from the samples
    let dictionary = from_samples(&samples, 1024).unwrap();

    // New data to compress
    let new_data = b"user_id:999,name:David,age:40";

    // Bulk compress with the dictionary
    let mut compressor = Compressor::with_dictionary(3, &dictionary).unwrap();
    let compressed = compressor.compress(new_data).unwrap();

    // Decompress with the same dictionary
    let mut decompressor = Decompressor::with_dictionary(&dictionary).unwrap();
    let decompressed = decompressor.decompress(&compressed, new_data.len()).unwrap();

    println!("Original: {} bytes", new_data.len());
    println!("Compressed with dict: {} bytes", compressed.len());
    println!("Decompressed: {}", String::from_utf8_lossy(&decompressed));
}
```

Dictionary compression improves the ratio for similar data; both bulk and stream support it.
```rust
use zstd::stream::{Decoder, Encoder};
use zstd::dict::from_samples;
use std::io::{Cursor, Read, Write};

fn main() {
    // Dictionary training needs many samples to work well
    let samples: Vec<Vec<u8>> = (0..100)
        .map(|i| format!("sample data {}", i).into_bytes())
        .collect();
    let dictionary = from_samples(&samples, 256).unwrap();

    // Stream compression with the dictionary (passed as raw bytes)
    let mut encoder = Encoder::with_dictionary(Vec::new(), 3, &dictionary).unwrap();
    encoder.write_all(b"new data to compress").unwrap();
    let compressed = encoder.finish().unwrap();

    // Stream decompression with the same dictionary
    let mut decoder = Decoder::with_dictionary(Cursor::new(&compressed), &dictionary).unwrap();
    let mut decompressed = Vec::new();
    decoder.read_to_end(&mut decompressed).unwrap();
    println!("Stream + dictionary: {} bytes", compressed.len());
}
```

Dictionary compression works with streams for a similar benefit on streaming data.
```rust
use zstd::bulk;
use zstd::stream::Encoder;

fn compress_small_data(data: &[u8]) -> Vec<u8> {
    // Bulk is simpler and faster for small data
    bulk::compress(data, 3).unwrap()
}

fn compress_large_data<R: std::io::Read, W: std::io::Write>(
    reader: &mut R,
    writer: &mut W,
) -> std::io::Result<()> {
    // Stream for large data or streaming sources
    let mut encoder = Encoder::new(writer, 3)?;
    std::io::copy(reader, &mut encoder)?;
    encoder.finish()?;
    Ok(())
}

fn main() {
    // Small data: bulk
    let small = b"Small string";
    let compressed = compress_small_data(small);
    println!("Small data: {} -> {} bytes", small.len(), compressed.len());

    // Large or streaming data: stream
    let large_data = vec![0u8; 10 * 1024 * 1024]; // 10 MB
    let mut reader = large_data.as_slice();
    let mut output = Vec::new();
    compress_large_data(&mut reader, &mut output).unwrap();
    println!("Large data: {} -> {} bytes", large_data.len(), output.len());
}
```

Use bulk for data that fits comfortably in memory; stream for large or streaming data.
```rust
use zstd::stream::Encoder;
use std::fs::File;
use std::io::{BufReader, BufWriter, Read, Write};

fn compress_file_optimized(input_path: &str, output_path: &str) -> std::io::Result<()> {
    let input = File::open(input_path)?;
    let file_size = input.metadata()?.len();

    // For small files, use bulk (bulk::compress already returns io::Result)
    if file_size < 1024 * 1024 {
        let data = std::fs::read(input_path)?;
        let compressed = zstd::bulk::compress(&data, 3)?;
        std::fs::write(output_path, compressed)?;
        return Ok(());
    }

    // For large files, use streaming
    let output = File::create(output_path)?;
    let mut reader = BufReader::new(input);
    let mut encoder = Encoder::new(BufWriter::new(output), 3)?;
    let mut buffer = vec![0u8; 64 * 1024];
    loop {
        let bytes_read = reader.read(&mut buffer)?;
        if bytes_read == 0 {
            break;
        }
        encoder.write_all(&buffer[..bytes_read])?;
    }
    encoder.finish()?;
    Ok(())
}

fn main() {
    println!("Hybrid approach: bulk for small, stream for large");
}
```

A hybrid approach chooses the API based on data size.
```rust
use zstd::bulk;

struct MemoryBudget {
    available_bytes: usize,
}

impl MemoryBudget {
    fn can_use_bulk(&self, data_size: usize) -> bool {
        // Bulk needs roughly 2x the data size (input + worst-case output)
        let bulk_requirement = data_size * 2;
        bulk_requirement < self.available_bytes
    }

    fn recommended_buffer_size(&self) -> usize {
        // For streaming, use a small fraction of the budget
        (self.available_bytes / 4).min(64 * 1024)
    }
}

fn main() {
    let budget = MemoryBudget { available_bytes: 1024 * 1024 }; // 1 MB
    let small_data = vec![0u8; 100 * 1024]; // 100 KB
    let large_data = vec![0u8; 10 * 1024 * 1024]; // 10 MB

    println!("Can use bulk for 100KB: {}", budget.can_use_bulk(small_data.len()));
    println!("Can use bulk for 10MB: {}", budget.can_use_bulk(large_data.len()));
    println!("Recommended stream buffer: {} bytes", budget.recommended_buffer_size());

    if budget.can_use_bulk(small_data.len()) {
        let compressed = bulk::compress(&small_data, 3).unwrap();
        println!("Used bulk: {} -> {} bytes", small_data.len(), compressed.len());
    }
}
```

Consider memory constraints when choosing between bulk and stream.
API comparison:
| Aspect | zstd::bulk | zstd::stream |
|--------|--------------|----------------|
| Input | Single slice &[u8] | Read trait |
| Output | Single Vec<u8> | Write trait |
| Memory | Proportional to data size | Fixed buffer size |
| Simplicity | Single function call | State management |
| Use case | Small/medium data | Large data, streams |
| Size knowledge | Required for decompress | Not required |
Memory usage patterns:
| Data Size | Bulk Memory | Stream Memory |
|-----------|-------------|---------------|
| 1 KB | ~2 KB | ~8 KB (buffer) |
| 1 MB | ~2 MB | ~8 KB (buffer) |
| 1 GB | ~2 GB | ~8 KB (buffer) |
When to use each:
| Scenario | Recommended API |
|----------|----------------|
| In-memory strings, small files | Bulk |
| Files < 100 MB (typical) | Bulk |
| Files > available RAM | Stream |
| Network sockets | Stream |
| Real-time log compression | Stream |
| Single compress call | Bulk |
| Data arriving in chunks | Stream |
| Unknown original size (decompress) | Stream |
Key insight: The choice between zstd::bulk and zstd::stream reflects a fundamental trade-off between simplicity and scalability. The bulk API is optimal when all data fits comfortably in memory: it's a single function call, has no state to manage, and can leverage knowing the total size for optimization. The stream API, while requiring more setup with Encoder/Decoder structs and Read/Write trait implementations, provides the essential property of constant memory usage regardless of input size. For a 10 GB file, bulk compression would require at least 20 GB of RAM (input buffer plus worst-case output buffer), while stream compression might use only 64 KB of buffer plus internal state. The stream API also naturally handles streaming sources (network sockets, pipes, real-time log output) where the bulk API would require buffering everything first. The decompression asymmetry matters too: bulk decompression requires knowing the original size, which must be stored alongside the compressed data, while stream decompression discovers the output size as it processes. In practice, many applications use a hybrid approach: bulk for small data (under a threshold like 1-10 MB) and stream for larger files, getting the simplicity of bulk where memory isn't a concern and the scalability of stream where it is.