What is tower::load_shed::LoadShedLayer and how does it protect services from overload?

tower::load_shed::LoadShedLayer is middleware that rejects requests when the wrapped service is overloaded, returning errors immediately rather than queueing requests or allowing resource exhaustion. It implements the "load shedding" pattern: intentionally dropping requests under high load to preserve system stability and responsiveness for the requests that are accepted. The layer polls the inner service's readiness; when the inner service is not ready, new requests receive an immediate Overloaded error rather than being processed. This prevents the cascading failures, timeout amplification, and resource exhaustion that occur when systems accept more work than they can handle.
use tower::load_shed::LoadShedLayer;
use tower::{Service, ServiceBuilder};
// Example service that processes requests
struct MyService;
impl Service<String> for MyService {
    type Response = String;
    type Error = String;
    type Future = std::future::Ready<Result<String, String>>;
    fn poll_ready(&mut self, _cx: &mut std::task::Context<'_>) -> std::task::Poll<Result<(), Self::Error>> {
        // Service is always ready (simplified)
        std::task::Poll::Ready(Ok(()))
    }
    fn call(&mut self, req: String) -> Self::Future {
        std::future::ready(Ok(format!("Processed: {}", req)))
    }
}
fn main() {
    // Apply LoadShed layer to a service
    let service = ServiceBuilder::new()
        .layer(LoadShedLayer::new()) // Default: shed based on inner readiness
        .service(MyService);
    // When the underlying service is not ready, LoadShed returns an error immediately
    println!("LoadShed middleware configured");
}

LoadShedLayer wraps a service and rejects requests when it is overloaded.
use tower::load_shed::LoadShedLayer;
use tower::{Service, ServiceBuilder};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
// Service with limited capacity
struct LimitedService {
    active_requests: Arc<AtomicUsize>,
    max_concurrent: usize,
}
impl Service<String> for LimitedService {
    type Response = String;
    type Error = String;
    type Future = std::future::Ready<Result<String, String>>;
    fn poll_ready(&mut self, _cx: &mut std::task::Context<'_>) -> std::task::Poll<Result<(), Self::Error>> {
        let current = self.active_requests.load(Ordering::Relaxed);
        if current < self.max_concurrent {
            // Accept more requests
            std::task::Poll::Ready(Ok(()))
        } else {
            // Service is overloaded - not ready.
            // Note: real code must store _cx.waker() and wake it when capacity
            // frees up; returning Pending without arranging a wakeup can stall the caller.
            std::task::Poll::Pending
        }
    }
    fn call(&mut self, req: String) -> Self::Future {
        self.active_requests.fetch_add(1, Ordering::Relaxed);
        // Process request... (a real service would decrement the counter when done)
        std::future::ready(Ok(format!("Processed: {}", req)))
    }
}
fn main() {
    let active_requests = Arc::new(AtomicUsize::new(0));
    // Without load shedding:
    // - poll_ready returns Pending when overloaded
    // - Requests queue up waiting for service
    // - Memory grows, timeouts cascade
    // With load shedding:
    // - LoadShed checks poll_ready
    // - If Pending, returns error immediately
    // - Request fails fast, client can retry
    // - System stays responsive
    let service = ServiceBuilder::new()
        .layer(LoadShedLayer::new())
        .service(LimitedService {
            active_requests,
            max_concurrent: 10,
        });
}

Load shedding converts Pending readiness into immediate errors.
// Without load shedding:
// 1. High traffic arrives
// 2. Service queues requests
// 3. Memory grows with queued requests
// 4. Response times increase
// 5. Clients timeout and retry
// 6. More requests queue
// 7. System becomes unresponsive
// 8. Cascading failure
// With load shedding:
// 1. High traffic arrives
// 2. LoadShed checks capacity
// 3. If overloaded, returns error immediately
// 4. Client receives fast failure
// 5. Service stays responsive for accepted requests
// 6. System remains stable
fn main() {
    // Problem: Accepting unlimited requests
    // - Queue grows unbounded
    // - Latency increases
    // - Eventually OOM or timeout storm
    // Solution: Shed load when overloaded
    // - Reject excess requests quickly
    // - Preserve capacity for accepted requests
    // - Stable latency for successful requests
}

Load shedding prevents the "death spiral" of overloaded systems.
use tower::load_shed::LoadShedLayer;
use tower::ServiceBuilder;
use tower::limit::{ConcurrencyLimitLayer, RateLimitLayer};
use tower::timeout::TimeoutLayer;
use std::time::Duration;
fn main() {
    // Typical middleware stack with load shedding
    let service = ServiceBuilder::new()
        // Load shedding: reject immediately when the layers below are not ready
        .layer(LoadShedLayer::new())
        // Rate limiting: max requests per second
        .layer(RateLimitLayer::new(100, Duration::from_secs(1)))
        // Concurrency limit: max concurrent requests
        .layer(ConcurrencyLimitLayer::new(50))
        // Timeout: fail slow requests
        .layer(TimeoutLayer::new(Duration::from_secs(30)))
        .service(MyService);
    // Order matters (ServiceBuilder wraps top-down, so the first layer is outermost):
    // 1. Load shed (converts backpressure from the layers below into immediate errors)
    // 2. Rate limit (global throttle)
    // 3. Concurrency limit (active request cap)
    // 4. Timeout (bound request duration)
}
struct MyService;
impl tower::Service<String> for MyService {
    type Response = String;
    type Error = String;
    type Future = std::future::Ready<Result<String, String>>;
    fn poll_ready(&mut self, _cx: &mut std::task::Context<'_>) -> std::task::Poll<Result<(), Self::Error>> {
        std::task::Poll::Ready(Ok(()))
    }
    fn call(&mut self, req: String) -> Self::Future {
        std::future::ready(Ok(format!("Processed: {}", req)))
    }
}

LoadShedLayer typically sits outside the limiting layers in the middleware stack, so their backpressure is shed rather than queued.
use tower::load_shed::LoadShedLayer;
use tower::limit::ConcurrencyLimitLayer;
use tower::ServiceBuilder;
fn main() {
    // ConcurrencyLimit: Queues requests up to a limit
    // - Enforces max concurrent requests
    // - Queues excess requests (waits)
    // - Good for predictable load
    // LoadShed: Rejects requests immediately
    // - Doesn't queue, returns error
    // - Responds to service readiness
    // - Good for unpredictable overload
    // Often used together (LoadShed outermost, so a full limit is shed, not queued):
    let service = ServiceBuilder::new()
        // LoadShed: if the limit below can't accept, reject
        .layer(LoadShedLayer::new())
        // ConcurrencyLimit: allow up to 100 concurrent requests
        .layer(ConcurrencyLimitLayer::new(100))
        .service(MyService); // MyService as defined earlier
    // Flow:
    // 1. Request arrives
    // 2. LoadShed polls the concurrency limit's readiness
    // 3. If < 100 active: process
    // 4. If at the limit: return an Overloaded error immediately
}

Concurrency limits queue; load shedding rejects.
use tower::load_shed::LoadShedLayer;
use tower::{Service, ServiceBuilder, ServiceExt};
use std::future::Future;
use std::pin::Pin;
#[derive(Debug)]
enum MyError {
    Overloaded,
    ProcessingError(String),
}
impl std::fmt::Display for MyError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{:?}", self)
    }
}
// LoadShed requires the inner error to convert into tower::BoxError
impl std::error::Error for MyError {}
fn main() {
    // LoadShed boxes errors as tower::BoxError; a shed request fails with
    // tower::load_shed::error::Overloaded, and the inner service's error
    // type must implement std::error::Error
    async fn handle_request() {
        let mut service = ServiceBuilder::new()
            .layer(LoadShedLayer::new())
            .service(MyService);
        // LoadShed itself always reports ready; an overloaded inner
        // service surfaces as an Overloaded error from call()
        match service.ready().await {
            Ok(svc) => {
                match svc.call("request".to_string()).await {
                    Ok(response) => println!("Success: {}", response),
                    Err(e) if e.is::<tower::load_shed::error::Overloaded>() => {
                        // Load was shed; the client can retry
                        println!("Service overloaded, request shed")
                    }
                    Err(e) => println!("Error: {:?}", e),
                }
            }
            Err(e) => println!("Error: {:?}", e),
        }
    }
}
struct MyService;
impl Service<String> for MyService {
    type Response = String;
    type Error = MyError;
    type Future = Pin<Box<dyn Future<Output = Result<String, MyError>>>>;
    fn poll_ready(&mut self, _cx: &mut std::task::Context<'_>) -> std::task::Poll<Result<(), MyError>> {
        std::task::Poll::Ready(Ok(()))
    }
    fn call(&mut self, req: String) -> Self::Future {
        Box::pin(std::future::ready(Ok(format!("Processed: {}", req))))
    }
}

Handle load shed errors gracefully in application code.
use tower::load_shed::LoadShedLayer;
fn main() {
    // Basic load shed
    let _basic = LoadShedLayer::new();
    // LoadShedLayer::new() creates a layer that:
    // - Wraps the inner service
    // - Polls the inner service's readiness
    // - If the inner service is Pending:
    //   - the next call() fails immediately with Overloaded (sheds load)
    // - If the inner service is Ready:
    //   - forwards the request to the inner service
    // The layer itself doesn't have configuration
    // Configuration comes from the inner service's readiness
    // Common pattern: combine with other limiters
    // to control when poll_ready returns Pending
}

LoadShed configuration is implicit through service readiness.
use tower::load_shed::LoadShedLayer;
use tower::limit::ConcurrencyLimitLayer;
use tower::timeout::TimeoutLayer;
use tower::ServiceBuilder;
use std::time::Duration;
// Simulating a web service handler
struct WebService;
impl tower::Service<HttpRequest> for WebService {
    type Response = HttpResponse;
    type Error = Error;
    type Future = std::future::Ready<Result<HttpResponse, Error>>;
    fn poll_ready(&mut self, _cx: &mut std::task::Context<'_>) -> std::task::Poll<Result<(), Error>> {
        // In reality, might check database connections, thread pool, etc.
        std::task::Poll::Ready(Ok(()))
    }
    fn call(&mut self, _req: HttpRequest) -> Self::Future {
        std::future::ready(Ok(HttpResponse { status: 200, body: "OK".to_string() }))
    }
}
struct HttpRequest {
    path: String,
}
struct HttpResponse {
    status: u16,
    body: String,
}
#[derive(Debug)]
enum Error {
    Overloaded,
    Timeout,
}
impl std::fmt::Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{:?}", self)
    }
}
// LoadShed and Timeout box errors, so Error must implement std::error::Error
impl std::error::Error for Error {}
fn main() {
    // Production-like middleware stack (LoadShed outermost)
    let service = ServiceBuilder::new()
        // Shed load when the layers below are not ready
        .layer(LoadShedLayer::new())
        // Maximum concurrent requests
        .layer(ConcurrencyLimitLayer::new(100))
        // Timeout for request processing
        .layer(TimeoutLayer::new(Duration::from_secs(30)))
        .service(WebService);
    // This protects against:
    // - Resource exhaustion: concurrency limit bounds active work
    // - Cascading failures: load shed fails fast
    // - Slow requests: timeout bounds duration
}

Load shedding is part of a comprehensive protection strategy.
// USE LOAD SHEDDING WHEN:
// 1. Service has variable processing times
// 2. Downstream dependencies can become unavailable
// 3. You need to maintain responsiveness under load
// 4. Clients can handle rejection (retry logic)
// 5. Queue growth could cause OOM
// DON'T USE LOAD SHEDDING WHEN:
// 1. All requests must be processed (use queue instead)
// 2. No retry/client handling for errors
// 3. Load is predictable and bounded
// 4. Better to queue and process eventually
fn main() {
    // Example: API Gateway
    // - Variable traffic patterns
    // - Backend services may slow down
    // - Clients can retry with backoff
    // -> Use load shedding
    // Example: Background Job Processor
    // - Jobs must eventually complete
    // - Can tolerate queue growth
    // - No client waiting for response
    // -> Maybe use queue instead of shedding
}

Load shedding is appropriate when fail-fast is better than queue-and-wait.
// Load Shedding: Reject requests when overloaded
// Pros: Immediate feedback, preserves stability
// Cons: Lost requests, requires client retry
// Rate Limiting: Reject requests above threshold
// Pros: Predictable throughput, prevents spikes
// Cons: Doesn't adapt to service capacity
// Circuit Breaker: Reject when failure rate high
// Pros: Handles downstream failures
// Cons: Reactive, not proactive
// Bulkhead: Isolate resources per tenant
// Pros: Prevents noisy neighbor
// Cons: More complex resource management
// Queue: Buffer requests for later
// Pros: Smooths traffic, no lost requests
// Cons: Latency grows, memory risk
use tower::load_shed::LoadShedLayer;
use tower::limit::RateLimitLayer;
use tower::ServiceBuilder;
use std::time::Duration;
fn main() {
    // Often used together (LoadShed outermost, so rate-limited requests are shed):
    let service = ServiceBuilder::new()
        // Load shed: reject when the rate limiter below is saturated
        .layer(LoadShedLayer::new())
        // Rate limit: prevent spikes
        .layer(RateLimitLayer::new(1000, Duration::from_secs(1)))
        .service(MyService); // MyService as defined earlier
    // Rate limit catches known overload
    // Load shed catches unexpected capacity issues
}

Load shedding complements other resilience patterns.
use tower::load_shed::LoadShedLayer;
use tower::{Service, ServiceBuilder, ServiceExt};
use std::future::Future;
use std::pin::Pin;
#[derive(Debug)]
enum ServiceError {
    Overloaded,
    Internal(String),
}
impl std::fmt::Display for ServiceError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{:?}", self)
    }
}
// LoadShed requires the inner error to convert into tower::BoxError
impl std::error::Error for ServiceError {}
struct GracefulService;
impl Service<String> for GracefulService {
    type Response = String;
    type Error = ServiceError;
    type Future = Pin<Box<dyn Future<Output = Result<String, ServiceError>>>>;
    fn poll_ready(&mut self, _cx: &mut std::task::Context<'_>) -> std::task::Poll<Result<(), ServiceError>> {
        std::task::Poll::Ready(Ok(()))
    }
    fn call(&mut self, req: String) -> Self::Future {
        Box::pin(std::future::ready(Ok(format!("Processed: {}", req))))
    }
}
async fn handle_with_fallback(req: String) -> String {
    let mut service = ServiceBuilder::new()
        .layer(LoadShedLayer::new())
        .service(GracefulService);
    match service.ready().await {
        Ok(svc) => {
            match svc.call(req).await {
                Ok(response) => response,
                // LoadShed boxes errors; downcast to detect a shed request
                Err(e) if e.is::<tower::load_shed::error::Overloaded>() => {
                    // Return degraded response instead of an error
                    "Service temporarily degraded, please retry".to_string()
                }
                Err(e) => format!("Internal error: {}", e),
            }
        }
        Err(_) => {
            // Inner service failed while becoming ready; return fallback
            "Service busy, please retry later".to_string()
        }
    }
}
fn main() {
    // Clients can handle overload gracefully:
    // 1. Return cached/stale data
    // 2. Return simplified response
    // 3. Return "try again later" message
    // 4. Client implements exponential backoff
}

Graceful degradation provides meaningful responses during overload.
Core purpose: LoadShedLayer protects services from overload by rejecting requests when the service cannot handle them, returning immediate errors instead of queuing.
How it works:
- LoadShed polls the inner service's readiness before processing
- Inner service Pending: the request fails immediately with an Overloaded error
- Inner service Ready: the request is forwarded to the inner service

Protection mechanisms: prevents unbounded queues, timeout amplification, and the memory growth that comes from accepting more work than the system can finish.

When to use: variable or spiky load, clients that can retry, and systems where queue growth risks OOM or cascading timeouts.

Common pattern:
ServiceBuilder::new()
    .layer(LoadShedLayer::new())
    .layer(ConcurrencyLimitLayer::new(max))
    .service(inner)

Key insight: Load shedding is about making explicit trade-offs: rejecting some requests to preserve system stability and responsiveness for others. It's better to fail fast and let clients retry than to accept requests that cannot be processed in a reasonable time. LoadShedLayer integrates this pattern into Tower's middleware ecosystem, working alongside rate limits, concurrency limits, and timeouts to create robust, self-protecting services.