How does async_trait::async_trait handle boxed futures and what are the allocation overhead implications?
The #[async_trait] macro transforms each async method into a regular method that returns a Pin<Box<dyn Future>>. Every call therefore performs one heap allocation to store the generated future's state machine, which matters for high-frequency calls but enables async methods on trait objects. The macro works around Rust's historical limitation that async fn could not appear in traits: boxing the returned future gives it a concrete, sized type at compile time.
The Async Trait Problem
// Before Rust 1.75, this did not compile on stable Rust:
trait AsyncService {
async fn fetch(&self, id: u32) -> String;
}
// Error: `async fn` in traits is not supported
// The compiler cannot determine the size of the future
// returned by the async method at compile time

Rust's type system requires knowing the size of every type at compile time. Async functions return anonymous impl Future types whose sizes depend on their internal state. In a trait, this creates a problem: the compiler cannot guarantee a single sized return type across different implementations.
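This can be seen with std alone. The two async fns below (illustrative names) have identical signatures, yet each returns its own anonymous future type whose size depends on its body, which is exactly why a plain trait method cannot promise one sized return type for every implementation:

```rust
use std::mem::size_of_val;

// Same signature shape, but distinct, unnameable future types.
async fn cheap() -> u32 {
    1
}

// This future must hold the 256-byte argument until first poll.
async fn stateful(buf: [u8; 256]) -> u8 {
    buf[0]
}

fn main() {
    let a = cheap();
    let b = stateful([0; 256]);
    // The stateful future is strictly larger than the cheap one.
    assert!(size_of_val(&b) > size_of_val(&a));
}
```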
The async_trait Macro Solution
use async_trait::async_trait;
#[async_trait]
trait AsyncService {
async fn fetch(&self, id: u32) -> String;
}
struct MyService;
#[async_trait]
impl AsyncService for MyService {
async fn fetch(&self, id: u32) -> String {
// Simulated async work
format!("Fetched item {}", id)
}
}
// The macro transforms this into:
// fn fetch(&self, id: u32) -> Pin<Box<dyn Future<Output = String> + Send + '_>>

The macro converts async methods into synchronous methods returning boxed futures.
What the Macro Generates
use async_trait::async_trait;
use std::future::Future;
use std::pin::Pin;
// Before macro expansion:
#[async_trait]
trait Service {
async fn process(&self, data: &str) -> Result<String, Error>;
}
// After macro expansion (simplified):
trait Service {
fn process<'life0, 'life1, 'async_trait>(
&'life0 self,
data: &'life1 str,
) -> Pin<Box<dyn Future<Output = Result<String, Error>> + Send + 'async_trait>>
where
'life0: 'async_trait,
'life1: 'async_trait,
Self: 'async_trait;
}
// The implementation becomes:
impl Service for MyService {
fn process<'life0, 'life1, 'async_trait>(
&'life0 self,
data: &'life1 str,
) -> Pin<Box<dyn Future<Output = Result<String, Error>> + Send + 'async_trait>>
where
'life0: 'async_trait,
'life1: 'async_trait,
Self: 'async_trait,
{
Box::pin(async move {
// Original async body here
Ok(format!("Processed: {}", data))
})
}
}

The macro creates a synchronous function that boxes the async body.
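The generated shape can be reproduced by hand without the macro. This std-only sketch (the names Service and MyService are illustrative, and the lifetime bounds are simplified relative to the macro's 'life0/'async_trait output) shows the boxed-future desugaring, and that the returned handle itself is just a fat pointer while the state machine lives on the heap:

```rust
use std::future::Future;
use std::pin::Pin;

// Hand-written version of what #[async_trait] generates (simplified).
trait Service {
    fn process<'a>(&'a self, data: &'a str)
        -> Pin<Box<dyn Future<Output = String> + Send + 'a>>;
}

struct MyService;

impl Service for MyService {
    fn process<'a>(&'a self, data: &'a str)
        -> Pin<Box<dyn Future<Output = String> + Send + 'a>> {
        // The original async body is wrapped in `async move` and boxed.
        Box::pin(async move { format!("Processed: {}", data) })
    }
}

fn main() {
    let svc = MyService;
    let fut = svc.process("abc");
    // The handle is a fat pointer (data pointer + vtable pointer);
    // the future's state was moved to the heap by Box::pin.
    assert_eq!(std::mem::size_of_val(&fut), 2 * std::mem::size_of::<usize>());
}
```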
The Box::pin Allocation
use std::future::Future;
use std::pin::Pin;
// Every call to an async_trait method does this (shown as an inherent method):
struct MyService;
impl MyService {
fn async_trait_method(&self) -> Pin<Box<dyn Future<Output = String> + Send + '_>> {
Box::pin(async {
// Your async code
String::new()
})
}
}
// Box::pin:
// 1. Allocates heap memory
// 2. Moves the future's state machine into that memory
// 3. Returns a Pin<Box<...>> that owns the heap allocation
// The allocation holds:
// - All local variables captured across await points
// - The state of the async state machine
// - Any values suspended in .await

Each method call allocates heap memory for the future's state.
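The per-call allocation can be made observable with a counting global allocator. This is a std-only sketch: the Counting type and the boxed helper are illustrative stand-ins for what an async_trait method body compiles down to, not part of the crate itself:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};

// A counting wrapper around the system allocator, used only to make
// the per-call Box::pin allocation visible.
struct Counting;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static ALLOCATOR: Counting = Counting;

// Stand-in for the body an #[async_trait] method expands to.
fn boxed(n: u32) -> Pin<Box<dyn Future<Output = u32> + Send>> {
    Box::pin(async move { n + 1 })
}

fn main() {
    let before = ALLOCS.load(Ordering::Relaxed);
    let futures: Vec<_> = (0..100).map(boxed).collect();
    let after = ALLOCS.load(Ordering::Relaxed);
    // At least one heap allocation per call (the Vec adds a few more).
    assert!(after - before >= 100);
    drop(futures);
}
```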
Allocation Overhead Analysis
use async_trait::async_trait;
use std::time::Instant;
#[async_trait]
trait Worker {
async fn do_work(&self, input: u32) -> u32;
}
struct FastWorker;
#[async_trait]
impl Worker for FastWorker {
async fn do_work(&self, input: u32) -> u32 {
// Minimal async work
input * 2
}
}
fn measure_overhead() {
let worker = FastWorker;
// Each call allocates a Box
let start = Instant::now();
for i in 0..10000 {
let _future = worker.do_work(i);
// The future is boxed but dropped unpolled;
// the allocation happened inside do_work()
}
let elapsed = start.elapsed();
// Compare to non-async version:
trait SyncWorker {
fn do_work(&self, input: u32) -> u32;
}
struct SyncFastWorker;
impl SyncWorker for SyncFastWorker {
fn do_work(&self, input: u32) -> u32 {
input * 2
}
}
let sync_worker = SyncFastWorker;
let start = Instant::now();
for i in 0..10000 {
let _ = sync_worker.do_work(i);
// No allocation, just computation
}
let sync_elapsed = start.elapsed();
println!("Async trait overhead: {:?}", elapsed);
println!("Sync baseline: {:?}", sync_elapsed);
}

The overhead includes heap allocation and dynamic dispatch.
Size of Boxed Futures
use std::collections::HashMap;
// Boxed future size depends on captured state
async fn minimal() -> u32 {
42
}
async fn captures_data(data: Vec<u8>) -> usize {
// Future captures: the Vec<u8>
// Plus state machine overhead
data.len()
}
async fn captures_multiple(a: String, b: Vec<u8>, c: HashMap<u32, String>) -> usize {
// Future captures: a, b, c
// Plus state machine overhead
a.len() + b.len() + c.len()
}
// The Box allocation size:
// minimal(): ~0 bytes of captured state (just returns constant)
// captures_data(): size of Vec<u8> + state machine overhead
// captures_multiple(): size of all three + state machine overhead
// Heap allocation size = sizeof(FutureStateMachine)
// where FutureStateMachine contains all captured variables

Larger captured state means larger heap allocations.
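The captured-state cost can be measured directly: a future must store its arguments until first poll, so every captured value grows the state machine and hence the eventual Box::pin allocation. A std-only sketch with illustrative function names:

```rust
use std::mem::size_of_val;

// Captures nothing: the state machine is tiny.
async fn no_capture() -> u32 {
    42
}

// Captures a Vec and a String: the state machine must store both handles
// (three words each) until first poll.
async fn captures(data: Vec<u8>, label: String) -> usize {
    data.len() + label.len()
}

fn main() {
    let small = no_capture();
    let big = captures(vec![1, 2, 3], String::from("x"));
    // At least Vec + String worth of captured state (48 bytes on 64-bit),
    // so Box::pin(big) allocates at least that much.
    assert!(size_of_val(&big) >= 2 * 3 * std::mem::size_of::<usize>());
    assert!(size_of_val(&big) > size_of_val(&small));
}
```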
Dynamic Dispatch Overhead
use async_trait::async_trait;
// Box<dyn Future> adds dynamic dispatch overhead
#[async_trait]
trait Processor {
async fn process(&self, data: &[u8]) -> Vec<u8>;
}
// The boxed future uses:
// - dyn Future<Output = ...>
// - Virtual method table (vtable) for Future::poll
// Each .await on a boxed future:
// 1. Loads vtable pointer
// 2. Calls Future::poll through vtable
// 3. Returns Poll::Ready or Poll::Pending
// This is slower than:
// - Direct call to typed Future::poll
// - Inlined poll method
// But necessary for:
// - Trait objects
// - Dynamic dispatch
// - Heterogeneous collections

Boxed futures require virtual dispatch for every poll.
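Polling a type-erased future can be demonstrated with std alone. The noop_waker helper below is an illustrative minimal waker, adequate only for futures that complete on their first poll; the point is that once the future is behind Box<dyn Future>, every poll call goes through the trait object's vtable:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal no-op waker: enough for a future that finishes on first poll.
fn noop_waker() -> Waker {
    fn raw() -> RawWaker {
        RawWaker::new(std::ptr::null(), &VT)
    }
    unsafe fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    unsafe fn noop(_: *const ()) {}
    static VT: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(raw()) }
}

fn main() {
    // The concrete future type is erased; poll dispatches through the vtable.
    let mut fut: Pin<Box<dyn Future<Output = u32> + Send>> = Box::pin(async { 21 * 2 });
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(42)));
}
```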
Send and Sync Considerations
use async_trait::async_trait;
// async_trait adds Send bounds by default
#[async_trait] // Boxed futures get a Send bound by default
trait SendService {
async fn fetch(&self) -> String;
}
// Equivalent to:
// Pin<Box<dyn Future<Output = String> + Send + '_>>
// For non-Send futures:
#[async_trait(?Send)]
trait LocalService {
async fn local_op(&self) -> String;
}
// Equivalent to:
// Pin<Box<dyn Future<Output = String> + '_>>
// (no Send bound)
struct RcHolder {
data: std::rc::Rc<String>, // Rc is not Send
}
#[async_trait(?Send)]
impl LocalService for RcHolder {
async fn local_op(&self) -> String {
// Can use Rc here because no Send bound
(*self.data).clone()
}
}

The ?Send variant allows non-thread-safe types in async methods.
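The ?Send variant can also be desugared by hand with std alone. In this sketch (LocalService and RcHolder are illustrative, and noop_waker is a minimal helper for futures that complete immediately) the boxed future carries no Send bound, so the non-thread-safe Rc capture compiles:

```rust
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hand-desugared ?Send variant: no Send bound on the boxed future.
trait LocalService {
    fn local_op<'a>(&'a self) -> Pin<Box<dyn Future<Output = String> + 'a>>;
}

struct RcHolder {
    data: Rc<String>, // Rc is not Send
}

impl LocalService for RcHolder {
    fn local_op<'a>(&'a self) -> Pin<Box<dyn Future<Output = String> + 'a>> {
        Box::pin(async move { (*self.data).clone() })
    }
}

// Minimal no-op waker, enough to poll a future that finishes immediately.
fn noop_waker() -> Waker {
    fn raw() -> RawWaker {
        RawWaker::new(std::ptr::null(), &VT)
    }
    unsafe fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    unsafe fn noop(_: *const ()) {}
    static VT: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(raw()) }
}

fn main() {
    let holder = RcHolder { data: Rc::new(String::from("local")) };
    let mut fut = holder.local_op();
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(s) if s == "local"));
}
```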
Comparing to Native Async Traits
// Rust 1.75+ supports native async fn in traits:
// Native (no allocation):
trait NativeAsync {
async fn process(&self, data: &[u8]) -> Vec<u8>;
}
impl NativeAsync for MyType {
async fn process(&self, data: &[u8]) -> Vec<u8> {
// Returns impl Future (no boxing)
data.to_vec()
}
}
// But native async traits can't be trait objects:
// let service: Box<dyn NativeAsync> = ...; // Error!
// async_trait allows trait objects:
#[async_trait]
trait AsyncTraitObject {
async fn process(&self, data: &[u8]) -> Vec<u8>;
}
// This works:
// let service: Box<dyn AsyncTraitObject> = ...; // OK!
fn use_trait_object(service: Box<dyn AsyncTraitObject>) {
// Possible because futures are boxed
// Box<dyn Future<...>> is object-safe
}

Native async traits return impl Future; async_trait returns Box<dyn Future>.
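A native async trait can be exercised without any boxing; this sketch requires Rust 1.75+ and uses std's pin! macro to pin the future on the stack (NativeAsync and Doubler are illustrative, and noop_waker is a minimal helper for futures that complete on first poll):

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Native async fn in trait (stable since Rust 1.75): the returned future is
// a concrete impl Future, so nothing is boxed, but `dyn NativeAsync`
// would not compile because the future type varies per implementation.
trait NativeAsync {
    async fn double(&self, n: u32) -> u32;
}

struct Doubler;
impl NativeAsync for Doubler {
    async fn double(&self, n: u32) -> u32 {
        n * 2
    }
}

// Minimal no-op waker, enough for a future that finishes on first poll.
fn noop_waker() -> Waker {
    fn raw() -> RawWaker {
        RawWaker::new(std::ptr::null(), &VT)
    }
    unsafe fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    unsafe fn noop(_: *const ()) {}
    static VT: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(raw()) }
}

fn main() {
    let d = Doubler;
    let fut = d.double(21); // no heap allocation: concrete impl Future
    let mut fut = pin!(fut); // pinned on the stack, not boxed
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    assert!(matches!(fut.as_mut().poll(&mut cx), Poll::Ready(42)));
}
```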
When Overhead Matters
use async_trait::async_trait;
// High-frequency calls: overhead matters
#[async_trait]
trait HighFrequency {
async fn tick(&self) -> u64;
}
// Called millions of times per second
// Each call: one allocation
// Allocations add up
// Low-frequency calls: overhead negligible
#[async_trait]
trait LowFrequency {
async fn process_batch(&self, items: Vec<Item>) -> Result<(), Error>;
}
// Called occasionally
// Allocation cost is tiny compared to batch processing time
// async_trait is fine here
// Mitigation strategies for high-frequency:
// 1. Don't use async_trait for hot paths
trait SyncHotPath {
fn compute(&self, input: u32) -> u32; // Synchronous
}
// 2. Use native async traits (Rust 1.75+)
trait NativeHotPath {
async fn compute(&self, input: u32) -> u32; // No boxing
}
// 3. Accept the overhead if trait objects needed
// Profile and measure - it may be acceptable

Consider call frequency when choosing async trait implementations.
Allocation in Executor Context
use async_trait::async_trait;
// In async context, allocations happen at call time
#[async_trait]
trait Database {
async fn query(&self, sql: &str) -> Vec<Row>;
}
struct Postgres;
#[async_trait]
impl Database for Postgres {
async fn query(&self, sql: &str) -> Vec<Row> {
// This method:
// 1. Called -> Box::pin allocates
// 2. Executor polls future
// 3. Future completes
// 4. Box deallocated
vec![]
}
}
async fn handle_request(db: &dyn Database) {
// Allocation happens here
let rows = db.query("SELECT * FROM users").await;
// If called in a loop:
for _ in 0..100 {
// 100 allocations
let _ = db.query("SELECT 1").await;
}
}

Each async trait method call in a loop creates an allocation.
Memory Fragmentation Concerns
use async_trait::async_trait;
// Many small allocations can fragment memory
#[async_trait]
trait FragmentationRisk {
async fn small_op(&self) -> u32;
}
// High-frequency calls create many short-lived Box allocations
// This can fragment the heap over time
// Mitigation:
// 1. Pool or reuse futures when possible
// 2. Batch operations to reduce call frequency
// 3. Use arena allocators (but futures need 'static usually)
// 4. Accept fragmentation for simplicity in non-extreme cases
// Modern allocators (jemalloc, mimalloc) handle small allocations well
// Fragmentation is rarely a practical issue

Small frequent allocations can fragment memory; modern allocators mitigate this.
Alternatives to async_trait
use async_trait::async_trait;
use std::future::Future;
use std::pin::Pin;
// Option 1: async_trait macro (boxed futures)
#[async_trait]
trait BoxedAsync {
async fn run(&self) -> u32;
}
// Option 2: Manual boxing (same result, no macro)
trait ManualBoxed {
fn run<'a>(&'a self) -> Pin<Box<dyn Future<Output = u32> + Send + 'a>>;
}
impl ManualBoxed for MyType {
fn run<'a>(&'a self) -> Pin<Box<dyn Future<Output = u32> + Send + 'a>> {
Box::pin(async move { 42 })
}
}
// Option 3: Native async traits (Rust 1.75+, no boxing, no trait objects)
trait NativeAsync {
async fn run(&self) -> u32;
}
// Option 4: Return impl Future (no trait objects)
trait ImplFuture {
fn run(&self) -> impl Future<Output = u32>;
}
// Option 5: async fn in impl block (works with native traits)
trait Service {
async fn run(&self) -> u32;
}
impl Service for MyService {
async fn run(&self) -> u32 {
42
}
}
// For trait objects, the futures must be boxed (async_trait or manual boxing):
fn use_dyn(service: &dyn BoxedAsync) {
// Works because futures are boxed
}
// This doesn't work with native traits:
// fn use_dyn_native(service: &dyn NativeAsync) { }
// Error: `dyn NativeAsync` is not object-safe

Choose based on whether you need trait objects.
Performance Comparison
use async_trait::async_trait;
use std::time::Instant;
// Native async trait (no allocation)
trait NativeService {
async fn compute(&self, n: u64) -> u64;
}
struct NativeImpl;
impl NativeService for NativeImpl {
async fn compute(&self, n: u64) -> u64 {
n * 2
}
}
// async_trait (boxed futures)
#[async_trait]
trait BoxedService {
async fn compute(&self, n: u64) -> u64;
}
struct BoxedImpl;
#[async_trait]
impl BoxedService for BoxedImpl {
async fn compute(&self, n: u64) -> u64 {
n * 2
}
}
async fn benchmark() {
let native = NativeImpl;
let boxed = BoxedImpl;
// Native: no allocation per call
let start = Instant::now();
for i in 0..10000 {
let _ = native.compute(i).await;
}
println!("Native: {:?}", start.elapsed());
// Boxed: allocation per call
let start = Instant::now();
for i in 0..10000 {
let _ = boxed.compute(i).await;
}
println!("Boxed: {:?}", start.elapsed());
// Boxed is slower due to:
// 1. Heap allocation
// 2. Dynamic dispatch
// 3. Cache misses (indirect calls)
}

Native async traits are faster when trait objects aren't needed.
Practical Guidance
use async_trait::async_trait;
// Use async_trait when:
// 1. You need trait objects (dyn Trait)
// 2. You need compile-time compatibility with older Rust
// 3. The method is not called in tight loops
// 4. The async work dominates the allocation cost
// Example: Good use of async_trait
#[async_trait]
trait Repository {
// Called occasionally, I/O-bound
async fn find_user(&self, id: u64) -> Option<User>;
async fn save_user(&self, user: &User) -> Result<(), Error>;
}
// Use native async traits when:
// 1. Rust 1.75+ is available
// 2. No trait objects needed
// 3. Performance is critical
// Example: Native async trait preferred
trait Cache {
// Could be high-frequency
async fn get(&self, key: &str) -> Option<Bytes>;
}
// Avoid async_trait for:
// 1. Hot paths in tight loops
// 2. Methods called millions of times
// 3. Methods that barely do any work
// Example: Avoid async_trait here
#[async_trait]
trait Counter {
// Called millions of times - allocation overhead adds up
async fn increment(&self) -> u64; // Bad
}
// Better: a synchronous method
trait SyncCounter {
fn increment(&self) -> u64; // Synchronous, no allocation
}

Choose based on call frequency and trait object requirements.
Synthesis
How async_trait works:
// Input:
#[async_trait]
trait Service {
async fn run(&self) -> String;
}
// Output (simplified):
trait Service {
fn run(&self) -> Pin<Box<dyn Future<Output = String> + Send + '_>>;
}
// Each call:
// 1. Box::pin allocates heap memory
// 2. Future state machine stored in Box
// 3. Returns Pin<Box<dyn Future<...>>>
// 4. Executor polls through vtable
// 5. Box deallocated when future completes

Allocation overhead:
| Operation | Cost |
|---|---|
| Box::pin | 1 heap allocation |
| dyn Future::poll | Virtual dispatch (vtable lookup) |
| Captured state | Size of future stored on heap |
When overhead matters:
| Scenario | Overhead Impact |
|---|---|
| I/O-bound (database, network) | Negligible |
| Occasional calls | Negligible |
| Tight loops, hot paths | Measurable |
| Methods returning immediately | Significant |
Trade-offs:
| Aspect | async_trait | Native async trait |
|---|---|---|
| Trait objects | Supported | Not supported |
| Allocation | Yes (Box) | No |
| Dynamic dispatch | Yes | No |
| Rust version | Any | 1.75+ |
| Performance | Lower | Higher |
Key insight: async_trait::async_trait enables async methods in traits by boxing the returned futures, at the cost of one heap allocation per method call plus dynamic dispatch on every poll. For trait objects and code that must compile on older Rust, this overhead is acceptable: the macro solves a problem that Rust's type system could not otherwise express. For performance-critical code or high-frequency calls, native async traits (Rust 1.75+) avoid both the allocation and the dynamic dispatch, making them preferable when trait objects aren't needed. The allocation size depends on the async function's captured state; futures that capture large values allocate more heap memory.
