The Rust Collections Guide
Strings and Text as Collection-Like Data
Last Updated: 2026-04-05
Why strings matter in a collections guide
Strings are not just a special-case primitive in Rust. They are collection-like types built on contiguous UTF-8 bytes, and many of the same questions that apply to other collections apply here too. Who owns the data? Is the view borrowed or owned? Can it grow? Can it be sliced safely? What counts as one element?
Rust separates owned text from borrowed text. String is the owned, growable text type. str is the string slice type, almost always seen behind a reference as &str, a borrowed view of text. The relationship closely parallels that of Vec<T> and &[T].
Understanding this split is essential because text handling in Rust is intentionally explicit. Strings are powerful and flexible, but they are also one of the first places where UTF-8 reality becomes visible.
fn main() {
    let owned: String = String::from("hello");
    let borrowed: &str = &owned;
    println!("owned = {}", owned);
    println!("borrowed = {}", borrowed);
}
That small example captures the central idea. A String owns text. A &str borrows a view into UTF-8 text that already exists.
Owned text: `String`
String is the standard owned text type in Rust. It stores UTF-8 encoded bytes in a growable heap buffer. In practice, that means it behaves a lot like a Vec<u8> with extra guarantees that the contents remain valid UTF-8.
Because it is owned and growable, String is useful when your program needs to build, modify, accumulate, or return text.
fn main() {
    let mut message = String::from("hello");
    message.push(' ');
    message.push_str("world");
    println!("{}", message);
}
This is one of the most common uses of String: dynamic text construction. It owns its bytes and can grow as new text is appended.
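The comparison with Vec<u8> can be made concrete by converting between the two types. The sketch below uses String::from_utf8 and String::into_bytes from the standard library; the helper name bytes_to_text is made up for illustration.

```rust
/// Hypothetical helper: try to turn raw bytes into owned text.
/// Returns None when the bytes are not valid UTF-8.
fn bytes_to_text(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

fn main() {
    // Valid UTF-8: the bytes for "hi".
    let ok = bytes_to_text(vec![0x68, 0x69]);
    println!("{:?}", ok);

    // 0xFF never appears in valid UTF-8, so this conversion is refused.
    let bad = bytes_to_text(vec![0xFF, 0x68]);
    println!("{:?}", bad);

    // Going the other way always succeeds: a String already holds valid bytes.
    let raw: Vec<u8> = String::from("hi").into_bytes();
    println!("{:?}", raw);
}
```

The asymmetry is the point: any String can become a Vec<u8>, but only some byte vectors can become a String.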
Borrowed text: `str` and `&str`
str is the string slice type. Like [T], it is unsized, so in practice you use it through references such as &str. A &str is a borrowed view into valid UTF-8 text.
String literals are &'static str, which means they are borrowed string slices embedded in the program binary.
fn greet(name: &str) {
    println!("hello, {name}");
}

fn main() {
    let literal = "Rust";
    let owned = String::from("world");
    greet(literal);
    greet(&owned);
}
This is why &str is such a common function parameter. It lets callers pass string literals, borrowed portions of other strings, or borrowed views of String values, all without forcing extra allocation.
The `String` and `&str` relationship
A useful mental model is that String is to &str what Vec<T> is to &[T]. The owned type manages storage. The borrowed type provides a view into existing data.
This distinction affects API design. A function that only reads text usually wants &str. A function that produces new owned text often returns String.
fn emphasize(input: &str) -> String {
    format!("***{}***", input)
}

fn main() {
    let a = "note";
    let b = String::from("warning");
    println!("{}", emphasize(a));
    println!("{}", emphasize(&b));
}
This is a strong text API because it accepts flexible borrowed input and returns clear owned output.
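To make the Vec<T> and &[T] comparison concrete, here is a small side-by-side sketch. The function names total and shout are invented for this illustration; the coercions they rely on are standard.

```rust
/// Borrowed view over elements: accepts &Vec<i32>, arrays, and sub-slices.
fn total(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Borrowed view over text: accepts &String, literals, and sub-slices.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let numbers: Vec<i32> = vec![1, 2, 3, 4];
    let words: String = String::from("hello world");

    println!("{}", total(&numbers)); // &Vec<i32> coerces to &[i32]
    println!("{}", total(&numbers[..2])); // sub-slice of the vector
    println!("{}", shout(&words)); // &String coerces to &str
    println!("{}", shout(&words[..5])); // sub-slice of the string (ASCII, so byte 5 is a boundary)
}
```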
Strings are UTF-8, not arrays of characters
One of the biggest conceptual shifts for many learners is that Rust strings are UTF-8 byte sequences, not fixed-width arrays of characters. This means the number of bytes and the number of user-perceived characters can differ.
fn main() {
    let s = "é";
    println!("text = {}", s);
    println!("byte length = {}", s.len());
}
The string "é" looks like one character, but in UTF-8 it uses more than one byte. That is why len() on a string returns the number of bytes, not the number of Unicode scalar values or grapheme clusters.
This is not a bug or inconvenience added by Rust. It is a consequence of representing text as UTF-8.
Why direct string indexing is not allowed
Rust does not allow indexing a string with an integer, as in s[i], the way some languages do; such code is rejected at compile time. The reason is that a byte offset does not necessarily correspond to a valid character boundary, and a single visible character may occupy multiple bytes.
This restriction prevents accidental misuse.
fn main() {
    let s = String::from("hello");
    let first = s.as_bytes()[0];
    println!("first byte = {}", first);
}
You can access raw bytes explicitly if that is what you really want, but Rust makes you say so. This keeps byte-level operations separate from text-level intentions.
The absence of direct indexing encourages clearer thinking: do you want bytes, Unicode scalar values, or something closer to user-visible characters?
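As a rough illustration of answering that question explicitly, the sketch below asks for the "first element" at two different levels. The helper names first_byte and first_char are hypothetical; bytes() and chars() are standard methods on str.

```rust
/// Hypothetical helper: the first byte of the UTF-8 storage, if any.
fn first_byte(s: &str) -> Option<u8> {
    s.bytes().next()
}

/// Hypothetical helper: the first Unicode scalar value, if any.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

fn main() {
    let s = "éa";
    // The byte view yields only the first half of the two-byte encoding of 'é'.
    println!("{:?}", first_byte(s));
    // The char view yields the whole scalar value.
    println!("{:?}", first_char(s));
}
```

Both helpers return an Option, so an empty string is handled without a panic.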
Creating strings
Rust provides multiple ways to create owned strings. String literals are borrowed &str values, but you can turn them into owned String values with String::from, .to_string(), or format!.
fn main() {
    let a = String::from("alpha");
    let b = "beta".to_string();
    let c = format!("{}-{}", a, b);
    println!("b = {}", b);
    println!("c = {}", c);
}
format! is especially useful when constructing new text from existing values. It creates a new String.
Growing and mutating strings
Because String is growable, it supports several mutation methods. push appends a single char, and push_str appends a string slice.
fn main() {
    let mut text = String::from("data");
    text.push(':');
    text.push(' ');
    text.push_str("ready");
    println!("{}", text);
}
This distinction is useful. push is for one Unicode scalar value. push_str is for a borrowed string slice. Both operate by extending the underlying UTF-8 byte buffer while maintaining validity.
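One way to observe the byte buffer growing is to measure the length before and after a push. This is a minimal sketch; push_and_measure is a made-up helper name.

```rust
/// Hypothetical helper: append `ch` and report how many bytes the string grew by.
fn push_and_measure(text: &mut String, ch: char) -> usize {
    let before = text.len();
    text.push(ch);
    text.len() - before
}

fn main() {
    let mut text = String::new();
    println!("{}", push_and_measure(&mut text, 'a')); // ASCII: one byte
    println!("{}", push_and_measure(&mut text, 'é')); // two bytes in UTF-8
    println!("{}", push_and_measure(&mut text, '€')); // three bytes in UTF-8
    println!("{text}");
}
```

A single push always adds one char, but the number of bytes added depends on how that char is encoded.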
Concatenation patterns
Rust supports several ways to concatenate strings. The + operator works, but it moves the left-hand String and appends a borrowed string slice to it. format! is often clearer when combining multiple pieces.
fn main() {
    let left = String::from("hello");
    let right = String::from("world");
    // `left` is moved into the result here; `right` is only borrowed.
    let combined = left + " " + &right;
    println!("combined = {}", combined);
    let clearer = format!("{} {}!", "hello", right);
    println!("clearer = {}", clearer);
}
The + operator is valid, but many learners find format! easier to reason about because it borrows its arguments and moves nothing, which avoids ownership surprises.
A useful rule is that + is fine for short cases once you understand its ownership behavior, but format! is often more readable for multi-part construction.
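The ownership difference shows up directly in function signatures. In the sketch below, both function names are invented for illustration: the version built on + has to take its left operand by value, while the format! version only needs borrows.

```rust
/// Sketch: `+` consumes the left operand, so this takes `left` by value.
fn join_with_plus(left: String, right: &str) -> String {
    left + " " + right
}

/// Sketch: `format!` only borrows its arguments.
fn join_with_format(left: &str, right: &str) -> String {
    format!("{left} {right}")
}

fn main() {
    let left = String::from("hello");
    let right = String::from("world");

    // `left` stays usable afterwards because format! borrows it.
    let a = join_with_format(&left, &right);

    // `left` is moved into the call here and cannot be used again.
    let b = join_with_plus(left, &right);

    println!("{a}");
    println!("{b}");
}
```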
Borrowing and slicing strings
Like vectors, strings can be borrowed as slices. You can also take sub-slices of a string, but only at valid UTF-8 boundaries.
fn first_word(input: &str) -> &str {
    match input.find(' ') {
        Some(i) => &input[..i],
        None => input,
    }
}

fn main() {
    let text = String::from("blue sky");
    println!("first word = {}", first_word(&text));
}
This is an important pattern. The function takes &str and returns a borrowed &str pointing into the same original text. No new allocation is required.
But the boundaries matter. String slicing is by byte range, and Rust checks at run time that the range starts and ends on UTF-8 character boundaries. A range that fails the check causes a panic.
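When a byte offset comes from somewhere you do not fully trust, a non-panicking check can be useful. The sketch below uses str::get and str::is_char_boundary from the standard library; safe_prefix is a hypothetical helper name.

```rust
/// Hypothetical helper: take the first `n` bytes only if `n` is a valid boundary.
/// Returns None instead of panicking when it is not.
fn safe_prefix(text: &str, n: usize) -> Option<&str> {
    text.get(..n)
}

fn main() {
    let text = "naïve"; // 'ï' occupies bytes 2 and 3
    println!("{:?}", safe_prefix(text, 2)); // boundary just before 'ï'
    println!("{:?}", safe_prefix(text, 3)); // falls inside 'ï'
    println!("{:?}", safe_prefix(text, 4)); // boundary just after 'ï'
    println!("{}", text.is_char_boundary(3));
}
```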
Iterating over bytes versus chars
Rust makes a sharp distinction between iterating over bytes and iterating over char values. Bytes represent raw UTF-8 storage. char values represent Unicode scalar values. These are not the same thing.
fn main() {
    let text = "hé";
    println!("bytes:");
    for b in text.bytes() {
        println!(" {b}");
    }
    println!("chars:");
    for ch in text.chars() {
        println!(" {ch}");
    }
}
This distinction matters a great deal. Use bytes for protocol parsing, low-level text processing, and exact UTF-8 storage handling. Use chars() when you want Unicode scalar values.
But even chars() is not always the same as what a human sees as one displayed character.
Bytes, chars, and grapheme clusters
There are at least three useful levels for thinking about text: bytes, Unicode scalar values, and grapheme clusters. Rust's standard library gives direct support for bytes and char values, but not full grapheme-cluster segmentation.
A single user-perceived character can sometimes be made of multiple Unicode scalar values. That means counting .chars() is often closer to human expectations than counting bytes, but it is still not a perfect measure of displayed characters.
For many programs, bytes or char values are enough. For text editors, advanced user-interface work, or correct user-visible slicing in all languages, grapheme-cluster-aware handling may require ecosystem crates.
The key lesson is to choose the level of text interpretation that actually matches the problem.
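The three levels can be seen with the standard library alone by comparing two spellings of the same visible character. This is only a sketch: measure is a made-up helper, and it counts bytes and scalar values but makes no attempt at grapheme segmentation.

```rust
/// Hypothetical helper: report (bytes, Unicode scalar values) for a piece of text.
fn measure(text: &str) -> (usize, usize) {
    (text.len(), text.chars().count())
}

fn main() {
    // Precomposed: a single scalar value, U+00E9.
    let precomposed = "\u{e9}";
    // Decomposed: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    let decomposed = "e\u{301}";

    println!("{:?}", measure(precomposed));
    println!("{:?}", measure(decomposed));

    // Both normally render as one visible character, yet the strings differ.
    println!("{}", precomposed == decomposed);
}
```

Neither count tells you that a reader sees one character in both cases, which is the gap grapheme-cluster-aware crates are meant to fill.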
Common text operations
Rust strings support many everyday operations: checking prefixes and suffixes, splitting, replacing, trimming, and searching.
fn main() {
    let text = " apple,banana,pear ";
    let trimmed = text.trim();
    println!("trimmed = {:?}", trimmed);
    println!("starts with apple = {}", trimmed.starts_with("apple"));
    println!("contains banana = {}", trimmed.contains("banana"));
    for part in trimmed.split(',') {
        println!("part = {}", part);
    }
    let replaced = trimmed.replace("pear", "plum");
    println!("replaced = {}", replaced);
}
Many real text workflows in Rust consist of operations like these on borrowed &str values, followed by allocating a new String only when necessary.
Text mutation and in-place cleanup
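Owned strings can also be modified in place. Methods like clear, truncate, and retain are useful in different situations: clear empties the string, truncate shortens it to a given byte length (which must fall on a character boundary), and retain keeps only the characters that satisfy a predicate.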
fn main() {
    let mut text = String::from("abc123xyz");
    text.retain(|ch| ch.is_ascii_alphabetic());
    println!("letters only = {}", text);
    text.truncate(3);
    println!("truncated = {}", text);
}
This kind of in-place editing is one reason String belongs naturally in a collections discussion. It is owned, growable, mutable sequence data with text-specific guarantees.
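The example above does not use clear, so here is a small sketch of one place it tends to help: reusing a single buffer instead of allocating a new String each time. The helper fill_line and the key/value format are assumptions made for illustration.

```rust
/// Hypothetical helper: rebuild a line inside an existing buffer.
fn fill_line(buffer: &mut String, key: &str, value: &str) {
    buffer.clear(); // removes the contents but keeps the allocation
    buffer.push_str(key);
    buffer.push('=');
    buffer.push_str(value);
}

fn main() {
    let mut line = String::with_capacity(32);

    fill_line(&mut line, "mode", "fast");
    println!("{line}");

    fill_line(&mut line, "level", "3");
    println!("{line}");

    // clear() did not shrink the buffer.
    println!("capacity still >= 32: {}", line.capacity() >= 32);
}
```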
Common pitfalls with string length
One of the most common mistakes is to assume that len() means the number of visible characters. It does not. It means the number of bytes.
fn main() {
    let a = "abc";
    let b = "é";
    println!("abc bytes = {}", a.len());
    println!("é bytes = {}", b.len());
    println!("é chars = {}", b.chars().count());
}
When the distinction matters, choose the right measure explicitly. Use .len() for bytes. Use .chars().count() for Unicode scalar values. Be cautious about treating either as a perfect measure of displayed character count.
Common pitfalls with slicing
Another common mistake is to assume any byte range can be used to slice a string. A range that starts or ends inside a multi-byte character causes a panic at run time.
fn main() {
    let text = "naïve";
    println!("full = {}", text);
    println!("prefix = {}", &text[..2]);
}
This works only because byte 2 is a valid boundary: n and a each take one byte. The ï that follows takes two bytes, so &text[..3] would land in the middle of it and panic. In real code, slicing text by byte index is often the wrong level unless you already know the boundaries are safe.
String methods such as split, find, strip_prefix, and iterator-based traversal are often better than manual byte-range slicing.
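A brief sketch of that advice follows. Both helpers are hypothetical; they rely on strip_prefix, split_once, and char_indices, which are standard methods on str, so every byte offset used for slicing comes from the string itself.

```rust
/// Hypothetical helper: parse "key=value", tolerating an optional "--" prefix.
fn parse_flag(arg: &str) -> Option<(&str, &str)> {
    let body = arg.strip_prefix("--").unwrap_or(arg);
    body.split_once('=')
}

/// Hypothetical helper: keep at most `n` Unicode scalar values.
/// The cut point comes from char_indices, so it is always a valid boundary.
fn take_chars(text: &str, n: usize) -> &str {
    match text.char_indices().nth(n) {
        Some((byte_index, _)) => &text[..byte_index],
        None => text, // the text has n or fewer chars
    }
}

fn main() {
    println!("{:?}", parse_flag("--name=ferris"));
    println!("{:?}", parse_flag("--verbose"));
    println!("{}", take_chars("naïve", 3));
}
```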
When to use `String` and when to use `&str`
Use String when you need owned, growable text. This includes building messages, storing dynamic user input, returning newly created text, or keeping text beyond the scope of the source it came from.
Use &str when you only need a borrowed view into text. This is ideal for function parameters, lightweight parsing, searching, and read-only text processing.
A good API pattern is to accept &str and return String only when the function actually creates new owned text.
fn title_case_label(label: &str) -> String {
    let mut out = String::new();
    let mut chars = label.chars();
    if let Some(first) = chars.next() {
        out.extend(first.to_uppercase());
        out.push_str(chars.as_str());
    }
    out
}

fn main() {
    println!("{}", title_case_label("warning"));
}
This pattern keeps ownership costs visible and prevents unnecessary allocation at the call site.
API design for text-processing functions
Most text-processing functions should take &str, not &String. Taking &String needlessly restricts callers. A borrowed string slice is more general because it works with both string literals and owned strings.
fn word_count(input: &str) -> usize {
    input.split_whitespace().count()
}

fn main() {
    let literal = "one two three";
    let owned = String::from("four five six");
    println!("{}", word_count(literal));
    println!("{}", word_count(&owned));
}
This is one of the most useful habits to build early. Prefer &str for input unless the function specifically needs ownership or mutation of the caller's string buffer.
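The two exceptions can be sketched next to the default case. All names here (is_blank, ensure_trailing_newline, Label, make_label) are invented for illustration; the point is only how the parameter type signals what the function does with the text.

```rust
/// Reads only: a borrowed slice is enough.
fn is_blank(input: &str) -> bool {
    input.trim().is_empty()
}

/// Mutates the caller's buffer, so it needs `&mut String`.
fn ensure_trailing_newline(buffer: &mut String) {
    if !buffer.ends_with('\n') {
        buffer.push('\n');
    }
}

/// Hypothetical type that stores owned text.
struct Label {
    text: String,
}

/// Keeps the text, so it takes the String by value.
fn make_label(text: String) -> Label {
    Label { text }
}

fn main() {
    println!("{}", is_blank("   "));

    let mut line = String::from("done");
    ensure_trailing_newline(&mut line);
    println!("{:?}", line);

    let label = make_label(String::from("warning"));
    println!("{}", label.text);
}
```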
A small decision guide
Use String when text must be owned, built up, or mutated.
Use &str when text only needs to be borrowed and read.
Think in bytes when working with storage, encodings, or low-level protocols.
Think in char values when working with Unicode scalar values.
Avoid assuming that byte offsets, .len(), and user-visible characters all mean the same thing. For text, they often do not.
A small sandbox project
A tiny Cargo project is enough to experiment with owned strings, borrowed string slices, UTF-8 length, and common text operations.
[package]
name = "strings-guide"
version = "0.1.0"
edition = "2024"
Create and run it like this.
cargo new strings-guide
cd strings-guide
cargo run
A minimal src/main.rs could look like this.
fn summarize(input: &str) -> String {
    let trimmed = input.trim();
    let count = trimmed.split_whitespace().count();
    format!("{} words: {}", count, trimmed)
}

fn main() {
    let mut text = String::from(" héllo world ");
    println!("bytes = {}", text.len());
    println!("chars = {}", text.chars().count());
    text.push('!');
    println!("text = {}", text);
    println!("summary = {}", summarize(&text));
    println!("bytes:");
    for b in text.bytes() {
        println!(" {b}");
    }
    println!("chars:");
    for ch in text.chars() {
        println!(" {ch}");
    }
}
This one small program demonstrates the core themes of Rust text handling: the distinction between owned and borrowed text, UTF-8 byte length, mutation of String, slice-based APIs, and the difference between iterating over bytes and iterating over char values.
