In our three-part series on web backend performance, we measured idle memory ranging from 3 MB (Rust) to 500 MB (Spring Boot). We noted that a garbage collector explained much of Java’s overhead, that Go’s concurrent GC kept pause times under a millisecond, and that Rust had zero GC at all. But we never stopped to explain why those differences exist — or where they came from.
Memory management is the deepest architectural decision a language makes. It shapes performance ceilings, determines entire categories of bugs that are possible or impossible, and often defines how a language feels to write. The choice happens once, at design time, and every program ever written in that language lives with the consequences forever.
Here’s the full spectrum — from the original manual approach to the compile-time ownership model that has the industry reconsidering everything.
Where It All Started: Static Allocation and the FORTRAN Model
Before there was dynamic memory, there was no memory management problem to solve — because there was no dynamic memory.
Early FORTRAN (pre-FORTRAN 90) had no concept of heap allocation at all. Every variable, every array, every data structure had to be declared at compile time with a fixed size. The compiler calculated exactly how much memory the program needed, laid it out in a static region, and that was that.
This approach has an elegant simplicity: the program’s memory footprint is completely known before it ever runs. There are no allocation failures, no fragmentation, no garbage collection pauses — because there is no allocation. Memory is reserved once, used for the program’s lifetime, and released when the process exits.
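The model is easy to express in any language that supports static data; here is a minimal Rust sketch of the idea (the WORKSPACE name and size are illustrative, not from FORTRAN):

```rust
// FORTRAN-style static allocation, sketched in Rust: the buffer's size is
// fixed at compile time and it lives in the binary's static data region.
// No allocator ever runs; the footprint is known before the program starts.
static WORKSPACE: [f64; 100] = [0.0; 100];

fn workspace_len() -> usize {
    WORKSPACE.len()
}

fn main() {
    assert_eq!(workspace_len(), 100);
    println!("static workspace: {} elements", workspace_len());
}
```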
The consequences, however, were significant. You couldn’t write a function that called itself — recursion requires a stack to store each call’s local variables and return address, and with static allocation there’s nowhere to put them, which is why early FORTRAN programs couldn’t use recursion at all. It wasn’t until FORTRAN 90 introduced ALLOCATE and DEALLOCATE that dynamic memory entered the language, bringing with it both the power and the hazards of dynamic allocation.
Assembly language operated similarly — the programmer controlled memory placement directly, specifying addresses manually. There was no abstraction, no safety net, and no overhead. Just the machine.
The Manual Era: C, C++, and the Power-Responsibility Tradeoff
C gave programmers a portable, structured interface to dynamic memory: malloc to request a block from the heap, free to return it when done. C++ extended this with new and delete, which additionally called constructors and destructors. The contract was simple: you allocate it, you free it. No one is watching.
This model offers complete control and zero overhead. There is no runtime tracking allocation lifetimes, no background thread scanning the heap, no pause while the collector runs. The performance ceiling is as high as the hardware allows.
The cost is borne entirely by the programmer, and it is steep.
Use-after-free: a pointer is dereferenced after the memory it points to has been freed and potentially reallocated for something else. The program reads or writes whatever is now at that address. The behavior is undefined.
Double-free: free() is called twice on the same pointer. The allocator’s internal bookkeeping is corrupted. The next allocation may return a pointer that aliases live data.
Buffer overflows: writing past the end of an allocated buffer overwrites adjacent memory — potentially function return addresses, security-sensitive flags, or other program state.
These are not edge cases. They are endemic to large codebases written in manual memory management languages, and they have severe security implications.
A 2019 analysis of Microsoft’s security advisories found that approximately 70% of CVEs in Microsoft products over the preceding decade were memory safety issues — use-after-free, buffer overflows, and related vulnerabilities. A parallel analysis at Google found that roughly 70% of severe Chrome security bugs were memory safety bugs. These are not old codebases running unmaintained code. These are programs under continuous security review by expert engineers, written in C and C++. The problem is structural, not a matter of effort.
The C community has tools to help — valgrind for runtime memory error detection, AddressSanitizer for fast instrumented builds, static analyzers — but none of them eliminate the root cause. Manual memory management requires the programmer to be correct, in every function, every time, with no enforced verification.
Garbage Collection: Trading Overhead for Safety
The insight behind garbage collection is simple: instead of asking programmers to track object lifetimes, let the runtime track them automatically.
A garbage collector periodically identifies which objects in the heap are still reachable from live references (the “live set”) and frees everything that isn’t. The programmer allocates freely; the collector handles reclamation.
Mark-and-Sweep: The Foundation
The classic GC algorithm is mark-and-sweep. Starting from a set of root references (stack variables, global variables, CPU registers), the collector traverses the entire live object graph, marking every reachable object. Anything not marked is garbage — unreachable, and therefore safe to free. The sweep phase reclaims those unmarked regions.
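The two phases fit in a few dozen lines. Here is a toy mark-and-sweep heap sketched in Rust for concreteness — the Obj and Heap names are invented for illustration, not taken from any real collector. Objects are slots in a vector, references are slot indices, and a freed slot becomes None:

```rust
struct Obj {
    refs: Vec<usize>, // outgoing references (slot indices)
    marked: bool,
}

struct Heap {
    slots: Vec<Option<Obj>>, // None = freed slot
}

impl Heap {
    // Mark phase: traverse everything reachable from the roots.
    fn mark(&mut self, roots: &[usize]) {
        let mut worklist: Vec<usize> = roots.to_vec();
        while let Some(i) = worklist.pop() {
            if let Some(obj) = self.slots[i].as_mut() {
                if !obj.marked {
                    obj.marked = true;
                    worklist.extend(obj.refs.iter().copied());
                }
            }
        }
    }

    // Sweep phase: free every unmarked slot, clear marks for the next cycle.
    fn sweep(&mut self) -> usize {
        let mut freed = 0;
        for slot in self.slots.iter_mut() {
            match slot {
                Some(obj) if obj.marked => obj.marked = false,
                Some(_) => {
                    *slot = None;
                    freed += 1;
                }
                None => {}
            }
        }
        freed
    }
}

fn demo() -> usize {
    // 0 -> 1 -> 2 is reachable from root 0; slot 3 is garbage.
    let mut heap = Heap {
        slots: vec![
            Some(Obj { refs: vec![1], marked: false }),
            Some(Obj { refs: vec![2], marked: false }),
            Some(Obj { refs: vec![], marked: false }),
            Some(Obj { refs: vec![], marked: false }),
        ],
    };
    heap.mark(&[0]);
    heap.sweep() // frees exactly one slot: the unreachable object 3
}

fn main() {
    assert_eq!(demo(), 1);
}
```

Real collectors layer generations, concurrency, and compaction on top, but this reachability traversal is the core of all of them.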
Early mark-and-sweep collectors were “stop-the-world” — the entire application had to pause while the collector ran, because a running program might modify references while the collector was traversing them, invalidating its work. For interactive or latency-sensitive applications, this was painful. A large heap could mean seconds of pause at inopportune moments.
How Java Evolved Five Garbage Collectors
Java has been refining its GC story since 1996, and the evolution is instructive precisely because of how long it has taken.
- Serial GC: single-threaded, stop-the-world. Still useful for single-core environments or tiny heaps.
- Parallel GC: multiple threads run the collection in parallel, shortening pause duration. Still stop-the-world, but faster.
- G1 (Garbage First): introduced in Java 7, production-default in Java 9. Divides the heap into regions, prioritizes collecting regions with the most garbage first. Concurrent phases run alongside the application; stop-the-world pauses are shorter and more predictable. Typical pauses: 5-50 ms.
- ZGC: introduced as experimental in Java 11, production-ready in JDK 15, significantly improved in JDK 21. Targets sub-millisecond pause times regardless of heap size. Most collection work happens concurrently with the application. As of JDK 21, ZGC pause times are typically under 1 millisecond even on multi-hundred-gigabyte heaps.
- Shenandoah: Red Hat’s concurrent GC, also targeting sub-millisecond pauses via a different technique (concurrent compaction). Available in OpenJDK builds.
The fact that Java has five GC algorithms — and that the community continues debating which to use — reflects the genuine difficulty of the problem. There is no universal best; the optimal choice depends on heap size, allocation rate, latency requirements, and throughput goals. Each new algorithm represents years of engineering and real-world tuning. Part 2 of our web stack series shows the downstream effect: Java’s G1GC tail latency grows to 20-50x the median under high concurrency, while ZGC keeps P99 latency near P50.
Go’s GC: Design for Low Latency from the Start
Go took a different approach than Java, building its GC from scratch with the explicit goal of short pause times rather than adding concurrent collection after the fact.
Go’s GC is a concurrent, tricolor mark-and-sweep collector. Collection phases run concurrently with the application on separate goroutines, with only brief stop-the-world synchronization barriers. In most production workloads, Go targets and achieves GC pause times under 1 millisecond.
Go also uses escape analysis aggressively at compile time. When the compiler can prove that a variable’s lifetime doesn’t outlive the function that creates it, it allocates that variable on the stack rather than the heap. Stack allocation is essentially free — the stack grows and shrinks as function calls happen, with no GC involvement. By keeping short-lived objects off the heap entirely, Go significantly reduces the volume of work the collector has to do. This is one reason Go’s runtime footprint is so much smaller than the JVM’s: less heap allocation means less GC pressure.
C# and the CLR
The .NET CLR uses a generational GC that operates on a similar principle: most objects die young (the “generational hypothesis”), so collecting the youngest generation frequently and cheaply catches most garbage without touching long-lived objects. .NET 8’s background GC is well-tuned and generally keeps pauses under 10 ms in typical workloads, though it can spike under heavy allocation pressure.
Reference Counting: Determinism at a Price
Garbage collectors are non-deterministic: you don’t know exactly when an object will be freed, only that it will be freed eventually. For most programs this is fine. For programs that manage external resources (file handles, database connections, network sockets) via RAII patterns, or for programs where memory latency predictability matters, it can be limiting.
Reference counting offers an alternative: every object carries an integer counter tracking how many live references point to it. When a reference is created, the counter increments. When a reference is dropped, the counter decrements. When the counter reaches zero, the object is freed immediately and deterministically — not on some future collection cycle, but right now, at the exact moment the last reference disappears.
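Rust’s Rc type (covered later in this piece) makes the mechanics easy to observe, because Rc::strong_count exposes the counter directly. A small sketch:

```rust
use std::rc::Rc;

fn counts() -> (usize, usize) {
    let a = Rc::new(String::from("shared data"));
    let b = Rc::clone(&a); // new reference: counter increments to 2
    let before = Rc::strong_count(&a);
    drop(b); // reference dropped: counter decrements immediately
    let after = Rc::strong_count(&a);
    (before, after)
    // When `a` goes out of scope here, the count hits zero and the
    // String is freed at this exact point -- deterministically.
}

fn main() {
    assert_eq!(counts(), (2, 1));
}
```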
CPython’s Approach
Python’s reference implementation, CPython, uses reference counting as its primary memory management strategy. Every Python object carries an ob_refcnt field. Every assignment, argument pass, and return value adjusts reference counts. When a count hits zero, the object’s destructor runs and its memory is reclaimed.
The determinism is real and useful: with statements and context managers rely on __exit__ running at a predictable point, and reference counting makes __del__ run deterministically too — at the moment the last reference disappears. File handles close when expected. Database connections return to pools on schedule.
The problem is reference cycles. If object A holds a reference to B, and B holds a reference back to A, both counts will always be at least 1 — neither will ever reach zero even if no outside code can reach either object. CPython solves this with a separate cycle collector that periodically scans for isolated reference cycles and breaks them. It’s a hybrid: fast reference counting for the common case, cycle detection for the pathological case.
Reference counting also has a performance cost: every pointer assignment triggers a count update, which creates memory traffic, and making those updates safe across threads would require an atomic operation on every assignment. This is part of why CPython has the Global Interpreter Lock (GIL): rather than paying for atomic count updates everywhere, it simply prevents multiple threads from modifying Python objects concurrently.
Swift and ARC
Swift uses Automatic Reference Counting (ARC), but with a crucial distinction from CPython: the retain/release calls are inserted by the compiler at compile time, not driven by a runtime interpreter.
The Swift compiler inserts retain and release calls at the appropriate points in the source code during compilation. By the time the program runs, the reference counting machinery is just ordinary function calls woven into the object code — no separate runtime thread, no garbage collector, no cycle-detection background process. The actual incrementing and decrementing of reference counts still happens at runtime as those calls execute, but memory is freed at deterministic, statically determined points.
This is what “automatic” means in ARC: the programmer doesn’t write the retain/release calls, but the compiler does, based on a static analysis of reference lifetimes. The result is reference-counted memory management with the performance profile of manual code and the safety profile of an automated system — except for reference cycles, which Swift exposes to the programmer via weak and unowned references and requires explicit handling.
Rust’s Rc<T> and Arc<T>
Rust also provides reference-counted smart pointers as an opt-in library type. Rc<T> provides single-threaded reference counting; Arc<T> provides atomic reference counting safe across threads. Neither is the default — Rust’s primary memory model (discussed below) handles most cases without any runtime tracking at all. Rc<T> and Arc<T> are available specifically for the cases where shared ownership is genuinely needed and the ownership model would otherwise become unwieldy.
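Rc is subject to the same cycle hazard as CPython and Swift, and offers the same escape hatch: a Weak reference that doesn’t keep its target alive. A sketch using an illustrative parent/child Node type:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Illustrative tree node: children are owned (strong references), while the
// back-pointer to the parent is weak, so parent and child can't form a
// strong cycle that pins both counts above zero forever.
struct Node {
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn counts() -> (usize, usize) {
    let parent = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        parent: RefCell::new(Rc::downgrade(&parent)), // weak back-pointer
        children: RefCell::new(Vec::new()),
    });
    parent.children.borrow_mut().push(Rc::clone(&child));
    // parent: 1 strong ref (the local); child: 2 (the local + parent's vec).
    // With a strong back-pointer, neither count could ever reach zero.
    (Rc::strong_count(&parent), Rc::strong_count(&child))
}

fn main() {
    assert_eq!(counts(), (1, 2));
}
```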
Rust’s Ownership Model: Safety Without a Runtime
Rust represents the most radical rethinking of memory management since garbage collection was invented. Its central insight: the properties that make manual memory safe — each piece of memory has exactly one owner, memory is freed when its owner goes out of scope, references to memory don’t outlive the memory they point to — can be enforced at compile time via static analysis, with no runtime overhead whatsoever.
The Borrow Checker
The borrow checker is a compile-time static analysis pass built into the Rust compiler. It enforces three core rules:
- Every value has exactly one owner.
- When the owner goes out of scope, the value is dropped (freed).
- You may have either one mutable reference or any number of immutable references to a value, but not both at the same time.
If code violates these rules, the program doesn’t compile. There is no runtime check, no segmentation fault at execution time, no sanitizer needed to catch the bug. The bug is caught before the program can ever run.
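A small example of code that satisfies the rules, with comments marking variants the compiler would reject (a sketch, not an exhaustive tour):

```rust
fn borrow_demo() -> usize {
    let mut data = vec![1, 2, 3];

    // Rule 3, first half: any number of immutable borrows may coexist.
    let r1 = &data;
    let r2 = &data;
    assert_eq!(r1.len() + r2.len(), 6);

    // Rule 3, second half: a mutable borrow is allowed only once the
    // immutable borrows are no longer in use.
    let m = &mut data;
    m.push(4);
    // println!("{}", r1.len()); // would NOT compile: r1 still borrowed

    // Rules 1-2: ownership moves; the old name becomes unusable, and the
    // vector will be freed when `new_owner` goes out of scope.
    let new_owner = data;
    // data.len(); // would NOT compile: value moved to `new_owner`
    new_owner.len()
}

fn main() {
    assert_eq!(borrow_demo(), 4);
}
```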
Memory is freed deterministically via RAII (Resource Acquisition Is Initialization): when a variable holding a value goes out of scope, Rust automatically calls the value’s drop implementation, which releases memory and any other owned resources. This is not a convention — it’s enforced by the type system.
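The timing is observable. In this sketch, an illustrative Resource type logs its own drop; locals are dropped in reverse declaration order at the exact close of their scope:

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

struct Resource {
    name: &'static str,
    log: Log,
}

impl Drop for Resource {
    fn drop(&mut self) {
        // Runs at the exact point the owner goes out of scope.
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _a = Resource { name: "a", log: Rc::clone(&log) };
        let _b = Resource { name: "b", log: Rc::clone(&log) };
        // Scope ends: locals drop in reverse declaration order: b, then a.
    }
    let result = log.borrow().clone();
    result
}

fn main() {
    assert_eq!(drop_order(), vec!["b", "a"]);
}
```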
The consequence: Rust programs have no garbage collector, no reference counting by default, no runtime overhead from memory management. The idle memory footprint we measured in our web backend at-rest analysis — 3-15 MB for a Rust Actix or Axum server — reflects this directly. There is no GC heap to pre-allocate, no collector thread consuming cycles, no retained object graph waiting to be swept.
When shared ownership is genuinely needed, Rust programmers reach for Rc<T> (single-threaded) or Arc<T> (thread-safe) as an explicit opt-in. The performance cost of reference counting is incurred only where the programmer chooses it, not across the entire program.
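A minimal Arc sketch: each clone performs an atomic count increment, and that cost is paid only on the values the programmer explicitly chose to share:

```rust
use std::sync::Arc;
use std::thread;

fn shared_sum() -> i64 {
    let data = Arc::new(vec![1i64, 2, 3, 4]);
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let data = Arc::clone(&data); // atomic increment: the opt-in cost
            thread::spawn(move || data.iter().sum::<i64>())
        })
        .collect();
    // The Vec is freed when the last Arc drops, whichever thread that is on.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    assert_eq!(shared_sum(), 20);
}
```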
The Tradeoff
The borrow checker’s guarantees come at a cost: learning curve. Rust has the steepest onboarding of any mainstream language, specifically because the ownership model is a genuinely new way of thinking about code structure. References can’t outlive their referents. You can’t have a mutable reference while an immutable one exists. Self-referential data structures require care.
Experienced Rust programmers find these constraints become intuitive over time and guide them toward designs that are easier to reason about. But the initial friction is real, and it’s why Rust adoption moves at the pace it does despite the extraordinary performance and safety characteristics.
Erlang/BEAM: Isolation as the GC Strategy
Erlang (and Elixir, which runs on the same BEAM virtual machine) takes a different approach that sidesteps one of the hardest GC problems: how to collect a large, shared heap without pausing the entire application.
The BEAM’s answer: don’t share the heap.
Each Erlang process has its own private heap. Objects within a process exist in that process’s heap and are not shared with other processes. When processes communicate, they pass messages by copying data between heaps (with some optimizations for large binaries). There is no shared mutable state in the heap.
This architecture means GC is per-process, not global. When a process’s heap needs collection, that process pauses — but only that process. The thousands or millions of other processes running concurrently are completely unaffected. A GC pause in one Erlang process does not stop the world.
The consequence for latency is dramatic. Even a large, long-running BEAM application has consistent, predictable latency because no single GC event can affect the entire system simultaneously. This is part of why WhatsApp was able to serve 900 million users on BEAM-based infrastructure with a small engineering team — the runtime’s isolation model made reliability a structural property rather than an engineering achievement.
The cost is memory: message passing copies data, which adds allocation pressure. But the BEAM’s process model is cheap enough (each process starts with a heap of a few hundred bytes) that the tradeoff is often worthwhile for the latency profile it provides.
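The share-nothing discipline can be approximated in Rust with channels, where sending transfers ownership instead of sharing a heap. This is a loose analogy, not the BEAM’s actual mechanics — the BEAM copies data between per-process heaps, while Rust moves it:

```rust
use std::sync::mpsc;
use std::thread;

// Loose analogy to BEAM isolation: each thread owns its data privately,
// and communication transfers ownership through a channel rather than
// sharing a mutable heap.
fn isolated_sum() -> u64 {
    let (tx, rx) = mpsc::channel::<Vec<u64>>();
    let producer = thread::spawn(move || {
        let msg = vec![10, 20, 30]; // lives in this thread until sent
        tx.send(msg).unwrap(); // ownership moves; no shared mutable state
    });
    let received = rx.recv().unwrap();
    producer.join().unwrap();
    received.iter().sum()
}

fn main() {
    assert_eq!(isolated_sum(), 60);
}
```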
Haskell: When Laziness Creates Memory Surprises
Haskell uses a generational GC broadly similar to the ones in Java and .NET. But Haskell adds an unusual complication: lazy evaluation.
In a strict language, an expression is evaluated as soon as execution reaches it. In Haskell, an expression is evaluated only when its result is actually needed. An unevaluated expression is stored as a thunk — a suspended computation sitting on the heap.
Thunks are powerful: they enable infinite data structures, short-circuit evaluation, and demand-driven computation. They are also a classic source of the “space leak” — a program that builds up enormous piles of unevaluated thunks, consuming heap memory not because it’s holding onto results, but because it hasn’t computed them yet.
Space leaks in Haskell can be subtle and hard to predict. A function that folds over a large list might look like it uses constant memory but actually accumulates a heap-sized chain of deferred additions. The GC can’t free these thunks because they are technically reachable; they just haven’t been evaluated yet. This interaction between lazy evaluation and generational GC is one of the more notoriously tricky aspects of Haskell performance engineering.
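Rust is strict, but the thunk mechanism itself is easy to emulate. A sketch using std::cell::LazyCell (stable since Rust 1.80 — the toolchain version is an assumption):

```rust
use std::cell::LazyCell;

fn forced_value() -> u64 {
    // A thunk: a suspended computation. Nothing is evaluated on this line;
    // the closure (plus anything it captures) just occupies memory, the way
    // a Haskell thunk occupies heap space.
    let thunk = LazyCell::new(|| (1..=1_000_000u64).sum::<u64>());

    // Evaluation happens only now, on first demand. A Haskell space leak
    // is millions of these accumulating before anything forces them.
    *thunk
}

fn main() {
    assert_eq!(forced_value(), 500_000_500_000);
}
```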
The Full Spectrum at a Glance
| Model | Languages | Runtime Overhead | Deterministic Free? | Developer Burden | Key Risk |
|---|---|---|---|---|---|
| Static allocation | Early FORTRAN, Assembly | None | Yes (never freed) | High | Inflexibility; no recursion |
| Manual (malloc/free) | C, C++ | None | Yes | High | Use-after-free, buffer overflows, leaks |
| Tracing GC (stop-the-world) | Early Java, Ruby | Moderate-High | No | Low | GC pauses; unpredictable latency |
| Tracing GC (concurrent) | Java (G1/ZGC), Go, C# | Moderate | No | Low | Residual pauses; memory overhead |
| Per-process GC | Erlang/Elixir (BEAM) | Moderate | No | Low | Message-copy overhead |
| Generational GC + lazy eval | Haskell | Moderate | No | Medium | Space leaks from thunks |
| Reference counting + cycle GC | CPython | Moderate | Mostly | Low | Cycles; GIL contention |
| ARC (compile-time RC) | Swift | Low | Yes | Low | Cycles require explicit handling |
| Ownership + borrow checker | Rust | None | Yes | High | Steep learning curve; self-referential structures need care |
| Opt-in RC (Rc<T>/Arc<T>) | Rust (when needed) | Low (explicit) | Yes | Medium | Cycles (same as all RC) |
Why This Explains the Numbers We Measured
In Part 1 of our web stack series, we noted that Spring Boot claims 250-500 MB of RAM at idle while Rust Axum claims 3-15 MB. Now the underlying reasons are clear.
Spring Boot’s JVM pre-allocates a GC heap — sized by default relative to available RAM, and commonly hundreds of megabytes — before the application processes a single request. That memory is held in reserve for object allocation and the GC’s internal bookkeeping. Even if your application code barely allocates anything, the heap is there.
Rust has no heap to pre-allocate. There is no GC, no GC heap, no collector thread. Memory is allocated by the program as it runs and freed immediately when each value’s owning scope ends. The idle footprint is just the code itself plus the small amount of data initialized at startup.
Go sits in the middle. Its concurrent GC is lean and its escape analysis keeps many allocations off the heap entirely, which is why Go’s idle footprint (8-15 MB) is a fraction of the JVM’s — but it still maintains a GC heap and a collector, which accounts for the difference from Rust.
The latency behavior we measured in Part 2 also traces directly to GC model. Go’s sub-millisecond GC pauses mean its P99/P50 latency ratio stays near 3-5x under load. Java’s G1GC pauses drive its P99/P50 ratio to 10-20x, growing worse under heap pressure. Rust has zero GC pauses, which is why its tail latency barely changes between 100 and 10,000 concurrent connections. Elixir’s per-process GC means no single pause affects more than one process, contributing to the remarkably stable latency profile we observed — Phoenix’s Sharkbench stability score was the highest of any framework tested.
The 70% of CVEs in Microsoft products that trace to memory safety issues? Those are C and C++ programs operating in the manual memory management model. Rust’s borrow checker makes entire classes of those vulnerabilities structurally impossible to express in valid code. The Microsoft Security Response Center has noted this explicitly as part of their ongoing Safe Systems Programming Languages initiative.
Closing Thoughts
Memory management is where language design philosophy becomes most concrete. The choice encodes assumptions about who the programmer is, what they’re optimizing for, and what kinds of mistakes the language will tolerate.
FORTRAN’s static model assumed programs with known, fixed data structures — appropriate for the numerical computing it was designed for. C’s manual model assumed expert programmers who valued control over safety — appropriate for systems programming. Java’s GC assumed application programmers who shouldn’t have to think about memory at all — appropriate for business software where reliability and developer productivity mattered more than raw performance. Erlang’s per-process isolation assumed programs built around concurrency and reliability — appropriate for telecom infrastructure that could never stop. Rust’s borrow checker assumes programmers who want both safety and performance and are willing to learn a new discipline to get both.
None of these is wrong. Each reflects the context and priorities of the people who designed the language and the problems they were solving.
What’s shifted in recent years is that “safe by default” is increasingly seen not as a constraint but as a requirement — particularly for systems code that runs in security-sensitive environments. The Microsoft and Google statistics are a forcing function. When 70% of your security vulnerabilities trace to a single root cause, eliminating that root cause is worth the learning curve.
That’s the bet Rust is making. The industry is watching the outcome closely.
Memory management connects to almost everything else discussed in our web stack series. The GC model explains the idle memory behavior measured in Part 1: What Your Backend Costs at Rest, the tail latency patterns measured in Part 2: What Your Backend Costs Under Load, and the architectural tradeoffs discussed in Part 3: Choosing the Right Backend for the Job. If you want to go deeper on any of the languages mentioned here, CodeArchaeology has dedicated pages for Rust, Java, Go, Python, C, and many others — with Hello World examples, Docker images, and runnable code.