The Weight of Your Web Stack, Part 3: Choosing the Right Backend for the Job

In Part 1 we measured what web backends cost before serving a single request — idle memory ranging from 3 MB (Rust) to 500 MB (Spring Boot). In Part 2 we measured what happens when traffic arrives — throughput, tail latency, and the surprising effect of adding a real database.

Now comes the question developers actually need answered: given all this data, how do you choose?

The honest answer is that backend weight is not a flaw to minimize — it’s a trade-off to understand. Heavier runtimes invest their overhead in infrastructure that lighter alternatives either don’t have or require you to assemble yourself. The goal isn’t the lightest stack. It’s the right one.

The Full Weight Spectrum

Here’s every major framework ranked by weight class, combining the idle cost data from Part 1 with throughput and concurrency data from Part 2:

| Weight Class | Framework | Idle RAM | Startup | Req/s (JSON) | Docker Image | Concurrency Model |
|---|---|---|---|---|---|---|
| Ultralight | Rust (Actix/Axum) | 3–15 MB | <5 ms | ~165,000 | 5–10 MB | Async tasks (tokio) |
| Ultralight | Go (net/http) | 8–15 MB | <10 ms | ~132,000 | 5–12 MB | Goroutine per request |
| Light | C# (Native AOT) | 17–23 MB | 14–17 ms | High | 18–90 MB | Async I/O, native binary |
| Light | Elixir (Phoenix) | 24–70 MB | 1–3 s | ~4,375* | 25–80 MB | BEAM processes (~2 KB each) |
| Light | PHP (plain FPM) | 5–15 MB/worker | <5 ms/req | ~7,000 | 51–80 MB | Process per request |
| Medium | Flask (Gunicorn) | 30–50 MB | 0.3–1 s | ~3,000 | 50–70 MB | Pre-forked workers |
| Medium | FastAPI (Uvicorn) | 40–60 MB | 0.5–1.5 s | ~4,800** | 50–80 MB | Async event loop |
| Medium | Node.js (Express) | 50–55 MB | 200–500 ms | ~13,000 | 100–180 MB | Single-thread event loop |
| Medium | C# (ASP.NET Core) | 40–80 MB | 70–80 ms | ~118,000 | 120–216 MB | Async I/O, JIT compiled |
| Medium | Java (Spring Native) | 50–80 MB | 30–90 ms | Moderate | ~136 MB | Native binary, no JVM |
| Heavy | Django (Gunicorn) | 70–130 MB | 2–5 s | ~950 | 80–120 MB | Pre-forked workers |
| Heavy | Ruby (Rails/Puma) | 80–150 MB | 3–8 s | ~2,340 | 180–300 MB | Forked workers + threads |
| Heavy | PHP (Laravel/FPM) | 30–60 MB/worker | 50–200 ms/req | ~299 | 100–150 MB | Process per request |
| Heavyweight | Java (Spring Boot/JVM) | 250–500 MB | 2.5–5 s | ~18,500 | 250–430 MB | JVM thread pool (200 default) |

* Phoenix’s HTTP throughput is moderate, but it handles 2M+ concurrent WebSocket connections — best for connection-dense workloads. ** FastAPI with asyncpg + ujson + 8 workers. Default single worker is ~1,185 req/s.

What Heavy Actually Buys You

Before writing off the heavyweights, it’s worth understanding what that overhead purchases.

Spring Boot’s 300 MB baseline isn’t waste — it’s investment in runtime infrastructure:

JIT optimizations unavailable to ahead-of-time-compiled languages. The JVM’s C2 compiler can inline virtual method calls based on observed call targets, perform speculative optimizations on actual runtime data, and eliminate heap allocations through escape analysis. After 15–60 seconds of warm-up, a Java REST API can reach 50,000–100,000 requests per second, making it competitive for long-running server processes with stable traffic patterns.

Ecosystem depth you don’t have to build. Spring Boot includes out-of-the-box: OAuth2/SAML/LDAP security, JPA/JDBC/MongoDB/Redis data access, Kafka/RabbitMQ messaging, distributed tracing, Actuator observability, batch processing, and transaction management across multiple data sources. In a lightweight ecosystem, you assemble equivalent functionality from scattered packages and maintain compatibility yourself.

Production diagnostic tooling. Java Flight Recorder, async-profiler, heap dump analysis, GC logging — the JVM has the most mature production diagnostics of any runtime. Go has pprof and Rust has perf, but neither matches JVM depth for production troubleshooting.

The Concurrency Model Is the Biggest Variable

When choosing a backend, the concurrency model matters more than raw throughput numbers. It determines how your application behaves when load exceeds your expectations.

| Framework | Behavior Under Overload |
|---|---|
| Go, Rust, Elixir | Graceful degradation. New goroutines/tasks/processes are cheap. Latency rises gradually. |
| Node.js | Event loop slows. CPU-bound work blocks everything. No cliff, but a hard single-thread ceiling. |
| Java (Virtual Threads, JDK 21+) | Similar to Go — graceful scaling. This is the modern answer to Java’s threading problem. |
| Java (Traditional Tomcat) | Cliff at thread pool exhaustion. When all 200 threads are busy, requests queue. Latency spikes. |
| PHP-FPM | Cliff at worker pool exhaustion. Fixed worker count means a hard concurrency limit. |
| Rails/Puma | Moderate degradation. GVL limits parallelism within each worker. Queue builds when all threads are busy. |
| Python (Gunicorn sync) | Worker-bounded. Each worker handles one request at a time. |

The frameworks with “cliffs” — traditional Java Tomcat and PHP-FPM — are not fatally flawed, but they require explicit capacity planning. You need to know your concurrency ceiling before you hit it in production.

Framework Overhead vs. Language Overhead

One of the most common mistakes is conflating language weight with framework weight. They are separate variables:

| Framework | Language Baseline | Framework Adds | Total |
|---|---|---|---|
| Spring Boot | JVM ~50–180 MB | +100–200 MB (autoconfiguration, Tomcat, DI) | 250–400 MB |
| Rails | Ruby ~20–30 MB | +300–400 MB (ActiveRecord, gems, metaprogramming) | 400–600 MB |
| Django | Python ~20–30 MB | +100–120 MB (ORM, admin, middleware) | 120–140 MB |
| Laravel | PHP ~5–15 MB | +50–80 MB (Eloquent, queues, auth) | 60–100 MB |
| Phoenix | BEAM VM ~30–50 MB | +minimal | 30–50 MB |
| Express | V8 ~20–30 MB | +10–20 MB (routing + middleware) | 30–50 MB |
| Gin | Go runtime ~5–7 MB | +5–10 MB (HTTP router) | 10–15 MB |
| Actix Web | None | ~5 MB total | ~5 MB |

Phoenix is the outlier: a batteries-included framework on an ultralight runtime. The BEAM VM’s per-process isolation keeps it lean despite providing real-time channels, PubSub, and an ORM.

How Many Instances Fit on 8 GB?

For teams running multiple services or planning Kubernetes deployments, memory density directly maps to infrastructure cost. With ~7 GB available after OS overhead:

| Framework | Memory/Instance | Instances on 7 GB |
|---|---|---|
| Rust (Actix) | ~10–30 MB | 230–700 |
| Go (Gin) | ~25–70 MB | 100–280 |
| Elixir (Phoenix) | ~50–100 MB | 70–140 |
| Node.js (Express) | ~40–80 MB | 87–175 |
| C# (ASP.NET Core) | ~100–300 MB | 23–71 |
| Python (Flask, 4 workers) | ~200–300 MB | 23–35 |
| Java (Spring Boot, tuned) | ~128–256 MB | 28–56 |
| Java (Spring Boot, default) | ~256–512 MB | 14–28 |
| Ruby (Rails, 4 workers) | ~400–600 MB | 11–17 |

The cloud cost implication is real: 50 microservices at 512 MB (Java default) = 25 GB of memory. The same 50 services at 50 MB (Go) = 2.5 GB. That’s a 10x difference in infrastructure costs.

But before optimizing for memory density, check whether you actually need 50 microservices — or whether a single well-tuned JVM monolith would serve those 50 concerns more efficiently with less operational overhead.

The Database Equalizer

The most important nuance in this entire series: most web applications are I/O-bound, not CPU-bound.

When benchmarks include real database queries, the performance gaps that look enormous in pure HTTP tests collapse dramatically:

| Framework | JSON (no DB) | With DB Queries | What Happened |
|---|---|---|---|
| Go (Gin) | ~132,000 req/s | ~7,517 req/s | 18x gap… |
| Spring Boot (Java) | ~18,500 req/s | ~7,886 req/s | …becomes ~1x |
| FastAPI (Python) | ~4,800 req/s | ~4,831 req/s | Already DB-bound at baseline |
| Express (Node.js) | ~13,000 req/s | ~4,145 req/s | Converges with Java |

When your service spends 95% of its time waiting on a database query, the difference between Go and Java in CPU efficiency is noise. The bottleneck is your database, not your language. Language weight only dominates the discussion when you’ve already optimized your data layer.

Choosing the Right Weight

With all of the above in mind, here’s a practical decision framework:

| Scenario | Recommended | Why |
|---|---|---|
| Serverless / edge computing | Ultralight (Go, Rust) | Cold start is everything; JVM warm-up never pays off |
| Microservices at scale (many instances) | Light–Medium (Go, Node.js, Elixir) | Memory density and scaling speed matter |
| Enterprise applications | Heavy (Java/Spring, C#/.NET) | Ecosystem depth, tooling, and long-term maintainability |
| Rapid prototyping / startups | Heavy framework, light runtime (Laravel, Django) | Developer velocity over server cost |
| Real-time / WebSocket-heavy | Light (Elixir/Phoenix) | 2M+ connections, per-process garbage collection |
| AI/ML service backends | Medium (Python/FastAPI) | Python ML ecosystem is unmatched |
| High-throughput APIs (compute-bound) | Ultralight (Rust, Go) | When CPU efficiency genuinely matters |
| Long-running, compute-heavy workloads | Heavyweight JVM (Java) | JIT optimizations compound over hours of runtime |

When Weight Genuinely Doesn’t Matter

There are three situations where this entire analysis becomes irrelevant:

Developer productivity vs. server cost. For a team of 10 engineers at $150K+, saving $500/month on cloud infrastructure by choosing Go over Java is a rounding error if Spring Boot saves each developer two hours per week. The human cost almost always dominates the infrastructure cost at team scale.

Monolithic deployments. The JVM overhead is paid once and shared across all endpoints. A single well-tuned JVM monolith can be more memory-efficient than a fleet of 20 lightweight microservices, each with its own container overhead, sidecar proxy, and health check process.

I/O-bound applications. When the database is the bottleneck, language weight is noise. Optimize your query patterns, add indexes, and tune your connection pool before considering a rewrite in a lighter language.

The Mental Model

Think of backend weight as layers, each adding overhead — and each buying something:

```
Layer 5: Framework           Spring: +200MB    Rails: +400MB    Express: +20MB    Gin: +8MB
Layer 4: Standard Library    Java: large       Ruby: large      Node: moderate    Go: moderate
Layer 3: Concurrency Model   Threads: 1MB ea   GVL-limited      Event loop        Goroutines: 4KB
Layer 2: Memory Management   JVM GC: % heap    Ruby GC          V8 GC             Go GC: low-pause
Layer 1: Execution Engine    JVM: ~100MB       Ruby: ~25MB      V8: ~25MB         Go: ~5MB
Layer 0: Operating System    [Shared by all]
```

Java is heavyweight because overhead accumulates at every layer. Go is lightweight because layers 1–3 are minimal. Rust is ultralight because layers 1–4 are essentially zero. Elixir is the outlier: moderate at the VM layer, but with the most efficient concurrency model available for connection-dense workloads.

Every layer of overhead exists to provide a capability. The JIT compiler buys peak throughput that can rival or exceed AOT-compiled binaries on stable workloads. The garbage collector buys freedom from manual memory management. The thick framework buys developer productivity. OS threads buy a simple synchronous programming model.

The question isn’t which stack is lightest. It’s which capabilities you actually need — and whether you’re willing to pay the cost to get them.


Data sources: TechEmpower Framework Benchmarks (Round 22/23), Sharkbench, Phoenix Road to 2M WebSocket Connections, and 80+ framework-specific benchmarks and documentation sources compiled in February 2026.

This is Part 3 of the Web Stack Weight series. Read Part 1: What Your Backend Costs at Rest and Part 2: What Your Backend Costs Under Load.
