Load Testing Results — 15× Faster, 5× More Capacity

Fifteenth in a series about migrating from legacy architectures to a modern Nuxt 4 stack.

Architecture Decisions Have Consequences — Measure Them

Architecture decisions accumulate, and their combined effect only becomes visible under real load.

Before go-live, a large enterprise application was load-tested with production-equivalent traffic patterns rather than synthetic traffic. k6 replayed a model derived from real production logs: 20 pages, weighted by their actual share of traffic.

The Headline Numbers

Metric	Legacy System	New System	Change
Median response time	2,618 ms	165 ms	15.9× faster
Error rate (1× prod load)	3.91%	0.09%	97% lower
Max tested capacity	~99 RPM	494+ RPM	5× more
Infrastructure	3× fixed VMs (24 vCPU, 96 GB)	Auto-scaled containers	Elastic
Lighthouse Performance (mobile)	~50	97+	Near-perfect

A 2.6-second median means half of all requests took 2.6 seconds or longer. A 165 ms median means half of all responses arrive within 165 ms, about the duration of a blink.
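To make the median and P95 concrete, here is a minimal nearest-rank percentile sketch. The sample values are hypothetical; they show that a single 9-second outlier leaves the median untouched while dominating the P95. Load-testing tools may interpolate between samples instead of using nearest rank, so their reported values can differ slightly.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  // Copy before sorting, and sort numerically (the default sort is lexicographic).
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical response times (ms) for ten requests, one of them a 9 s outlier.
const samples = [120, 150, 160, 165, 165, 170, 180, 210, 400, 9000];
console.log(percentile(samples, 50)); // 165
console.log(percentile(samples, 95)); // 9000
```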

Test Methodology

Traffic Pattern

The load test replayed production-equivalent traffic using k6’s HTTP module:

pie showData
  title Traffic Distribution (top 9 pages plus 11 others)
  "Homepage (28%)" : 28
  "Product Overview (19%)" : 19
  "Product Details (14%)" : 14
  "Checkout Step 1 (9%)" : 9
  "FAQ (7%)" : 7
  "Contact (6%)" : 6
  "About (5%)" : 5
  "Legal / Imprint (4%)" : 4
  "Blog Overview (3%)" : 3
  "Other (11 pages) (5%)" : 5

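A production-weighted mix like this can be driven by mapping a uniform random number onto cumulative traffic shares. The sketch below is an assumption about how such a picker could look, not the project's actual k6 script: the shares come from the chart, while the URL paths are hypothetical placeholders and the 11 long-tail pages are collapsed into a single `/other` entry.

```typescript
// Traffic shares (percent) from the distribution above.
// Paths are hypothetical placeholders; only the shares come from the chart.
const trafficMix: ReadonlyArray<readonly [path: string, share: number]> = [
  ["/", 28],
  ["/products", 19],
  ["/products/example", 14],
  ["/checkout/step-1", 9],
  ["/faq", 7],
  ["/contact", 6],
  ["/about", 5],
  ["/imprint", 4],
  ["/blog", 3],
  ["/other", 5], // stands in for the 11 long-tail pages
];

// Map a uniform random number r in [0, 1) to a page, so that each page is
// chosen with probability proportional to its share.
function pickPage(r: number, mix = trafficMix): string {
  const total = mix.reduce((sum, [, share]) => sum + share, 0);
  let remaining = r * total;
  for (const [path, share] of mix) {
    if (remaining < share) return path;
    remaining -= share;
  }
  return mix[mix.length - 1][0]; // guards against floating-point rounding at r ≈ 1
}
```

Inside a k6 default function this would be called as something like `http.get(BASE_URL + pickPage(Math.random()))`, where `BASE_URL` is a hypothetical environment-specific origin.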
Test Types

Two test types were run:

Replay Test — constant load at 1× production traffic (99 RPM) for 30 minutes
Ramp Test — linear ramp from 1× to 5× production traffic over 30 minutes
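Both tests map naturally onto k6's arrival-rate executors, which start iterations at a fixed rate regardless of how slowly the system responds. A sketch of the `options` object, assuming one script runs both scenarios back to back and each iteration issues one page request; the VU pool sizes and threshold limits are hypothetical.

```typescript
// Mirrors the shape of a k6 `options` object; in a real k6 script it is
// declared as `export const options = { ... }`. Executor names and keys are
// k6's; VU pool sizes and threshold limits are assumptions.
const PROD_RPM = 99; // 1x production traffic from the replay model

const options = {
  scenarios: {
    // Replay test: constant 1x production load for 30 minutes.
    replay: {
      executor: "constant-arrival-rate",
      rate: PROD_RPM, // iterations per timeUnit
      timeUnit: "1m",
      duration: "30m",
      preAllocatedVUs: 20, // assumption
    },
    // Ramp test: linear ramp from 1x to 5x production over 30 minutes.
    ramp: {
      executor: "ramping-arrival-rate",
      startRate: PROD_RPM,
      timeUnit: "1m",
      preAllocatedVUs: 100, // assumption
      stages: [{ target: 5 * PROD_RPM, duration: "30m" }],
      startTime: "30m", // assumption: start once the replay scenario ends
    },
  },
  thresholds: {
    http_req_failed: ["rate<0.01"], // hypothetical: under 1% failed requests
    http_req_duration: ["med<500"], // hypothetical: median under 500 ms
  },
};
```

If any threshold is crossed, k6 ends the run with a non-zero exit code, which is what makes "exit code 0" usable as a pass/fail gate.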

Replay Test: 1× Production Load

The replay test answers: “Can the new system handle current production traffic?”

flowchart TB
  title["Replay Test Results (1× production load = 99 RPM)"]

  subgraph Legacy_System["Legacy System"]
    L_Median["Median RT: 2,618 ms"]
    L_P95["P95 RT: 8,500+ ms"]
    L_Error["Error Rate: 3.91%"]
    L_RPM["Requests/min: 99"]
    L_Status["Status: Degraded"]
  end

  subgraph New_System["New System"]
    N_Median["Median RT: 165 ms"]
    N_P95["P95 RT: 450 ms"]
    N_Error["Error Rate: 0.09%"]
    N_RPM["Requests/min: 99"]
    N_Status["Status: Healthy"]
  end

  L_Median --- N_Median
  L_P95 --- N_P95
  L_Error --- N_Error
  L_RPM --- N_RPM
  L_Status --- N_Status

The new system handles production traffic with a roughly 94% lower median response time and 97% fewer errors. Its P95 of 450 ms means 95% of requests complete in under 450 ms, less than a fifth of the legacy system's median.
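The percentages follow directly from the replay-test figures. A short sketch of the arithmetic, using the headline median of 165 ms:

```typescript
// Speed-up factor and relative reduction between two measurements.
function speedup(before: number, after: number): number {
  return before / after;
}
function percentLower(before: number, after: number): number {
  return (1 - after / before) * 100;
}

const legacyMedianMs = 2618;
const newMedianMs = 165;
const legacyErrorPct = 3.91;
const newErrorPct = 0.09;

console.log(speedup(legacyMedianMs, newMedianMs).toFixed(1)); // 15.9
console.log(percentLower(legacyMedianMs, newMedianMs).toFixed(1)); // 93.7
console.log(percentLower(legacyErrorPct, newErrorPct).toFixed(1)); // 97.7
console.log((450 / legacyMedianMs).toFixed(2)); // 0.17 (new P95 / legacy median)
```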

Ramp Test: Finding the Ceiling

The ramp test answers: “How far can we push it before it breaks?”

xychart-beta
  title "Ramp Test Results (1× → 5× production load)"
  x-axis "Load (× production)" [1, 2, 3, 4, 5]
  y-axis "Response Time (ms)"
  line [2618, 4000, 5000, 6000, 8800]
  line [165, 165, 165, 165, 165]

The median stayed flat at 165 ms even at 5× load: additional load did not increase the latency of a typical request.

The P95 degraded to 8.8 seconds at 5×, driven by scale-out lag. New replicas needed time to start; once they were online, they matched existing replica performance.

The Right-Sizing Experiment

Finding the minimum viable resource allocation is a critical part of load testing. Four configurations were tested:

Config	vCPU	RAM	PM2 Workers	V8 Heap	Result
#1	4	8 GiB	3	2048 MB	✅ Stable, over-provisioned
#2	2	4 GiB	2	1536 MB	✅ Stable, efficient
#3	1	2 GiB	2	1024 MB	❌ Cascading failures
#4	2	4 GiB	2	1536 MB	✅ Validated (6× load)

flowchart LR
  A["Config #1: 4 vCPU / 8 GiB / 3 workers / 2048 MB heap"] -->|Over-provisioned| B["Config #2: 2 vCPU / 4 GiB / 2 workers / 1536 MB heap"]
  B -->|Right-size further| C["Config #3: 1 vCPU / 2 GiB / 2 workers / 1024 MB heap"]
  C -->|Cascading failures| D["Config #4: 2 vCPU / 4 GiB / 2 workers / 1536 MB heap (Validated at 6× load)"]
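One way to read the right-sizing table is to add up the heap limits each container could reach. Each PM2 worker is a separate Node.js process with its own `--max-old-space-size` cap, and that flag bounds only V8's old generation; young-generation space, buffers, and native memory come on top. The sketch below treats 1 GiB as 1,024 MB for simplicity. It is an interpretation of the table, not a measured cause: Config #3's two 1024 MB caps could claim the container's entire 2 GiB, while Configs #1 and #2 keep a quarter of memory outside the caps.

```typescript
interface RightSizingConfig {
  name: string;
  memoryMb: number; // container memory limit (GiB x 1024)
  workers: number; // PM2 cluster workers per container
  maxOldSpaceMb: number; // --max-old-space-size per worker
}

const configs: RightSizingConfig[] = [
  { name: "#1", memoryMb: 8192, workers: 3, maxOldSpaceMb: 2048 },
  { name: "#2", memoryMb: 4096, workers: 2, maxOldSpaceMb: 1536 },
  { name: "#3", memoryMb: 2048, workers: 2, maxOldSpaceMb: 1024 },
];

// Share of container memory that all workers' old-space caps could claim together.
function heapBudgetRatio(c: RightSizingConfig): number {
  return (c.workers * c.maxOldSpaceMb) / c.memoryMb;
}

for (const c of configs) {
  const headroomMb = c.memoryMb - c.workers * c.maxOldSpaceMb;
  console.log(`${c.name}: ${(heapBudgetRatio(c) * 100).toFixed(0)}% of memory, ${headroomMb} MB headroom`);
}
// #1: 75% of memory, 2048 MB headroom
// #2: 75% of memory, 1024 MB headroom
// #3: 100% of memory, 0 MB headroom
```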

The Failed Right-Sizing (Config #3)

Reducing to 1 vCPU / 2 GiB caused a cascade:

sequenceDiagram
  participant L as Load Generator
  participant R1 as Replica 1
  participant R2 as Replica 2
  participant R3 as Replica 3
  participant HP as Health Probe

  Note over R1,R3: Failure Cascade at 1 vCPU / 2 GiB

  L->>R1: t=0s: Traffic (99 RPM)
  Note over R1: Memory: 1,791 / 2,048 MB (87.5%)

  R1-->>R1: t=10s: V8 GC stalls<br/>Event loop blocked
  HP->>R1: t=15s: Health probe
  HP-->>HP: Timeout
  HP->>R1: Mark unhealthy → restart

  Note over R2: t=20s: Absorbs 2× traffic
  L->>R2: Increased traffic

  R2-->>R2: t=25s: Memory spike → restart
  Note over R3: t=30s: Overloaded → restart

  Note over R1,R3: t=35s: All replicas restarting
  Note over L: t=45s: Zero capacity for ~10 seconds<br/>→ 5% error rate

V8 needs breathing room. At 87.5% heap utilization, GC pauses block the event loop long enough for health probes to time out. The minimum viable compute here was 2 vCPU / 4 GiB, though the exact threshold depends on application complexity, page weight, and caching. The principle is general; the numbers are specific.
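The 87.5% figure suggests a simple early-warning signal: compare V8's used heap to its limit inside each worker. A minimal sketch using Node's `v8.getHeapStatistics()`; the 80% warning level and the 10-second sampling interval are assumptions, not values from the project.

```typescript
import { getHeapStatistics } from "node:v8";

// Hypothetical warning level, set below the ~87.5% where Config #3 failed.
const WARN_RATIO = 0.8;

// Pure, unit-agnostic helper so the ratio logic can be tested in isolation.
function heapUtilization(used: number, limit: number): number {
  return used / limit;
}

// Reads this worker's V8 heap statistics and flags low headroom.
function checkHeap(): { ratio: number; warn: boolean } {
  const stats = getHeapStatistics();
  const ratio = heapUtilization(stats.used_heap_size, stats.heap_size_limit);
  return { ratio, warn: ratio >= WARN_RATIO };
}

// Sample every 10 s; unref() lets the process exit even while the timer exists.
setInterval(() => {
  const { ratio, warn } = checkHeap();
  if (warn) console.warn(`V8 heap at ${(ratio * 100).toFixed(1)}% of its limit`);
}, 10_000).unref();

// The Config #3 figure from the cascade above: 1,791 of 2,048 MB.
console.log(heapUtilization(1791, 2048).toFixed(3)); // 0.875
```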

The Validated Production Configuration

The configuration that passed all k6 thresholds (k6 exited with code 0) at 6× production load:

flowchart TB
  subgraph SPA["SPA Containers"]
    SPA_CPU["CPU: 2 vCPU"]
    SPA_MEM["Memory: 4 GiB"]
    SPA_PM2["PM2 Workers: 2 per container"]
    SPA_HEAP["V8 Heap: 1536 MB (--max-old-space-size=1536)"]
    SPA_MIN["Min Replicas: 5"]
    SPA_MAX["Max Replicas: 20"]
  end

  subgraph API["API Containers"]
    API_CPU["CPU: 0.5 vCPU"]
    API_MEM["Memory: 1 GiB"]
    API_MIN["Min Replicas: 3"]
    API_MAX["Max Replicas: 20"]
  end

  subgraph Results["Result at 6× load"]
    RES_MED["Median RT: 165 ms"]
    RES_ERR["Error rate: 0.82%"]
    RES_CPU["CPU peak: 12% of allocation"]
    RES_MEM["Memory peak: 60% of allocation"]
  end

  SPA --> Results
  API --> Results
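The SPA container settings above translate into a PM2 ecosystem definition roughly like the following. The keys (`exec_mode`, `instances`, `node_args`) are standard PM2 ecosystem options and `.output/server/index.mjs` is Nuxt's default Node server entry; the app name is hypothetical, and the object is written in TypeScript only to keep one language across the examples.

```typescript
// PM2 ecosystem definition matching the validated SPA container
// (2 vCPU / 4 GiB). The app name is hypothetical.
const ecosystem = {
  apps: [
    {
      name: "nuxt-spa", // hypothetical
      script: ".output/server/index.mjs", // Nuxt's default Node server entry
      exec_mode: "cluster", // PM2 cluster mode: workers share one listening port
      instances: 2, // 2 workers per container, as validated
      node_args: "--max-old-space-size=1536", // per-worker old-generation cap (MB)
    },
  ],
};
```

In a container this would typically live in `ecosystem.config.cjs` as `module.exports = ecosystem` and be started with `pm2-runtime`, which keeps PM2 in the foreground as the container's main process. Two workers capped at 1536 MB each leave about 1 GiB of the 4 GiB limit outside the heap caps.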

Cost Analysis

Config #2 uses 50% less CPU and 50% less memory per replica than the initial over-provisioned Config #1:

flowchart TB
  subgraph Legacy["Legacy (fixed)"]
    L1["3× VM instances"]
    L2["24 vCPU, 96 GB RAM — always on"]
    L3["Cost: constant regardless of traffic"]
  end

  subgraph New["New (elastic)"]
    N1["5–20 SPA replicas (2 vCPU, 4 GiB each)"]
    N2["3–20 API replicas (0.5 vCPU, 1 GiB each)"]
    N3["Per-second billing — pay for actual usage"]
    N4["At idle: 5 SPA + 3 API"]
    N5["At peak: 15 SPA + 8 API"]
    N6["Average: ~60% of peak capacity billed"]
  end

  Legacy -->|"Migrated to"| New
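The capacity side of the comparison can be checked with simple arithmetic over the replica shapes in the diagram. This sketch computes allocated vCPU and memory rather than cost, since no prices are given; the 0.6 factor is the diagram's "~60% of peak capacity billed", and the legacy memory is kept in GB as stated.

```typescript
interface ReplicaShape {
  vcpu: number;
  memGiB: number;
}

const SPA: ReplicaShape = { vcpu: 2, memGiB: 4 };
const API: ReplicaShape = { vcpu: 0.5, memGiB: 1 };

// Total allocation for a given number of SPA and API replicas.
function allocation(spaReplicas: number, apiReplicas: number): ReplicaShape {
  return {
    vcpu: spaReplicas * SPA.vcpu + apiReplicas * API.vcpu,
    memGiB: spaReplicas * SPA.memGiB + apiReplicas * API.memGiB,
  };
}

const idle = allocation(5, 3);
const peak = allocation(15, 8);
const billedShare = 0.6; // "~60% of peak capacity billed"
const legacy = { vcpu: 24, memGB: 96 }; // always on

console.log(`idle: ${idle.vcpu} vCPU / ${idle.memGiB} GiB`); // idle: 11.5 vCPU / 23 GiB
console.log(`peak: ${peak.vcpu} vCPU / ${peak.memGiB} GiB`); // peak: 34 vCPU / 68 GiB
console.log(
  `average: ${(billedShare * peak.vcpu).toFixed(1)} vCPU / ${(billedShare * peak.memGiB).toFixed(1)} GiB`,
); // average: 20.4 vCPU / 40.8 GiB
console.log(`legacy: ${legacy.vcpu} vCPU / ${legacy.memGB} GB`);
```

At peak the elastic setup allocates more vCPU than the three legacy VMs (34 versus 24), but only while the spike lasts; averaged over time it bills roughly 20 vCPU and 41 GiB against the legacy system's constant 24 vCPU and 96 GB.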

Elastic billing lowers cost during low-traffic periods — nights, weekends, and holidays — while still scaling for spikes without permanent over-provisioning.
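Azure Container Apps scales through KEDA scale rules, and KEDA-driven scaling follows the Kubernetes Horizontal Pod Autoscaler's core formula. A minimal sketch of that calculation, assuming an HPA-style rule on concurrent requests; the per-replica target of 10 is hypothetical, and the sketch omits the HPA's tolerance band and stabilization windows. The formula decides how many replicas are wanted; the scale-out lag seen in the ramp test is the time it takes for them to become ready.

```typescript
// HPA-style replica calculation: desired = ceil(current * metric / target),
// clamped to the scale rule's min/max replica bounds.
function desiredReplicas(
  current: number,
  currentMetric: number, // e.g. average concurrent requests per replica
  targetMetric: number, // per-replica target from the scale rule (hypothetical)
  minReplicas: number,
  maxReplicas: number,
): number {
  const raw = Math.ceil((current * currentMetric) / targetMetric);
  return Math.min(maxReplicas, Math.max(minReplicas, raw));
}

// SPA bounds from the validated configuration (5 to 20 replicas).
console.log(desiredReplicas(5, 1, 10, 5, 20)); // 5 (quiet period: held at the minimum)
console.log(desiredReplicas(5, 30, 10, 5, 20)); // 15 (spike: scale out)
console.log(desiredReplicas(5, 100, 10, 5, 20)); // 20 (capped at the maximum)
```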

What the Numbers Mean for Architecture

Each architecture decision from earlier articles contributed to these numbers:

Decision	Contribution
SSR (Article 1)	Eliminates client-side rendering delay
GraphQL Gateway (Article 2)	Single query per page instead of 3–5 REST calls
Multi-Tier Cache (Article 6)	Sub-ms content retrieval for cached pages
Deferred Hydration (Article 6)	Eliminates render-blocking JavaScript
Same-Origin Image Proxy (Article 6)	Improves LCP by reducing cross-origin overhead
PM2 Cluster Mode (Article 10)	Zero-downtime worker restarts
Container Apps Auto-Scaling (Article 11)	Elastic capacity, no over-provisioning

flowchart LR
  SSR["SSR"] --> PERF["Lower TTFB & faster first paint"]
  GQL["GraphQL Gateway"] --> PERF
  CACHE["Multi-Tier Cache"] --> PERF
  HYDR["Deferred Hydration"] --> PERF
  IMG["Same-Origin Image Proxy"] --> PERF
  PM2["PM2 Cluster Mode"] --> REL["Resilience & zero-downtime deploys"]
  AS["Container Apps Auto-Scaling"] --> CAP["Elastic capacity"]

  PERF --> OUT["15.9× faster median<br/>Lighthouse 97+"]
  REL --> OUT
  CAP --> OUT

No single decision produces 15.9×. It is the combination — each one removing a different bottleneck — that delivers the aggregate result.

Lessons Learned

Load test with production traffic patterns, not synthetic ones

A synthetic test hitting the homepage 100 times per second says nothing about real-world performance. Real traffic has a distribution — heavy pages, light pages, API calls, form submissions. The test must match it.

flowchart LR
  A["Synthetic test: 100 req/s to homepage"] -->|Misleading| C["Unrealistic bottlenecks"]
  B["Production-equivalent mix:<br/>heavy pages, light pages, APIs, forms"] -->|Accurate| D["Realistic capacity & latency insights"]

Right-sizing failures are the most valuable test results

The cascading failure at 1 vCPU / 2 GiB taught more about system behavior than all successful tests combined. It exposed the GC pressure threshold, health probe timing sensitivity, and cold-start vulnerability. These insights shaped the production configuration.

flowchart TB
  F["Right-sizing attempt"] --> F1["Too small (1 vCPU / 2 GiB)"]
  F1 --> F2["GC pressure & probe timeouts"]
  F2 --> F3["Cascading restarts"]
  F3 --> F4["Error budget impact"]
  F4 --> F5["Refined production config<br/>(2 vCPU / 4 GiB, validated at 6×)"]

Median response time is the metric that matters most

P95 and P99 matter for tail latency, but the median determines the experience for most users. A flat median under increasing load (165 ms at 1× and 5×) shows that horizontal scaling absorbed the extra load without per-request degradation.

xychart-beta
  title "Median vs P95 under load"
  x-axis "Load (× production)" [1, 2, 3, 4, 5]
  y-axis "Response Time (ms)"
  line [165, 165, 165, 165, 165]
  line [450, 1200, 3000, 6000, 8800]

15× is not an optimization — it is a different architecture

A 15.9× improvement does not come from optimizing an existing system. It comes from removing fundamental bottlenecks: dual rendering, multi-source data joining, absence of caching, fixed infrastructure. The improvement is architectural, not incremental.

What’s Next

Article 16: The Full Picture — What the New Concept Delivers — Synthesis for decision-makers and architects.
Article 17: The @delegate Directive Deep Dive — Cross-Subgraph Field Resolution — A technical deep dive into the most powerful schema stitching feature.
Article 18: Building a Headless Design System in Vue 3 — The Compose Pattern — Separating style logic from templates.

Munir Husseini is a software architect specializing in full-stack TypeScript, .NET, and cloud-native architectures.