Blog

  • SSR Deep Dive — Hydration, State Replay, and the Cookbook

    Twelfth in a series about migrating from legacy architectures to a modern Nuxt 4 stack.


    The Hydration Contract

    In a server-rendered Vue application, SSR establishes a strict contract: the HTML generated on the server must match exactly what the client-side Vue runtime would render. During hydration, Vue attaches to the existing DOM instead of re-rendering it from scratch. If the server HTML and the client render differ, Vue reports a hydration mismatch.

    In Vue 3 strict mode (and Nuxt 4), hydration mismatches are more than harmless warnings. They can lead to:

    • Silent rendering bugs (server HTML stays, but event listeners bind to the wrong elements)
    • Missing interactivity (Vue skips hydrating mismatched subtrees)
    • Inconsistent state (server-rendered content shows one value, client state holds another)

    These issues are tricky because they only appear under SSR — the same component may work perfectly in client-only mode.


    A Taxonomy of Hydration Mismatches

    Across large enterprise applications, most hydration issues fall into a handful of categories. Once you recognize the category, the fix usually becomes obvious.

    Category 1: Non-Deterministic Values

    Any value that differs between server and client at render time will cause a mismatch:

    Server renders:  <div id="input-a7f3b2">...</div>
    Client renders:  <div id="input-c9e1d4">...</div>
                                  ↑ different random value

    Common culprits: Math.random(), Date.now(), crypto.randomUUID() used in templates or setup().

    Fix: use useId(), which is built into Vue 3.5+ and auto-imported by Nuxt. It generates deterministic IDs that stay identical between the server render and the client render.
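
    To see why a counter-based ID survives hydration while a random one does not, consider the following minimal sketch. It is not Vue's actual useId() implementation; createIdGenerator and renderForm are hypothetical names used only for illustration.

```typescript
// Hypothetical sketch: a per-render counter yields the same IDs on server and
// client, as long as both sides create components in the same order.
function createIdGenerator(prefix: string): () => string {
  let counter = 0;
  return () => `${prefix}-${counter++}`;
}

// Simulates rendering two inputs; each render pass gets a fresh generator.
function renderForm(nextId: () => string): string {
  const emailId = nextId();
  const nameId = nextId();
  return `<input id="${emailId}"><input id="${nameId}">`;
}

const serverHtml = renderForm(createIdGenerator("v"));
const clientHtml = renderForm(createIdGenerator("v"));
console.log(serverHtml === clientHtml); // true

// A random ID source almost certainly produces different markup on each pass.
const randomId = () => `input-${Math.random().toString(16).slice(2, 8)}`;
console.log(renderForm(randomId) === renderForm(randomId));
```

    The determinism comes from call order, not from the values themselves, which is why conditionally created components must also behave the same on both sides.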

    Category 2: Timing-Dependent State

    If a child component mutates parent state during setup(), the execution order can differ between server and client:

    sequenceDiagram
        box Server
          participant SParent as Parent (server)
          participant SChild as Child (server)
        end
        box Client
          participant CParent as Parent (client)
          participant CChild as Child (client)
        end
    
        Note over SParent: 1. Parent setup()
        SParent->>SParent: setup()
        Note over SParent: 2. Parent renders
        SParent->>SParent: render()
        Note over SChild: 3. Child setup()<br/>→ emits to parent (too late for render)
        SChild->>SParent: emit() changes parent state
    
        Note over CParent: 1. Parent setup()
        CParent->>CParent: setup()
        Note over CChild: 2. Child setup()<br/>→ emits to parent
        CChild->>CParent: emit() changes parent state
        Note over CParent: 3. Parent renders<br/>with new state
        CParent->>CParent: render()
    
        Note over SParent,CParent: Different HTML on server vs client

    Fix: move shared state into useState() so it is initialized once, independent of component execution order.
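
    The idea behind key-based shared state can be sketched without Nuxt. In this hypothetical model, createStateStore and useSharedState stand in for useState(): whichever component asks first runs the initializer, and every later caller receives the same reference. The real composable additionally serializes the value into the payload, which this sketch does not attempt.

```typescript
// Hypothetical stand-in for the key-based sharing that useState() provides.
type Ref<T> = { value: T };

function createStateStore() {
  const store = new Map<string, Ref<unknown>>();
  return function useSharedState<T>(key: string, init: () => T): Ref<T> {
    if (!store.has(key)) {
      store.set(key, { value: init() }); // the initializer runs once per key
    }
    return store.get(key) as Ref<T>;
  };
}

const useSharedState = createStateStore();

// Parent and child read the same key; whichever runs first initializes it.
function parentSetup(): Ref<number> {
  return useSharedState("wizard-step", () => 1);
}
function childSetup(): Ref<number> {
  return useSharedState("wizard-step", () => 1);
}

// Deliberately call the child first to show that order does not matter.
const fromChild = childSetup();
const fromParent = parentSetup();
console.log(fromParent === fromChild, fromParent.value); // true 1
```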

    Category 3: Teleports

    <Teleport> content is rendered inline in the component tree on the server, but moved to <body> on the client. The DOM structure no longer matches.

    Fix: wrap teleported content in <ClientOnly> so it is rendered exclusively on the client.

    flowchart LR
        subgraph SSRTree["SSR Tree"]
          A["Component A<br/>(includes Teleport target)"]
          B["Teleported content<br/>(rendered inline on server)"]
          A --> B
        end
    
        subgraph HydratedDOM["Hydrated DOM"]
          A2["Component A<br/>(no inline teleported content)"]
          B2["Teleported content<br/>moved under body element"]
        end
    
        SSRTree -->|server HTML| HydratedDOM
        classDef mismatch fill:#ffe0e0,stroke:#ff5555,stroke-width:1px;
        class B,B2 mismatch;

    Category 4: Client-Side State Initialization

    If a reactive value is false during SSR but becomes true during hydration (for example, a dialog’s isOpen toggled in mounted), CSS classes and markup diverge:

    Server: <div class="panel panel-closed">   ← isOpen = false
    Client: <div class="panel panel-open">     ← isOpen = true (mounted set it)

    Fix: ensure the initial value matches the SSR state. Use watch or nextTick to change state after hydration completes, not during.
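
    A small model makes the rule concrete: hydration is clean only when the first client render reproduces the server HTML, and any change belongs after that point. The functions renderPanel and hydratesCleanly are hypothetical and only mimic the comparison, not Vue's hydration logic.

```typescript
// Hypothetical model: hydration is clean only if the first client render
// reproduces the server HTML exactly.
function renderPanel(isOpen: boolean): string {
  const state = isOpen ? "panel-open" : "panel-closed";
  return `<div class="panel ${state}"></div>`;
}

function hydratesCleanly(serverHtml: string, firstClientRender: string): boolean {
  return serverHtml === firstClientRender;
}

const serverHtml = renderPanel(false);

// Problem: client state was flipped before the first client render.
console.log(hydratesCleanly(serverHtml, renderPanel(true))); // false

// Fix: the first client render uses the SSR value; the change is applied
// afterwards as an ordinary update.
let isOpen = false;
const cleanlyHydrated = hydratesCleanly(serverHtml, renderPanel(isOpen));
isOpen = true; // post-hydration update
console.log(cleanlyHydrated, renderPanel(isOpen));
```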

    Category 5: Async Composable Race Conditions

    When multiple composables use useAsyncData and depend on each other, the resolution order can differ between server and client. Computed values built on these async results may pass through different intermediate states and yield divergent HTML.

    Fix: enforce top-down data flow from useState. Avoid computed values that depend on partially resolved async state.

    flowchart TB
        subgraph Server
          S1["useAsyncData A<br/>resolves first"]
          S2["useAsyncData B<br/>resolves second"]
          SC["Computed C<br/>based on A+B<br/>→ Server HTML"]
          S1 --> SC
          S2 --> SC
        end
    
        subgraph Client
          C1["useAsyncData B<br/>resolves first"]
          C2["useAsyncData A<br/>resolves second"]
          CC1["Computed C (intermediate)<br/>based only on B"]
          CC2["Computed C (final)<br/>based on A+B<br/>→ Client DOM"]
          C1 --> CC1
          C2 --> CC2
        end
    
        classDef warn fill:#fff4e5,stroke:#ff9900,stroke-width:1px;
        class SC,CC1,CC2 warn;
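
    One way to avoid divergent intermediate states is to make the derived value ignore partial results. The sketch below is an assumption about how that could look: AsyncResult and deriveTotal are hypothetical, and the derived string stays on one placeholder until both inputs are ready, whichever resolves first.

```typescript
// Hypothetical sketch: a derived value that stays in a single "pending" state
// until every async input has resolved, regardless of arrival order.
type AsyncResult<T> = { status: "pending" } | { status: "ready"; data: T };

function deriveTotal(
  cart: AsyncResult<{ subtotal: number }>,
  shipping: AsyncResult<{ fee: number }>,
): string {
  if (cart.status !== "ready" || shipping.status !== "ready") {
    return "Loading..."; // same placeholder whichever input is missing
  }
  return `Total: ${cart.data.subtotal + shipping.data.fee}`;
}

const pending = { status: "pending" } as const;
const cartReady = { status: "ready", data: { subtotal: 40 } } as const;
const shippingReady = { status: "ready", data: { fee: 5 } } as const;

console.log(deriveTotal(cartReady, pending));       // Loading...
console.log(deriveTotal(pending, shippingReady));   // Loading...
console.log(deriveTotal(cartReady, shippingReady)); // Total: 45
```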

    The Hydration Cookbook Pattern

    Capturing hydration issues in a structured way — symptom, root cause, fix — builds a shared knowledge base that dramatically reduces debugging time in any sizeable Nuxt application. A practical approach is to keep a “Hydration Issues Cookbook” with entries like:

    flowchart TB
        Issue["HYDRATION ISSUE:<br/>Random IDs in Templates"]
    
        Symptom["Symptom:<br/>#quot;Hydration node mismatch#quot;<br/>&lt;input id=#quot;...#quot;&gt; differs"]
        Cause["Root Cause:<br/>Math.random() / crypto.randomUUID()<br/>in setup() or template"]
        Fix["Fix:<br/>Use useId() for deterministic IDs"]
        Prevention["Prevention:<br/>ESLint rule — no Math.random()<br/>in setup/template"]
    
        Issue --> Symptom
        Issue --> Cause
        Issue --> Fix
        Issue --> Prevention
    
        classDef header fill:#e0f2ff,stroke:#1e88e5,stroke-width:1px;
        classDef box fill:#ffffff,stroke:#90a4ae,stroke-width:1px;
        class Issue header;
        class Symptom,Cause,Fix,Prevention box;

    Each entry describes a pattern, not a one-off incident. Over time, teams learn to recognize categories instead of chasing isolated bugs.
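
    A cookbook like this can also live next to the code as typed data. The following sketch assumes a hypothetical CookbookEntry shape whose fields mirror the diagram above, plus a findBySymptom helper for looking up entries by a fragment of the warning text.

```typescript
// Hypothetical shape for a cookbook entry; field names mirror the diagram above.
interface CookbookEntry {
  title: string;
  symptom: string;
  rootCause: string;
  fix: string;
  prevention: string;
}

const cookbook: CookbookEntry[] = [
  {
    title: "Random IDs in Templates",
    symptom: 'Hydration node mismatch; <input id="..."> differs',
    rootCause: "Math.random() / crypto.randomUUID() in setup() or template",
    fix: "Use useId() for deterministic IDs",
    prevention: "ESLint rule: no Math.random() in setup/template",
  },
];

// Case-insensitive lookup by a fragment of the warning text.
function findBySymptom(entries: CookbookEntry[], fragment: string): CookbookEntry[] {
  const needle = fragment.toLowerCase();
  return entries.filter((entry) => entry.symptom.toLowerCase().includes(needle));
}

console.log(findBySymptom(cookbook, "node mismatch").map((entry) => entry.title));
```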


    SSR Event Replay

    In large modular applications, events emitted during SSR still need to reach client-side listeners. The usual SSR lifecycle creates a timing gap:

    sequenceDiagram
        box Server
          participant S as Cart module (server)
        end
        box Client
          participant C as Funnel module (client)
        end
    
        Note over S: Cart module loads
        S->>S: emit cart:loaded
    
        Note over C: Hydration begins
        C->>C: subscribe to cart:loaded
    
        Note over S,C: Event is lost —<br/>no client listener existed<br/>when server emitted it

    The server fires events while rendering, but no client listeners exist yet. By the time they subscribe, those events are gone.

    The Solution: useState as an Event Buffer

    During SSR, events are serialized into useState, which is automatically transferred from server to client via the Nuxt payload. After hydration, the event bus reads the stored events and replays them through standard RxJS subjects.

    sequenceDiagram
        box Server
          participant S as Cart module (server)
          participant ST as useState (SSR store)
        end
        box Client
          participant CT as useState (hydrated payload)
          participant B as Event bus (RxJS)
          participant L as Listeners
        end
    
        Note over S: Cart module loads
        S->>S: emit cart:loaded
        S->>ST: push cart:loaded into useState buffer
    
        ST-->>CT: state transfer with events
    
        Note over B,L: After hydration
        L->>B: subscribe to cart:loaded
        B->>CT: read buffered events
        CT-->>B: cart:loaded events
        B-->>L: replay cart:loaded<br/>→ listeners fire ✓

    Replay is automatic. Module authors do not need to care whether an event fired during SSR or on the client — subscribers receive it either way.
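
    The buffer-and-replay mechanism can be sketched in plain TypeScript. This is a simplified model under stated assumptions: ReplayBus is a hypothetical class, a plain array stands in for the useState buffer, JSON stands in for the Nuxt payload, and simple callbacks replace RxJS subjects.

```typescript
// Hypothetical sketch of buffer-and-replay without RxJS or Nuxt.
type BusEvent = { name: string; payload: unknown };
type Listener = (payload: unknown) => void;

class ReplayBus {
  // `buffer` stands in for the array that would live in useState().
  private buffer: BusEvent[];
  private listeners = new Map<string, Listener[]>();

  constructor(buffer: BusEvent[] = []) {
    this.buffer = buffer;
  }

  emit(name: string, payload: unknown, duringSsr = false): void {
    if (duringSsr) {
      this.buffer.push({ name, payload }); // no listeners yet; keep for the client
      return;
    }
    for (const listener of this.listeners.get(name) ?? []) listener(payload);
  }

  // New subscribers first receive buffered events, then live ones.
  subscribe(name: string, listener: Listener): void {
    for (const event of this.buffer) {
      if (event.name === name) listener(event.payload);
    }
    this.listeners.set(name, [...(this.listeners.get(name) ?? []), listener]);
  }

  serialize(): string {
    return JSON.stringify(this.buffer);
  }
}

// Server: emit during render, then ship the buffer with the response.
const serverBus = new ReplayBus();
serverBus.emit("cart:loaded", { items: 2 }, true);
const payload = serverBus.serialize();

// Client: rebuild the bus from the payload; late subscribers still get the event.
const clientBus = new ReplayBus(JSON.parse(payload) as BusEvent[]);
const received: unknown[] = [];
clientBus.subscribe("cart:loaded", (data) => received.push(data));
clientBus.emit("cart:loaded", { items: 3 });
console.log(received); // [ { items: 2 }, { items: 3 } ]
```

    Because the buffer crosses the server/client boundary as serialized data, event payloads in such a design need to be JSON-safe.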


    Debugging Hydration Issues

    Hydration warnings identify where the DOM diverged, but rarely why. Vue points to a specific DOM node, while the real cause might be several layers up in the tree or hidden in composables.

    Strategy 1: Binary Elimination

    Wrap parts of the page in <ClientOnly> to localize the mismatch. If wrapping section A in <ClientOnly> makes the warning disappear, the bug is in that section. Then narrow down progressively by wrapping smaller and smaller parts of it.

    flowchart TB
        Page[Page Component]
    
        A["Section A<br/>(suspect)"]
        B[Section B]
        C[Section C]
    
        Page --> A
        Page --> B
        Page --> C
    
        A2["Section A wrapped<br/>in &lt;ClientOnly&gt;"]
        Page -. test step .-> A2
    
        classDef suspect fill:#fff4e5,stroke:#ff9800;
        classDef normal fill:#ffffff,stroke:#90a4ae;
        class A suspect;
        class B,C normal;

    Strategy 2: SSR-Only Rendering

    Disable client-side hydration entirely (ssr: true with no client JavaScript) and compare:

    • The raw server HTML
    • The HTML the client would render

    This isolates state differences and logic that only runs on the client.

    flowchart LR
        SSR["SSR-only HTML<br/>(no client JS)"]
        ClientRender["Client-only render<br/>(same route, mocked data)"]
    
        SSR --> Diff[Diff DOM + state]
        ClientRender --> Diff
    
        Diff --> Cause["Identify diverging values<br/>and client-only logic"]
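
    For the comparison step, even a very small helper is often enough to find the first point of divergence. The firstDivergence function below is a hypothetical sketch that compares two HTML strings character by character; real output usually needs normalizing first (whitespace, comment markers, attribute order), which is left out here.

```typescript
// Hypothetical helper: report where two HTML strings first diverge.
function firstDivergence(serverHtml: string, clientHtml: string, context = 20) {
  const limit = Math.min(serverHtml.length, clientHtml.length);
  let index = 0;
  while (index < limit && serverHtml[index] === clientHtml[index]) index++;
  if (index === limit && serverHtml.length === clientHtml.length) return null; // identical
  return {
    index,
    server: serverHtml.slice(index, index + context),
    client: clientHtml.slice(index, index + context),
  };
}

const server = '<div class="panel panel-closed">Cart</div>';
const client = '<div class="panel panel-open">Cart</div>';
console.log(firstDivergence(server, client));
```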

    Strategy 3: AI-Assisted Debugging

    An AI assistant connected to both the application’s MCP server (for server-side state) and the browser’s DevTools (for client-side state) can automatically diff the two:

    Developer: "The checkout form shows different content
                after hydration. Help me debug this."
    
    AI Assistant:
      1. Queries Pinia store via MCP → gets server-side cart state
      2. Inspects browser DOM via DevTools → gets client-side rendering
      3. Compares the two → identifies the diverging value
      4. Traces the value to a composable with client-only initialization
      5. Suggests fix: move initialization to useState

    flowchart TB
        Dev[Developer]
        AI[AI Assistant]
        MCP["MCP Server<br/>(server-side state)"]
        DevTools["Browser DevTools<br/>(client-side state)"]
        Diff[State + DOM diff]
        Fix["Suggested fix<br/>(e.g., move init to useState)"]
    
        Dev -->|debug request| AI
        AI --> MCP
        AI --> DevTools
        MCP --> AI
        DevTools --> AI
        AI --> Diff
        Diff --> Fix
        Fix --> Dev

    This pattern is already used in practice in sophisticated internal tooling — for example, a debug-chatbot-style module that provides exactly this capability (covered in detail in Article 14 of this series).


    The hydrate-never Directive

    Not all server-rendered content needs hydration. Static sections — text blocks, decorative images, layout wrappers — never change on the client. Hydrating them wastes CPU and inflates Total Blocking Time (TBT).

    A custom directive marks elements that should be skipped during hydration:

    With hydrate-never:
      Server renders: <div v-hydrate-never class="static-banner">
                        <h2>Welcome to Our Store</h2>
                        <p>Thousands of satisfied customers...</p>
                      </div>
    
      Client: Vue skips this subtree during hydration
              → No patch() calls
              → No reactive tracking
              → Zero TBT contribution

    flowchart LR
        subgraph Render
          S["Server render<br/>&lt;div v-hydrate-never&gt;..."]
          C["Client hydration<br/>Vue sees v-hydrate-never"]
        end
    
        S -->|HTML payload| C
    
        C -->|skip subtree| NoPatch["No patch() calls"]
        C -->|skip reactivity| NoReactive[No reactive tracking]
        C -->|perf| TBT["Zero TBT contribution<br/>for this subtree"]
    
        classDef static fill:#e0f7fa,stroke:#00acc1;
        class S,C,NoPatch,NoReactive,TBT static;

    On pages with large static sections (landing pages, editorial content, catalog content), this can cut Total Blocking Time by 30–50%.
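
    The saving can be illustrated with a toy tree walk. This is not Vue's hydration algorithm; VNodeLike and countHydratedNodes are hypothetical and only model the idea that a flagged subtree is skipped as a whole.

```typescript
// Hypothetical model of a hydration walk that skips flagged subtrees.
interface VNodeLike {
  tag: string;
  hydrateNever?: boolean;
  children?: VNodeLike[];
}

// Returns how many nodes the walk would have to visit.
function countHydratedNodes(node: VNodeLike): number {
  if (node.hydrateNever) return 0; // skip the whole subtree
  const children = node.children ?? [];
  return 1 + children.reduce((sum, child) => sum + countHydratedNodes(child), 0);
}

const page: VNodeLike = {
  tag: "main",
  children: [
    {
      tag: "div",
      hydrateNever: true, // static banner
      children: [{ tag: "h2" }, { tag: "p" }],
    },
    { tag: "form", children: [{ tag: "input" }, { tag: "button" }] },
  ],
};

console.log(countHydratedNodes(page)); // 4
```

    In this toy page, three of seven nodes are never visited; the larger the static share of a page, the larger the skipped portion.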


    Lessons Learned

    Hydration is a contract, not a feature

    Treating hydration as “it works or it doesn’t” leads to brittle apps. Treating it as a contract — server and client must agree on every rendered value — leads to defensive patterns that prevent mismatches by design.

    The five categories cover 95% of real-world mismatches

    Random values, timing-dependent state, teleports, client-side initialization, and async race conditions. If you know these five, you can diagnose almost any hydration issue you encounter.

    Event replay is essential for SSR module architectures

    In modular SSR systems where modules communicate via events, event replay is non-negotiable. Without it, SSR-only events vanish on the client, creating subtle, production-only bugs.

    A cookbook is more valuable than documentation

    High-level advice (“avoid non-deterministic values”) is less actionable than concrete patterns (“this code causes this bug; here is the fix”). A living cookbook that evolves with new patterns is one of the most effective knowledge tools for hydration issues.


    What’s Next

    • Article 10: Memory, Stability, and PM2 — Running a Long-Lived Node.js Server — What happens when V8 runs for days and how to keep it stable.
    • Article 11: Multi-Environment Infrastructure — Azure Container Apps and the Configuration System — Managing three environments with generated configuration.
    • Article 12: Security in a Nuxt SSR App — CSRF, Azure AD, CSP, and More — The security layers that protect a server-rendered application.

    Munir Husseini is a software architect specializing in full-stack TypeScript, .NET, and cloud-native architectures.

  • The Full Picture — What the New Concept Delivers

    Twentieth and final article in a series about migrating from legacy architectures to a modern Nuxt 4 stack.


    From Parts to Whole

    The previous fifteen articles describe individual pieces — the GraphQL gateway, code generators, performance, infrastructure, security. This article brings them together and answers the question that matters to decision-makers: what does the complete system deliver?


    The Architecture at a Glance

    flowchart TB
        A["Azure Front Door<br/>(CDN + WAF)"] --> B[Azure Container Apps Environment]
        B --> C["Nginx Proxy<br/>TLS, Image cache, OTel spans"]
        C --> D["Nuxt 4<br/>SSR + GQL Gateway<br/>SSR, GraphQL stitching<br/>Page cache, Redis"]
        D --> E[".NET API<br/>Pricing, Orders, Users, Validation"]
        D --> F["Redis<br/>(per-env)"]
        B --> G["External Services<br/>Headless CMS · Application Insights<br/>Azure Key Vault · Azure AD"]

    Four containers per environment. One frontend language (TypeScript). One data schema (GraphQL). One module system (Nuxt modules). One configuration generator (YAML → JSON + Bicep).


    The Five Pillars

    Five architectural pillars, each delivering measurable value in a large enterprise application.

    Pillar 1: Unified Data Layer (GraphQL Schema Stitching)

    What it delivers: One API endpoint for all data sources. The frontend never needs to know which backend produced which field.

    | Before | After |
    |---|---|
    | 3–5 REST calls per page | 1 GraphQL query per page |
    | Manual data joining in frontend code | Automatic via @delegate directive |
    | Per-endpoint types, manually maintained | Generated from unified schema |
    | Custom error handling per API | One Apollo error link |

    Pillar 2: Total Automation (Code Generation)

    What it delivers: Developers write declarations (GraphQL queries, YAML translations, GraphQL input types). The system generates everything else.

    flowchart LR
        A[What developers write] --> B[What is generated]
        A1[.graphql files] --> B1["Typed composables<br/>(auto-imported)"]
        A2[YAML translation files] --> B2["Typed t.* proxy chain<br/>(auto-imported)"]
        A3[GraphQL input types] --> B3[Form field metadata + validation]
        A4[CMS content model] --> B4[Vue component stubs + types]
        A5[Module scaffold command] --> B5[Complete module structure]

    Roughly 40–60% of the TypeScript code is generated — not boilerplate, but correct implementations derived from authoritative schemas.
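
    As an illustration of what a typed t.* proxy chain could look like, here is a minimal sketch. It is an assumption, not the generator's actual output: the messages object stands in for parsed YAML, and createT is a hypothetical helper that resolves nested keys through a Proxy and fails loudly on missing ones.

```typescript
// Hypothetical sketch of a "t.section.key" accessor backed by a Proxy.
// In a generated setup, the Messages type would be derived from the YAML files.
const messages = {
  checkout: { title: "Checkout", submit: "Place order" },
  cart: { empty: "Your cart is empty" },
};

type Messages = typeof messages;

function createT<T extends object>(source: T, path: string[] = []): T {
  return new Proxy(source, {
    get(target, key) {
      if (typeof key !== "string") return undefined;
      const value: unknown = Reflect.get(target, key);
      if (typeof value === "object" && value !== null) {
        return createT(value, [...path, key]); // descend into the section
      }
      if (value === undefined) {
        throw new Error(`Missing translation: ${[...path, key].join(".")}`);
      }
      return value;
    },
  });
}

const t: Messages = createT(messages);
console.log(t.checkout.submit); // Place order
```

    The compile-time check comes from the Messages type; the runtime throw only covers keys that bypass the type system.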

    Pillar 3: SSR Performance Stack

    What it delivers: Near-instant page loads with near-perfect Lighthouse scores.

    flowchart TB
        S[Performance Stack]
        S --> L1["SSR<br/>Content visible immediately"]
        S --> L2["Multi-Tier Cache<br/>Sub-ms data retrieval"]
        S --> L3["Deferred Hydration<br/>No render-blocking JS"]
        S --> L4["Same-Origin Proxy<br/>−594ms LCP"]
        S --> L5["Font Strategy<br/>Zero CLS"]
        S --> L6["Bot Detection<br/>Clean audit scores"]
        S --> L7["Manual Chunks<br/>Only needed JS per page"]
        S --> R["Combined Result<br/>Lighthouse 97+ (mobile)"]

    Pillar 4: Production-Grade Operations

    What it delivers: Elastic infrastructure, zero-downtime deployments, full observability, and instant rollback.

    | Capability | Implementation | Article |
    |---|---|---|
    | Elastic scaling | Container Apps auto-scale (5–20 replicas) | 11 |
    | Zero-downtime deploy | Blue-green with traffic switching | 11 |
    | Per-branch environments | Automated feature deployments | 11 |
    | Full request tracing | W3C Trace Context across all services | 13 |
    | Security | CSRF + Azure AD + runtime CSP | 12 |
    | Instant rollback | Traffic switch to previous revision | 11 |

    Pillar 5: Developer Experience

    What it delivers: Fast feedback loops, strong type safety, AI-assisted debugging, and clear modular boundaries.

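    | Aspect | Experience |
    |---|---|
    | New data source | Write .graphql file → composable auto-imported |
    | New translation | Add YAML key → t.section.key typed and available |
    | New module | Run scaffold → complete structure created |
    | Debugging | AI assistant with 30+ live inspection tools |
    | Architecture understanding | AGENTS.md + module READMEs + consistent patterns |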
    Architecture understandingAGENTS.md + module READMEs + consistent patterns

    For Decision-Makers: The Numbers

    Performance

    These example metrics illustrate typical gains when moving a legacy enterprise frontend to a modern SSR + GraphQL stack:

    | Metric | Legacy | New | Improvement |
    |---|---|---|---|
    | Median response time | 2,618 ms | 165 ms | 15.9× faster |
    | Error rate | 3.91% | 0.09% | 97% lower |
    | Lighthouse Performance (mobile) | ~50 | 97+ | +47 points |
    | LCP | > 5 s | < 2.5 s | Google “good” |

    Capacity and Cost

    | Metric | Legacy | New | Improvement |
    |---|---|---|---|
    | Max tested capacity | ~99 RPM | 494+ RPM | 5× more |
    | Infrastructure model | Fixed (always on) | Elastic (pay-per-use) | ~40% cost reduction at average load |
    | Scaling | Manual (operations team) | Automatic (config-driven) | Zero manual intervention |

    Development Velocity

    | Metric | Legacy | New | Improvement |
    |---|---|---|---|
    | New API integration | Write REST client + types + mapper | Write .graphql file | ~90% less glue code |
    | New form | Backend + frontend sync + manual testing | Schema-driven, auto-validated | No manual field wiring |
    | New CMS content type | Manual component + data fetching + types | Generated component stub + typed query | No boilerplate |
    | New translation key | Add key, hope it matches at runtime | YAML key → typed t.section.key | Compile-time checked |
    | New module | Create folders, wire routing, exports, types | yarn plop → complete scaffold | Consistent structure in seconds |
    | Type safety coverage | Partial (hand-written) | Complete (generated) | Zero type drift |

    For Architects: The Design Principles

    Five design principles unify the decisions across all 15 articles for large-scale web applications.

    1. Generate, Don’t Write

    If code can be derived from an authoritative schema (GraphQL, CMS model, YAML translations), generate it. Hand-written code drifts; generated code stays correct.

    2. One Source of Truth Per Concern

    • Data shape → GraphQL schema
    • Validation rules → Backend model annotations
    • URL structure → CMS entries
    • Translations → YAML files
    • Infrastructure configuration → YAML values files

    No concern has two sources. Everything is defined once and consumed many times.

    3. Eliminate Work, Don’t Optimize It

    The biggest performance gains came from removing work: modulepreload hints, cross-origin connections, redundant CMS queries. The biggest productivity gains came from removing boilerplate. Subtraction outperforms optimization.

    4. Modules as Boundaries

    Folders suggest organization. Modules enforce it. With 35+ modules, each owning its own API surface, the codebase has real boundaries.

    5. Measure Everything, Assume Nothing

    Architecture claims without data are opinions. Load test results, Lighthouse scores, cache hit rates, and response time distributions turn them into evidence. The production configuration is shaped by measurement.


    Lessons Learned

    The whole is greater than the sum of the parts

    No single technique produces a 15.9× improvement. Stitched GraphQL removes multi-source joins, caching eliminates redundant fetches, deferred hydration removes render-blocking JS, and elastic infrastructure eliminates over-provisioning. Each targets a different bottleneck; together they transform the system.

    Architecture must allow change

    The architecture is designed upfront — but designed for adaptability. Technology and requirements evolve, and the system must evolve with them. An architecture that requires wholesale rewrites accumulates debt until it becomes unmaintainable.

    Four mechanisms make change safe:

    • Loose coupling via modules. Dozens of independent modules, each owning a vertical slice. Replacing authentication, tracking, or forms never cascades into unrelated code. A module can be deleted and rebuilt without touching the rest.
    • Code generation from schemas. Generated code follows the schema it came from. When a data source changes or a service is replaced, regeneration produces correct integration code. No hand-written adapter layer drifting from reality.
    • Schema-driven contracts (GraphQL). The frontend depends on a unified schema, not individual backends. A service can be rewritten, split, or replaced — as long as it serves the same fields, no frontend changes are required. New clients (mobile, internal tools) consume the same gateway without backend modifications.
    • Infrastructure as configuration. The same YAML generates manifests for Container Apps, AKS, and App Service. Switching platforms or adding environments is a configuration change, not a re-architecture.

    In practice: adding a data source is one subgraph and a stitch directive. Changing rendering strategy is a route rule. Replacing a backend service is transparent to the frontend. Swapping a module is contained to its directory.

    Things that change often (services, modules, data sources, infrastructure) are easy to change. The few core decisions that are hard to change (framework, data protocol) are chosen carefully and affect only their own layer.

    Developer experience is a force multiplier

    Fast feedback loops (auto-generated types, branch environments, AI debugging) do not just improve morale — they improve the system. When adding a data source takes minutes instead of hours, teams integrate more data. When debugging takes minutes instead of days, bugs get fixed faster.

    Developer hardware is not optional

    This architecture demands fast machines. The dev stack runs a Nuxt server watching thousands of files, a GraphQL server, code generators, and TypeScript language services analyzing the module graph — simultaneously. When hardware falls short, HMR becomes sluggish, type checking lags, and code generation feels blocking.

    Windows is particularly affected. Node.js file watching and module resolution are measurably slower on NTFS than on macOS or Linux. Teams on Windows need WSL2, faster disks, or higher-spec hardware.

    A trade-off worth noting: the architecture optimizes for velocity given adequate hardware. The minimum spec is higher than simpler stacks. Budget accordingly.


    What’s Next

    The remaining articles dive deeper into specific technical patterns:

    • Article 17: The @delegate Directive Deep Dive — Cross-subgraph field resolution in detail
    • Article 18: Building a Headless Design System — The Compose Pattern — Separating style logic from templates
    • Article 19: A/B Testing at SSR Level — Cookie-based variant selection during server rendering
    • Articles 20–27: Content preview, logging, module development, conditional rendering, image proxying, observability, reactive filters, and deferred hydration

    Munir Husseini is a software architect specializing in full-stack TypeScript, .NET, and cloud-native architectures.

  • Load Testing Results — 15× Faster, 5× More Capacity

    Nineteenth in a series about migrating from legacy architectures to a modern Nuxt 4 stack.


    Architecture Decisions Have Consequences — Measure Them

    Architecture decisions accumulate, and their combined effect only becomes visible under real load.

    Before production, a large enterprise application was load-tested with production-equivalent patterns, not synthetic traffic. k6 replayed a model derived from real production logs: 20 pages, weighted by actual traffic share.


    The Headline Numbers

    | Metric | Legacy System | New System | Change |
    |---|---|---|---|
    | Median response time | 2,618 ms | 165 ms | 15.9× faster |
    | Error rate (1× prod load) | 3.91% | 0.09% | 97% lower |
    | Max tested capacity | ~99 RPM | 494+ RPM | 5× more |
    | Infrastructure | 3× fixed VMs (24 vCPU, 96 GB) | Auto-scaled containers | Elastic |
    | Lighthouse Performance (mobile) | ~50 | 97+ | Near-perfect |

    A 2.6-second median means half of all requests took 2.6 seconds or longer. A 165 ms median means the typical response arrives in about the time it takes to blink.


    Test Methodology

    Traffic Pattern

    The load test replayed production-equivalent traffic using k6’s HTTP module:

    pie showData
      title Traffic Distribution (top 10 pages)
      "Homepage (28%)" : 28
      "Product Overview (19%)" : 19
      "Product Details (14%)" : 14
      "Checkout Step 1 (9%)" : 9
      "FAQ (7%)" : 7
      "Contact (6%)" : 6
      "About (5%)" : 5
      "Legal / Imprint (4%)" : 4
      "Blog Overview (3%)" : 3
      "Other (11 pages) (5%)" : 5

    Test Types

    Two test types were run:

    1. Replay Test — constant load at 1× production traffic (99 RPM) for 30 minutes
    2. Ramp Test — linear ramp from 1× to 5× production traffic over 30 minutes
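
    The traffic model and the ramp can both be expressed in a few lines. The sketch below is not the k6 script used for the test; it only illustrates the two ideas under stated assumptions. The weights are taken from the traffic distribution above, while pickPage and targetRpm are hypothetical helper names.

```typescript
// Weights copied from the traffic distribution above; they sum to 100.
const trafficShare: Array<[string, number]> = [
  ["Homepage", 28],
  ["Product Overview", 19],
  ["Product Details", 14],
  ["Checkout Step 1", 9],
  ["FAQ", 7],
  ["Contact", 6],
  ["About", 5],
  ["Legal / Imprint", 4],
  ["Blog Overview", 3],
  ["Other", 5],
];

// Maps a uniform random number in [0, 1) to a page, proportionally to its weight.
function pickPage(random: number): string {
  const total = trafficShare.reduce((sum, [, weight]) => sum + weight, 0);
  let threshold = random * total;
  let chosen = "";
  for (const [page, weight] of trafficShare) {
    chosen = page; // falls through to the last page if rounding overshoots
    if (threshold < weight) break;
    threshold -= weight;
  }
  return chosen;
}

// Linear ramp from 1× to 5× of the 99 RPM baseline over 30 minutes.
const BASELINE_RPM = 99;
const RAMP_MINUTES = 30;

function targetRpm(minute: number, from = 1, to = 5): number {
  const progress = Math.min(Math.max(minute / RAMP_MINUTES, 0), 1);
  return BASELINE_RPM * (from + (to - from) * progress);
}

console.log(pickPage(0.1), "|", pickPage(0.3), "|", pickPage(0.99));
// Homepage | Product Overview | Other
console.log(targetRpm(0), targetRpm(15), targetRpm(30)); // 99 297 495
```

    Passing the random number in as an argument keeps the selection reproducible; a load script would call pickPage(Math.random()) per iteration.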

    Replay Test: 1× Production Load

    The replay test answers: “Can the new system handle current production traffic?”

    flowchart TB
      title["Replay Test Results (1× production load = 99 RPM)"]
    
      subgraph Legacy_System["Legacy System"]
        L_Median["Median RT: 2,618 ms"]
        L_P95["P95 RT: 8,500+ ms"]
        L_Error["Error Rate: 3.91%"]
        L_RPM["Requests/min: 99"]
        L_Status["Status: Degraded"]
      end
    
      subgraph New_System["New System"]
        N_Median["Median RT: 168 ms"]
        N_P95["P95 RT: 450 ms"]
        N_Error["Error Rate: 0.09%"]
        N_RPM["Requests/min: 99"]
        N_Status["Status: Healthy"]
      end
    
      L_Median --- N_Median
      L_P95 --- N_P95
      L_Error --- N_Error
      L_RPM --- N_RPM
      L_Status --- N_Status

    The new system handles production traffic with roughly 94% lower median response time and 97% fewer errors. A P95 of 450 ms means 95% of requests complete in under half a second, well below the legacy system’s 2,618 ms median.
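
    For readers who want to check the arithmetic, the following sketch shows how a median and a P95 are read from a sample, and how the relative changes follow from the headline figures. The sample array is illustrative only and is not the measured data; percentile is a hypothetical helper using the nearest-rank method.

```typescript
// Nearest-rank percentile over a list of response times (ms).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Illustrative samples only; these are not the measured data.
const samples = [120, 140, 150, 160, 165, 170, 180, 200, 300, 450];
console.log(percentile(samples, 50)); // 165
console.log(percentile(samples, 95)); // 450

// Relative change computed from the headline medians and error rates.
const speedup = 2618 / 165;
const latencyReduction = 1 - 165 / 2618;
const errorReduction = 1 - 0.09 / 3.91;
console.log(speedup.toFixed(1));                  // 15.9
console.log((latencyReduction * 100).toFixed(1)); // 93.7
console.log((errorReduction * 100).toFixed(1));   // 97.7
```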


    Ramp Test: Finding the Ceiling

    The ramp test answers: “How far can we push it before it breaks?”

    xychart-beta
      title "Ramp Test Results (1× → 5× production load)"
      x-axis "Load (× production)" [1, 2, 3, 4, 5]
      y-axis "Response Time (ms)"
      line [2618, 4000, 5000, 6000, 8800]
      line [165, 165, 165, 165, 165]

    The median stayed flat at 165 ms even at 5× load. There was no linear degradation: additional load did not increase per-request latency.

    The P95 degraded to 8.8 seconds at 5×, driven by scale-out lag. New replicas needed time to start; once they were online, they matched existing replica performance.


    The Right-Sizing Experiment

    Finding the minimum viable resource allocation is a critical part of load testing. Four configurations were tested:

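    | Config | vCPU | RAM | PM2 Workers | V8 Heap | Result |
    |---|---|---|---|---|---|
    | #1 | 4 | 8 GiB | 3 | 2048 MB | ✅ Stable, over-provisioned |
    | #2 | 2 | 4 GiB | 2 | 1536 MB | ✅ Stable, efficient |
    | #3 | 1 | 2 GiB | 2 | 1024 MB | ❌ Cascading failures |
    | #4 | 2 | 4 GiB | 2 | 1536 MB | ✅ Validated (6× load) |

    flowchart LR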
      A["Config #1: 4 vCPU / 8 GiB / 3 workers / 2048 MB heap"] -->|Over-provisioned| B["Config #2: 2 vCPU / 4 GiB / 2 workers / 1536 MB heap"]
      B -->|Right-size further| C["Config #3: 1 vCPU / 2 GiB / 2 workers / 1024 MB heap"]
      C -->|Cascading failures| D["Config #4: 2 vCPU / 4 GiB / 2 workers / 1536 MB heap (Validated at 6× load)"]

    The Failed Right-Sizing (Config #3)

    Reducing to 1 vCPU / 2 GiB caused a cascade:

    sequenceDiagram
      participant L as Load Generator
      participant R1 as Replica 1
      participant R2 as Replica 2
      participant R3 as Replica 3
      participant HP as Health Probe
    
      Note over R1,R3: Failure Cascade at 1 vCPU / 2 GiB
    
      L->>R1: t=0s: Traffic (99 RPM)
      Note over R1: Memory: 1,791 / 2,048 MB (87.5%)
    
      R1-->>R1: t=10s: V8 GC stalls<br/>Event loop blocked
      HP->>R1: t=15s: Health probe
      HP-->>HP: Timeout
      HP->>R1: Mark unhealthy → restart
    
      Note over R2: t=20s: Absorbs 2× traffic
      L->>R2: Increased traffic
    
      R2-->>R2: t=25s: Memory spike → restart
      Note over R3: t=30s: Overloaded → restart
    
      Note over R1,R3: t=35s: All replicas restarting
      Note over L: t=45s: Zero capacity for ~10 seconds<br/>→ 5% error rate

    V8 needs breathing room. At 87.5% container memory utilization (1,791 of 2,048 MB), GC pauses block the event loop long enough for health probes to time out. The minimum viable compute here was 2 vCPU / 4 GiB, though the exact threshold depends on application complexity, page weight, and caching. The principle is general; the numbers are specific.
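
    One way to read the failure of Config #3 is to compare the combined V8 heap limits of all PM2 workers with the container's RAM. This is an interpretation of the table values, not an analysis stated by the measurements themselves; SizingConfig and heapShareOfRam are hypothetical names.

```typescript
// Compares the combined V8 old-space limit of all PM2 workers with container RAM.
// Values come from the right-sizing table; the interpretation is an assumption.
interface SizingConfig {
  name: string;
  ramGiB: number;
  workers: number;
  heapMb: number;
}

function heapShareOfRam(config: SizingConfig): number {
  const ramMb = config.ramGiB * 1024;
  return (config.workers * config.heapMb) / ramMb;
}

const configs: SizingConfig[] = [
  { name: "#1", ramGiB: 8, workers: 3, heapMb: 2048 },
  { name: "#2", ramGiB: 4, workers: 2, heapMb: 1536 },
  { name: "#3", ramGiB: 2, workers: 2, heapMb: 1024 },
];

for (const config of configs) {
  console.log(config.name, `${(heapShareOfRam(config) * 100).toFixed(0)}%`);
}
// #1 75%
// #2 75%
// #3 100%
```

    In the two stable configurations the heap limits leave a quarter of the container's memory for everything outside the V8 heap; in Config #3 the limits alone add up to the full 2 GiB.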


    The Validated Production Configuration

    The configuration that passed all k6 thresholds (exit code 0) at 6× production load:

    flowchart TB
      subgraph SPA["SPA Containers"]
        SPA_CPU["CPU: 2 vCPU"]
        SPA_MEM["Memory: 4 GiB"]
        SPA_PM2["PM2 Workers: 2 per container"]
        SPA_HEAP["V8 Heap: 1536 MB (--max-old-space-size=1536)"]
        SPA_MIN["Min Replicas: 5"]
        SPA_MAX["Max Replicas: 20"]
      end
    
      subgraph API["API Containers"]
        API_CPU["CPU: 0.5 vCPU"]
        API_MEM["Memory: 1 GiB"]
        API_MIN["Min Replicas: 3"]
        API_MAX["Max Replicas: 20"]
      end
    
      subgraph Results["Result at 6× load"]
        RES_MED["Median RT: 165 ms"]
        RES_ERR["Error rate: 0.82%"]
        RES_CPU["CPU peak: 12% of allocation"]
        RES_MEM["Memory peak: 60% of allocation"]
      end
    
      SPA --> Results
      API --> Results

    Cost Analysis

    The validated configuration uses 50% less CPU and 50% less memory per replica than the initial over-provisioned config (2 vCPU / 4 GiB instead of 4 vCPU / 8 GiB):

    flowchart TB
      subgraph Legacy["Legacy (fixed)"]
        L1["3× VM instances"]
        L2["24 vCPU, 96 GB RAM — always on"]
        L3["Cost: constant regardless of traffic"]
      end
    
      subgraph New["New (elastic)"]
        N1["5–20 SPA replicas (2 vCPU, 4 GiB each)"]
        N2["3–20 API replicas (0.5 vCPU, 1 GiB each)"]
        N3["Per-second billing — pay for actual usage"]
        N4["At idle: 5 SPA + 3 API"]
        N5["At peak: 15 SPA + 8 API"]
        N6["Average: ~60% of peak capacity billed"]
      end
    
      Legacy -->|"Migrated to"| New

    Elastic billing lowers cost during low-traffic periods — nights, weekends, and holidays — while still scaling for spikes without permanent over-provisioning.


    What the Numbers Mean for Architecture

    Each architecture decision from earlier articles contributed to these numbers:

    | Decision | Contribution |
    | --- | --- |
    | SSR (Article 1) | Eliminates client-side rendering delay |
    | GraphQL Gateway (Article 2) | Single query per page instead of 3–5 REST calls |
    | Multi-Tier Cache (Article 6) | Sub-ms content retrieval for cached pages |
    | Deferred Hydration (Article 6) | Eliminates render-blocking JavaScript |
    | Same-Origin Image Proxy (Article 6) | Improves LCP by reducing cross-origin overhead |
    | PM2 Cluster Mode (Article 10) | Zero-downtime worker restarts |
    | Container Apps Auto-Scaling (Article 11) | Elastic capacity, no over-provisioning |

    flowchart LR
      SSR["SSR"] --> PERF["Lower TTFB & faster first paint"]
      GQL["GraphQL Gateway"] --> PERF
      CACHE["Multi-Tier Cache"] --> PERF
      HYDR["Deferred Hydration"] --> PERF
      IMG["Same-Origin Image Proxy"] --> PERF
      PM2["PM2 Cluster Mode"] --> REL["Resilience & zero-downtime deploys"]
      AS["Container Apps Auto-Scaling"] --> CAP["Elastic capacity"]
    
      PERF --> OUT["15.9× faster median\nLighthouse 97+"]
      REL --> OUT
      CAP --> OUT

    No single decision produces 15.9×. It is the combination — each one removing a different bottleneck — that delivers the aggregate result.


    Lessons Learned

    Load test with production traffic patterns, not synthetic ones

    A synthetic test hitting the homepage 100 times per second says nothing about real-world performance. Real traffic has a distribution — heavy pages, light pages, API calls, form submissions. The test must match it.

    flowchart LR
      A["Synthetic test: 100 req/s to homepage"] -->|Misleading| C["Unrealistic bottlenecks"]
      B["Production-equivalent mix:\nheavy pages, light pages, APIs, forms"] -->|Accurate| D["Realistic capacity & latency insights"]
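
    One way to approximate a production-equivalent mix is to pick the scenario for each test iteration with weighted random selection. The sketch below is a minimal illustration, not the article's actual test script: the scenario names and weights are made up and would have to come from real traffic analytics, and `pickScenario` is a hypothetical helper.

    ```typescript
    interface Scenario {
      name: string;
      weight: number; // relative share of production traffic (assumed values)
    }

    // Picks a scenario for a roll in [0, 1), proportionally to its weight.
    function pickScenario(scenarios: Scenario[], roll: number): Scenario {
      const total = scenarios.reduce((sum, s) => sum + s.weight, 0);
      let cursor = roll * total;
      for (const scenario of scenarios) {
        cursor -= scenario.weight;
        if (cursor < 0) return scenario;
      }
      const last = scenarios[scenarios.length - 1];
      if (last === undefined) throw new Error("at least one scenario is required");
      return last;
    }

    // Hypothetical traffic mix; replace with measured production shares.
    const trafficMix: Scenario[] = [
      { name: "heavy-page", weight: 50 },
      { name: "light-page", weight: 30 },
      { name: "api-call", weight: 15 },
      { name: "form-submit", weight: 5 },
    ];

    console.log(pickScenario(trafficMix, Math.random()).name);
    ```

    In a load-testing tool, each virtual-user iteration would call the picker once and then run the matching request sequence, so the overall request distribution converges on the configured weights.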

    Right-sizing failures are the most valuable test results

    The cascading failure at 1 vCPU / 2 GiB taught more about system behavior than all successful tests combined. It exposed the GC pressure threshold, health probe timing sensitivity, and cold-start vulnerability. These insights shaped the production configuration.

    flowchart TB
      F["Right-sizing attempt"] --> F1["Too small (1 vCPU / 2 GiB)"]
      F1 --> F2["GC pressure & probe timeouts"]
      F2 --> F3["Cascading restarts"]
      F3 --> F4["Error budget impact"]
      F4 --> F5["Refined production config\n(2 vCPU / 4 GiB, validated at 6×)"]

    Median response time is the metric that matters most

    P95 and P99 matter for tail latency, but the median determines the experience for most users. A flat median under increasing load (165 ms at both 1× and 5×) indicates that horizontal scaling absorbs the extra traffic without degrading the typical request, even while P95 climbs.

    xychart-beta
      title "Median vs P95 under load"
      x-axis "Load (× production)" [1, 2, 3, 4, 5]
      y-axis "Response Time (ms)"
      line [165, 165, 165, 165, 165]
      line [450, 1200, 3000, 6000, 8800]

    15× is not an optimization — it is a different architecture

    A 15.9× improvement does not come from optimizing an existing system. It comes from removing fundamental bottlenecks: dual rendering, multi-source data joining, absence of caching, fixed infrastructure. The improvement is architectural, not incremental.


    What’s Next

    • Article 16: The Full Picture — What the New Concept Delivers — Synthesis for decision-makers and architects.
    • Article 17: The @delegate Directive Deep Dive — Cross-Subgraph Field Resolution — A technical deep dive into the most powerful schema stitching feature.
    • Article 18: Building a Headless Design System in Vue 3 — The Compose Pattern — Separating style logic from templates.

    Munir Husseini is a software architect specializing in full-stack TypeScript, .NET, and cloud-native architectures.

  • The Nuxt Observability Stack: Tracing, Logging, and PM2 Metrics

    Migrating from a legacy application to a modern Nuxt 4 stack is not just about new frameworks and better performance numbers. The real shift is moving from reactive firefighting to proactive observability — knowing what is slow, why it is slow, and how the platform behaves under real load.

    This observability stack has three pillars:

    • End-to-end distributed tracing across Nginx, Nuxt, backend services, and Redis
    • Structured logging with per-module, runtime-tunable log levels
    • Node.js process diagnostics for GC, heap, and CPU under PM2

    Together, they turn a deployment into something that can be reasoned about, not just hoped to work.


    Flying Blind vs. Full Visibility

    Without observability, slowdowns are only visible when users complain, and failures are only visible when error rates spike. The underlying cause remains unknown: which component was slow, which call failed, which cache missed.

    In a system with multiple containers — for example, a frontend app, an API, a proxy, and Redis — a single request crosses several services. Without tracing, correlating what happened means manually matching timestamps across separate log streams. Most teams stop long before they get a clear picture.

    The target state is one trace ID, created at the edge proxy when a browser request arrives and propagated through every downstream service, so a single click in the observability backend reveals the full request waterfall.


    Three-Layer Telemetry: Traces, Proxy Spans, and Container Metrics

    The observability stack has three layers, each capturing a different dimension of the system:

    flowchart TB
      subgraph L1["Layer 1: SDK Instrumentation"]
        L1a["Node.js applicationinsights<br/>+ .NET AI SDK"]
        L1b["→ Request traces, dependency calls, exceptions"]
        L1c["→ Custom events (GraphQL operations, cache metrics)"]
      end
    
      subgraph L2["Layer 2: Nginx OpenTelemetry Module"]
        L2a["→ Span per proxied request"]
        L2b["→ W3C Trace Context headers<br/>(traceparent, tracestate)"]
        L2c["→ Complete proxy → SPA → API waterfall"]
      end
    
      subgraph L3["Layer 3: Container Apps Managed OTel Agent"]
        L3a["→ Container-level metrics<br/>(CPU, memory, restarts)"]
        L3b["→ All containers, including Redis"]
        L3c["→ Zero code changes"]
      end
    
      L1 --- L2 --- L3

    Layer 1: SDK Instrumentation

    Both the frontend app and the API send request traces, dependency calls, exceptions, and custom events to the observability backend. The Node.js SDK automatically instruments incoming HTTP requests, outgoing HTTP calls, and Redis operations.

    A GraphQL server module can add custom dependency telemetry for every subgraph call and every Redis cache operation:

    flowchart TB
      subgraph GQL["Custom Dependency Event: GraphQL"]
        direction TB
        g1["Name: GraphQL: cms/pageByPath"]
        g2["Type: GraphQL"]
        g3["Duration: 45ms"]
        g4["Success: true"]
        g5["operationName: pageByPath"]
        g6["subgraph: cms"]
        g7["cacheHit: false"]
        g8["transactionId: abc-123-def"]
      end
    
      subgraph RED["Custom Dependency Event: Redis"]
        direction TB
        r1["Name: Redis: cache-check"]
        r2["Type: Redis"]
        r3["Duration: 2ms"]
        r4["Success: true"]
        r5["operation: GET"]
        r6["cacheHit: true"]
        r7["key: page-data:/products/premium"]
      end

    These custom events land in the same trace as the HTTP request, so it becomes clear which operations ran, which caches hit or missed, and how long each step took.
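
    A minimal sketch of how such an event could be assembled, assuming a helper that receives the timings and outcome of one subgraph call. `DependencyEvent` and `buildGraphqlDependency` are hypothetical names; the fields mirror the example event above rather than any particular SDK's contract, and the resulting object would still need to be handed to whatever tracking call the telemetry SDK provides.

    ```typescript
    // Hypothetical event shape; fields mirror the example GraphQL event above.
    interface DependencyEvent {
      name: string;
      type: "GraphQL" | "Redis";
      durationMs: number;
      success: boolean;
      properties: Record<string, string | boolean>;
    }

    // Assembles the custom dependency event for one GraphQL subgraph call.
    function buildGraphqlDependency(
      subgraph: string,
      operationName: string,
      startedAtMs: number,
      finishedAtMs: number,
      outcome: { success: boolean; cacheHit: boolean; transactionId: string },
    ): DependencyEvent {
      return {
        name: `GraphQL: ${subgraph}/${operationName}`,
        type: "GraphQL",
        durationMs: Math.max(0, finishedAtMs - startedAtMs),
        success: outcome.success,
        properties: {
          operationName,
          subgraph,
          cacheHit: outcome.cacheHit,
          transactionId: outcome.transactionId,
        },
      };
    }

    const cmsPageEvent = buildGraphqlDependency("cms", "pageByPath", 1000, 1045, {
      success: true,
      cacheHit: false,
      transactionId: "abc-123-def",
    });
    console.log(cmsPageEvent.name, `${cmsPageEvent.durationMs}ms`);
    ```

    Keeping the semantic fields (`subgraph`, `operationName`, `cacheHit`) as separate properties, instead of folding them into the name, is what makes them filterable later.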

    Layer 2: Nginx OpenTelemetry

    The reverse proxy includes the nginxinc/nginx-otel module. Every proxied request becomes a span and carries W3C Trace Context headers:

    sequenceDiagram
      participant B as Browser
      participant N as Nginx Proxy
      participant S as Nuxt SPA (Node.js)
      participant A as Backend API
      participant R as Redis
    
      B->>N: HTTP request<br/>(no trace context yet)
      Note right of N: Creates span<br/>Generates traceparent header<br/>traceparent: 00-abcdef1234567890-span1-01
      N->>S: Forward request<br/>+ traceparent
    
      Note right of S: Reads traceparent<br/>Creates child span<br/>Propagates to outgoing calls
    
      S->>R: Redis cache GET<br/>(child span)
      S->>A: GraphQL → CMS API<br/>(child span)
      S->>A: GraphQL → Backend API<br/>(child span)
    
      A->>A: Database calls,<br/>business logic (child spans)

    A single trace ID stitches together every hop. The end-to-end transaction view in the observability backend renders the full waterfall:

    gantt
      dateFormat  x
      axisFormat  %Lms
    
      section Nginx Proxy
      Nginx Proxy         :active, nginx, 0, 150
    
      section SPA Request
      SPA Request         :spa, 10, 140
      Redis GET           :redis, 20, 10
      GraphQL CMS         :cms, 30, 40
      GraphQL Backend     :backend, 40, 80
    
      section Backend API
      API Request         :api, 60, 70
      SQL Query           :sql, 80, 30
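
    The traceparent header that carries the trace ID between hops has a fixed layout in the W3C Trace Context specification: a 2-digit version, a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and a 2-digit flags byte. The value in the sequence diagram above is shortened for readability. The sketch below shows one way a service could parse a version-00 header and build the header for its own outgoing calls. `parseTraceparent` and `childTraceparent` are illustrative names, and in practice the tracing SDK does this work.

    ```typescript
    // Parsed form of a version-00 W3C traceparent header.
    interface TraceContext {
      traceId: string; // 32 lowercase hex characters
      parentSpanId: string; // 16 lowercase hex characters
      sampled: boolean; // lowest bit of the trace-flags byte
    }

    const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

    // Returns null for anything that is not a well-formed version-00 header.
    function parseTraceparent(header: string): TraceContext | null {
      const match = TRACEPARENT_V00.exec(header.trim());
      if (match === null) return null;
      const [, traceId = "", parentSpanId = "", flags = ""] = match;
      // All-zero IDs are invalid.
      if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;
      return { traceId, parentSpanId, sampled: (parseInt(flags, 16) & 1) === 1 };
    }

    // Header for an outgoing call: same trace ID, the caller's own span ID as parent.
    function childTraceparent(context: TraceContext, ownSpanId: string): string {
      return `00-${context.traceId}-${ownSpanId}-${context.sampled ? "01" : "00"}`;
    }

    const incomingContext = parseTraceparent(
      "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    );
    if (incomingContext !== null) {
      console.log(childTraceparent(incomingContext, "b7ad6b7169203331"));
    }
    ```

    The trace ID stays constant across every hop while the span ID changes at each one, which is what lets the backend rebuild the parent/child waterfall.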

    Layer 3: Container-Level Metrics

    The container environment runs a managed OpenTelemetry collector that gathers container metrics — CPU, memory, restart counts — for all containers, including Redis. No application changes are required.

    This layer answers operational questions:

    • Is Redis consuming too much memory?
    • Are frontend replicas flapping?
    • What is the steady-state CPU profile for API containers?

    Transaction ID Propagation

    Distributed traces are useful for visualizing a single request, but day-to-day debugging often starts from logs. To bridge both worlds, the proxy generates an x-transaction-id header for every incoming request:

    flowchart TB
      N["Nginx<br/>x-transaction-id: txn-abc-123"]
      FE["Frontend app"]
      API["API"]
      GQL["GraphQL custom events"]
    
      N -->|"Reads header<br/>adds to outgoing calls<br/>logs include txn-abc-123"| FE
      N --> API
      FE -->|"Includes txn-abc-123<br/>in request & logs"| API
      FE -->|"Tag events with<br/>txn-abc-123"| GQL
      API -->|"Logs include<br/>txn-abc-123"| GQL

    The transaction ID is mapped to the W3C traceparent trace ID. Developers can start from either side — a transaction ID from logs or a trace ID from the observability backend — and still recover the complete request history.
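
    On the application side, propagation comes down to reading the header once and attaching it to every outgoing call and log record. The following is a minimal sketch under simplifying assumptions: headers are modeled as a plain lower-cased map with single string values, and `readTransactionId`, `forwardTransactionId`, and `tagLogRecord` are hypothetical helper names.

    ```typescript
    const TRANSACTION_HEADER = "x-transaction-id";

    // Header map with lower-cased names and single string values (a simplification).
    type HeaderMap = Record<string, string | undefined>;

    // Returns the proxy-generated transaction ID, or undefined if it is missing or empty.
    function readTransactionId(incoming: HeaderMap): string | undefined {
      const value = incoming[TRANSACTION_HEADER];
      return value !== undefined && value.length > 0 ? value : undefined;
    }

    // Copies the ID onto the headers of an outgoing call so the next hop can log it too.
    function forwardTransactionId(
      incoming: HeaderMap,
      outgoing: Record<string, string>,
    ): Record<string, string> {
      const transactionId = readTransactionId(incoming);
      if (transactionId === undefined) return { ...outgoing };
      return { ...outgoing, [TRANSACTION_HEADER]: transactionId };
    }

    // Adds the ID to a structured log record; "unknown" marks requests that bypassed the proxy.
    function tagLogRecord(
      incoming: HeaderMap,
      fields: Record<string, unknown>,
    ): Record<string, unknown> {
      return { ...fields, transactionId: readTransactionId(incoming) ?? "unknown" };
    }

    const inboundHeaders: HeaderMap = { "x-transaction-id": "txn-abc-123" };
    console.log(forwardTransactionId(inboundHeaders, { accept: "application/json" }));
    console.log(tagLogRecord(inboundHeaders, { message: "Page rendered" }));
    ```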


    What Metrics Tell You

    The combined telemetry stack tracks several metric categories, each answering a distinct question:

    | Metric Category | Examples | Question It Answers |
    | --- | --- | --- |
    | Response times | Per-endpoint, per-container latency | “Which pages are slow?” |
    | Error rates | HTTP 5xx, GraphQL errors, exceptions | “What is failing?” |
    | Cache metrics | Hit/miss rates per cache tier | “Is caching effective?” |
    | Resource usage | CPU, memory per container/worker | “Are we right-sized?” |
    | Dependency durations | GraphQL subgraph calls, Redis ops | “Which external call is slow?” |
    | User journeys | Page-to-page navigation funnels | “Where do users drop off?” |

    Alerting Strategy: Symptoms First, Causes Later

    Metrics matter only when they drive action. The guiding principle is:

    > Alert on symptoms, investigate with traces.

    • Symptom alert: “Frontend P95 response time exceeded 2 seconds for 5 minutes.”
    • Investigation: open the traces for those slow requests → locate the slow dependency → fix the underlying issue.

    Alerting directly on causes like Redis CPU > 80% creates noise and false positives, because Redis CPU can legitimately spike during cache invalidation without harming users. Symptom-based alerts keep noise low and align alerts with real user impact.
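
    The symptom rule quoted above ("P95 exceeded 2 seconds for 5 minutes") can be expressed as a small predicate. This is a sketch only: real alerting would normally be configured in the observability backend, the nearest-rank percentile is one of several common definitions, and `percentile` and `symptomAlertFires` are hypothetical names.

    ```typescript
    // Nearest-rank percentile; returns NaN for an empty sample list.
    function percentile(samples: number[], p: number): number {
      const sorted = [...samples].sort((a, b) => a - b);
      const rank = Math.ceil((p * sorted.length) / 100);
      const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
      return sorted[index] ?? NaN;
    }

    // minuteWindows holds one array of response times (ms) per minute, oldest first.
    // The alert fires only if P95 stayed above the threshold in every recent minute.
    function symptomAlertFires(
      minuteWindows: number[][],
      thresholdMs = 2000,
      sustainedMinutes = 5,
    ): boolean {
      if (minuteWindows.length < sustainedMinutes) return false;
      const recent = minuteWindows.slice(-sustainedMinutes);
      return recent.every((w) => w.length > 0 && percentile(w, 95) > thresholdMs);
    }
    ```

    Requiring the condition to hold for every minute in the window is what filters out short spikes, such as a cache invalidation burst, that never become a sustained user-facing problem.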


    Structured Logging in Nuxt: From console.log to Observability

    Traces tell you where the problem is. Logs tell you what happened. To make that effective, logging has to be more than printing strings.

    The console.log Problem

    Using console.log in a production SSR application causes real issues:

    1. No severity levels — errors are indistinguishable from informational noise
    2. No structure — freeform strings cannot be reliably queried, filtered, or aggregated
    3. No context — you cannot tell which request, user, or component produced the log
    4. No control — you cannot selectively enable verbose logging for one module without overwhelming the output
    5. SSR noise — server-side logs are mixed with framework output, health checks, and PM2 logs

    There is a big difference between “we have logging” and “we have useful logging.” The first gives you strings to grep. The second gives you a structured, queryable observability layer.


    The Logging Architecture

    The logging system has three main building blocks:

    flowchart TB
      subgraph APP["Application Code"]
        A1["const log = useLogger('shopping-cart')"]
        A2["log.info('Item added', { productId, quantity })"]
      end
    
      subgraph UL["useLogger Composable"]
        UL1["Tagged with module name"]
        UL2["Checks if this module's level is enabled"]
        UL3["Formats structured message"]
      end
    
      subgraph MS["Multi-Sink Router"]
        S1["Sink 1: Console (development)<br/>Formatted, colored, human-readable"]
        S2["Sink 2: Observability Backend<br/>Structured JSON, custom properties"]
        S3["Sink 3: DevTools Log Viewer<br/>Real-time, filterable, in-browser"]
      end
    
      APP --> UL --> MS
      MS --> S1
      MS --> S2
      MS --> S3

    The useLogger Composable

    Each module gets its own logger instance:

    const log = useLogger('shopping-cart')
    
    log.debug('Cart state loaded', { items: cart.items.length })
    log.info('Item added', { productId: 'abc', quantity: 2 })
    log.warn('Price mismatch detected', { expected: 29.99, actual: 31.99 })
    log.error('Checkout failed', { error: err.message, orderId })

    Every logger is tagged with its module name. This enables per-module log level control — you can set shopping-cart to debug while keeping navigation at warn.
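
    The composable's implementation is not shown in this article. A minimal sketch of the idea, assuming a factory that binds a per-module level table and a list of sinks, could look like this. `createUseLogger`, `LogRecord`, and `Sink` are hypothetical names.

    ```typescript
    type Level = "debug" | "info" | "warn" | "error";
    const LEVEL_RANK: Record<Level, number> = { debug: 10, info: 20, warn: 30, error: 40 };

    interface LogRecord {
      module: string;
      level: Level;
      message: string;
      data: Record<string, unknown>;
    }
    type Sink = (record: LogRecord) => void;

    // Hypothetical factory: binds the level table and sinks, returns a useLogger function.
    function createUseLogger(
      levels: Record<string, Level | undefined>,
      sinks: Sink[],
      fallback: Level = "info",
    ) {
      return function useLogger(moduleName: string) {
        const emit = (level: Level, message: string, data: Record<string, unknown> = {}): void => {
          const threshold = levels[moduleName] ?? fallback;
          if (LEVEL_RANK[level] < LEVEL_RANK[threshold]) return; // below this module's level
          const record: LogRecord = { module: moduleName, level, message, data };
          sinks.forEach((sink) => sink(record)); // fan out to every sink
        };
        return {
          debug: (message: string, data?: Record<string, unknown>) => emit("debug", message, data),
          info: (message: string, data?: Record<string, unknown>) => emit("info", message, data),
          warn: (message: string, data?: Record<string, unknown>) => emit("warn", message, data),
          error: (message: string, data?: Record<string, unknown>) => emit("error", message, data),
        };
      };
    }

    const captured: LogRecord[] = [];
    const useLogger = createUseLogger(
      { "shopping-cart": "debug", navigation: "warn" },
      [(record) => captured.push(record)],
    );

    useLogger("shopping-cart").debug("Cart state loaded", { items: 3 }); // emitted
    useLogger("navigation").info("Route changed"); // dropped: below warn
    console.log(captured.map((r) => `${r.module}:${r.level}`));
    ```

    Because the level check happens before the record is built, a disabled debug call costs little more than a table lookup.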

    Severity Levels

    | Level | When to Use | Example |
    | --- | --- | --- |
    | debug | Development-only details | “Cart state loaded, 3 items” |
    | info | Significant business events | “Item added to cart” |
    | warn | Unexpected but recoverable | “Price mismatch, using server price” |
    | error | Failures requiring attention | “Checkout failed, payment rejected” |

    Multi-Sink Routing

    Each log message is fanned out to multiple sinks at once.

    Sink 1: Console (Development)

    In development, logs are written to both the browser console and Node.js stdout with:

    • Color coding by severity
    • A module name prefix
    • Collapsible structured payloads (objects expand on click)

    Sink 2: Observability Backend (Production)

    In production, logs are sent as structured events:

    Observability Event:
    {
      name: "shopping-cart:info",
      properties: {
        module: "shopping-cart",
        severity: "info",
        message: "Item added",
        productId: "abc-123",
        quantity: 2,
        requestId: "req-xyz",
        timestamp: "2025-06-02T12:34:56Z"
      }
    }

    These events can be queried with KQL (Kusto Query Language):

    customEvents
    | where name startswith "shopping-cart"
    | where customDimensions.severity == "error"
    | project timestamp, customDimensions.message, customDimensions.productId
    | order by timestamp desc

    Sink 3: DevTools Log Viewer

    A custom DevTools tab shows logs in real time:

    flowchart TB
      subgraph DT["DevTools — Logs Tab"]
        F["Filter controls:<br/>[All Modules ▼] [Info ▼] [Search...]"]
        L1["12:34:56 INFO  shopping-cart<br/>Item added {productId: 'abc', quantity: 2}"]
        L2["12:34:57 DEBUG catalog-query<br/>Cache hit for key 10115"]
        L3["12:34:58 WARN  shopping-cart<br/>Price mismatch {expected: 29.99, actual: 31.99}"]
        L4["12:35:01 ERROR checkout<br/>Payment failed {orderId: 'ord-789'}"]
      end
    
      F --> L1 --> L2 --> L3 --> L4

    Capabilities:

    • Filter by severity, such as only errors or debug and above
    • Filter by module, such as only shopping-cart logs
    • Full-text search across messages
    • Expandable structured data payloads

    Runtime Log Level Control

    Log levels are adjustable at runtime without restarting the app.

    flowchart TB
      subgraph CFG["Default levels (from config)"]
        C1["shopping-cart: info"]
        C2["catalog-query: warn"]
        C3["navigation: warn"]
      end
    
      subgraph RT["Runtime override (via API or DevTools)"]
        R1["shopping-cart: debug  ← changed"]
        R2["catalog-query: warn   ← unchanged"]
        R3["navigation: info      ← changed"]
      end
    
      CFG --> RT
    
      subgraph EFFECT["Effect"]
        E1["shopping-cart now outputs debug logs"]
        E2["No server restart"]
        E3["No redeploy"]
        E4["No impact on other modules"]
      end
    
      RT --> EFFECT

    A typical production debugging workflow:

    1. A user reports an issue
    2. Enable debug logging for the relevant module via an API or DevTools
    3. Reproduce the problem
    4. Inspect the debug logs in the observability backend
    5. Turn debug logging off again and restore the default level

    No deployment, no restart, and no log flood from unrelated modules.
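
    The workflow above relies on overrides that sit on top of the configured defaults and can be removed again. A minimal in-memory sketch, assuming a hypothetical `LevelRegistry` class that an API route or DevTools action would call:

    ```typescript
    type LogLevel = "debug" | "info" | "warn" | "error";

    // Hypothetical registry: config defaults plus runtime overrides held in memory.
    class LevelRegistry {
      private readonly defaults: Record<string, LogLevel | undefined>;
      private readonly overrides = new Map<string, LogLevel>();
      private readonly fallback: LogLevel;

      constructor(defaults: Record<string, LogLevel | undefined>, fallback: LogLevel = "warn") {
        this.defaults = defaults;
        this.fallback = fallback;
      }

      // Override wins over config default, which wins over the global fallback.
      levelFor(moduleName: string): LogLevel {
        return this.overrides.get(moduleName) ?? this.defaults[moduleName] ?? this.fallback;
      }

      setOverride(moduleName: string, level: LogLevel): void {
        this.overrides.set(moduleName, level);
      }

      // Removing the override restores the configured default.
      clearOverride(moduleName: string): void {
        this.overrides.delete(moduleName);
      }
    }

    const levelRegistry = new LevelRegistry({
      "shopping-cart": "info",
      "catalog-query": "warn",
      navigation: "warn",
    });

    levelRegistry.setOverride("shopping-cart", "debug"); // step 2: enable debug
    console.log(levelRegistry.levelFor("shopping-cart"));
    levelRegistry.clearOverride("shopping-cart"); // step 5: back to the default
    console.log(levelRegistry.levelFor("shopping-cart"));
    ```

    With several PM2 workers or replicas, an in-memory override only affects the process that received the request, so a real implementation would need some way to broadcast or share it. That part is outside this sketch.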


    SSR-Aware Logging

    In an SSR app, logging must handle both server and client execution contexts:

    flowchart LR
      subgraph SRV["SSR Execution (Server: Node.js)"]
        S1["log.info('Page rendered')"]
        S2["Output:<br/>stdout (PM2 logs)<br/>Observability backend"]
        S3["Context:<br/>Request URL<br/>Request ID<br/>User-Agent"]
        S1 --> S2 --> S3
      end
    
      subgraph CLI["Client Execution (Browser)"]
        C1["log.info('Button clicked')"]
        C2["Output:<br/>Browser console<br/>DevTools Log Viewer<br/>Observability backend telemetry"]
        C3["Context:<br/>Current route<br/>Session ID"]
        C1 --> C2 --> C3
      end

    useLogger detects where it is running and routes logs to the right sinks. Server-side logs include request context such as URL, request ID, and user agent. Client-side logs include session context such as current route and user interactions.


    Replacing console.log Safely

    The migration away from console.log is incremental.

    • An ESLint rule flags console.log usage and suggests replacing it with useLogger. It does not auto-fix, so the developer explicitly chooses the severity and module tag.
    • For legacy code, a global console interceptor captures console.* calls and forwards them into the structured logging pipeline under a legacy module tag. This ensures nothing is lost during the transition.

    Over time, the codebase shifts from unstructured strings to queryable, structured events.
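
    The global interceptor for legacy code can be sketched as a wrapper around the console methods. This is an illustration under assumptions, not the project's implementation: `interceptConsole` and the `Forward` callback are hypothetical, and the example runs against a fake console object so that it has no side effects.

    ```typescript
    type ConsoleMethod = "log" | "info" | "warn" | "error" | "debug";
    type ConsoleLike = Record<ConsoleMethod, (...args: unknown[]) => void>;
    type Forward = (moduleTag: string, method: ConsoleMethod, args: unknown[]) => void;

    const CONSOLE_METHODS: ConsoleMethod[] = ["log", "info", "warn", "error", "debug"];

    // Wraps each console method so calls are forwarded first, then printed as before.
    // Returns a function that restores the original methods.
    function interceptConsole(target: ConsoleLike, forward: Forward, moduleTag = "legacy"): () => void {
      const restores: Array<() => void> = [];
      CONSOLE_METHODS.forEach((method) => {
        const original = target[method];
        target[method] = (...args: unknown[]): void => {
          forward(moduleTag, method, args); // into the structured pipeline
          original.apply(target, args); // keep the original output
        };
        restores.push(() => {
          target[method] = original;
        });
      });
      return () => restores.forEach((restore) => restore());
    }

    // Fake console for demonstration; a real setup would pass the global console.
    const printedLines: string[] = [];
    const forwardedCalls: string[] = [];
    const print = (...args: unknown[]): void => {
      printedLines.push(args.map(String).join(" "));
    };
    const fakeConsole: ConsoleLike = { log: print, info: print, warn: print, error: print, debug: print };

    const restoreConsole = interceptConsole(fakeConsole, (tag, method) => {
      forwardedCalls.push(`${tag}:${method}`);
    });
    fakeConsole.log("Item abc added, quantity: 2"); // forwarded and printed
    restoreConsole();
    fakeConsole.log("after restore"); // printed only
    ```

    The forward callback must never write to the intercepted console itself, otherwise every log call would recurse.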


    Node.js Observability Under PM2: Diagnostics, GC, and CPU

    Application-level traces and logs tell you what is slow. To understand why the Node.js process itself degrades — heap growth, GC pauses, event loop lag — you need process-level visibility.

    Three Nuxt modules provide this:

    • diagnostics — per-request aggregation and pattern learning
    • diagnostics-heap — GC and heap monitoring with leak detection
    • diagnostics-profiler — automatic CPU profiling for slow requests

    These sit alongside the Nuxt app, PM2, and Nginx, and feed directly into the same observability backend.


    Layer 1: Per-Request Aggregation (diagnostics module)

    The diagnostics module captures seven metrics for every HTTP request:

    | Metric | What It Measures |
    | --- | --- |
    | Duration | Total request handling time (ms) |
    | Input size | Request body size (bytes) |
    | Output size | Response body size (bytes) |
    | CPU usage | Process CPU delta during request |
    | Memory delta | Heap memory change during request |
    | Event loop lag | Main thread blocking time (ms) |
    | Status code | HTTP response status |

    O(1) Memory Aggregation

    Traditional APM tools store one record per request — 8.6 million records per day at 100 req/s. This module takes a different approach: no per-request storage. Only aggregations such as min, max, sum, and count are retained.

    flowchart TB
      subgraph TRAD["Per-Request Storage (traditional APM)"]
        T1["Request 1: { duration: 150, cpu: 12, memory: 35MB, ... }"]
        T2["Request 2: { duration: 200, cpu: 15, memory: 42MB, ... }"]
        T3["Request 3: { duration: 180, cpu: 11, memory: 38MB, ... }"]
        Tn["Request N: { duration: ???, cpu: ??, memory: ???, ... }"]
        TM["Memory usage: O(N) — grows with request count"]
        T1 --> T2 --> T3 --> Tn --> TM
      end
    
      subgraph AGG["Aggregation-Only (diagnostics module)"]
        A1["Aggregate:"]
        A2["duration: { min, max, sum, count }"]
        A3["cpu: { min, max, sum, count }"]
        A4["memory: { min, max, sum, count }"]
        AM["Memory usage: O(1) — constant<br/>regardless of request count"]
        A1 --> A2 --> A3 --> A4 --> AM
      end

    Monitoring overhead is constant, regardless of traffic volume.
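
    The aggregation-only approach amounts to updating four numbers per metric on every request. A minimal sketch, with `RunningStat` and its helpers as illustrative names:

    ```typescript
    // Constant-size aggregate for one metric; no individual samples are kept.
    interface RunningStat {
      min: number;
      max: number;
      sum: number;
      count: number;
    }

    function emptyStat(): RunningStat {
      return { min: Infinity, max: -Infinity, sum: 0, count: 0 };
    }

    // O(1) time and memory per request, regardless of how many came before.
    function addSample(stat: RunningStat, value: number): void {
      stat.min = Math.min(stat.min, value);
      stat.max = Math.max(stat.max, value);
      stat.sum += value;
      stat.count += 1;
    }

    function meanOf(stat: RunningStat): number {
      return stat.count === 0 ? 0 : stat.sum / stat.count;
    }

    const durationStat = emptyStat();
    [150, 200, 180].forEach((ms) => addSample(durationStat, ms));
    console.log(durationStat, meanOf(durationStat).toFixed(1));
    ```

    The trade-off is that percentiles cannot be recovered from min, max, sum, and count alone. Tail-latency figures such as P95 have to come from another source, for example the request traces.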

    Slow-Request Pattern Detection

    Every 30 seconds, after at least 50 requests, the module detects patterns in slow requests by grouping on several features:

    flowchart TB
      subgraph FB["Feature Buckets"]
        F1["URL pattern: /products/*, /checkout/*, /"]
        F2["HTTP method: GET, POST"]
        F3["Payload size: small (<1KB), medium, large"]
        F4["Path depth: 1, 2, 3, 4+"]
      end
    
      FB --> P["For each bucket:<br/>Compute probability(request is slow)<br/>If probability ≥ 50% and count ≥ 3 → emit pattern"]

    For each feature bucket, the algorithm calculates the probability that a request in this bucket is slow (exceeds the configured threshold, such as 500ms). If a bucket has at least 50% slow probability with at least 3 samples, a pattern is emitted:

    flowchart TB
      P1["Observed: /checkout/*<br/>73% of requests slow (>500ms)<br/>12 observations in last 30s"]
      P2["Emit custom event:<br/>name = 'SlowRequestPatterns'<br/>pattern = '/checkout/*'<br/>probability = 0.73"]
      P1 --> P2

    Pattern detection surfaces systemic slowness that individual alerts miss. A single slow request might be a fluke. A persistent pattern for a specific URL points to a real problem with that page’s data fetching or rendering.
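
    The bucket rule described above (at least 50% slow with at least 3 samples) can be sketched in a few lines. This version groups by a single bucket key for brevity, whereas the module groups on several features. `detectSlowPatterns` and the sample shape are hypothetical.

    ```typescript
    interface RequestSample {
      bucket: string; // e.g. a URL pattern such as "/checkout/*"
      durationMs: number;
    }

    interface SlowPattern {
      pattern: string;
      probability: number; // share of slow requests in the bucket
      observations: number;
    }

    function detectSlowPatterns(
      samples: RequestSample[],
      slowThresholdMs = 500,
      minProbability = 0.5,
      minCount = 3,
    ): SlowPattern[] {
      // Count total and slow requests per bucket.
      const buckets = new Map<string, { total: number; slow: number }>();
      samples.forEach((sample) => {
        const counts = buckets.get(sample.bucket) ?? { total: 0, slow: 0 };
        counts.total += 1;
        if (sample.durationMs > slowThresholdMs) counts.slow += 1;
        buckets.set(sample.bucket, counts);
      });

      // Emit a pattern only for buckets that are both slow enough and large enough.
      const patterns: SlowPattern[] = [];
      buckets.forEach((counts, pattern) => {
        const probability = counts.slow / counts.total;
        if (probability >= minProbability && counts.total >= minCount) {
          patterns.push({ pattern, probability, observations: counts.total });
        }
      });
      return patterns;
    }

    const windowSamples: RequestSample[] = [
      { bucket: "/checkout/*", durationMs: 900 },
      { bucket: "/checkout/*", durationMs: 700 },
      { bucket: "/checkout/*", durationMs: 650 },
      { bucket: "/checkout/*", durationMs: 200 },
      { bucket: "/products/*", durationMs: 950 }, // slow, but too few samples
      { bucket: "/products/*", durationMs: 900 },
      { bucket: "/", durationMs: 120 },
      { bucket: "/", durationMs: 140 },
      { bucket: "/", durationMs: 800 },
      { bucket: "/", durationMs: 110 },
    ];
    console.log(detectSlowPatterns(windowSamples));
    ```

    The minimum sample count is what keeps a bucket with two unlucky requests from being reported as a pattern.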

    SSR GraphQL Disambiguation

    During SSR, the Nuxt server makes GraphQL calls to itself — real HTTP requests that pass through the diagnostics middleware. Without disambiguation, each page request would be counted twice.

    The module identifies SSR-internal requests via the CSRF bypass token from the security layer and excludes them. You get accurate per-page measurements with no double-counting.


    Layer 2: Heap Memory and GC (diagnostics-heap module)

    The diagnostics-heap module uses the PerformanceObserver API from Node.js (perf_hooks) to monitor V8 garbage collection events in real time.

    GC Event Categories

    | GC Type | What It Collects | Typical Duration |
    | --- | --- | --- |
    | scavenge | Young generation (new objects) | 1–5 ms |
    | mark-sweep | Full heap (major GC) | 10–50 ms |
    | incremental | Incremental marking | 1–10 ms |
    | weakcb | Weak reference callbacks | <1 ms |

    Each event records duration, heap before/after, and bytes freed. Events are aggregated into time-series data and sent periodically to the observability backend.

    Automatic Memory Leak Detection

    The module tracks consecutive heap growth over time. When heapUsed increases for N consecutive intervals without a significant GC reduction, it emits a leak detection event:

    flowchart TB
      subgraph WIN["Observation Window: 10 intervals (5 min each)"]
        I1["Interval 1: heapUsed = 800 MB"]
        I2["Interval 2: heapUsed = 820 MB  ↑ +20 MB"]
        I3["Interval 3: heapUsed = 845 MB  ↑ +25 MB"]
        I4["Interval 4: heapUsed = 860 MB  ↑ +15 MB"]
        I5["Interval 5: heapUsed = 890 MB  ↑ +30 MB"]
      end
    
      WIN --> DET["5 consecutive growth intervals detected<br/>Growth rate ≈ 18 MB/interval = 216 MB/hour"]
    
      DET --> EVT["Emit event:<br/>{ event: 'PotentialMemoryLeak',<br/>confidence: 'medium',<br/>growthRateMBPerHour: 216,<br/>consecutiveGrowths: 5 }"]
    
      EVT --> HIGH["If growth continues to 8+ intervals:<br/>confidence → 'high'"]

    The confidence level reduces false positives. Short-term growth is normal during traffic spikes. Only sustained growth triggers a leak alert.
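
    A simplified sketch of the consecutive-growth check follows. It assumes one heapUsed sample per interval and treats any decrease as the end of a growth streak, which is stricter than the "significant GC reduction" rule described above. `detectLeak` and `LeakSignal` are hypothetical names, and the thresholds of 5 and 8 intervals are taken from the diagram.

    ```typescript
    interface LeakSignal {
      consecutiveGrowths: number;
      confidence: "medium" | "high";
      growthRateMBPerHour: number;
    }

    // heapUsedMB: one sample per interval, oldest first.
    function detectLeak(
      heapUsedMB: number[],
      intervalMinutes: number,
      mediumAfter = 5,
      highAfter = 8,
    ): LeakSignal | null {
      // Count how many of the most recent steps were strict increases.
      let growths = 0;
      for (let i = heapUsedMB.length - 1; i > 0; i -= 1) {
        const current = heapUsedMB[i];
        const previous = heapUsedMB[i - 1];
        if (current === undefined || previous === undefined || current <= previous) break;
        growths += 1;
      }
      if (growths < mediumAfter) return null;

      // Average growth per interval over the streak, scaled to one hour.
      const newest = heapUsedMB[heapUsedMB.length - 1] ?? 0;
      const oldest = heapUsedMB[heapUsedMB.length - 1 - growths] ?? 0;
      const perInterval = (newest - oldest) / growths;
      return {
        consecutiveGrowths: growths,
        confidence: growths >= highAfter ? "high" : "medium",
        growthRateMBPerHour: perInterval * (60 / intervalMinutes),
      };
    }

    console.log(detectLeak([500, 510, 520, 530, 540, 550], 5));
    console.log(detectLeak([500, 510, 505, 515], 5));
    ```

    The first call reports a medium-confidence signal after five consecutive increases. The second returns null because the streak was interrupted.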

    Automatic Heap Dumps

    When heapUsed exceeds a configurable threshold (default 1024 MB), a .heapsnapshot file is written automatically. It can be loaded into Chrome DevTools for detailed memory analysis.

    V8 Heap Space Breakdown

    Periodic sampling of v8.getHeapSpaceStatistics() provides per-space memory usage:

    flowchart TB
      subgraph HS["V8 Heap Spaces"]
        N["new_space: 16 MB total, 8 MB used<br/>Purpose: New objects (GC: scavenge)"]
        O["old_space: 900 MB total, 780 MB used<br/>Purpose: Survived objects"]
        C["code_space: 12 MB total, 10 MB used<br/>Purpose: Compiled code"]
        L["large_object_space: 45 MB total, 40 MB used<br/>Purpose: Objects > 512 KB"]
      end
    
      N --> O --> C --> L

    This is essential for distinguishing object leaks (old_space growing) from code cache growth (code_space growing) — different causes, different fixes.


    Layer 3: CPU Profiling (diagnostics-profiler module)

    The diagnostics-profiler module automatically captures V8 CPU profiles for requests that exceed the slow-request threshold.

    flowchart TB
      RS["Request starts<br/>Timer begins"]
      TH["Duration exceeds threshold"]
      PR["Profiler activates<br/>Capture V8 CPU profile"]
      RC["Request completes<br/>Profile saved as .cpuprofile"]
      DEV["Load in Chrome DevTools<br/>Flame chart analysis"]
    
      RS --> TH --> PR --> RC --> DEV
    
      subgraph FL["Example Flame Chart Breakdown"]
        F1["SSR renderer: 45% CPU time"]
        F2["GraphQL response parsing: 30%"]
        F3["HTML serialization: 15%"]
        F4["Other: 10%"]
      end
    
      DEV --> FL

    Profiles are in the standard V8 format, which Chrome DevTools renders as a flame chart, showing exactly which functions consumed CPU time.


    The Unified Picture: From Symptom to Root Cause

    When a slow request occurs, all layers fire in concert — traces, logs, and Node diagnostics:

    flowchart TB
      subgraph L1["Layer 1 (diagnostics)"]
        L1a["Records duration: 2,300 ms"]
        L1b["Emits SlowRequest event"]
        L1c["Updates pattern detection"]
      end
    
      subgraph L2["Layer 2 (diagnostics-heap)"]
        L2a["Records memory delta: +45 MB"]
        L2b["Checks for leak pattern"]
        L2c["If heap > threshold → auto heap dump"]
      end
    
      subgraph L3["Layer 3 (diagnostics-profiler)"]
        L3a["Captures .cpuprofile"]
        L3b["Shows 65% time in CMS API response parsing"]
      end
    
      subgraph OBS["Observability backend"]
        O1["End-to-end trace correlates:"]
        O2["Nginx span"]
        O3["Nuxt request + GraphQL dependencies"]
        O4["API calls + SQL query"]
        O5["Custom logs tagged with transaction ID"]
        O6["SlowRequestPatterns + GC + leak signals"]
      end
    
      L1 --> OBS
      L2 --> OBS
      L3 --> OBS

    You can move from:

    • An alert: “P95 for /checkout is 2.3s”
    • To the trace: “Most time is in the CMS subgraph”
    • To logs: “Price mismatch warnings and retries”
    • To process-level data: “Major GC pauses plus heap growth”
    • To artifacts: .heapsnapshot and .cpuprofile for offline analysis

    All within a single, correlated observability fabric.


    Capacity Planning Endpoint

    The diagnostics module exposes a /api/__profiler/memory-capacity endpoint that calculates the theoretical memory requirement:

    flowchart TB
      IN["Inputs:<br/>Baseline = 200 MB<br/>Requests/sec = 10<br/>Avg RT = 150 ms (0.15 s)<br/>Memory/req = 35 MB"]
      CONC["Concurrent requests = 10 × 0.15 = 1.5"]
      PEAK["Peak memory = 200 + (1.5 × 35) = 252.5 MB"]
      SAFETY["With 3× safety factor = 757.5 MB"]
      CFG["Set --max-old-space-size ≥ 768 MB"]
    
      IN --> CONC --> PEAK --> SAFETY --> CFG

    This directly informs the V8 heap cap and container memory allocation, bridging runtime diagnostics with deployment configuration.
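
    The calculation in the diagram is Little's law (concurrent requests = arrival rate × average time in system) followed by a linear memory estimate. A sketch that reproduces the numbers, with `estimateHeapRequirementMB` as a hypothetical function name and the endpoint's real response shape unknown:

    ```typescript
    interface CapacityInput {
      baselineMB: number;
      requestsPerSecond: number;
      avgResponseTimeMs: number;
      memoryPerRequestMB: number;
      safetyFactor: number;
    }

    interface CapacityEstimate {
      concurrentRequests: number;
      peakMB: number;
      recommendedMB: number;
    }

    function estimateHeapRequirementMB(input: CapacityInput): CapacityEstimate {
      // Little's law: requests in flight = arrival rate x average response time.
      const concurrentRequests = input.requestsPerSecond * (input.avgResponseTimeMs / 1000);
      const peakMB = input.baselineMB + concurrentRequests * input.memoryPerRequestMB;
      return { concurrentRequests, peakMB, recommendedMB: peakMB * input.safetyFactor };
    }

    const capacityEstimate = estimateHeapRequirementMB({
      baselineMB: 200,
      requestsPerSecond: 10,
      avgResponseTimeMs: 150,
      memoryPerRequestMB: 35,
      safetyFactor: 3,
    });
    console.log(capacityEstimate);
    ```

    With the diagram's inputs this yields 1.5 concurrent requests, a 252.5 MB peak, and 757.5 MB after the safety factor, which is then rounded up to 768 MB for the heap flag.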


    Lessons Learned Across the Stack

    Distributed tracing is not optional in a multi-container architecture

    Without trace correlation, debugging a slow request across four or more containers means combing through isolated log streams and aligning timestamps by hand. With W3C Trace Context, one trace ID tells the whole story. Setup cost: a few hours. Debugging savings: ongoing.

    Custom dependency events are worth the effort

    Out-of-the-box instrumentation knows about HTTP calls and Redis commands but has no idea that a specific call is “a GraphQL query to the CMS subgraph for page-by-path.” Custom events supply that semantic meaning — you can ask for “all slow CMS page queries” instead of “all slow HTTP calls to this URL.”

    Separate the telemetry environment from the application environment

    Using separate observability instances for test and production stops test noise from polluting production dashboards. Feature branches can report into the test instance.

    Layer 3 catches what SDK instrumentation misses

    SDK instrumentation covers what happens inside application processes. Container and Node-level metrics capture everything around them — Redis memory growth, restarts, OOM kills, GC pauses. Without this layer, Redis running out of memory or Node leaks are invisible until things start failing.

    Per-module log levels are essential at scale

    With 35+ modules, a single global log level is useless because enabling debug generates thousands of messages per second. Per-module levels let teams zoom in on the area they care about without drowning in noise.

    Runtime control changes how production issues are debugged

    When enabling debug logging requires a deployment, teams either leave it on permanently or never enable it. Runtime controls turn it into a normal tool: enable, investigate, disable.

    Structured data beats formatted strings

    log.info('Item added', { productId: 'abc', quantity: 2 }) is queryable: “show all items with quantity > 5.”

    console.log('Item abc added, quantity: 2') needs regex parsing and still breaks when the format changes. The extra effort to log structured data pays off every time it needs to be analyzed.

    Pattern detection beats single-event alerts

    Single slow-request alerts create noise and fatigue. A pattern like “73% of /checkout requests are slow” is actionable. It tells you exactly where to investigate.

    Automatic heap dumps are worth the disk space

    When a leak is detected in production, reproducing it locally is often the hardest part. Automatic heap dumps capture the heap state at the moment of detection — no reproduction required. A single snapshot can save days of debugging.


    Munir Husseini is a software architect specializing in full-stack TypeScript, .NET, and cloud-native architectures.

  • Memory, Stability, and PM2 — Running a Long-Lived Node.js Server

    Seventeenth in a series about migrating from legacy architectures to a modern Nuxt 4 stack.


    The Inconvenient Truth About Node.js Servers

    Node.js is optimized for event-driven I/O, not for long-lived servers that render thousands of pages per hour. Over time, the V8 heap grows and objects such as GraphQL responses, Vue server renderer allocations, cached strings, and Apollo Client instances accumulate. Without intervention, a production process will eventually consume all available memory and get killed by the container orchestrator.

    That is not a bug to eliminate so much as a reality to manage. The real question is not whether memory will approach its limit, but how gracefully the system will handle it.


    PM2 Cluster Mode: Zero-Downtime Worker Management

    In a large enterprise application, instead of a single Node.js process, PM2 typically runs N worker processes — often 2–3 per container. Each worker handles requests independently, which provides two critical benefits:

    1. Fault isolation — if one worker crashes or becomes unresponsive, the others keep serving requests
    2. Rolling restarts — when a worker approaches its memory limit, PM2 restarts it while the other workers continue handling traffic

    flowchart TB
        subgraph C["Container (2 vCPU, 4 GiB RAM)"]
            direction TB
            M[PM2 Master Process]
    
            subgraph W1[Worker 1]
                direction TB
                H1[V8 Heap\n~1.5 GiB\nmax-old-space-size=1536]
                R1[Handles requests\nindependently]
            end
    
            subgraph W2[Worker 2]
                direction TB
                H2[V8 Heap\n~1.5 GiB\nmax-old-space-size=1536]
                R2[Handles requests\nindependently]
            end
        end
    
        M --- W1
        M --- W2

    When Worker 1 approaches 1,536 MiB of heap usage, PM2 restarts it. Worker 2 keeps handling traffic during the restart, which typically takes 2–3 seconds while V8 compiles the Nuxt application. That one worker is unavailable for a few seconds; for the application as a whole, downtime is effectively zero.
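
    A setup like this could be expressed as a PM2 ecosystem file. The sketch below is an assumption about how the pieces fit together, not the configuration used in this series: the app name is made up, the script path assumes Nuxt's default build output, and the `Pm2App` type is only there to document the PM2 options used.

```typescript
// Subset of PM2 app options used here; the type name is hypothetical.
interface Pm2App {
  name: string;
  script: string;
  exec_mode: 'cluster' | 'fork';
  instances: number;
  max_memory_restart: string;
  node_args: string;
}

// In a real project this object would be exported from ecosystem.config.js.
const ecosystem: { apps: Pm2App[] } = {
  apps: [
    {
      name: 'nuxt-ssr',                       // hypothetical app name
      script: './.output/server/index.mjs',   // assumes Nuxt's default build output
      exec_mode: 'cluster',                   // PM2 balances requests across workers
      instances: 2,                           // two workers per container
      max_memory_restart: '1536M',            // restart a worker above this memory use
      node_args: '--max-old-space-size=1536', // V8 old-space cap in MiB
    },
  ],
};
```

    As far as I know, `max_memory_restart` is checked against the worker's overall process memory rather than only the V8 heap, so the exact threshold is worth validating in your own load tests.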


    V8 Heap Cap: Trading Throughput for Predictability

    By default, V8 derives its heap limit from the memory it detects on the machine rather than from a fixed value. In containerized environments, that behavior is risky: the heap can grow beyond the container’s memory allocation and trigger an OOM kill.

    Setting an explicit heap limit forces more aggressive garbage collection:

    NODE_OPTIONS=--max-old-space-size=1536
    
    Effect:
      Without cap:  GC runs infrequently → heap grows to 3+ GiB → OOM kill
      With cap:     GC runs at ~1.2 GiB → heap stays under 1.5 GiB → stable

    flowchart LR
        A[Start] --> B[No explicit V8 heap cap]
        B --> C["Heap grows with available memory\n&gt; 3 GiB in container"]
        C --> D[Container OOM kill]
    
        A --> E[Set --max-old-space-size=1536]
        E --> F[GC runs around 1.2 GiB]
        F --> G["Heap stays &lt;= 1.5 GiB"]
        G --> H[Process stable\nSlightly lower peak throughput]

    The trade-off is straightforward: more frequent GC pauses of 2–5 ms each reduce peak throughput by about 5%. But the process never gets OOM-killed, which is a far better outcome in production.
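
    To confirm the cap from inside a worker, Node's built-in `v8` module reports the effective heap limit. A small sketch (the `heapPressure` helper is a hypothetical name):

```typescript
import { getHeapStatistics } from 'node:v8';

// Reports how close the current process is to V8's configured heap limit.
function heapPressure(): { usedMiB: number; limitMiB: number; ratio: number } {
  const stats = getHeapStatistics();
  const toMiB = (bytes: number): number => Math.round(bytes / (1024 * 1024));
  return {
    usedMiB: toMiB(stats.used_heap_size),
    limitMiB: toMiB(stats.heap_size_limit),
    ratio: stats.used_heap_size / stats.heap_size_limit,
  };
}

const pressure = heapPressure();
console.log(`heap ${pressure.usedMiB}/${pressure.limitMiB} MiB`);
```

    With `--max-old-space-size=1536`, the reported limit is typically a little above 1,536 MiB, because it also accounts for the young generation.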


    Memory Is the Scaling Bottleneck

    Load testing for a typical Nuxt SSR frontend in a large SaaS or e-commerce platform reveals something counterintuitive: the application is usually I/O-bound rather than CPU-bound, and memory, not CPU, is the resource that runs out first.

    flowchart TB
        subgraph RU[Resource Usage Under Load]
            CPU[CPU peak ~12%]
            MEM[Memory peak ~60%]
            BOT[Bottleneck: Memory, not CPU]
        end
    
        CPU --> BOT
        MEM --> BOT

    SSR mostly waits for backend responses (for example, GraphQL or REST APIs) and renders HTML — I/O work that barely touches the CPU. But each in-flight request still holds response objects, VNode trees, and serialization buffers in memory. Under load, dozens of concurrent requests holding a few hundred kilobytes each add up quickly.

    This means:

    • Over-provisioning CPU wastes money — you pay for compute that sits idle
    • Under-provisioning memory crashes the server — V8 heap exhaustion triggers cascading failures
    • A good starting ratio is roughly 1 vCPU : 2 GiB RAM for SSR workloads

    The Right-Sizing Experiment

    In a representative production-like environment, you can right-size Node.js SSR containers by running load tests with different resource configurations:

    Configuration     Result
    4 vCPU / 8 GiB    Stable but over-provisioned
    2 vCPU / 4 GiB    Stable and efficient ✓
    1 vCPU / 2 GiB    Cascading failures

    At 1 vCPU / 2 GiB, workers ran at 1,791 MB out of 2,048 MB — V8 was at its ceiling. Health probes timed out because the event loop was blocked by GC. The orchestrator restarted replicas, but cold-starting Nuxt takes several seconds because V8 must compile the application. During that window, the remaining replicas were overloaded, which caused them to fail health checks. The cascade continued until manual intervention.

    sequenceDiagram
        participant R1 as Replica 1
        participant R2 as Replica 2
        participant R3 as Replica 3
        participant O as Orchestrator
    
        Note over R1: Memory 1791/2048 MB<br/>GC stalls<br/>Health probe timeout
        O->>R1: Mark unhealthy
        O->>R1: Restart replica
        Note over R1: Cold start (~3s)<br/>No traffic handling
    
        Note over R2: Now handling 2× traffic<br/>Memory spike
        O->>R2: Health probe timeout
        O->>R2: Restart replica
    
        Note over R3: Now handling 3× traffic<br/>Immediate failure
        O->>R3: Restart replica
    
        Note over R1,R3: All replicas restarting<br/>Zero capacity for ~10 seconds

    In practice, the minimum viable per-replica compute for V8 startup plus Nuxt SSR in such an environment is about 2 vCPU / 4 GiB. Going below that introduces a cascading failure risk that replica count alone cannot absorb.


    Minimum Replicas: Preventing Cold-Start Cascades

    Even with correctly sized replicas, starting from too few creates problems under load. The orchestrator can launch new replicas, but each one needs time to start, compile, and begin accepting requests.

    For example, with 2 replicas scaling to 15, the first traffic burst hits only 2 instances. They overload while new replicas spin up. By the time those are ready, the original 2 may already have failed.

    The fix is to set minReplicas high enough to handle average production traffic without scaling out. In a typical large-scale web application, values might look like this:

    Service    minReplicas    maxReplicas    Reasoning
    SSR SPA    5              20             Handles page rendering (heaviest)
    API        3              20             Handles business logic (lighter)

    flowchart LR
        TRAF[Average production traffic] -->|First burst| R5[5 pre-warmed SSR SPA replicas]
        R5 --> CAP["Within capacity<br/>No scale-out needed"]
    
        TRAF -->|Genuine spike| SO[Autoscaler triggers scale-out]
        SO --> N[New replicas starting\nNuxt compile + V8 startup]
        R5 --> BUF[Existing 5 replicas buffer traffic]
        N --> READY[New replicas ready\nTraffic distributed]

    At 5 pre-warmed SPA replicas, normal production traffic stays within capacity and does not trigger scaling. Scale-out only activates for genuine spikes, and the existing 5 replicas buffer traffic while new ones start.
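
    One way to arrive at such a number is to divide average traffic by what a single replica can sustain at a comfortable utilisation. The helper and all figures below are hypothetical; the real inputs would come from your own load tests.

```typescript
// Smallest replica count that serves average traffic at a target utilisation.
// All inputs are assumptions to be measured, not values from the article.
function minReplicasFor(avgRps: number, perReplicaRps: number, targetUtilisation = 0.5): number {
  if (perReplicaRps <= 0 || targetUtilisation <= 0 || targetUtilisation > 1) {
    throw new RangeError('perReplicaRps must be > 0 and targetUtilisation in (0, 1]');
  }
  return Math.max(1, Math.ceil(avgRps / (perReplicaRps * targetUtilisation)));
}

// Hypothetical numbers: 120 req/s on average, 48 req/s sustainable per replica,
// planned at 50% utilisation so each replica keeps headroom for bursts.
const minReplicas = minReplicasFor(120, 48); // ceil(120 / 24) = 5
```

    Planning below full utilisation is what lets the existing replicas buffer a spike while new ones are still starting.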


    Health Monitoring

    The application exposes a health endpoint that returns per-worker metrics, enabling the orchestrator and internal tools to see exactly what PM2 workers are doing:

    GET /api/health/pm2
    Response:
    {
      "workers": [
        {
          "id": 0,
          "cpu": 8.2,
          "memory": 1234567890,
          "restarts": 3,
          "uptime": 86400000,
          "status": "online"
        },
        {
          "id": 1,
          "cpu": 5.1,
          "memory": 987654321,
          "restarts": 1,
          "uptime": 72000000,
          "status": "online"
        }
      ]
    }

    The endpoint is protected by an internal API guard — it returns 404 for any caller that is not a health probe or internal service with the correct authorization header. External callers cannot even discover that it exists.
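
    An internal tool consuming this response might flag workers that are getting close to their heap cap. The sketch below assumes `memory` is reported in bytes and `uptime` in milliseconds, which the response does not state explicitly; `workersNearCap` and its thresholds are hypothetical.

```typescript
// Shape of one worker entry in the /api/health/pm2 response shown above.
interface WorkerHealth {
  id: number;
  cpu: number;
  memory: number;   // assumed to be bytes
  restarts: number;
  uptime: number;   // assumed to be milliseconds
  status: string;
}

const MIB = 1024 * 1024;

// Returns ids of online workers whose memory exceeds a fraction of the per-worker cap.
function workersNearCap(workers: WorkerHealth[], capMiB = 1536, fraction = 0.75): number[] {
  return workers
    .filter((w) => w.status === 'online' && w.memory / MIB > capMiB * fraction)
    .map((w) => w.id);
}

const sample: WorkerHealth[] = [
  { id: 0, cpu: 8.2, memory: 1234567890, restarts: 3, uptime: 86400000, status: 'online' },
  { id: 1, cpu: 5.1, memory: 987654321, restarts: 1, uptime: 72000000, status: 'online' },
];

// Worker 0 sits at roughly 1177 MiB, above the 1152 MiB threshold; worker 1 does not.
const flagged = workersNearCap(sample);
```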


    The Validated Configuration

    After extensive load testing in a realistic production scenario, a configuration like the following has proven to pass all thresholds:

    Per Container:
      CPU:     2 vCPU
      Memory:  4 GiB
      PM2:     2 workers per container
      V8:      --max-old-space-size=1536 per worker
    
    Scaling:
      SPA: min 5, max 20 replicas
      API: min 3, max 20 replicas
    
    Result at 6× production load:
      Median response time: 165 ms
      Error rate: 0.82%
      CPU peak: 12% of allocation
      Memory peak: 60% of allocation

    flowchart TB
        subgraph PC[Per Container]
            CPU[CPU: 2 vCPU]
            MEM[Memory: 4 GiB]
            PM2W[PM2: 2 workers per container]
            V8[V8: --max-old-space-size=1536 per worker]
        end
    
        subgraph SC[Scaling]
            SPA[SPA: min 5, max 20 replicas]
            API[API: min 3, max 20 replicas]
        end
    
        subgraph RES[Result at 6× production load]
            RT[Median response time: 165 ms]
            ER[Error rate: 0.82%]
            CPUU[CPU peak: 12% of allocation]
            MEMU[Memory peak: 60% of allocation]
        end
    
        PC --> SC --> RES
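
    The per-container numbers can be sanity-checked with simple arithmetic: the heap caps of all workers must fit inside the container with room to spare. A minimal sketch (the `ContainerSizing` type and helper name are hypothetical):

```typescript
// Checks how much container memory remains once every worker's V8 heap cap is counted.
interface ContainerSizing {
  containerMiB: number;   // container memory limit
  workers: number;        // PM2 workers per container
  maxOldSpaceMiB: number; // --max-old-space-size per worker
}

function nonHeapHeadroomMiB(sizing: ContainerSizing): number {
  return sizing.containerMiB - sizing.workers * sizing.maxOldSpaceMiB;
}

const validated: ContainerSizing = { containerMiB: 4096, workers: 2, maxOldSpaceMiB: 1536 };
const headroom = nonHeapHeadroomMiB(validated); // 4096 - 2 * 1536 = 1024
```

    The remaining 1,024 MiB has to cover everything outside the old-space cap, such as buffers, native allocations, and the PM2 master process.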

    Lessons Learned

    Node.js is not a “fire and forget” runtime

    Unlike compiled languages with deterministic memory management, Node.js requires active memory management for long-lived processes. V8 heap caps, PM2 restarts, and minimum replica sizing are not optimizations — they are necessities.

    Size for memory, not CPU

    SSR workloads are I/O-bound. The CPU spends most of its time waiting for backend responses. Provision memory generously and CPU conservatively. A 1:2 vCPU:GiB ratio is a solid starting point.

    Cold starts are the hidden enemy of auto-scaling

    Auto-scaling sounds effortless until you realize new replicas take several seconds to become productive. During that window, existing replicas have to absorb the load. If they cannot, cascading failures follow. Adequate minReplicas removes that risk.

    Load test the validated configuration, not the ideal one

    It is tempting to load test with generous resources and right-size later. But right-sizing can expose failure modes that do not exist at larger sizes. Always load test the production configuration, not a more generous version.


    What’s Next

    • Article 11: Multi-Environment Infrastructure — Azure Container Apps and the Configuration System — Managing three environments with generated configuration.
    • Article 12: Security in a Nuxt SSR App — CSRF, Azure AD, CSP, and More — The security layers that protect a server-rendered application.
    • Article 13: Observability and Distributed Tracing — Application Insights End-to-End — How every request is traced from the reverse proxy through the application to the backend.

    Munir Husseini is a software architect specializing in full-stack TypeScript, .NET, and cloud-native architectures.

  • Multi-Environment Infrastructure — Azure Container Apps and the Configuration System

    Sixteenth in a series about migrating from legacy architectures to a modern Nuxt 4 stack.


    The Environment Problem

    Any non-trivial application needs multiple environments: development, test, production. In a large enterprise application, feature branches ideally each get an isolated environment so developers can share a live preview without blocking one another.

    Configuration is where the real complexity hides. Three environments × four services × dozens of environment variables × secrets × scaling rules = hundreds of values that must be correct for every combination. Managing this by hand inevitably leads to deployment failures from miscopied connection strings or wrong environment variables.


    The Architecture: Three Environments on Azure Container Apps

    A typical setup runs three distinct environment types, all on Azure Container Apps (ACA):

    flowchart LR
        subgraph ACA[Azure Container Apps Environment]
            direction LR
    
            subgraph FE[Feature Branches]
                direction TB
                FE_Title[Per-branch:]
                FE_SPA[SPA]
                FE_API[API]
                FE_Proxy[Proxy]
                FE_Redis[Redis]
                FE_Iso[Isolated per branch]
            end
    
            subgraph TEST[Test]
                direction TB
                T_Title[Shared:]
                T_SPA[SPA]
                T_API[API]
                T_Proxy[Proxy]
                T_Redis[Redis]
                T_Notes[Stable integration]
            end
    
            subgraph PROD[Production]
                direction TB
                P_Title[Shared:]
                P_SPA[SPA]
                P_API[API]
                P_Proxy[Proxy]
                P_Redis[Redis]
                P_Notes[Live traffic]
            end
        end

    Feature Environments

    In many teams, every feature branch gets its own fully isolated deployment: SPA, API, proxy, and Redis containers. The CI/CD pipeline provisions on git push and tears everything down when the branch is deleted.

    • Developers share a live URL within minutes of pushing
    • No shared test environment lock — multiple features can be tested in parallel
    • Full isolation — one branch’s bugs never impact another

    flowchart LR
        subgraph Dev[Developer Workflow]
            direction LR
            A[git push to feature branch]
            B["CI/CD: provision\nFeature Environment\n(SPA, API, Proxy, Redis)"]
            C["Share live URL\nfor review & QA"]
            D[Branch merged\nand deleted]
            E[CI/CD: teardown\nFeature Environment]
    
            A --> B --> C --> D --> E
        end

    Per-Branch Redis

    Each feature branch gets its own Redis container. The pipeline rewrites Redis connection strings in the manifests at deploy time, preventing any cross-branch cache pollution.

    flowchart LR
        subgraph BranchA[Feature Branch A]
            A_API[API A]
            A_R[Redis A]
            A_API --> A_R
        end
    
        subgraph BranchB[Feature Branch B]
            B_API[API B]
            B_R[Redis B]
            B_API --> B_R
        end
    
        style BranchA fill:#e8f5e9,stroke:#2e7d32
        style BranchB fill:#e3f2fd,stroke:#1565c0
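
    The rewrite step needs a stable, DNS-safe name per branch. A sketch of how that could look; the `redis-<slug>` naming scheme and the length limit are assumptions for illustration, not the pipeline's actual convention:

```typescript
// Turns a Git branch name into a DNS-safe slug (lowercase letters, digits, hyphens).
function branchSlug(branch: string, maxLength = 20): string {
  return branch
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse every other run of characters into one hyphen
    .slice(0, maxLength)
    .replace(/^-+|-+$/g, '');    // no leading or trailing hyphens, also after truncation
}

// Hypothetical naming scheme: one Redis container per branch, named "redis-<slug>".
function redisUrlForBranch(branch: string, port = 6379): string {
  const slug = branchSlug(branch);
  if (slug === '') {
    throw new Error(`Branch name "${branch}" has no usable characters`);
  }
  return `redis://redis-${slug}:${port}`;
}

const url = redisUrlForBranch('feature/ABC-123_new-cart');
// url is "redis://redis-feature-abc-123-new:6379"
```

    Truncating the slug keeps names short, at the cost that two long branch names with the same prefix could collide; a short hash suffix would be one way to guard against that.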

    The Configuration Generator

    Manually managing hundreds of configuration values does not scale. A YAML-based configuration system can generate all deployment artifacts from a single source of truth.

    Configuration Merge Order

    Configuration values are defined in layers, where later layers override earlier ones:

    flowchart TB
        L1["Layer 1:\nvalues.yaml\n(base + test defaults)"]
        L2["Layer 2:\nenvironments/production.yaml\n(production overrides)"]
        L3["Layer 3:\nplatforms/container-apps.yaml\n(platform defaults)"]
        L4["Layer 4:\nplatforms/container-apps.production.yaml\n(platform × env)"]
        M[Merged configuration object]
        O1[Container Apps\nJSON manifests]
        O2[Azure Bicep\nparameter files]
        O3[Pipeline\nvariable files]
    
        L1 --> L2 --> L3 --> L4 --> M
        M --> O1
        M --> O2
        M --> O3

    To add a new environment variable, define it once in values.yaml with a default. If production needs a different value, override it in environments/production.yaml. The generator merges all layers and emits the final artifacts.
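
    The merge itself can be sketched as a recursive deep merge where later layers win. This is an illustration under assumptions, not the generator's actual code: YAML parsing is left out, the layers are shown as already-parsed objects, and the keys in the example are made up.

```typescript
type ConfigValue = string | number | boolean | null | ConfigValue[] | { [key: string]: ConfigValue };
type ConfigObject = { [key: string]: ConfigValue };

function isObject(value: ConfigValue | undefined): value is ConfigObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Later layers win; nested objects merge key by key, arrays and scalars are replaced.
function mergeLayers(...layers: ConfigObject[]): ConfigObject {
  const result: ConfigObject = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer)) {
      const existing = result[key];
      result[key] = isObject(existing) && isObject(value) ? mergeLayers(existing, value) : value;
    }
  }
  return result;
}

// Stand-ins for values.yaml and environments/production.yaml (hypothetical keys).
const values: ConfigObject = {
  spa: { env: { LOG_LEVEL: 'debug', API_URL: 'https://api.test.example' }, minReplicas: 1 },
};
const production: ConfigObject = {
  spa: { env: { LOG_LEVEL: 'warn' }, minReplicas: 5 },
};

const merged = mergeLayers(values, production);
```

    Production overrides `LOG_LEVEL` and `minReplicas`, while `API_URL` falls through from the base layer because no later layer mentions it.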

    Generated Artifacts

    The configuration generator produces three kinds of outputs:

    Artifact                    Purpose                                                 Example
    Container Apps manifests    Complete container spec (env vars, secrets, scaling)    container-apps/spa.test.json
    Bicep parameter files       Infrastructure parameters (environment name, region)    container-apps/my-app.test.bicepparam
    Pipeline variable files     CI/CD variables (image tags, resource names)            variables/common.yml

    flowchart LR
        SRC["Single source of truth\n(YAML config)"]
        GEN[Configuration generator]
    
        MAN[Container Apps\nJSON manifests]
        BICEP[Bicep parameter files]
        VARS[Pipeline variable files]
    
        SRC --> GEN
        GEN --> MAN
        GEN --> BICEP
        GEN --> VARS

    Separation of Infrastructure and Application Configuration

    A critical design choice in large systems: infrastructure and application configuration are treated as separate concerns.

    flowchart LR
        subgraph Infra["Infrastructure (Bicep)"]
            I1[Container Apps Environment]
            I2[Application Insights]
            I3[Other platform resources]
            I_Mgr["Managed by:\nInfrastructure pipeline\n(runs rarely)"]
        end
    
        subgraph AppCfg["Application (JSON Manifests)"]
            A1[Container image + tag]
            A2[Environment variables]
            A3["Secrets (Key Vault refs)"]
            A4[Scaling rules]
            A5["Resource limits (CPU/RAM)"]
            A6[Ingress configuration]
            A_Mgr["Managed by:\nBuild/deploy pipeline\n(runs every deployment)"]
        end
    
        Infra -->|"Provides infrastructure\nendpoints & resources"| AppCfg

    Infrastructure — the Container Apps Environment, monitoring, and other platform resources — changes rarely and is defined with Bicep. Application configuration — environment variables, secrets, scaling rules, resource limits — changes with each deployment and lives in generated JSON manifests.

    Risky, infrequent infrastructure changes are decoupled from routine application releases that run multiple times per day.


    Secret Management

    Sensitive values — connection strings, API keys, encryption keys — never live in Git. Secrets are stored in Azure Key Vault and referenced by name in the manifests:

    Manifest (in Git):
      env:
        - name: NUXT_REDIS_CONNECTION_STRING
          secretRef: redis-connection-string    ← reference, not value
    
    Key Vault:
      redis-connection-string = "redis://host:6379,password=..."
                                                ← actual value

    flowchart LR
        subgraph Git[Git Repo]
            M[Manifest\nsecretRef: redis-connection-string]
        end
    
        subgraph KV[Azure Key Vault]
            S[Secret:\nredis-connection-string\n= actual value]
        end
    
        subgraph ACA[Azure Container Apps Runtime]
            R[Container\nat startup]
        end
    
        M -. reference name .-> R
        S -. value resolution .-> R

    The Container Apps runtime resolves these references at startup. Secret values never show up in CI/CD logs, Git history, or committed manifests.
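
    A generator or CI step could enforce the "reference, not value" rule with a small check. The sketch below is hypothetical: the name pattern is a heuristic I chose for illustration, and `NUXT_API_KEY` and `NUXT_PUBLIC_SITE_NAME` are invented variable names.

```typescript
// Env entry as in the manifest excerpt above: either a literal value or a secretRef.
interface EnvEntry {
  name: string;
  value?: string;
  secretRef?: string;
}

// Hypothetical heuristic for names that usually carry sensitive data.
const SENSITIVE_NAME = /(SECRET|PASSWORD|TOKEN|KEY|CONNECTION_STRING)/i;

// Returns the names of entries that look sensitive but carry an inline value.
function findInlineSecrets(env: EnvEntry[]): string[] {
  return env
    .filter((e) => SENSITIVE_NAME.test(e.name) && e.value !== undefined && e.secretRef === undefined)
    .map((e) => e.name);
}

const findings = findInlineSecrets([
  { name: 'NUXT_REDIS_CONNECTION_STRING', secretRef: 'redis-connection-string' },
  { name: 'NUXT_API_KEY', value: 'abc123' },           // invented example of a violation
  { name: 'NUXT_PUBLIC_SITE_NAME', value: 'My Shop' }, // not sensitive, inline value is fine
]);
```

    Failing the build when `findings` is non-empty would catch an inline secret before it reaches Git history.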


    Runtime Placeholders

    Some values are only known at deploy time — for example, the image tag (from the build) or the Application Insights connection string (from infrastructure). Placeholders handle these late-bound values:

    Manifest template:
      image: myregistry.azurecr.io/spa:__IMAGE_TAG__
      env:
        - name: APPLICATIONINSIGHTS_CONNECTION_STRING
          value: __APPINSIGHTS_CONNECTION_STRING__
    
    Deploy pipeline substitution:
      jq '.properties.template.containers[0].image |=
          gsub("__IMAGE_TAG__"; "20260602.3")' manifest.json

    sequenceDiagram
        participant B as Build
        participant P as Deploy Pipeline
        participant M as Manifest Template
        participant ACA as Azure Container Apps
    
        B->>P: Produce image tag<br/>(e.g. 20260602.3)
        P->>M: Load manifest template<br/>with __IMAGE_TAG__ / __APPINSIGHTS_CONNECTION_STRING__
        P->>P: Use jq to substitute<br/>placeholders with real values
        P->>ACA: Apply concrete manifest
        ACA->>ACA: Run container with<br/>resolved image & settings

    The pipeline replaces placeholders at deploy time using jq. The committed manifest template stays deterministic: the same template plus different placeholder values yields different environments.
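
    The same substitution can be sketched in TypeScript, with one extra safeguard: a placeholder without a value aborts the deploy instead of shipping a literal `__NAME__` string. The `fillPlaceholders` helper is hypothetical, not part of the pipeline described here.

```typescript
// Replaces __NAME__ placeholders and fails loudly if any placeholder has no value.
function fillPlaceholders(template: string, values: Record<string, string>): string {
  return template.replace(/__([A-Z0-9]+(?:_[A-Z0-9]+)*)__/g, (match, name: string) => {
    const value = values[name];
    if (value === undefined) {
      throw new Error(`No value provided for placeholder ${match}`);
    }
    return value;
  });
}

const image = fillPlaceholders('myregistry.azurecr.io/spa:__IMAGE_TAG__', {
  IMAGE_TAG: '20260602.3',
});
// image is "myregistry.azurecr.io/spa:20260602.3"
```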


    Blue-Green Deployments

    Test and production environments commonly use blue-green deployments: the new version is deployed alongside the old one, validated, and then traffic is switched.

    flowchart TB
        subgraph Before[Before]
            BO["Old Revision\n(v20260601)\n100% traffic"]
        end
    
        subgraph During[During Deploy]
            DO["Old Revision\n(v20260601)\n100% traffic"]
            DN["New Revision\n(v20260602)\n0% traffic\n(warming up)"]
        end
    
        subgraph After[After Validation]
            AO["Old Revision\n(v20260601)\n0% traffic\n(standby)"]
            AN["New Revision\n(v20260602)\n100% traffic"]
        end
    
        subgraph Rollback[Rollback]
            RO["Switch traffic back\nto old revision\n(instant)"]
        end

    The old revision remains deployed at 0% traffic. Rolling back is a single traffic flip — no new deployment, effectively sub-second rollback.
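
    Conceptually, the flip is a rewrite of the traffic weights. The sketch below models that with a `revisionName`/`weight` pair per revision, loosely following how Container Apps describes traffic splits; the revision names and the `switchTraffic` helper are hypothetical.

```typescript
// One entry of a traffic split: a revision and its share of traffic in percent.
interface TrafficWeight {
  revisionName: string;
  weight: number;
}

// Sends 100% of traffic to `target` and keeps every other revision at 0% as standby.
function switchTraffic(current: TrafficWeight[], target: string): TrafficWeight[] {
  if (!current.some((t) => t.revisionName === target)) {
    throw new Error(`Unknown revision: ${target}`);
  }
  return current.map((t) => ({
    revisionName: t.revisionName,
    weight: t.revisionName === target ? 100 : 0,
  }));
}

const before: TrafficWeight[] = [
  { revisionName: 'spa--v20260601', weight: 100 },
  { revisionName: 'spa--v20260602', weight: 0 },
];

const afterGoLive = switchTraffic(before, 'spa--v20260602');
const rolledBack = switchTraffic(afterGoLive, 'spa--v20260601');
```

    Rollback is the same operation with the old revision as the target, which is why it needs no new deployment.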

    Fully Isolated Chains

    Both versions run side by side during the transition. To avoid mixed-version states (for example, a new SPA calling an old API), environment variables are rewritten at deploy time to point to revision-specific hostnames:

    Old Chain: Old Proxy → Old SPA → Old API (all on main hostnames)
    New Chain: New Proxy → New SPA → New API (all on revision hostnames)

    flowchart LR
        subgraph Old["Old Chain\n(main hostnames)"]
            OP[Old Proxy]
            OS[Old SPA]
            OA[Old API]
            OP --> OS --> OA
        end
    
        subgraph New["New Chain\n(revision hostnames)"]
            NP[New Proxy]
            NS[New SPA]
            NA[New API]
            NP --> NS --> NA
        end
    
        style Old fill:#fff3e0,stroke:#fb8c00
        style New fill:#e3f2fd,stroke:#1565c0

    Traffic is switched at the proxy level — a single switch moves all requests to the new chain in one shot.
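
    The deploy-time rewrite can be pictured as a pass over URL-valued environment variables. This is a sketch under assumptions: the hostnames, the variable names, and the `pinToRevision` helper are invented for illustration.

```typescript
// Rewrites one value if it is a URL whose hostname has a revision-specific replacement.
function rewriteHost(value: string, hostMap: Record<string, string>): string {
  let url: URL;
  try {
    url = new URL(value);
  } catch {
    return value; // not a URL, leave untouched
  }
  const replacement = hostMap[url.hostname];
  if (replacement === undefined) return value;
  url.hostname = replacement;
  return url.toString();
}

// Applies the rewrite to every env var; `hostMap` maps main hostnames to revision hostnames.
function pinToRevision(env: Record<string, string>, hostMap: Record<string, string>): Record<string, string> {
  const result: Record<string, string> = {};
  for (const [name, value] of Object.entries(env)) {
    result[name] = rewriteHost(value, hostMap);
  }
  return result;
}

const pinned = pinToRevision(
  { NUXT_API_URL: 'https://api.example.internal/graphql', LOG_LEVEL: 'info' },
  { 'api.example.internal': 'api--v20260602.example.internal' },
);
```

    Only values that parse as URLs with a known hostname change, so unrelated settings such as `LOG_LEVEL` pass through untouched.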


    Production Migration: Front Door Traffic Switching

    For an initial cutover from a legacy system to a new stack, Azure Front Door enables zero-downtime traffic switching:

    Before Go-Live:
      Front Door → Old App Service (100% traffic)
    
    During Migration:
      Front Door → Old App Service (100%)
      New Container Apps (0%, ready and warmed)
    
    Go-Live:
      Front Door → New Container Apps (100%)
      Old App Service (0%, still running)
    
    If issues:
      Front Door → Old App Service (100%)  ← instant rollback

    flowchart TB
        subgraph Before[Before Go-Live]
            FD1[Azure Front Door]
            OA1[Old App Service\n100% traffic]
            FD1 --> OA1
        end
    
        subgraph Migration[During Migration]
            FD2[Azure Front Door]
            OA2[Old App Service\n100% traffic]
            NC2["New Container Apps\n0% traffic\n(ready & warmed)"]
            FD2 --> OA2
            FD2 -. monitoring .- NC2
        end
    
        subgraph GoLive[Go-Live]
            FD3[Azure Front Door]
            NC3[New Container Apps\n100% traffic]
            OA3["Old App Service\n0% traffic\n(still running)"]
            FD3 --> NC3
        end
    
        subgraph Issue[If issues]
            FD4[Azure Front Door]
            OA4["Old App Service\n100% traffic\n(instant rollback)"]
            FD4 --> OA4
        end

    Both systems run in parallel. The switch is a Front Door configuration change — no DNS propagation delays, no cold starts. Rollback is likewise instant.


    Lessons Learned

    Generate configuration, don’t manage it

    Manual configuration management across environments does not scale. Treat it as a code generation problem: define values once, override per environment, and let a generator produce the final artifacts. This removes entire classes of deployment bugs.

    Separate infrastructure from application deployment

    Infrastructure changes are rare, high-risk, and require planning. Application deployments are frequent and should be low-friction. Coupling the two means either infrastructure changes slow everything down or every deploy becomes risky.

    Feature branch environments reshape the workflow

    When every branch has its own live URL, code review turns into live review. Stakeholders can exercise features before they merge. QA can work in parallel with development. The infrastructure cost is trivial compared to the productivity gain.

    Blue-green is worth the complexity

    Being able to deploy a new version, validate it under real traffic conditions, and then flip traffic with instant rollback changes the risk profile of releases. Deployments become uneventful.


    What’s Next

    • Article 12: Security in a Nuxt SSR App — CSRF, Azure AD, CSP, and More — The security layers that protect a server-rendered application.
    • Article 13: Observability and Distributed Tracing — Application Insights End-to-End — How every request is traced across all layers.
    • Article 14: AI-Assisted Development — MCP, Debug Chatbot, and the Shared Language of the Codebase — Making AI assistants genuinely useful for live debugging.

    Munir Husseini is a software architect specializing in full-stack TypeScript, .NET, and cloud-native architectures.

  • Security in a Nuxt SSR App — CSRF, OAuth, CSP, and More

    Fifteenth in a series about migrating from legacy architectures to a modern Nuxt 4 stack.


    Security in SSR Is Different

    An SSR application has a very different attack surface from a client-side SPA. The server is responsible for rendering HTML with embedded state, generating tokens, setting cookies, and proxying API calls — all before the browser executes any JavaScript.

    Security must be enforced at the server rendering layer. A CSRF token created during SSR has to survive hydration. Authentication must block the HTTP response before it ever reaches the browser. CSP must be sent as an HTTP header during rendering, not injected later as a meta tag.

    Reusing SPA security patterns directly in SSR apps creates gaps — not because the patterns are wrong, but because they operate at the wrong layer.

    flowchart LR
      subgraph Client["Browser"]
        HTML["SSR HTML + Embedded State"]
        JS["Hydrated JS App"]
      end
    
      subgraph Server["Nuxt SSR Stack"]
        Render["SSR Render Layer"]
        Tokens["Token Generation<br/>(CSRF, Auth)"]
        Cookies["Set Cookies<br/>(HTTP-only, SameSite)"]
        Proxy["API Proxy / BFF"]
      end
    
      Client <-- "HTTP Response" --> Server
      Render --> HTML
      Render --> Tokens
      Tokens --> Cookies
      Render --> Proxy
      Proxy -->|"Internal API Calls"| Backend["Upstream APIs / Services"]

    CSRF Protection: Dual-Token System with User-Agent Binding

    CSRF protection in an SSR app needs more than the classic double-submit cookie pattern.

    The Standard Pattern (And Why It’s Not Enough)

    The traditional double-submit approach: generate a token, store it in a cookie, and require it in a request header. The server verifies that the cookie and header values match. This works because a cross-site attacker cannot read the cookie in order to set the matching header.

    The weakness: if an attacker somehow gets both the cookie and the token (for example, via a subdomain cookie issue or XSS on a related domain), they can replay the request from any browser.

    The Enhanced Pattern: User-Agent Binding

    The fix is to bind the token to the specific browser that requested it:

    flowchart TB
      subgraph SSR["Token Generation During SSR"]
        UA["User-Agent Header"]
        Time["Current Timestamp"]
        Salt["Random UUID Salt"]
        Type["Token Type = 'client'"]
        HashUA["SHA-256(User-Agent)<br/>→ first 16 chars"]
        Payload["Token Payload<br/>d: timestamp<br/>p: type<br/>s: salt<br/>ua: UA hash"]
        Enc["Encrypt with AES-256-GCM"]
    
        UA --> HashUA
        Time --> Payload
        Type --> Payload
        Salt --> Payload
        HashUA --> Payload
        Payload --> Enc
      end
    
      Enc --> Cookie["Set HTTP-only cookie: csrf"]
      Enc --> Embed["Embed token in SSR HTML<br/>(for X-XSRF-TOKEN header)"]

    flowchart TB
      subgraph Validation["Token Validation on Every API Request"]
        HeaderTok["X-XSRF-TOKEN header"]
        CookieTok["csrf cookie"]
        Decrypt["Decrypt header token"]
        Exp["Check expiration (24h TTL)"]
        ReqUA["Current User-Agent"]
        HashReqUA["SHA-256(Req UA)<br/>→ first 16 chars"]
        MatchUA["Compare ua in token<br/>with current UA hash"]
        MatchCookie["Compare cookie value<br/>with header value"]
        Ok["All checks pass"]
        Reject["Reject 403 Forbidden"]
    
        HeaderTok --> Decrypt
        Decrypt --> Exp
        Decrypt --> MatchUA
        ReqUA --> HashReqUA --> MatchUA
        Decrypt --> MatchCookie
        CookieTok --> MatchCookie
    
        Exp -->|valid| MatchUA
        Exp -->|expired| Reject
        MatchUA -->|mismatch| Reject
        MatchUA -->|match| MatchCookie
        MatchCookie -->|mismatch| Reject
        MatchCookie -->|match| Ok
        Ok -->|"Process API request"| App["Application Handler"]
      end

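    A token stolen and replayed from a different browser fails validation because the User-Agent hash will not match. A User-Agent string can be spoofed, so the binding is one layer rather than a guarantee; combined with AES-256-GCM encryption, random salts, and a 24-hour TTL, it creates layered defenses against replay attacks.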
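
    Token generation along these lines can be sketched with Node's built-in `crypto` module. Several details are my assumptions rather than the article's: the hash is hex-encoded before truncation, the token layout is `iv | authTag | ciphertext` in base64url, and the function names are invented.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes, randomUUID } from 'node:crypto';

interface CsrfPayload {
  d: number;              // issue timestamp (ms since epoch)
  p: 'client' | 'server'; // token type
  s: string;              // random salt
  ua: string;             // first 16 hex chars of SHA-256(User-Agent) (hex is an assumption)
}

const hashUserAgent = (userAgent: string): string =>
  createHash('sha256').update(userAgent).digest('hex').slice(0, 16);

// Encrypts the payload with AES-256-GCM; layout (assumed): base64url(iv | authTag | ciphertext).
function issueToken(key: Buffer, userAgent: string, type: CsrfPayload['p'] = 'client'): string {
  const payload: CsrfPayload = { d: Date.now(), p: type, s: randomUUID(), ua: hashUserAgent(userAgent) };
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(JSON.stringify(payload), 'utf8'), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), ciphertext]).toString('base64url');
}

// Returns the payload, or null if the token is malformed, tampered with, or from another key.
function readToken(key: Buffer, token: string): CsrfPayload | null {
  try {
    const raw = Buffer.from(token, 'base64url');
    const decipher = createDecipheriv('aes-256-gcm', key, raw.subarray(0, 12));
    decipher.setAuthTag(raw.subarray(12, 28));
    const plaintext = Buffer.concat([decipher.update(raw.subarray(28)), decipher.final()]);
    return JSON.parse(plaintext.toString('utf8')) as CsrfPayload;
  } catch {
    return null;
  }
}

const demoKey = randomBytes(32); // stand-in; the real key would come from the key vault
const demoToken = issueToken(demoKey, 'Mozilla/5.0 (example)');
const demoPayload = readToken(demoKey, demoToken);
```

    Because GCM authenticates the ciphertext, any modified token fails in `decipher.final()` and is rejected before its fields are ever inspected.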

    SSR Bypass Tokens

    During SSR, the Nuxt server calls its own GraphQL or REST APIs — there is no browser, and therefore no CSRF cookie. A server-only bypass token allows these internal SSR requests to pass CSRF checks.

    This token is generated per request, stored only in the Nitro event context (never exposed to the client), and validated using User-Agent binding but without the cookie–header comparison.

    sequenceDiagram
      participant Browser
      participant NuxtSSR as Nuxt SSR Renderer
      participant NitroCtx as Nitro Event Context
      participant InternalAPI as Internal API
    
      Browser->>NuxtSSR: HTTP GET /page
      NuxtSSR->>NitroCtx: Create SSR bypass token<br/>(bound to UA, no cookie)
      Note right of NitroCtx: Token stored only in<br/>server context, not sent<br/>to the client
      NuxtSSR->>InternalAPI: Request with SSR bypass token<br/>(e.g. header X-SSR-CSRF)
      InternalAPI-->>InternalAPI: Validate token + UA<br/>(no cookie-header check)
      InternalAPI-->>NuxtSSR: Data response
      NuxtSSR-->>Browser: Rendered HTML
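
    The difference between the two token types shows up in the validation decision. The sketch below works on an already decrypted payload; the `'server'` type label, the field names of `RequestContext`, and the verdict strings are assumptions for illustration.

```typescript
interface TokenPayload {
  d: number;              // issue timestamp (ms)
  p: 'client' | 'server'; // 'server' marks an SSR bypass token (assumed label)
  ua: string;             // User-Agent hash stored in the token
}

interface RequestContext {
  now: number;          // current time (ms)
  uaHash: string;       // hash of the current request's User-Agent
  headerToken: string;  // raw token from the request header
  cookieToken?: string; // raw csrf cookie; absent for internal SSR requests
}

const TTL_MS = 24 * 60 * 60 * 1000; // 24-hour TTL

type Verdict = 'ok' | 'expired' | 'ua-mismatch' | 'cookie-mismatch';

function checkToken(payload: TokenPayload, ctx: RequestContext): Verdict {
  if (ctx.now - payload.d > TTL_MS) return 'expired';
  if (payload.ua !== ctx.uaHash) return 'ua-mismatch';
  // Bypass tokens never reach a browser, so there is no cookie to compare against.
  if (payload.p === 'client' && ctx.cookieToken !== ctx.headerToken) return 'cookie-mismatch';
  return 'ok';
}
```

    A production version would likely compare the cookie and header with a constant-time comparison such as `crypto.timingSafeEqual`; plain `!==` keeps the sketch short.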

    Encryption Key from Key Vault

    The AES key is loaded from a cloud key vault (for example, Azure Key Vault or AWS KMS) at startup and stored on globalThis. If the key cannot be loaded, the server refuses to start — a fail-fast approach with no degraded mode. CSRF protection is never quietly turned off.

    flowchart LR
      subgraph Startup["Nuxt Server Startup"]
        KV["Cloud Key Vault<br/>(AWS KMS / Azure Key Vault)"]
        Fetch["Fetch AES-256 Key"]
        Store["Store key on globalThis"]
        Ready["Server Ready"]
        Fail["Abort startup<br/>(process exit)"]
    
        KV --> Fetch
        Fetch -->|success| Store --> Ready
        Fetch -->|failure| Fail
      end
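
    The fail-fast behavior can be sketched as follows. The key fetcher is injected so that any vault client can be plugged in; the `csrfKey` property name, `loadCsrfKey`, and `startServer` are hypothetical names, since the article only states that the key lives on globalThis.

```typescript
// Hypothetical property name on globalThis for the loaded key.
const keyStore = globalThis as typeof globalThis & { csrfKey?: Uint8Array };

// Loads and validates the key; throws if it is missing or not a 256-bit key.
async function loadCsrfKey(fetchKey: () => Promise<Uint8Array>): Promise<void> {
  const key = await fetchKey();
  if (key.byteLength !== 32) {
    throw new Error(`Expected a 32-byte AES-256 key, got ${key.byteLength} bytes`);
  }
  keyStore.csrfKey = key;
}

// Fail fast: any error while loading the key aborts startup instead of degrading.
async function startServer(fetchKey: () => Promise<Uint8Array>, listen: () => void): Promise<void> {
  try {
    await loadCsrfKey(fetchKey);
  } catch (error) {
    console.error('CSRF key could not be loaded, aborting startup:', error);
    process.exit(1);
  }
  listen();
}
```

    Because `listen` only runs after the key is in place, there is no window in which the server accepts requests without CSRF protection.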

    OAuth 2.0 Authentication (Authorization Code Flow)

    Test and staging environments often require authentication, even for otherwise public-facing applications. A common pattern is server-side OAuth 2.0 Authorization Code Flow — the client secret is never exposed to the browser.

    sequenceDiagram
      participant Browser
      participant Nuxt as Nuxt Server
      participant IdP as Identity Provider
    
      Browser->>Nuxt: GET /protected-page
      Nuxt-->>Nuxt: Check auth cookie
      alt No valid token
        Nuxt-->>Browser: 302 Redirect to /api/auth/login
        Browser->>Nuxt: GET /api/auth/login
        Nuxt-->>Browser: 302 Redirect to IdP auth URL
        Browser->>IdP: GET /authorize?client_id=...&redirect_uri=/api/auth/callback
        IdP-->>Browser: 302 Redirect to /api/auth/callback?code=...
        Browser->>Nuxt: GET /api/auth/callback?code=...
        Nuxt->>IdP: POST /token (exchange code for token)
        IdP-->>Nuxt: Access token
        Nuxt-->>Browser: Set HTTP-only auth cookie + 302 /protected-page
        Browser->>Nuxt: GET /protected-page
        Nuxt-->>Browser: 200 Full HTML
      else Valid token
        Nuxt-->>Browser: 200 Full HTML (no redirect)
      end

    Key security properties:

    • Server-side token exchange: the client secret is used only on the server and is never sent to the browser.
    • HTTP-only cookies: tokens live in cookies that JavaScript cannot read, which limits token theft through XSS.
    • Middleware blocking: unauthenticated requests are stopped in server middleware before any page content is rendered. Unauthenticated users cannot even download the app’s JavaScript bundle.
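
    The server-side pieces of the flow can be sketched as three small helpers. This is an illustration under assumptions, not the production code: the config shape, the cookie name auth, and the state parameter (a standard OAuth 2.0 anti-forgery value that the diagram does not show) are all additions for the example.

```typescript
import { randomBytes } from "node:crypto";

interface OAuthConfig {
  authorizeEndpoint: string; // hypothetical IdP URL
  clientId: string;
  redirectUri: string;
  scope: string;
}

/** Build the IdP redirect for /api/auth/login, plus the state value to remember. */
function buildAuthorizeRedirect(cfg: OAuthConfig): { url: string; state: string } {
  const state = randomBytes(16).toString("hex"); // checked again in the callback
  const url = new URL(cfg.authorizeEndpoint);
  url.search = new URLSearchParams({
    response_type: "code",
    client_id: cfg.clientId,
    redirect_uri: cfg.redirectUri,
    scope: cfg.scope,
    state,
  }).toString();
  return { url: url.toString(), state };
}

/** Form body for the server-side POST /token call; the secret stays on the server. */
function buildTokenRequestBody(cfg: OAuthConfig, clientSecret: string, code: string): string {
  return new URLSearchParams({
    grant_type: "authorization_code",
    code,
    redirect_uri: cfg.redirectUri,
    client_id: cfg.clientId,
    client_secret: clientSecret,
  }).toString();
}

/** Set-Cookie value for the auth cookie: unreadable from JavaScript, HTTPS only. */
function authCookie(token: string, maxAgeSeconds: number): string {
  return `auth=${encodeURIComponent(token)}; Path=/; HttpOnly; Secure; SameSite=Lax; Max-Age=${maxAgeSeconds}`;
}
```

    Only buildAuthorizeRedirect produces anything the browser sees. The token request body, which contains the client secret, is sent from the Nuxt server to the identity provider directly.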

    Environment-Based Security Tiers

    Not every environment needs the full security stack:

    Environment | CSRF | Auth | CSP | Rationale
    Development | Off  | Off  | Off | Fast iteration, minimal friction
    Docker      | Off  | On   | Off | Protects shared dev environments
    Test        | On   | On   | On  | Production-like security
    Production  | On   | Off  | On  | Public site, no login required

    Development turns security off for productivity. Test enables everything to catch issues before release. Production enables CSRF and CSP but omits authentication for a public site.

    flowchart LR
      Dev["Development"]:::off -->|Deploy| Docker["Docker / Shared Dev"]:::partial
      Docker -->|Promote| Test["Test / Staging"]:::full
      Test -->|Promote| Prod["Production"]:::prod
    
      classDef off fill:#eee,stroke:#999,color:#333;
      classDef partial fill:#ffe6b3,stroke:#cc9a00,color:#333;
      classDef full fill:#c6f6d5,stroke:#2f855a,color:#000;
      classDef prod fill:#bee3f8,stroke:#2b6cb0,color:#000;
    
      Dev --- DevSec["CSRF: Off<br/>Auth: Off<br/>CSP: Off"]
      Docker --- DockSec["CSRF: Off<br/>Auth: On<br/>CSP: Off"]
      Test --- TestSec["CSRF: On<br/>Auth: On<br/>CSP: On"]
      Prod --- ProdSec["CSRF: On<br/>Auth: Off<br/>CSP: On"]
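
    One way to encode the tiers is a single lookup table that middleware consults at startup. The sketch below mirrors the table above; the type names and the choice to enable everything for an unrecognised environment name are assumptions made for the example.

```typescript
type Tier = "development" | "docker" | "test" | "production";

interface SecurityFlags {
  csrf: boolean;
  auth: boolean;
  csp: boolean;
}

// Values copied from the environment table.
const SECURITY_TIERS: Record<Tier, SecurityFlags> = {
  development: { csrf: false, auth: false, csp: false },
  docker: { csrf: false, auth: true, csp: false },
  test: { csrf: true, auth: true, csp: true },
  production: { csrf: true, auth: false, csp: true },
};

/** Resolve flags for an environment name; unknown names get every protection (assumed default). */
function securityFor(env: string | undefined): SecurityFlags {
  const known =
    env !== undefined && Object.prototype.hasOwnProperty.call(SECURITY_TIERS, env);
  return known ? SECURITY_TIERS[env as Tier] : { csrf: true, auth: true, csp: true };
}
```

    Defaulting to the strictest tier means a typo in an environment variable produces a login prompt instead of an unprotected deployment.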

    Runtime Content Security Policy from CMS

    Hardcoding CSP in application config is an operational choke point: every new script source (analytics, chat widgets, A/B testing) forces a code change and deployment.

    Treating CSP as content solves this. A server plugin fetches CSP from the CMS at runtime and applies it as an HTTP header:

    flowchart TB
      Editor["CMS Editor<br/>updates CSP entry"] --> CMS["Headless CMS"]
      CMS --> Cache["Server Plugin<br/>fetches CSP (cache 5 min)"]
      Cache --> Header["Set Content-Security-Policy<br/>header on HTML responses"]
      Header --> Browser["Browser enforces CSP"]
    
      CMS -.failure.-> Fallback["Use hardcoded fallback CSP<br/>(stricter, not looser)"]
      Fallback --> Header

    A hardcoded fallback covers CMS downtime. The runtime CSP is strictly more permissive than the fallback — if the CMS fetch fails, the app operates under a stricter, not looser, policy.
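
    The cache-and-fallback behaviour can be sketched as a small provider function. It assumes a fetchCsp callback standing in for the CMS client and an injectable clock for testing; both names are hypothetical, and the five-minute default follows the diagram.

```typescript
type CspFetcher = () => Promise<string>;

/**
 * Returns a function that yields the CSP header value to send.
 * fetchCsp is an assumed stand-in for the CMS client; now is injectable for tests.
 */
function createCspProvider(
  fetchCsp: CspFetcher,
  fallbackCsp: string,
  ttlMs: number = 5 * 60 * 1000,
  now: () => number = Date.now,
): () => Promise<string> {
  let cached: { value: string; fetchedAt: number } | null = null;

  return async () => {
    // Serve from cache while the entry is younger than the TTL.
    if (cached && now() - cached.fetchedAt < ttlMs) return cached.value;
    try {
      const value = (await fetchCsp()).trim();
      if (!value) throw new Error("CMS returned an empty CSP");
      cached = { value, fetchedAt: now() };
      return value;
    } catch {
      // CMS unreachable or entry invalid: serve the stricter hardcoded policy.
      // The failure is not cached, so the next request retries the CMS.
      return fallbackCsp;
    }
  };
}
```

    Treating an empty CMS entry as a failure matters here: an empty Content-Security-Policy value would otherwise loosen the policy instead of tightening it.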


    Internal API Guard

    Health and diagnostics endpoints expose sensitive operational data — memory usage, restart counts, worker status. The Internal API Guard keeps these endpoints invisible to the public:

    sequenceDiagram
      participant PublicClient as External Client
      participant Probe as Kube Health Probe
      participant InternalSvc as Internal Service
      participant NuxtAPI as Nuxt API Layer
    
      PublicClient->>NuxtAPI: GET /api/health/pm2<br/>User-Agent: Mozilla/5.0
      NuxtAPI-->>PublicClient: 404 Not Found
    
      Probe->>NuxtAPI: GET /api/health<br/>User-Agent: kube-probe/1.28
      NuxtAPI-->>Probe: 200 OK { status: "healthy" }
    
      InternalSvc->>NuxtAPI: GET /api/health/pm2<br/>X-Internal-Secret: correct
      NuxtAPI-->>InternalSvc: 200 OK { workers: [...] }

    The 404 is deliberate — it neither confirms nor denies the endpoint’s existence. Scanners see exactly what they would for any non-existent path.
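
    The guard's decision logic fits in one pure function. The sketch below reproduces the three cases from the diagram; the /api/health prefix rule, the kube-probe/ prefix check, and the function names are assumptions for illustration.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

type HeaderMap = Record<string, string | undefined>;

/** Constant-time comparison via fixed-length digests, so length differences leak nothing. */
function secretsMatch(given: string, expected: string): boolean {
  const a = createHash("sha256").update(given).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}

/** Status the guard answers with: 200 lets the request through, 404 hides the route. */
function guardStatus(path: string, headers: HeaderMap, internalSecret: string): 200 | 404 {
  if (!path.startsWith("/api/health")) return 200; // not an internal route
  const ua = headers["user-agent"] ?? "";
  // Liveness endpoint: reachable by the orchestrator's probe.
  if (path === "/api/health" && ua.startsWith("kube-probe/")) return 200;
  // Everything else under /api/health needs the shared secret.
  const given = headers["x-internal-secret"];
  if (given !== undefined && secretsMatch(given, internalSecret)) return 200;
  return 404;
}
```

    A User-Agent is trivially spoofable, so in this sketch the probe path only exposes the coarse liveness status, while the detailed diagnostics sit behind the secret.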


    The Security Stack

    All layers combine into a single request pipeline:

    flowchart TB
      Browser["Browser"] --> HTTPS["HTTPS"]
      HTTPS --> Edge["Edge Load Balancer"]
      Edge -->|"TLS termination"| Nginx["Nginx Proxy"]
    
      Nginx --> Guard["Internal API Guard<br/>(hide internal endpoints)"]
      Guard --> Nuxt["Nuxt Server"]
    
      subgraph NuxtPipeline["Nuxt Middleware & Plugins"]
        Auth["Auth Middleware<br/>Check auth cookie<br/>Redirect if invalid"]
        CSRF["CSRF Middleware<br/>Decrypt token<br/>Validate UA hash<br/>Check cookie-header match"]
        CSP["CSP Plugin<br/>Fetch CSP from CMS<br/>Set CSP header"]
        SSR["SSR Render<br/>Generate CSRF token<br/>Set HTTP-only cookie<br/>Embed token in HTML"]
      end
    
      Nuxt --> Auth --> CSRF --> CSP --> SSR

    Lessons Learned

    SSR security operates at the HTTP response level, not the DOM level

    In a SPA, security typically lives in JavaScript — interceptors, route guards, client-side middleware. In SSR, it must live in server middleware controlling the HTTP response. By the time browser JavaScript runs, the HTML (and any injected payloads) has already been sent.

    flowchart LR
      subgraph SPA["SPA Model"]
        JSClient["Client-side JS<br/>(route guards, interceptors)"]
        API["APIs"]
        JSClient --> API
      end
    
      subgraph SSR["SSR Model"]
        Middleware["Server Middleware<br/>(auth, CSRF, CSP)"]
        Render["SSR Render"]
        API2["APIs"]
        Middleware --> Render --> API2
      end

    User-Agent binding is cheap insurance against replay attacks

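    Hashing the User-Agent with SHA-256 and keeping the first 16 characters of the digest costs almost nothing (one hash per request), yet a token replayed from a client with a different User-Agent is rejected. The User-Agent is available on every request, so binding to it is practically free. An attacker who captures a token can usually copy the User-Agent as well, so treat this as an extra hurdle rather than a complete replay defense.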

    flowchart TB
      UA["User-Agent string"] --> Hash["SHA-256 + truncate 16 chars"]
      Hash --> Bind["Include hash in token payload"]
      Bind --> Verify["On request, recompute hash<br/>and compare before processing"]
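
    The fingerprint step in the diagram can be sketched with Node's crypto module. The function names are hypothetical, and the example assumes the 16 characters are hex characters of the digest.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

/** SHA-256 of the User-Agent, truncated to 16 hex characters. */
function uaFingerprint(userAgent: string): string {
  return createHash("sha256").update(userAgent, "utf8").digest("hex").slice(0, 16);
}

/** Compare the fingerprint stored in the token with the current request's User-Agent. */
function uaMatches(storedFingerprint: string, userAgent: string): boolean {
  const expected = Buffer.from(uaFingerprint(userAgent), "utf8");
  const stored = Buffer.from(storedFingerprint, "utf8");
  // timingSafeEqual throws on length mismatch, so check the length first.
  return stored.length === expected.length && timingSafeEqual(stored, expected);
}
```

    At token creation the server would put uaFingerprint(request UA) into the encrypted payload; at validation it calls uaMatches with the decrypted value and the incoming header before doing any further work.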

    Runtime security configuration reduces operational bottlenecks

    Any security setting that requires a deployment to change becomes a bottleneck. CSP changes are often requested by marketing (new script providers) and security (removing outdated sources). Moving CSP to the CMS takes the development team out of this loop.

    flowchart LR
      Marketing["Marketing / Security"] --> ChangeReq["Request CSP change"]
      ChangeReq --> CMSConfig["Update CSP in CMS"]
      CMSConfig --> AutoApply["Server auto-applies CSP<br/>on next cache refresh"]
      AutoApply --> LiveSite["Live Site with updated policy"]

    Environment tiers prevent security theater

    Full security in development forces engineers to work around it. Zero security in production is reckless. A tiered approach — each environment enabling exactly the protections it needs — balances safety with productivity.

    flowchart TB
      Dev["Dev: Minimal security<br/>High productivity"] --> Docker["Shared Dev: Auth only"]
      Docker --> Test["Test: Full security<br/>Pre-prod hardening"]
      Test --> Prod["Prod: Public-friendly<br/>CSRF + CSP only"]

    What’s Next

    • Article 13: Observability and Distributed Tracing — Application Insights End-to-End — How every request is traced across all layers.
    • Article 14: AI-Assisted Development — MCP, Debug Chatbot, and the Shared Language of the Codebase — Making AI assistants genuinely useful for live debugging.
    • Article 15: Load Testing Results — 15× Faster, 5× More Capacity — The measured proof that architecture decisions produce real outcomes.

    Munir Husseini is a software architect specializing in full-stack TypeScript, .NET, and cloud-native architectures.

  • Deferred Hydration Done Right — The `requestIdleCallback` Trick and the `modulepreload` Pitfall

    Fourteenth in a series about migrating from legacy architectures to a modern Nuxt 4 stack.


    The Hydration Dilemma

    SSR gives you a fast first paint: complete HTML that the browser can render immediately. Then the framework’s JavaScript arrives, gets parsed, compiled, and executed to hydrate the page—wiring up event listeners and making everything interactive.

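    During hydration, the main thread is busy. Buttons do not respond. Forms do not accept input. This gap between “looks ready” and “is ready” is what Total Blocking Time (TBT) measures: the accumulated time that long tasks block the main thread after First Contentful Paint.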

    If SSR has already rendered the page, why rush to download and execute JavaScript before the user even finishes the first paragraph?

    flowchart LR
      A[SSR Server] --> B["HTML Response<br/>Fully rendered markup"]
      B --> C[Browser parses HTML]
      C --> D["First Paint / FCP<br/>Looks ready"]
      D --> E["Hydration JS downloads<br/>parse & execute"]
      E --> F["Event listeners attached<br/>Interactive"]
    
      classDef paint fill:#e0f7fa,stroke:#006064;
      classDef js fill:#fff3e0,stroke:#e65100;
      classDef interactive fill:#e8f5e9,stroke:#1b5e20;
    
      class B,C,D paint
      class E js
      class F interactive

    The Naive Approach That Does Not Work

    The obvious idea: delay JavaScript modules with the media attribute, just like you can with stylesheets:

    <!-- This works for stylesheets: -->
    <link rel="stylesheet" href="print.css" media="print">
    <!-- Browser downloads but doesn't apply until print -->
    
    <!-- This does NOT work for modulepreload: -->
    <link rel="modulepreload" href="/_nuxt/entry.js" media="none">
    <!-- Browser IGNORES the media attribute and preloads anyway -->

    The spec does not define media semantics for <link rel="modulepreload">, so browsers ignore the attribute. Module scripts are preloaded eagerly regardless.[7]

    This is poorly documented. You only notice when performance metrics refuse to improve, even after you add media="none" to 90 modulepreload links.

    flowchart LR
      subgraph Stylesheet preload
        S1["<link rel=#quot;stylesheet#quot;<br/>media=#quot;print#quot;>"] --> S2["Browser may delay applying<br/>until media matches"]
      end
    
      subgraph Modulepreload
        M1["<link rel=#quot;modulepreload#quot;<br/>media=#quot;none#quot;>"] --> M2["Browser still preloads<br/>module eagerly"]
      end

    The Working Approach: Complete Removal

    Because modulepreload hints cannot be meaningfully delayed, the solution is to remove them entirely.[7] A Nitro render:html plugin performs two transformations in a large SSR application.

    Transformation 1: Remove All Modulepreload Links

    Before (standard Nuxt SSR output):
      <head>
        <link rel="modulepreload" href="/_nuxt/entry.js">
        <link rel="modulepreload" href="/_nuxt/chunk-abc.js">
        <link rel="modulepreload" href="/_nuxt/chunk-def.js">
        <link rel="modulepreload" href="/_nuxt/chunk-ghi.js">
        ... (90+ more)
      </head>
    
    After (deferred):
      <head>
        <!-- All modulepreload links removed -->
      </head>
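
    A minimal sketch of this first transformation, assuming the rendered head is available as an array of HTML strings (which is how Nitro's render:html hook exposes it). The helper name stripModulePreloads is hypothetical, and a regular expression is used here for brevity rather than a full HTML parser.

```typescript
/** Matches any <link> tag whose rel attribute is exactly "modulepreload". */
const MODULEPRELOAD_LINK = /<link\b[^>]*\brel=(["'])modulepreload\1[^>]*>\s*/gi;

/** Remove modulepreload hints from rendered <head> chunks, leaving other links alone. */
function stripModulePreloads(headChunks: string[]): string[] {
  return headChunks.map((chunk) => chunk.replace(MODULEPRELOAD_LINK, ""));
}

// Wiring inside a Nitro plugin (for example server/plugins/defer-hydration.ts),
// shown as a comment because it only runs inside a Nuxt server:
//
// export default defineNitroPlugin((nitroApp) => {
//   nitroApp.hooks.hook("render:html", (html) => {
//     html.head = stripModulePreloads(html.head);
//   });
// });
```

    Stylesheet links and rel="preload" hints for fonts or images are left untouched, since only tags whose rel value is exactly modulepreload match.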

    Without these hints, the browser no longer speculatively downloads chunks. It waits until it encounters actual