Migrating from a legacy application to a modern Nuxt 4 stack is not just about new frameworks and better performance numbers. The real shift is moving from reactive firefighting to proactive observability — knowing what is slow, why it is slow, and how the platform behaves under real load.
This observability stack has three pillars:
- End-to-end distributed tracing across Nginx, Nuxt, backend services, and Redis
- Structured logging with per-module, runtime-tunable log levels
- Node.js process diagnostics for GC, heap, and CPU under PM2
Together, they turn a deployment into something you can reason about rather than merely hope is working.
Flying Blind vs. Full Visibility
Without observability, slowdowns are only visible when users complain, and failures are only visible when error rates spike. The underlying cause remains unknown: which component was slow, which call failed, which cache missed.
In a system with multiple containers — for example, a frontend app, an API, a proxy, and Redis — a single request crosses several services. Without tracing, correlating what happened means manually matching timestamps across separate log streams. Most teams stop long before they get a clear picture.
The target state is one trace ID created at the edge and propagated from the browser through every service, so a single click in the observability backend reveals the full request waterfall.
Three-Layer Telemetry: Traces, Proxy Spans, and Container Metrics
The observability stack has three layers, each capturing a different dimension of the system:
flowchart TB
subgraph L1["Layer 1: SDK Instrumentation"]
L1a["Node.js applicationinsights<br/>+ .NET AI SDK"]
L1b["→ Request traces, dependency calls, exceptions"]
L1c["→ Custom events (GraphQL operations, cache metrics)"]
end
subgraph L2["Layer 2: Nginx OpenTelemetry Module"]
L2a["→ Span per proxied request"]
L2b["→ W3C Trace Context headers<br/>(traceparent, tracestate)"]
L2c["→ Complete proxy → SPA → API waterfall"]
end
subgraph L3["Layer 3: Container Apps Managed OTel Agent"]
L3a["→ Container-level metrics<br/>(CPU, memory, restarts)"]
L3b["→ All containers, including Redis"]
L3c["→ Zero code changes"]
end
L1 --- L2 --- L3
Layer 1: SDK Instrumentation
Both the frontend app and the API send request traces, dependency calls, exceptions, and custom events to the observability backend. The Node.js SDK automatically instruments incoming HTTP requests, outgoing HTTP calls, and Redis operations.
A GraphQL server module can add custom dependency telemetry for every subgraph call and every Redis cache operation:
flowchart TB
subgraph GQL["Custom Dependency Event: GraphQL"]
direction TB
g1["Name: GraphQL: cms/pageByPath"]
g2["Type: GraphQL"]
g3["Duration: 45ms"]
g4["Success: true"]
g5["operationName: pageByPath"]
g6["subgraph: cms"]
g7["cacheHit: false"]
g8["transactionId: abc-123-def"]
end
subgraph RED["Custom Dependency Event: Redis"]
direction TB
r1["Name: Redis: cache-check"]
r2["Type: Redis"]
r3["Duration: 2ms"]
r4["Success: true"]
r5["operation: GET"]
r6["cacheHit: true"]
r7["key: page-data:/products/premium"]
end
These custom events land in the same trace as the HTTP request, so it becomes clear which operations ran, which caches hit or missed, and how long each step took.
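As a rough illustration, the GraphQL event in the diagram could be assembled from a completed subgraph call like this. The `DependencyEvent` and `GraphQLCall` shapes and the `toGraphQLDependency` helper are hypothetical names chosen for this sketch, not types from any SDK; a real module would hand the result to the SDK's dependency-tracking call.

```typescript
// Hypothetical event shape mirroring the fields in the diagram above;
// it is not the type exported by any particular SDK.
interface DependencyEvent {
  name: string;
  type: "GraphQL" | "Redis";
  durationMs: number;
  success: boolean;
  properties: Record<string, string | boolean>;
}

// Hypothetical description of one finished subgraph call.
interface GraphQLCall {
  subgraph: string;
  operationName: string;
  startedAtMs: number;
  finishedAtMs: number;
  success: boolean;
  cacheHit: boolean;
  transactionId: string;
}

// Builds the custom dependency event for one completed subgraph call.
function toGraphQLDependency(call: GraphQLCall): DependencyEvent {
  return {
    name: `GraphQL: ${call.subgraph}/${call.operationName}`,
    type: "GraphQL",
    durationMs: Math.max(0, call.finishedAtMs - call.startedAtMs),
    success: call.success,
    properties: {
      operationName: call.operationName,
      subgraph: call.subgraph,
      cacheHit: call.cacheHit,
      transactionId: call.transactionId,
    },
  };
}
```

Keeping the event construction separate from the code that sends it makes the naming convention easy to unit-test.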
Layer 2: Nginx OpenTelemetry
The reverse proxy includes the nginxinc/nginx-otel module. Every proxied request becomes a span and carries W3C Trace Context headers:
sequenceDiagram
participant B as Browser
participant N as Nginx Proxy
participant S as Nuxt SPA (Node.js)
participant A as Backend API
participant R as Redis
B->>N: HTTP request<br/>(no trace context yet)
Note right of N: Creates span<br/>Generates traceparent header<br/>traceparent: 00-abcdef1234567890-span1-01
N->>S: Forward request<br/>+ traceparent
Note right of S: Reads traceparent<br/>Creates child span<br/>Propagates to outgoing calls
S->>R: Redis cache GET<br/>(child span)
S->>A: GraphQL → CMS API<br/>(child span)
S->>A: GraphQL → Backend API<br/>(child span)
A->>A: Database calls,<br/>business logic (child spans)
A single trace ID stitches together every hop. The end-to-end transaction view in the observability backend renders the full waterfall:
gantt
dateFormat x
axisFormat %Lms
section Nginx Proxy
Nginx Proxy :active, nginx, 0, 150
section SPA Request
SPA Request :spa, 10, 140
Redis GET :redis, 20, 10
GraphQL CMS :cms, 30, 40
GraphQL Backend :backend, 40, 80
section Backend API
API Request :api, 60, 70
SQL Query :sql, 80, 30
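To make the propagation step concrete, here is a minimal sketch of what "reads traceparent, creates child span" amounts to at the header level. In practice the SDK or the Nginx module does this for you; the function names below are illustrative only. Note that a real W3C header carries a 32-character hex trace ID and a 16-character hex span ID, so the shortened value in the diagram above would not parse.

```typescript
interface TraceContext {
  version: string;
  traceId: string; // 32 lowercase hex chars, shared by every hop
  parentId: string; // 16 lowercase hex chars, the caller's span ID
  flags: string;
}

// Only the four-field layout of version 00 is handled in this sketch.
const TRACEPARENT_PATTERN = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT_PATTERN.exec(header.trim());
  if (match === null) return null;
  const [, version, traceId, parentId, flags] = match;
  // The spec treats version "ff" and all-zero IDs as invalid.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, flags };
}

// Math.random is enough for a sketch; real tracers use a stronger random source.
function newSpanId(): string {
  let id = "";
  while (id.length < 16) id += Math.floor(Math.random() * 16).toString(16);
  return /^0+$/.test(id) ? newSpanId() : id;
}

// Keeps the trace ID and flags, and swaps in this hop's own span ID.
function childTraceparent(incoming: string, spanId: string = newSpanId()): string | null {
  const ctx = parseTraceparent(incoming);
  if (ctx === null) return null;
  return `${ctx.version}-${ctx.traceId}-${spanId}-${ctx.flags}`;
}
```

The trace ID never changes between hops, which is what lets the backend stitch the spans into one waterfall.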
Layer 3: Container-Level Metrics
The container environment runs a managed OpenTelemetry collector that gathers container metrics — CPU, memory, restart counts — for all containers, including Redis. No application changes are required.
This layer answers operational questions:
- Is Redis consuming too much memory?
- Are frontend replicas flapping?
- What is the steady-state CPU profile for API containers?
Transaction ID Propagation
Distributed traces are useful for visualizing a single request, but day-to-day debugging often starts from logs. To bridge both worlds, the proxy generates an x-transaction-id header for every incoming request:
flowchart TB
N["Nginx<br/>x-transaction-id: txn-abc-123"]
FE["Frontend app"]
API["API"]
GQL["GraphQL custom events"]
N -->|"Reads header<br/>adds to outgoing calls<br/>logs include txn-abc-123"| FE
N --> API
FE -->|"Includes txn-abc-123<br/>in request & logs"| API
FE -->|"Tag events with<br/>txn-abc-123"| GQL
API -->|"Logs include<br/>txn-abc-123"| GQL
The transaction ID is mapped to the W3C traceparent trace ID. Developers can start from either side — a transaction ID from logs or a trace ID from the observability backend — and still recover the complete request history.
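On the application side, the propagation comes down to reading the header once per request and attaching it to every outgoing call and log entry. The sketch below assumes plain header maps; `HeaderMap`, `readTransactionId`, and `withTransactionId` are hypothetical helpers, and how the ID is mapped to the trace ID is not shown.

```typescript
type HeaderMap = Record<string, string | undefined>;

const TRANSACTION_HEADER = "x-transaction-id";

// Header names are case-insensitive, so compare in lowercase.
function readTransactionId(incoming: HeaderMap): string | undefined {
  for (const [key, value] of Object.entries(incoming)) {
    if (key.toLowerCase() === TRANSACTION_HEADER && value) return value;
  }
  return undefined;
}

// Returns a copy of the outgoing headers with the transaction ID attached.
function withTransactionId(outgoing: HeaderMap, transactionId: string | undefined): HeaderMap {
  return transactionId
    ? { ...outgoing, [TRANSACTION_HEADER]: transactionId }
    : { ...outgoing };
}

// Adds the transaction ID to the structured fields of a log entry.
function withLogContext(
  fields: Record<string, unknown>,
  transactionId: string | undefined,
): Record<string, unknown> {
  return transactionId ? { ...fields, transactionId } : { ...fields };
}
```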
What Metrics Tell You
The combined telemetry stack tracks several metric categories, each answering a distinct question:
| Metric Category | Examples | Question It Answers |
|---|---|---|
| Response times | Per-endpoint, per-container latency | “Which pages are slow?” |
| Error rates | HTTP 5xx, GraphQL errors, exceptions | “What is failing?” |
| Cache metrics | Hit/miss rates per cache tier | “Is caching effective?” |
| Resource usage | CPU, memory per container/worker | “Are we right-sized?” |
| Dependency durations | GraphQL subgraph calls, Redis ops | “Which external call is slow?” |
| User journeys | Page-to-page navigation funnels | “Where do users drop off?” |
Alerting Strategy: Symptoms First, Causes Later
Metrics matter only when they drive action. The guiding principle is:
> Alert on symptoms, investigate with traces.
- Symptom alert:
“Frontend P95 response time exceeded 2 seconds for 5 minutes.”
- Investigation:
Open the traces for those slow requests → locate the slow dependency → fix the underlying issue.
Alerting directly on causes like Redis CPU > 80% creates noise and false positives, because Redis CPU can legitimately spike during cache invalidation without harming users. Symptom-based alerts keep noise low and align alerts with real user impact.
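The symptom alert above can be expressed as a small check over per-minute latency windows. This is only a sketch of the rule's logic, assuming one array of response times per minute and a nearest-rank percentile; in practice the alert would be defined in the observability backend rather than in application code.

```typescript
// Nearest-rank percentile; returns NaN for an empty window.
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) return Number.NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// True when P95 exceeded the threshold in each of the last `requiredWindows` windows.
// An empty window yields NaN, which never exceeds the threshold and so breaks the streak.
function sustainedP95Breach(
  windows: readonly (readonly number[])[],
  thresholdMs: number,
  requiredWindows: number,
): boolean {
  if (requiredWindows <= 0 || windows.length < requiredWindows) return false;
  return windows
    .slice(-requiredWindows)
    .every((latencies) => percentile(latencies, 95) > thresholdMs);
}
```

With one window per minute, `sustainedP95Breach(windows, 2000, 5)` corresponds to "P95 above 2 seconds for 5 minutes".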
Structured Logging in Nuxt: From console.log to Observability
Traces tell you where the problem is. Logs tell you what happened. To make that effective, logging has to be more than printing strings.
The console.log Problem
Using console.log in a production SSR application causes real issues:
- No severity levels — errors are indistinguishable from informational noise
- No structure — freeform strings cannot be reliably queried, filtered, or aggregated
- No context — you cannot tell which request, user, or component produced the log
- No control — you cannot selectively enable verbose logging for one module without overwhelming the output
- SSR noise — server-side logs are mixed with framework output, health checks, and PM2 logs
There is a big difference between “we have logging” and “we have useful logging.” The first gives you strings to grep. The second gives you a structured, queryable observability layer.
The Logging Architecture
The logging system has three main building blocks:
flowchart TB
subgraph APP["Application Code"]
A1["const log = useLogger('shopping-cart')"]
A2["log.info('Item added', { productId, quantity })"]
end
subgraph UL["useLogger Composable"]
UL1["Tagged with module name"]
UL2["Checks if this module's level is enabled"]
UL3["Formats structured message"]
end
subgraph MS["Multi-Sink Router"]
S1["Sink 1: Console (development)<br/>Formatted, colored, human-readable"]
S2["Sink 2: Observability Backend<br/>Structured JSON, custom properties"]
S3["Sink 3: DevTools Log Viewer<br/>Real-time, filterable, in-browser"]
end
APP --> UL --> MS
MS --> S1
MS --> S2
MS --> S3
The useLogger Composable
Each module gets its own logger instance:
const log = useLogger('shopping-cart')
log.debug('Cart state loaded', { items: cart.items.length })
log.info('Item added', { productId: 'abc', quantity: 2 })
log.warn('Price mismatch detected', { expected: 29.99, actual: 31.99 })
log.error('Checkout failed', { error: err.message, orderId })
Every logger is tagged with its module name. This enables per-module log level control — you can set shopping-cart to debug while keeping navigation at warn.
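A stripped-down version of such a composable might look like the following. It is a sketch under assumptions, not the actual implementation: `createUseLogger`, the `Sink` type, and the numeric severity ranking are all invented here to show how the module tag, the level check, and the sink fan-out fit together.

```typescript
type Level = "debug" | "info" | "warn" | "error";

const SEVERITY: Record<Level, number> = { debug: 0, info: 1, warn: 2, error: 3 };

interface LogRecord {
  module: string;
  level: Level;
  message: string;
  data: Record<string, unknown>;
}

type Sink = (record: LogRecord) => void;
type LogFn = (message: string, data?: Record<string, unknown>) => void;

// Returns a useLogger-style factory bound to a level table and a list of sinks.
function createUseLogger(
  levels: Partial<Record<string, Level>>,
  sinks: readonly Sink[],
  fallback: Level = "info",
): (module: string) => Record<Level, LogFn> {
  return (module) => {
    const emit = (level: Level): LogFn => (message, data = {}) => {
      // Look the threshold up on every call so later changes to `levels` take effect.
      const threshold = levels[module] ?? fallback;
      if (SEVERITY[level] < SEVERITY[threshold]) return;
      for (const sink of sinks) sink({ module, level, message, data });
    };
    return { debug: emit("debug"), info: emit("info"), warn: emit("warn"), error: emit("error") };
  };
}
```

Because the level is checked before any sink runs, a suppressed `debug` call costs little more than a table lookup.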
Severity Levels
| Level | When to Use | Example |
|---|---|---|
| debug | Development-only details | “Cart state loaded, 3 items” |
| info | Significant business events | “Item added to cart” |
| warn | Unexpected but recoverable | “Price mismatch, using server price” |
| error | Failures requiring attention | “Checkout failed, payment rejected” |
Multi-Sink Routing
Each log message is fanned out to multiple sinks at once.
Sink 1: Console (Development)
In development, logs are written to both the browser console and Node.js stdout with:
- Color coding by severity
- A module name prefix
- Collapsible structured payloads (objects expand on click)
Sink 2: Observability Backend (Production)
In production, logs are sent as structured events:
Observability Event:
{
name: "shopping-cart:info",
properties: {
module: "shopping-cart",
severity: "info",
message: "Item added",
productId: "abc-123",
quantity: 2,
requestId: "req-xyz",
timestamp: "2025-06-02T12:34:56Z"
}
}
These events can be queried with KQL (Kusto Query Language):
customEvents
| where name startswith "shopping-cart"
| where customDimensions.severity == "error"
| project timestamp, customDimensions.message, customDimensions.productId
| order by timestamp desc
Sink 3: DevTools Log Viewer
A custom DevTools tab shows logs in real time:
flowchart TB
subgraph DT["DevTools — Logs Tab"]
F["Filter controls:<br/>[All Modules ▼] [Info ▼] [Search...]"]
L1["12:34:56 INFO shopping-cart<br/>Item added {productId: 'abc', quantity: 2}"]
L2["12:34:57 DEBUG catalog-query<br/>Cache hit for key 10115"]
L3["12:34:58 WARN shopping-cart<br/>Price mismatch {expected: 29.99, actual: 31}"]
L4["12:35:01 ERROR checkout<br/>Payment failed {orderId: 'ord-789'}"]
end
F --> L1 --> L2 --> L3 --> L4
Capabilities:
- Filter by severity, such as only errors or debug and above
- Filter by module, such as only shopping-cart logs
- Full-text search across messages
- Expandable structured data payloads
Runtime Log Level Control
Log levels are adjustable at runtime without restarting the app.
flowchart TB
subgraph CFG["Default levels (from config)"]
C1["shopping-cart: info"]
C2["catalog-query: warn"]
C3["navigation: warn"]
end
subgraph RT["Runtime override (via API or DevTools)"]
R1["shopping-cart: debug ← changed"]
R2["catalog-query: warn ← unchanged"]
R3["navigation: info ← changed"]
end
CFG --> RT
subgraph EFFECT["Effect"]
E1["shopping-cart now outputs debug logs"]
E2["No server restart"]
E3["No redeploy"]
E4["No impact on other modules"]
end
RT --> EFFECT
A typical production debugging workflow:
- A user reports an issue
- Enable debug logging for the relevant module via an API or DevTools
- Reproduce the problem
- Inspect the debug logs in the observability backend
- Turn debug logging off again and restore the default level
No deployment, no restart, and no log flood from unrelated modules.
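One way to model this is a registry that layers runtime overrides on top of the configured defaults. The sketch below is an assumption about the general shape, not the module's real API; the API route or DevTools action that would call `override` and `reset` is left out.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface LevelRegistry {
  levelFor(module: string): LogLevel;
  override(module: string, level: LogLevel): void;
  reset(module: string): void;
}

// Overrides live only in memory, so they change behavior immediately and
// disappear on reset without touching the configured defaults.
function createLevelRegistry(
  defaults: Readonly<Partial<Record<string, LogLevel>>>,
  fallback: LogLevel = "warn",
): LevelRegistry {
  const overrides = new Map<string, LogLevel>();
  return {
    levelFor: (module) => overrides.get(module) ?? defaults[module] ?? fallback,
    override: (module, level) => {
      overrides.set(module, level);
    },
    reset: (module) => {
      overrides.delete(module);
    },
  };
}
```

The debugging workflow then maps to `override('shopping-cart', 'debug')`, reproduce, inspect, `reset('shopping-cart')`.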
SSR-Aware Logging
In an SSR app, logging must handle both server and client execution contexts:
flowchart LR
subgraph SRV["SSR Execution (Server: Node.js)"]
S1["log.info('Page rendered')"]
S2["Output:<br/>stdout (PM2 logs)<br/>Observability backend"]
S3["Context:<br/>Request URL<br/>Request ID<br/>User-Agent"]
S1 --> S2 --> S3
end
subgraph CLI["Client Execution (Browser)"]
C1["log.info('Button clicked')"]
C2["Output:<br/>Browser console<br/>DevTools Log Viewer<br/>Observability backend telemetry"]
C3["Context:<br/>Current route<br/>Session ID"]
C1 --> C2 --> C3
end
useLogger detects where it is running and routes logs to the right sinks. Server-side logs include request context such as URL, request ID, and user agent. Client-side logs include session context such as current route and user interactions.
Replacing console.log Safely
The migration away from console.log is incremental.
- An ESLint rule flags console.log usage and suggests replacing it with useLogger. It does not auto-fix, so the developer explicitly chooses the severity and module tag.
- For legacy code, a global console interceptor captures console.* calls and forwards them into the structured logging pipeline under a legacy module tag. This ensures nothing is lost during the transition.
Over time, the codebase shifts from unstructured strings to queryable, structured events.
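The interceptor idea can be sketched by wrapping the console methods and forwarding each call before passing it through. Everything here is illustrative: `interceptConsole`, the `Forward` callback, and the choice to map console.log to the info severity are assumptions for the example.

```typescript
type ConsoleMethod = "debug" | "log" | "info" | "warn" | "error";
type Severity = "debug" | "info" | "warn" | "error";
type ConsoleLike = Record<ConsoleMethod, (...args: unknown[]) => void>;
type Forward = (module: string, severity: Severity, message: string) => void;

// Assumed mapping: console.log has no severity of its own, so treat it as info.
const METHOD_SEVERITY: Record<ConsoleMethod, Severity> = {
  debug: "debug",
  log: "info",
  info: "info",
  warn: "warn",
  error: "error",
};

// Wraps each console method so calls are forwarded under the "legacy" tag
// and still reach the original method. Returns a function that undoes the patch.
function interceptConsole(target: ConsoleLike, forward: Forward): () => void {
  const originals = new Map<ConsoleMethod, (...args: unknown[]) => void>();
  (Object.keys(METHOD_SEVERITY) as ConsoleMethod[]).forEach((method) => {
    const original = target[method];
    originals.set(method, original);
    target[method] = (...args: unknown[]) => {
      forward("legacy", METHOD_SEVERITY[method], args.map((arg) => String(arg)).join(" "));
      original.apply(target, args);
    };
  });
  return () => {
    originals.forEach((original, method) => {
      target[method] = original;
    });
  };
}
```

Returning a restore function keeps the patch reversible, which helps in tests and during hot reloads.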
Node.js Observability Under PM2: Diagnostics, GC, and CPU
Application-level traces and logs tell you what is slow. To understand why the Node.js process itself degrades — heap growth, GC pauses, event loop lag — you need process-level visibility.
Three Nuxt modules provide this:
- diagnostics: per-request aggregation and pattern learning
- diagnostics-heap: GC and heap monitoring with leak detection
- diagnostics-profiler: automatic CPU profiling for slow requests
These sit alongside the Nuxt app, PM2, and Nginx, and feed directly into the same observability backend.
Layer 1: Per-Request Aggregation (diagnostics module)
The diagnostics module captures seven metrics for every HTTP request:
| Metric | What It Measures |
|---|---|
| Duration | Total request handling time (ms) |
| Input size | Request body size (bytes) |
| Output size | Response body size (bytes) |
| CPU usage | Process CPU delta during request |
| Memory delta | Heap memory change during request |
| Event loop lag | Main thread blocking time (ms) |
| Status code | HTTP response status |
O(1) Memory Aggregation
Traditional APM tools store one record per request — 8.6 million records per day at 100 req/s. This module takes a different approach: no per-request storage. Only aggregations such as min, max, sum, and count are retained.
flowchart TB
subgraph TRAD["Per-Request Storage (traditional APM)"]
T1["Request 1: { duration: 150, cpu: 12, memory: 35MB, ... }"]
T2["Request 2: { duration: 200, cpu: 15, memory: 42MB, ... }"]
T3["Request 3: { duration: 180, cpu: 11, memory: 38MB, ... }"]
Tn["Request N: { duration: ???, cpu: ??, memory: ???, ... }"]
TM["Memory usage: O(N) — grows with request count"]
T1 --> T2 --> T3 --> Tn --> TM
end
subgraph AGG["Aggregation-Only (diagnostics module)"]
A1["Aggregate:"]
A2["duration: { min, max, sum, count }"]
A3["cpu: { min, max, sum, count }"]
A4["memory: { min, max, sum, count }"]
AM["Memory usage: O(1) — constant<br/>regardless of request count"]
A1 --> A2 --> A3 --> A4 --> AM
end
Monitoring overhead is constant, regardless of traffic volume.
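The aggregation itself is only a few lines. A minimal sketch, with hypothetical names, of a running aggregate that keeps four numbers per metric no matter how many samples arrive:

```typescript
interface Aggregate {
  min: number;
  max: number;
  sum: number;
  count: number;
}

function emptyAggregate(): Aggregate {
  return { min: Number.POSITIVE_INFINITY, max: Number.NEGATIVE_INFINITY, sum: 0, count: 0 };
}

// Folds one observation into the aggregate; the sample itself is not stored.
function addSample(aggregate: Aggregate, value: number): void {
  aggregate.min = Math.min(aggregate.min, value);
  aggregate.max = Math.max(aggregate.max, value);
  aggregate.sum += value;
  aggregate.count += 1;
}

// The mean is derived on demand from sum and count.
function meanOf(aggregate: Aggregate): number {
  return aggregate.count === 0 ? Number.NaN : aggregate.sum / aggregate.count;
}
```

The trade-off is that percentiles cannot be recovered from min, max, sum, and count alone; those have to come from the trace data in the observability backend.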
Slow-Request Pattern Detection
Every 30 seconds, after at least 50 requests, the module detects patterns in slow requests by grouping on several features:
flowchart TB
subgraph FB["Feature Buckets"]
F1["URL pattern: /products/*, /checkout/*, /"]
F2["HTTP method: GET, POST"]
F3["Payload size: small (<1KB), medium, large"]
F4["Path depth: 1, 2, 3, 4+"]
end
FB --> P["For each bucket:<br/>Compute probability(request is slow)<br/>If probability ≥ 50% and count ≥ 3 → emit pattern"]
For each feature bucket, the algorithm calculates the probability that a request in this bucket is slow (exceeds the configured threshold, such as 500ms). If a bucket has at least 50% slow probability with at least 3 samples, a pattern is emitted:
flowchart TB
P1["Observed: /checkout/*<br/>73% of requests slow (>500ms)<br/>12 observations in last 30s"]
P2["Emit custom event:<br/>name = 'SlowRequestPatterns'<br/>pattern = '/checkout/*'<br/>probability > 0.5"]
P1 --> P2
Pattern detection surfaces systemic slowness that individual alerts miss. A single slow request might be a fluke. A persistent pattern for a specific URL points to a real problem with that page’s data fetching or rendering.
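The bucketing logic can be sketched as follows. The thresholds (500 ms, 50%, 3 samples, 50 requests) come from the description above; the feature extraction details, such as the 100 KB boundary between medium and large payloads and treating "/" as depth 1, are assumptions made for this example.

```typescript
interface RequestSample {
  path: string;
  method: string;
  bodyBytes: number;
  durationMs: number;
}

interface SlowPattern {
  feature: string;
  probability: number;
  count: number;
}

interface DetectOptions {
  slowMs: number;
  minProbability: number;
  minCount: number;
  minRequests: number;
}

const DETECT_DEFAULTS: DetectOptions = { slowMs: 500, minProbability: 0.5, minCount: 3, minRequests: 50 };

// Maps one request to the feature buckets it belongs to.
function featuresOf(sample: RequestSample): string[] {
  const segments = sample.path.split("/").filter((segment) => segment.length > 0);
  const urlPattern = segments.length > 1 ? `/${segments[0]}/*` : `/${segments[0] ?? ""}`;
  // Assumed boundary: medium ends at 100 KB.
  const size = sample.bodyBytes < 1024 ? "small" : sample.bodyBytes < 100 * 1024 ? "medium" : "large";
  const depth = segments.length >= 4 ? "4+" : String(Math.max(1, segments.length));
  return [`url:${urlPattern}`, `method:${sample.method}`, `size:${size}`, `depth:${depth}`];
}

// Emits every bucket whose share of slow requests meets the probability and count thresholds.
function detectSlowPatterns(
  samples: readonly RequestSample[],
  options: Partial<DetectOptions> = {},
): SlowPattern[] {
  const opts = { ...DETECT_DEFAULTS, ...options };
  if (samples.length < opts.minRequests) return [];
  const buckets = new Map<string, { slow: number; total: number }>();
  for (const sample of samples) {
    const isSlow = sample.durationMs > opts.slowMs;
    for (const feature of featuresOf(sample)) {
      const bucket = buckets.get(feature) ?? { slow: 0, total: 0 };
      bucket.total += 1;
      if (isSlow) bucket.slow += 1;
      buckets.set(feature, bucket);
    }
  }
  const patterns: SlowPattern[] = [];
  buckets.forEach((bucket, feature) => {
    const probability = bucket.slow / bucket.total;
    if (bucket.total >= opts.minCount && probability >= opts.minProbability) {
      patterns.push({ feature, probability, count: bucket.total });
    }
  });
  return patterns;
}
```

This version keeps the samples of the current window in memory for clarity; a constant-memory variant would update the bucket counters as each request completes.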
SSR GraphQL Disambiguation
During SSR, the Nuxt server makes GraphQL calls to itself — real HTTP requests that pass through the diagnostics middleware. Without disambiguation, each page request would be counted twice.
The module identifies SSR-internal requests via the CSRF bypass token from the security layer and excludes them. You get accurate per-page measurements with no double-counting.
Layer 2: Heap Memory and GC (diagnostics-heap module)
The diagnostics-heap module uses the Node.js PerformanceObserver API (from perf_hooks) to monitor V8 garbage collection events in real time.
GC Event Categories
| GC Type | What It Collects | Typical Duration |
|---|---|---|
| scavenge | Young generation (new objects) | 1–5 ms |
| mark-sweep | Full heap (major GC) | 10–50 ms |
| incremental | Incremental marking | 1–10 ms |
| weakcb | Weak reference callbacks | <1 ms |
Each event records duration, heap before/after, and bytes freed. Events are aggregated into time-series data and sent periodically to the observability backend.
Automatic Memory Leak Detection
The module tracks consecutive heap growth over time. When heapUsed increases for N consecutive intervals without a significant GC reduction, it emits a leak detection event:
flowchart TB
subgraph WIN["Observation Window: 10 intervals (5 min each)"]
I1["Interval 1: heapUsed = 800 MB"]
I2["Interval 2: heapUsed = 820 MB ↑ +20 MB"]
I3["Interval 3: heapUsed = 845 MB ↑ +25 MB"]
I4["Interval 4: heapUsed = 860 MB ↑ +15 MB"]
I5["Interval 5: heapUsed = 890 MB ↑ +30 MB"]
end
WIN --> DET["5 consecutive growth intervals detected<br/>Growth rate ≈ 18 MB/interval = 216 MB/hour"]
DET --> EVT["Emit event:<br/>{ event: 'PotentialMemoryLeak',<br/>confidence: 'medium',<br/>growthRateMBPerHour: 216,<br/>consecutiveGrowths: 5 }"]
EVT --> HIGH["If growth continues to 8+ intervals:<br/>confidence → 'high'"]
The confidence level reduces false positives. Short-term growth is normal during traffic spikes. Only sustained growth triggers a leak alert.
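A simplified version of the consecutive-growth check is sketched below. It assumes one heapUsed sample per interval and treats any drop as a reset, which is cruder than "no significant GC reduction"; the 5 and 8 interval thresholds follow the example above, and the function and field names are hypothetical.

```typescript
interface LeakSignal {
  confidence: "medium" | "high";
  consecutiveGrowths: number;
  growthMBPerInterval: number;
}

// Counts how many of the most recent samples grew over their predecessor
// and reports a signal once the streak is long enough.
function detectLeak(
  heapUsedMB: readonly number[],
  mediumAfter: number = 5,
  highAfter: number = 8,
): LeakSignal | null {
  let growths = 0;
  for (let i = heapUsedMB.length - 1; i > 0 && heapUsedMB[i] > heapUsedMB[i - 1]; i--) {
    growths++;
  }
  if (growths < mediumAfter) return null;
  const newest = heapUsedMB[heapUsedMB.length - 1];
  const streakStart = heapUsedMB[heapUsedMB.length - 1 - growths];
  return {
    confidence: growths >= highAfter ? "high" : "medium",
    consecutiveGrowths: growths,
    growthMBPerInterval: (newest - streakStart) / growths,
  };
}
```

Multiplying `growthMBPerInterval` by the number of intervals per hour gives an hourly rate comparable to `growthRateMBPerHour`.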
Automatic Heap Dumps
When heapUsed exceeds a configurable threshold (default 1024 MB), a .heapsnapshot file is written automatically. It can be loaded into Chrome DevTools for detailed memory analysis.
V8 Heap Space Breakdown
Periodic sampling of v8.getHeapSpaceStatistics() provides per-space memory usage:
flowchart TB
subgraph HS["V8 Heap Spaces"]
N["new_space: 16 MB total, 8 MB used<br/>Purpose: New objects (GC: scavenge)"]
O["old_space: 900 MB total, 780 MB used<br/>Purpose: Survived objects"]
C["code_space: 12 MB total, 10 MB used<br/>Purpose: Compiled code"]
L["large_object: 45 MB total, 40 MB used<br/>Purpose: Objects > 512 KB"]
end
N --> O --> C --> L
This is essential for distinguishing object leaks (old_space growing) from code cache growth (code_space growing) — different causes, different fixes.
Layer 3: CPU Profiling (diagnostics-profiler module)
The diagnostics-profiler module automatically captures V8 CPU profiles for requests that exceed the slow-request threshold.
flowchart TB
RS["Request starts<br/>Timer begins"]
TH["Duration exceeds threshold"]
PR["Profiler activates<br/>Capture V8 CPU profile"]
RC["Request completes<br/>Profile saved as .cpuprofile"]
DEV["Load in Chrome DevTools<br/>Flame chart analysis"]
RS --> TH --> PR --> RC --> DEV
subgraph FL["Example Flame Chart Breakdown"]
F1["SSR renderer: 45% CPU time"]
F2["GraphQL response parsing: 30%"]
F3["HTML serialization: 15%"]
F4["Other: 10%"]
end
DEV --> FL
Profiles are in the standard V8 format, which Chrome DevTools renders as a flame chart, showing exactly which functions consumed CPU time.
The Unified Picture: From Symptom to Root Cause
When a slow request occurs, all layers fire in concert — traces, logs, and Node diagnostics:
flowchart TB
subgraph L1["Layer 1 (diagnostics)"]
L1a["Records duration: 2,300 ms"]
L1b["Emits SlowRequest event"]
L1c["Updates pattern detection"]
end
subgraph L2["Layer 2 (diagnostics-heap)"]
L2a["Records memory delta: +45 MB"]
L2b["Checks for leak pattern"]
L2c["If heap > threshold → auto heap dump"]
end
subgraph L3["Layer 3 (diagnostics-profiler)"]
L3a["Captures .cpuprofile"]
L3b["Shows 65% time in CMS API response parsing"]
end
subgraph OBS["Observability backend"]
O1["End-to-end trace correlates:"]
O2["Nginx span"]
O3["Nuxt request + GraphQL dependencies"]
O4["API calls + SQL query"]
O5["Custom logs tagged with transaction ID"]
O6["SlowRequestPatterns + GC + leak signals"]
end
L1 --> OBS
L2 --> OBS
L3 --> OBS
You can move from:
- An alert: “P95 for /checkout is 2.3s”
- To the trace: “Most time is in the CMS subgraph”
- To logs: “Price mismatch warnings and retries”
- To process-level data: “Major GC pauses plus heap growth”
- To artifacts: .heapsnapshot and .cpuprofile for offline analysis
All within a single, correlated observability fabric.
Capacity Planning Endpoint
The diagnostics module exposes a /api/__profiler/memory-capacity endpoint that calculates the theoretical memory requirement:
flowchart TB
IN["Inputs:<br/>Baseline = 200 MB<br/>Requests/sec = 10<br/>Avg RT = 150 ms (0.15 s)<br/>Memory/req = 35 MB"]
CONC["Concurrent requests = 10 × 0.15 = 1.5"]
PEAK["Peak memory = 200 + (1.5 × 35) = 252.5 MB"]
SAFETY["With 3× safety factor = 757.5 MB"]
CFG["Set --max-old-space-size ≥ 768 MB"]
IN --> CONC --> PEAK --> SAFETY --> CFG
This directly informs the V8 heap cap and container memory allocation, bridging runtime diagnostics with deployment configuration.
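The calculation in the diagram is an application of Little's law (concurrency = arrival rate × time in system). A sketch of the arithmetic follows; the function name and the rounding to a multiple of 64 MB are assumptions, chosen so that the example inputs reproduce the 768 MB figure.

```typescript
interface CapacityInput {
  baselineMB: number;
  requestsPerSecond: number;
  avgResponseSeconds: number;
  memoryPerRequestMB: number;
  safetyFactor: number;
}

interface CapacityEstimate {
  concurrentRequests: number;
  peakMB: number;
  withSafetyMB: number;
  suggestedHeapCapMB: number;
}

function estimateMemoryCapacity(input: CapacityInput, roundToMB: number = 64): CapacityEstimate {
  // Little's law: average in-flight requests = arrival rate × average time in system.
  const concurrentRequests = input.requestsPerSecond * input.avgResponseSeconds;
  const peakMB = input.baselineMB + concurrentRequests * input.memoryPerRequestMB;
  const withSafetyMB = peakMB * input.safetyFactor;
  // Assumed convention: round the cap up to the next multiple of `roundToMB`.
  const suggestedHeapCapMB = Math.ceil(withSafetyMB / roundToMB) * roundToMB;
  return { concurrentRequests, peakMB, withSafetyMB, suggestedHeapCapMB };
}
```

Because the estimate uses the average response time, slow outliers push real concurrency above the computed value, which is part of what the safety factor absorbs.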
Lessons Learned Across the Stack
Distributed tracing is not optional in a multi-container architecture
Without trace correlation, debugging a slow request across four or more containers means combing through isolated log streams and aligning timestamps by hand. With W3C Trace Context, one trace ID tells the whole story. Setup cost: a few hours. Debugging savings: ongoing.
Custom dependency events are worth the effort
Out-of-the-box instrumentation knows about HTTP calls and Redis commands but has no idea that a specific call is “a GraphQL query to the CMS subgraph for page-by-path.” Custom events supply that semantic meaning — you can ask for “all slow CMS page queries” instead of “all slow HTTP calls to this URL.”
Separate the telemetry environment from the application environment
Using separate observability instances for test and production stops test noise from polluting production dashboards. Feature branches can report into the test instance.
Layer 3 catches what SDK instrumentation misses
SDK instrumentation covers what happens inside application processes. Container and Node-level metrics capture everything around them — Redis memory growth, restarts, OOM kills, GC pauses. Without this layer, Redis running out of memory or Node leaks are invisible until things start failing.
Per-module log levels are essential at scale
With 35+ modules, a single global log level is useless because enabling debug generates thousands of messages per second. Per-module levels let teams zoom in on the area they care about without drowning in noise.
Runtime control changes how production issues are debugged
When enabling debug logging requires a deployment, teams either leave it on permanently or never enable it. Runtime controls turn it into a normal tool: enable, investigate, disable.
Structured data beats formatted strings
log.info('Item added', { productId: 'abc', quantity: 2 }) is queryable: “show all items with quantity > 5.”
console.log('Item abc added, quantity: 2') needs regex parsing and still breaks when the format changes. The extra effort to log structured data pays off every time it needs to be analyzed.
Pattern detection beats single-event alerts
Single slow-request alerts create noise and fatigue. A pattern like “73% of /checkout requests are slow” is actionable. It tells you exactly where to investigate.
Automatic heap dumps are worth the disk space
When a leak is detected in production, reproducing it locally is often the hardest part. Automatic heap dumps capture the heap state at the moment of detection — no reproduction required. A single snapshot can save days of debugging.
Munir Husseini is a software architect specializing in full-stack TypeScript, .NET, and cloud-native architectures.