feat(observability): OpenTelemetry tracing — manual spans + opt-in NodeSDK #197

Merged: erichare merged 1 commit into `main` from `feat/otel-tracing` on May 6, 2026

@erichare commented May 6, 2026

Summary

Third in the post-architecture-review stack (#195 supply-chain, #196 CSRF). Wires `@opentelemetry/api` through the runtime so every request gets a SERVER span, with an opt-in NodeSDK + OTLP HTTP exporter for deployments that have a collector.

Two layers:

  1. Manual server spans (always on). New `requestTracing` middleware in `lib/tracing.ts` extracts the inbound W3C trace context, opens a SERVER span parented under it, and records:

    • `http.request.method`, `url.path`, `http.route`
    • `http.response.status_code`
    • `wb.request_id` (mirrored from the existing `requestId` middleware so logs and spans correlate)
    • exception event + ERROR status when Hono's `onError` fires
    Without an SDK registered, the spans are no-ops, so the wiring is essentially free.
  2. NodeSDK + auto-instrumentation (opt-in). New `runtime.tracing.{enabled,serviceName,exporterUrl}` config block; when `enabled: true`, `initOtelFromConfig` lazy-imports the SDK, OTLP HTTP exporter, and standard auto-instrumentations. `OTEL_*` env vars (sampler, headers, endpoint) are honored verbatim.
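As a rough illustration of layer 1, the sketch below shows how the recorded attributes and the error status could be derived from a finished request. It is not the code in `lib/tracing.ts`: `RequestFacts`, `ServerSpanSummary`, and `summarizeServerSpan` are hypothetical names, and both the `METHOD route` span name and the "5xx or thrown error means ERROR" rule are assumptions based on the usual HTTP semantic conventions, not something this PR states.

```typescript
// Hypothetical helper, not part of the PR: derives the span name,
// attributes, and status described in the summary from plain request data.
type SketchStatus = "UNSET" | "ERROR";

interface RequestFacts {
  method: string;
  path: string;
  route?: string; // matched route pattern, if the router resolved one
  requestId?: string; // value set by the existing requestId middleware
}

interface ServerSpanSummary {
  name: string;
  attributes: Record<string, string | number>;
  status: SketchStatus;
}

function summarizeServerSpan(
  req: RequestFacts,
  statusCode: number,
  error?: Error,
): ServerSpanSummary {
  const attributes: Record<string, string | number> = {
    "http.request.method": req.method,
    "url.path": req.path,
    "http.response.status_code": statusCode,
  };
  // Optional attributes are only set when the value is known.
  if (req.route !== undefined) attributes["http.route"] = req.route;
  if (req.requestId !== undefined) attributes["wb.request_id"] = req.requestId;

  // Assumption: name the span "METHOD route" when a route matched.
  const name = req.route !== undefined ? `${req.method} ${req.route}` : req.method;
  // Assumption: a thrown error or a 5xx response marks the span as ERROR.
  const status: SketchStatus =
    error !== undefined || statusCode >= 500 ? "ERROR" : "UNSET";

  return { name, attributes, status };
}
```

In the real middleware these values would presumably be written onto a span obtained from `@opentelemetry/api` rather than returned as a plain object.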

For full HTTP / fetch / pino auto-instrumentation, operators preload the SDK at process launch via the new `lib/tracing-preload.ts` module, so the instrumentations can patch those modules before the application imports them:

```sh
node --import ./dist/lib/tracing-preload.js dist/root.js
```

`root.ts` calls `otel.shutdown()` during the existing graceful drain so in-flight spans flush before exit.
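One way to picture the flush-on-drain step is a shutdown call bounded by a deadline, so a slow or unreachable collector cannot stall process exit. This is a sketch under assumptions: the PR does not say whether `root.ts` applies a timeout, and `shutdownWithDeadline` and `FlushOutcome` are hypothetical names.

```typescript
// Hypothetical wrapper, not taken from root.ts: waits for a shutdown
// function to finish, but gives up after `deadlineMs` milliseconds.
type FlushOutcome = "flushed" | "timed_out" | "failed";

function shutdownWithDeadline(
  shutdown: () => Promise<void>,
  deadlineMs: number,
): Promise<FlushOutcome> {
  return new Promise<FlushOutcome>((resolve) => {
    // If the deadline fires first, report a timeout; later settlements are ignored.
    const timer = setTimeout(() => resolve("timed_out"), deadlineMs);
    // Promise.resolve().then(...) also captures synchronous throws from `shutdown`.
    Promise.resolve()
      .then(shutdown)
      .then(
        () => {
          clearTimeout(timer);
          resolve("flushed");
        },
        () => {
          clearTimeout(timer);
          resolve("failed");
        },
      );
  });
}
```

A drain routine could call something like `await shutdownWithDeadline(() => otel.shutdown(), 5000)` and log the outcome before exiting; the 5000 ms figure is only an example value.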

Test plan

  • `npm run check` green: lint + typecheck + 1006 runtime tests + 327 web tests + build
  • New `tests/lib/tracing.test.ts`: SERVER span attrs, 5xx error path, inbound traceparent linking
  • All existing tests still pass with the middleware mounted (no-op spans when no SDK is registered)
  • Manual smoke: set `runtime.tracing.enabled: true` in a dev `workbench.yaml`, run a local OTel collector, and verify that spans land
  • Manual smoke: `node --import ./dist/lib/tracing-preload.js` shows HTTP child spans for outbound fetches
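For readers unfamiliar with the header behind the "inbound traceparent linking" check, the sketch below parses a W3C `traceparent` value into its four fields. It is illustrative only: the middleware presumably relies on the OpenTelemetry propagation API rather than hand-rolled parsing, `parseTraceparent` and `ParsedTraceparent` are hypothetical names, and the parser accepts only the four-field layout.

```typescript
// Hypothetical parser for the W3C Trace Context `traceparent` header:
//   version "-" trace-id "-" parent-id "-" trace-flags
// with lowercase hex fields of 2, 32, 16, and 2 characters.
interface ParsedTraceparent {
  version: string;
  traceId: string;
  parentId: string;
  sampled: boolean;
}

const TRACEPARENT_RE = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header: string | undefined): ParsedTraceparent | null {
  if (header === undefined) return null;
  const match = TRACEPARENT_RE.exec(header.trim());
  if (match === null) return null;

  const [, version, traceId, parentId, flags] = match;
  // Version "ff" is invalid, as are all-zero trace and parent ids.
  if (version === "ff") return null;
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;

  // Bit 0 of trace-flags is the "sampled" flag.
  const sampled = (parseInt(flags, 16) & 0x01) === 0x01;
  return { version, traceId, parentId, sampled };
}
```

A new SERVER span linked to the caller would reuse `traceId` and record `parentId` as its parent span id.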

Migration / compat

  • Default behavior unchanged (`runtime.tracing.enabled: false`).
  • Existing `traceparent` propagation in `requestId` is preserved and now feeds the OTel context too.
  • New deps: `@opentelemetry/api`, `@opentelemetry/sdk-node`, `@opentelemetry/auto-instrumentations-node`, `@opentelemetry/exporter-trace-otlp-http`, `@opentelemetry/resources`, `@opentelemetry/semantic-conventions`. The SDK bundle is lazy-imported, so disabled deployments don't pay its load cost.
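The default-off and lazy-import behavior described above can be sketched as an init function that never touches the SDK loader unless tracing is enabled. This is not the real `initOtelFromConfig`: `initTracingSketch`, `TracingConfigSketch`, `TracingHandle`, and the injected `loadSdk` callback are hypothetical, the loader stands in for a dynamic `import()` of the SDK packages, and the `"unknown_service"` fallback is a placeholder assumption.

```typescript
// Hypothetical shapes mirroring the runtime.tracing.{enabled,serviceName,exporterUrl} block.
interface TracingConfigSketch {
  enabled?: boolean;
  serviceName?: string;
  exporterUrl?: string;
}

interface TracingHandle {
  active: boolean;
  shutdown(): Promise<void>;
}

// The loader stands in for `await import(...)` of the SDK and exporter packages.
type SdkStarter = (serviceName: string, exporterUrl: string | undefined) => TracingHandle;
type SdkLoader = () => Promise<SdkStarter>;

const NOOP_HANDLE: TracingHandle = { active: false, shutdown: async () => {} };

async function initTracingSketch(
  cfg: TracingConfigSketch | undefined,
  loadSdk: SdkLoader,
): Promise<TracingHandle> {
  // Disabled (the default): return early so the SDK is never loaded.
  if (cfg?.enabled !== true) return NOOP_HANDLE;

  const start = await loadSdk();
  // Placeholder fallback name; the real default is not stated in the PR.
  return start(cfg.serviceName ?? "unknown_service", cfg.exporterUrl);
}
```

Injecting the loader keeps the sketch testable without the SDK installed; the handle's `shutdown` is what a graceful drain would await.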

…t-in NodeSDK

Wires `@opentelemetry/api` through the runtime so every request gets a
SERVER span, with an opt-in NodeSDK + OTLP HTTP exporter for
deployments that have an OTel collector.

Two layers:

  1. Manual server spans (always on). New `requestTracing` middleware
     in `lib/tracing.ts` extracts the inbound W3C trace context off
     the request headers, opens a SERVER span parented under it, and
     records:
       - http.request.method / url.path / http.route
       - http.response.status_code
       - wb.request_id (mirrored from the existing `requestId`
         middleware so logs and spans correlate)
       - exception event + ERROR status when Hono's onError fires
     Without an SDK registered the spans are no-ops, so the wiring is
     essentially free.

  2. NodeSDK + auto-instrumentation (opt-in). New
     `runtime.tracing.{enabled,serviceName,exporterUrl}` config block;
     when `enabled: true`, `initOtelFromConfig` lazy-imports the SDK,
     OTLP HTTP exporter, and the standard auto-instrumentations bundle.
     Standard `OTEL_*` env vars are honored verbatim (sampler,
     headers, endpoint).

For full HTTP / fetch / pino auto-instrumentation, operators preload
the SDK at process launch via the new `lib/tracing-preload.ts` module:

    node --import ./dist/lib/tracing-preload.js dist/root.js

`root.ts` calls `otel.shutdown()` during the existing graceful drain
so in-flight spans flush before exit.

Tests (`tests/lib/tracing.test.ts`):
  - SERVER span emitted with method + route + status attributes
  - 5xx error path records exception + sets ERROR status
  - Inbound traceparent links the new span to the caller's trace

Docs: `docs/production.md` gains a tracing section covering both
in-config activation and the `--import` preload path.
@erichare erichare merged commit b8ee2bb into main May 6, 2026
13 checks passed
@erichare erichare deleted the feat/otel-tracing branch May 6, 2026 22:56