CritterWatch 0.8.0: projection detail "Dead Letter Events" section assumes a skip-and-dead-letter policy; misleading for stop-on-error apps #3

@erdtsieck

Summary

The projection detail page has a Dead Letter Events section that reads:

Projection failures for event subscriptions are stored in Wolverine's dead letter queue. View dead letters filtered by this projection's event types. [View Related Dead Letters]

This assumes the monitored application uses a skip-and-dead-letter projection error policy (failing events are routed to Wolverine's dead-letter queue and the shard keeps going). For applications that use the stop-on-error policy instead, this section is misleading.

Context

In Marten/JasperFx, the async daemon's default behavior (with no Projections.Errors.SkipApplyErrors or dead-letter routing configured) is to stop the shard on an unhandled projection apply error, after retries. It does not write per-event entries to a projection dead-letter queue. Our application relies on exactly this: we stop all projections on error rather than skipping and dead-lettering individual events.

Under that policy:

  • A projection failure halts the shard; there are no projection-level dead letters.
  • "View Related Dead Letters" returns nothing relevant for these projections.
  • The section's wording implies failures are individually recoverable via a DLQ, when in reality the shard simply stops and requires a fix followed by a restart or rebuild.

Expected behavior

The projection detail page should reflect the application's actual projection error-handling policy:

  • Only surface the Dead Letter Events section when skip-and-dead-letter (DLQ routing) is actually in use for that projection/store.
  • For stop-on-error projections, either hide the section or replace it with an indication that the shard halts on error (and surface the stopping exception / how to restart/rebuild) instead of pointing at an empty DLQ.

If the policy isn't available in the telemetry today, exposing it (per projection / per store) would also help, since it drives how operators interpret a stopped shard.
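To make the request concrete, here is a minimal sketch of the rendering decision described above. Every name in it (`ProjectionErrorPolicy`, `ProjectionErrorInfo`, `errorSectionFor`, the `stoppingException` field) is hypothetical and chosen for illustration; it does not reflect CritterWatch's actual telemetry model or UI code, and it assumes the policy can be reported per projection.

```typescript
// Hypothetical model: CritterWatch's real telemetry shape is not shown in this issue.
type ProjectionErrorPolicy = "skip-and-dead-letter" | "stop-on-error" | "unknown";

interface ProjectionErrorInfo {
  policy: ProjectionErrorPolicy;
  // Exception that halted the shard, if the monitored app reports one.
  stoppingException?: string;
}

type ErrorSection =
  | { kind: "dead-letters"; text: string }
  | { kind: "halted-shard"; text: string }
  | { kind: "hidden" };

// Decide which section the projection detail page should render.
function errorSectionFor(info: ProjectionErrorInfo): ErrorSection {
  switch (info.policy) {
    case "skip-and-dead-letter":
      // Only here does a link to related dead letters make sense.
      return {
        kind: "dead-letters",
        text: "Failing events are skipped and dead-lettered. View related dead letters.",
      };
    case "stop-on-error": {
      // The shard halts, so surface the stopping exception instead of a DLQ link.
      const cause = info.stoppingException ?? "no exception reported";
      return {
        kind: "halted-shard",
        text: `The shard halts on error (${cause}). Fix the cause, then restart or rebuild.`,
      };
    }
    default:
      // Policy not available in telemetry: hide the section rather than guess.
      return { kind: "hidden" };
  }
}
```

Hiding the section when the policy is unknown is one possible choice; showing a neutral note would work as well. The point is only that the Dead Letter Events text should not be rendered unconditionally.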

Environment

  • CritterWatch 0.8.0, WolverineFx 6.5.1, Marten 9.6.0, JasperFx 2.8.2, .NET 10
  • Single-node host, enableClusterPartitioning: false
  • Marten async daemon with default (stop-on-error) projection error handling

Related: #2 (multi-store high-water-mark issue) — different feature area but found in the same investigation.
