Skip to content

ref(kafka): Deduplicate kafka error logging#5811

Merged
jjbayer merged 11 commits intomasterfrom
ref/dedup-kafka-error
Apr 15, 2026
Merged

ref(kafka): Deduplicate kafka error logging#5811
jjbayer merged 11 commits intomasterfrom
ref/dedup-kafka-error

Conversation

@jjbayer
Copy link
Copy Markdown
Member

@jjbayer jjbayer commented Apr 10, 2026

Follow-up to #5607:

We are currently getting two sentry errors for producer failures:

  1. "failed to produce to Kafka" - This is specific to the store service and also catches configuration errors such as missing topics.
  2. "error sending kafka message" - This is generic in relay_kafka, but does not report config errors.

This PR removes the public send function on the KafkaClient so there is only one entry point send_kafka_message, and logs all errors there. For this, the outcome producer also needs to implement relay_kafka::Message.

return Err(ClientError::MissingTopic);
};

relay_log::configure_scope(|s| s.set_tag("topic", topic_name));
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error is now logged higher up in the call stack, so we set the topic tag here as soon as it is known.

Comment thread relay-kafka/src/producer/mod.rs Outdated
@jjbayer jjbayer marked this pull request as ready for review April 10, 2026 13:14
@jjbayer jjbayer requested a review from a team as a code owner April 10, 2026 13:14
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

  • ✅ Fixed: Test may fail in debug builds due to schema validation ordering
    • I changed the test message serialization to MsgPack so debug-only JSON schema validation is skipped and the assertion reliably reaches InvalidTopicName.
  • ✅ Fixed: Outcome errors now produce duplicate Sentry error reports
    • I removed the redundant outcome-service error logs at the Kafka call sites so failures are logged only once by the new send_message wrapper.

Create PR

Or push these changes by commenting:

@cursor push 161f1048d0
Preview (161f1048d0)
diff --git a/relay-server/src/services/outcome.rs b/relay-server/src/services/outcome.rs
--- a/relay-server/src/services/outcome.rs
+++ b/relay-server/src/services/outcome.rs
@@ -1224,9 +1224,7 @@
             Self::Kafka(kafka_producer) => {
                 send_outcome_metric(&message, "kafka");
                 let raw_message = TrackRawOutcome::from_outcome(message, config);
-                if let Err(error) = self.send_kafka_message(kafka_producer, raw_message) {
-                    relay_log::error!(error = &error as &dyn Error, "failed to produce outcome");
-                }
+                let _ = self.send_kafka_message(kafka_producer, raw_message);
             }
             Self::ClientReport(producer) => {
                 send_outcome_metric(&message, "client_report");
@@ -1245,9 +1243,7 @@
             #[cfg(feature = "processing")]
             Self::Kafka(kafka_producer) => {
                 send_outcome_metric(&message, "kafka");
-                if let Err(error) = self.send_kafka_message(kafka_producer, message) {
-                    relay_log::error!(error = &error as &dyn Error, "failed to produce outcome");
-                }
+                let _ = self.send_kafka_message(kafka_producer, message);
             }
             Self::Http(producer) => {
                 send_outcome_metric(&message, "http");

diff --git a/relay-server/src/services/store.rs b/relay-server/src/services/store.rs
--- a/relay-server/src/services/store.rs
+++ b/relay-server/src/services/store.rs
@@ -1889,7 +1889,7 @@
             }
 
             fn serialize(&self) -> Result<SerializationOutput<'_>, ClientError> {
-                Ok(SerializationOutput::Json(Cow::Borrowed(
+                Ok(SerializationOutput::MsgPack(Cow::Borrowed(
                     br#"{"timestamp": "foo", "outcome": 1}"#,
                 )))
             }

This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

Comment thread relay-server/src/services/store.rs
Comment thread relay-kafka/src/producer/mod.rs Outdated
Comment thread relay-server/src/services/outcome.rs Outdated
Ok(_) => Ok(()),
Err(kafka_error) => Err(OutcomeError::SendFailed(kafka_error)),
}
let _ = producer.client.send_message(topic, &message); // logs error internally
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't send_message then just not return an error?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handle_store_envelope still uses the result to reject the envelope. But I agree that logging and returning an error is a bit of an anti-pattern. Maybe I should move the error reporting to the top of the call stack instead.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[...] reporting to the top of the call stack instead

That makes sense to me, this is the place that can determine severity and has most context.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

jjbayer and others added 2 commits April 14, 2026 14:11
Move error logging from `KafkaClient::send_message` to the call sites
where errors are actually handled. This lets callers determine severity
and add context. The topic tag via `configure_scope` stays at the
producer level where the topic name is resolved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread relay-server/src/services/store.rs Outdated
Comment thread relay-server/src/services/store.rs Outdated
Comment thread relay-kafka/src/config.rs Outdated
Comment thread relay-server/src/services/store.rs
Comment on lines +217 to +218
relay_log::configure_scope(|s| s.set_tag("topic", topic_name));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The configure_scope call for Kafka producers now pollutes the Sentry scope with a topic tag on every message, leading to misleading context in unrelated error logs.
Severity: LOW

Suggested Fix

Move the relay_log::configure_scope call so that it only executes on error paths, similar to how it was handled previously. This ensures the topic tag is only added to the scope when a Kafka-related error is actually being logged, preventing scope pollution for successful operations.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: relay-kafka/src/producer/mod.rs#L217-L218

Potential issue: The `configure_scope` function is now called unconditionally for every
Kafka message being produced, setting a `topic` tag on the thread-local Sentry scope.
This happens even for successful messages. If a subsequent, unrelated error occurs on
the same thread, it will be logged with this potentially misleading `topic` tag from the
last successful Kafka operation. This pollutes the scope and can make debugging more
difficult by providing incorrect context in Sentry error reports.

Comment thread relay-server/src/services/store.rs
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit fa94546. Configure here.

Comment thread relay-server/src/services/store.rs Outdated
Comment thread relay-server/src/services/store.rs Outdated
Comment on lines +1184 to +1185
relay_log::with_scope(
|_| {},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of the empty scope config here, does it reset the scope somehow or is that just a leftover?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to keep the topic name on the error, but it is not known at this point. This call creates a new scope without any configuration. configure_scope is then called in produce() further down the call stack to set the topic name.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see, you need a new scope, to not "dirty" the surrounding one with the inner configure_scope.


fn handle_message(&self, message: Store) {
// relay_kafka configures the scope
relay_log::with_scope(|_| {}, || self.handle_message_inner(message))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dito

tags.project_key = %scoping.project_key,
"failed to store envelope"
);
Err(Managed::with_meta_from(&envelope, ()).reject_err(error))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a way to get a Rejected without actually rejecting anything?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can also make the function return nothing, but I like the signal that something was actually rejected.

@jjbayer jjbayer added this pull request to the merge queue Apr 15, 2026
Merged via the queue into master with commit d79c9a4 Apr 15, 2026
30 checks passed
@jjbayer jjbayer deleted the ref/dedup-kafka-error branch April 15, 2026 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants