Skip to content

Conversation

@prince286
Copy link

This Pull Request implements two key fault tolerance features within MirrorMaker's MirrorSourceTask to enhance data integrity in mission-critical deployments.

The changes address the risk of silent data loss (truncation) and service disruption (topic reset).

1. Log Truncation Detection (Fail-Fast)

Problem: Kafka's retention policies can delete log segments, causing MM2's consumer to lose its expected offset and silently skip data.
Solution (in MirrorSourceTask.java):
An in-memory map tracks the last processed offset for each partition. If the newly polled offset is greater than the last processed offset + 1, a data gap is confirmed.

  • Action: Throws a ConnectException with a CRITICAL log message, forcing a Fail-Fast stop and preventing further silent data loss.

2. Graceful Topic Reset Handling (Auto-Recovery)

Problem: Deleting and recreating the source topic resets its offset to 0. MM2's stored offset is now invalid.
Solution (in MirrorSourceTask.java):
The logic detects if the newly polled offset is less than the last processed offset (indicating a log reset/topic recreation).

  • Action: Logs an AUTO-RECOVERED event but does not throw an exception. The consumer automatically resets to the earliest offset (0) based on Connect's default behavior, allowing replication to resume immediately.

3. Environment Fix (MirrorUtils.java)

Problem: Single-node Connect deployments fail due to Kafka 4.0.0's strict default replication factor of 3 for internal topics.
Solution (in MirrorUtils.java):
The createCompactedTopic method is modified to explicitly set replicationFactor = 1, ensuring compatibility with single-broker testing environments.

FrankYang0529 and others added 30 commits January 15, 2025 16:47
apache#18549)

Reviewers: Ismael Juma <ismael@juma.me.uk>, Gaurav Narula <gaurav_narula2@apple.com>, TengYao Chi <kitingiao@gmail.com>
…pache#18543)

Reviewers: Bruno Cadonna <bruno@confluent.io>, Alieh Saeedi <asaeedi@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>
…ly (apache#18490)

RocksDBTimeOrderedKeyValueBuffer is not initialize with serdes provides
via Joined, but always uses serdes from StreamsConfig.

Reviewers: Bill Bejeck <bill@confluent.io>
…n without records (apache#18448)

Fix the issue where producer.commitTransaction under transaction version 2 throws error if no partition or offset is added to transaction. The solution is to avoid sending the endTxnRequest unless producer.send or producer.sendOffsetsToTransaction is triggered.

Reviewers: Justine Olshan <jolshan@confluent.io>
Reviewers: Bill Bejeck <bill@confluent.io>
A prior commit introduced checking for the version of a node related to move to log4j2 but it was causing an error
AttributeError("'ClusterNode' object has no attribute 'version'") This PR uses the get_version method from version.py which checks if the Node has a version attribute preventing an error.

Reviewers: Matthias Sax <mjsax@apache.org>
Removed Optional for SharePartitionManager and ClientMetricsManager as zookeeper code is being removed. Also removed asScala and asJava conversion in KafkaApis.handleListClientMetricsResources, moved to java stream.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
…fter removing zookeeper (apache#18365)

This patch introduces a new page to document the configs and metrics that have been removed in the transition to 4.0. While these removed items lack a formal deprecation cycle as they are part of KIP-500, KIP-500 itself does not provide an exhaustive list of all impacted configs and metrics. Therefore, this new page aims to assist Kafka users in understanding the specific configs and metrics that have been removed in the 4.0 release.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The PR removes dependency of server module on share-coordinator, rather it should be other way. Moved the ShareCoordinatorConfig class from server to share-coordinator.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
…18414)

In 4.0, there is no ZK mode and both of these configs are required in kraft mode.

Reviewers: Ismael Juma <ismael@juma.me.uk>
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Divij Vaidya <diviv@amazon.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
…EFAULT and NUM_RECOVERY_THREADS_PER_DATA_DIR_CONFIG (apache#18106)

Reviewers: Divij Vaidya <diviv@amazon.com>
…ns (apache#18568)


Reviewers: Mickael Maison <mickael.maison@gmail.com>
…rtitionsMetadata, ZkConfigRepository, DelayedDeleteTopics (apache#18574)


Reviewers: Mickael Maison <mickael.maison@gmail.com>
…rtitionsAssignedCallback (apache#18515)

Reviewers: Lianet Magrans <lmagrans@confluent.io>
Reviewers: Andrew Schofield <aschofield@confluent.io>
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>
…pache#18565)

Since the example.com DNS lookup changed the second time within one
year, we rewrote the unit tests for ClientUtils so that they do
not make a real DNS lookup to the outside but use mocks.

Reviewers: PoAn Yang <payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>, Lianet Magrans <lmagrans@confluent.io>
Remove KafkaController and related unused references:

* ControllerChannelContext
* ControllerChannelManager
* ControllerEventManager
* ControllerState
* PartitionStateMachine
* ReplicaStateMachine
* TopicDeletionManager
* ZkBrokerEpochManager

Reviewers: Ismael Juma <ismael@juma.me.uk>
…#18406)

Add some logs when offline/online happens.

Reviewers: David Jacot <djacot@confluent.io>
cmccabe and others added 10 commits March 7, 2025 13:58
In "Upgrading to 4.0.0 from any version 0.8.x through 3.9.x" section, we
directly give instructions about [Upgrading to KRaft-based
clusters](https://kafka.apache.org/documentation/#upgrade_390_kraft),
but there might still be some users under ZK cluster before upgrading to
v4.0.0. We need to make it clear that they need to upgrade to KRaft mode
first before upgrading to v4.0.0 in "Upgrading to 4.0.0 from any version
0.8.x through 3.9.x" section.

Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
)

Use "incompatible" instead of an empty cell in Kafka Streams broker
compatibility docs. Also update compatibility matrix for 4.0.0 release.

Reviewers: Matthias J. Sax <matthias@confluent.io>
Add new section for Kafka 4.0 compatibility metrics for user, so user
can check server and client in this section.

Reviewers: Divij Vaidya <diviv@amazon.com>, Matthias J. Sax <matthias@confluent.io>
This patch adds a section about upgrading clients to the upgrade notes.

Reviewers: Ismael Juma <ismael@juma.me.uk>, David Jacot <djacot@confluent.io>
Skip kraft.version when applying FeatureLevelRecord records. The kraft.version is stored as control records and not as metadata records. This solution has the benefits of removing from snapshots any FeatureLevelRecord for kraft.version that was incorrectly written to the log and allows ApiVersions to report the correct finalized kraft.version.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
…ion (apache#19164)

Fixes two issues:
 - only commit TX if no revoked tasks need to be committed
 - commit revoked tasks after punctuation triggered

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Anna Sophie Blee-Goldman <sophie@responsive.dev>, Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bill@confluent.io>
@github-actions github-actions bot added triage PRs from the community streams core Kafka Broker producer consumer tools connect performance kraft mirror-maker-2 dependencies Pull requests that update a dependency file storage Pull requests that target the storage module tiered-storage Related to the Tiered Storage feature KIP-932 Queues for Kafka build Gradle build or GitHub Actions docker Official Docker image generator RPC and Record code generator transactions Transactions and EOS clients group-coordinator labels Jan 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

build Gradle build or GitHub Actions clients connect consumer core Kafka Broker dependencies Pull requests that update a dependency file docker Official Docker image generator RPC and Record code generator group-coordinator KIP-932 Queues for Kafka kraft mirror-maker-2 performance producer storage Pull requests that target the storage module streams tiered-storage Related to the Tiered Storage feature tools transactions Transactions and EOS triage PRs from the community

Projects

None yet

Development

Successfully merging this pull request may close these issues.