HDDS-10408. NPE causes OM crash in Snapshot Purge request. #6250
swamirishi merged 1 commit into apache:master
Thanks for the quick fix @aswinshakil.
LGTM.
Please create a follow-up task, if needed, as discussed offline, to use the cache when iterating over the snapshot info table in the snapshot deleting service.
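To illustrate why consulting the cache during iteration could reduce resent purge requests, here is a minimal sketch. It is not the Ozone table API: `CachedTableSketch`, `putFlushed`, `deleteInCache`, `keysFromDbOnly` and `keysWithCache` are hypothetical names, and the cache is modelled as a plain map in which an empty `Optional` marks a deletion that has not yet been flushed to the DB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical stand-in for a table with a write-back cache; not the real Ozone Table.
class CachedTableSketch {
  // Entries already flushed to the DB.
  private final Map<String, String> db = new TreeMap<>();
  // Pending changes; Optional.empty() marks a delete that is not flushed yet.
  private final Map<String, Optional<String>> cache = new TreeMap<>();

  void putFlushed(String key, String value) {
    db.put(key, value);
  }

  void deleteInCache(String key) {
    cache.put(key, Optional.empty());
  }

  // Iterating over the DB alone still returns keys that were already purged in the cache.
  List<String> keysFromDbOnly() {
    return new ArrayList<>(db.keySet());
  }

  // Consulting the cache skips keys with a pending delete.
  // This sketch only handles deletions, not entries that exist only in the cache.
  List<String> keysWithCache() {
    List<String> keys = new ArrayList<>();
    for (String key : db.keySet()) {
      Optional<String> cached = cache.get(key);
      if (cached != null && !cached.isPresent()) {
        continue;
      }
      keys.add(key);
    }
    return keys;
  }
}
```

In this model, a service that iterates with `keysWithCache` would not pick up a snapshot whose purge has been applied to the cache but not yet flushed, so it would resend fewer requests.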
swamirishi
left a comment
@aswinshakil thanks for the patch. Overall the patch looks good to me. Could you add a test case for it that recreates the duplicate request scenario?
    SnapshotInfo fromSnapshot = omMetadataManager.getSnapshotInfoTable()
        .get(snapTableKey);

    if (fromSnapshot == null) {
Is it possible to create a unit test case for this in TestSnapshotDeletingService?
On further thought, we can still end up with duplicate entries even when we check the cache, since the first request could still be in the pre-execute stage when the second request is sent. The first request's validateAndUpdateCache method could be executed while the second request is in flight. We need to make this operation idempotent; there is no other way of going about it.
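A minimal sketch of the in-flight case, under stated assumptions: `InFlightPurgeSketch`, `preExecute` and `apply` are hypothetical stand-ins for the two request stages, and the snapshot info table is modelled as a plain map. Both requests pass the pre-execute stage before either is applied, so only an idempotent apply step keeps the second one harmless.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of two purge requests for the same snapshot; not the real OM request classes.
class InFlightPurgeSketch {
  // Stand-in for the snapshot info table: snapshot table key -> snapshot info.
  private final Map<String, String> snapshotInfoTable = new HashMap<>();

  void addSnapshot(String snapTableKey) {
    snapshotInfoTable.put(snapTableKey, "info:" + snapTableKey);
  }

  // Stand-in for the pre-execute stage: builds the request without changing any state.
  String preExecute(String snapTableKey) {
    return snapTableKey;
  }

  // Stand-in for the apply stage. Idempotent: applying the same request again is a no-op.
  // Returns true only if this call actually purged the snapshot.
  boolean apply(String request) {
    return snapshotInfoTable.remove(request) != null;
  }

  int size() {
    return snapshotInfoTable.size();
  }
}
```

Here the second `apply` simply returns `false` and leaves the table unchanged, whichever of the two requests happens to be applied first.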
@aswinshakil Please create a follow-up task for the test case and for iterating through the snapshot info table cache to reduce the occurrence. @hemantk-12 thanks for the review.
(cherry picked from commit 6dfd7d4)
…apache#6250) (cherry picked from commit 6dfd7d4) Change-Id: I1d602ad904b48e342b7aeb6d5aa925232e03486f
What changes were proposed in this pull request?
When a previous snapshot purge request has already purged the snapshot, there is a race condition between SnapshotDeletingService and OMSnapshotPurgeRequest in which the same request is resent. This causes an NPE when getting the snapshotInfo from the snapshotInfoTable. We can ignore this request because the snapshot is already purged.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10408
How was this patch tested?
Existing tests.
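The resent-request scenario from the description could be recreated in isolation along the following lines. This is only a sketch under stated assumptions: `SnapshotPurgeSketch` and its methods are hypothetical, the snapshot info table is modelled as a map from snapshot table key to snapshot name, and the real request does considerably more work per snapshot.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the purge handling; not the real OMSnapshotPurgeRequest.
class SnapshotPurgeSketch {
  // Stand-in for the snapshot info table: snapshot table key -> snapshot name.
  private final Map<String, String> snapshotInfoTable = new HashMap<>();

  void addSnapshot(String snapTableKey, String snapshotName) {
    snapshotInfoTable.put(snapTableKey, snapshotName);
  }

  // Purges the given keys and returns the names of the snapshots purged by this call.
  List<String> purge(List<String> snapTableKeys) {
    List<String> purged = new ArrayList<>();
    for (String snapTableKey : snapTableKeys) {
      String fromSnapshot = snapshotInfoTable.get(snapTableKey);
      if (fromSnapshot == null) {
        // Already purged by an earlier request: skip it instead of dereferencing null.
        continue;
      }
      purged.add(fromSnapshot);
      snapshotInfoTable.remove(snapTableKey);
    }
    return purged;
  }
}
```

Resending a key that was already purged leaves `purge` free to continue with the remaining keys in the same request.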