HDDS-6795: EC: PipelineStateMap#addPipeline should not have precondition checks post db updates (#3453)
Conversation
Do you think we should still have a check here, and at least log if the pipeline does not have enough nodes and skip it, rather than loading it? I guess we should not get a "bad pipeline" here, as it never should be allowed to be added anyway.
I think it makes sense to leave these preconditions as they are. In any case, we are making sure to pass a correct pipeline with other checks.
As this change is required due to what we suspect is a bug in the RackScatterPlacementPolicy, where it returns fewer nodes than expected without throwing an exception, I think we should make a change to the RackScatter policy too, so that it throws an exception rather than returning successfully with fewer nodes than expected. You could add a reference to this Jira in a comment so we can remove it later if we get to the bottom of the suspected bug.
Oh yeah, I remember we talked about this. Let me add that.
Agree with Stephen. Maybe it would be better to check and handle this in that particular placement policy. If the placement policy cannot choose enough datanodes for the new pipeline, an exception should be thrown from it.
Moved the check into PipelineFactory#create
```java
if (nodesRequiredToChoose != chosenNodes.size()) {
  String reason = "Chosen nodes size: " + chosenNodes.size()
      + ", but required nodes to choose: " + nodesRequiredToChoose
      + " do not match.";
  LOG.warn("Placement policy could not choose the enough nodes."
      + " {} Available nodes count: {}, Excluded nodes count: {}",
      reason, totalNodesCount, excludedNodesCount);
  throw new SCMException(reason,
      SCMException.ResultCodes.FAILED_TO_FIND_HEALTHY_NODES);
}
```
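As a self-contained illustration of the check quoted above, the sketch below reproduces the fail-loudly pattern in isolation; the class name, exception type, and method are simplified placeholders, not the actual Ozone API:

```java
import java.util.List;

// Simplified illustration of the node-count check: the placement policy
// should throw instead of silently returning fewer nodes than requested.
// PlacementException stands in for SCMException; names are placeholders.
public class PlacementCheckSketch {
    static class PlacementException extends RuntimeException {
        PlacementException(String reason) {
            super(reason);
        }
    }

    // Fail loudly when the chosen node count does not match the request.
    static List<String> checkChosen(List<String> chosen, int required) {
        if (required != chosen.size()) {
            throw new PlacementException("Chosen nodes size: " + chosen.size()
                + ", but required nodes to choose: " + required
                + " do not match.");
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Only two nodes could be chosen although three were required.
        try {
            checkChosen(List.of("dn1", "dn2"), 3);
        } catch (PlacementException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

With this pattern the caller gets an exception at pipeline-creation time instead of a short pipeline that only fails later.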
I wonder if validateContainerPlacement should verify the number of chosen nodes vs. the required nodes (either in the base class or in this specific subclass).
I assumed that is more about validating the placement with respect to racks. Somehow we are still seeing the issue of getting fewer nodes than requested.
Makes sense, I see the message is referencing racks.
Thanks @adoroszlai for the review
What changes were proposed in this pull request?
Moved the check before the pipeline is created. We should not have sanity checks after adding pipelines into the DB. If a pipeline fails such a post-update sanity check, SCM can crash, and it will never come back up because the same check fails again when the pipelines are loaded from the DB on restart.
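The failure mode described above can be sketched as follows. This is a minimal illustration of validating before persisting, not the actual SCM code: `addPipeline`, the in-memory map standing in for the DB, and the exception type are all simplified placeholders.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: validate a pipeline BEFORE persisting it. A check that runs
// after the DB write turns a bad pipeline into a poison record that
// crashes the process again on every reload. Names are hypothetical.
public class PipelineChecksSketch {
    // Stand-in for the pipeline DB.
    static Map<String, List<String>> db = new HashMap<>();

    static void addPipeline(String id, List<String> nodes, int required) {
        // Sanity check first: a failure here is a clean, recoverable error.
        if (nodes.size() != required) {
            throw new IllegalStateException("Chosen nodes: " + nodes.size()
                + ", required: " + required);
        }
        // Persist only after the check passes, so reload can never
        // encounter a pipeline that fails its own precondition.
        db.put(id, nodes);
    }

    public static void main(String[] args) {
        try {
            addPipeline("p1", List.of("dn1", "dn2"), 3);
        } catch (IllegalStateException e) {
            System.out.println("rejected before persist: " + e.getMessage());
        }
        // The undersized pipeline never reached the DB.
        System.out.println("db size=" + db.size());
    }
}
```

Because the check precedes the write, the DB can only ever contain pipelines that pass it, so restart/reload cannot hit the same failure.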
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-6795
How was this patch tested?
Verified with existing tests.