HDDS-10887. Implement a basic Merkle Tree Manager.#6778
HDDS-10887. Implement a basic Merkle Tree Manager.#6778kerneltime merged 12 commits intoapache:HDDS-10239-container-reconciliationfrom
Conversation
* HDDS-10239-container-reconciliation: HDDS-10372. SCM and Datanode communication for reconciliation (apache#6506) HDDS-10239. Storage Container Reconciliation. (apache#6121)
...rvice/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumManager.java
Outdated
Show resolved
Hide resolved
...rvice/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumManager.java
Outdated
Show resolved
Hide resolved
...rvice/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumManager.java
Show resolved
Hide resolved
| return tree; | ||
| } | ||
|
|
||
| private ContainerProtos.ContainerChecksumInfo readFile() throws IOException { |
There was a problem hiding this comment.
Why not move this into the ChecksumTreeManager?
There was a problem hiding this comment.
The ContainerChecksumInfo file/proto is not meant to be interacted with directly. Updates and modifications to that file should go through the manager. This would not be clear if there was a public method in the manager to read the file directly.
Making the method public with @VisibleForTesting is also not ideal because the base directory used in each method is different so that would add an extra parameter.
There was a problem hiding this comment.
We are already exposing writeContainerDataTree(), Can we do something like that to test the read() functionality? I understand both writeContainerDataTree() and markBlocksAsDeleted() implicitly test read(). I have made this change as a part of #6864 to capture read latency. Let me know if it works in this case.
There was a problem hiding this comment.
Discussed offline:
- The read and write metrics are separate so the can still be tested even if the only public methods exposed are doing read-modify-write.
- GRPC API to move the tree between datanodes will operate at the file level and will not deserialize the protobuf.
- If a debug method is added to the client and we need to render the trees we can add such a method to hdds-common or similar package. That could also be invoked from this class if necessary.
- This manager is designed to run in the datanode over a pool of containers and will not be suitable for client use anyways.
| containerChecksumBuffer.putLong(blockTreeProto.getBlockChecksum()); | ||
| } | ||
| containerChecksumBuffer.flip(); | ||
| checksumImpl.update(containerChecksumBuffer); |
There was a problem hiding this comment.
It should be possible to take the longs of the blocks and just call update with the bits shifted to calculate the checksum.
There was a problem hiding this comment.
Yeah I can update this to use Java's CRC32 which has an update(int) method instead of our own ByteBuffer based implementation.
There was a problem hiding this comment.
After some offline discussion we are going to handle any optimizations in checksum computation in HDDS-11077
...er-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerMerkleTree.java
Show resolved
Hide resolved
|
Will give it one more round tonight. Overall the PR looks good, most of the changes can be done in a follow up PR. |
|
Going to push a few more commits before this is ready for full CI. Converting to draft to save CI time. |
| return tree; | ||
| } | ||
|
|
||
| private ContainerProtos.ContainerChecksumInfo readFile() throws IOException { |
There was a problem hiding this comment.
We are already exposing writeContainerDataTree(), Can we do something like that to test the read() functionality? I understand both writeContainerDataTree() and markBlocksAsDeleted() implicitly test read(). I have made this change as a part of #6864 to capture read latency. Let me know if it works in this case.
hadoop-hdds/interface-client/src/main/proto/DatanodeClientProtocol.proto
Outdated
Show resolved
Hide resolved
…-delete * HDDS-10239-container-reconciliation: HDDS-10887. Implement a basic Merkle Tree Manager. (apache#6778) HDDS-10923. Container Scanner should still scan unhealthy containers. (apache#6809) Conflicts: hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/checksum/ContainerChecksumTreeManager.java hadoop-hdds/container-service/src/test/java/org/apache/hadoop/ozone/container/checksum/TestContainerChecksumTreeManager.java
What changes were proposed in this pull request?
Implement a central manager to coordinate reads and writes to datanode container merkle tree/checksum information. This patch defines the bare minimum API and protobuf structures needed for reconciliation:
Anything more advanced can be added with its implementation when or if it is needed. This includes the ability to diff two merkle trees, which is stubbed out here and planned for HDDS-10928.
What is the link to the Apache JIRA
HDDS-10887
How was this patch tested?
Unit tests added.