restores highly available service by keith-turner · Pull Request #6153 · apache/accumulo

keith-turner · 2026-02-25T00:07:45Z

Restored the highly available service in order to support multiple managers. This change should also prevent clients that connect to a non-primary manager from getting stuck. Made all thrift methods declare that they throw ThriftNotActiveServiceException; if a method does not declare this, it will cause a server-side exception. There was a single oneway thrift method that could not throw this, so it would still cause server-side exceptions if it's called. Delaying advertising should prevent this in most cases.
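
To illustrate the idea described above, here is a minimal sketch of a wrapper that rejects calls until the instance is active. It is not the code in this PR: `HaWrapperSketch`, `NotActiveException`, and `DemoService` are hypothetical stand-ins for HighlyAvailableServiceWrapper, ThriftNotActiveServiceException, and a generated Thrift interface, and the use of a `java.lang.reflect.Proxy` is an assumption made for the example.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.function.BooleanSupplier;

public class HaWrapperSketch {

  // Hypothetical stand-in for ThriftNotActiveServiceException.
  public static class NotActiveException extends Exception {
    public NotActiveException(String msg) {
      super(msg);
    }
  }

  // Hypothetical stand-in for a generated Thrift service interface.
  public interface DemoService {
    String ping(String msg) throws NotActiveException;
  }

  // Returns a proxy that rejects every call while isActive reports false and
  // otherwise forwards the call to the delegate.
  public static <T> T wrap(Class<T> iface, T delegate, BooleanSupplier isActive) {
    Object proxy = Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface},
        (p, method, args) -> {
          if (!isActive.getAsBoolean()) {
            // If the interface method does not declare this checked exception,
            // the proxy surfaces it as an UndeclaredThrowableException instead.
            throw new NotActiveException(method.getName() + " called on a non-active instance");
          }
          try {
            return method.invoke(delegate, args);
          } catch (InvocationTargetException e) {
            // Rethrow the delegate's own exception rather than the reflective wrapper.
            throw e.getCause();
          }
        });
    return iface.cast(proxy);
  }
}
```

The comment inside the handler shows why an undeclared exception matters: a dynamic proxy cannot pass a checked exception through a method that does not declare it, so the caller sees a generic failure rather than a signal to retry.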

keith-turner · 2026-02-25T00:09:52Z

Now that the compaction coordinator can throw ThriftNotActiveServiceException, the compactors should retry if they see this exception. I looked at the compactors and it seems they will retry, since all of their calls use RetryableThriftCall.

keith-turner · 2026-02-25T00:11:35Z

server/base/src/main/java/org/apache/accumulo/server/rpc/HighlyAvailableServiceWrapper.java

+ * lock.
+ *
+ * <p>
+ * Its expected that all methods in the wrapped thrift service declare they throw


All these Highly* classes were copied as is from an earlier commit. The only change I made was adding this comment.

dlmarion · 2026-02-25T17:34:08Z

server/manager/src/main/java/org/apache/accumulo/manager/Manager.java

-        compactionCoordinator.getThriftService(), managerClientHandler, getContext());
+        wrappedCoordinator, managerClientHandler, getContext());
    try {
      updateThriftServer(() -> {


The Manager is the only process that passes false for the last argument so that it can delay starting the ThriftServer until after the Manager is fully up. I think with these changes that last parameter can be removed from the method.

I can look into removing that.

dlmarion · 2026-02-25T17:35:40Z

server/manager/src/main/java/org/apache/accumulo/manager/Manager.java

      }, false);
+      // Now that the Manager is up, start the ThriftServer
+      Objects.requireNonNull(getThriftServerAddress(), "Thrift Server Address should not be null");
+      getThriftServerAddress().startThriftServer("Manager Client Service Handler");


If the suggestion above is implemented, then this line can be removed.

dlmarion · 2026-02-25T17:47:22Z

I wonder if there is a way to automate a check that all Thrift RPC methods handled by the Manager throw the ThriftNotActiveServiceException

keith-turner · 2026-02-25T18:09:21Z

> I wonder if there is a way to automate a check that all Thrift RPC methods handled by the Manager throw the ThriftNotActiveServiceException

Not sure this functionality has any tests. There is the server-side aspect of throwing the exception; if we could automate a test for that, it would be nice. Then there is the client-side code that retries when it sees this exception; that code is sprinkled all over the place. The server-side code seems easier to test. For the client-side code, maybe the best way to test is to run random walk tests while starting/stopping managers.

keith-turner · 2026-02-25T18:12:56Z

For the client-side code, maybe if we refactored to use more common code, then we could focus on testing that common code. I noticed the compactor code did not need to change because it uses common code for retry. That common code retries on TException though, which may be too broad for the general case; for example, we probably should not retry on a thrift security exception.
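
As a rough sketch of what narrower common retry code could look like, the helper below retries only on a "not active" exception and lets every other exception propagate on the first failure. `RetryOnNotActiveSketch`, `NotActiveException`, and `callWithRetry` are hypothetical names for illustration; this is not RetryableThriftCall and makes no claim about how that class behaves.

```java
import java.util.concurrent.Callable;

public class RetryOnNotActiveSketch {

  // Hypothetical stand-in for ThriftNotActiveServiceException.
  public static class NotActiveException extends Exception {
    public NotActiveException(String msg) {
      super(msg);
    }
  }

  // Retries the call only when it fails with NotActiveException. Any other
  // exception (for example a security failure) propagates immediately.
  public static <T> T callWithRetry(Callable<T> call, int maxAttempts, long pauseMillis)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    NotActiveException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return call.call();
      } catch (NotActiveException e) {
        last = e;
        if (attempt < maxAttempts) {
          // Fixed pause between attempts; a real client might back off instead.
          Thread.sleep(pauseMillis);
        }
      }
    }
    // Every attempt hit a non-active instance, so report the last failure.
    throw last;
  }
}
```

Keeping the retry decision in one place like this would give a single unit to test, instead of retry logic spread across callers.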

dlmarion · 2026-02-25T19:03:15Z

I looked at trying to write a test for this, but could not figure it out quickly. Another option would be to fail at runtime on the server side, which should be caught during testing. To do this, we could include the following method in the ThriftProcessorTypes class, then call it for each interface passed to the getManagerTProcessor method.

  // Needs: import java.lang.reflect.Method;
  // Fails fast if any method of the given Thrift interface does not declare
  // ThriftNotActiveServiceException.
  private void validateHAServerExceptions(Class<?> thriftInterface) {
    // thriftInterface is already the Class to inspect; calling getClass() on it
    // would inspect java.lang.Class instead.
    String className = thriftInterface.getName();
    for (Method m : thriftInterface.getMethods()) {
      boolean found = false;
      for (Class<?> ec : m.getExceptionTypes()) {
        if (ThriftNotActiveServiceException.class.equals(ec)) {
          found = true;
          break;
        }
      }
      // Also covers methods that declare no exceptions at all.
      if (!found) {
        throw new IllegalStateException("Method " + m.getName() + " on " + className
            + " does not declare ThriftNotActiveServiceException to be thrown");
      }
    }
  }

keith-turner · 2026-02-25T19:52:31Z

Re using reflection to look for absence of the exception: thrift oneway methods cannot throw an exception. Not sure if we can detect whether a method is oneway. If we could, that would be nice in the HAService code; it could just ignore oneway thrift calls when not the primary instead of throwing an exception.

dlmarion · 2026-02-25T20:05:52Z

It looks like the information that's needed is in the Processor class, specifically a nested class that has the same name as the method; you then call isOneWay on it.

For example, for the ManagerClientService.Iface.waitForFlush method, there is a ManagerClientService.Processor.waitForFlush class, and its isOneWay method returns false.
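
The lookup described above could be sketched with reflection roughly as follows. Everything here is a hypothetical stand-in so the example is self-contained: `DemoProcessFunction`, `DemoIface`, and `DemoProcessor` only mimic the shape of generated Thrift classes, and the `isOneWay` name follows the spelling used in this comment. The real generated method's name, visibility, and constructor may differ, so treat this as an assumption to verify against the generated code.

```java
import java.lang.reflect.Method;

public class OnewayLookupSketch {

  // Hypothetical stand-in for the base class of per-method processor functions.
  public abstract static class DemoProcessFunction {
    public abstract boolean isOneWay();
  }

  // Hypothetical stand-in for a generated service interface.
  public interface DemoIface {
    void waitForFlush();

    void fireAndForget();
  }

  // Hypothetical stand-in for a generated Processor with one nested class per method.
  public static class DemoProcessor {
    public static class waitForFlush extends DemoProcessFunction {
      @Override
      public boolean isOneWay() {
        return false;
      }
    }

    public static class fireAndForget extends DemoProcessFunction {
      @Override
      public boolean isOneWay() {
        return true;
      }
    }
  }

  // Finds the nested class named after the interface method and asks it
  // whether the method is oneway.
  public static boolean isOneWay(Class<?> processorClass, Method ifaceMethod)
      throws ReflectiveOperationException {
    for (Class<?> nested : processorClass.getDeclaredClasses()) {
      if (nested.getSimpleName().equals(ifaceMethod.getName())
          && DemoProcessFunction.class.isAssignableFrom(nested)) {
        DemoProcessFunction fn =
            (DemoProcessFunction) nested.getDeclaredConstructor().newInstance();
        return fn.isOneWay();
      }
    }
    throw new IllegalArgumentException(
        "No processor function found for method " + ifaceMethod.getName());
  }
}
```

A validation method could use a check like this to skip oneway methods, which cannot declare exceptions, instead of failing on them.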

keith-turner added this to the 4.0.0 milestone Feb 25, 2026

keith-turner requested a review from dlmarion February 25, 2026 00:07

keith-turner wants to merge 1 commit into apache:main from keith-turner:highly-avail-service
