PubSubPublisher : High CPU usage #3194

@hmigneron

Description

We use Pub/Sub in a fairly conventional way:

                                                Database X
                                              /   
                                             / 
Pubsub (Topic A) --> Worker (Subscription A) 
                                             \
                                              \
                                                Database Y

Our infrastructure is on Google Cloud (we use Kubernetes). For a steady ~3000 messages/second, 6 pods with the following config are enough and run at around 80% CPU:

  "resources": {
    "requests": {
      "cpu": "500m",
      "memory": "2Gi"
    },
    "limits": {
      "cpu": "1500m",
      "memory": "3Gi"
    }
  }

We would like to use the messages in different ways and move towards something like:

                                                 Database X
                                              /   
                                             / 
Pubsub (Topic A) --> Worker (Subscription A) -- Database Y
                                             \
                                              \
                                                Pubsub (Topic B)

The only modification we made to the code is creating a new PubSubPublisher and publishing messages to the new topic. Every message from Topic A / Subscription A ends up in Topic B, so we are also effectively publishing ~3000 messages/second.

After adding that operation, CPU usage for the pods went through the roof: what 6 pods could handle at 80% CPU could not be handled by 20 pods at 100% CPU with the config noted above (we rolled the code back because the workers couldn't keep up).

The PubSubPublisher is created with very standard settings (I believe the retry settings are the defaults):

private void startPublisher() throws IOException {
    RetrySettings retrySettings = RetrySettings.newBuilder()
            .setTotalTimeout(Duration.ofSeconds(10))
            .setInitialRetryDelay(Duration.ofMillis(5))
            .setRetryDelayMultiplier(2.0)
            .setMaxRetryDelay(Duration.ofMillis(Long.MAX_VALUE))
            .setInitialRpcTimeout(Duration.ofSeconds(10))
            .setRpcTimeoutMultiplier(2)
            .setMaxRpcTimeout(Duration.ofSeconds(10))
            .build();

    publisher = Publisher.newBuilder(topic)
            .setRetrySettings(retrySettings)
            .setCredentialsProvider(FixedCredentialsProvider.create(
                    ServiceAccountCredentials.fromStream(
                            new FileInputStream(Configurations.getGoogleCloudCredentials()))))
            .build();
}
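One thing we have been wondering about is batching and threading. A hedged sketch (the builder names below are from `com.google.api.gax` as we understand them, and the threshold values are purely illustrative; the builder API has shifted across the beta releases, so this would need checking against the exact client version) of sharing a small executor and batching publishes:

```java
// Sketch only: with batching, ~3000 publishes/second become ~30 RPCs/second
// at 100 messages per batch, which should cost far less CPU per message.
ExecutorProvider sharedExecutor = InstantiatingExecutorProvider.newBuilder()
        .setExecutorThreadCount(4) // illustrative; the default scales with CPU count
        .build();

BatchingSettings batchingSettings = BatchingSettings.newBuilder()
        .setElementCountThreshold(100L)           // send after 100 messages...
        .setRequestByteThreshold(1024L * 1024L)   // ...or 1 MiB of payload...
        .setDelayThreshold(Duration.ofMillis(10)) // ...or 10 ms, whichever first
        .build();

publisher = Publisher.newBuilder(topic)
        .setExecutorProvider(sharedExecutor)
        .setBatchingSettings(batchingSettings)
        .build();
```

Whether the per-message RPC overhead explains the whole regression is something a profiler sample would have to confirm; we have not verified this configuration ourselves.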

The actual publish method is:

private void publish(byte[] message) {
    try {
        PubsubMessage pubsubMessage = PubsubMessage.newBuilder().setData(ByteString.copyFrom(message)).build();
        ApiFuture<String> future = publisher.publish(pubsubMessage);

        ApiFutures.addCallback(future, new ApiFutureCallback<String>() {
            @Override
            public void onFailure(Throwable throwable) {
                if (throwable instanceof ApiException) {
                    ApiException apiException = (ApiException) throwable;
                    logger.debug(new LogEntry().setMessage(String.format("PubSubException : ApiException. Code %s. IsRetryable ? %s",
                            apiException.getStatusCode().getCode(),
                            apiException.isRetryable())));
                }
                logger.warn(new LogEntry().setMessage("Error publishing message : " + Helper.getStackTrace(throwable)));
            }

            @Override
            public void onSuccess(String messageId) {

            }
        });
    } catch (Exception ex) {
        logger.error(new LogEntry().setMessage("An error occurred publishing a message to Pub/Sub : " + Helper.getStackTrace(ex)));
    }
}
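One thing we notice about this method is that it never applies backpressure: if Topic B slows down, publish futures and buffered messages can pile up without bound. A common pattern (a self-contained sketch only; `fakePublish` is a hypothetical stand-in for `publisher.publish`, and the cap of 100 is illustrative) is to bound the number of in-flight publishes with a `Semaphore`:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPublish {
    // Cap on outstanding (not-yet-completed) publishes; 100 is illustrative.
    private static final Semaphore inFlight = new Semaphore(100);

    // Hypothetical stand-in for publisher.publish(): completes asynchronously.
    static CompletableFuture<String> fakePublish(byte[] data, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> "id-" + data.length, pool);
    }

    // Publish n messages, never letting more than the cap be in flight at once.
    static int publishAll(int n) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicInteger acked = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            inFlight.acquire(); // blocks the producer when the cap is reached
            fakePublish(new byte[16], pool).whenComplete((id, err) -> {
                inFlight.release(); // free a slot on success or failure
                if (err == null) acked.incrementAndGet();
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return acked.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("acked=" + publishAll(1000));
    }
}
```

This only bounds memory growth; we don't claim it is the cause of the CPU regression, just that it would rule out one failure mode while investigating.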

Converting our objects to byte arrays is plain Java serialization:

org.apache.commons.lang3.SerializationUtils.serialize(message);
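For reference, `SerializationUtils.serialize` is a thin convenience wrapper around Java's built-in object serialization, which is comparatively CPU-expensive per message; it is roughly equivalent to this stdlib sketch (the class and method names here are ours, not from our codebase):

```java
import java.io.*;

public class SerializeDemo {
    // Roughly what SerializationUtils.serialize does: stream the object
    // through an ObjectOutputStream into an in-memory byte buffer.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream(512);
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Even a 5-char string carries a stream header plus type tags.
        byte[] bytes = serialize("hello");
        System.out.println("bytes=" + bytes.length);
    }
}
```

If serialization shows up hot in a profile, a binary format such as protobuf (which Pub/Sub messages already use on the wire) is usually much cheaper, but we would want a profiler to confirm before switching.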

Is such high CPU usage to be expected? Should queueing into Pub/Sub need that much processing power? What are we doing wrong for our publishers to consume so much?

We've seen this with different versions. We got these results with 0.39.0-beta on a box running CentOS 7. We use gRPC 1.10.0 in the project.
