Increase start_period for default healthcheck #541

@jaydrogers

Background

During zero-downtime deployments, the service sometimes fails to start:

[09-Jun-2025 17:16:27] NOTICE: fpm is running, pid 131
[09-Jun-2025 17:16:27] NOTICE: ready to handle connections
curl: (7) Failed to connect to localhost port 8080 after 1 ms: Could not connect to server
❌ There seems to be a failure in checking the NGINX + PHP-FPM.
curl: (7) Failed to connect to localhost port 8080 after 1 ms: Could not connect to server
HTTP Status Code: 000
::1 - - [09/Jun/2025:17:16:30 +0000] "GET /up HTTP/2.0" 200 1936 "-" "curl/8.12.1" "-"
✅ NGINX + PHP-FPM is running correctly.
::1 - - [09/Jun/2025:17:16:33 +0000] "GET /up HTTP/2.0" 200 1936 "-" "curl/8.12.1" "-"
::1 - - [09/Jun/2025:17:16:38 +0000] "GET /up HTTP/2.0" 200 1936 "-" "curl/8.12.1" "-"

This is most likely caused by the Laravel auto-runs, which can delay startup.

From a user on Discord:

did more digging, finally got a deployment to work, but now I've ran into an old issue that subsequent deployments will always fail and then get rolled back

the container typically stays alive for about 30 seconds, gets through most of the Laravel automations, and sometimes gives the FPM error I shown above before being reported as a failure and being rolled back

I am still at a loss as to what's causing this, at first I thought it could be fpm printing to stderr, but that wasn't the case

it even sometimes starts giving successful health checks before it gets rolled back

docker inspect implies the container is exiting with code 137 (SIGTERM?)
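On the exit code mentioned in the quote above: by the usual shell convention, an exit code above 128 means the process was ended by signal number (code - 128). The sketch below decodes a code that way using Python's standard `signal` module. The helper name `describe_exit_code` is hypothetical, and the signal names assume a Linux host. Under that convention, 137 maps to SIGKILL (9) rather than SIGTERM (15), which would be consistent with the container being force-killed, though the code alone does not say what sent the signal.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map an exit code to the signal name it encodes, if any.

    Hypothetical helper for illustration; assumes the 128 + signal
    convention used by shells and reported by `docker inspect`.
    """
    if code > 128:
        signum = code - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return "not a signal exit"


print(describe_exit_code(137))  # SIGKILL on Linux
print(describe_exit_code(143))  # SIGTERM on Linux
```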

Problem

Proposed solution

  • Set the start_period to 30s

From the official docs:

start period provides initialization time for containers that need time to bootstrap. Probe failure during that period will not be counted towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started and all consecutive failures will be counted towards the maximum number of retries.
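To make the quoted rule concrete, here is a minimal sketch that replays a sequence of probe results and applies the start period logic as described in the docs excerpt. It is a simplified model, not Docker's implementation: the function name `health_status` is hypothetical, the first probe is assumed to fire one interval after the container starts, and the interval and retry values in the usage lines are illustrative rather than the image's actual defaults.

```python
def health_status(results, interval, start_period, retries):
    """Replay probe results and return 'starting', 'healthy', or 'unhealthy'.

    results      -- list of booleans, one per probe (True = probe passed)
    interval     -- seconds between probes (assumed: first probe at t = interval)
    start_period -- seconds during which failures are not counted
    retries      -- consecutive counted failures needed to become unhealthy
    """
    started = False   # set once any probe succeeds
    failures = 0      # consecutive failures that count toward retries
    status = "starting"

    for i, ok in enumerate(results):
        t = (i + 1) * interval
        if ok:
            started = True
            failures = 0
            status = "healthy"
            continue
        # Failures inside the start period are ignored, but only until
        # the first success marks the container as started.
        if t <= start_period and not started:
            continue
        failures += 1
        if failures >= retries:
            return "unhealthy"
    return status


# A slow boot: three failed probes, then success (illustrative values).
slow_boot = [False, False, False, True]
print(health_status(slow_boot, interval=5, start_period=0, retries=3))   # unhealthy
print(health_status(slow_boot, interval=5, start_period=30, retries=3))  # healthy
```

In this model, a 30s start period lets the slow boot survive its early failures, while a success inside the start period still makes later failures count, as the docs describe.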

Outcome

What did you expect?

  • If AUTORUN_ENABLED is true, we should give enough time for these services to start.

What happened instead?

  • The container shuts down

Affected Docker Images

All except CLI

Anything else?

Related Discord Message

https://discord.com/channels/910287105714954251/910299290230997003/1381686793522380850

Labels

🧐 Bug: Needs Confirmation (Something isn't working, but needs to be confirmed by a team member.)