A production-ready loadable plugin for Zabbix Agent 2 that provides comprehensive monitoring of Docker Swarm services.
The standard Docker plugin in Zabbix is poorly suited to Docker Swarm because it monitors individual containers. Swarm names each task container with a task-ID suffix (for example `mystack_web.1.<task-id>`), so every restart produces a new container name, creates new Zabbix items, and breaks historical data continuity. This plugin monitors at the service level instead, which gives stable service discovery and tracks desired vs running replica counts.
- Service Discovery: Low-Level Discovery (LLD) for all Docker Swarm services with stack grouping
- Stack Discovery: Low-Level Discovery for Docker Compose stacks
- Replica Monitoring: Track desired vs running replica counts per service
- Restart Detection: Monitor and alert on service task restarts/crashes
- Stack Health Monitoring: Aggregate health status by Docker Compose stack
- Stable Monitoring: Service-based monitoring prevents historical data fragmentation
- Cross-Architecture: Supports both x86_64 and ARM64 Linux systems
- Zabbix Agent 2: Version 6.0 or later
- Go: Version 1.21+ (for building from source)
- Docker Swarm: Linux environment with Docker Swarm mode enabled
- Permissions: The zabbix user must have access to the Docker socket
Option A: Build from source
```bash
git clone <repository-url>
cd zabbix-agent2-plugin-docker-swarm/src
# For x86_64 Linux (most common)
make build-x86_64
# For ARM64 Linux
make build-arm64
# Or build both
make build
```
Option B: Download pre-built binaries
Download the latest release from the Releases page.
Debian package: build a `.deb` that installs the binary to `/var/lib/zabbix/plugins/docker-swarm`.
Requirements:
- `dpkg-deb` and `curl` available on the build host
Build and upload:
```bash
# Make the API key available for the upload step
echo 'RM_API_KEY=your-key-here' > ~/.repomanager
chmod 600 ~/.repomanager
# Build for the current architecture and upload
scripts/build-deb.sh
```
Build only (no upload):
```bash
scripts/build-deb.sh --no-upload
```
Build for a specific architecture:
```bash
scripts/build-deb.sh --arch amd64
scripts/build-deb.sh --arch arm64
```
Release flow (build + package + upload):
```bash
scripts/release.sh
scripts/release.sh --arch amd64
scripts/release.sh --arch arm64 --no-upload
```
Install the binary:
```bash
# Copy the binary to the Zabbix plugins directory
sudo cp docker-swarm-linux-x86_64 /var/lib/zabbix/plugins/docker-swarm
sudo chmod +x /var/lib/zabbix/plugins/docker-swarm
sudo chown zabbix:zabbix /var/lib/zabbix/plugins/docker-swarm
```
Add to `/etc/zabbix/zabbix_agent2.conf`:
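```ini
Plugins.DockerSwarm.System.Path=/var/lib/zabbix/plugins/docker-swarm
Plugins.DockerSwarm.System.Timeout=30
```
Grant the zabbix user access to the Docker socket:
```bash
# Add the zabbix user to the docker group (recommended)
sudo usermod -aG docker zabbix
# Or make the socket world-writable (less secure, and reset on reboot)
sudo chmod 666 /var/run/docker.sock
```
Either option gives root-equivalent control of the host: membership in the `docker` group grants it to the zabbix user, while `chmod 666` grants it to every local user.

Restart the agent so it loads the plugin and picks up the new group membership:
```bash
sudo systemctl restart zabbix-agent2
```
Test the plugin functionality: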
```bash
zabbix_get -s localhost -k "swarm.services.discovery"
```
After installation, test the monitoring features:
```bash
# Test service discovery with stack information
zabbix_get -s localhost -k "swarm.services.discovery"
# Test stack discovery
zabbix_get -s localhost -k "swarm.stacks.discovery"
# Test stack health (replace 'mystack' with an actual stack name)
zabbix_get -s localhost -k "swarm.stack.health[mystack]"
# Test restart monitoring (use service name, ID, or service key)
zabbix_get -s localhost -k "swarm.service.restarts[web]"
zabbix_get -s localhost -k "swarm.service.restarts[mystack_web]"
# Debug: check the total task count to understand restart behavior
zabbix_get -s localhost -k "swarm.service.tasks[web]"
# Check the last restart timestamp (for restart detection)
zabbix_get -s localhost -k "swarm.service.last_restart[web]"
```
For detailed examples and Zabbix template configuration, see EXAMPLES.md.
| Key | Description | Returns |
|---|---|---|
| `swarm.services.discovery` | Service discovery for LLD | JSON array with `{#SERVICE.ID}`, `{#SERVICE.NAME}`, `{#STACK.NAME}`, and `{#SERVICE.KEY}` macros |
| `swarm.service.replicas_desired[<service_identifier>]` | Configured replica count | Integer (desired replicas) |
| `swarm.service.replicas_running[<service_identifier>]` | Running task count | Integer (running tasks) |
| `swarm.service.restarts[<service_identifier>]` | Number of task restarts (crashed tasks) | Integer (restart count) |
| `swarm.service.tasks[<service_identifier>]` | Total number of tasks, for debugging | Integer (task count) |
| `swarm.service.last_restart[<service_identifier>]` | Timestamp of the most recent running task | Unix timestamp |
| `swarm.stacks.discovery` | Stack discovery for LLD | JSON array with `{#STACK.NAME}` macro |
| `swarm.stack.health[<stack_name>]` | Stack health status | JSON with health metrics |
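The discovery keys return JSON arrays of LLD macros. As an illustration, here is a minimal Go sketch of how a service list could be rendered into the `swarm.services.discovery` format. The `discoveredService` type and `buildServiceDiscovery` function are hypothetical, not the plugin's actual code; Zabbix has accepted a bare JSON array (without a `data` wrapper) as LLD input since version 4.2.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// discoveredService is a hypothetical internal view of one Swarm service.
type discoveredService struct {
	ID    string
	Name  string // short service name, e.g. "web"
	Stack string // stack namespace, or "standalone"
	Key   string // stable key, e.g. "mystack_web"
}

// lldRow holds one LLD entry; the struct tags are the macro names
// that swarm.services.discovery exposes.
type lldRow struct {
	ID    string `json:"{#SERVICE.ID}"`
	Name  string `json:"{#SERVICE.NAME}"`
	Stack string `json:"{#STACK.NAME}"`
	Key   string `json:"{#SERVICE.KEY}"`
}

// buildServiceDiscovery renders services as a bare JSON array.
// Starting from a non-nil empty slice yields "[]" rather than "null"
// when no services exist.
func buildServiceDiscovery(services []discoveredService) (string, error) {
	rows := make([]lldRow, 0, len(services))
	for _, s := range services {
		rows = append(rows, lldRow{ID: s.ID, Name: s.Name, Stack: s.Stack, Key: s.Key})
	}
	out, err := json.Marshal(rows)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	out, err := buildServiceDiscovery([]discoveredService{
		{ID: "abc123", Name: "web", Stack: "mystack", Key: "mystack_web"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Emitting `[]` for an empty result matters because Zabbix treats it as "nothing discovered" rather than as an invalid value.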
Service metrics accept multiple identifier types for maximum flexibility:
- Service ID: Full Docker service ID (e.g., `abc123def456...`)
- Service Name: Simple service name (e.g., `web`, `database`)
- Service Key: Stable identifier for stack services (e.g., `mystack_web`, `mystack_database`)
Service Key Format:
- Stack services: `{stack_name}_{service_name}` (e.g., `mystack_web`)
- Standalone services: `{service_name}` (e.g., `web`)
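A minimal Go sketch of how a service key could be derived. It assumes the plugin relies on Docker naming stack services `<stack>_<service>` and on the `com.docker.stack.namespace` label; `serviceKey` is a hypothetical helper, not the plugin's API.

```go
package main

import (
	"fmt"
	"strings"
)

// stackLabel is set by `docker stack deploy` on every service in a stack.
const stackLabel = "com.docker.stack.namespace"

// serviceKey is a hypothetical helper returning the stable key, the short
// service name and the stack name for a Swarm service.
func serviceKey(fullName string, labels map[string]string) (key, shortName, stack string) {
	ns := labels[stackLabel] // reading a nil map is safe and yields ""
	if ns == "" {
		// Standalone service: the key is simply the service name.
		return fullName, fullName, "standalone"
	}
	// Docker names stack services "<stack>_<service>"; strip that prefix.
	shortName = strings.TrimPrefix(fullName, ns+"_")
	return ns + "_" + shortName, shortName, ns
}

func main() {
	fmt.Println(serviceKey("mystack_web", map[string]string{stackLabel: "mystack"})) // mystack_web web mystack
	fmt.Println(serviceKey("web", nil))                                               // web web standalone
}
```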
Benefits:
- ✅ Stable monitoring: Service keys don't change during stack redeploys
- ✅ Flexible identification: Use any identifier type that's convenient
- ✅ Backward compatible: Existing service ID usage continues to work
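The sketch below shows one way an item parameter could be matched against the three identifier types. The lookup order (ID, then key, then short name) and the rejection of ambiguous short names are assumptions of this example, not documented plugin behavior; `service` and `resolveService` are hypothetical.

```go
package main

import "fmt"

// service is a hypothetical, already-discovered Swarm service.
type service struct {
	ID   string // full Docker service ID
	Name string // short name, e.g. "web"
	Key  string // stable key, e.g. "mystack_web"
}

// resolveService matches an item parameter against the three identifier
// types. Assumed order: exact ID, then service key, then short name.
func resolveService(services []service, identifier string) (service, error) {
	for _, s := range services {
		if s.ID == identifier {
			return s, nil
		}
	}
	for _, s := range services {
		if s.Key == identifier {
			return s, nil
		}
	}
	var matches []service
	for _, s := range services {
		if s.Name == identifier {
			matches = append(matches, s)
		}
	}
	switch len(matches) {
	case 0:
		return service{}, fmt.Errorf("service %q not found", identifier)
	case 1:
		return matches[0], nil
	default:
		// The same short name exists in several stacks.
		return service{}, fmt.Errorf("service name %q is ambiguous; use the service key", identifier)
	}
}

func main() {
	services := []service{
		{ID: "abc123", Name: "web", Key: "mystack_web"},
		{ID: "def456", Name: "web", Key: "otherstack_web"},
	}
	s, err := resolveService(services, "otherstack_web")
	fmt.Println(s.ID, err) // def456 <nil>
	_, err = resolveService(services, "web")
	fmt.Println(err)
}
```

Two stacks can both contain a `web` service, which is why templates are safer keyed on `{#SERVICE.KEY}` than on the short name.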
The plugin provides two ways to detect service restarts:
- Task Count Method (`swarm.service.restarts`):
  - Counts non-running tasks in recent history
  - Limited by Swarm's task history retention (set with `docker swarm update --task-history-limit`, 5 by default), so older restarts drop out of the count
  - Good for detecting recent restarts
- Timestamp Method (`swarm.service.last_restart`):
  - Returns the Unix timestamp of the most recent running task
  - Use with the Zabbix `change()` function to detect restarts
  - More reliable for long-term monitoring
Recommended Zabbix trigger:
- Expression: `change(/YourHost/swarm.service.last_restart[{#SERVICE.KEY}])>0`
- Description: Service {#SERVICE.NAME} has restarted
Service discovery rule:
- Name: Docker Swarm Services
- Key: `swarm.services.discovery`
- Update Interval: 300s (5 minutes)
Item prototypes:
- Desired Replicas
  - Name: Service {#SERVICE.NAME} ({#STACK.NAME}) desired replicas
  - Key: `swarm.service.replicas_desired[{#SERVICE.KEY}]`
  - Type: Zabbix agent
- Running Replicas
  - Name: Service {#SERVICE.NAME} ({#STACK.NAME}) running replicas
  - Key: `swarm.service.replicas_running[{#SERVICE.KEY}]`
  - Type: Zabbix agent
- Restart Count
  - Name: Service {#SERVICE.NAME} ({#STACK.NAME}) restart count
  - Key: `swarm.service.restarts[{#SERVICE.KEY}]`
  - Type: Zabbix agent
  - Preprocessing: Change per second (formerly the "Delta (speed per second)" store value)
  - Note: Change per second tracks the increase in restarts over time
Trigger prototypes:
- Replica Mismatch
  - Name: Service {#SERVICE.NAME} ({#STACK.NAME}) replica mismatch
  - Expression: `last(/Template/swarm.service.replicas_running[{#SERVICE.KEY}])<>last(/Template/swarm.service.replicas_desired[{#SERVICE.KEY}])`
  - Severity: Warning
- Service Restarted
  - Name: Service {#SERVICE.NAME} ({#STACK.NAME}) has restarted
  - Expression: `change(/Template/swarm.service.restarts[{#SERVICE.KEY}])>0`
  - Severity: Warning
  - Description: A task for this service has crashed and been restarted
Stack discovery rule:
- Name: Docker Compose Stacks
- Key: `swarm.stacks.discovery`
- Update Interval: 600s (10 minutes)
Item prototypes:
- Stack Health
  - Name: Stack {#STACK.NAME} health status
  - Key: `swarm.stack.health[{#STACK.NAME}]`
  - Type: Zabbix agent
  - Value Type: Text
- Stack Health Percentage (Dependent Item)
  - Name: Stack {#STACK.NAME} health percentage
  - Key: any unique key, e.g. `swarm.stack.health.percentage[{#STACK.NAME}]`
  - Master item: `swarm.stack.health[{#STACK.NAME}]`
  - Preprocessing: JSONPath `$.health_percentage`
  - Value Type: Numeric (float)
  - Units: %
- Unhealthy Services Count (Dependent Item)
  - Name: Stack {#STACK.NAME} unhealthy services
  - Key: any unique key, e.g. `swarm.stack.health.unhealthy[{#STACK.NAME}]`
  - Master item: `swarm.stack.health[{#STACK.NAME}]`
  - Preprocessing: JSONPath `$.unhealthy_services`
  - Value Type: Numeric (unsigned)

Trigger prototypes:
- Stack Health Critical
  - Name: Stack {#STACK.NAME} has unhealthy services
  - Expression: `last(/Template/swarm.stack.health.unhealthy[{#STACK.NAME}])>0`
  - Severity: High
- Stack Health Warning
  - Name: Stack {#STACK.NAME} health below 100%
  - Expression: `last(/Template/swarm.stack.health.percentage[{#STACK.NAME}])<100`
  - Severity: Warning
The plugin discovers all Docker Swarm services and groups them by Docker
Compose stack using the com.docker.stack.namespace label. Services
without this label are marked as "standalone".
For each stack, the plugin:
- Identifies all services belonging to the stack
- Compares desired vs running replica counts for each service
- Calculates the health percentage as `(healthy_services / total_services) * 100`
- Returns the health metrics as JSON, including `health_percentage` and `unhealthy_services`
The plugin tracks tasks that have failed or shut down with non-zero exit codes, which indicate container crashes that triggered Docker Swarm restarts.
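A minimal Go sketch of this rule, reading it as "state `failed` or `shutdown`, and a non-zero exit code". The `taskStatus` type and `countCrashes` function are hypothetical; in the Docker Engine API the exit code is reported in the task's container status.

```go
package main

import "fmt"

// taskStatus is a hypothetical view of a task's final state.
type taskStatus struct {
	State    string // Swarm task state, e.g. "running", "failed", "shutdown"
	ExitCode int    // exit code of the task's container
}

// countCrashes counts tasks that ended as failed or shutdown with a
// non-zero exit code.
func countCrashes(tasks []taskStatus) int {
	n := 0
	for _, t := range tasks {
		if (t.State == "failed" || t.State == "shutdown") && t.ExitCode != 0 {
			n++
		}
	}
	return n
}

func main() {
	history := []taskStatus{
		{State: "running", ExitCode: 0},
		{State: "failed", ExitCode: 1},
		{State: "shutdown", ExitCode: 137},
		{State: "shutdown", ExitCode: 0}, // clean exit: not counted
	}
	fmt.Println(countCrashes(history)) // 2
}
```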
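- Permission Denied: Ensure the zabbix user has access to the Docker socket
- No Services Found: Verify that Docker Swarm mode is active, that services exist, and that the agent runs on a manager node (service and task listings are only available on managers)
- Stack Not Detected: Check that services have the `com.docker.stack.namespace` label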
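To check socket access from code, for example when debugging a "Permission Denied" error, this minimal Go sketch uses only the standard library to send the same request as the `curl` debugging command in this section over the Docker Unix socket. The function names are hypothetical and API version `v1.41` is simply copied from that command. Run the binary as the zabbix user (for example with `sudo -u zabbix`) so it tests that user's permissions.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"time"
)

// newDockerClient returns an HTTP client that dials the Docker Unix socket
// for every request; the host part of request URLs is ignored.
func newDockerClient(socketPath string) *http.Client {
	return &http.Client{
		Timeout: 5 * time.Second,
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// checkServicesEndpoint issues the same request as the curl test and
// returns the HTTP status code.
func checkServicesEndpoint(socketPath string) (int, error) {
	resp, err := newDockerClient(socketPath).Get("http://localhost/v1.41/services")
	if err != nil {
		return 0, err // e.g. "permission denied" or "no such file or directory"
	}
	defer resp.Body.Close()
	_, _ = io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
	return resp.StatusCode, nil
}

func main() {
	status, err := checkServicesEndpoint("/var/run/docker.sock")
	if err != nil {
		fmt.Println("cannot reach the Docker API:", err)
		os.Exit(1)
	}
	// 200 means the API answered; 503 typically means this node is not a swarm manager.
	fmt.Println("HTTP status:", status)
}
```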
```bash
# Test Docker API access
curl --unix-socket /var/run/docker.sock http://localhost/v1.41/services
# Check Zabbix Agent logs
sudo tail -f /var/log/zabbix/zabbix_agent2.log
# Test specific metrics
zabbix_get -s localhost -k "swarm.services.discovery"
```
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Zabbix team for the excellent Agent 2 plugin framework
- Docker team for the comprehensive Swarm API
- Community contributors and testers