Zabbix Agent 2 Docker Swarm Plugin

A production-ready loadable plugin for Zabbix Agent 2 that provides comprehensive monitoring of Docker Swarm services.

Overview

The standard Docker plugin in Zabbix monitors individual containers, which works poorly with Docker Swarm. Each time Swarm replaces a task, the new container gets a different name (a new random task-ID suffix). Container discovery then creates new Zabbix items and breaks historical data continuity. This plugin monitors at the service level instead. Discovered items stay stable, and the plugin tracks desired vs running replica counts for each service.

Features

Service Discovery: Low-Level Discovery (LLD) for all Docker Swarm services with stack grouping
Stack Discovery: Low-Level Discovery for Docker Compose stacks
Replica Monitoring: Track desired vs running replica counts per service
Restart Detection: Monitor and alert on service task restarts/crashes
Stack Health Monitoring: Aggregate health status by Docker Compose stack
Stable Monitoring: Service-based monitoring prevents historical data fragmentation
Cross-Architecture: Supports both x86_64 and ARM64 Linux systems

Requirements

Zabbix Agent 2: Version 6.0 or later
Go: Version 1.21+ (for building from source)
Docker Swarm: Linux host with Docker Swarm mode enabled. The agent must run on a manager node, because the Docker API only lists services and tasks on managers.
Permissions: Zabbix user must have access to Docker socket

Installation

1. Download or Build

Option A: Build from source

git clone <repository-url>
cd zabbix-agent2-plugin-docker-swarm/src

# For x86_64 Linux (most common)
make build-x86_64

# For ARM64 Linux 
make build-arm64

# Or build both
make build

Option B: Download pre-built binaries

Download the latest release from the Releases page.

Debian package (build + optional upload)

Build a .deb that installs the binary to /var/lib/zabbix/plugins/docker-swarm.

Requirements:

dpkg-deb and curl available on the build host

Build and upload:

# Ensure API key is available for upload
echo 'RM_API_KEY=your-key-here' > ~/.repomanager
chmod 600 ~/.repomanager

# Build for current architecture and upload
scripts/build-deb.sh

Build only (no upload):

scripts/build-deb.sh --no-upload

Build for a specific architecture:

scripts/build-deb.sh --arch amd64
scripts/build-deb.sh --arch arm64

Release flow (build + package + upload):

scripts/release.sh
scripts/release.sh --arch amd64
scripts/release.sh --arch arm64 --no-upload

2. Install Plugin

# Copy the binary to the Zabbix plugins directory (use the ARM64 build on ARM64 hosts)
sudo cp docker-swarm-linux-x86_64 /var/lib/zabbix/plugins/docker-swarm
sudo chmod +x /var/lib/zabbix/plugins/docker-swarm
sudo chown zabbix:zabbix /var/lib/zabbix/plugins/docker-swarm

3. Configure Zabbix Agent 2

Add to /etc/zabbix/zabbix_agent2.conf:

Plugins.DockerSwarm.System.Path=/var/lib/zabbix/plugins/docker-swarm
Plugins.DockerSwarm.System.Timeout=30

4. Configure Docker Socket Access

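# Add the zabbix user to the docker group (docker group membership is root-equivalent)
sudo usermod -aG docker zabbix

# Not recommended: make the socket world-writable. This gives every local user
# root-equivalent access, and the permission is reset when Docker restarts.
sudo chmod 666 /var/run/docker.sock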

5. Restart Services

sudo systemctl restart zabbix-agent2

6. Verify Installation

Test the plugin functionality:

zabbix_get -s localhost -k "swarm.services.discovery"

Quick Start

After installation, test the monitoring features:

# Test service discovery with stack information
zabbix_get -s localhost -k "swarm.services.discovery"

# Test stack discovery
zabbix_get -s localhost -k "swarm.stacks.discovery"

# Test stack health (replace 'mystack' with actual stack name)
zabbix_get -s localhost -k "swarm.stack.health[mystack]"

# Test restart monitoring (use service name, ID, or service key)
zabbix_get -s localhost -k "swarm.service.restarts[web]"
zabbix_get -s localhost -k "swarm.service.restarts[mystack_web]"

# Debug: Check total task count to understand restart behavior
zabbix_get -s localhost -k "swarm.service.tasks[web]"

# Check last restart timestamp (for restart detection)
zabbix_get -s localhost -k "swarm.service.last_restart[web]"

For detailed examples and Zabbix template configuration, see EXAMPLES.md.

Supported Metrics

Key	Description	Returns
`swarm.services.discovery`	Service discovery for LLD	JSON array with `{#SERVICE.ID}`, `{#SERVICE.NAME}`, `{#STACK.NAME}`, and `{#SERVICE.KEY}` macros
`swarm.service.replicas_desired[<service_identifier>]`	Configured replica count	Integer (desired replicas)
`swarm.service.replicas_running[<service_identifier>]`	Running task count	Integer (running tasks)
`swarm.service.restarts[<service_identifier>]`	Number of task restarts (crashed tasks)	Integer (restart count)
`swarm.service.tasks[<service_identifier>]`	Total number of tasks for debugging	Integer (task count)
`swarm.service.last_restart[<service_identifier>]`	Start time of the most recent running task	Unix timestamp
`swarm.stacks.discovery`	Stack discovery for LLD	JSON array with `{#STACK.NAME}` macro
`swarm.stack.health[<stack_name>]`	Stack health status	JSON object with health metrics, including `health_percentage` and `unhealthy_services`
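The discovery key returns a Zabbix LLD JSON array. The following Go sketch shows that shape: it builds one object per service, keyed by the four macros from the table. The `service` type and its fields are hypothetical, and the plugin's real output may carry additional fields. Zabbix 4.2 and later accept a bare JSON array like this for LLD.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// service is a hypothetical, trimmed-down view of a Swarm service; the
// real plugin reads this information from the Docker API.
type service struct {
	ID    string
	Name  string // service name, e.g. "web"
	Stack string // stack name, e.g. "mystack"
	Key   string // service key, e.g. "mystack_web"
}

// discoveryJSON renders services as an LLD array with one object per
// service, using the LLD macro names as JSON keys.
func discoveryJSON(services []service) (string, error) {
	rows := make([]map[string]string, 0, len(services))
	for _, s := range services {
		rows = append(rows, map[string]string{
			"{#SERVICE.ID}":   s.ID,
			"{#SERVICE.NAME}": s.Name,
			"{#STACK.NAME}":   s.Stack,
			"{#SERVICE.KEY}":  s.Key,
		})
	}
	out, err := json.Marshal(rows) // map keys are emitted in sorted order
	return string(out), err
}

func main() {
	out, err := discoveryJSON([]service{
		{ID: "abc123", Name: "web", Stack: "mystack", Key: "mystack_web"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```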

Service Identifiers

Service metrics accept multiple identifier types for maximum flexibility:

Service ID: Full Docker service ID (e.g., abc123def456...)
Service Name: Simple service name (e.g., web, database)
Service Key: Stable identifier for stack services (e.g., mystack_web, mystack_database)

Service Key Format:

Stack services: {stack_name}_{service_name} (e.g., mystack_web)
Standalone services: {service_name} (e.g., web)

Benefits:

✅ Stable monitoring: Service keys don't change during stack redeploys
✅ Flexible identification: Use any identifier type that's convenient
✅ Backward compatible: Existing service ID usage continues to work
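The following Go sketch builds the documented key format and resolves an identifier against a list of services. The `svc` type is hypothetical. The lookup priority (ID first, then key, then bare name) is an assumption rather than the plugin's documented behaviour. It does show why keys are the safer choice: a bare name such as `web` can exist in several stacks, while a key is unique.

```go
package main

import "fmt"

// svc is a hypothetical minimal service record used for illustration.
type svc struct {
	ID    string // full Docker service ID
	Name  string // service name without the stack prefix, e.g. "web"
	Stack string // stack name, or "" for standalone services
}

// serviceKey follows the documented format: {stack_name}_{service_name}
// for stack services and {service_name} for standalone services.
func serviceKey(s svc) string {
	if s.Stack == "" {
		return s.Name
	}
	return s.Stack + "_" + s.Name
}

// resolve looks up ident as a service ID, then as a service key, then as a
// bare service name (assumed priority), returning the first match.
func resolve(services []svc, ident string) (svc, bool) {
	matchers := []func(svc) bool{
		func(s svc) bool { return s.ID == ident },
		func(s svc) bool { return serviceKey(s) == ident },
		func(s svc) bool { return s.Name == ident },
	}
	for _, match := range matchers {
		for _, s := range services {
			if match(s) {
				return s, true
			}
		}
	}
	return svc{}, false
}

func main() {
	services := []svc{
		{ID: "abc123", Name: "web", Stack: "mystack"},
		{ID: "def456", Name: "cron"},
	}
	for _, id := range []string{"abc123", "mystack_web", "web", "cron", "missing"} {
		s, ok := resolve(services, id)
		fmt.Printf("%-12s -> %q found=%v\n", id, serviceKey(s), ok)
	}
}
```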

Restart Detection Methods

The plugin provides multiple ways to detect service restarts:

Task Count Method (swarm.service.restarts):
- Counts non-running tasks in recent history
- Limited by Swarm's task history retention (default: 5, configurable with docker swarm update --task-history-limit)
- Good for detecting recent restarts
Timestamp Method (swarm.service.last_restart):
- Returns Unix timestamp of most recent running task
- Use in Zabbix with change() function to detect restarts
- More reliable for long-term monitoring

Recommended Zabbix Trigger:

Expression: change(/YourHost/swarm.service.last_restart[{#SERVICE.KEY}])>0
Description: Service {#SERVICE.NAME} has restarted

Zabbix Template Configuration

Service-Level Monitoring

Discovery Rule

Name: Docker Swarm Services
Key: swarm.services.discovery
Update Interval: 300s (5 minutes)

Item Prototypes

Desired Replicas
- Name: Service {#SERVICE.NAME} ({#STACK.NAME}) desired replicas
- Key: swarm.service.replicas_desired[{#SERVICE.KEY}]
- Type: Zabbix agent
Running Replicas
- Name: Service {#SERVICE.NAME} ({#STACK.NAME}) running replicas
- Key: swarm.service.replicas_running[{#SERVICE.KEY}]
- Type: Zabbix agent
Restart Count
- Name: Service {#SERVICE.NAME} ({#STACK.NAME}) restart count
- Key: swarm.service.restarts[{#SERVICE.KEY}]
- Type: Zabbix agent
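- Preprocessing: Change per second (the Zabbix 3.4+ replacement for the old "Store value: Delta (speed per second)" option)
- Note: Stores the increase in restarts per second between polls instead of the raw count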
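Zabbix's two delta preprocessing steps reduce to simple arithmetic on consecutive samples. The following Go sketch, using a hypothetical `sample` type, shows what gets stored when `swarm.service.restarts` grows from 2 to 3 over a 60-second interval.

```go
package main

import "fmt"

// sample is one collected value of swarm.service.restarts and the Unix
// time at which it was collected (hypothetical type for illustration).
type sample struct {
	Value float64
	Clock int64
}

// changePerSecond mirrors the "Change per second" step:
// (value - prevValue) / (clock - prevClock).
func changePerSecond(prev, cur sample) float64 {
	return (cur.Value - prev.Value) / float64(cur.Clock-prev.Clock)
}

// simpleChange mirrors the "Simple change" step: value - prevValue.
func simpleChange(prev, cur sample) float64 {
	return cur.Value - prev.Value
}

func main() {
	prev := sample{Value: 2, Clock: 1700000000}
	cur := sample{Value: 3, Clock: 1700000060} // one more crashed task, 60 s later
	fmt.Printf("change/s: %.4f  simple change: %.0f\n",
		changePerSecond(prev, cur), simpleChange(prev, cur))
}
```

The raw count only covers tasks still held in Swarm's task history. It can stop growing, or drop, when old tasks are pruned, so the stored delta reflects recent changes rather than a lifetime total.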

Trigger Prototypes

Replica Mismatch
- Name: Service {#SERVICE.NAME} ({#STACK.NAME}) replica mismatch
- Expression: last(/Template/swarm.service.replicas_running[{#SERVICE.KEY}])<>last(/Template/swarm.service.replicas_desired[{#SERVICE.KEY}])
- Severity: Warning
Service Restarted
- Name: Service {#SERVICE.NAME} ({#STACK.NAME}) has restarted
- Expression: change(/Template/swarm.service.restarts[{#SERVICE.KEY}])>0
- Severity: Warning
- Description: A task for this service has crashed and been restarted
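The replica mismatch trigger compares two counts. The following Go sketch is an illustration under stated assumptions. It presumes `replicas_running` counts tasks whose observed state is `running`, which matches the table's "Running task count" description. The `taskState` type and the helper names are hypothetical.

```go
package main

import "fmt"

// taskState is the observed state of one Swarm task (hypothetical,
// simplified from the state string reported by the Docker API).
type taskState string

// runningReplicas counts tasks currently in the "running" state; failed or
// shut-down tasks kept in history are ignored.
func runningReplicas(states []taskState) int {
	n := 0
	for _, s := range states {
		if s == "running" {
			n++
		}
	}
	return n
}

// replicaMismatch mirrors the trigger expression
// last(replicas_running) <> last(replicas_desired).
func replicaMismatch(desired int, states []taskState) bool {
	return runningReplicas(states) != desired
}

func main() {
	states := []taskState{"running", "running", "failed"}
	fmt.Println("mismatch:", replicaMismatch(3, states))
}
```

With Swarm's default stop-first update order, running replicas can briefly dip below desired during a rolling update. This comparison may therefore fire transiently while a service is being updated.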

Stack-Level Monitoring

Discovery Rule

Name: Docker Compose Stacks
Key: swarm.stacks.discovery
Update Interval: 600s (10 minutes)

Item Prototypes

Stack Health
- Name: Stack {#STACK.NAME} health status
- Key: swarm.stack.health[{#STACK.NAME}]
- Type: Zabbix agent
- Value Type: Text
Stack Health Percentage (Calculated Item)
- Name: Stack {#STACK.NAME} health percentage
- Formula: jsonpath(last(/Template/swarm.stack.health[{#STACK.NAME}]),"$.health_percentage")
- Units: %
Unhealthy Services Count (Calculated Item)
- Name: Stack {#STACK.NAME} unhealthy services
- Formula: jsonpath(last(/Template/swarm.stack.health[{#STACK.NAME}]),"$.unhealthy_services")

Trigger Prototypes

Stack Health Critical
- Name: Stack {#STACK.NAME} has unhealthy services
- Expression: jsonpath(last(/Template/swarm.stack.health[{#STACK.NAME}]),"$.unhealthy_services")>0
- Severity: High
Stack Health Warning
- Name: Stack {#STACK.NAME} health below 100%
- Expression: jsonpath(last(/Template/swarm.stack.health[{#STACK.NAME}]),"$.health_percentage")<100
- Severity: Warning

How It Works

Service Discovery

The plugin discovers all Docker Swarm services and groups them by Docker Compose stack using the com.docker.stack.namespace label. Services without this label are marked as "standalone".
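The grouping step can be sketched in Go as follows. The input is a hypothetical map from service name to its labels. `docker stack deploy` sets `com.docker.stack.namespace` on every service it creates. Treating an empty label value as standalone is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

const stackLabel = "com.docker.stack.namespace"

// groupByStack maps stack name -> sorted service names. Services without
// the stack label (or with an empty value) go under "standalone".
func groupByStack(labels map[string]map[string]string) map[string][]string {
	groups := make(map[string][]string)
	for name, l := range labels {
		stack, ok := l[stackLabel]
		if !ok || stack == "" {
			stack = "standalone"
		}
		groups[stack] = append(groups[stack], name)
	}
	for _, names := range groups {
		sort.Strings(names) // map iteration order is random; sort for stable output
	}
	return groups
}

func main() {
	services := map[string]map[string]string{
		"mystack_web": {stackLabel: "mystack"},
		"mystack_db":  {stackLabel: "mystack"},
		"cron":        {},
	}
	fmt.Println(groupByStack(services))
}
```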

Stack Health Calculation

For each stack, the plugin:

Identifies all services belonging to the stack
Compares desired vs running replica counts for each service
Calculates health percentage: (healthy_services / total_services) * 100
Returns comprehensive health metrics

Restart Detection

The plugin tracks tasks that have failed or shut down with non-zero exit codes. These indicate container crashes that triggered Docker Swarm restarts.
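The following Go sketch counts restarts under one reading of the rule above: a task counts if its state is `failed` or `shutdown` and its container exit code is non-zero. The `taskStatus` type is hypothetical, and the exact matching logic in the plugin may differ.

```go
package main

import "fmt"

// taskStatus is a hypothetical, simplified Swarm task status: the state
// string and the container exit code.
type taskStatus struct {
	State    string // e.g. "running", "failed", "shutdown", "complete"
	ExitCode int
}

// countRestarts counts failed or shut-down tasks with a non-zero exit code.
// Only tasks still retained in Swarm's task history can be counted.
func countRestarts(tasks []taskStatus) int {
	n := 0
	for _, t := range tasks {
		if (t.State == "failed" || t.State == "shutdown") && t.ExitCode != 0 {
			n++
		}
	}
	return n
}

func main() {
	history := []taskStatus{
		{"running", 0},
		{"failed", 137},  // killed with SIGKILL (for example by the OOM killer)
		{"shutdown", 0},  // clean stop during an update: not counted
		{"shutdown", 1},  // crashed before being replaced
	}
	fmt.Println("restarts:", countRestarts(history))
}
```

Under this rule, a container that exits non-zero on a routine stop would also be counted. An example is exit status 143 after SIGTERM when the application does not handle the signal.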

Troubleshooting

Common Issues

Permission Denied: Ensure the zabbix user has access to the Docker socket (see step 4)
No Services Found: Verify Docker Swarm is running and services exist
Stack Not Detected: Check that services have com.docker.stack.namespace labels

Debug Commands

# Test Docker API access as the zabbix user (also checks socket permissions)
sudo -u zabbix curl --unix-socket /var/run/docker.sock http://localhost/v1.41/services

# Check Zabbix Agent logs
sudo tail -f /var/log/zabbix/zabbix_agent2.log

# Test specific metrics
zabbix_get -s localhost -k "swarm.services.discovery"

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Zabbix team for the excellent Agent 2 plugin framework
Docker team for the comprehensive Swarm API
Community contributors and testers
