Prometheus can discover targets in a Docker Swarm cluster, as of v2.20.0. This guide demonstrates how to use that service discovery mechanism.
The Docker Swarm service discovery contains 3 different roles: nodes, services, and tasks.
The first role, nodes, represents the hosts that are part of the Swarm. It can be used to automatically monitor the Docker daemons or the Node Exporters who run on the Swarm hosts.
The second role, tasks, represents any individual container deployed in the swarm. Each task gets its associated service labels. One service can be backed by one or multiple tasks.
The third one, services, will discover the services deployed in the swarm. It will discover the ports exposed by the services. Usually you will want to use the tasks role instead of this one.
Prometheus will only discover tasks and service that expose ports.
For this guide, you need to setup Prometheus. We will assume that
Prometheus runs on a Docker Swarm manager node and has access to the Docker
socket at /var/run/docker.sock
.
Let's dive into the service discovery itself.
Docker itself, as a daemon, exposes metrics that can be ingested by a Prometheus server.
You can enable them by editing /etc/docker/daemon.json
and setting the
following properties:
{
"metrics-addr" : "0.0.0.0:9323",
"experimental" : true
}
Instead of 0.0.0.0
, you can set the IP of the Docker Swarm node.
A restart of the daemon is required to take the new configuration into account.
The Docker documentation contains more info about this.
Then, you can configure Prometheus to scrape the Docker daemon, by providing the
following prometheus.yml
file:
scrape_configs:
# Make Prometheus scrape itself for metrics.
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
# Create a job for Docker daemons.
- job_name: 'docker'
dockerswarm_sd_configs:
- host: unix:///var/run/docker.sock
role: nodes
relabel_configs:
# Fetch metrics on port 9323.
- source_labels: [__meta_dockerswarm_node_address]
target_label: __address__
replacement: $1:9323
# Set hostname as instance label
- source_labels: [__meta_dockerswarm_node_hostname]
target_label: instance
For the nodes role, you can also use the port
parameter of
dockerswarm_sd_configs
. However, using relabel_configs
is recommended as it
enables Prometheus to reuse the same API calls across identical Docker Swarm
configurations.
Let's now deploy a service in our Swarm. We will deploy cadvisor, which exposes container resources metrics:
docker service create --name cadvisor -l prometheus-job=cadvisor \
--mode=global --publish target=8080,mode=host \
--mount type=bind,src=/var/run/docker.sock,dst=/var/run/docker.sock,ro \
--mount type=bind,src=/,dst=/rootfs,ro \
--mount type=bind,src=/var/run,dst=/var/run \
--mount type=bind,src=/sys,dst=/sys,ro \
--mount type=bind,src=/var/lib/docker,dst=/var/lib/docker,ro \
google/cadvisor -docker_only
This is a minimal prometheus.yml
file to monitor it:
scrape_configs:
# Make Prometheus scrape itself for metrics.
- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
# Create a job for Docker Swarm containers.
- job_name: 'dockerswarm'
dockerswarm_sd_configs:
- host: unix:///var/run/docker.sock
role: tasks
relabel_configs:
# Only keep containers that should be running.
- source_labels: [__meta_dockerswarm_task_desired_state]
regex: running
action: keep
# Only keep containers that have a `prometheus-job` label.
- source_labels: [__meta_dockerswarm_service_label_prometheus_job]
regex: .+
action: keep
# Use the prometheus-job Swarm label as Prometheus job label.
- source_labels: [__meta_dockerswarm_service_label_prometheus_job]
target_label: job
Let's analyze each part of the relabel configuration.
- source_labels: [__meta_dockerswarm_task_desired_state]
regex: running
action: keep
Docker Swarm exposes the desired state of the tasks over the API. In out example, we only keep the targets that should be running. It prevents monitoring tasks that should be shut down.
- source_labels: [__meta_dockerswarm_service_label_prometheus_job]
regex: .+
action: keep
When we deployed our cadvisor, we have added a label prometheus-job=cadvisor
.
As Prometheus fetches the tasks labels, we can instruct it to only keep the
targets which have a prometheus-job
label.
- source_labels: [__meta_dockerswarm_service_label_prometheus_job]
target_label: job
That last part takes the label prometheus-job
of the task and turns it into
a target label, overwriting the default dockerswarm
job label that comes from
the scrape config.
The Prometheus Documentation contains the full list of labels, but here are other relabel configs that you might find useful.
- source_labels: [__meta_dockerswarm_network_name]
regex: ingress
action: keep
Global tasks run on every daemon.
- source_labels: [__meta_dockerswarm_service_mode]
regex: global
action: keep
- source_labels: [__meta_dockerswarm_task_port_publish_mode]
regex: host
action: keep
- source_labels: [__meta_dockerswarm_node_hostname]
target_label: docker_node
The above dockerswarm_sd_configs
entries have a field host:
host: unix:///var/run/docker.sock
That is using the Docker socket. Prometheus offers additional configuration options to connect to Swarm using HTTP and HTTPS, if you prefer that over the unix socket.
There are many discovery labels you can play with to better determine which targets to monitor and how, for the tasks, there is more than 25 labels available. Don't hesitate to look at the "Service Discovery" page of your Prometheus server (under the "Status" menu) to see all the discovered labels.
The service discovery makes no assumptions about your Swarm stack, in such a way that given proper configuration, this should be pluggable to any existing stack.
This documentation is open-source. Please help improve it by filing issues or pull requests.