For this fourth episode, we’ll explain how we implemented healthchecks for Podman containers. Don’t miss the first, second and third episodes, where we learnt how to deploy, operate and upgrade Podman containers.
Note: Jill Rouleau wrote the code in TripleO to make that happen.
Context
Docker can perform health checks directly in the Docker engine, without the need for an external monitoring tool or sidecar containers.
A script (usually one per image) is run by the engine, and its return code determines whether or not the container is healthy.
Example of healthcheck script:
curl -g -k -q --fail --max-time 10 --user-agent curl-healthcheck \
--write-out "\n%{http_code} %{remote_ip}:%{remote_port} %{time_total} seconds\n" https://my-app:8774 || exit 1
It was originally built so that unhealthy containers could be rescheduled or removed by the Docker engine. The health status can be checked with the docker ps or docker inspect commands:
$ docker ps
my-app "/entrypoint.sh" 30 seconds ago Up 29 seconds (healthy) 8774/tcp my-app
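The same status can also be read with docker inspect. Here is a minimal sketch, assuming the container was created from an image that defines a health check (otherwise the State.Health field is absent and the check simply fails). The is_healthy function name is ours, for illustration only:

```shell
# Hypothetical helper: succeeds only if Docker reports the container as healthy.
# $1 = container name. Assumes the image defines a HEALTHCHECK.
is_healthy() {
    [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}
```

It could be used as: is_healthy my-app && echo "my-app is healthy"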
With Podman, however, we no longer have that kind of engine. Still, this monitoring interface has been useful in our architecture, because our operators can use it to verify the state of the containers.
Several options were available to us:
- systemd timers (like cron) to schedule the health checks. An example is documented in the CoreOS manuals.
- Podman pods with a sidecar container running the health checks.
- a scheduling function added to conmon.
- a systemd service, like a podman-healthcheck service, that would run on a fixed interval.
If you remember from the previous posts, we decided to get some help from systemd to control the containers, for example to restart them automatically on failure and to start them automatically at boot. With that said, we decided to go with the first option, which seemed the easiest to integrate and the least invasive.
Implementation
The systemd timer is a well-known mechanism. It’s a native systemd feature that runs a specific service on a time-based schedule. The service is of type “oneshot” and executes the healthcheck script present in the container image.
Here is how we did it for our OpenStack containers (with the healthcheck running every 30 seconds, configurable like a cron schedule):
# my_app_healthcheck.timer
[Unit]
Description=my_app container healthcheck
Requires=my_app_healthcheck.service
[Timer]
OnUnitActiveSec=90
OnCalendar=*-*-* *:*:00/30
[Install]
WantedBy=timers.target
# my_app_healthcheck.service
[Unit]
Description=my_app healthcheck
Requisite=my_app.service
[Service]
Type=oneshot
ExecStart=/usr/bin/podman exec my_app /bin/healthcheck
[Install]
WantedBy=multi-user.target
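Since every container needs its own timer and service pair, the two units above can be generated from the container name. The following is only a rough sketch, and not necessarily how TripleO generates them. The write_healthcheck_units function name is hypothetical, and it assumes that the container and its systemd service share the same name, as in the example above:

```shell
# Hypothetical helper: write the healthcheck timer and service for one container.
# $1 = container name (assumed to also be the name of its systemd service)
# $2 = target directory (defaults to /etc/systemd/system)
write_healthcheck_units() {
    name=$1
    dir=${2:-/etc/systemd/system}

    # Timer unit: same schedule as the my_app example.
    cat > "$dir/${name}_healthcheck.timer" <<EOF
[Unit]
Description=${name} container healthcheck
Requires=${name}_healthcheck.service
[Timer]
OnUnitActiveSec=90
OnCalendar=*-*-* *:*:00/30
[Install]
WantedBy=timers.target
EOF

    # Service unit: runs the healthcheck script inside the container once.
    cat > "$dir/${name}_healthcheck.service" <<EOF
[Unit]
Description=${name} healthcheck
Requisite=${name}.service
[Service]
Type=oneshot
ExecStart=/usr/bin/podman exec ${name} /bin/healthcheck
[Install]
WantedBy=multi-user.target
EOF
}
```

Calling write_healthcheck_units my_app as root would produce the same two files as shown above.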
Activate the timer and service:
$ systemctl daemon-reload
$ systemctl enable --now my_app_healthcheck.service
$ systemctl enable --now my_app_healthcheck.timer
Check the service & timer status:
$ service my_app_healthcheck status
Redirecting to /bin/systemctl status my_app_healthcheck.service
● my_app_healthcheck.service - my_app healthcheck
Loaded: loaded (/etc/systemd/system/my_app_healthcheck.service; enabled; vendor preset: disabled)
Active: activating (start) since Fri 2018-12-14 20:11:00 UTC; 158ms ago
Main PID: 325504 (podman)
CGroup: /system.slice/my_app_healthcheck.service
└─325504 /usr/bin/podman exec my_app /bin/healthcheck
Dec 14 20:11:00 myhost.localdomain systemd[1]: Starting my_app healthcheck...
$ service my_app_healthcheck.timer status
Redirecting to /bin/systemctl status my_app_healthcheck.timer
● my_app_healthcheck.timer - my_app container healthcheck
Loaded: loaded (/etc/systemd/system/my_app_healthcheck.timer; enabled; vendor preset: disabled)
Active: active (waiting) since Fri 2018-12-14 18:42:22 UTC; 1h 30min ago
Dec 14 18:42:22 myhost.localdomain systemd[1]: Started my_app container healthcheck.
$ systemctl list-timers
NEXT LEFT LAST PASSED UNIT ACTIVATES
Fri 2018-12-14 20:14:00 UTC 361ms left Fri 2018-12-14 20:13:30 UTC 29s ago my_app_healthcheck.timer my_app_healthcheck.service
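Because the service is of type oneshot, a healthcheck script that exits with a non-zero code leaves the unit in the failed state until the next run, so systemd itself can be asked for the last result. Here is a minimal sketch of that idea. The container_health function name and the healthy/unhealthy labels are ours, not something Podman or systemd provides:

```shell
# Hypothetical helper: report the last healthcheck result of a container,
# based on the state of its <name>_healthcheck.service unit.
container_health() {
    # "systemctl is-failed" exits 0 when the unit is in the failed state.
    if systemctl is-failed --quiet "${1}_healthcheck.service"; then
        echo unhealthy
    else
        echo healthy
    fi
}
```

Note that this sketch also prints healthy when the healthcheck has never run or is still running, so it only tells you that the last completed run did not fail.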
Now that it’s implemented, let’s try it!
Demo
Stay tuned for the next post in the series on deploying TripleO and Podman!
Source of the demo.