
OpenStack Containerization with Podman – Part 4 (Healthchecks)

For this fourth episode, we’ll explain how we implemented healthchecks for Podman containers. Don’t miss the first, second and third episodes, where we learnt how to deploy, operate and upgrade Podman containers.

Note: Jill Rouleau wrote the code in TripleO to make that happen.

Context

Docker can perform health checks directly in the Docker engine, without the need for an external monitoring tool or sidecar containers.

A script (usually one per image) is run by the engine, and its return code defines whether or not the container is healthy.

Example of a healthcheck script:

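#!/bin/bash
# -q must come first for curl to skip ~/.curlrc; --fail turns HTTP errors into a non-zero exit
curl -q -g -k --fail --max-time 10 --user-agent curl-healthcheck \
--write-out "\n%{http_code} %{remote_ip}:%{remote_port} %{time_total} seconds\n" https://my-app:8774 || exit 1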
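Whatever the script does internally, only its exit status matters: zero means healthy, anything else means unhealthy. The `check_health` helper below is a hypothetical, minimal sketch of that contract (it is not Docker's implementation): it runs any command as the check and reports the resulting state.

```shell
#!/bin/bash
# Hypothetical helper: run a healthcheck command and map its exit status
# to a health state, the way a container engine interprets it.
check_health() {
  # Discard the check's output; only the exit status decides the state.
  if "$@" >/dev/null 2>&1; then
    echo "healthy"
  else
    echo "unhealthy"
  fi
}

# Any command can act as the check:
check_health true               # prints "healthy"
check_health false              # prints "unhealthy"
check_health sh -c 'exit 3'     # prints "unhealthy" (any non-zero code)
```

This is why the curl example ends with `|| exit 1`: the only thing the caller needs is a non-zero status when the endpoint is not answering correctly.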

It was originally built so that unhealthy containers could be rescheduled or removed by the Docker engine. The health status can be checked with the docker ps or docker inspect commands:

$ docker ps
my-app "/entrypoint.sh" 30 seconds ago Up 29 seconds (healthy) 8774/tcp my-app

However, with Podman we no longer have that kind of engine, yet this monitoring interface has been useful in our architecture: our operators use it to verify the state of the containers.

Several options were available to us:

  • systemd timers (like cron) to schedule the health checks; an example is documented in the CoreOS manuals.
  • Podman pods with a sidecar container running the health checks.
  • a new scheduling function in conmon.
  • a systemd service, such as a podman-healthcheck service, running the checks at a fixed interval.

If you remember from the previous posts, we decided to rely on systemd to control the containers, for example to restart them automatically on failure and to start them at boot. With that in mind, we went with the first option, which seemed the easiest to integrate and the least invasive.

Implementation

The systemd timer is a well-known mechanism: a native systemd feature that activates a specific service on a schedule. Here, the service is of type “oneshot” and executes the healthcheck script shipped in the container image.

Here is how we did it for our OpenStack containers (with a healthcheck every 30 seconds, configurable like a cron job):

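# my_app_healthcheck.timer

[Unit]
Description=my_app container healthcheck
# Starting the timer also pulls in the healthcheck service
Requires=my_app_healthcheck.service
[Timer]
# Trigger again 90 seconds after the service was last activated...
OnUnitActiveSec=90
# ...and at second 0 and second 30 of every minute
OnCalendar=*-*-* *:*:00/30
[Install]
WantedBy=timers.target

# my_app_healthcheck.service

[Unit]
Description=my_app healthcheck
# Fail immediately (instead of starting it) if the container's service is not already active
Requisite=my_app.service
[Service]
Type=oneshot
# Run the healthcheck script from the image inside the running container
ExecStart=/usr/bin/podman exec my_app /bin/healthcheck
[Install]
WantedBy=multi-user.target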
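Every container needs its own timer/service pair, so writing these files by hand quickly becomes tedious. The sketch below templates the pair from a container name. The function `write_healthcheck_units`, its arguments and the default output directory are hypothetical (this is not TripleO's actual code), and it assumes, as in the example above, that every image ships its healthcheck script at `/bin/healthcheck`.

```shell
#!/bin/bash
# Hypothetical generator for the <name>_healthcheck.timer/.service pair.
# Arguments: container name, target directory (default: /etc/systemd/system).
write_healthcheck_units() {
  local name="$1"
  local dir="${2:-/etc/systemd/system}"

  # Timer: fires at seconds 0 and 30 of every minute, and 90s after the last run.
  # (Here-documents do not perform globbing, so the calendar spec is written literally.)
  cat > "${dir}/${name}_healthcheck.timer" <<EOF
[Unit]
Description=${name} container healthcheck
Requires=${name}_healthcheck.service
[Timer]
OnUnitActiveSec=90
OnCalendar=*-*-* *:*:00/30
[Install]
WantedBy=timers.target
EOF

  # Service: one-shot run of the healthcheck script inside the running container
  cat > "${dir}/${name}_healthcheck.service" <<EOF
[Unit]
Description=${name} healthcheck
Requisite=${name}.service
[Service]
Type=oneshot
ExecStart=/usr/bin/podman exec ${name} /bin/healthcheck
[Install]
WantedBy=multi-user.target
EOF
}
```

For example, `write_healthcheck_units my_app` followed by `systemctl daemon-reload` would produce the same two units as above. Keeping the schedule in one template means a change of interval only has to be made once.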

Activate the timer and service:

$ systemctl daemon-reload
$ systemctl enable --now my_app_healthcheck.service
$ systemctl enable --now my_app_healthcheck.timer

Check the service & timer status:

$ service my_app_healthcheck status
Redirecting to /bin/systemctl status my_app_healthcheck.service
● my_app_healthcheck.service - my_app healthcheck
   Loaded: loaded (/etc/systemd/system/my_app_healthcheck.service; enabled; vendor preset: disabled)
   Active: activating (start) since Fri 2018-12-14 20:11:00 UTC; 158ms ago
 Main PID: 325504 (podman)
   CGroup: /system.slice/my_app_healthcheck.service
           └─325504 /usr/bin/podman exec my_app /bin/healthcheck
Dec 14 20:11:00 myhost.localdomain systemd[1]: Starting my_app healthcheck...

$ service my_app_healthcheck.timer status
Redirecting to /bin/systemctl status my_app_healthcheck.timer
● my_app_healthcheck.timer - my_app container healthcheck
   Loaded: loaded (/etc/systemd/system/my_app_healthcheck.timer; enabled; vendor preset: disabled)
   Active: active (waiting) since Fri 2018-12-14 18:42:22 UTC; 1h 30min ago
Dec 14 18:42:22 myhost.localdomain systemd[1]: Started my_app container healthcheck.

$ systemctl list-timers
NEXT                         LEFT          LAST                         PASSED       UNIT                                               ACTIVATES
Fri 2018-12-14 20:14:00 UTC  361ms left    Fri 2018-12-14 20:13:30 UTC  29s ago      my_app_healthcheck.timer               my_app_healthcheck.service
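Because the healthcheck service is a oneshot unit, a non-zero exit status from the script leaves it in the "failed" state until a later run succeeds, so systemd itself becomes the health interface for operators. As a sketch, assuming the `<name>_healthcheck.service` naming used above, the hypothetical `failed_healthchecks` function below lists the containers whose last check failed, using `systemctl is-failed`.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every container whose last
# healthcheck run failed, based on the <name>_healthcheck.service naming.
failed_healthchecks() {
  local name
  for name in "$@"; do
    # is-failed exits 0 when the unit is in the "failed" state;
    # --quiet suppresses the state it would otherwise print.
    if systemctl is-failed --quiet "${name}_healthcheck.service"; then
      echo "${name}"
    fi
  done
}
```

For example, `failed_healthchecks my_app` prints `my_app` only if its latest healthcheck failed, and prints nothing otherwise.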

Now that it’s implemented, let’s try it!

Demo

Stay tuned for the next post in the series on deploying TripleO with Podman!

Source of the demo.

Software Engineer at Red Hat, Private Pilot, French guy hiding somewhere in Canada.