Yet another way to monitor OpenStack

The project

For the past year, I’ve been working on OpenStack deployments, and the first thing I do after each of them is set up monitoring.

There are some interesting approaches described on the Internet. One of them, from Mirantis, explains that you can monitor the APIs with the check_http method.
In my opinion, this check alone is not enough, since it doesn’t verify that the API is answering what you expect. Many times I’ve seen nova-api completely down (because of a MySQL access problem) while the API still sent HTTP 200 to the client.
In other words, the monitoring system was not able to detect the failure.

I decided to write some useful scripts to check that every OpenStack service is running.

You can download them all or fork the repository, and pull requests are of course very welcome! Since they are quite standard, you can use them with many current monitoring tools (Nagios, Sensu, etc.).
One of the requirements is to have a dedicated tenant / user for monitoring.

 

How to monitor APIs

A simple way would be to use the OpenStack clients directly (python-novaclient, etc.), but that’s too easy. I think monitoring servers should have only their own software installed, and nothing else. That’s why all the scripts only use “curl”.

Keystone example:

To monitor Keystone, it’s enough to ask the API for a token and see if we actually get one. For this, I use this curl command (without using Python):

curl -s -X 'POST' ${OS_AUTH_URL}:5000/v2.0/tokens -d '{"auth":{"passwordCredentials":{"username": "'$OS_USERNAME'", "password":"'$OS_PASSWORD'"}, "tenantName":"'$OS_TENANT'"}}' -H 'Content-type: application/json' |sed -e 's/[{}]/''/g' | awk -v k="text" '{n=split($0,a,","); for (i=1; i<=n; i++) print a[i]}'|awk 'NR==3'|awk '{print $2}'|sed -n 's/.*"\([^"]*\)".*/\1/p'

Funny curl, yeah :-) Maybe the longest of my career so far.
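
The pipeline above keeps the third comma-separated field of the response, so it depends on where Keystone puts the token id. A slightly less position-dependent sketch, still without Python, looks for the "id" key inside the "token" object instead. The function name extract_token_id and the sample response are made up for illustration, and the pattern assumes the id appears before any nested object inside "token".

```shell
# Hypothetical helper: read a Keystone v2.0 token response on stdin
# and print the id found directly inside the "token" object.
extract_token_id() {
    tr -d '\n' | sed -n 's/.*"token" *: *{[^{}]*"id" *: *"\([^"]*\)".*/\1/p'
}

# Made-up sample response, trimmed to the fields that matter here.
SAMPLE='{"access": {"token": {"issued_at": "2013-05-01T10:00:00", "expires": "2013-05-02T10:00:00Z", "id": "abc123", "tenant": {"id": "t1", "name": "monitoring"}}, "user": {"id": "u1"}}}'

# Prints the token id, not the tenant id.
printf '%s' "$SAMPLE" | extract_token_id
```

In a real check, the output of the curl call would be piped into the function instead of the sample.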

To monitor the Neutron API, I simply list the networks with:

curl -s -H "X-Auth-Token: $TOKEN" -H "Content-type: application/json" http://localhost:9696/v2.0/networks
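
Since an HTTP 200 alone does not prove the API is healthy, the answer itself has to be inspected. A minimal sketch of that idea follows, assuming a healthy reply contains a "networks" key. The function name check_networks_reply is hypothetical, and the exit codes follow the usual Nagios plugin convention (0 for OK, 2 for CRITICAL).

```shell
STATE_OK=0
STATE_CRITICAL=2

# Hypothetical helper: $1 is the HTTP status code, $2 is the response body.
check_networks_reply() {
    if [ "$1" != "200" ]; then
        echo "CRITICAL: Neutron API returned HTTP $1"
        return $STATE_CRITICAL
    fi
    case "$2" in
        *'"networks"'*)
            echo "OK: Neutron API returned a network list"
            return $STATE_OK ;;
        *)
            echo "CRITICAL: Neutron API answered 200 without a network list"
            return $STATE_CRITICAL ;;
    esac
}

# Against a live API the two arguments could come from something like:
#   BODY=$(curl -s -H "X-Auth-Token: $TOKEN" http://localhost:9696/v2.0/networks)
#   CODE=$(curl -s -o /dev/null -w '%{http_code}' -H "X-Auth-Token: $TOKEN" http://localhost:9696/v2.0/networks)
check_networks_reply 200 '{"networks": []}'
```

The second branch is the case check_http would miss: the status is fine but the body is not what we are waiting for.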

Help of API scripts:

Usage: ./check_{project}-api.sh [OPTIONS]
 -h     Get help
 -H     URL for obtaining an auth token. Ex: http://localhost
 -T     Tenant to use to get an auth token
 -U     Username to use to get an auth token
 -P     Password to use to get an auth token
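
For readers who want to write a similar check, the options above map naturally onto getopts. This is only a sketch of how such parsing could look, not necessarily what the scripts do. The function name parse_args is hypothetical, and the variable names are borrowed from the Keystone curl command.

```shell
# Hypothetical option parser matching the help text above.
parse_args() {
    OPTIND=1
    while getopts "hH:T:U:P:" opt; do
        case "$opt" in
            h) echo "Usage: $0 [-h] -H <auth url> -T <tenant> -U <user> -P <password>"
               return 0 ;;
            H) OS_AUTH_URL=$OPTARG ;;
            T) OS_TENANT=$OPTARG ;;
            U) OS_USERNAME=$OPTARG ;;
            P) OS_PASSWORD=$OPTARG ;;
            *) echo "Unknown option" >&2
               return 3 ;;   # 3 is UNKNOWN in the Nagios convention
        esac
    done
}

# Example values are made up.
parse_args -H http://localhost -T monitoring -U monitor -P secret
echo "$OS_AUTH_URL $OS_TENANT $OS_USERNAME"
```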

Another point: APIs can be slow because of the number of requests they receive. All the scripts check the latency and send a WARNING if the API takes more than 10 seconds to answer.
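
A minimal sketch of such a latency check is shown here, assuming whole-second precision is enough. The names check_latency and MAX_LATENCY are hypothetical, and the sketch only looks at timing, not at whether the command succeeded.

```shell
STATE_OK=0
STATE_WARNING=1

# Threshold in seconds, taken from the 10 seconds mentioned above.
MAX_LATENCY=10

# Hypothetical helper: run the given command, measure how long it takes
# in whole seconds, and warn when it exceeds $MAX_LATENCY.
check_latency() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    elapsed=$((end - start))
    if [ "$elapsed" -gt "$MAX_LATENCY" ]; then
        echo "WARNING: API answered in ${elapsed}s"
        return $STATE_WARNING
    fi
    echo "OK: API answered in ${elapsed}s"
    return $STATE_OK
}

# 'true' stands in for the real curl call so the sketch runs anywhere.
check_latency true
```

With curl, the same measurement could also be taken from its own timer through -w '%{time_total}', which avoids the two calls to date.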

 

How to monitor other services

Most of the other services are connected to AMQP.
If we take nova-scheduler as an example, the script checks that the service is running by getting its PID and verifying that the process is connected to AMQP. Good enough for schedulers.

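# Standard Nagios plugin exit codes
STATE_OK=0
STATE_CRITICAL=2

# PID of the first python process whose command line mentions nova-scheduler
PID=$(ps -ef | grep nova-scheduler | grep python | awk '{print $2}' | head -n 1)

# CRITICAL if the process is missing or holds no connection to the amqp port
if test -z "$PID" || ! netstat -epta 2>/dev/null | grep " $PID/" | grep -q amqp
then
    echo "nova-scheduler is not connected to AMQP."
    exit $STATE_CRITICAL
fi

echo "nova-scheduler is working."
exit $STATE_OK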

For this kind of service, you just need to run the script without any parameters.
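
Since most of the other services talk to AMQP as well, the matching part of the check can be pulled into a small function and reused. The sketch below is an assumption about how that could look, not code from the repository. The name amqp_connected and the sample netstat line are made up, and matching on " PID/" assumes the usual "PID/Program name" column printed by netstat -p.

```shell
# Hypothetical helper: succeed if the netstat output on stdin shows
# process $1 holding a connection to the amqp port.
amqp_connected() {
    grep " $1/" | grep -q amqp
}

# Made-up netstat line for illustration (PID 4242 connected to the broker).
SAMPLE='tcp  0  0 controller:45678  controller:amqp  ESTABLISHED nova  123456  4242/python'

if printf '%s\n' "$SAMPLE" | amqp_connected 4242; then
    echo "connected"
else
    echo "not connected"
fi
```

In a real check the input would come from netstat -epta 2>/dev/null, with the PID obtained from ps for the service in question.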

 

Conclusion

The work is not finished yet and I’m waiting for contributors. Test the scripts, improve them!
Happy monitoring!

  • Jesse Pretorius

    Excellent work. This is sorely needed. I’ll be testing and contributing within the next month or two as my next scheduled focus point is to improve monitoring.

    I can’t help thinking that perhaps there’ll be some duplication between this and the grenade and devstack tests though.

  • Emilien Macchi

    Grenade is a tool for testing upgrades : https://wiki.openstack.org/wiki/Grenade

    There is no link between my project and Grenade :)

    I appreciate your contribution, let me know if I can help.

  • aditya neelkanth

    Nice post…but how do you calculate other performance parameters like latency, data rate, availability etc ?