Bringing High Availability to Quantum Server

Introduction

Quantum is going to be a core project in the next OpenStack release (Folsom).

If we use Quantum as the network manager, we can’t configure nova-network in multi_host mode, so we lose high availability for nova-network.

Quantum-Server can be a single point of failure, so I started thinking about how to fix that. Like everyone on the mailing list, I saw that Hastexo and Sebastien Han had worked on Nova RAs (Resource Agents) for Pacemaker.

I decided to work on a Quantum Server RA and wrote something very close to the other agents. You can look at the RA directly on GitHub or follow my how-to below.

Note: this RA does not bring HA to nova-network, only to the Quantum Server service. I think that feature will be available in the near future.

 

Requirements

I use two VMs with Ubuntu 12.04 LTS Server installed and configured, each with one NIC.

Here is the network configuration:

  • quantum-server-1: 192.168.2.129/24
  • quantum-server-2: 192.168.2.130/24

 

  • Install Quantum-Server, the OVS plugin (optional) and Git:
    apt-get update
    apt-get install -y quantum-server quantum-plugin-openvswitch git

  • Configure Quantum to use the OVS plugin by editing the /etc/quantum/plugins.ini file:
    [PLUGIN]
    provider = quantum.plugins.openvswitch.ovs_quantum_plugin.OVSQuantumPlugin
  • Note: if you’re using the Folsom testing packages, modify /etc/quantum/quantum.conf only:

    core_plugin = quantum.plugins.openvswitch.ovs_quantum_plugin.OVSQuantumPlugin

     

  • Stop the quantum-server process:
    service quantum-server stop

 

Pacemaker Basic Configuration

  • Install Pacemaker on both servers:
    apt-get install -y pacemaker
  • Configure /etc/hosts file on both servers :
     192.168.2.129  quantum-server-1
     192.168.2.130  quantum-server-2
  • On first server :
    corosync-keygen
    scp /etc/corosync/authkey root@quantum-server-2:/root
  • On second server :
    sudo mv /root/authkey /etc/corosync/authkey
    sudo chown root:root /etc/corosync/authkey
    sudo chmod 400 /etc/corosync/authkey
  • On both servers, set these values in the /etc/corosync/corosync.conf file:

    secauth: yes
    bindnetaddr: 192.168.2.0
  • On both servers, allow corosync to start by modifying the /etc/default/corosync file:

     START=yes
  • And start the service :

    /etc/init.d/corosync start
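The corosync.conf step in this section can also be scripted, which helps keep both nodes identical. This is only a minimal sketch: set_corosync_option is a hypothetical helper name, and it assumes GNU sed and that both keys already exist somewhere in the file (it rewrites existing lines and never adds new ones).

```shell
#!/bin/sh
# set_corosync_option is a hypothetical helper, not part of corosync.
# It rewrites an existing "key: value" line in a corosync.conf-style file,
# keeping the original indentation. Assumes GNU sed (for -i).
set_corosync_option() {
    conf_file=$1
    key=$2
    value=$3
    sed -i "s|^\([[:space:]]*\)${key}:.*|\1${key}: ${value}|" "$conf_file"
}

# Intended use on each node, as root, before starting corosync:
#   set_corosync_option /etc/corosync/corosync.conf secauth yes
#   set_corosync_option /etc/corosync/corosync.conf bindnetaddr 192.168.2.0
```

If a key is missing from the file, the helper silently changes nothing, so it is worth checking the result with grep afterwards.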

Resource Agent Installation

  • On both servers :
    mkdir /usr/lib/ocf/resource.d/openstack
    cd /usr/lib/ocf/resource.d/openstack/
    wget https://github.com/madkiss/openstack-resource-agents/raw/master/ocf/quantum-server
    chmod +x quantum-server
    crm ra info ocf:openstack:quantum-server
  • On first server :
    crm configure property stonith-enabled=false
    crm configure property no-quorum-policy=ignore
    crm configure rsc_defaults resource-stickiness=100
    crm configure primitive p_vip ocf:heartbeat:IPaddr params ip="192.168.2.150" cidr_netmask="24" nic="eth0" op monitor interval="5s"
    crm configure primitive p_quantum_server ocf:openstack:quantum-server params config="/etc/quantum/quantum.conf" op monitor interval="5s" timeout="5s"
    crm configure group g_quantum_servers p_quantum_server p_vip
  • You can check the cluster :

    root@quantum-server-1:~# crm_mon -1
    ============
    Last updated: Tue Jul 17 12:19:52 2012
    Last change: Tue Jul 17 11:16:30 2012 via crm_attribute on quantum-server-1
    Stack: openais
    Current DC: quantum-server-1 - partition with quorum
    Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
    2 Nodes configured, 2 expected votes
    2 Resources configured.
    ============
     
    Online: [ quantum-server-1 quantum-server-2 ]
     
     p_vip    (ocf::heartbeat:IPaddr):    Started quantum-server-1
     p_quantum_server    (ocf::openstack:quantum-server):    Started quantum-server-1

     

Configure nova-network

Set the Quantum host flag to the virtual IP:

    --quantum_connection_host=192.168.2.150

nova-network reaches the Quantum Server through this IP (p_vip), on TCP port 9696 by default.
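Before pointing nova-network at the virtual IP, it can be worth checking that the Quantum Server port answers there. A minimal sketch, assuming bash (for its /dev/tcp redirection) and timeout from GNU coreutils; port_open is a hypothetical helper name, and it only tests that the TCP connection opens, not that the API behind it is healthy.

```shell
#!/bin/bash
# port_open is a hypothetical helper: it succeeds (exit status 0) if a TCP
# connection to host:port can be opened within 3 seconds, and fails otherwise.
# It relies on bash's /dev/tcp redirection and on `timeout` from GNU coreutils.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Intended use from the nova-network host:
#   port_open 192.168.2.150 9696 && echo "Quantum Server reachable via the VIP"
```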

     

Simulate a failure

Put quantum-server-1 in standby:

    crm node standby quantum-server-1

Then check that Quantum-Server is running on server 2:

    ps -ef | grep quantum-server

You should see the quantum-server process.

To bring server 1 back online after the failure:

    crm node online quantum-server-1

Note: depending on your resource-stickiness value, the process may stay on server 2.
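To confirm where a resource ended up without reading the process list by eye, one option is to parse the crm_mon -1 output. A minimal sketch: resource_node is a hypothetical helper name, and it assumes the one-line-per-resource format shown in the cluster check earlier in this post (resource name first, then "Started" followed by the node name).

```shell
#!/bin/sh
# resource_node is a hypothetical helper, not part of Pacemaker.
# It reads `crm_mon -1` output on stdin and prints the node on which the
# named resource is reported as "Started" (it prints nothing otherwise).
resource_node() {
    awk -v rsc="$1" 'NF >= 3 && $1 == rsc && $(NF-1) == "Started" { print $NF }'
}

# Intended use on a cluster node:
#   crm_mon -1 | resource_node p_quantum_server
#   crm_mon -1 | resource_node p_vip
```

After the standby command, both calls should print quantum-server-2 if the failover worked; an empty result means the resource is not started anywhere.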

     

Conclusion

Feel free to share your own experience here. This is not a definitive solution, so let me know if something is wrong.
Of course, we don’t have full high availability yet, since the nova-network process does not support it.

     

Thanks to Sebastien Han for his help ;-) !

     
