An update on our recent achievements in the Kubernetes and OpenStack space when running Fast Datapath applications.

Authors: Emilien Macchi and Maysa Macedo.

Over the past few months, the Kubernetes Network Plumbing Working Group has added new features to the SR-IOV Network Operator for the OpenStack platform.

If you’re not familiar with this operator, it helps Kubernetes cluster users connect their workloads to Fast Datapath (FDP) networking resources. While the operator is named “SR-IOV”, we’ll see that it can also manage other types of connectivity.

In fact, the operator helps provision and configure the SR-IOV Network Device Plugin for Kubernetes, which is in charge of discovering and advertising the networking resources available for FDP on a Kubernetes host (usually a worker node), mainly (but not exclusively) SR-IOV Virtual Functions (VFs) and PCI Physical Functions (PFs).
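For example, once the device plugin runs on a node, each discovered pool of devices shows up as an extended resource in the node’s allocatable list. A trimmed node status excerpt could look like this (the resource name is hypothetical and depends on the policies you define):

# Excerpt of a node's status; openshift.io/dpdk1 is a hypothetical
# resource advertised by the SR-IOV Network Device Plugin.
status:
  allocatable:
    # ...standard resources such as cpu and memory omitted...
    openshift.io/dpdk1: '1'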

The operator hides most of the complexity involved in achieving that and provides a simple user interface.

OpenStack metadata support

The operator originally required config-drive to be enabled on the machines connected to FDP networking, so that it could read the OpenStack metadata and network data.

We removed that requirement by adding support for reading that information from the Nova metadata service when no config-drive is used.

If your Kubernetes hosts have access to the Nova metadata URL, there is nothing you need to do! Otherwise, you’ll need to make sure the machines are created with config-drive enabled.
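For illustration, if you create your machines through the OpenShift Machine API on OpenStack, config-drive can be requested in the provider spec. This is a trimmed sketch, assuming the configDrive field of the OpenStack provider spec; the MachineSet name is hypothetical:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: worker-fdp # hypothetical name
  namespace: openshift-machine-api
spec:
  template:
    spec:
      providerSpec:
        value:
          # Attach a config-drive so the OpenStack metadata and network
          # data are readable even without the Nova metadata service.
          configDrive: true
          # ...remaining provider fields omitted...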

Enable VFIO with NOIOMMU

In virtual deployments of Kubernetes, the underlying virtualization platform (e.g. QEMU) may support a virtualized I/O memory management unit (IOMMU), but OpenStack Nova doesn’t know how to handle it yet; that support is a work in progress. Therefore, the VFIO PCI driver needs to be loaded with an option named enable_unsafe_noiommu_mode. This option gives user-space I/O access to a device that is direct memory access (DMA) capable, without an IOMMU.

The operator now loads the driver with the right arguments, so users don’t have to worry about it.
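You don’t need to apply anything yourself; purely for illustration, a roughly equivalent manual setup would drop a modprobe option onto the workers, for example with a MachineConfig (the name is hypothetical, and the exact mechanism the operator uses may differ):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-vfio-noiommu # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        # URL-encoded content: "options vfio enable_unsafe_noiommu_mode=1"
        - path: /etc/modprobe.d/vfio-noiommu.conf
          mode: 420
          contents:
            source: data:,options%20vfio%20enable_unsafe_noiommu_mode%3D1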

DPDK

The operator was initially designed to work on bare metal and not necessarily on virtualized platforms. However, when a virtualized Kubernetes host is connected to network hardware using DPDK, the device is exposed as a virtio interface (seen as a VF by the operator), but to take advantage of DPDK the device has to use the VFIO-PCI driver. We added support for detecting vhost-user interfaces that are connected to the specified Neutron network used for DPDK. Vhost-user is a DPDK module that helps run networking in user space; you can find more information in the DPDK documentation.

Here is an example of a SriovNetworkNodePolicy that can be used for Intel devices (you’ll need to change a few things if your device is Mellanox):

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci # change to netdevice if Mellanox
  nicSelector:
    netFilter: openstack/NetworkID:55a54d05-9ec1-4051-8adb-1b5a7be4f1b6
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: 'true'
  numVfs: 1
  priority: 99
  resourceName: dpdk1
  isRdma: false # set to true if Mellanox

You’ll need to set netFilter to the Network ID of your DPDK network in OpenStack.
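To make the resource consumable by workloads, you would typically create a SriovNetwork that references the resourceName from the policy. Here is a minimal sketch (the network name and target namespace are hypothetical):

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk1-network # hypothetical name
  namespace: openshift-sriov-network-operator
spec:
  resourceName: dpdk1
  # Namespace where the NetworkAttachmentDefinition will be created.
  networkNamespace: default
  # IPAM is typically left empty for a vfio-pci (DPDK) interface,
  # since there is no kernel network device to configure.
  ipam: ''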

OVS Hardware Offload

Open vSwitch (OVS) is CPU intensive, which affects system performance and prevents the available bandwidth from being fully utilized.

Since OVS 2.8, a feature called OVS Hardware Offload has been available. It improves performance significantly by offloading tasks to the NIC hardware. OpenStack is fully compatible with this feature, and the SR-IOV operator can now take advantage of it.

Here is an example of a SriovNetworkNodePolicy that can be used:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: hwoffload1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  nicSelector:
    netFilter: openstack/NetworkID:55a54d05-9ec1-4051-8adb-1b5a7be4f1b6
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: 'true'
  numVfs: 1
  priority: 99
  resourceName: hwoffload1
  isRdma: true

For now, only certain types of Mellanox devices are supported.

Also, you’ll need to set netFilter to the Network ID of your offloaded network in OpenStack.
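A pod then attaches to the offloaded network through the NetworkAttachmentDefinition created by a corresponding SriovNetwork. Here is a minimal sketch, assuming a SriovNetwork named hwoffload1-network was created from the policy above (all names and the image are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: hwoffload-app # hypothetical name
  annotations:
    # Attach the pod to the NetworkAttachmentDefinition created
    # from the hypothetical SriovNetwork "hwoffload1-network".
    k8s.v1.cni.cncf.io/networks: hwoffload1-network
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest # placeholder image
      resources:
        requests:
          openshift.io/hwoffload1: '1'
        limits:
          openshift.io/hwoffload1: '1'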

Wrap-up

The SR-IOV Network Operator was extended to support essential use cases for OpenStack, so that workloads can use FDP features. All of these features are available in the upstream operator. If you’re an OpenShift user, they’ll be available to you in the 4.11 release and backported to 4.10 in the next z-stream, so stay tuned!