OVN (Open Virtual Network) Introduction

OVN (Open Virtual Network) is open source software that provides virtual networking features such as L2 switching, L3 routing, and ACLs. OVN works with OVS (Open vSwitch), which has been widely adopted. OVN is developed by the same community as OVS and is treated almost like a sub-project of OVS. Just like OVS, OVN development is 100% open: discussions take place on a public mailing list and IRC. While VMware and Red Hat are the primary contributors today, the project is open to anybody who wishes to contribute.

The target platforms for OVN are Linux-based hypervisors such as KVM and Xen, as well as containers. DPDK is also supported. As of this writing, there is no ESXi support. Because OVS has been ported to Hyper-V (still in progress, though), OVN may support Hyper-V in the future.

It is important to note that OVN is CMS (Cloud Management System) agnostic and has OpenStack, Docker, and Mesos in scope. Among these CMSs, OpenStack is probably the most significant for OVN. A better OpenStack integration (compared to today's Neutron OVS plugin) is probably one of the driving factors behind OVN development.

OVN shares the same goal as OVS: supporting large-scale deployments consisting of thousands of hypervisors with production quality.

The diagram below shows the high-level architecture of OVN.

OVN Architecture

As you can see, OVN introduces two new components (processes). One is “ovn-northd” and the other is “ovn-controller”. As its name implies, ovn-northd provides a northbound interface to the CMS. As of this writing, only one ovn-northd exists in an OVN deployment. However, this part will be enhanced in the future to support some form of clustering for redundancy and scale-out.

One of the most distinctive architectural choices of OVN is that the integration point with the CMS is a database (DB). Although many people would expect a RESTful API as the northbound integration point, ovn-northd is designed to interact with the CMS via a DB.

ovn-northd uses two databases. One is called the “Northbound DB”, which holds the “desired state” of the virtual network. More specifically, it holds information about logical switches, logical ports, logical routers, logical router ports, and ACLs. It is worth noting that the Northbound DB doesn’t hold any “physical” information. An entity sitting on the north side of ovn-northd (typically the CMS) writes information to this Northbound DB to interact with ovn-northd. The other database that ovn-northd takes care of is the “Southbound DB”, which holds runtime state such as physical, logical, and binding information. Specifically, the Southbound DB includes information related to chassis, encapsulations, datapath bindings, port bindings, and logical flows. One of the important roles of ovn-northd is to read the Northbound DB, translate it into logical flows, and write those flows to the Southbound DB.
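
For illustration, here is a minimal sketch of how a CMS (or an operator) might populate the Northbound DB using the ovn-nbctl utility. The switch name, port name, MAC address, and ACL shown here are made up, and the exact command names may vary between OVN versions:

  $ ovn-nbctl ls-add sw0                                    # create a logical switch
  $ ovn-nbctl lsp-add sw0 sw0-port1                         # add a logical port to it
  $ ovn-nbctl lsp-set-addresses sw0-port1 "00:00:00:00:00:01"
  $ ovn-nbctl acl-add sw0 to-lport 1000 "ip4 && tcp.dst == 22" allow-related

ovn-northd then translates these Northbound DB records into logical flows in the Southbound DB.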

ovn-controller, on the other hand, is a distributed controller, which is installed on every single hypervisor. ovn-controller reads information from the Southbound DB and configures the ovsdb-server and ovs-vswitchd running on the same host accordingly. Specifically, ovn-controller translates the logical flow information populated in the Southbound DB into physical flow information and installs it into OVS.

Today OVN uses OVSDB as its database system. In principle, the database could be anything and doesn’t have to be OVSDB. However, since OVSDB is an intrinsic part of OVS, on which OVN always depends, and the developers of OVN know the characteristics of OVSDB very well, they decided to use OVSDB as the database system for OVN for the time being.

Basically, OVN provides three features: L2, L3, and ACL (Security Group).

As an L2 feature, OVN provides a logical switch. Specifically, OVN automatically creates L2-over-L3 overlay tunnels between hosts. OVN uses Geneve as the default encapsulation. Considering the need for metadata support, multipath capability, and hardware acceleration, Geneve is probably the most desirable choice of encapsulation. Where hardware acceleration on the NIC for Geneve is not available, OVN allows STT to be used as a second choice. Because HW-VTEPs generally don’t support Geneve or STT today, OVN uses VXLAN when talking to HW-VTEPs.

As an L3 feature, OVN provides so-called “distributed logical routing”. The L3 functionality provided by OVN is not centralized; each host executes the L3 function autonomously. The L3 topology that OVN supports today is very simple: a logical router routes traffic between the logical switches directly connected to it and toward the default gateway. As of this writing it is not possible to configure static routes other than the default route. It is simple, but sufficient to support the basic L3 functionality that OpenStack Neutron requires. NAT will be supported soon.

The OVS plugin in OpenStack Neutron today implements Security Groups by applying iptables to the tap interfaces (vnet) on a Linux Bridge. Having both OVS and Linux Bridge at the same time makes the architecture somewhat complex.


Conventional OpenStack Neutron OVS plugin Architecture (An excerpt from http://docs.ocselected.org/openstack-manuals/kilo/networking-guide/content/figures/6/a/a/common/figures/under-the-hood-scenario-1-ovs-compute.png)

Since OVS 2.4, OVS has been integrated with the “conntrack” feature available on Linux, so it is possible to implement stateful ACLs in OVS without relying on iptables. OVN takes advantage of this OVS/conntrack integration to implement its ACLs.

Since the conntrack integration is an OVS feature, one can use OVS with conntrack without OVN. However, OVN lets you use stateful ACLs without being explicitly aware of conntrack, because OVN compiles logical ACL rules into conntrack-based rules automatically, which many people will appreciate.

I will go into a bit more detail about the L2, L3, and ACL features of OVN in subsequent posts.

NetFlow on Open vSwitch

Open vSwitch (OVS) has supported NetFlow for a long time (since 2009). To enable NetFlow on OVS, you can use a command like the following:
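
A minimal sketch (the bridge name br0 and the collector address 192.168.1.10:2055 are placeholders for your environment):

  $ ovs-vsctl -- set Bridge br0 netflow=@nf \
        -- --id=@nf create NetFlow targets=\"192.168.1.10:2055\"

This creates a NetFlow record pointing at the collector and attaches it to the bridge in a single transaction.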

When you want to disable NetFlow, you can do it in the following way:
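
For example, again assuming the bridge is called br0:

  $ ovs-vsctl clear Bridge br0 netflow

This clears the netflow column of the bridge, which removes the NetFlow configuration.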

NetFlow has several versions; V5 and V9 are the most commonly used today. OVS supports NetFlow V5 only. NetFlow V9 is not supported as of this writing (and is very unlikely to be supported, because OVS already supports IPFIX, the direct successor of NetFlow V9).

A NetFlow V5 packet consists of a header followed by up to 30 flow records (see below).

NetFlow V5 header format

NetFlow V5 flow record format

NetFlow V5 cannot handle IPv6 flow records. If you need to monitor IPv6 traffic, you must use sFlow or IPFIX.

Compared to the NetFlow implementations on typical routers and switches, the one on OVS has a couple of unique points that you should keep in mind. I describe them below.

Most NetFlow-capable switches and routers support so-called “sampling”, where only a subset of packets is processed for NetFlow (there are a couple of ways to sample packets, but they are beyond the scope of this blog post). NetFlow on OVS doesn’t support sampling. If you need to sample traffic, you need to use sFlow or IPFIX instead.

Somewhat related to the fact that NetFlow on OVS doesn’t do sampling, it is worth noting that the “byte count (dOctets)” and “packet count (dPkts)” fields in a NetFlow flow record (both 32-bit fields) may wrap around for elephant flows. To circumvent this issue, OVS sends multiple flow records when the byte or packet count exceeds the 32-bit maximum, so that it can report accurate byte and packet counts to the collector.

Typically, NetFlow-capable switches and routers have a per-interface configuration to enable/disable NetFlow in addition to a global NetFlow configuration. OVS, on the other hand, doesn’t have a per-interface configuration. Instead, it has a per-bridge configuration that allows you to enable/disable NetFlow on a per-bridge basis.

Most router/switch-based NetFlow exporters allow you to configure the source IP address of the exported NetFlow packets (and in that case, a loopback address is a reasonable choice). OVS, however, doesn’t have this capability. The source IP address of NetFlow packets is determined by the IP stack of the host operating system, and it is usually an IP address associated with the outgoing interface. Since NetFlow V5 doesn’t have a concept like sFlow’s “agent address”, most collectors distinguish exporters by the source IP address of NetFlow packets. Because OVS doesn’t let us configure this source IP address explicitly, be aware that the source IP address of NetFlow packets may change when the outgoing interface changes.

Although it is not clearly described in the documentation, OVS in fact supports multiple collectors, as shown in the example below. This configuration provides collector redundancy.
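
A sketch with two hypothetical collector addresses (the targets column simply takes a set of "IP:port" strings):

  $ ovs-vsctl -- set Bridge br0 netflow=@nf \
        -- --id=@nf create NetFlow targets='["192.168.1.10:2055","192.168.1.20:2055"]'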

When flow-based network management is adopted, the In/Out interface numbers included in a flow record are quite important because they are often used to filter the traffic of interest. Most commercial collector products have such sophisticated filtering capabilities. Router/switch-based NetFlow exporters use the SNMP IfIndex to report the In/Out interface numbers. NetFlow on OVS, on the other hand, uses the OpenFlow port number to report the In/Out interface numbers in flow records. The OpenFlow port number can be found with the ovs-ofctl command as follows:
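
For example (the output below is trimmed and illustrative; MAC addresses and other details are omitted):

  $ ovs-ofctl show br0
  OFPT_FEATURES_REPLY (xid=0x2): ...
   1(eth1): addr:...
   2(vnet0): addr:...
   LOCAL(br0): addr:...

The number in front of each parenthesized interface name is its OpenFlow port number.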

In this example, the OpenFlow port number of “eth1” is 1. Some In/Out interface numbers have special meanings in OVS. An interface local to the host (the interface labeled “LOCAL” in the example above) is represented as 65534. Output interface number 65535 is used for broadcast/multicast packets, whereas most router/switch-based NetFlow exporters use “0” in both of these cases.

Those who are aware that IfIndex information was added to the “Interface” table in OVS relatively recently may think that using the IfIndex instead of the OpenFlow port number is the right thing to do. That may be true, but it is not that simple. For example, a tunnel interface created by OVS doesn’t have an IfIndex, so it would not be possible to export flow records for traffic traversing tunnel interfaces if we simply chose IfIndex as the NetFlow In/Out interface number.

The NetFlow V5 header has fields called “Engine ID” and “Engine Type”. How these fields are set by OVS by default depends on the type of datapath. If OVS runs in userspace using the netdev datapath, Engine Type and Engine ID are derived from a hash of the datapath name: Engine ID and Engine Type are the most significant and least significant 8 bits of the hash value, respectively. In the case of the Linux kernel datapath using netlink, on the other hand, the IfIndex of the datapath is set as both Engine ID and Engine Type. You can find the IfIndex of the OVS datapath with the following command:
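
For example, assuming the default kernel datapath device name "ovs-system" (output trimmed; the leading number is the IfIndex):

  $ ip link show ovs-system
  8: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN ...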

If you don’t like the default values of these fields, you can configure them explicitly as shown below:
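
For example (the collector address and the Engine Type/ID values here are arbitrary):

  $ ovs-vsctl -- set Bridge br0 netflow=@nf \
        -- --id=@nf create NetFlow targets=\"192.168.1.10:2055\" \
           engine_type=10 engine_id=20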

The example above shows the case where Engine Type and Engine ID are set at the same time the NetFlow configuration is first created. You can also change NetFlow-related parameters after the NetFlow configuration has been created, like so:
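
For example (ovs-vsctl can refer to the NetFlow record by the name of the bridge it is attached to; if your version does not support this, use the record's UUID instead):

  $ ovs-vsctl set NetFlow br0 engine_id=20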

In general, the typical use case for Engine Type and Engine ID is to distinguish logically separate exporters within a physically single exporter. A good example is the Cisco 6500, which has an MSFC and a PFC in a single chassis, each with its own NetFlow export engine. In the OVS case, they can be used to distinguish two or more bridges that are generating NetFlow flow records. As I mentioned earlier, the source IP address of the NetFlow packets that OVS exports is determined by the standard IP stack (and it is usually not the IP address associated with the NetFlow-enabled bridge interface in OVS). Therefore it is not possible to tell which bridge exported a flow record from the source IP address of the NetFlow packets. By setting a distinct Engine Type and Engine ID on each bridge, you can convey this to the collector. To my knowledge, however, not many collectors can use Engine Type and/or Engine ID to distinguish multiple logical exporters.

There is another use case for Engine ID. As I already explained, OVS uses the OpenFlow port number as the In/Out interface number in NetFlow flow records. Because the OpenFlow port number is only unique per bridge, these numbers may collide across bridges. To get around this problem, you can set “add_id_to_interface” to true.
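
For example, referring again to the NetFlow record by its bridge name:

  $ ovs-vsctl set NetFlow br0 add_id_to_interface=true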

When this parameter is set to true, the 7 most significant bits of the In/Out interface number are replaced with the 7 least significant bits of the Engine ID, which makes interface number collisions less likely.

Similar to typical router/switch-based NetFlow exporters, OVS also has the concepts of active and inactive timeouts. You can explicitly configure the active timeout (in seconds) using the following command:
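
For example, to set the active timeout to 60 seconds on the NetFlow record attached to br0:

  $ ovs-vsctl set NetFlow br0 active_timeout=60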

If it is not explicitly specified, it defaults to 600 seconds. If it is set to -1, the active timeout is disabled.

While OVS has an inactive-timeout mechanism for NetFlow, it doesn’t have an explicit configuration knob for it. When the flow information that OVS maintains is removed from the datapath, those flows are also exported via NetFlow. This timeout is dynamic; it varies depending on many factors such as the OVS version and the CPU and memory load, but it is typically 1 to 2 seconds in recent OVS. This is considerably shorter than the inactive timeout of typical router/switch-based NetFlow exporters, which is 15 seconds in most cases.

As with most router/switch-based NetFlow exporters, OVS exports flow records for ICMP packets by putting the value “ICMP Type * 256 + Code” in the source port field of the flow record. The destination port for ICMP packets is always set to 0.

NextHop, the source and destination AS numbers, and the source and destination netmasks are always set to 0. This is expected behavior, as OVS is inherently a “switch”.

While there are some caveats, as described above, NetFlow on OVS is a very useful tool if you want to monitor the traffic handled by OVS. One advantage of NetFlow over sFlow or IPFIX is that there are many open source and commercial collectors available today; whatever flow collector you choose will most likely support NetFlow V5. Please give it a try. It will give you great visibility into the traffic on your network.

BGP4 library implementation in Ruby

A while back, I posted a simple BGP4 library implemented in Ruby to GitHub. I wrote this code when I was working at Fivefront, which was selling GenieATM, a NetFlow/sFlow/IPFIX collector.


Using this library, you can easily write code that speaks BGP4, as shown below:

I wrote this code primarily because we were so “poor” at the time (we were a startup with just five people) that we couldn’t afford a decent physical router from Cisco or Juniper 🙂 Another reason was that we needed to inject a large number of routes (e.g. 1,000,000 routes) into GenieATM, and Zebra didn’t do a good job in that case because it has to keep all routes in memory. These situations motivated me to write stateless BGP code for route injection.

Over time, I added more features as needed and the code became more generic. Sometimes it helped me a great deal, for example when I needed to debug an issue caused by a subtle difference in how the BGP4 Capability Advertisement is sent by Juniper and Force10 (now Dell). Juniper announces capabilities as multiple Capability Advertisements, each containing only one capability. Force10, on the other hand, announces a single Capability Advertisement that includes multiple capabilities. This BGP4 library was useful for simulating these two different behaviors.

Because of the intended use case of this library, its level of BGP abstraction was intentionally kept low. It could have been abstracted further, but that would in turn lose the flexibility of crafting BGP messages arbitrarily.

Please note that there are some limitations and constraints, because this code was primarily developed to test GenieATM, a flow collector. First, while a BGP session can generally be established from either peer, this code never initiates a BGP session itself. Instead, it simply expects the peer to establish the session (a.k.a. passive mode). Second, no matter what messages the peer sends, it doesn’t do anything with them. In essence, this BGP4 library aims to let you experiment with how a peer BGP implementation behaves when various kinds of BGP messages are sent to it.

Needless to say, this code is not a complete implementation of BGP4. Compared to complete BGP4 implementations like Zebra/Quagga, this code is just a toy. That said, I decided to make it publicly available, hoping that someone with a similar use case may find it useful. I would like to thank Tajima-san, who suggested that I publish this code to GitHub.

A book about NSX

It has long been my desire to contribute something “visible” to the world, and that dream has come true (at least to some extent). I am pleased to announce that I wrote a book about VMware NSX with a couple of my colleagues, which can be found here.

VMware NSX Illustrated

This is not a translation of an existing book; it was written from scratch. We apologize that this book is available only in Japanese for the time being, although we have received several requests from outside Japan to translate it into other languages. We are not certain whether such a translation will happen, but we hope it will. That said, you can purchase it today from www.amazon.co.jp (sorry, it is not available on www.amazon.com) if you’re interested.

The table of contents is as follows:

  • Chapter 01 Technical Background and Definitions
  • Chapter 02 Standardization
  • Chapter 03 Some challenges in networking today
  • Chapter 04 Network Virtualization API
  • Chapter 05 Technical Details about NSX
  • Chapter 06 OpenStack and Neutron
  • Chapter 07 SDDC and NSX

among which I was responsible for the first half of Chapter 05 and some columns. While I have contributed many articles to magazines and books in the past, this was the first time my name was explicitly listed as an author, so it was a profound experience.

What drove us to publish this book was our day-to-day experience. Because network virtualization is a relatively new concept, we often come across situations where people don’t understand it very well, or sometimes even misunderstand it. In a sense it was our fault, because we hadn’t been able to provide enough information about NSX publicly, particularly in the Japanese market. We thought the best way to change this situation was to write a book about NSX, and we made it happen!

It was not an easy journey, though. It was almost a year ago that we first came up with the idea of writing this book, and we had many chances to give it up. I would like to thank all of my co-authors (Mizumoto-san, Tanaka-san, Yokoi-san, Takata-san, and Ogura-san) for the efforts that made this book available at vForum Tokyo 2014, which I know was tough. I am also very thankful to Mr. Maruyama (Hecula Inc.) and Mr. Hatanaka (Impress Inc.) for their day-and-night devotion to editing this book.

Publishing the book is not our goal, however. We hope that this book helps people understand network virtualization better and that NSX gains more traction.

Geneve on Open vSwitch

A few weeks back, I posted a blog entry (sorry, Japanese only) about a new encapsulation called “Geneve”, which is being proposed to the IETF as an Internet-Draft. Recently the first implementation of Geneve became available for Open vSwitch (OVS), contributed by Jesse Gross, a main author of the Geneve Internet-Draft, and the patch was merged into the master branch on GitHub, where the latest OVS code resides. I pulled the latest OVS source code from GitHub and played with the Geneve encapsulation. The rest of this post explains how I tested it.

Since this effort is purely about testing Geneve and nothing else, I didn’t use KVM this time. Instead, I used two Ubuntu 14.04 VM instances (host-1 and host-2) running on VMware Fusion, with the latest OVS installed on each. In terms of VMware Fusion configuration, I assigned one Ethernet NIC to each VM, which obtains an IP address from the DHCP service that Fusion provides by default. In the following example, let’s assume that host-1 and host-2 each obtained an IP address this way. Next, two bridges (br0 and br1) are created on each VM, where br0 connects to the network via eth0 while br1 on each VM talks to its counterpart using Geneve encapsulation.

Geneve Test with Open vSwitch

The OVS configurations for host-1 and host-2 are shown below:
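
A sketch of the configuration (bridge and interface names follow the diagram above; <host-1 IP> and <host-2 IP> stand for the addresses the VMs obtained from DHCP, and the 10.1.1.0/24 addresses on br1 are arbitrary):

  On host-1:
  $ ovs-vsctl add-br br0
  $ ovs-vsctl add-port br0 eth0        # if eth0 carried the DHCP address, move that address to br0
  $ ovs-vsctl add-br br1
  $ ovs-vsctl add-port br1 geneve0 -- set interface geneve0 type=geneve options:remote_ip=<host-2 IP>
  $ ip addr add 10.1.1.1/24 dev br1
  $ ip link set br1 up

On host-2, mirror the configuration with remote_ip=<host-1 IP> and, for example, 10.1.1.2/24 on br1.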

Once this configuration is done, ping should work between br1 on each VM, and those ping packets are encapsulated with Geneve.

Let’s take a closer look at what Geneve-encapsulated packets look like using Wireshark. A Geneve dissector for Wireshark became available recently (also a contribution from Jesse, thanks again!) and has been merged into the latest master branch. Using this latest Wireshark, we can see a Geneve packet as follows:

Geneve Frame by Wireshark

As you can see, Geneve uses port 6081/udp. This port number was officially assigned by IANA on March 27, 2014. To simply connect two bridges with a Geneve tunnel, there is no need to specify a VNI (Virtual Network Identifier). If no VNI is specified, VNI=0 is used, as you can see in this Wireshark capture.

On the other hand, if you need to multiplex more than one virtual network over a single Geneve tunnel, a VNI needs to be specified. In that case, you can designate the VNI using a parameter called “key” as an option to the ovs-vsctl command, as shown below:
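
For example, to set VNI 5000 on the tunnel port created earlier (geneve0 is the port name from the previous example):

  $ ovs-vsctl set interface geneve0 options:key=5000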

The following is a Wireshark capture when VNI was specified as 5000 (0x1388):

Geneve Frame with VNI 5000 by Wireshark

Geneve is capable of encapsulating not only Ethernet frames but also arbitrary frame types. For this purpose the Geneve header has a field called “Protocol Type”. In this example, Ethernet frames are being encapsulated, so this field is set to 0x6558, meaning “Transparent Ethernet Bridging”.

As of this writing, Geneve options are not supported (more specifically, there is no way to specify which Geneve options should be added to the Geneve header). Please note that Geneve options are yet to be defined in the Geneve Internet-Draft; most likely a separate Internet-Draft defining them will be submitted sooner or later. As that standardization process progresses, the Geneve implementation in OVS will surely evolve as well.

Although Geneve-aware NICs that can perform TSO on Geneve-encapsulated packets are not yet available on the market, OVS is at least “Geneve ready” now. The Geneve code is only included in the latest master branch of OVS at this point, but it will be included in a subsequent official release of OVS (hopefully 2.2). When that happens, you will be able to play with it more easily. Enjoy!