Features/VirtNetworkingEnhancements

Summary
Enhance networking support in libvirt

Owner

 * Name: Laine Stump
 * Email: laine at redhat dot com

Current status

 * Targeted release: Fedora 16
 * Last updated: 2011-07-26
 * Percentage of completion: 90%

Background
This document discusses three improvements to libvirt's networking support:

1. Logical Abstraction Between Guest Config and Host Config
Currently, the details of how a libvirt-managed virtual guest is connected to the network are all contained in the <interface> element(s) of the guest's domain configuration, which is an XML document. This is very flexible, allowing several different types of connection (virtual network, host bridge, direct macvtap connection to a physical interface, qemu usermode, user-defined via an external script). The drawback is that unnecessary details of the virtualization host's resources are embedded in the guest's config, the most obvious being the name of the physical interface or bridge on the host that is used for the connection. If the guest is migrated to a different host, and that host has a different hardware or network config (or the same hardware, but currently in use by a different guest), the migration will fail. This makes the requirements for migration very rigid.

Another problem with this system is that a large host may have multiple physical network interfaces (or SRIOV virtual functions) which need to be shared among the guests, and when a guest starts up, the physical interface that it previously used may already be in use by a different guest (some modes of direct/macvtap connection allow only a single guest at a time to use each physical device).

2. Transactional Host Network Configuration Changes
It is often necessary to change the network configuration of virtualization hosts, especially during initial deployment. libvirt offers an API (and a series of shell commands) to enable making these changes remotely via libvirt. However, an incorrect change to the networking may leave the host unreachable, and getting the network back to a usable state may be difficult or even impossible.

3. Quality of Service
Administrators sometimes need to limit the bandwidth of a domain's interface(s) and/or of a whole virtual network. For example, one domain may serve streaming data while the others lack the capacity to consume data at such a high rate. In another scenario, the host has a limited connection that is shared between domains, but one domain consumes all of it. Currently, bandwidth limitation is the only form of QoS supported in libvirt.

1. Logical Network Abstraction
All configuration information dealing with details of the host's physical hardware can optionally be moved to an expanded libvirt network definition (created and managed via libvirt's virNetwork API, a.k.a. "virsh net-*"), which can also include a pool of physical interfaces for use by guests connecting via that network. Guest definitions will reference these networks rather than referencing the physical hardware directly. This way, a guest interface can be defined to connect via "network X", and as the guest moves from one host to another, it will simply look up the specifics in each host's "network X" definition and set up the connection appropriately.
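The lookup described above can be illustrated with a small Python model. This is only a sketch of the idea, not libvirt's implementation: the HostNetwork class, the connect function, and the device names (br0, eth4, eth5) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HostNetwork:
    """Hypothetical stand-in for one host's definition of a named network."""
    name: str
    mode: str                  # illustrative values: "bridge" or "direct"
    devices: list              # pool of host devices backing this network
    exclusive: bool = False    # True if a device can serve only one guest
    in_use: set = field(default_factory=set)

    def allocate(self):
        """Pick a host device for a guest that is starting up."""
        for dev in self.devices:
            if not self.exclusive or dev not in self.in_use:
                self.in_use.add(dev)
                return dev
        raise RuntimeError(f"no free device in network {self.name!r}")


# The guest config only names the network; it carries no host details.
guest_interface = {"type": "network", "source": "X"}

# Each host defines "network X" in terms of its own hardware.
host_a = {"X": HostNetwork("X", "bridge", ["br0"])}
host_b = {"X": HostNetwork("X", "direct", ["eth4", "eth5"], exclusive=True)}


def connect(host_networks, interface):
    """Resolve the guest's network reference against one host's definitions."""
    return host_networks[interface["source"]].allocate()
```

The same guest_interface resolves to br0 on host_a and to a free device from the pool on host_b, which is the property that makes migration less rigid.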

2. Transactional Host Network Configuration Changes
Three new API functions will be provided as part of the existing virInterface* API in libvirt: virInterfaceChangeBegin, virInterfaceChangeCommit, and virInterfaceChangeRollback. These functions are really just a frontend for similar new functions in the netcf library (as are all of the virInterface* functions in libvirt). The parallel functions in netcf in turn call through to a new initscript (installed as part of netcf) that does the following:

 * for "change begin", the current state of all interface configuration files in /etc/sysconfig/network-scripts is copied into a "snapshot" directory;
 * for "change commit", this snapshot directory is deleted;
 * for "change rollback", the newly created config files are removed and replaced with the ones from the snapshot directory. netcf adds the functionality of calling ifup for deleted interfaces that are added back as part of the rollback, ifdown for new interfaces that are removed, and ifdown/ifup for existing interfaces that are changed.

The same initscript is run at boot time. If it sees that interface configuration changes have been made but not yet committed, it assumes that these changes made the host unreachable and performs an automatic rollback.
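The snapshot logic can be modelled in a few lines of Python. This is a toy sketch, not the netcf initscript: it works on a temporary directory rather than /etc/sysconfig/network-scripts, it replaces the whole directory on rollback instead of only the interface-related files, it does not run ifup/ifdown, and the ConfigTransaction class and its refusal to open a second transaction are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


class ConfigTransaction:
    """Toy model of begin/commit/rollback over a directory of config files."""

    def __init__(self, config_dir, snapshot_dir):
        self.config_dir = Path(config_dir)
        self.snapshot_dir = Path(snapshot_dir)

    def pending(self):
        # An existing snapshot means changes were begun but not committed.
        return self.snapshot_dir.exists()

    def begin(self):
        if self.pending():
            raise RuntimeError("a change transaction is already open")
        shutil.copytree(self.config_dir, self.snapshot_dir)

    def commit(self):
        if not self.pending():
            raise RuntimeError("no open transaction")
        shutil.rmtree(self.snapshot_dir)

    def rollback(self):
        if not self.pending():
            raise RuntimeError("no open transaction")
        # Discard the modified files and put the snapshot back in place.
        shutil.rmtree(self.config_dir)
        shutil.move(str(self.snapshot_dir), str(self.config_dir))

    def on_boot(self):
        # Uncommitted changes at boot are assumed to have broken networking.
        if self.pending():
            self.rollback()


# Demonstration in a scratch directory (file contents are illustrative).
root = Path(tempfile.mkdtemp())
cfg = root / "network-scripts"
cfg.mkdir()
(cfg / "ifcfg-eth0").write_text("DEVICE=eth0\nBOOTPROTO=dhcp\n")

tx = ConfigTransaction(cfg, root / "snapshot")
tx.begin()
(cfg / "ifcfg-eth0").write_text("DEVICE=eth0\nBOOTPROTO=none\n")  # risky edit
(cfg / "ifcfg-br0").write_text("DEVICE=br0\n")                    # new file
tx.on_boot()  # simulated reboot without a commit: changes are undone
```

After the simulated reboot, ifcfg-eth0 holds its original contents and ifcfg-br0 is gone. Had tx.commit() been called first, on_boot() would have left the new files alone.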

3. Quality of Service
The domain's interface XML and the network XML were extended with a new bandwidth element. It can have at most one inbound and at most one outbound child element. Both accept three attributes that define traffic shaping. The average attribute sets the long-term rate around which the traffic of the shaped device should float. The peak and burst attributes mean, roughly: the device can send at most $(burst) amount of data at the $(peak) rate. If it wants to send more after that, it falls back from the (usually higher) peak rate to the average rate. For example, if average is set to 100 units, peak to 200 and burst to 50, then after sending those 50 units of data at rate 200 the rate falls to 100, so over a longer period the device's rate is 100 units. As for units, the attribute values in the XML are plain numbers, interpreted as kilobytes for burst and kilobytes per second for average and peak. Only average is mandatory; the other two are optional.
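The arithmetic in that example can be worked through with a short Python function. It follows the simplified description above (one burst at the peak rate, then the average rate) and is not a model of how the kernel's traffic shaper actually behaves; the function name transfer_time is hypothetical.

```python
def transfer_time(size_kb, average, peak=None, burst=None):
    """Seconds needed to send size_kb under the simplified model in the text.

    average and peak are in kilobytes per second, burst is in kilobytes.
    """
    if average <= 0:
        raise ValueError("average is mandatory and must be positive")
    if peak is None or burst is None:
        # Without peak/burst, everything is sent at the average rate.
        return size_kb / average
    at_peak = min(size_kb, burst)  # portion that fits into the burst
    return at_peak / peak + (size_kb - at_peak) / average


# Values from the example: average=100, peak=200, burst=50.
short = transfer_time(50, average=100, peak=200, burst=50)
# 50 KB fit entirely in the burst: 50 / 200 = 0.25 s

long = transfer_time(1050, average=100, peak=200, burst=50)
# 0.25 s at the peak rate + 1000 / 100 = 10 s at the average rate = 10.25 s
```

For the larger transfer the effective rate is 1050 / 10.25, a little over 102 KB/s, which shows why the long-term rate stays close to the average.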

Internally, the tc command (from the iproute package) is used to set QoS, but don't rely on this: it is a libvirt implementation detail that might change over time. One important point is that a domain's outgoing traffic is the host's incoming traffic, and vice versa. The egress qdisc is therefore tied to the inbound element, and the ingress qdisc to the outbound element. An HTB class with the appropriate parameters is attached to the egress qdisc, and a filter with the appropriate limits is attached to the ingress qdisc.

Benefit to Fedora
Deployment and provisioning of large virt installations (i.e. with multiple hosts and migrating guests) will be made simpler.

Scope
As described above, changes are required in libvirt, as well as netcf (for item 2 only). Eventually the tools using libvirt should be updated to take advantage of the new features, but that is not immediately necessary, nor within the scope of this work.

TODO

 * for item 2 (Transactional Interface Configuration Changes), when a rollback is done, netcf doesn't yet bounce the interfaces that need it (ifup/ifdown). This doesn't require any API changes, only some additional behind-the-scenes code in netcf.


 * for item 3, extend support for as many interface types as possible

Completed

 * All of item 1 (Logical Abstraction) is code complete, has gone through basic testing, and is waiting on upstream review to be pushed into upstream libvirt. Barring unexpected problems in review or testing, it will be in libvirt-0.9.4, which will be released at the end of July.


 * The only piece of item 2 (Transactional Interface Configuration Changes) that isn't completed is outlined in the TODO section above. The rest is completed, tested, and included in an upstream release of both libvirt (0.9.2) and netcf (0.1.8), both of which are already in Rawhide.


 * Item 3 works for interfaces of type 'network' and 'direct', and for virtual networks with a virtual bridge.

How To Test
1. Logical Network Abstraction
1) Start with an existing configuration that connects to the network via a host bridge:

...        ...     ...

1a) define a new network called 'test-bridge':

test-bridge  

1b) modify the configuration to reference this network:

...        ...     ...

1c) run "virsh net-start test-bridge", then stop and re-start the guest domain, and verify that it still has connectivity
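As an illustration of steps 1a and 1b, the following Python sketch builds a network definition that forwards to an existing host bridge and rewrites a bridge-type interface to reference it. The XML fragments are assumptions, not the listings from this page: the bridge name br0, the MAC address, and the helper names network_xml and use_network are hypothetical, and the element layout follows the libvirt network and domain XML formats as commonly documented.

```python
import xml.etree.ElementTree as ET

# Hypothetical "before" fragment: an interface tied directly to host bridge br0.
BEFORE = """
<interface type='bridge'>
  <source bridge='br0'/>
  <mac address='52:54:00:00:00:01'/>
</interface>
"""


def network_xml(name, bridge):
    """Build a network definition that forwards to an existing host bridge."""
    net = ET.Element("network")
    ET.SubElement(net, "name").text = name
    ET.SubElement(net, "forward", mode="bridge")
    ET.SubElement(net, "bridge", name=bridge)
    return ET.tostring(net, encoding="unicode")


def use_network(interface_xml, network):
    """Rewrite a type='bridge' interface so it references a named network."""
    iface = ET.fromstring(interface_xml.strip())
    if iface.get("type") != "bridge":
        raise ValueError("expected an interface of type 'bridge'")
    iface.set("type", "network")
    source = iface.find("source")
    source.attrib.clear()            # drop the host-specific bridge name
    source.set("network", network)   # keep only the logical network name
    return ET.tostring(iface, encoding="unicode")


print(network_xml("test-bridge", "br0"))
print(use_network(BEFORE, "test-bridge"))
```

The first string could be saved to a file and passed to "virsh net-define"; the second shows the shape of the interface element after the guest config has been edited. All other child elements of the interface, such as the MAC address, are left untouched.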

---

2) Start with an existing configuration that connects to the network directly via a physical interface (i.e. "macvtap" mode):

...        ...     ...

2a) define a new network called 'test-direct':

test-direct   ...

2b) modify the configuration to reference this network:

...        ...     ...

2c) run "virsh net-start test-direct", then stop and re-start the guest domain, and verify that it still has connectivity

2. Transactional Host Network Configuration Changes
1. # virsh iface-begin
2. (remove an interface with virsh iface-undefine, modify one with iface-edit, add a new one with iface-define)
3. # virsh iface-rollback (or reboot the machine)
4. (verify that the original interface configuration is restored)

---

Perform another test similar to the previous, except make changes that won't result in a non-working machine, then run "virsh iface-commit" and verify that the new interface config remains in effect, even after a system reboot of the host.
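Both test flows can also be scripted. The Python sketch below wraps the begin/commit/rollback sequence in a context manager. It assumes the connection object offers changeBegin, changeCommit and changeRollback methods, which is how the libvirt Python bindings are understood to expose the three virInterfaceChange* functions; check this against the installed bindings. The FakeConn class is a hypothetical stand-in so that the example runs without libvirt.

```python
from contextlib import contextmanager


@contextmanager
def interface_changes(conn):
    """Commit interface changes on success, roll them back on any error."""
    conn.changeBegin()
    try:
        yield conn
    except BaseException:
        conn.changeRollback()
        raise
    else:
        conn.changeCommit()


class FakeConn:
    """Hypothetical stand-in for a libvirt connection; it only records calls."""

    def __init__(self):
        self.calls = []

    def changeBegin(self, flags=0):
        self.calls.append("begin")

    def changeCommit(self, flags=0):
        self.calls.append("commit")

    def changeRollback(self, flags=0):
        self.calls.append("rollback")


# A failing change is rolled back ...
failed = FakeConn()
try:
    with interface_changes(failed):
        raise RuntimeError("connectivity check failed")
except RuntimeError:
    pass

# ... and a successful one is committed.
ok = FakeConn()
with interface_changes(ok):
    pass  # define, edit or undefine interfaces here
```

A wrapper like this only helps while the host is still reachable. If a change cuts off the connection, the client cannot issue the rollback, and recovery depends on the automatic rollback at boot.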

3. Quality of Service
1. Set a limit on a domain's interface and/or a virtual network
2. Start up one or more domains
3. Observe traffic being shaped


 * See: http://libvirt.org/formatdomain.html#elementQoS for documentation of the domain QoS XML.
 * See: http://libvirt.org/formatnetwork.html#elementsConnect for documentation of the network QoS XML.
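For step 1, the bandwidth element can be generated rather than typed by hand. The Python sketch below builds it from plain dictionaries and enforces the rules stated earlier (at most one inbound and one outbound child, average mandatory, peak and burst optional). The helper name bandwidth_element, the sample interface fragment, and the numeric limits are illustrative assumptions; consult the documentation linked above for the authoritative format.

```python
import xml.etree.ElementTree as ET

ALLOWED = ("average", "peak", "burst")


def bandwidth_element(inbound=None, outbound=None):
    """Build a <bandwidth> element from dicts of average/peak/burst values."""
    bw = ET.Element("bandwidth")
    for direction, limits in (("inbound", inbound), ("outbound", outbound)):
        if limits is None:
            continue  # each direction is optional
        if "average" not in limits:
            raise ValueError(f"{direction}: 'average' is mandatory")
        unknown = set(limits) - set(ALLOWED)
        if unknown:
            raise ValueError(f"{direction}: unknown attributes {sorted(unknown)}")
        child = ET.SubElement(bw, direction)
        for key in ALLOWED:
            if key in limits:
                # Values are plain numbers: KB/s for average and peak, KB for burst.
                child.set(key, str(int(limits[key])))
    return bw


# Hypothetical interface fragment; the network name is an assumption.
iface = ET.fromstring(
    "<interface type='network'><source network='default'/></interface>"
)
iface.append(
    bandwidth_element(
        inbound={"average": 1000, "peak": 5000, "burst": 5120},
        outbound={"average": 128},
    )
)
print(ET.tostring(iface, encoding="unicode"))
```

The printed fragment can be merged into a guest definition with "virsh edit". Keep in mind that inbound and outbound are named from the domain's point of view.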

User Experience
See the previous section.

Dependencies
None, outside of the implementation efforts detailed above (i.e. libvirt, netcf and iproute).

Contingency Plan
Administrators can continue configuring their host and guest networking as they did in the past.

Documentation
Logical Network Abstraction:


 * patch series (including design document) for Logical Network Abstraction
 * Upstream Bugzilla record

Transactional Interface Configuration Change:


 * patch and description of netcf initscript
 * patch and description of netcf API
 * libvirt patch series
 * Upstream Bugzilla record

Quality of Service:


 * patchset and description of QoS
 * Documentation for domain's interface
 * Documentation for network

Release Notes
This version of libvirt adds the ability to remove details of the virt host's physical hardware from guest configuration, making it simpler to migrate guests from one host to another, or to operate in a large-scale environment with a large number of physical interfaces (or SRIOV virtual functions).

Part 2. of the functionality allows safely changing virt host interface configuration without fear of making an incorrect change that will leave the host unusable.

Part 3. of the functionality allows setting rate limits on domains' interfaces and/or virtual network bridges.

Comments and Discussion

 * See Talk:Features/VirtNetworkingEnhancements