From Fedora Project Wiki
Revision as of 11:01, 7 July 2011

Virtual Machine Lock Manager

Summary

The virtual machine lock manager is a daemon which will ensure that a virtual machine's disk image cannot be written to by two QEMU/KVM processes at the same time. It provides protection against starting the same virtual machine twice, or adding the same disk to two different virtual machines.

Owner

Current status

  • Targeted release: Fedora 16
  • Last updated: (DATE)
  • Percentage of completion: 80%

Detailed Description

Virtual machines running on the QEMU/KVM platform do not currently acquire any kind of lock when they start. As a result, the same virtual machine can accidentally be started more than once, or the same disk image can accidentally be added to two different virtual machines. Such a mistake is likely to destroy the filesystem on the affected disk image, because two guests writing to it independently will overwrite each other's changes.

The virtual machine lock manager is a framework embedded in the libvirtd daemon that allows for pluggable locking mechanisms. Out of the box, libvirt will provide a daemon, "virtlockd", that maintains locks for all running virtual machines on a host. This protects against adding the same disk to two different virtual machines, and against libvirtd bugs where it might "forget" about a previously running virtual machine. If the administrator mounts a suitable shared filesystem (e.g. NFS) at /var/lib/libvirt/lockd, the lock manager protection is extended to all hosts sharing that filesystem.
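A minimal sketch of the core idea, assuming the lock manager keeps one lock file per disk image inside a lockspace directory (such as /var/lib/libvirt/lockd) and takes POSIX fcntl() locks on it. The SHA-256 file naming and all helper names are hypothetical, not virtlockd's actual implementation. Because POSIX record locks never conflict within a single process, the probe runs in a forked child that stands in for a second QEMU process.

```python
import fcntl
import hashlib
import os
import tempfile


def lock_file_for(lockspace: str, disk_path: str) -> str:
    """Hypothetical naming: one lock file per disk, named by a hash of its path."""
    digest = hashlib.sha256(disk_path.encode("utf-8")).hexdigest()
    return os.path.join(lockspace, digest)


def try_acquire(lockspace: str, disk_path: str):
    """Take an exclusive, non-blocking fcntl lock; return the fd, or None if already held."""
    fd = os.open(lock_file_for(lockspace, disk_path), os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:  # EACCES or EAGAIN: another process holds the lock
        os.close(fd)
        return None
    return fd


def in_use_by_another_process(lockspace: str, disk_path: str) -> bool:
    """Probe from a forked child, since POSIX locks never conflict within one process."""
    pid = os.fork()
    if pid == 0:
        code = 2
        try:
            code = 0 if try_acquire(lockspace, disk_path) is not None else 1
        finally:
            os._exit(code)  # any lock the child took is released when it exits
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    lockspace = tempfile.mkdtemp()  # stands in for /var/lib/libvirt/lockd
    disk = "/var/lib/libvirt/images/extra.img"
    first_guest = try_acquire(lockspace, disk)          # first guest starts
    print(in_use_by_another_process(lockspace, disk))   # True: a second guest is refused
    os.close(first_guest)                               # first guest stops, lock dropped
    print(in_use_by_another_process(lockspace, disk))   # False: the disk is free again
```

Putting the lockspace on a shared filesystem is what turns this per-host check into a cross-host one, provided the filesystem supports POSIX locking.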

A separate, third-party lock manager implementation called "sanlock" will also be available. It is expected to be the subject of a separate Fedora feature, so it is not discussed further here.

Benefit to Fedora

Hosts running QEMU/KVM virtual machines will have much stronger protection against host and cluster configuration mistakes made by administrators. This reduces the risk of a virtual machine's disk image becoming corrupted as a result.

Scope

The changes are confined to the libvirt package. They include:

- A new daemon, 'virtlockd', with systemd service and socket unit files
- virtlockd will be enabled by default on all hosts currently running 'libvirtd'
- The /etc/libvirt/qemu.conf file will gain a configuration parameter to set the lock manager implementation

How To Test

There are no special hardware requirements for testing this feature, beyond those already required for running QEMU/KVM virtual machines.

Single host testing

- Install the standard libvirt + QEMU/KVM virtualization packages
- Provision two virtual machines
- Create a third disk image (e.g. dd if=/dev/zero of=/var/lib/libvirt/images/extra.img bs=1M count=100)
- Add the following XML to the configuration of both virtual machines
     <disk type='file' device='disk'>
       <source file='/var/lib/libvirt/images/extra.img'/>
       <target dev='vdb' bus='virtio'/>
     </disk>
- Start the first virtual machine
- Attempt to start the second virtual machine

The last step should fail, with a message that the disk image is already in use.

- Stop the first virtual machine
- Attempt to start the second virtual machine

The second VM should now start successfully.
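When scripting this test for several guests, the disk XML from the steps above can also be generated rather than pasted by hand. The sketch below uses Python's standard xml.etree.ElementTree; disk_xml is a hypothetical helper, not part of libvirt. Its optional <shareable/> child corresponds to the flag described in the release notes, which exempts a disk from exclusive locking.

```python
import xml.etree.ElementTree as ET


def disk_xml(source_file: str, target_dev: str = "vdb", shareable: bool = False) -> str:
    """Build a libvirt <disk> element for a file-backed virtio disk (hypothetical helper)."""
    disk = ET.Element("disk", type="file", device="disk")
    ET.SubElement(disk, "source", file=source_file)
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    if shareable:
        # Marks the image as intentionally shared between guests
        ET.SubElement(disk, "shareable")
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    print(disk_xml("/var/lib/libvirt/images/extra.img"))
```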


Dual host testing

- Install the standard libvirt + QEMU/KVM virtualization packages on both hosts
- Mount an NFS volume at /var/lib/libvirt/lockd on both hosts
- Restart the virtlockd service on both hosts
- Provision a virtual machine
- Copy the virtual machine configuration to the second host
        virsh dumpxml myguest > myguest.xml
        virsh -c qemu+ssh://otherhost/system define myguest.xml
- Start the virtual machine on the first host
- Attempt to start the virtual machine on the second host

The last step should fail, with a message that the disk image is already in use.

- Stop the virtual machine on the first host
- Attempt to start the virtual machine on the second host

The VM should now start successfully on the second host.
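Cross-host protection only works if both hosts really see the same shared lockspace, so a quick preflight check before restarting virtlockd may help. The sketch below is a hypothetical helper that parses the Linux /proc/mounts format and reports the filesystem type mounted at the lockspace directory. It assumes the NFS volume is mounted exactly at /var/lib/libvirt/lockd, and it does not verify that both hosts mount the same export.

```python
import os
from typing import Optional

DEFAULT_LOCKSPACE = "/var/lib/libvirt/lockd"


def lockspace_fstype(mounts_text: str, lockspace: str = DEFAULT_LOCKSPACE) -> Optional[str]:
    """Return the filesystem type mounted exactly at `lockspace`, or None if nothing is."""
    target = lockspace.rstrip("/")
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts escapes spaces in mount points as \040
        mount_point = fields[1].replace("\\040", " ")
        if mount_point == target:
            fstype = fields[2]  # a later entry over-mounts an earlier one
    return fstype


def lockspace_is_nfs(mounts_text: str, lockspace: str = DEFAULT_LOCKSPACE) -> bool:
    """True if the lockspace is an NFS mount (fstype "nfs" or "nfs4")."""
    return (lockspace_fstype(mounts_text, lockspace) or "").startswith("nfs")


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as mounts:
        print(lockspace_is_nfs(mounts.read()))
```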

Migration testing

- Set up both hosts as per "Dual host testing"
- Attempt to migrate the running VM from the first host to the second host

Libvirtd failure testing

- Set up as per "Single host testing"
- Start the first virtual machine
- Stop the libvirtd daemon, without stopping the VM
- Delete the files /var/run/libvirt/qemu/myguest.{pid,xml} (this orphans the VM from libvirtd)
- Start the libvirtd daemon
- Attempt to start the first virtual machine again

The last step should fail, with a message that the disk image is already in use.

- Find the orphaned QEMU process and manually kill it
- Attempt to start the first virtual machine again

The VM should now start successfully again.
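This test relies on the lock living exactly as long as the process that holds it, regardless of whether libvirtd still remembers that process. A simplified sketch of that property follows, with a forked child standing in for the orphaned guest. It assumes POSIX fcntl() locks on a hypothetical lock file; in the real design virtlockd holds the locks on behalf of running guests, which this sketch does not model.

```python
import fcntl
import os
import signal
import tempfile


def try_lock(path: str):
    """Exclusive, non-blocking fcntl lock; return the fd, or None if another process holds it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


def start_orphan(path: str) -> int:
    """Fork a child that takes the lock and then idles, like an orphaned QEMU process."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(ready_r)
            fd = try_lock(path)
            os.write(ready_w, b"1" if fd is not None else b"0")
            while fd is not None:
                signal.pause()  # keep holding the lock until a signal kills us
        finally:
            os._exit(1)
    os.close(ready_w)
    reply = os.read(ready_r, 1)  # blocks until the child has tried to take the lock
    os.close(ready_r)
    if reply != b"1":
        os.waitpid(pid, 0)
        raise RuntimeError("child could not take the lock")
    return pid


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "extra.img.lock")
    orphan = start_orphan(lock)
    print(try_lock(lock) is None)      # True: the disk is still in use
    os.kill(orphan, signal.SIGKILL)    # "find the orphaned QEMU process and kill it"
    os.waitpid(orphan, 0)              # reap it; its locks were released on exit
    print(try_lock(lock) is not None)  # True: the guest can start again
```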

User Experience

End users should see no difference in behaviour of QEMU/KVM virtualization during normal operation.

They will be prevented from making certain configuration and operational mistakes that would otherwise result in the same disk image being used by two running guests at once.

In the event of a total virtualization host failure, the NFS server may still hold locks on behalf of the dead host, and these will not be released. This prevents the affected VMs from being started on another host. To recover from this scenario, first ensure the dead host is truly dead (hardware cluster fencing agents are a good option), then manually force the release of the dead host's locks on the NFS server.

Dependencies

The feature is confined to the 'libvirt' package.

Contingency Plan

If the virtlockd daemon does not work as expected, the default libvirt driver configuration will be changed to use the 'nop' lock manager. This lock manager does nothing, so the behaviour is equivalent to that of previous Fedora releases.
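The fallback is cheap because lock managers are pluggable: switching the configured implementation to 'nop' leaves the call sites unchanged but makes every acquire succeed. A minimal in-process sketch of that design, assuming a hypothetical two-method interface and a registry keyed by the configured name. The class names and the "exclusive" key are illustrative, not libvirt's internal API.

```python
from abc import ABC, abstractmethod
from typing import Dict


class LockManager(ABC):
    """Hypothetical plug-in interface: one lease per disk path."""

    @abstractmethod
    def acquire(self, vm: str, disk: str) -> bool: ...

    @abstractmethod
    def release(self, vm: str, disk: str) -> None: ...


class NopLockManager(LockManager):
    """Does nothing, matching the behaviour of releases without a lock manager."""

    def acquire(self, vm: str, disk: str) -> bool:
        return True

    def release(self, vm: str, disk: str) -> None:
        pass


class ExclusiveLockManager(LockManager):
    """In-memory stand-in for virtlockd: a disk may belong to one VM at a time."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def acquire(self, vm: str, disk: str) -> bool:
        # Record vm as owner if the disk is free; succeed only if vm is the owner
        return self._owners.setdefault(disk, vm) == vm

    def release(self, vm: str, disk: str) -> None:
        if self._owners.get(disk) == vm:
            del self._owners[disk]


MANAGERS = {"nop": NopLockManager, "exclusive": ExclusiveLockManager}


def make_lock_manager(name: str) -> LockManager:
    """Pick the implementation named in the (hypothetical) configuration setting."""
    try:
        return MANAGERS[name]()
    except KeyError:
        raise ValueError(f"unknown lock manager: {name!r}") from None


if __name__ == "__main__":
    disk = "/var/lib/libvirt/images/extra.img"
    for name in ("exclusive", "nop"):
        mgr = make_lock_manager(name)
        mgr.acquire("guest1", disk)
        print(name, mgr.acquire("guest2", disk))
```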

Documentation

  • http://libvirt.org/locking.html (NB: not yet updated to describe virtlockd)

Release Notes

  • The QEMU/KVM virtualization driver in libvirt now enforces exclusive access to virtual machine disk images on a single host. This prevents multiple guests from being started with the same disk image, unless the <shareable/> flag is set for the disk.
  • At the administrator's discretion, a shared filesystem (e.g. NFS) can be mounted at /var/lib/libvirt/lockd to extend the protection across multiple hosts in a network.
  • When configuring locking across multiple hosts, ensure that all disk image paths are globally unique across all hosts sharing the same NFS mount, and that block devices are referenced by their stable, unique names under /dev/disk/by-path/ rather than the unstable /dev/sdNN names.
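The release notes imply two rules: a disk marked <shareable/> may be opened by several guests while an ordinary disk may not, and locks are matched by path, so every host must name the same storage the same way. The sketch below models both with fcntl() shared and exclusive locks on a lock file derived from the disk path. Using a shared lock for shareable disks, the SHA-256 naming, and the helper names are assumptions for illustration, not a description of virtlockd's internals.

```python
import fcntl
import hashlib
import os
import tempfile


def lock_file_for(lockspace: str, disk_path: str) -> str:
    """Locks are matched by path: two names for one LUN give two different lock files."""
    return os.path.join(lockspace, hashlib.sha256(disk_path.encode("utf-8")).hexdigest())


def try_lock(lockspace: str, disk_path: str, shareable: bool):
    """Shared lock for <shareable/> disks, exclusive otherwise; return the fd or None."""
    fd = os.open(lock_file_for(lockspace, disk_path), os.O_RDWR | os.O_CREAT, 0o600)
    mode = fcntl.LOCK_SH if shareable else fcntl.LOCK_EX
    try:
        fcntl.lockf(fd, mode | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None
    return fd


def second_guest_can_start(lockspace: str, disk_path: str, shareable: bool) -> bool:
    """Try the lock from a forked child, as a second guest process would."""
    pid = os.fork()
    if pid == 0:
        code = 2
        try:
            code = 0 if try_lock(lockspace, disk_path, shareable) is not None else 1
        finally:
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0


if __name__ == "__main__":
    ls = tempfile.mkdtemp()
    shared_fd = try_lock(ls, "/var/lib/libvirt/images/shared.img", shareable=True)
    print(second_guest_can_start(ls, "/var/lib/libvirt/images/shared.img", True))  # True
    # The same LUN reached as /dev/sdb on one host and /dev/sdc on another:
    sdb_fd = try_lock(ls, "/dev/sdb", shareable=False)
    print(second_guest_can_start(ls, "/dev/sdc", False))  # True: conflict goes undetected
```

If both hosts instead used the same stable /dev/disk/by-path/ name, both would map to the same lock file and the second start would be refused.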

Comments and Discussion