From Fedora Project Wiki



pacemaker-cloud

Summary

The pacemaker-cloud project demonstrates the current community work in providing application service high availability in a cloud environment.


Owner

  • Email: <sdake@redhat.com>

Current status

  • Targeted release: Fedora 16
  • Last updated: (July 4th, 2011)
  • Percentage of completion: 75%


Detailed Description

The software provides a user interface shell called pcloudsh, which can:

  • Create deployables including:
    • Create a JEOS image of F14, F15, F16, RHEL6
    • Create an assembly of F14, F15, F16, RHEL6
    • Add assemblies to a deployable
    • Add managed resources to an assembly
  • Launch a deployable, including all of its assembly images
  • Report when an application or assembly fails and describe which corrective actions are taken
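
The containment relationship in the list above (resources belong to assemblies, assemblies belong to a deployable) can be pictured with a small data model. This is only an illustrative sketch in Python; the class and method names (`Resource`, `Assembly`, `Deployable`, `managed_resources`) are hypothetical and are not taken from the pacemaker-cloud source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Resource:
    """Hypothetical model of a managed daemon application, e.g. httpd."""
    name: str


@dataclass
class Assembly:
    """Hypothetical model of a JEOS image plus the resources managed in it."""
    name: str
    jeos: str   # e.g. "F14"
    arch: str   # e.g. "x86_64"
    resources: List[Resource] = field(default_factory=list)


@dataclass
class Deployable:
    """Hypothetical model of the set of assemblies that provide one service."""
    name: str
    assemblies: List[Assembly] = field(default_factory=list)

    def managed_resources(self) -> List[Tuple[str, str]]:
        """Return (assembly name, resource name) pairs for the deployable."""
        return [(a.name, r.name) for a in self.assemblies for r in a.resources]


if __name__ == "__main__":
    dep = Deployable("dep1")
    for assembly_name in ("assy1", "assy2", "assy3"):
        assembly = Assembly(assembly_name, jeos="F14", arch="x86_64")
        assembly.resources.append(Resource("httpd"))
        dep.assemblies.append(assembly)
    print(dep.managed_resources())
```

Modelling the deployable as the outermost container reflects the fact that launching it launches every assembly image it holds.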

The software also includes daemons and init scripts that keep the deployables configured in the system highly available:

  • Kill and restart applications if an application failure is detected.
  • Kill and restart assemblies if an assembly failure is detected.

Nomenclature:

  • JEOS - just enough operating system - the bare minimum operating system required to boot a virtual machine image
  • Assembly - Composition of a JEOS image and managed resources
  • Deployable - Collection of assemblies that represent all virtual machines required to provide a specific service
  • Resource - Daemon application, such as Apache's httpd service, which is managed for high availability
  • High availability - Applying the techniques of:
    • monitoring a component for failure
    • forcibly terminating a component when a failure has been detected
    • restarting the failed component
    • providing notification to system administration so they may repair the underlying fault

See the Pacemaker Cloud Project Slides for more details.
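
The four high availability techniques named above (monitor, forcibly terminate, restart, notify) can be sketched as a single monitoring pass. This is a minimal illustration in Python, not pacemaker-cloud's actual implementation; `Component`, `recover`, and the callables passed in are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Component:
    """Hypothetical stand-in for a managed resource or an assembly."""
    name: str
    is_healthy: Callable[[], bool]   # monitoring a component for failure
    terminate: Callable[[], None]    # forcibly terminating a failed component
    restart: Callable[[], None]      # restarting the failed component


def recover(components: List[Component],
            notify: Callable[[str], None]) -> List[str]:
    """Run one monitoring pass and return the names of recovered components."""
    recovered = []
    for component in components:
        if component.is_healthy():
            continue
        # Terminate first so that a half-dead component cannot interfere
        # with the fresh instance started by restart().
        component.terminate()
        component.restart()
        # Tell the administrator so the underlying fault can be repaired.
        notify(f"{component.name} failed and was restarted")
        recovered.append(component.name)
    return recovered
```

A real daemon would call something like `recover` periodically; here the health check, termination, and restart are supplied as callables so the same pass works for both applications and assemblies.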

Benefit to Fedora

This feature provides a preview of high availability for cloud environments using a building block that is reusable in other cloud management systems. For now it provides high availability only for deployables running on a single node, but for F17 we plan to integrate with other distributed cloud management tools such as Aeolus.

Scope

This is a standalone package, but it has several dependencies on other parts of Fedora 16. We are in good shape relating to dependencies. However, systemd is not currently LSB compliant, so our software cannot provide high availability for F15 or Rawhide guests.

We are nearing code completion for the single node case and have some basic packaging done.

How To Test

yum install pacemaker-cloud
chkconfig pacemaker-cloud on
service pacemaker-cloud start

We have a test suite that provides automated validation that the software functions properly.

The following operations can also be performed manually:

root# pcloudsh
pcloudsh# jeos_create F14 x86_64
pcloudsh# assembly_create assy1 F14 x86_64
pcloudsh# assembly_clone assy1 assy2
pcloudsh# assembly_clone assy1 assy3
pcloudsh# assembly_resource_add httpd httpd assy1
pcloudsh# assembly_resource_add httpd httpd assy2
pcloudsh# assembly_resource_add httpd httpd assy3
pcloudsh# deployable_create dep1
pcloudsh# deployable_assembly_add dep1 assy1
pcloudsh# deployable_assembly_add dep1 assy2
pcloudsh# deployable_assembly_add dep1 assy3
pcloudsh# deployable_start dep1

Keep pcloudsh running and in another shell:

  1. Verify application restart works properly:
    1. Log in to one of the assemblies and run killall -9 httpd
    2. Verify that httpd is restarted by pacemaker-cloud
  2. Verify deployable restart works properly:
    1. Open the virtual machine manager GUI
    2. Use the force off functionality on an assembly
    3. The virtual machine manager should display that the assembly is restarted
    4. Log in to the restarted virtual machine and verify httpd was restarted properly
  3. Verify pcloudsh displays feedback:
    1. Verify failed applications indicate they are failed and restarted
    2. Verify failed assemblies indicate they are failed and restarted
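
The application restart check in step 1 can be partly scripted from inside an assembly. The following is a hedged sketch in Python rather than part of the pacemaker-cloud test suite: `pgrep_pids` and `wait_for_restart` are hypothetical helpers, and the sketch assumes the `pgrep` utility is available in the guest. It treats the application as restarted once at least one matching process exists and none of the PIDs recorded before the kill remain.

```python
import subprocess
import time
from typing import Callable, Set


def pgrep_pids(process_name: str) -> Set[int]:
    """Return PIDs of processes whose name exactly matches process_name."""
    result = subprocess.run(["pgrep", "-x", process_name],
                            capture_output=True, text=True)
    return {int(pid) for pid in result.stdout.split()}


def wait_for_restart(list_pids: Callable[[], Set[int]],
                     old_pids: Set[int],
                     timeout: float = 60.0,
                     interval: float = 1.0) -> bool:
    """Poll until new processes replace old_pids, or until timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        current = list_pids()
        # Restarted: something is running and no pre-kill PID survives.
        if current and not (current & old_pids):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

One possible use is to record `old = pgrep_pids("httpd")`, run killall -9 httpd, and then call `wait_for_restart(lambda: pgrep_pids("httpd"), old)`. Because PIDs can in principle be reused, the result is an indication and not proof; the pcloudsh feedback in step 3 remains the authoritative check.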

User Experience

The audience will notice a shell with commands that can be used to create, launch, and monitor deployables on a single node.

Dependencies

Previously packaged in Fedora rawhide:

glib2
dbus-glib
libxml2
libqb
pacemaker-libs
qmf
libxslt
qpid-cpp-server
qpid-cpp-client
python-qpid-qmf
matahari-service
matahari-host

Needs packaging in Fedora rawhide: oz Review Ticket

Dependency with broken functionality: systemd. Guests that use systemd do not work properly because systemd is not LSB compliant. F14 and RHEL6 guests will work properly, but until the relevant systemd bugs are fixed, F15 and F16 guests are nonfunctional.

Contingency Plan

If this feature is not ready by July 22, it can be moved to a later Fedora version. If systemd is not LSB compliant by July 22, appropriate release notes should indicate that systemd guests are non-functional.

Documentation

Project Page

Developer Resources

Pacemaker Cloud Project Slides

Release Notes

Pacemaker-Cloud provides high availability for application services inside virtual machines on a single node. This feature provides a shell for creating virtual machine images, associating resources with the virtual machines, and combining these images into a deployable. A deployable can then be launched and monitored for high availability. If virtual machines or applications fail, they are restarted automatically, which reduces MTTR (mean time to repair) and so improves availability compared with a manual restart by an operator.
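
The link between MTTR and availability can be made concrete with the standard steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below is illustrative only: the function name is hypothetical and the figures in the usage example are made-up inputs, not measurements of pacemaker-cloud.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and MTTR."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical inputs: the same failure rate, repaired by an operator
    # in one hour versus restarted automatically in one minute.
    manual = availability(mtbf_hours=1000.0, mttr_hours=1.0)
    automatic = availability(mtbf_hours=1000.0, mttr_hours=1.0 / 60.0)
    print(f"manual restart:    {manual:.6f}")
    print(f"automatic restart: {automatic:.6f}")
```

For a fixed failure rate, any reduction in repair time raises availability, which is the effect the automatic restart is intended to deliver.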

Fedora guest virtual machines that use systemd are non-functional with this feature. This limitation may need to be stated in the release notes so as not to confuse the audience. See the discussion in systemd defect 702621 and systemd defect 629040.

Comments and Discussion