Features/Opensharedroot

= Open Sharedroot =

Summary
The Open Sharedroot project makes it possible to boot multiple Linux systems from the same root filesystem, providing a filesystem-based single system image cluster.

Owner

 * Name: Marc Grimme
 * email: grimme@atix.de

Current status

 * Targeted release: Fedora 12
 * Last updated: 2009-10-09
 * Percentage of completion: 100%

Detailed Description
The purpose of this feature is to provide an open and flexible filesystem-based single system image (SSI) Linux cluster.

Currently the following filesystems can be used with Fedora 11:

 * NFSv3, NFSv4
 * GFS2
 * OCFS2
 * Ext3 as local filesystem

In essence it consists of three software components and some small changes to the init process.

1. The initrd (comoonics-bootimage) that boots such a system. Booting from a cluster filesystem or NFS in a sharedroot configuration is more complex than a normal boot, so a new initrd concept is needed.

2. The cluster tools, which provide access to cluster functionality such as querying the cluster for the number of nodes and their configuration. They are organized under the software component comoonics-clustersuite.

3. The management tools for building a cdsl (context dependent symbolic link) structure and managing cdsl files. They are organized under the software component comoonics-cdsls. The cdsl concept is based on bind mounts.

The changes to the init process have already been filed in Bugzilla:

 * https://bugzilla.redhat.com/show_bug.cgi?id=496843
 * https://bugzilla.redhat.com/show_bug.cgi?id=496854
 * https://bugzilla.redhat.com/show_bug.cgi?id=496857
 * https://bugzilla.redhat.com/show_bug.cgi?id=496859
 * https://bugzilla.redhat.com/show_bug.cgi?id=496861

Benefit to Fedora
Multiple nodes can boot from the same root filesystem, which enables Fedora to act as a filesystem-based single system image cluster.

Scope
Apart from the small changes that have to be accepted for the init process, everything else is already working for FC11, RHEL5 and RHEL4, so only the migration to FC12 remains.

Test environment
We propose a preinstalled FC12 KVM machine (called installnode), installed as needed.

We propose that this cluster is installed on a libvirt/KVM based machine as a two-node cluster. Libvirt is installed with its defaults; the NATed network is called *default* and uses the network 192.168.122.0/24 (the libvirt default).

An NFS share /mnt/virtual/nfsosr/fc12 is exported on the KVM host machine as follows:

/mnt/virtual/nfsosr/fc12 192.168.122.0/255.255.255.0(rw,fsid=0,no_subtree_check,sync,no_root_squash)
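The export entry above follows the pattern "directory network/netmask(options)". As a minimal sketch, the helper below prints such an entry for a given directory and network so it can be reviewed before it is added to /etc/exports. The name `osr_export_line` is hypothetical and not part of the OSR packages; the options are simply copied from the entry above.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/exports entry for a shared root export.
# Arguments: export directory, client network, netmask.
osr_export_line() {
    dir=$1 net=$2 mask=$3
    # The option list is copied from the export entry shown in the text.
    printf '%s %s/%s(rw,fsid=0,no_subtree_check,sync,no_root_squash)\n' \
        "$dir" "$net" "$mask"
}

# Print the entry for the test environment described above.
osr_export_line /mnt/virtual/nfsosr/fc12 192.168.122.0 255.255.255.0
```

After appending the printed line to /etc/exports on the host, `exportfs -ra` re-exports all entries.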

The libvirt configuration files for the two VMs (node1, node2) can be found at:

 * http://www.open-sharedroot.org/documentation/files/qemu-kvm-osr-nfs-fc11-node1
 * http://www.open-sharedroot.org/documentation/files/qemu-kvm-osr-nfs-fc11-node2

Prerequisites

 * None except a libvirt configuration when using a virtualized libvirt based cluster.
 * A running DHCP/TFTP/PXE infrastructure for booting the cluster automatically.

Install OSR packages
First install dracut, version 0.7 or later (see dracut).

Decide how to boot the cluster
Then decide how you want to boot the cluster.

There are basically two ways to boot an Open Sharedroot cluster with Fedora 12: either you specify all parameters at boot time as parameters given to the boot loader, or you have those parameters set in the initrd via the cluster configuration built into the initrd.

The first way is called "static initrd", as you might not have to change the initrd when the cluster changes. The second is called "full featured initrd".

RPMs for the static initrd
Either execute

yum install osr-dracut-module comoonics-cdsl-py

or install the following RPMs (always take the latest version available):

 * http://www.open-sharedroot.org/development/osr-dracut-module/osr-dracut-module-0.8-3.noarch.rpm
 * http://www.open-sharedroot.org/development/comoonics-cluster-py/comoonics-cluster-py-0.1-21.noarch.rpm
 * http://www.open-sharedroot.org/development/comoonics-cdsl-py/comoonics-cdsl-py-0.2-16.noarch.rpm
 * http://www.open-sharedroot.org/development/comoonics-base-py/comoonics-base-py-0.1-3.noarch.rpm

RPMs for the full featured initrd
Either execute

yum install osr-dracut-module comoonics-cdsl-py osr-dracut-module-cluster

or install the following RPMs (always take the latest version available):

 * http://www.open-sharedroot.org/development/osr-dracut-module/osr-dracut-module-0.8-3.noarch.rpm
 * http://www.open-sharedroot.org/development/comoonics-cluster-py/comoonics-cluster-py-0.1-21.noarch.rpm
 * http://www.open-sharedroot.org/development/comoonics-cdsl-py/comoonics-cdsl-py-0.2-16.noarch.rpm
 * http://www.open-sharedroot.org/development/comoonics-base-py/comoonics-base-py-0.1-3.noarch.rpm
 * http://www.open-sharedroot.org/development/osr-dracut-module/osr-dracut-module-cluster-0.8-2.noarch.rpm

Create a cluster configuration file
Create a cluster configuration file /etc/cluster/cluster.conf with the com_info tags.

Note that the following cluster configuration still needs a valid fencing configuration for a properly working cluster:

           

Create the shared root filesystem
The shared root filesystem must be a mountable NFS export (a shared NFS resource).

On installnode, mount the NFS export '/mnt/virtual/nfsosr/fc11' on '/mnt/newroot':

mkdir /mnt/newroot
mount -t nfs 192.168.122.1:/mnt/virtual/nfsosr/fc11 /mnt/newroot/

Copy all data from the locally installed Fedora root filesystem to the shared root filesystem:

cp -ax / /mnt/newroot/
cp /boot/*$(uname -r)* /mnt/newroot/boot

Create some directories if needed:

mkdir -p /mnt/newroot/proc
mkdir -p /mnt/newroot/sys

Create a new cdsl infrastructure on the shared root filesystem:

com-mkcdslinfrastructure -r /mnt/newroot

Mount the local cdsl infrastructure:

mount --bind /mnt/newroot/cluster/cdsl/1/ /mnt/newroot/cdsl.local/

Mount the other filesystems needed for the chroot, then enter it:

mount -t proc proc /mnt/newroot/proc
mount -t sysfs none /mnt/newroot/sys
mount --bind /dev /mnt/newroot/dev
chroot /mnt/newroot

Make '/var' host-dependent:

com-mkcdsl -a /var

Make '/var/lib' shared again:

com-mkcdsl -s /var/lib

Make '/etc/sysconfig/network' host-dependent:

com-mkcdsl -a /etc/sysconfig/network

Edit the host-dependent network files and change the hostnames:

vi /cluster/cdsl/?/etc/sysconfig/network

For example, insert this for node 1:

NETWORKING=yes
HOSTNAME=node1
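Instead of editing each file by hand, the per-node files can be written in a loop. The following is a minimal sketch, assuming the host-dependent copies live under /cluster/cdsl/<nodeid>/etc/sysconfig/ as the path above suggests. The helper name `osr_write_network` is hypothetical and not part of the comoonics tools.

```shell
#!/bin/sh
# Hypothetical helper: write the host-dependent network file for one node.
# Arguments: root of the tree ("" when already inside the chroot),
#            node id, hostname.
osr_write_network() {
    root=$1 nodeid=$2 name=$3
    file="$root/cluster/cdsl/$nodeid/etc/sysconfig/network"
    # The directory is expected to exist after "com-mkcdsl -a"; fail if not.
    if [ ! -d "$(dirname "$file")" ]; then
        echo "missing directory: $(dirname "$file")" >&2
        return 1
    fi
    printf 'NETWORKING=yes\nHOSTNAME=%s\n' "$name" > "$file"
}

# Example for a two-node cluster, run inside the chroot:
#   osr_write_network "" 1 node1
#   osr_write_network "" 2 node2
```

The helper refuses to create missing directories on purpose, so that a wrong node id is noticed instead of silently producing a stray file.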

Make '/etc/mtab' a link to '/proc/mounts':

cd /etc/
rm -f mtab
ln -s /proc/mounts mtab

Remove the network configuration of the cluster interface (and only that one):

rm -f /etc/sysconfig/network-scripts/ifcfg-eth0

Modify '/etc/fstab':

devpts                 /dev/pts                devpts  gid=5,mode=620  0 0
tmpfs                  /dev/shm                tmpfs   defaults        0 0
proc                   /proc                   proc    defaults        0 0
sysfs                  /sys                    sysfs   defaults        0 0
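A quick way to confirm that the edited fstab still contains the four pseudo filesystems listed above is a small check like the following. This is only a sketch; `osr_check_fstab` is a hypothetical name, and the list of mount points is taken from the entries above.

```shell
#!/bin/sh
# Hypothetical check: report pseudo filesystems missing from an fstab file.
# Argument: path to the fstab (defaults to /etc/fstab).
# Returns 0 if all four mount points are present, 1 otherwise.
osr_check_fstab() {
    fstab=${1:-/etc/fstab}
    missing=0
    for mnt in /dev/pts /dev/shm /proc /sys; do
        # Field 2 of a non-comment line is the mount point.
        if ! awk -v m="$mnt" \
            '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"
        then
            echo "missing mount point: $mnt"
            missing=1
        fi
    done
    return "$missing"
}

# Example, run on installnode against the shared root:
#   osr_check_fstab /mnt/newroot/etc/fstab
```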

Disable SELinux:

[root@install-node3 comoonics]# cat /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#      enforcing - SELinux security policy is enforced.
#      permissive - SELinux prints warnings instead of enforcing.
#      disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#      targeted - Targeted processes are protected,
#      mls - Multi Level Security protection.
SELINUXTYPE=targeted
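To verify the setting without reading the whole file, the configured mode can be extracted with a one-line filter. A small sketch, with `osr_selinux_mode` as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the value of SELINUX= from a config file.
# Argument: path to the file (defaults to /etc/sysconfig/selinux).
osr_selinux_mode() {
    # Only lines starting with "SELINUX=" match; "SELINUXTYPE=" and
    # comment lines are skipped.
    sed -n 's/^SELINUX=//p' "${1:-/etc/sysconfig/selinux}"
}

# Example, run inside the chroot; for the file above this prints "disabled":
#   osr_selinux_mode
```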

Disable NetworkManager and kudzu:

chkconfig NetworkManager off
chkconfig kudzu off

Create boot configuration
Create a boot configuration based on PXE or any other boot method.

Create Shared Root initrd

On installnode, create the shared root initrd in the shared boot filesystem.

For the static initrd use:

dracut -f -a "osr" /boot/initrd_sr-$(uname -r).img $(uname -r)

or for the full featured initrd use:

dracut -f -a "network nfs base osr osr-cluster" /boot/initrd_sr-$(uname -r).img $(uname -r)
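The two invocations differ only in the list of dracut modules. As a sketch, a small wrapper can select the module list by initrd type and print the resulting command for review. The name `osr_dracut_cmd` is hypothetical; the module lists and the output path are taken from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: print the dracut command for the chosen initrd type.
# Argument 1: "static" or "full".
# Argument 2: kernel version (defaults to the running kernel).
osr_dracut_cmd() {
    kver=${2:-$(uname -r)}
    case $1 in
        static) modules="osr" ;;
        full)   modules="network nfs base osr osr-cluster" ;;
        *)      echo "usage: osr_dracut_cmd static|full [kernel-version]" >&2
                return 1 ;;
    esac
    printf 'dracut -f -a "%s" /boot/initrd_sr-%s.img %s\n' \
        "$modules" "$kver" "$kver"
}

# Print the command for the full featured initrd and the running kernel.
osr_dracut_cmd full
```

Printing instead of executing keeps the step reviewable; the output can be piped to `sh` once it looks right.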

Also consider the following optional dracut modules:

 * syslog: provides syslog output during boot, which makes debugging much easier. This module may not be available yet, as it has not been pushed upstream.
 * debug: helps when there are problems during booting. Also see dracut debug.

Prepare TFTP server
Now push your new initrd and kernel to your TFTP server (PXE boot).

Clean up
On installnode, exit the chroot and unmount the filesystems:

exit
umount /mnt/newroot/cdsl.local
umount /mnt/newroot/dev
umount /mnt/newroot/proc
umount /mnt/newroot/sys
umount /mnt/newroot
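After the clean up, nothing should remain mounted at or below /mnt/newroot. The following sketch lists any leftovers by reading the mount table; `osr_still_mounted` is a hypothetical helper, and the optional second argument exists only so the function can be tried against a saved copy of /proc/mounts.

```shell
#!/bin/sh
# Hypothetical helper: list mount points at or below a given directory.
# Argument 1: directory prefix, e.g. /mnt/newroot.
# Argument 2: mount table to read (defaults to /proc/mounts).
osr_still_mounted() {
    prefix=${1%/}
    mounts=${2:-/proc/mounts}
    # Field 2 is the mount point; match the prefix itself or anything
    # below it, but not siblings such as /mnt/newroot2.
    awk -v p="$prefix" '$2 == p || index($2, p "/") == 1 { print $2 }' "$mounts"
}

# Example: empty output means the clean up is complete.
#   osr_still_mounted /mnt/newroot
```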

Important notes on boot parameters
If you decided to use the static initrd approach, you must specify boot parameters so that the osr module can detect the nodeid and network configuration. If you chose the full featured approach, you can use boot parameters to override default settings.

The following boot parameters influence the way OSR is built.


 * ip (required for static): the network configuration syntax:

ip=::: :: :[dhcp|on|any|none|off]


 * root (required for static): the root filesystem

root=nfs[4]:[server:]path[:options]


 * nodeid (required for static): the nodeid of this node

nodeid=
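With the static initrd, every node needs its own kernel command line, because at least the nodeid differs per node. A minimal sketch of composing the three required parameters follows. The helper `osr_static_append` is hypothetical; the `ip` value is passed through unchanged, and the short form `dhcp` used in the example is an assumption about what the osr module accepts, not something this page confirms.

```shell
#!/bin/sh
# Hypothetical helper: compose the boot parameters required for the
# static initrd.
# Arguments: NFS server, exported path, node id, value for ip=.
osr_static_append() {
    server=$1 path=$2 nodeid=$3 ip=$4
    printf 'root=nfs:%s:%s nodeid=%s ip=%s\n' "$server" "$path" "$nodeid" "$ip"
}

# Example for node 1 of the test environment (ip=dhcp is an assumption).
osr_static_append 192.168.122.1 /mnt/virtual/nfsosr/fc11 1 dhcp
```

The printed string is meant to be appended to the APPEND line of the node's boot loader entry.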

TFTP config example
Here is a configuration example for TFTP/PXE boot (for the full featured initrd):

Example A:

timeout 100
prompt 1
default NFS-dracut-Cluster
LABEL NFS-dracut-Cluster
MENU LABEL NFSdracut NFS Configuration
KERNEL /OR-nfsdracut/vmlinuz
APPEND initrd=/nfsdracut/initrd_sr root=nfs:192.168.122.1:/mnt/virtual/nfsosr/fc11 rd.shell rd.debug
 * 1) Menus
 * 2) for OSR based on NFS

Example B:

timeout 100
prompt 1
default NFS-dracut-Cluster
LABEL NFS-dracut-Cluster
MENU LABEL NFSdracut NFS Configuration
KERNEL /OR-nfsdracut/vmlinuz
APPEND initrd=/nfsdracut/initrd_sr rd.shell rw
 * 1) Menus
 * 2) for OSR based on NFS
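PXELINUX looks for a per-client configuration file named after the client's MAC address (prefix `01-`, lowercase hex digits, dashes instead of colons) before it falls back to `default`. This can be used to give each node its own entry, for example with a node-specific nodeid when using the static initrd. A sketch, with `pxe_cfg_name` being a hypothetical helper and the MAC address made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: derive the PXELINUX per-client config file name
# from a MAC address given in the usual colon-separated form.
pxe_cfg_name() {
    # Lowercase the hex digits A-F and turn colons into dashes.
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

# Example with a made-up MAC address; prints 01-52-54-00-ab-cd-ef
pxe_cfg_name 52:54:00:AB:CD:EF
```

The resulting file would be placed in the pxelinux.cfg directory of the TFTP server next to the `default` file.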

Boot the nodes
On the host node, boot the nodes:

virsh create /node1.xml
virsh create /node2.xml

You can now use those two nodes as if they were one.

Have Fun !!

User Experience
This project has been in development for eight years. We know of several hundred RHEL4/5 and FC11 clusters that have been running productively for years. This concept is also supported by Red Hat on RHEL.

Dependencies
See bugzillas above. Basically changes are needed in initscripts and SysVinit (some are already integrated in this package).

Contingency Plan
None necessary.

Documentation and Resources

 * Project open-sharedroot.org
 * Bugzilla instance: bugzilla.open-sharedroot.org
 * Mailing List: open-sharedroot-users@lists.sourceforge.net, open-sharedroot-devel@lists.sourceforge.net
 * Git-Repository: github.com
 * Red Hat Magazine
 * Red Hat/ATIX/SAP Whitepaper (PDF)

Release Notes
Fedora now provides the ability to create filesystem-based single system image clusters. A server with a shareable root filesystem (only NFSv3/NFSv4 so far) can share that root filesystem with multiple other nodes. Host-dependent files and directories can also be managed (see Open Sharedroot).

Comments and Discussion

 * See Talk:Features/Opensharedroot