From Fedora Project Wiki

Revision as of 16:49, 4 February 2009


Feature Name

A stable version of the GFS2 cluster filesystem

Summary

A cluster filesystem allowing simultaneous access to shared storage from multiple nodes, designed for SAN environments. It is also possible to use GFS2 as a single node (local) filesystem by selecting the "lock_nolock" locking protocol.

Owner

  • email: <swhiteho@redhat.com>
  • mailing list: <cluster-devel@redhat.com>

Current status

  • Targeted release: Fedora 11
  • Last updated: (04 Feb 2009)
  • Percentage of completion: 90%

Detailed Description

GFS2 is part of the upstream kernel, but is still listed as experimental. The plan is for it to become stable before the release of F-11. The gfs2-utils package is already part of Fedora, and we hope to declare it stable before F-11 as well.

Benefit to Fedora

The main benefit is a stable cluster filesystem which works seamlessly with the Red Hat cluster infrastructure.

Scope

Most of the remaining work now is testing and bug fixing.

How To Test

Read the docs, create a filesystem, run an application on it, check to see whether there are any problems/bugs and if so report them via the usual bugzilla process.

We will also be running the Red Hat QE tests, some performance tests, and anything else that we can get our hands on in order to cover as many cases as possible. Any filesystem test suite would be a good thing to test with, whether for performance or correctness. We also want to see lots of testing with real applications: Apache, Samba, NFS (over GFS2), exim, sendmail, yourfavouriteapplicationhere, etc. Basically anything that uses the filesystem.

You don't need any special hardware to do single node tests - you can create a filesystem in a single file and mount it loopback. For multiple node tests you will need some shared storage (iSCSI, FC, or some other kind of SAN) plus a method of fencing failed nodes (this can be done manually if you don't have any fencing hardware, but power switches and/or remote access controllers are recommended).
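
The single node case described above can be sketched as a short script. This is only an illustration, not an official procedure: the image path, mount point and size are hypothetical, the function names are invented for this example, and it assumes that mkfs.gfs2 from gfs2-utils is installed and accepts a regular file as its target. The commands are only printed unless dry_run is set to False, in which case they need root privileges.

```python
import subprocess


def loopback_test_commands(image, mountpoint, size_mb=1024):
    """Build the commands for a single-node GFS2 test on a file-backed image.

    All paths and the size are illustrative assumptions.
    """
    return [
        # Create a sparse file to hold the filesystem.
        ["truncate", "-s", f"{size_mb}M", image],
        # Single node: lock_nolock protocol and one journal.
        # -O skips the confirmation prompt.
        ["mkfs.gfs2", "-O", "-p", "lock_nolock", "-j", "1", image],
        # Mount the image through a loop device.
        ["mount", "-t", "gfs2", "-o", "loop", image, mountpoint],
    ]


def run(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(loopback_test_commands("/tmp/gfs2-test.img", "/mnt/gfs2-test"))
```

The mount point must already exist before the mount command is run. Multiple node tests would use lock_dlm and a cluster lock table name instead, which this sketch does not cover.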

If everything is working correctly, the results should be exactly the same as you'd expect running the application on a local filesystem. One point to watch though is that many applications are not written to run in a clustered environment, so if you are expecting multiple copies of an application to share the same set of data files, then please check that the application does support this mode of operation first. Usually it will require some method for inter-node communication at the application level.


User Experience

The GFS2 filesystem allows sharing of a filesystem across multiple nodes in an HA environment.

Dependencies

This feature depends on the cman package, the corosync package and the dlm kernel module, which are already part of Fedora.

Contingency Plan

If this is not ready in time, we can just push out the date at which we consider GFS2 stable. There are no other packages at the moment which depend on this feature. Bearing in mind that this is almost complete, it is fairly unlikely that we will have to do this.

Documentation

Release Notes

There are a few local file system operations that are not supported, or that are slightly different on GFS2. Here are the main things to watch out for:

  • The flock() system call is not interruptible (Bug #421321). This may be fixed before release.
  • The fcntl() F_GETLK returns a pid which may or may not be on the current node (there is no way to indicate the node on which the process exists with the current interface - beware if you have an application that uses this interface to get a pid to send signals to).
  • Leases are not supported with lock_dlm, but they are supported with lock_nolock.
  • Locking is based upon a single lock per inode. Applications which either write to a single file from multiple nodes or which insert/remove lots of files from a single directory will be slow. This is the single most frequently asked question regarding GFS/GFS2 performance and often occurs in relation to email/imap spool directories. The answer in each case is to break up the single large spool into separate directories, and to try to keep each set of files "local" to one node, as far as possible. Likewise, don't try to mmap() a file and use it as distributed shared memory: it will work, but it will be so slow that it makes no sense to do so.
  • If you've used previous releases of GFS/GFS2 you might be wondering where the "lock modules" have got to. The answer is that they have been merged into the main GFS2 module, so you no longer need to load them separately. The mount options have remained the same though. (N.B. The final part of this is still in the -nmw git tree, but it will be merged in the next kernel.org merge window).
  • fallocate is not supported, but is on the TODO list (Bug #455572).
  • XIP is not supported, but is also on the TODO list (Bug #455570).
  • FIEMAP is supported, but currently only for regular files and not for xattrs (again the xattr extension is on the TODO list).
  • The internal glock state of GFS2 is accessible via debugfs.
  • dnotify will work on a "same node" basis, but its use with GFS2 is not recommended.
  • inotify will work on a "same node" basis, but we don't currently recommend its use.
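
As an illustration of the advice above about breaking up a single large spool, here is a minimal sketch of one possible layout. It is not taken from any particular mail server: the function names, the bucket count and the node names are all hypothetical. Each mailbox is hashed into one of a fixed number of bucket directories, and each bucket is assigned a "home" node, so that work on a given mailbox can be routed to one node and its directory locks tend to stay local to that node.

```python
import hashlib
import os


def bucket_for(mailbox, buckets=64):
    """Map a mailbox name to a stable bucket number (hypothetical scheme)."""
    digest = hashlib.sha1(mailbox.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % buckets


def spool_dir(root, mailbox, buckets=64):
    """Directory for a mailbox: <root>/<bucket>/<mailbox>, not one flat spool."""
    return os.path.join(root, f"{bucket_for(mailbox, buckets):02d}", mailbox)


def home_node(mailbox, nodes, buckets=64):
    """Pick the node that should handle this mailbox's files."""
    return nodes[bucket_for(mailbox, buckets) % len(nodes)]


if __name__ == "__main__":
    cluster = ["node1", "node2", "node3"]  # illustrative node names
    for user in ("alice", "bob"):
        print(user, spool_dir("/srv/spool", user), home_node(user, cluster))
```

How requests actually get routed to the home node (for example by a proxy or by delivery configuration) is left to the application. The sketch only shows how a stable mapping keeps inserts and removals out of a single shared directory.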

Comments and Discussion