Miscellaneous Notes on the Problem Space

  • RAID can apply to standard partitions, LVM volume groups, and BTRFS volumes. It does not apply to individual LVs.
  • While it's technically possible to use LVM mirror and redundancy features on individual LVs on md RAID VGs on hw or fw RAID disks (!), we do not want to support LVM mirror & redundancy on top of VGs that are md RAID. You must pick either md RAID for the entire VG, or LVM mirror / redundancy for individual LVs in the VG.
  • We already allow users to specify an array name
  • Need an entry for VG reserved space and a checkbox for encrypting the VG in the VG editor screen. Not sure if VG-specific RAID UI belongs here as well - if it's in the VG edit box then we'd have to grey it out if any LVs had LVM-provided mirror/redundancy, and if none do we'd have to grey out LVM mirror/redundancy for the LVs if md RAID is turned on for the VG.
  • Disable spare as an option for RAID 1 [dledford]
  • Same partition numbers on all partitions involved in an array [dledford] - seems like a hack, better served by making it more apparent post-install which disks are RAID and which aren't
  • RAID warnings propagated up in anaconda UI? [dledford]
  • What's the difference between redundancy and redundant? [dledford]
  • "here are two boxes, one labeled Optimized Performance and the other Error Detection. But if you *ever* enable Error Detection (parity), then Optimized Performance (stripe) is a given, you can not select one without the other. However, this may be hugely misleading to a user when they enable parity, get optimized performance, and then find that their raid array, under pathological conditions, may be up to 600 *TIMES* slower than a redundant array" [dledford]
  • "[current ui] Options that apply to multiple levels are limited to only one level, options that apply to only one level are at the top as though they apply to multiple, etc." [dledford]

Suggested Tree Structure for UI:

  • Fault Tolerant Array (These arrays trade performance for a reduced risk of data loss)
    • Redundancy Based Fault Tolerance (These arrays provide the highest performance of the fault tolerant array types, but do so at the cost of disk space as they have the lowest capacity of all the RAID types)
      • Drive Mirroring (RAID1). Commonly used to make two identical copies of data so that if one drive fails, the other drive keeps going. Your available disk capacity will be roughly the capacity of a single drive minus some space for overhead. This array type can suffer the loss of one of the drives and keep going. This is a very fast array type.
      • Drive Striping with Redundant Copies (RAID10). This raid level tries to combine the benefits of the Non-Fault Tolerant raid0 array type with the fault tolerance of Drive Mirroring. In order to get the full benefits of this raid level, you need at least 4 identical drives to be part of the array. Your available disk capacity will be roughly half the size of all the disks you put into this array (e.g. if you add 4 1TB drives, for a total of 4TB, the array capacity will be just under half of that, so a little under 2TB). This array type can suffer the loss of any one drive and, if you are lucky, the loss of a second drive. However, whether or not a second drive loss will take this array type down depends on which drive it is; if you're unlucky, a second drive loss will render this array inoperable. This is a very fast array type.
    • Parity Based Fault Tolerance (These arrays provide the most efficient use of space on your drives while still providing fault tolerance, but they do so at the cost of performance. It takes CPU resources to calculate the parity for the arrays, and in the event of a disk failure, more CPU resources are used to reconstruct your missing data from the parity.)
      • Single Disk Failure Tolerance (RAID4 or RAID5). This raid level uses simple parity to enable fault tolerance. It requires a minimum of 3 drives, but can use more. The more drives it uses, the more efficiently it stores your data, but the slower it becomes due to hard drive access and parity calculation overhead. A 5 or 6 disk raid5 array is a common compromise between space efficiency and performance. This array can suffer the loss of any one drive. In the event of a drive loss, performance will degrade until the drive is replaced and the data rebuilt onto the new drive. Your capacity depends on how many drives you add to the array: the final capacity is the number of drives in the array minus one, multiplied by each drive's size. E.g., if you added 4 1TB drives to the array, your final capacity would be 3 * 1TB = 3TB (the capacity sketch after this list works through these numbers). This is a fast array type for hard disk reads, but can be much slower than the other array types for writes and modifications. This type of array is well suited to a fileserver where files are read very frequently but only rarely written to.
      • Two Disk Failure Tolerance (RAID6). This raid level uses the same simple parity as RAID5 to deal with a single disk failure, but also stores a second, more complex type of parity that allows it to reconstruct data in the event of a second disk failure. It requires a minimum of 4 drives, and like RAID5 needs more than the minimum to get the benefits of efficient use of storage space. In the event of a two disk failure, this array type is *very* slow. However, given how important the data is, *very* slow may be better than restoring from backup. Your capacity depends on how many drives you add to the array, similar to RAID5, but instead of losing one drive's worth of space to parity, you lose two. As a result, the formula is the number of drives in the array minus two, multiplied by the capacity of each drive. E.g., if you add 6 1TB drives, then it would be (6 - 2) * 1TB = 4TB capacity for the array. This type of array is mainly of use when you must make the most of the drive space you have, but your system must stay online and running even in the event of two separate drive failures. This is a highly resilient RAID level, not a highly performant RAID level.
  • Non-Fault Tolerant Array (RAID0, or striping). This is the highest performance array type and is really only intended to be used where a single hard drive is not fast enough to keep up with the workload placed upon it. It has the highest risk of data loss. The risk of data loss with this array type is actually higher than if you have no raid array at all, so please keep regular backups of your data.
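
To make the capacity arithmetic above concrete, here is a minimal sketch of the usable-capacity rules described for each array type, written in Python since that is what Anaconda itself uses. The function is purely illustrative (it is not part of any existing tool) and assumes identically sized member drives while ignoring superblock and filesystem overhead.

  # Rough usable-capacity rules for the md RAID levels described above.
  # Illustrative only; real capacities come in slightly lower because of
  # superblock and filesystem overhead.
  def usable_capacity(level, num_drives, drive_size_tb):
      if level == 0:                # striping, no fault tolerance
          return num_drives * drive_size_tb
      if level == 1:                # mirroring: capacity of a single drive
          return drive_size_tb
      if level == 10:               # striped mirrors: roughly half the total
          return num_drives * drive_size_tb / 2
      if level in (4, 5):           # one drive's worth lost to parity
          return (num_drives - 1) * drive_size_tb
      if level == 6:                # two drives' worth lost to parity
          return (num_drives - 2) * drive_size_tb
      raise ValueError("unsupported raid level: %s" % level)

  # The worked examples from the text: 4x1TB raid10 -> ~2TB,
  # 4x1TB raid5 -> 3TB, 6x1TB raid6 -> 4TB.
  for level, drives in ((10, 4), (5, 4), (6, 6)):
      print("raid%d with %d x 1TB: ~%gTB usable"
            % (level, drives, usable_capacity(level, drives, 1)))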

Structure for Options

I would create a tree structure with the items above, put a radio button next to each item and allow only one to be selected, then put the options from the next section below it and simply gray out the inappropriate options based upon which radio button is selected.
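
The gray-out logic could be driven by a simple applicability table keyed off whichever radio button is selected. The sketch below only illustrates the idea in Python; the option names and the table structure are placeholders drawn from the options listed further down, not an existing Anaconda interface.

  # Illustrative mapping from the selected array type to the options that
  # apply to it; anything not listed for the selection gets grayed out.
  # Option names are placeholders for this sketch, not an Anaconda API.
  OPTION_APPLICABILITY = {
      "raid0":  {"drive_selection", "array_name"},
      "raid1":  {"drive_selection", "array_name", "bitmap"},  # no spare, per the note above
      "raid10": {"drive_selection", "array_name", "spare", "bitmap"},
      "raid4":  {"drive_selection", "array_name", "spare", "bitmap"},
      "raid5":  {"drive_selection", "array_name", "spare", "bitmap"},
      "raid6":  {"drive_selection", "array_name", "spare", "bitmap"},
  }

  def grayed_out(selected_level,
                 all_options=("drive_selection", "spare", "array_name", "bitmap")):
      """Return the options to disable for the selected radio button."""
      applicable = OPTION_APPLICABILITY[selected_level]
      return [opt for opt in all_options if opt not in applicable]

  print(grayed_out("raid1"))   # ['spare']            -- spares disabled for RAID1
  print(grayed_out("raid0"))   # ['spare', 'bitmap']  -- bitmaps apply to fault tolerant arrays only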

These are the options that I consider essential:

  • Drive selection. I'm not entirely sure how a user is supposed to create two different arrays that use different drives in this new UI, because drive selection is done before you get to the raid UI screen, and once you are at the raid UI screen you have no control over which drives are used. That's basically a showstopper for any real raid array usage scenario. Most people that want a raid array for, say, /home or /srv, also want one for /, but the two are very often different raid types and use different combinations of drives.
  • Spare allocation. This exists on Redundancy, which is actually the one place where it validly appears yet the one place I recommend it not be used. Spares are most useful for RAID10/4/5/6, but for RAID1 a spare is actually counterproductive. For RAID1, if you make the array larger instead of adding a spare (for instance, a 3 drive raid1 array instead of a 2 drive raid1 array with 1 spare), the capacity does not change, and instead of having to rebuild onto the spare when one of the drives fails, the extra drive is essentially an online, up to date spare all the time. In fact, there is a minor amount of risk that when rebuilding the remaining functional drive onto a spare drive, you encounter a read error on that last functional drive and fail the rebuild operation. With a 3 drive raid1 array, that never happens. It also happens that with a 3 drive raid1 array, writes are not really slowed down at all, but reads are sped up because you have 3 drives to read from instead of 2. So performance is better with a 3 drive raid1 array than with a 2 drive raid1 array with a spare (see the command sketch after this list). For raid10/4/5/6 this is not the case: all the drives are not identical, so you have to wait for a drive to fail in order to know which data to reconstruct on the spare, and a spare makes sense there.
  • Array name. For all mdadm arrays, the preferred way of identifying the array is with a name stored in the name field of the superblock. We have no way of entering the name, and the label: field is not specific about what the label applies to (although I can suss out that it means the filesystem label and not the array name). There should be an entry for the array name, and a note by the entry field that on assembly the array can be found under /dev/md/<array_name>. For the time being I'm going to ignore the homehost setting. It applies here too, but I don't want to cause the Anaconda folks' heads to spin in circles on the first writing, so we'll ignore it for now.
  • These options are not showstoppers, but are still highly desirable (we already do a default thing here that is reasonably sane, but some users might want to do something else):
    • Bitmap. This should be a checkbox. It only applies to fault tolerant array types, and should default to enabled for any array that's more than, say, 10GB in size. If it's enabled, then optionally we should have a number entry box that allows the user to select the granularity of the bitmap. A tooltip can explain that a bitmap makes recovery of an array after an unexpected machine failure (such as from power loss) *much* faster, but does so at the expense of some performance on writes to the array, on the order of a 3 to 10% degradation in write performance depending on the granularity of the bitmap. The lower the granularity number, the worse the write performance degradation, but the faster the recovery after an unclean shutdown. A common number is 65536, which results in about a 4% write performance degradation yet is still granular enough that a multi-TB array reconstruction after machine failure will complete in minutes instead of 1 day+.
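
Pulling the essential options above together, the sketch below shows how they map onto standard mdadm flags by building (not running) two example command lines: a 3 drive raid1 for / with no spare, and a raid5 for /home on a different set of drives with a spare, a name, and an internal write-intent bitmap. The device paths, array names, and layout are made up for illustration; only the mdadm options themselves are real.

  # Sketch only: how the options discussed above map onto mdadm flags.
  # Device paths and array names are invented for the example; an
  # installer backend would hand these lists to subprocess.check_call().

  # "/" on a 3 drive raid1 -- preferred over 2 drives + 1 spare: same
  # capacity, no rebuild window, and reads are spread over 3 drives.
  root_cmd = [
      "mdadm", "--create", "/dev/md/root",
      "--level=1", "--raid-devices=3",
      "--name=root",                    # stored in the superblock name field;
                                        # the array assembles as /dev/md/root
      "--bitmap=internal",              # much faster resync after a crash
      "/dev/sda1", "/dev/sdb1", "/dev/sdc1",
  ]

  # "/home" on a raid5 over a *different* set of drives -- the case the
  # current UI's up-front drive selection cannot express.  A spare does
  # make sense here, unlike with raid1.
  home_cmd = [
      "mdadm", "--create", "/dev/md/home",
      "--level=5", "--raid-devices=4", "--spare-devices=1",
      "--name=home",
      "--bitmap=internal", "--bitmap-chunk=65536",   # the granularity discussed above
      "/dev/sdd1", "/dev/sde1", "/dev/sdf1", "/dev/sdg1", "/dev/sdh1",
  ]

  for cmd in (root_cmd, home_cmd):
      print(" ".join(cmd))              # just print the commands in this sketch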