From Fedora Project Wiki
m (Fix typo)
m
Line 3: Line 3:
 
== Summary ==
 
== Summary ==
  
Running out of free space on '/' or '/home' is [https://pagure.io/fedora-workstation/issue/152 is not fun, but is common on the desktop with the current default partitioning layout. btrfs avoids the problem by providing one big file system, and brings additional features that will benefit Fedora users. The proposal is: make btrfs the default file system on the desktop.
+
For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like [https://pagure.io/fedora-workstation/issue/152 running out of disk space.] Btrfs is well adapted to this role by design philosophy, let's make it the default.
  
 
== Owners ==
 
== Owners ==
  
* Names: [[User:Chrismurphy|Chris Murphy]], [[User:Ngompa|Neal Gompa]], [[User:Josef|Josef Bacik]], [[User:Salimma|Michel Alexandre Salim]], [[User:Dcavalca|Davide Cavalca]], [[User:eeickmeyer|Erich Eickmeyer]], [[User:ignatenkobrain|Igor Raits]], [[User:Raveit65|Wolfgang Ulbrich]], [[User:Zsun|Zamir SUN]], [[User:rdieter|Rex Dieter]]
+
* Names: [[User:Chrismurphy|Chris Murphy]], [[User:Ngompa|Neal Gompa]], [[User:Josef|Josef Bacik]], [[User:Salimma|Michel Alexandre Salim]], [[User:Dcavalca|Davide Cavalca]], [[User:eeickmeyer|Erich Eickmeyer]], [[User:ignatenkobrain|Igor Raits]], [[User:Raveit65|Wolfgang Ulbrich]], [[User:Zsun|Zamir SUN]]
* Emails: chrismurphy@fedoraproject.org, ngompa13@gmail.com, josef@toxicpanda.com, michel@michel-slm.name, dcavalca@fb.com, erich@ericheickmeyer.com, ignatenkobrain@fedoraproject.org, fedora@raveit.de, zsun@fedoraproject.org, rdieter@gmail.com
+
* Emails: chrismurphy@fedoraproject.org, ngompa13@gmail.com, josef@toxicpanda.com, michel@michel-slm.name, dcavalca@fb.com, erich@ericheickmeyer.com, ignatenkobrain@fedoraproject.org, fedora@raveit.de, zsun@fedoraproject.org
 
<!--- UNCOMMENT only for Changes with assigned Shepherd (by FESCo)
 
<!--- UNCOMMENT only for Changes with assigned Shepherd (by FESCo)
 
* FESCo shepherd: [[User:FASAccountName| Shehperd name]] <email address>
 
* FESCo shepherd: [[User:FASAccountName| Shehperd name]] <email address>
Line 40: Line 40:
  
 
'''''Current partitioning'''''<br />
 
'''''Current partitioning'''''<br />
<code>vg/root</code> LV mounted at <code>/</code> and a <code>vg/home</code> LV mounted at <code>/home</code>. These are separate file system volumes, with separate free/used space.
+
<span style="color: limegreen">vg/root</span> LV mounted at <span style="color: limegreen">/</span> and a <span style="color: limegreen">vg/home</span> LV mounted at <span style="color: limegreen">/home</span>. These are separate file system volumes, with separate free/used space.
  
 
'''''Proposed partitioning'''''<br />
 
'''''Proposed partitioning'''''<br />
<code>root</code> subvolume mounted at <code>/</code> and <code>home</code> subvolume mounted at <code>/home</code>. Subvolumes don't have size, they act mostly like directories, space is shared.
+
<span style="color: limegreen">root</span> subvolume mounted at <span style="color: limegreen">/</span> and <span style="color: limegreen">home</span> subvolume mounted at <span style="color: limegreen">/home</span>. Subvolumes don't have size, they act mostly like directories, space is shared.
  
 
'''''Unchanged'''''<br />
 
'''''Unchanged'''''<br />
<code>/boot</code> will be a small ext4 volume. A separate boot is needed to boot dm-crypt sysroot installations; it's less complicated to keep the layout the same, regardless of whether sysroot is encrypted. There will be no automatic snapshots/rollbacks.
+
<span style="color: limegreen">/boot</span> will be a small ext4 volume. A separate boot is needed to boot dm-crypt sysroot installations; it's less complicated to keep the layout the same, regardless of whether sysroot is encrypted. There will be no automatic snapshots/rollbacks.
  
 
=== Optimizations (Optional) ===
 
=== Optimizations (Optional) ===
Line 60: Line 60:
 
==== Compression ====
 
==== Compression ====
  
* Enable transparent compression using zstd on select directories: <code>/usr</code>  <code>/var/lib/flatpak</code>  <code>~/.local/share/flatpak</code>
+
* Enable transparent compression using zstd on select directories: <span style="color: limegreen">/usr</span>  <span style="color: limegreen">/var/lib/flatpak</span>  <span style="color: limegreen">~/.local/share/flatpak</span>
 
* Advantage: Saves space and significantly increase the lifespan of flash-based media by reducing write amplification. It may improve performance in some instances.
 
* Advantage: Saves space and significantly increase the lifespan of flash-based media by reducing write amplification. It may improve performance in some instances.
 
* Scope: Contingent on installer team review and approval to enhance anaconda to perform the installation using <code>mount -o compress=zstd</code>, then set the proper XATTR for each directory. The XATTR can't be set until after the directories are created via: rsync, rpm, or unsquashfs based installation.
 
* Scope: Contingent on installer team review and approval to enhance anaconda to perform the installation using <code>mount -o compress=zstd</code>, then set the proper XATTR for each directory. The XATTR can't be set until after the directories are created via: rsync, rpm, or unsquashfs based installation.
Line 66: Line 66:
 
==== Additional subvolumes ====
 
==== Additional subvolumes ====
  
* <code>/var/log/</code>  <code>/var/lib/libvirt/images</code> and <code>~/.local/share/gnome-boxes/images/</code> will use separate subvolumes.
+
* <span style="color: limegreen">/var/log/</span>  <span style="color: limegreen">/var/lib/libvirt/images</span> and <span style="color: limegreen">~/.local/share/gnome-boxes/images/</span> will use separate subvolumes.
* Advantage: Makes it easier to excluded them from snapshots, rollbacks, and send/receive. (btrfs snapshotting is not recursive, it stops at a nested subvolume.)
+
* Advantage: Makes it easier to excluded them from snapshots, rollbacks, and send/receive. (Btrfs snapshotting is not recursive, it stops at a nested subvolume.)
* Scope: Anaconda knows how to do this already, just change the kickstart to add additional subvolumes (minus the subvolume in <code>~</code>). GNOME Boxes will need enhancement to detect that the user home is on btrfs and create <code>~/.local/share/gnome-boxes/images/</code> as a subvolume.
+
* Scope: Anaconda knows how to do this already, just change the kickstart to add additional subvolumes (minus the subvolume in <span style="color: limegreen">~/</span>. GNOME Boxes will need enhancement to detect that the user home is on btrfs and create <span style="color: limegreen">~/.local/share/gnome-boxes/images/</span> as a subvolume.
  
 
== Feedback ==
 
== Feedback ==
  
==== Red Hat doesn't support btrfs? Can Fedora do this? ====
+
==== Red Hat doesn't support Btrfs? Can Fedora do this? ====
  
 
Red Hat supports Fedora well, in many ways. But Fedora already works closely with, and depends on, upstreams. And this will be one of them. That's an important consideration for this proposal. The community has a stake in ensuring it is supported. Red Hat will never support btrfs if Fedora rejects it. Fedora necessarily needs to be first, and make the persuasive case that it solve more problems than alternatives. Feature owners believe it does, hands down.
 
Red Hat supports Fedora well, in many ways. But Fedora already works closely with, and depends on, upstreams. And this will be one of them. That's an important consideration for this proposal. The community has a stake in ensuring it is supported. Red Hat will never support btrfs if Fedora rejects it. Fedora necessarily needs to be first, and make the persuasive case that it solve more problems than alternatives. Feature owners believe it does, hands down.
  
The btrfs community has users that have been using it for most of the past decade at scale. It's been in use as the default on openSUSE (and SUSE Linux Enterprise) since 2014, and Facebook has been using for all their OS and data volumes in their data centers for almost as long. btrfs is a mature, well-understood, and battle-tested file system, used on both desktop/container and server/cloud use-cases. We do have developers of the btrfs filesystem maintaining and supporting the code in Fedora, one is a Change owner, so issues that are pinned to btrfs can be addressed quickly.
+
The btrfs community has users that have been using it for most of the past decade at scale. It's been in use as the default on openSUSE (and SUSE Linux Enterprise) since 2014, and Facebook has been using for all their OS and data volumes in their data centers for almost as long. Btrfs is a mature, well-understood, and battle-tested file system, used on both desktop/container and server/cloud use-cases. We do have developers of the btrfs filesystem maintaining and supporting the code in Fedora, one is a Change owner, so issues that are pinned to btrfs can be addressed quickly.
  
=== Why not LVM thin provisioning? ===
+
=== What about device-mapper alternatives? ===
  
Issue#152 still happens, because the installer won't over provision by default. It requires manual intervention by the user to identify the problem, and resolve it. Upon growing any file system on dm-thin, the pool is over committed, and file system sizes become a fantasy: they don't add up to the total physical storage available. The truth of used and free space is only known by the thin pool, and no CLI or GUI programs are prepared for this. It means desktop integration is required, rather than a nice-to-have.
+
dm-thin (thin provisioning): Issue#152 still happens, because the installer won't over provision by default. It still requires manual intervention by the user to identify and resolve the problem. Upon growing a file system on dm-thin, the pool is over committed, and file system sizes become a fantasy: they don't add up to the total physical storage available. The truth of used and free space is only known by the thin pool, and CLI and GUI programs are unprepared for this. Integration points like rpm free space checks or GNOME disk-space warnings would have to be adapted as well.
  
Btrfs solves the problems that need solving, with fewer side effects for the general use case. And includes more features: compression, integrity, and IO isolation. But if you know thin provisioning meets your use case better, of course you should use it instead.
+
dm-vdo: is not yet merged, and isn't as straightforward to selectively enable per directory and per file, as is the case on btrfs using <code>chattr +c</code> on <span style="color: limegreen">/var/lib/flatpaks/</span>.
 +
 
 +
Btrfs solves the problems that need solving, with few side effects or pitfalls for users. It has more features we can take advantage of immediately and transparently: compression, integrity, and IO isolation. Many btrfs features and optimizations can be opted into selectively per directory or file, such as compression and nodatacow, rather than as a layer that's either on or off.
  
  
Line 91: Line 93:
 
Anaconda already has sophisticated btrfs integration.
 
Anaconda already has sophisticated btrfs integration.
  
==== What btrfs features are recommended and supported? ====
+
==== What Btrfs features are recommended and supported? ====
 
 
This is the upstream [https://btrfs.wiki.kernel.org/index.php/Status btrfs feature status page]
 
 
 
Fedora is a community project. What is supported within Fedora depends on what the community decides to put forward in terms of resources.
 
  
When in doubt, use defaults. Be patient with yourself, and each other. There are few things you must learn about btrfs, but the toy box is full. It can be overwhelming. Features that sound familiar, like raid1, don't work the same as other implementations you're familiar with. There is lots of jargon. Take your time. No one needs to go from 0 kph to 100 kph overnight.
+
The primary goal of this feature is to be largely transparent to the user. It does not require or expect users to learn new commands, or to engage in peculiar maintenance rituals.
  
==== What is possible but not supported? ====
+
The full set of btrfs features that is considered stable and enabled by default upstream will be enabled in Fedora. Fedora is a community project. What is supported within Fedora depends on what the community decides to put forward in terms of resources.
  
No btrfs features will be disabled. The full box of toys is available. It is possible to get into trouble.
+
The upstream [https://btrfs.wiki.kernel.org/index.php/Status Btrfs feature status page].
  
 
== Benefit to Fedora ==
 
== Benefit to Fedora ==
Line 111: Line 109:
 
** transparent compression: significantly reduces write amplification, improves lifespan of storage hardware
 
** transparent compression: significantly reduces write amplification, improves lifespan of storage hardware
 
** reflinks and snapshots are more efficient for use cases like containers (Podman supports both)
 
** reflinks and snapshots are more efficient for use cases like containers (Podman supports both)
 +
* Storage devices can be flaky, resulting in data corruption
 +
** Everything is checksummed and verified on every read
 +
** Corrupt data results in EIO, instead of resulting in application confusion, and isn't replicated into backups and archives
 
* Poor desktop responsiveness when under pressure [https://pagure.io/fedora-workstation/issue/154 Workstation issue #154]
 
* Poor desktop responsiveness when under pressure [https://pagure.io/fedora-workstation/issue/154 Workstation issue #154]
 
** Currently only btrfs has proper IO isolation capability via cgroups2
 
** Currently only btrfs has proper IO isolation capability via cgroups2
 
** Completes the resource control picture: memory, cpu, IO isolation
 
** Completes the resource control picture: memory, cpu, IO isolation
* Storage devices betray users, resulting in data corruption
 
** Everything is checksummed and verified on every read
 
** Corrupt data results in EIO, instead of resulting in application confusion, and isn't replicated into backups and archives
 
 
* File system resize
 
* File system resize
 
** Online shrink and grow are fundamental to the design
 
** Online shrink and grow are fundamental to the design
Line 122: Line 120:
 
** Simple and comprehensive command interface. One master command
 
** Simple and comprehensive command interface. One master command
 
** Simpler to boot, all code is in the kernel, no initramfs complexities
 
** Simpler to boot, all code is in the kernel, no initramfs complexities
* Simple and efficient filesystem replication with <code>btrfs send</code> and <code>btrfs receive</code>
+
** Simple and efficient file system replication, including incremental backups, with <code>btrfs send</code> and <code>btrfs receive</code>
** Incremental backups are easy and cheap
+
 
** Snapshotting of the filesystem can be done only on the backup device if desired
 
  
 
== Scope ==
 
== Scope ==

Revision as of 21:02, 22 June 2020

Make btrfs the default file system for Workstation, KDE, MATE-Compiz, and LXQt

Summary

For laptop and workstation installs of Fedora, we want to provide file system features to users in a transparent fashion. We want to add new features, while reducing the amount of expertise needed to deal with situations like running out of disk space. Btrfs is well adapted to this role by design, so let's make it the default.

Owners

  • Products: Workstation, KDE, Jam, Classroom Lab, Astronomy, Comp Neuro, Design Suite, Robotics Suite, MATE-Compiz, LXQt
  • Responsible WGs: Workstation Working Group, KDE Special Interest Group

Current status

  • Targeted release: Fedora 33
  • Last updated: 2020-06-22
  • FESCo issue: <will be assigned by the Wrangler>
  • Tracker bug: <will be assigned by the Wrangler>
  • Release notes tracker: <will be assigned by the Wrangler>

Detailed Description

Fedora Workstation, KDE, MATE-Compiz, and LXQt will switch to using btrfs as the filesystem by default for new installs. Labs derived from these variants inherit this change, and other labs and spins may opt into this change.

The change is based on the installer's custom partitioning btrfs preset. It's been well tested for 7 years.

Current partitioning
vg/root LV mounted at / and a vg/home LV mounted at /home. These are separate file system volumes, with separate free/used space.

Proposed partitioning
root subvolume mounted at / and home subvolume mounted at /home. Subvolumes don't have size, they act mostly like directories, space is shared.

Unchanged
/boot will be a small ext4 volume. A separate boot is needed to boot dm-crypt sysroot installations; it's less complicated to keep the layout the same, regardless of whether sysroot is encrypted. There will be no automatic snapshots/rollbacks.
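
One way to see the difference between the two layouts on an installed system is to compare what df reports for each mount point. This is only a sketch: it assumes GNU coreutils df, and the show_layout helper name is made up for illustration.

```shell
# Print device, file system type, size, and available space for the
# given mount points. Assumes GNU df (for the --output option).
show_layout() {
    df -h --output=source,fstype,size,avail,target "$@"
}

show_layout /
# On an installed system, compare all three: show_layout / /home /boot
```

With the current LVM layout, / and /home should show different devices and different Avail values. With the proposed btrfs layout, both should show the same device and the same Avail value, because the subvolumes share one pool of space.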

Optimizations (Optional)

The detailed description above is the proposal. It's intended to be a minimalist and transparent switch. It's also the same as was proposed (and accepted) for Fedora 16. The following optimizations improve on the proposal, but are not critical. They are also transparent to most users. The general idea is to agree to the base proposal first, and then consider these as enhancements.

Boot on btrfs

  • Instead of a 1G ext4 boot, create a 1G btrfs boot.
  • Advantage: Makes it possible to include boot in a snapshot and rollback regime. GRUB has had stable support for btrfs for 10+ years.
  • Scope: Contingent on bootloader and installer team review and approval. blivet should use mkfs.btrfs --mixed.

Compression

  • Enable transparent compression using zstd on select directories: /usr /var/lib/flatpak ~/.local/share/flatpak
  • Advantage: Saves space and significantly increases the lifespan of flash-based media by reducing write amplification. It may improve performance in some instances.
  • Scope: Contingent on installer team review and approval to enhance anaconda to perform the installation using mount -o compress=zstd, then set the proper XATTR for each directory. The XATTR can't be set until after the directories are created via: rsync, rpm, or unsquashfs based installation.
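
The per-directory step in the Scope item could look roughly like the sketch below. It uses btrfs property set, which stores the compression setting as an extended attribute on the directory. The set_zstd function name and the DRY_RUN switch are assumptions made for this example; they are not part of anaconda.

```shell
# Set the btrfs "compression" property on each directory given.
# Files created in these directories afterwards inherit the setting.
# With DRY_RUN=1 the commands are printed instead of executed.
set_zstd() {
    for dir in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "btrfs property set $dir compression zstd"
        else
            btrfs property set "$dir" compression zstd
        fi
    done
}

# Example (as root, on a btrfs file system, after the directories exist):
#   set_zstd /usr /var/lib/flatpak
```

The property only affects data written after it is set, which is why the installation itself would run with mount -o compress=zstd.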

Additional subvolumes

  • /var/log/ /var/lib/libvirt/images and ~/.local/share/gnome-boxes/images/ will use separate subvolumes.
  • Advantage: Makes it easier to exclude them from snapshots, rollbacks, and send/receive. (Btrfs snapshotting is not recursive; it stops at a nested subvolume.)
  • Scope: Anaconda already knows how to do this; just change the kickstart to add the additional subvolumes (minus the subvolume in ~/). GNOME Boxes will need enhancement to detect that the user home is on btrfs and create ~/.local/share/gnome-boxes/images/ as a subvolume.
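
The detection that GNOME Boxes would need can be sketched in a few lines of shell. This is an illustration only, not how GNOME Boxes is implemented: the on_btrfs and ensure_images_dir names are hypothetical, and the check relies on GNU stat reporting the file system type as "btrfs".

```shell
# True if the given path lives on a btrfs file system (GNU stat assumed).
on_btrfs() {
    [ "$(stat -f -c %T "$1" 2>/dev/null)" = "btrfs" ]
}

# Create the images directory as a subvolume when the parent is on btrfs,
# and as a plain directory otherwise. An existing path is left untouched.
ensure_images_dir() {
    dir=$1
    parent=$(dirname "$dir")
    mkdir -p "$parent"
    [ -e "$dir" ] && return 0
    if on_btrfs "$parent"; then
        btrfs subvolume create "$dir"
    else
        mkdir "$dir"
    fi
}

# Example:
#   ensure_images_dir "$HOME/.local/share/gnome-boxes/images"
```

Falling back to mkdir keeps the behavior unchanged for users whose home is not on btrfs.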

Feedback

Red Hat doesn't support Btrfs? Can Fedora do this?

Red Hat supports Fedora well, in many ways. But Fedora already works closely with, and depends on, upstreams. And this will be one of them. That's an important consideration for this proposal. The community has a stake in ensuring it is supported. Red Hat will never support btrfs if Fedora rejects it. Fedora necessarily needs to be first, and make the persuasive case that it solves more problems than alternatives. Feature owners believe it does, hands down.

The btrfs community has users that have been using it at scale for most of the past decade. It's been in use as the default on openSUSE (and SUSE Linux Enterprise) since 2014, and Facebook has been using it for all their OS and data volumes in their data centers for almost as long. Btrfs is a mature, well-understood, and battle-tested file system, used on both desktop/container and server/cloud use-cases. We do have developers of the btrfs filesystem maintaining and supporting the code in Fedora, one of whom is a Change owner, so issues that are pinned to btrfs can be addressed quickly.

What about device-mapper alternatives?

dm-thin (thin provisioning): Issue#152 still happens, because the installer won't over provision by default. It still requires manual intervention by the user to identify and resolve the problem. Upon growing a file system on dm-thin, the pool is over committed, and file system sizes become a fantasy: they don't add up to the total physical storage available. The truth of used and free space is only known by the thin pool, and CLI and GUI programs are unprepared for this. Integration points like rpm free space checks or GNOME disk-space warnings would have to be adapted as well.

dm-vdo: It is not yet merged, and it isn't as straightforward to enable selectively per directory and per file as it is on btrfs, for example by using chattr +c on /var/lib/flatpak/.

Btrfs solves the problems that need solving, with few side effects or pitfalls for users. It has more features we can take advantage of immediately and transparently: compression, integrity, and IO isolation. Many btrfs features and optimizations can be opted into selectively per directory or file, such as compression and nodatacow, rather than as a layer that's either on or off.


What about UI/UX and integration in the desktop?

If btrfs isn't the default file system, there's no commitment, nor reason to work on any UI/UX integration. There are ideas to make certain features discoverable: selective compression; systemd-homed may take advantage of either btrfs online resize, or near-term planned native encryption, which could make it possible to live convert non-encrypted homes to encrypted; and system snapshot and rollbacks.

Anaconda already has sophisticated btrfs integration.

What Btrfs features are recommended and supported?

The primary goal of this feature is to be largely transparent to the user. It does not require or expect users to learn new commands, or to engage in peculiar maintenance rituals.

The full set of btrfs features that is considered stable and enabled by default upstream will be enabled in Fedora. Fedora is a community project. What is supported within Fedora depends on what the community decides to put forward in terms of resources.

The upstream Btrfs feature status page.

Benefit to Fedora

Problems btrfs helps solve:

  • Users running out of free space on either / or /home Workstation issue #152
    • "one big file system": no hard barriers like partitions or logical volumes
    • transparent compression: significantly reduces write amplification, improves lifespan of storage hardware
    • reflinks and snapshots are more efficient for use cases like containers (Podman supports both)
  • Storage devices can be flaky, resulting in data corruption
    • Everything is checksummed and verified on every read
    • Corrupt data results in EIO, instead of resulting in application confusion, and isn't replicated into backups and archives
  • Poor desktop responsiveness when under pressure Workstation issue #154
    • Currently only btrfs has proper IO isolation capability via cgroups2
    • Completes the resource control picture: memory, cpu, IO isolation
  • File system resize
    • Online shrink and grow are fundamental to the design
  • Complex storage setups are... complicated
    • Simple and comprehensive command interface. One master command
    • Simpler to boot, all code is in the kernel, no initramfs complexities
    • Simple and efficient file system replication, including incremental backups, with btrfs send and btrfs receive
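
As an illustration of the last item, the sketch below wraps btrfs send and btrfs receive in a small helper. The send_snapshot name, the example paths, and the DRY_RUN switch are assumptions for this example. It expects read-only snapshots (created with btrfs subvolume snapshot -r) and a destination that is itself a mounted btrfs file system.

```shell
# Send a read-only snapshot ($1) to a btrfs file system mounted at $2.
# If a parent snapshot ($3) is given and is present on both sides, only
# the differences between parent and snapshot are transferred.
# With DRY_RUN=1 the pipeline is printed instead of executed.
send_snapshot() {
    snap=$1
    dest=$2
    parent=${3:-}
    if [ -n "$parent" ]; then
        set -- btrfs send -p "$parent" "$snap"
    else
        set -- btrfs send "$snap"
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$* | btrfs receive $dest"
    else
        "$@" | btrfs receive "$dest"
    fi
}

# Example (as root; paths are placeholders):
#   btrfs subvolume snapshot -r /home /snapshots/home-monday
#   send_snapshot /snapshots/home-monday /mnt/backup
#   btrfs subvolume snapshot -r /home /snapshots/home-tuesday
#   send_snapshot /snapshots/home-tuesday /mnt/backup /snapshots/home-monday
```

The second send is the incremental one: it reuses the Monday snapshot already on the backup device as its base.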


Scope

  • Proposal owners:
    • Submit PRs for Anaconda to set default_scheme = BTRFS in the proper product files.
    • Multiple test days: build community support network
    • Aid with documentation
  • Other developers:
    • Anaconda, review PRs and merge
    • Bootloader team, review PRs and merge
    • Recommended optimization: set chattr +C on the containing directory for virt-manager and GNOME Boxes.
  • Policies and guidelines: N/A
  • Trademark approval: N/A
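
The recommended chattr +C optimization for VM image directories could be applied along these lines. This is a sketch under stated assumptions: the prepare_image_dir name is hypothetical, and GNU stat is used to detect btrfs so that the attribute is only set where it is supported.

```shell
# Create a directory for VM disk images and, when it is on btrfs,
# mark it No_COW so that image files created in it later inherit
# the attribute. Other file systems are left alone.
prepare_image_dir() {
    mkdir -p "$1"
    if [ "$(stat -f -c %T "$1")" = "btrfs" ]; then
        chattr +C "$1"
    fi
}

# Example:
#   prepare_image_dir /var/lib/libvirt/images
#   lsattr -d /var/lib/libvirt/images   # a "C" flag indicates No_COW
```

The attribute is set on the directory rather than on individual images because it only takes effect for files that are new or still empty.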

Upgrade/compatibility impact

Change will not affect upgrades.

Documentation will be provided for existing btrfs users to "retrofit" their setups to that of a default btrfs installation (base plus any approved options).

How To Test

Today
Do a custom partitioning installation; change the scheme drop-down menu to btrfs; click the blue "automatically create partitions"; and install.
Fedora 31, 32, Rawhide, on x86_64 and ARM.

Once change lands
It should be simple enough to test: just do a normal install.

User Experience

Pros

  • Mostly transparent
  • Space savings from compression
  • Longer lifespan of hardware, also from compression.
  • Utilities for used and free space, CLI and GUI, are expected to behave the same. No special commands are required.
  • More detailed information can be revealed by btrfs specific commands.

Enhancement opportunities

updatedb does not index /home when /home is a bind mount. This can also affect rpm-ostree installations, including Silverblue.

GNOME Usage: Incorrect numbers when using multiple btrfs subvolumes. This isn't btrfs specific; it happens with a "one big ext4" volume as well.

GNOME Boxes, RFE: create qcow2 with 'nocow' option when on btrfs /home. This is btrfs specific, and is a recommended optimization for both GNOME Boxes and virt-manager.

containers/libpod: automatically use btrfs driver if on btrfs

Dependencies

None.

Contingency Plan

  • Contingency mechanism: Owner will revert changes back to LVM+ext4
  • Contingency deadline: Beta freeze
  • Blocks release? Yes
  • Blocks product? Workstation and KDE

Documentation

Strictly speaking no documentation is required reading. But there will be some Fedora documentation to help get the ball rolling.

For those who want to know more:

btrfs wiki main page and full feature list.

man 5 btrfs contains: mount options, features, swapfile support, checksum algorithms, and more
man btrfs contains an overview of the btrfs subcommands
man btrfs <subcommand> will show the man page for that subcommand


NOTE: The btrfs command will accept partial subcommands, as long as it's not ambiguous. These are equivalent commands:
btrfs subvolume snapshot
btrfs sub snap
btrfs su sn

You'll discover your own convention. It might be preferable to write out the full command on forums and lists, but then maybe some folks don't learn about this useful shortcut?
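
The "not ambiguous" rule can be pictured as unique-prefix matching. The sketch below is only an illustration of that idea, not the code btrfs-progs uses; the resolve_prefix name is hypothetical, and the subcommand list is an assumption that may not match your btrfs-progs version.

```shell
# Subcommands of "btrfs subvolume" (illustrative list, may differ by version).
SUBVOL_CMDS="create delete list snapshot get-default set-default find-new show sync"

# Print the single subcommand that starts with the given prefix.
# Returns 1 if the prefix matches nothing or more than one subcommand.
resolve_prefix() {
    prefix=$1
    match=""
    count=0
    for cmd in $SUBVOL_CMDS; do
        case $cmd in
            "$prefix") echo "$cmd"; return 0 ;;            # exact match wins
            "$prefix"*) match=$cmd; count=$((count + 1)) ;; # prefix match
        esac
    done
    [ "$count" -eq 1 ] && echo "$match"
}

resolve_prefix sn                                 # prints: snapshot
resolve_prefix s || echo "ambiguous or unknown"   # "s" matches several
```

Here "sn" resolves to snapshot because nothing else in the list starts with it, while "s" alone is rejected.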

Release Notes

The default file system is btrfs.