Revision as of 19:12, 30 May 2020
swap on ZRAM
Summary
Swap is useful, except when it's slow.[1] ZRAM is a RAM disk that uses always-on compression [2]. It has a size assigned at creation time, but the RAM it uses is allocated and deallocated dynamically, on demand. The ZRAM block device behaves like any other: it can be formatted with a file system or with mkswap. This change proposal intends the latter.
Each of the following changes is opt-in (owner assumes editions/spins are to be excluded unless they ask to be included):
1. Install the systemd rust-zram-generator[3] package. This does not enable swap-on-ZRAM; it only makes the generator available.
2. Install a default zram-generator configuration. When present, swap-on-ZRAM is set up during startup.
3. Do not create a swap partition/LV for default installations.
The practical combinations of the above:
(1) only = generator present, user can enable by creating a configuration file. No other changes. Ideally FESCo approves this Fedora-wide so that the generator is available everywhere without exception. This makes it easier to converge on one implementation, which reduces user confusion.
(1) + (2) = swap-on-ZRAM is enabled, with a higher priority than the default for swap-on-drive. Both coexist, but swap-on-ZRAM is used first. Hibernation is still possible if the swap-on-drive partition is big enough and all other requirements are met.
(1) + (2) + (3) = swap-on-ZRAM is enabled, no disk-based swap present. Fedora Workstation edition plans to do this (pending test day results and feedback).
[1]
There is a tl;dr section at the top. Highly recommend reading the whole article.
In defence of swap: common misconceptions
https://chrisdown.name/2018/01/02/in-defence-of-swap.html
[2]
https://www.kernel.org/doc/Documentation/blockdev/zram.txt
[3]
https://github.com/systemd/zram-generator
https://pagure.io/fedora-workstation/blob/master/f/hibernationstatus.md
Owner
- Name: Chris Murphy
- Email: chrismurphy@fedoraproject.org
Current status
- Targeted release: Fedora 33
- Last updated: 2020-05-30
- FESCo issue: <will be assigned by the Wrangler>
- Tracker bug: <will be assigned by the Wrangler>
- Release notes tracker: <will be assigned by the Wrangler>
Detailed Description
Basic function:
The system will use RAM normally until it's full, and then start paging out to the swap-on-ZRAM device, just as if it were a real swap device. But there is no free lunch. Because of compression, the ZRAM driver allocates memory at roughly half the rate of page-outs. This means swap is not as effective at page eviction: the net rate is ~50% instead of 100%. But it is orders of magnitude faster than disk-based swap.
ZRAM has about 0.1% overhead, or ~1MiB/1GiB. If the workload never touches swap, living entirely inside RAM, this overhead is the sole cost; there is no preallocation of RAM.
Default ZRAM device configuration:
Create a ZRAM device regardless of RAM size, using a ZRAM to RAM ratio of 1:2, capped at 4GiB [4], and with a higher than typical swap priority [5].
These values seem reasonable, and are based on prior work. Anaconda has two examples for setting swap size to 50% RAM: the no hibernation case, common outside x86; and its own current swap-on-ZRAM implementation. Fedora IoT's implementation also sets swap-on-ZRAM size to 50% RAM.
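The sizing rule above can be sketched as a few lines of shell arithmetic. This is only an illustration of the proposed defaults (half of RAM, capped at 4GiB); the function name zram_size_mib is hypothetical and is not part of zram-generator.

```shell
# Hypothetical helper, not part of zram-generator: proposed default ZRAM
# device size in MiB for a given amount of RAM in MiB.
zram_size_mib() {
    ram_mib=$1
    cap_mib=4096                      # 4GiB cap from the proposal
    size_mib=$(( $ram_mib / 2 ))      # ZRAM to RAM ratio of 1:2
    if [ "$size_mib" -gt "$cap_mib" ]; then
        size_mib=$cap_mib
    fi
    echo "$size_mib"
}

zram_size_mib 2048     # 2GiB of RAM  -> prints 1024
zram_size_mib 16384    # 16GiB of RAM -> prints 4096 (cap applies)
```

With these defaults the cap only takes effect on systems with more than 8GiB of RAM.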
[4]
RFE: should be able to set a cap on zram device size
https://github.com/systemd/zram-generator/issues/10
[5]
RFE: should set priority #8
https://github.com/systemd/zram-generator/issues/8
Default installer behavior
The installer is currently responsible for creating a swap-on-disk device. This will be dropped. The zram-generator + configuration file will trigger the setup and activation of swap-on-ZRAM. This means hibernation isn't possible, even on systems that could support it.
Please see Supporting hibernation in Workstation edition for much more detailed information, including why it's increasingly likely hibernation isn't possible anyway, and a path to improving hibernation support.
Custom/Advanced partitioning installer behavior
The user can add swap using Custom partitioning at install time. In that case, the installer will include the resume=UUID hint, so hibernation resume can happen. No change in behavior here.
Since swap-on-ZRAM is still enabled by default, there will be two swap devices: swap-on-ZRAM and swap-on-disk. The swap-on-ZRAM device will have the higher priority, so it is favored over disk-based swap. The kernel is smart enough to know it can't hibernate to a ZRAM device, and will instead use disk-based swap.
Test Day:
The recommendation is that all editions/spins opt in to the feature and participate in the soon-to-be-scheduled test day. This will help tweak the default configuration, hopefully establishing a good one-size-fits-all, out-of-the-box approach. If that's not possible, the test day will be central to figuring out edition/spin-specific defaults.
Feedback
Why not zswap?
Zswap is a similar idea, similar "z" affection, but with a totally different implementation. It is swap specific, uses a RAM cache, and requires an existing conventional swap partition. It might be true that certain workloads are better suited to zswap. But swap-on-ZRAM depends only on volatile storage, which is simpler and more secure, whereas zswap "spilling over" into the real swap on disk can leak user data if that swap device isn't encrypted. Zswap is certainly a valid future feature for a new generator, or possibly zram-generator could be extended to support zswap via the configuration file.
https://www.kernel.org/doc/Documentation/vm/zswap.txt
You're enabling it on upgrades?
That's the current plan. There are some difficulties with upgrades right now in Fedora. We need to use the weak dependency 'Supplements:' to cause new packages to be dragged in on upgrades. As a technical matter, the feature owner is confident this feature will improve the experience of all users regardless of configuration. As a non-technical matter, it's recognized that some will say (a) "hey pal, you're messing with my customizations, not cool!" and (b) "swap always stinks, I don't care if it has a 'Z' in the name!"
The dilemma is that the Fedora user base becomes fragmented if the change is not applied to upgrades. The overall experience people have is then less consistent, which makes feedback inconsistent too. All of this has to be balanced out.
Why systemd zram-generator?
It's the most upstream implementation to date, and it is fast and lightweight. It leverages existing systemd infrastructure to set up the ZRAM block device, format it as swap, and swapon, all during early boot. It's very similar in behavior to fstab-generator, gpt-auto-generator, and cryptsetup-generator.
Converging on one implementation avoids user confusion. And while the alternatives are nice and work fine, a systemd generator is particularly well suited for this use case compared to a systemd service unit.
https://www.freedesktop.org/software/systemd/man/systemd.generator.html
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/TCY534JPIMZ3OXM5Q5E2ZH5PSAKQNGP7/
Why not a bigger ZRAM device?
It's possible some workloads will have less compressible data. Hence, not going with a 1:1 ZRAM to RAM ratio. Even a 2:1 ratio is not unreasonable *if* the compression ratio is at least 2:1. However, it's possible a system can get "stuck" in a kind of swap thrashing similar to conventional swap-on-disk, except that it's CPU and memory bound rather than IO bound. The feature owner thinks it's better to just OOM, instead of getting overly aggressive with the ZRAM device size.
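To make the trade-off concrete, here is a rough back-of-the-envelope sketch of how much RAM a completely full ZRAM device would occupy at a given compression ratio. The helper name zram_footprint_mib is hypothetical, the ratio is passed in tenths to stay within integer arithmetic, and the ~0.1% driver overhead is ignored.

```shell
# Hypothetical helper: RAM (MiB) consumed by a completely full ZRAM device.
# $1 = device size in MiB
# $2 = compression ratio in tenths (20 means 2.0:1, 10 means incompressible)
zram_footprint_mib() {
    echo $(( $1 * 10 / $2 ))
}

zram_footprint_mib 4096 20     # 4GiB device (the proposed cap) at 2:1 -> prints 2048
zram_footprint_mib 4096 10     # same device, incompressible data      -> prints 4096
zram_footprint_mib 16384 20    # 2:1 ZRAM to RAM on an 8GiB system     -> prints 8192
```

In the last case a full device would occupy all 8GiB of RAM even at 2:1 compression, which is the kind of memory-bound thrashing this section wants to avoid.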
Benefit to Fedora
- significantly improves system responsiveness, especially when swap is under pressure;
- more secure: user data that leaks into swap stays on volatile media;
- complements on-going resource control work, including earlyoom;
- further reduces the time to out-of-memory kill, when workloads exceed limits;
- improves performance for both "no swap" and "existing swap" setups;
- without swap-on-disk, there's better utilization of a limited resource: benefit of swap without the disk space consumption;
Scope
- Proposal owners:
  - add zram-generator package to comps for the editions/spins opting in
  - means of per edition/spin configurations, if needed
  - coordinate a test day
- Other developers:
  - Anaconda is agreeable to deprecating their built-in implementation in favor of swap-on-ZRAM
  - RFEs for zram-generator: users are not worse off if they don't happen
    https://github.com/systemd/zram-generator/issues/10
    https://github.com/systemd/zram-generator/issues/8
- Release engineering: #9495
- Policies and guidelines: N/A
- Trademark approval: N/A
Upgrade/compatibility impact
If all editions/spins opt in, add Supplements:fedora-release to zram-generator to pull it in on upgrades.
Existing systems without swap will have swap-on-ZRAM enabled.
Existing systems with swap-on-disk will also have swap-on-ZRAM enabled (two swap devices), with higher priority for the ZRAM device. Existing swap-on-disk will not be removed.
How To Test
Any hardware. Any version of Fedora.
1. dnf install zram-generator
2. cp /usr/share/doc/zram-generator/zram-generator.conf.example /etc/systemd/zram-generator.conf
3. Edit the configuration
4. Reboot
5. Check that swap is on a ZRAM device: zramctl, swapon
6. Detailed check: journalctl -b -o short-monotonic | grep 'swap\|zram'
7. Check that priority is higher than existing swap if two or more are listed. (Enhancement is needed for this.)
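The priority check in the last step can be scripted. The following is a minimal sketch, assuming the usual /proc/swaps layout (a header line, then one line per swap device with the priority in the fifth column); the function name zram_has_top_priority is hypothetical.

```shell
# Hypothetical helper: reads /proc/swaps-formatted text on stdin and succeeds
# only if a /dev/zram* device is listed and its priority is higher than that
# of every non-ZRAM swap device.
zram_has_top_priority() {
    awk 'NR > 1 {
             prio = $5 + 0                       # force numeric comparison
             if ($1 ~ /^\/dev\/zram/) {
                 if (!zseen || prio > zmax) zmax = prio
                 zseen = 1
             } else {
                 if (!oseen || prio > omax) omax = prio
                 oseen = 1
             }
         }
         END { exit !(zseen && (!oseen || zmax > omax)) }'
}

if zram_has_top_priority < /proc/swaps; then
    echo "swap-on-ZRAM is present and preferred"
else
    echo "swap-on-ZRAM is missing or not the highest priority swap"
fi
```

Reading from stdin keeps the check easy to try against saved output from another machine, not just the running system.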
Feel free to run your usual workloads more aggressively or in parallel. Suspend-to-RAM and suspend-to-disk are expected to continue to work too (or at least hit all the same bugs as without ZRAM being used).
User Experience
The user won't notice anything. If their usual workload causes them to dread swap thrashing, they'll be surprised when it doesn't happen. The user might get curious if they don't find a swap entry in /etc/fstab. Or they might get curious if they run 'swapon' and see swap pointing to /dev/zram0 instead of a disk partition or LV.
Dependencies
N/A
Contingency Plan
- Contingency mechanism: Don't ship the generator = big hammer, but easy. Preferable: ship the generator, but ship configuration files only selectively = scalpel, pretty easy.
- Contingency deadline: Beta freeze
- Blocks release? No.
- Blocks product? No.
Documentation
Consider adding a hint in an /etc/fstab comment? There is no man page for this, and the documentation is also minimal, besides what's in this feature proposal. It's an open question how the user should get more information on how to configure and tweak it. But then, they don't have that for swap today either. There's just institutional knowledge.
Hence, a strong test day, with a lot of people and press coverage of the feature, might help spread the word for institutional knowledge changes coming.
Ideas welcome.
Release Notes
Pending feedback and test day.