Revision as of 17:42, 4 January 2020

Enable EarlyOOM

Summary

Install the earlyoom package and enable it by default. If free RAM and free swap both fall below 10%, earlyoom sends SIGTERM to the process with the largest oom_score. If both fall below 5%, it sends SIGKILL to the process with the largest oom_score. The idea is to recover from out-of-memory situations sooner, instead of the typical complete system hang that leaves the user no choice but to force power off.

Owner

Name: Chris Murphy
Email: bugzilla@colorremedies.com

Current status

Targeted release: Fedora 32
Last updated: 2020-01-04
Tracker bug: <will be assigned by the Wrangler>
Release notes tracker: <will be assigned by the Wrangler>

Detailed Description

The Workstation working group has discussed "better interactivity in low-memory situations" for some months:

https://pagure.io/fedora-workstation/issue/98
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/XUZLHJ5O32OX24LG44R7UZ2TMN6NY47N/

Certain workloads have heavy memory demands, quickly consume all of RAM, and start to page out heavily to swap. (Heavy paging is often called "swap thrashing" for added descriptive effect, probably because it's noticeable and annoying.) Incidental swap usage is a good thing: it frees up memory for the active pages used by a process. Heavy swap usage quickly leads to a very negative UX, because it's slow, even on modern SSDs. Due to installer defaults, the swap partition is made the same size as available memory (at install time), which can be huge. This just extends the swap thrashing time.

On the one hand, we want this resource-hungry job to complete. On the other hand, we want the system to stay responsive while that work is going on. But once the GUI stutters or comes to an apparent standstill (hang), we really wish the kernel oom-killer would kick in and free up memory so we can start over, perhaps using memory or thread limiting options. (Those limits arguably should be figured out more intelligently; that too is a work in progress, but beyond the scope of this feature.)

However, once in a heavy swap scenario, it's relatively common for the system to get stuck there: GUI interactivity is terrible to non-existent, and the kernel oom-killer still doesn't trigger. From a certain point of view, this is working as intended. The kernel oom-killer is concerned with keeping the kernel running. It's not at all concerned with user space responsiveness.

Instead of the system becoming completely unresponsive for tens of minutes, hours or days, this feature expects that an offending process (determined by oom_score, same as the kernel oom-killer) will be killed off within seconds or a few minutes.

This is an incremental improvement in user experience, but admittedly still suboptimal. There is additional ongoing work to improve the user experience further.

Workstation working group discussion specific to enabling earlyoom by default: https://pagure.io/fedora-workstation/issue/119

Other in-progress solutions:
https://gitlab.freedesktop.org/hadess/low-memory-monitor

Background information on this complicated problem:
https://www.kernel.org/doc/gorman/html/understand/understand014.html
https://www.kernel.org/doc/gorman/html/understand/understand016.html
https://lwn.net/Articles/317814/

Feature concerns:
Suboptimal behavior if the system has no swap (this is not a default setup, but still needs improvement): https://pagure.io/fedora-workstation/issue/119#comment-618480

Benefit to Fedora

There are two major benefits to Fedora:

Improved user experience: the user regains control of the system more quickly, rather than having to force power off in low-memory situations where there's aggressive swapping. Once a system becomes unresponsive, it's completely reasonable for the user to assume the system is lost, but forcing power off carries a high potential for data loss.

More data: reducing forced poweroff as the main workaround will increase data collection, improving understanding of low-memory situations and how to handle them better.

Scope

Proposal owners:

a. Modify https://pagure.io/fedora-comps/blob/master/f/comps-f32.xml.in to include earlyoom package for Workstation.
b. Modify https://src.fedoraproject.org/rpms/fedora-release/blob/master/f/80-workstation.preset to include:

# enable earlyoom by default on workstation
enable earlyoom.service

Other developers:

Restricted to Workstation edition, unless other editions/spins want to opt-in.

Release engineering: #9141 (a check of an impact with Release Engineering is needed)

Policies and guidelines: N/A
Trademark approval: N/A

Upgrade/compatibility impact

earlyoom.service will be enabled on upgrade. An upgraded system should exhibit the same behavior as a cleanly installed system.

How To Test

Fedora 30/31 users can test today, any edition or spin:

sudo dnf install earlyoom
sudo systemctl enable --now earlyoom

And then attempt to cause an out of memory situation. Examples:
tail /dev/zero
https://lkml.org/lkml/2019/8/4/15

Fedora Workstation 32 (and Rawhide) users will see that this service is already enabled. It can be toggled with sudo systemctl start earlyoom or sudo systemctl stop earlyoom, where start means earlyoom is running and stop means earlyoom is not running.
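
While running such a test, it can help to watch how close the system is to the thresholds. The sketch below is not part of earlyoom. It assumes that MemAvailable and SwapFree from /proc/meminfo are the relevant figures, and the function name free_percentages is made up for illustration. A system without swap is reported as swap=0% here, which is a choice of this sketch and not a statement about earlyoom.

```shell
# Hypothetical helper: print free RAM and free swap as whole percentages.
# Usage: free_percentages [MEMINFO_FILE]   (defaults to /proc/meminfo)
free_percentages() (
    file="${1:-/proc/meminfo}"
    awk '
        $1 == "MemTotal:"     { mem_total  = $2 }
        $1 == "MemAvailable:" { mem_avail  = $2 }
        $1 == "SwapTotal:"    { swap_total = $2 }
        $1 == "SwapFree:"     { swap_free  = $2 }
        END {
            # Guard against division by zero (for example, no swap configured).
            mem_pct  = (mem_total  > 0) ? int(100 * mem_avail / mem_total)  : 0
            swap_pct = (swap_total > 0) ? int(100 * swap_free / swap_total) : 0
            printf "mem=%d%% swap=%d%%\n", mem_pct, swap_pct
        }
    ' "$file"
)
```

Running watch with a small script that calls free_percentages in a second terminal gives a rough live view while the test workload runs.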

User Experience

The most egregious instances this change is trying to mitigate:

a. RAM is completely used
b. Swap is completely used
c. System becomes unresponsive to the user as swap thrashing has ensued
--> earlyoom disabled: the user often gives up and forces power off (in cmurf's limited testing, this condition lasts >30 minutes with no kernel-triggered oom-killer and no recovery)
--> earlyoom enabled: the system likely still becomes unresponsive, but a process is killed in much less time (seconds or a few minutes in cmurf's testing, once less than 10% of both RAM and swap remains)

earlyoom starts sending SIGTERM once both memory and swap are below their respective PERCENT setting, default 10%. It sends SIGKILL once both are below their respective KILL_PERCENT setting, default 5%.
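
The two-threshold rule above can be sketched as a small shell function. This is an illustration only, not earlyoom's own code: the function name earlyoom_action is made up, the inputs are whole-number percentages, and "below" is taken to mean strictly less than, which may not match how the real tool treats the exact boundary.

```shell
# Hypothetical helper: map free-RAM and free-swap percentages to the signal
# described above.
# Usage: earlyoom_action MEM_FREE_PCT SWAP_FREE_PCT [PERCENT] [KILL_PERCENT]
earlyoom_action() (
    mem="$1"
    swap="$2"
    term_pct="${3:-10}"   # PERCENT default from the text above
    kill_pct="${4:-5}"    # KILL_PERCENT default from the text above

    # Both RAM and swap must be under a threshold for it to apply.
    if [ "$mem" -lt "$kill_pct" ] && [ "$swap" -lt "$kill_pct" ]; then
        echo SIGKILL
    elif [ "$mem" -lt "$term_pct" ] && [ "$swap" -lt "$term_pct" ]; then
        echo SIGTERM
    else
        echo none
    fi
)
```

For example, earlyoom_action 8 50 prints none: RAM is low but swap is not, so nothing is signalled.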

What exactly gets killed? This is complicated and hard to know in advance, whether earlyoom is running or not. We probably want the process consuming the most resources to be killed, but that isn't necessarily the one with the highest oom_score. In cmurf's testing (both with and without earlyoom running), often one seemingly unrelated but important process gets killed off before the one mostly causing the problem. He's seen oom-killer take out sshd, systemd-journald, sssd-nss, GNOME Maps, and Text Editor before the wayward process. Why? Good question. Does this suggest the initial oom_score for processes is suboptimally set? Are there really old bugs long ignored because we're all just giving up, hitting the power button, and starting over? All of the above? Are these bugs? It's decently likely there's a bug any time you're experiencing unexpected behavior.
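
One way to get a feel for this before anything goes wrong is to look at the current scores. The sketch below reads the oom_score and comm files that the kernel exposes under /proc for each process and prints the highest scores first. The function name top_oom_scores and its optional arguments are made up for illustration. It only reports scores and does not predict which process earlyoom or the kernel will actually pick.

```shell
# Hypothetical helper: list processes by oom_score, highest first.
# Usage: top_oom_scores [PROC_ROOT] [COUNT]   (defaults: /proc, 5)
# Output columns: oom_score, PID, command name
top_oom_scores() (
    root="${1:-/proc}"
    count="${2:-5}"
    for dir in "$root"/[0-9]*; do
        # Processes can exit mid-scan, so skip entries that cannot be read.
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        name=$(cat "$dir/comm" 2>/dev/null) || name="?"
        printf '%s %s %s\n' "$score" "${dir##*/}" "$name"
    done | sort -k1,1nr | head -n "$count"
)
```

Calling top_oom_scores with no arguments lists the five highest-scoring processes on the running system.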

The package includes a configuration file, /etc/default/earlyoom, which sets the option -r 60, causing a memory report to be entered into the journal every minute.
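
As a sketch of what a customized file might look like, the fragment below keeps the 60-second report and spells out the default thresholds explicitly. It assumes the file passes options through an EARLYOOM_ARGS variable and that -m and -s accept PERCENT,KILL_PERCENT pairs for memory and swap respectively. Check both against man earlyoom and the installed file for the packaged version before relying on it.

```shell
# /etc/default/earlyoom (sketch; variable name and -m/-s syntax are assumptions)
# -r 60   : memory report to the journal every 60 seconds (from the packaged file)
# -m 10,5 : memory PERCENT 10, KILL_PERCENT 5
# -s 10,5 : swap PERCENT 10, KILL_PERCENT 5
EARLYOOM_ARGS="-r 60 -m 10,5 -s 10,5"
```

After editing, sudo systemctl restart earlyoom applies the change, and journalctl -u earlyoom shows the reports.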

Dependencies

earlyoom package has no dependencies

Contingency Plan

Contingency mechanism: Owner will revert all changes
Contingency deadline: Final freeze
Blocks release? No
Blocks product? No

Documentation

man earlyoom

https://www.kernel.org/doc/gorman/html/understand/understand016.html

Release Notes

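The earlyoom service is enabled by default. When free RAM and free swap both run low, it sends SIGTERM, and then SIGKILL, to the process with the largest oom_score, so the system recovers from out-of-memory situations sooner. To revert to the previous behavior: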
sudo systemctl disable earlyoom.service

To customize, see man earlyoom.
