From Fedora Project Wiki

Revision as of 16:18, 8 July 2020 by Bcotton (talk | contribs) (Submitted to FESCo)

Enable EarlyOOM on Fedora KDE

Summary

As Fedora Workstation did in F32, install the earlyoom package and enable it by default. If both RAM and swap go below 10% free, earlyoom issues SIGTERM to the process with the largest oom_score. If both RAM and swap go below 5% free, earlyoom issues SIGKILL to the process with the largest oom_score. The idea is to recover from out-of-memory situations sooner, rather than suffer the typical complete system hang in which the user has no choice but to force power off.

Owner

Current status

  • FESCo issue: #2435
  • Tracker bug: <will be assigned by the Wrangler>
  • Release notes tracker: <will be assigned by the Wrangler>

Detailed Description

Shamelessly copied from Workstation, which did it in the last release:

Certain workloads have heavy memory demands, quickly consume all of RAM, and start to page out heavily to swap. (Heavy paging is often called "swap thrashing" for added descriptive effect, probably because it's noticeable and annoying.) Incidental swap usage is a good thing: it frees up memory for active pages used by a process. Heavy swap usage quickly leads to a very negative UX because it's slow, even on modern SSDs. Due to installer defaults, the swap partition is made the same size as available memory (at install time), which can be huge. This just extends swap thrashing time.

On the one hand, we want this resource-hungry job to complete. On the other hand, we want our system to be responsive while that other work is going on. But once the GUI stutters or even comes to an apparent standstill (hang), we're really wishing the kernel oom-killer would kick in and free up memory, so we can start over (maybe using memory or thread limiting options, which arguably should be figured out more intelligently; that too is a work in progress but beyond the scope of this feature).

However, once in a heavy swap scenario, it's relatively common for the system to get stuck in it: GUI interactivity is terrible to non-existent, and the kernel oom-killer doesn't trigger. From a certain point of view, this is working as intended. The kernel oom-killer is concerned with keeping the kernel running. It's not at all concerned with user space responsiveness.

Instead of the system becoming completely unresponsive for tens of minutes, hours or days, this feature expects that an offending process (determined by oom_score, same as the kernel oom-killer) will be killed off within seconds or a few minutes.
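
To make the oom_score ranking mentioned above concrete, here is a minimal Python sketch that lists processes by oom_score, highest first. It only reads the standard Linux /proc/<pid>/oom_score and /proc/<pid>/comm files. The function names and the proc_root parameter are hypothetical, introduced for illustration; this is not earlyoom's actual implementation, which may apply further adjustments before choosing a process.

```python
import os


def read_oom_scores(proc_root="/proc"):
    """Return (oom_score, pid, name) tuples, highest score first.

    proc_root is a hypothetical parameter so the function can be pointed
    at a fake directory tree for testing.
    """
    scores = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "oom_score")) as f:
                score = int(f.read().strip())
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # the process exited or its files are not readable
        scores.append((score, int(entry), name))
    return sorted(scores, reverse=True)


def likely_victim(scores):
    """Return the entry with the largest oom_score, or None if the list is empty."""
    return scores[0] if scores else None
```

Calling likely_victim(read_oom_scores()) on a running system gives a rough idea of which process would be targeted first under memory pressure.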

Feedback

Why not all desktops?

They're welcome to join in.

This will kill my applications

The service is easy enough for administrators to tune or disable, so that should not prevent making this the default. Workstation has used it for a release without any apparent trouble.

Benefit to Fedora

KDE users will be able to take advantage of the benefits Workstation users got from enabling earlyoom in Fedora 32:

  • Improved user experience: users regain control of the system more quickly, rather than having to force power off in low-memory situations where there's aggressive swapping. Once a system becomes unresponsive, it's completely reasonable for the user to assume the system is lost, but forcing power off carries a high potential for data loss.
  • Reducing forced poweroff as the main workaround will increase data collection, improving understanding of low-memory situations and how to handle them better.
  • earlyoom first sends SIGTERM to the chosen process, so it has a chance to shut down properly, unlike with the kernel's oom-killer.

Scope

# enable earlyoom by default on KDE
enable earlyoom.service
  • Other developers: None, unless KDE-based Spins/Labs want to opt out
  • Release engineering: N/A
  • Policies and guidelines: N/A
  • Trademark approval: N/A

Upgrade/compatibility impact

earlyoom.service will be enabled on upgrade. An upgraded system should exhibit the same behaviors as a newly-installed system.

How To Test

  • Fedora 31/32 KDE users can test today:
    • sudo dnf install earlyoom
    • sudo systemctl enable --now earlyoom

And then attempt to cause an out of memory situation. Examples:

User Experience

earlyoom sends SIGTERM to the process with the largest oom_score when both memory and swap have less than 10% free, and SIGKILL when both have less than 5% free.
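
The two-threshold rule above can be sketched in a few lines of Python. This is an illustration of the decision logic only, not earlyoom's code. The function names are hypothetical, and two details are assumptions: that "free" RAM is taken from the MemAvailable field of /proc/meminfo, and that a system without swap is treated as having 0% swap free, so that RAM alone decides.

```python
from typing import Optional, Tuple

TERM_THRESHOLD = 10.0  # percent free, from the proposal text
KILL_THRESHOLD = 5.0   # percent free, from the proposal text


def free_percentages(meminfo_text: str) -> Tuple[float, float]:
    """Parse /proc/meminfo-style text into (RAM % free, swap % free)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    # Assumption: MemAvailable is the measure of "free" RAM.
    mem_pct = 100.0 * fields["MemAvailable"] / fields["MemTotal"]
    swap_total = fields.get("SwapTotal", 0)
    # Assumption: with no swap configured, count swap as 0% free.
    swap_pct = 100.0 * fields.get("SwapFree", 0) / swap_total if swap_total else 0.0
    return mem_pct, swap_pct


def choose_signal(mem_free_pct: float, swap_free_pct: float) -> Optional[str]:
    """Return the signal to send, or None when no action is needed."""
    if mem_free_pct < KILL_THRESHOLD and swap_free_pct < KILL_THRESHOLD:
        return "SIGKILL"
    if mem_free_pct < TERM_THRESHOLD and swap_free_pct < TERM_THRESHOLD:
        return "SIGTERM"
    return None
```

Note that both conditions must hold at once: a system with 3% RAM free but 50% swap free triggers nothing under this rule.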

Dependencies

None

Contingency Plan

  • Contingency mechanism: Owner reverts changes
  • Contingency deadline: Final freeze
  • Blocks release? No

Documentation

Release Notes

The earlyoom service is now enabled by default in Fedora KDE.

The earlyoom service monitors system memory usage. If free memory falls below a set limit, earlyoom terminates an appropriate process to free up memory. As a result, the system does not become unresponsive for long periods of time in low-memory situations.

The following is the default earlyoom configuration:

  • If both RAM and swap go below 10% free, earlyoom sends the SIGTERM signal to the process with the largest oom_score.
  • If both RAM and swap go below 5% free, earlyoom sends the SIGKILL signal to the process with the largest oom_score.

For more information, see the earlyoom man page.