Latest revision as of 18:04, 7 August 2020
Reserve resources for active users (Workstation)
Summary
This proposal adds cgroup based resource protections for the active graphical session. This is done by passing a memory protection of 250MiB to active users (capped at 10% of system memory) and by enabling other cgroup controllers (CPU, IO) to ensure important session processes get the resources they need.
See: https://pagure.io/fedora-workstation/issue/154
Owner
- Name: Benjamin Berg
- Email: bberg@redhat.com
- Product: Workstation
- Responsible WG: Workstation
Current status
- Targeted release: Fedora 33
- Last updated: 2020-08-07
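- FESCo issue: #2457 (https://pagure.io/fesco/issue/2457)
- Tracker bug: #1867216 (https://bugzilla.redhat.com/show_bug.cgi?id=1867216)
- Release notes tracker: #547 (https://pagure.io/fedora-docs/release-notes/issue/547)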
Detailed Description
Graphical sessions should always be responsive, even when the machine is doing a lot of work or, in the extreme case, has started to thrash. We started shipping EarlyOOM with F32; while it is a good solution for now, it ships with the understanding that it will be superseded by other approaches in the future.
With `uresourced`, a small daemon was written that enables protection of the graphical user session. It serves the following main purposes:
- Safely modifies existing GNOME systemd units to adhere more closely to https://systemd.io/DESKTOP_ENVIRONMENTS/ (until this is merged upstream).
- Enables the CPU and IO cgroup controllers for users to prevent e.g. fork bombs from taking over the system.
- Allocates a memory guarantee for any *active* user, which is distributed to core session processes.
Precautions are in place to not negatively affect systems:
- Active users will receive a protected memory allocation of 250MiB, capped at 10% of system memory, so low-memory systems should not be negatively impacted. Said differently, the memory subsystem treats the core session processes, in comparison to everything else, as if they were using 250MiB less than they actually are.
- `uresourced` detects whether the user session is using systemd to prevent passing memory guarantees to processes that are not important (e.g. not a GNOME session).
- Enabling the IO controller has no effect on Fedora currently.
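The "250MiB, capped at 10% of system memory" rule can be illustrated with a little arithmetic. The following is only a sketch of that rule as stated in this proposal, not code taken from `uresourced`; the function name `protected_bytes` is made up for illustration.

```shell
#!/bin/sh
# Illustrative only: reproduces the "250MiB, capped at 10% of system
# memory" rule from the text. This is NOT taken from uresourced itself.

# protected_bytes TOTAL_BYTES -> prints the protected amount in bytes
protected_bytes() {
    total=$1
    want=$((250 * 1024 * 1024))   # requested protection: 250MiB
    cap=$((total / 10))           # upper limit: 10% of system memory
    if [ "$cap" -lt "$want" ]; then
        echo "$cap"
    else
        echo "$want"
    fi
}

# Apply the rule to this machine; MemTotal in /proc/meminfo is in kB.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    protected_bytes $((total_kb * 1024))
fi
```

Under this rule the cap only takes effect on systems with less than 2500MiB of memory; anything larger gets the full 250MiB.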
NOTES:
- `uresourced` is designed to be obsoleted. Everything it does should be absorbed by other upstreams. However, it is a good and safe solution that eases development and permits shipping the benefits to users now.
- Enabling the cgroup controllers may slightly increase the scheduling overhead that the kernel imposes. I don't have numbers right now, but expect this to be <=1% of overall system CPU time.
Feedback
Benefit to Fedora
This change proposal will improve interactivity of graphical sessions in certain situations. It also is an important step on the path to reap the benefits of systemd and cgroups in workstation scenarios.
Scope
- Proposal owners:
  * Install `uresourced` on workstations by default
  * Add a preset to enable `uresourced` by default
- Other developers: no further changes are needed
- Release engineering: https://pagure.io/releng/issue/9592 (a check of an impact with Release Engineering is needed)
- Policies and guidelines: N/A (not needed)
- Trademark approval: N/A (not needed for this Change)
Upgrade/compatibility impact
No impact. The worst case scenario is that the feature will not be enabled.
How To Test
Testing this has multiple aspects. From the technical side, a test is as simple as:
- Install and enable `uresourced`
- Reboot (to make absolutely sure the user session has picked up all changes, logout may *not* be sufficient)
- Check values in `/sys/fs/cgroup/user.slice/memory.low`, `/sys/fs/cgroup/user.slice/user-1000.slice/memory.low`, `/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/memory.low` and `/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/session.slice/memory.low` (should usually be 250MiB with the default configuration).
- Verify that the allocation is zero if the user is not active on any seat (e.g. switch to GDM and log in via SSH or by doing a `sleep 10; cat ...` and coming back).
- Check enabled controllers in `/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/cgroup.controllers` (should be `cpu io memory pids`).
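The file checks in the steps above can be collected into a small helper. This is a minimal sketch: the function names and the optional root-directory argument are assumptions made for illustration (the argument exists only so the helper can be pointed at a test tree), and the uid defaults to 1000 as in the paths above.

```shell
#!/bin/sh
# Sketch: print memory.low for each cgroup level named in the test steps.
# Function names and the ROOT argument are illustrative assumptions.

# show_memory_low [ROOT] [UID]
show_memory_low() {
    root=${1:-/sys/fs/cgroup}
    uid=${2:-1000}
    base="$root/user.slice"
    for dir in \
        "$base" \
        "$base/user-$uid.slice" \
        "$base/user-$uid.slice/user@$uid.service" \
        "$base/user-$uid.slice/user@$uid.service/session.slice"
    do
        if [ -r "$dir/memory.low" ]; then
            # The kernel reports memory.low in bytes; 250MiB is 262144000.
            printf '%s %s\n' "$dir/memory.low" "$(cat "$dir/memory.low")"
        else
            printf '%s missing\n' "$dir/memory.low"
        fi
    done
}

# show_controllers [ROOT] [UID] -> controllers enabled for the user manager
show_controllers() {
    root=${1:-/sys/fs/cgroup}
    uid=${2:-1000}
    cat "$root/user.slice/user-$uid.slice/user@$uid.service/cgroup.controllers"
}

# Inspect the live hierarchy for the current user.
show_memory_low /sys/fs/cgroup "$(id -u)"
```

Running the same helper from an SSH login while the graphical session is switched away should, per the steps above, show the allocation dropping to zero.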
Beyond that, a test can be done to show that the cgroup kernel controllers are actually beneficial in various scenarios. Possible examples are:
- Running mprime (http://www.mersenne.org/ftp_root/gimps/p95v298b6.linux64.tar.gz); choose local stress test, repeat by selecting 15
  NOTE: mcatanzaro has reported a huge impact, with both the session remaining mostly responsive and EarlyOOM not kicking in (EarlyOOM not kicking in is odd, there might be other relevant factors to reproduce). The proposal owners have not been able to reproduce this corner case so far.
- Log in two users, A and B (same seat), and run `stress-ng -c NCPUS` in both. Switch between them and look at `top` to verify that the active user gets a 5 times higher CPU share overall.
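As a possible alternative to eyeballing `top` in the two-user test, the cumulative CPU time of each user's cgroup can be sampled from `cpu.stat` (cgroup v2 exposes a `usage_usec` field there). This is only a sketch: the function names are made up, and `user-1001.slice` for the second user is a hypothetical uid.

```shell
#!/bin/sh
# Sketch: compare CPU time consumed by two cgroups over an interval.
# Function names are illustrative; they are not part of uresourced.

# cpu_usage_usec CPU_STAT_FILE -> cumulative CPU time in microseconds
cpu_usage_usec() {
    awk '$1 == "usage_usec" {print $2}' "$1"
}

# cpu_share_ratio STAT_A STAT_B SECONDS
# Prints (CPU time used by A) / (CPU time used by B) over the interval,
# or "n/a" if B used no CPU time at all.
cpu_share_ratio() {
    a0=$(cpu_usage_usec "$1"); b0=$(cpu_usage_usec "$2")
    sleep "$3"
    a1=$(cpu_usage_usec "$1"); b1=$(cpu_usage_usec "$2")
    awk -v a=$(( ${a1:-0} - ${a0:-0} )) -v b=$(( ${b1:-0} - ${b0:-0} )) \
        'BEGIN { if (b > 0) printf "%.1f\n", a / b; else print "n/a" }'
}

# Example (uids are hypothetical; run while both users execute stress-ng):
#   cpu_share_ratio \
#       /sys/fs/cgroup/user.slice/user-1000.slice/cpu.stat \
#       /sys/fs/cgroup/user.slice/user-1001.slice/cpu.stat 10
```

With user 1000 active on the seat and both users saturating the CPUs, a result near 5 would match the expectation stated above.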
User Experience
See other sections.
Dependencies
There are no further dependencies.
Contingency Plan
- Contingency mechanism: Remove uresourced from the default install set and possibly also remove the preset again
- Contingency deadline: Final freeze
- Blocks release? No
- Blocks product? -
Documentation
Upstream is identical to the change owner. The upstream repository has a further README https://gitlab.freedesktop.org/benzea/uresourced (which should not contain any more information than what is here).