From Fedora Project Wiki
Revision as of 21:45, 21 December 2022

🔗 Shorter Shutdown Timer

This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.

🔗 Summary

A downstream configuration change to reduce the systemd unit timeout from 2 minutes to 15 seconds.

🔗 Owner

  • Name: catanzaro
  • Email: mcatanzaro at redhat dot com
  • Name: aday
  • Email: aday at redhat dot com

🔗 Current status

  • Targeted release: Fedora Linux 38
  • Last updated: 2022-12-21
  • FESCo issue: <will be assigned by the Wrangler>
  • Tracker bug: <will be assigned by the Wrangler>
  • Release notes tracker: <will be assigned by the Wrangler>

🔗 Detailed Description

Currently, a service that fails to stop at shutdown time can block shutdown for up to 2 minutes. This is extremely frustrating for our users: someone goes to shut down or reboot their system, and then unexpectedly has to wait a long time before they can do anything else.

The most common service to cause this issue is PackageKit, but there are others.

When a service fails to shut down when instructed to do so, it is not behaving properly, and it prevents the system from behaving in an orderly and predictable manner. Desktop APIs exist for cases where services or apps legitimately need to prevent shutdown, and these allow the shutdown inhibit to be communicated to admins and users, so they understand what is happening. When the user decides to shut down anyway, services must terminate in a timely manner. The Workstation Working Group feels that 15 seconds is the maximum appropriate time for both system and user services, and that Fedora should be robust against buggy and misbehaving services that do not shut down in an appropriate manner.
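
As a rough illustration of what terminating "in a timely manner" can look like, the following Python sketch shows a long-running service that reacts to SIGTERM by finishing its current unit of work, cleaning up, and returning. It is not taken from any Fedora service; the names `run_until_stopped`, `do_work`, and `cleanup` are hypothetical.

```python
import signal
import time


class StopFlag:
    """Plain boolean holder; setting it is safe inside a signal handler."""

    def __init__(self):
        self.requested = False


stop = StopFlag()


def handle_sigterm(signum, frame):
    """Signal handler: only record the request, do no slow work here."""
    stop.requested = True


def install_handler():
    """Register the SIGTERM handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_until_stopped(do_work, cleanup, poll_interval=0.5):
    """Call do_work() repeatedly until a stop is requested, then cleanup().

    do_work and cleanup are hypothetical callables supplied by the service.
    Each should take far less time than the stop timeout, so the process
    exits before systemd would resort to SIGKILL.
    """
    while not stop.requested:
        do_work()
        time.sleep(poll_interval)
    cleanup()
```

A service would call `install_handler()` once at startup and then `run_until_stopped(...)`. The key design point is that the stop request is noticed within one work item plus one polling interval, which should be kept well below 15 seconds.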

🔗 History

The Workstation Working Group has been working on this issue for several years. Investigations have revealed that it's not possible to fix every misbehaving service: in some cases the misbehaviour comes from design flaws that are difficult to resolve.

An attempt has also been made to have the unit timeout changed in upstream systemd. That attempt did not go anywhere, despite various efforts to move it along.

To our knowledge, no issues will result from forcing services to stop after 15 seconds on typical systems. However, system administrators may need to configure a higher timeout if a particular service needs longer to stop, which may be the case for database services, for example.

🔗 Feedback

The relevant Workstation Working Group ticket includes some discussion. This change was also previously proposed to FESCo.

🔗 Benefit to Fedora

The primary benefit of the change will be to mitigate a very annoying and, frankly, embarrassing bug. Our users shouldn't have to sit waiting for their machine to shut down for no apparent reason. It will also encourage the correct use of shutdown inhibit APIs.

Although this change will "paper over" bugs in services without fixing them, we emphasize that reducing the timeout is not merely a workaround for buggy services, but also the desired permanent design. Of course it is desirable to fix the underlying bugs as well, but it doesn't make sense to require this before fixing the service timeout to match our needs.

🔗 Scope

  • Proposal owners:
  • Other developers:
    • Test their packages with the new behavior and report issues as necessary.
  • Release engineering: #11193
  • Policies and guidelines: No policy or guideline changes required
  • Trademark approval: N/A (not needed for this Change)
  • Alignment with Objectives: N/A (not needed for this Change)

🔗 Upgrade/compatibility impact

By default, system and user services will be killed with SIGKILL 15 seconds after receiving SIGTERM. Previously, the timeout was 1 minute 30 seconds for most system and user services, and 2 minutes for the user manager system service (the system service that runs all user services for a user). Services will therefore have less time to shut down gracefully by default. These defaults are configurable, and system administrators who require longer timeouts will need to adjust them, either before or after upgrading. To do so, edit the DefaultTimeoutStopSec= setting in /etc/systemd/system.conf (for system services) and /etc/systemd/user.conf (for user services). You may also create a drop-in to change the TimeoutStopSec= setting for user@.service.
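
The settings named above can be sketched as configuration fragments. The Python snippet below only renders the text an administrator might place in drop-in files to restore the previous timeouts; it does not write anything to disk. It uses drop-in directories rather than editing system.conf and user.conf directly. The file name `90-stop-timeout.conf` and the function names are hypothetical, and the values `90s` and `2min` are simply the old defaults quoted in this section.

```python
from pathlib import PurePosixPath

# Drop-in directories read by systemd; the file name is a hypothetical choice.
DROPIN_NAME = "90-stop-timeout.conf"
SYSTEM_MANAGER_DIR = PurePosixPath("/etc/systemd/system.conf.d")
USER_MANAGER_DIR = PurePosixPath("/etc/systemd/user.conf.d")
USER_AT_SERVICE_DIR = PurePosixPath("/etc/systemd/system/user@.service.d")


def manager_dropin(timeout: str) -> str:
    """Fragment overriding the default stop timeout of a service manager."""
    return f"[Manager]\nDefaultTimeoutStopSec={timeout}\n"


def unit_dropin(timeout: str) -> str:
    """Fragment overriding the stop timeout of a single unit."""
    return f"[Service]\nTimeoutStopSec={timeout}\n"


def planned_dropins(default_timeout="90s", user_manager_timeout="2min"):
    """Map each drop-in path to the text that would be written there."""
    return {
        SYSTEM_MANAGER_DIR / DROPIN_NAME: manager_dropin(default_timeout),
        USER_MANAGER_DIR / DROPIN_NAME: manager_dropin(default_timeout),
        USER_AT_SERVICE_DIR / DROPIN_NAME: unit_dropin(user_manager_timeout),
    }


# Print each fragment under a comment naming its intended location.
for path, text in planned_dropins().items():
    print(f"# {path}\n{text}")
```

An individual service that needs more time, such as a database, can be given its own `TimeoutStopSec=` drop-in in the same way, which avoids raising the timeout for every other unit.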

🔗 How To Test

Given the intermittent and unpredictable nature of the bug that is being targeted, the best way to test is by using the upcoming Fedora release. Are shutdown delays eliminated as intended? Do system services experience issues as a result of the change?
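
Testers may also want to confirm which timeout is actually in effect on their system. The sketch below, which is an illustration rather than part of the proposal, queries the service manager with `systemctl show` and reads the `DefaultTimeoutStopUSec` property; the helper names are hypothetical, and the exact formatting of the returned value is left uninterpreted.

```python
import subprocess


def parse_show_output(output: str) -> dict:
    """Turn the `Key=Value` lines printed by `systemctl show` into a dict."""
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            props[key] = value
    return props


def manager_stop_timeout(user: bool = False) -> str:
    """Ask the system (or, with user=True, the user) manager for its default stop timeout."""
    cmd = ["systemctl"]
    if user:
        cmd.append("--user")
    cmd += ["show", "--property=DefaultTimeoutStopUSec"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_show_output(result.stdout).get("DefaultTimeoutStopUSec", "")
```

Calling `manager_stop_timeout()` and `manager_stop_timeout(user=True)` on a test system shows whether the new default has been picked up by both managers.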

🔗 User Experience

This change will make the Fedora user experience less annoying. It will also encourage the use of the existing inhibit APIs, which provide better feedback for users when system shutdown does need to be delayed.

🔗 Dependencies

No specific changes are required in other packages. However, service developers may want to take this opportunity to examine the shutdown behavior of their components.

🔗 Contingency Plan

  • Contingency mechanism: the change owners will revert the change in systemd.
  • Contingency deadline: if we back out the change, it would be best to do so before the beta freeze, but the revert can happen at any point.
  • Blocks release? No.

🔗 Documentation

Documentation isn't required for this minor configuration change. Services that legitimately need to prevent system shutdown should use systemd inhibit. Desktop applications can use the XDG inhibit portal.
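
For services, one way to take an inhibitor lock is to wrap the critical command in the `systemd-inhibit` tool. The following Python sketch builds and runs such a command line; the helper names and the example values for `who` and `why` are hypothetical, and real services may prefer to take the lock over D-Bus instead.

```python
import subprocess


def inhibit_command(command, who, why, what="shutdown", mode="block"):
    """Build an argv that runs `command` while holding an inhibitor lock."""
    return [
        "systemd-inhibit",
        f"--what={what}",
        f"--who={who}",
        f"--why={why}",
        f"--mode={mode}",
        *command,
    ]


def run_inhibited(command, who, why):
    """Run `command` under the lock; the lock is released when it exits."""
    return subprocess.run(inhibit_command(command, who, why), check=True)
```

For example, `run_inhibited(["/usr/bin/example-backup"], who="Example Backup", why="Backup in progress")` would hold the lock only for the duration of the (hypothetical) backup command, and the stated reason is what can be shown to users and admins.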

🔗 Release Notes