From Fedora Project Wiki

Revision as of 17:02, 6 November 2017 by Zbyszek.

Status

Systemd provides various protection features which can be used to easily constrain system services, so that a compromised service cannot easily be used to extract user data or further compromise the machine.

Available protections as of systemd 235

Each item includes a subjective evaluation on three "axes":

  • how much work is it to enable? One of "easy", "medium", or "hard"
  • how likely is it to cause trouble? One of "safe", "medium", or "questionable"
  • is it a building block or a high-level setting that abstracts the details? One of "low-level" or "high-level".
PrivateTmp

Isolate the /tmp and /var/tmp directories of the service using mount namespaces. Other users cannot see the directories used by the service, and the service cannot see the directories outside of its namespace.

(easy, safe, high-level)
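As a minimal sketch, PrivateTmp can be enabled for an existing unit without editing the packaged file, through a drop-in. The unit name example.service and the drop-in name hardening.conf are hypothetical; `systemctl edit example.service` creates an equivalent override.conf in the same directory.

```ini
# /etc/systemd/system/example.service.d/hardening.conf
# (hypothetical unit and drop-in names)
[Service]
# Private /tmp and /var/tmp, not shared with other processes
PrivateTmp=yes
```

After `systemctl daemon-reload` and a restart of the service, files it creates in /tmp no longer appear directly in the system-wide /tmp.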

ReadOnlyPaths, InaccessiblePaths

These settings use mount namespaces to make the listed directories either read-only or completely inaccessible to the service.

(medium, medium, low-level)
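A sketch with hypothetical paths: the service's configuration is made read-only and a directory of secrets is hidden. A leading "-" tells systemd to ignore the entry if the path does not exist, instead of failing to start the service.

```ini
[Service]
# The service may read, but not modify, its configuration (hypothetical path)
ReadOnlyPaths=/etc/example
# The service cannot see into this directory at all (hypothetical path);
# the "-" prefix means a missing path is ignored rather than an error
InaccessiblePaths=-/var/lib/example-secrets
```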

ProtectHome and ProtectSystem

Specializations of ReadOnlyPaths and InaccessiblePaths: ProtectHome applies to /home (as well as /root and /run/user), ProtectSystem to /usr and some other directories. Different levels are possible: either making parts of the tree read-only, or hiding them completely.

(easy, medium, high-level)
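The levels can be summarised in a sketch like the following. Only one value of each setting takes effect, so the alternatives are shown as comments; picking "strict" and "yes" here is just an illustration.

```ini
[Service]
# ProtectSystem=yes    : /usr and /boot are read-only
# ProtectSystem=full   : additionally /etc is read-only
# ProtectSystem=strict : the whole file system is read-only,
#                        except /dev, /proc and /sys
ProtectSystem=strict
# ProtectHome=read-only : /home, /root and /run/user are read-only
# ProtectHome=yes       : they are inaccessible and appear empty
ProtectHome=yes
```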

ProtectKernelTunables

This uses mount namespaces to make parts of /sys and /proc read-only.

(easy, medium, high-level)

ProtectControlGroups

This uses mount namespaces to make /sys/fs/cgroup read-only.

(easy, safe, high-level)

ProtectKernelModules

This uses a combination of mount namespaces, capability removal, and seccomp to prevent explicit module load requests.

(easy, safe, high-level)

RestrictRealtime

This setting uses seccomp to limit access to realtime scheduling.

(easy, safe, high-level)

MemoryDenyWriteExecute

Uses seccomp to prevent the creation of memory mappings that are both writable and executable.

(easy, medium, low-level)

NoNewPrivileges

Disable elevation of privileges through executing files with setuid, setgid, or capability bits.

(easy, medium, low-level)

RestrictNamespaces

Uses seccomp to limit access to the namespace manipulation system calls (clone, unshare, setns), to make exploitation of kernel bugs harder.

(easy, safe, high-level)
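RestrictNamespaces=yes forbids every namespace type. The setting also accepts a list of namespace types (cgroup, ipc, net, mnt, pid, user, uts) that remain allowed, or, with a leading "~", a list of types to forbid. Here is a sketch for a hypothetical service that legitimately needs mount namespaces:

```ini
[Service]
# Allow only mount namespaces; creating or joining any other
# namespace type is refused
RestrictNamespaces=mnt
# Alternative (blacklist form): forbid only user and network namespaces
# RestrictNamespaces=~user net
```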

SystemCallArchitectures

Limit the allowed system call architectures to the specified list. In particular, SystemCallArchitectures=native, which permits only the machine's native system call ABI, is useful.

(easy, medium, high-level)

LockPersonality

Prevent changes to the execution domain (personality) of the process, to make exploitation of kernel bugs harder.

(easy, safe, high-level)

PrivateDevices

This uses mount namespaces to hide device nodes (except for pseudo-devices like /dev/null and /dev/full).

(easy, safe, high-level)

PrivateNetwork

This uses network namespaces to cut the service off from the network. Unfortunately, this also cuts it off from the host's AF_NETLINK and abstract AF_UNIX sockets, which are scoped to the network namespace.

(easy, hard, high-level)

IPAddressDeny

This uses eBPF filters to firewall the service from the network. Compared with PrivateNetwork, this is more explicit and applies only to IPv4 and IPv6 traffic.

(medium, medium, low-level)
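IPAddressDeny= is usually paired with IPAddressAllow=. When an address matches both, the allow list wins. A sketch that permits only loopback traffic, using the special values "any" and "localhost":

```ini
[Service]
# Drop all IPv4/IPv6 traffic...
IPAddressDeny=any
# ...except to and from loopback (127.0.0.0/8 and ::1),
# because IPAddressAllow= takes precedence over IPAddressDeny=
IPAddressAllow=localhost
```

Unlike PrivateNetwork=yes, this leaves AF_NETLINK and AF_UNIX sockets untouched, since the filter only sees IP packets.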

PrivateUsers

This runs the service in a user namespace. The service is unprivileged in the main namespace, and only sees root and its own user.

(medium, hard, high-level)

SystemCallFilter

Installs a seccomp filter that blacklists or whitelists system calls. This is a very powerful setting, but requires intimate knowledge of the service (including all the libraries it uses).

(hard, questionable, low-level)
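Predefined system call groups, whose names start with "@", make the setting more manageable. The following blacklist sketch uses an illustrative selection of groups. It is an assumption, not a recommendation for any particular service.

```ini
[Service]
# Blacklist: refuse all system calls in these groups
SystemCallFilter=~@clock @cpu-emulation @debug @module @mount @obsolete @raw-io
# Make forbidden calls fail with EPERM instead of killing the process
SystemCallErrorNumber=EPERM
```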

CapabilityBoundingSet/AmbientCapabilities/SecureBits

Trim the capability sets of the service.

(hard, questionable, low-level)
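A common pattern is sketched below for a hypothetical service. It runs as the unprivileged user "example" and needs only to bind to a port below 1024.

```ini
[Service]
# Hypothetical unprivileged user
User=example
# The service and its children can never acquire any other capability
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
# Pass this one capability to the non-root process at exec time
AmbientCapabilities=CAP_NET_BIND_SERVICE
```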

Proposed FESCo decision

If the service can run as a regular user instead of as root, it MUST use User= or implement a custom setuid mechanism.

Services MUST use PrivateTmp, ProtectHome, and ProtectSystem, unless the full file system is already hidden from them by some other means. If the service needs write access to only a limited set of paths, ReadWritePaths= SHOULD be used instead of forgoing ProtectHome and ProtectSystem entirely.
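A minimal sketch of a unit meeting the MUST requirements above. It assumes a hypothetical daemon that runs as the hypothetical user "example" and writes only to /var/lib/example.

```ini
[Service]
# Hypothetical daemon and user
ExecStart=/usr/bin/example-daemon
User=example
PrivateTmp=yes
ProtectHome=yes
# Everything is read-only...
ProtectSystem=strict
# ...except the one directory the service must write to
ReadWritePaths=/var/lib/example
```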

Services which run as root and use PrivateTmp, ProtectHome, and ProtectSystem SHOULD combine that with either CapabilityBoundingSet=~CAP_SYS_ADMIN or SystemCallFilter=~@mount, because a privileged process could otherwise undo these mount-based restrictions; see https://www.freedesktop.org/software/systemd/man/systemd.exec.html#ReadWritePaths=.
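A sketch for a service that has to stay root. The writable path is hypothetical. Either of the two alternatives keeps the service from remounting file systems and thereby removing the restrictions.

```ini
[Service]
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=strict
# Hypothetical writable directory
ReadWritePaths=/var/lib/example
# Option 1: drop CAP_SYS_ADMIN, which mounting requires
CapabilityBoundingSet=~CAP_SYS_ADMIN
# Option 2: forbid the mount-related system calls instead
# SystemCallFilter=~@mount
```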

Services SHOULD use SystemCallArchitectures=native, RestrictNamespaces=yes, LockPersonality=yes, and PrivateDevices=yes.

Services which run as root SHOULD use ProtectKernelTunables=yes, ProtectControlGroups=yes, ProtectKernelModules=yes, RestrictRealtime=yes, and RestrictNamespaces=yes.
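Collected into one [Service] section, the settings recommended above for a service running as root look roughly like this. It is a sketch: each line still has to be checked against what the service actually does.

```ini
[Service]
# Recommended for all services
SystemCallArchitectures=native
RestrictNamespaces=yes
LockPersonality=yes
PrivateDevices=yes
# Additionally recommended for services running as root
ProtectKernelTunables=yes
ProtectControlGroups=yes
ProtectKernelModules=yes
RestrictRealtime=yes
```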

Services which do not need network access, or only need it in a limited form, CAN run with PrivateNetwork=yes or some IPAddressDeny= filters. Those settings can be problematic (for example, they interfere with some NSS modules), so their use should be carefully considered.

In all cases, if any of these protection settings interferes with the operation of the service, it should obviously be skipped. This SHOULD be documented, either in the .service file or in the .spec file. For example, services which need to exchange information through /tmp are exempt from PrivateTmp. Similarly, services which need write access to /usr or /etc for system updates are exempt from ProtectSystem settings. Services which need access to user files for backup purposes are obviously exempt from ProtectHome, etc.
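One way to record an exemption is a comment next to the remaining settings in the .service file. The sketch below is for a hypothetical service that exchanges files with clients through /tmp. It uses ProtectSystem=full because "strict" would also make the shared /tmp read-only.

```ini
[Service]
# PrivateTmp= is deliberately not enabled: example-daemon (hypothetical)
# exchanges files with its clients through the shared /tmp.
ProtectHome=yes
# "full" rather than "strict", so that /tmp stays writable
ProtectSystem=full
```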

Statistics

The current status of all .service files is shown at https://in.waw.pl/~zbyszek/fedora/protections/protections.html.