Features/UsrMove

= Move all to /usr =

Summary
Refer to http://www.freedesktop.org/wiki/Software/systemd/TheCaseForTheUsrMerge first.

Provide a simple way to mount almost the entire installed operating system read-only, snapshot it atomically, or share it between multiple hosts to save maintenance effort and space. Instead of spreading RPM package content all over the filesystem and artificially separating /bin from /usr/bin and /lib from /usr/lib, move all content to /usr and provide only symlinks in the root filesystem.

/usr on its own filesystem provides a lot of valuable options in custom setups. For historical reasons, more and more tools have been split off from /usr and put in /. But advanced features in today's systems cannot really boot with an empty /usr anymore, and more and more things fail in subtle ways in such setups.

Instead of moving more tools to /, we already require /usr to be mounted from inside the initramfs, so that it is available before the real 'init' starts. The split of the root filesystem and /usr serves no purpose in Linux anymore and only complicates or prevents simple and more flexible setups.

Owner

 * Name: Harald Hoyer
 * Email: harald@redhat.com
 * Name: Kay Sievers
 * Email: kay@redhat.com

Current status

 * Targeted release: Fedora 17
 * Last updated: 2012-02-07
 * Percentage of completion: 100%

Detailed Description
There is no way to reliably bring up a modern system with an empty /usr. There are two alternatives to fix this: copy /usr back to the rootfs, or use an initramfs that can hide the split from the system.

Historically, the purpose of /bin, /sbin and /lib was to contain the utilities needed to mount /usr. This role can now be taken by the initramfs. Because the initramfs knows where to find the root partition (which includes /etc), it can parse /etc/fstab and other configuration files and mount /usr before it finally switches to the root partition and executes /sbin/init. From this point on, init mounts the remaining partitions listed in /etc/fstab and the system starts as usual.
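The fstab lookup described above can be sketched in a few lines of shell. This is only an illustration, not the code dracut actually uses: the function name usr_fstab_entry and the scratch file standing in for the real root's /etc/fstab are assumptions made for the example.

```shell
#!/bin/sh
# Print "device fstype options" for the /usr entry of an fstab-style file.
# Comment lines and blank lines are skipped; only the first match is printed.
# (usr_fstab_entry is a hypothetical helper name, not part of dracut.)
usr_fstab_entry() {
    awk '$1 !~ /^#/ && NF >= 4 && $2 == "/usr" { print $1, $3, $4; exit }' "$1"
}

# Demo on a scratch file that stands in for the real root's /etc/fstab.
demo_fstab=$(mktemp)
cat > "$demo_fstab" <<'EOF'
# device     mountpoint  fstype  options     dump pass
/dev/sda1    /           ext4    defaults    1 1
/dev/sda2    /usr        ext4    ro,noatime  1 2
EOF
usr_fstab_entry "$demo_fstab"
rm -f "$demo_fstab"
```

An initramfs would feed the printed fields to mount before switching root; if the file has no /usr entry, nothing is printed and /usr is simply part of the root filesystem.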

The long-term plan is to clean up the mess and confusion the current split of / vs. /usr has created. All tools will move back to /usr where they belong, and the rootfs will only contain compat symlinks into /usr. Almost the entire system installed by packages will reside in /usr. This moves all non-host-specific data into /usr. /usr can then be seen as the Unix System Resources partition (/System), which defines the base operating system (e.g. F18 or RHEL-7).

This new /usr could be mounted read-only by default, while the rootfs is read-write and contains only empty mount points, compat symlinks to /usr and host-specific data like /etc, /root and /srv. Compared to today's setups, the rootfs will be very small. The new /usr could also easily be shared read-only across several systems, and it would contain almost the entire system. Such setups are more efficient, can optionally provide a lot more security, are more flexible, provide saner options for custom setups, and are much simpler to set up and maintain.

This leaves us with the following well-defined directories, which compose the base of the system:


 * /usr - installed system; shareable; possibly read-only
 * /etc - config data; non-shareable
 * /var - persistent data; non-shareable
 * /run - volatile data; non-shareable; mandatory tmpfs filesystem

/
|-- etc
|-- usr
|   |-- bin
|   |-- sbin
|   |-- lib
|   `-- lib64
|-- run
|-- var
|-- bin -> usr/bin
|-- sbin -> usr/sbin
|-- lib -> usr/lib
`-- lib64 -> usr/lib64
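The layout above can be verified with a small shell check. The following is a minimal sketch under stated assumptions: check_usrmove_layout is a hypothetical helper name, and the demo runs against a scratch directory rather than the live root.

```shell
#!/bin/sh
# Report whether the toplevel compat symlinks under a given root match the
# merged layout. Returns 0 if all expected links are present, 1 otherwise.
# (check_usrmove_layout is a hypothetical helper, not a Fedora tool.)
check_usrmove_layout() {
    root=$1
    status=0
    for d in bin sbin lib lib64; do
        # lib64 only exists on 64-bit architectures; skip it when absent.
        if [ "$d" = lib64 ] && [ ! -e "$root/usr/lib64" ] && [ ! -L "$root/lib64" ]; then
            continue
        fi
        if [ -L "$root/$d" ] && [ "$(readlink "$root/$d")" = "usr/$d" ]; then
            echo "ok: /$d -> usr/$d"
        else
            echo "not merged: /$d"
            status=1
        fi
    done
    return $status
}

# Demo on a scratch tree that mimics a freshly installed 32-bit system.
demo=$(mktemp -d)
mkdir -p "$demo/usr/bin" "$demo/usr/sbin" "$demo/usr/lib"
ln -s usr/bin "$demo/bin"
ln -s usr/sbin "$demo/sbin"
ln -s usr/lib "$demo/lib"
check_usrmove_layout "$demo"
rm -rf "$demo"
```

Calling the function with an empty string as the root checks the running system instead of the scratch tree.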

Benefit to Fedora

 * Simpler and cleaner overall file system layout, with full compatibility.
 * Clear separation of operating system and host specific resources.
 * Improved compatibility with other Unixes/Linux: no confusion about tool install locations, no $PATH fiddling, and all possible paths to a binary will always work. All binaries will be available in both /usr and /, thus minimizing compatibility problems.
 * Improved compatibility with build systems such as GNU autotools, which have never been aware of the /usr split in the first place
 * Minimize differences from other Unixes, such as Solaris, which already made the same move
 * Isolate the vendor-supplied, mostly read-only operating system resources from the rest, thus allowing snapshotting of the OS and easy, lightweight container OS duplication

Scope

 * The ability to share /usr is especially useful for clusters and virtual machines.
 * The ability to mount /usr read-only (e.g. on read-only media) can add to the security of the machine.
 * The entire /usr can safely be snapshotted during upgrades.

How To Test
 * instructions on how to test via yum update: https://fedoraproject.org/wiki/Upgrading_Fedora_using_yum#Fedora_16_-.3E_Fedora_17
 * install a fresh F17 with the usrmove feature included (not yet available)

-> see symbolic toplevel links: /lib -> usr/lib /lib64 -> usr/lib64 /sbin -> usr/sbin /bin -> usr/bin

Ensure that basic shell operations work.
 * 1) /sbin/ifconfig | /bin/grep -i ip
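Because every binary should be reachable through both its old and its new path, one further check is to confirm that both paths resolve to the same file. The sketch below assumes a readlink that supports -f (as in GNU coreutils); same_target is a hypothetical helper name, and the demo uses a scratch tree with an empty placeholder file instead of a real binary.

```shell
#!/bin/sh
# Succeed if both paths exist and resolve to the same file once all
# symlinks are followed. (same_target is a hypothetical helper name.)
same_target() {
    [ -e "$1" ] && [ -e "$2" ] &&
        [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}

# Demo: an empty placeholder "grep" reached via /bin and via /usr/bin.
demo=$(mktemp -d)
mkdir -p "$demo/usr/bin"
ln -s usr/bin "$demo/bin"
: > "$demo/usr/bin/grep"
if same_target "$demo/bin/grep" "$demo/usr/bin/grep"; then
    echo "old and new path name the same file"
fi
rm -rf "$demo"
```

On an installed system the same function can be pointed at real pairs such as /bin/grep and /usr/bin/grep.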

User Experience

 * fewer toplevel directories

Dependencies

 * initramfs (dracut)
 * changes in selinux policies
 * repackaging of packages with content in /bin, /sbin, /lib*
 * alternatives symlinks?
 * filesystem rpm, toplevel symlinks
 * anaconda update support https://bugzilla.redhat.com/show_bug.cgi?id=787893

Roadmap

 * Provide a dracut module to move all content from /bin, /sbin, /lib, /lib64 to /usr.
 * Add a check to rpm+filesystem.rpm version 3 that makes it refuse to install when any of /bin, /sbin, /lib, /lib64 is a directory. On new installations, create the symlinks /bin -> usr/bin, /sbin -> usr/sbin, /lib -> usr/lib, /lib64 -> usr/lib64
 * Change the ~32 RPM packages which install conflicting files in /bin, /sbin, /lib, /lib64 to install only into /usr, and add Conflicts: filesystem < 3.
 * Change the SELinux policies.
 * Make sure dracut is able to mount the needed filesystems specified in /etc/fstab before starting systemd.
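The first roadmap item, moving directory content into /usr and leaving a symlink behind, can be illustrated with a deliberately simplified sketch. This is not the dracut module: merge_into_usr is a hypothetical function, it stops at the first name conflict instead of resolving it, and it must only be run against a scratch or offline root, never against the running system.

```shell
#!/bin/sh
# Merge one toplevel directory (e.g. bin) of an offline root into usr/ and
# leave a relative compat symlink behind. Illustrative only; a real
# conversion also has to resolve files that exist in both places.
merge_into_usr() {
    root=$1
    dir=$2
    # Already converted, or directory absent (e.g. lib64 on 32-bit): done.
    [ -L "$root/$dir" ] && return 0
    [ -d "$root/$dir" ] || return 0
    mkdir -p "$root/usr/$dir"
    for f in "$root/$dir"/* "$root/$dir"/.[!.]*; do
        # Skip glob patterns that matched nothing.
        [ -e "$f" ] || [ -L "$f" ] || continue
        name=${f##*/}
        if [ -e "$root/usr/$dir/$name" ] || [ -L "$root/usr/$dir/$name" ]; then
            echo "conflict: $dir/$name exists in both places" >&2
            return 1
        fi
        mv "$f" "$root/usr/$dir/$name" || return 1
    done
    rmdir "$root/$dir" || return 1
    ln -s "usr/$dir" "$root/$dir"
}

# Demo on a scratch root with one empty placeholder file in bin.
demo=$(mktemp -d)
mkdir -p "$demo/bin" "$demo/usr/bin"
: > "$demo/bin/sh"
merge_into_usr "$demo" bin && readlink "$demo/bin"
rm -rf "$demo"
```

Stopping on conflicts keeps the sketch safe to experiment with; the symlink is relative (usr/bin, not /usr/bin) so that it stays valid when the root is mounted somewhere else, such as inside the initramfs.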

Mock Transition

 * Disable mock root cache in buildsystem
 * Step 0: update rpm on the mock system to at least:
   * RHEL-6: 4.8.0-19.el6.0.usrmove.1 (Repository until RHEL-6.3 is out)
   * F-15: rpm-4.9.1.2-3.fc15.3
   * F-16: rpm-4.9.1.2-4.fc16.1
   * F-17: rpm-4.9.1.2-8.fc17

acl attr db4 findutils gawk gettext gzip nss-softokn policycoreutils libdb coreutils filesystem util-linux bash udev systemd alsa-utils davfs2 ethtool fuse iputils isdn4k-utils libselinux nano nspr ncpfs ntfs-3g plymouth psacct rp-pppoe vim acl attr db4 findutils gawk gettext gzip libdb nss-softokn policycoreutils coreutils
 * Step 1: Build without filesystem conflict
 * Step 2: Build
 * Step 3: Build
 * Step 4: Build
 * Step 5: Build with filesystem conflict
 * Turn the mock root cache back on in the buildsystem

Koji Transition

 * Step 0: update rpm on the koji systems to at least:
   * RHEL-6: 4.8.0-19.el6.0.usrmove.1 (Repository until RHEL-6.3 is out)
   * F-15: rpm-4.9.1.2-3.fc15.3
   * F-16: rpm-4.9.1.2-4.fc16.1
   * F-17: rpm-4.9.1.2-8.fc17
 * Step 1: build without tagging the packages in the koji repo

acl attr db4 findutils gawk gettext gzip nss-softokn policycoreutils libdb coreutils filesystem gcc (gcc may not compile after filesystem >= 3 is installed)

 * Step 2: tag all of the above packages in the koji repo
 * Step 3: build

util-linux bash udev systemd alsa-utils davfs2 ethtool fuse iputils isdn4k-utils libselinux nano nspr ncpfs ntfs-3g plymouth psacct rp-pppoe vim

Contingency Plan

 * Booting with an empty /usr is not supported today, so moving things to /usr and keeping compat links in the rootfs should be low risk.

Documentation

 * This page is the primary source of documentation
 * Freedesktop page listing the benefits
 * Solaris (has had the same model for ages)
 * Why /usr on a separate partition is broken currently
 * Discussion on fedora-devel
 * openSUSE status (wiki) page for the merging project
 * Proposal for openSUSE
 * Discussion on debian-devel
 * Discussion on gentoo-devel and several following threads about udev and /usr
 * Discussion on slashdot
 * Media
 * itworld.com
 * osnews.com
 * Heise
 * Heise - German only
 * Linux.com feature/blog

Release Notes
 * With this release, packages will no longer install files in the following directories: /bin /sbin /lib /lib64.
 * Fresh installations of this release will have the following symbolic links in the toplevel directory:

/bin -> usr/bin /sbin -> usr/sbin /lib -> usr/lib and for 64bit architectures /lib64 -> usr/lib64

Steps to upgrade to Fedora 17 using yum directly: https://fedoraproject.org/wiki/Upgrading_Fedora_using_yum#Fedora_16_-.3E_Fedora_17

Comments and Discussion

 * See Talk:Features/UsrMove

What problem are you trying to solve?
We want to make /usr shareable in a sane way.

Additional benefits of this feature are:
 * less clutter across the filesystem
 * if you snapshot /usr before updating, you have snapshotted the OS at once.

What is currently broken with having /usr as a separate partition?
http://www.freedesktop.org/wiki/Software/systemd/separate-usr-is-broken

I don’t have /usr as a separate partition. What changes for me?
Nothing changes in functionality. All the old paths are reachable, because there are compat symlinks in place, which will not go away (at least not in the near future). All your scripts and binaries should work as they did before. For the upgrade process to work, you will find /sbin, /bin, /lib and /lib64 mostly containing symbolic links. As soon as one of these directories contains only symbolic links, the whole directory is replaced by a single symbolic link. These three or four toplevel symbolic links will stay as long as the Linux ELF loader ABI is defined with “/lib/ld-linux.so.2” or its architecture-specific counterpart like “/lib64/ld-linux-x86-64.so.2”, and as long as scripts use “#!/bin/sh”.
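The condition under which a toplevel directory is collapsed into a single symbolic link can be expressed as a short shell test. This is a sketch of the condition only, not of the upgrade tooling; only_symlinks is a hypothetical helper name, and it assumes a find that supports -mindepth and -maxdepth (GNU and BSD find do).

```shell
#!/bin/sh
# Succeed if the argument is a real directory (not itself a symlink) whose
# direct entries are all symbolic links. An empty directory also qualifies.
# (only_symlinks is a hypothetical helper name.)
only_symlinks() {
    [ -d "$1" ] && [ ! -L "$1" ] || return 1
    [ -z "$(find "$1" -mindepth 1 -maxdepth 1 ! -type l -print)" ]
}

# Demo: a scratch sbin that holds nothing but a compat symlink.
demo=$(mktemp -d)
mkdir -p "$demo/usr/sbin" "$demo/sbin"
ln -s ../usr/sbin/fsck "$demo/sbin/fsck"
if only_symlinks "$demo/sbin"; then
    echo "sbin can be replaced by a single symlink"
fi
rm -rf "$demo"
```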

I have /usr as a separate partition. What changes for me?
Not sure how you managed to do that. In general, having /usr as a separate partition does not really work right now. See http://www.freedesktop.org/wiki/Software/systemd/separate-usr-is-broken. But with this feature implemented, a separate /usr mount point becomes a sane and supported setup again.

Why don’t you fix the /usr situation by putting all the relevant binaries in /bin /sbin /lib and /lib64?
Then you would share the toplevel directory hierarchy among all hosts. Hosts would need to mount /etc and /var for host-only versions. In particular, /etc/fstab is not accessible without adding information to the initramfs on how to mount it. For every additional host-only toplevel directory, like /opt and /srv, you would need a mount point.

So, why don’t you just mount /usr from the initramfs and leave the files where they are?
Ok, so imagine you have a /usr mounted from a network location and you want to update a package. So maybe you mount the master copy of /usr on your master machine and update /usr with your package manager. Then you provide a new copy of the master /usr to the other machines when they reboot. They all have the new, updated /usr now. But what about /sbin, /bin, /lib and /lib64? They still have the old binaries. No glibc security update for them. So every machine has to update these directories via rsync or similar (rpm will not work with a read-only /usr). This doubles the maintenance needed to keep both parts of the system in sync.

You are doing it wrong! /bin and /sbin are there to rescue a broken /usr!
The most critical filesystem is /boot, because the kernel lives there. So using /bin and /sbin to repair /usr relied on _two_ working filesystems (/ and /boot). If either of them was broken, you could not rescue /usr. The role of the rescue system can easily be fulfilled by a rescue initramfs. A rescue initramfs in /boot, containing the fsck utilities, is in the same danger of becoming corrupted as the kernel. Now you only have to pull out your rescue CD if /boot is corrupted, and not if / is corrupted.

Then, let’s share /bin /sbin /lib /lib64 and /usr and mount them all from the initramfs!
Now you get the feeling that moving everything to /usr might make things easier...

Why don’t you move all /usr contents to / and forget about /usr?
Because this introduces a lot of new toplevel directories, which would then all have to be mount points in order to be shared across hosts.

Ok, but what about a root filesystem on the network and mounting local filesystems only?
Then you would share the toplevel directory hierarchy among all hosts. Hosts would need to mount /etc and /var for host-only versions. In particular, /etc/fstab is not accessible without adding information to the initramfs on how to mount it. For every additional host-only toplevel directory, like /opt and /srv, you would need a mount point.