From Fedora Project Wiki

Revision as of 21:37, 9 August 2021 by Ngompa (talk | contribs) (Clarify this is not changing default behavior for now)

DNF/RPM Copy on Write enablement for all variants

Summary

RPM Copy on Write provides a better experience for Fedora Users as it reduces the amount of I/O and offsets CPU cost of package decompression. RPM Copy on Write uses reflinking capabilities in btrfs, which is the default filesystem starting from Fedora 33 for most variants. Note that this behavior is not being turned on by default for this Change.

Owners

Current status

  1. Changes to rpm: published in https://github.com/rpm-software-management/rpm/pull/1470
  2. Changes to librepo: published in https://github.com/rpm-software-management/librepo/pull/222
  3. New package dnf-plugin-cow: published in https://github.com/facebookincubator/dnf-plugin-cow (new package bug: #1919003)
  • Targeted release: Fedora 35
  • Last updated: 2021-08-09
  • FESCo issue: #2534
  • Tracker bug: #1915976
  • Release notes tracker: #634

Detailed description

Installing and upgrading software packages is a standard part of managing the lifecycle of any operating system. Throughout Fedora's history, all software has been packaged and distributed using the RPM file format. This proposal changes how software is downloaded and installed, leaving the distribution process unmodified. Note that this behavior is not being turned on by default for this Change, and thus is explicitly opt-in for now.

Current process

  1. Resolve packaging request into a list of packages and operations
  2. Download and verify new packages
  3. Install and/or upgrade packages sequentially from the RPM files, decompressing each payload and writing a copy of the new files to storage.

New process

  1. Resolve packaging request into a list of packages and operations
  2. Download and decompress packages into a locally optimized rpm file
  3. Install and/or upgrade packages sequentially using RPM files, using reference linking (reflinking) to reuse data already on disk.

The outcome is intended to be the same, but the order of operations is different.

  1. Decompression happens inline with download. This has a positive effect on resource usage: downloads are typically limited by bandwidth. Decompression and writing the full data into a single file per rpm is essentially free. Additionally: if there is more than one download at a time, a multi-CPU system can be better utilized. All compression types supported in RPM work because this uses the rpm I/O functions.
  2. RPMs are cached on local storage between downloading and installation time as normal. This allows DNF to defer actual RPM installation until all the RPMs are available. This is unchanged.
  3. The file format for RPMs is different with Copy on Write. The headers are identical, but the payload is different. There is also a footer.
    1. Files are converted (“transcoded”) locally during download using /usr/bin/rpm2extents (part of rpm codebase). The format is not intended to be “portable” - i.e. copying the files from the cache is not supported.
    2. Regular RPMs use a compressed .cpio based payload. In contrast, extent based RPMs contain uncompressed data aligned to the fundamental page size of the architecture, e.g. 4KiB on x86_64. This alignment is required for FICLONERANGE to work. Only files are represented in the payload, other directory entries like symlinks, device nodes etc are constructed entirely from rpm header information. Files are referenced by their digest, so identical files are de-duplicated.
    3. The footer currently has three sections
      1. Table of original (rpm) file digests, used to validate the integrity of the download in dnf.
      2. Table of digest → offset used when actually installing files.
      3. An 8-byte signature at the end of the file, used to differentiate between traditional and extent based RPMs.
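
The layout rules in item 3 (page alignment, the file digest as content key, de-duplication of identical files) can be illustrated with a small planning function. This is a minimal sketch under stated assumptions, not the rpm2extents implementation or its on-disk format: plan_extents, align_up and PAGE_SIZE are hypothetical names, the 4 KiB page size is the x86_64 example from the text, and SHA-256 stands in for whichever file digest the header carries.

```python
import hashlib

# Assumption: 4 KiB pages, the x86_64 example given in the text.
PAGE_SIZE = 4096


def align_up(n, alignment=PAGE_SIZE):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def plan_extents(files, start=0):
    """Assign a page-aligned payload offset to each distinct file content.

    files: iterable of (path, content_bytes) pairs.
    Returns (digest_to_offset, path_to_digest, end_of_payload).
    Hypothetical layout for illustration only.
    """
    digest_to_offset = {}
    path_to_digest = {}
    cursor = align_up(start)
    for path, content in files:
        digest = hashlib.sha256(content).hexdigest()
        path_to_digest[path] = digest
        if digest not in digest_to_offset:
            # First time this content is seen: store it once, page aligned.
            digest_to_offset[digest] = cursor
            cursor = align_up(cursor + len(content))
        # Identical content is not stored again; the path reuses the digest.
    return digest_to_offset, path_to_digest, cursor
```

Two paths with identical content end up sharing one digest and one offset, which is the de-duplication the text describes; the digest-to-offset mapping plays the role of the second footer table.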

Notes

  1. The headers are preserved bit for bit during transcoding. This preserves signatures. The signatures cover the main header blob, and the main header blob ensures the integrity of data in two ways:
    1. Each file with content has a digest. Originally this was md5, but today it’s usually sha256. In normal RPM this is only used to verify the integrity of files, e.g. rpm -V. With CoW we use this as a content key.
    2. There are one or two digests (PAYLOADDIGEST and PAYLOADDIGESTALT) covering the payload archive (compressed cpio). The header value is preserved, but transcoded RPMs do not preserve the original structure so RPM’s pre-installation verification (controlled by %_pkgverify_level) will fail. dnf-plugin-cow disables this check in dnf because it verifies the whole file digest which is captured during download/transcoding. The second one is likely used for delta rpm.
  2. This Change is untested with, and possibly incompatible with, delta RPM (drpm). The process for reconstructing an rpm to install from a delta is expensive from both a CPU and I/O perspective, while only providing marginal benefits on download size. It is expected that having delta rpm enabled (which is the default) will be handled gracefully.
  3. Disk space requirements are expected to be marginally higher than before: all new packages or updates will consume their installed size before installation instead of about half their size (regular rpms with payloads still cost space).
  4. rpm-plugin-reflink will fall back to simple file copying when the destination path is not on the same filesystem/subvolume. A common example is /boot and/or /boot/efi.
  5. The system will still work on other filesystem types, but will always fall back to simple copying. This is expected to be slightly slower than not enabling CoW because the source for copying will be the decompressed data.
  6. For systems that enable transparent filesystem compression: every file will continue to be decompressed from the original rpm, and then transparently re-compressed by the filesystem. There is no effective change here. There is a future project to investigate alternate distribution mechanics to provide parallel versions of file content pre-compressed in a filesystem specific format, reducing both CPU costs and I/O. It is expected that this will result in slightly higher network utilization because filesystem compression is purposely restricted to allow random I/O.
  7. The current implementation of dnf-plugin-cow is in Python, but it looks possible to implement this in libdnf instead, which would make it work in PackageKit.
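
The fallback described in notes 4 and 5 can be sketched as "try to reflink, otherwise copy". This is an illustrative sketch only and not the rpm-plugin-reflink code: clone_or_copy is a hypothetical helper, and the FICLONERANGE request number below is the encoding used on x86_64 (an assumption; other architectures may encode ioctl numbers differently). See ioctl_ficlonerange(2) for the authoritative description.

```python
import fcntl
import struct

# Assumption: x86_64 encoding of FICLONERANGE from <linux/fs.h>,
# i.e. _IOW(0x94, 13, struct file_clone_range).
FICLONERANGE = 0x4020940D


def clone_or_copy(src_path, src_offset, length, dest_path):
    """Create dest_path from `length` bytes of src_path starting at src_offset.

    Returns "reflink" when the filesystem shared the blocks, "copy" when a
    plain byte copy was needed, and "empty" for zero-length files.
    """
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        if length == 0:
            # A src_length of 0 means "clone to end of file" for this ioctl,
            # so empty files are handled without calling it.
            return "empty"
        # struct file_clone_range: s64 src_fd, u64 src_offset,
        # u64 src_length, u64 dest_offset
        arg = struct.pack("qQQQ", src.fileno(), src_offset, length, 0)
        try:
            fcntl.ioctl(dest.fileno(), FICLONERANGE, arg)
            return "reflink"
        except OSError:
            # No reflink support, a different filesystem, or an unaligned
            # range: fall back to simple copying.
            src.seek(src_offset)
            remaining = length
            while remaining > 0:
                chunk = src.read(min(remaining, 1 << 20))
                if not chunk:
                    break
                dest.write(chunk)
                remaining -= len(chunk)
            return "copy"
```

On a filesystem without reflink support the function always takes the copy path, which mirrors the behavior note 5 describes for non-CoW filesystems.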

Performance Metrics

The ballpark performance difference is about half the duration for file download+install time. A lot of rpms are very small, so it’s difficult to see/measure. Larger RPMs give a much clearer signal.

(Actual numbers/charts will be supplied in Jan 2021)

update 2021-01-31

I've promised metrics "in Jan 2021". The astute reader will note that it's still "Jan 2021" (where the author is). I've been working on this topic for the last week. I have made progress, but I'm not happy with the results.

I'm shying away from using the rpm measure plugin. I had used it to measure CoW on CentOS 7 along with yum, but it's pretty complex to set up and consume. I'm looking for something simpler.

There are three kinds of test:

  1. Download+install: The normal use case when using 'dnf install/upgrade'
  2. Download only: e.g. downloading and caching a set of packages for offline update
  3. Install only: Using the data from the cache to install, reinstall or upgrade the base OS, or a subordinate root such as a chroot or container.

Picking Fedora specifically: there is a trend towards offline updates. The use of CoW pushes decompression and writing of data up front, which should reduce the actual downtime of offline updates. Secondly, any time a package is used twice on the system, e.g. between the rootfs and a container, there is a chance the same disk bits can be used. This is an implicit form of data de-duplication.

I'd like to set forth my expectations for CPU and I/O usage:

CPU

  1. Download+install: Aggregate usage should be roughly the same.
  2. Download only: CPU usage for download should be much higher due to decompression. The bottleneck in downloads is typically network. If the network is not a bottleneck and parallel downloads are used, there is a potential to use multiple CPU cores for decompression. The source rpm's digest is calculated from network reads, and (soon) the headers+digests in rpm should be validated inline, instead of distinct passes.
  3. Install only: CPU usage should be much lower due to lack of decompression.
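
The idea of decompressing while downloading, with the source file's digest computed from the same network reads, can be sketched as a single pass over incoming chunks. This is a simplified illustration, not the librepo or rpm2extents code path: transcode_stream is a hypothetical name, xz via Python's lzma module stands in for whatever compression the package uses, and the real transcoder works on RPM payloads through the rpm I/O functions rather than on a bare compressed stream.

```python
import hashlib
import lzma


def transcode_stream(chunks, out):
    """Decompress an xz stream chunk by chunk while hashing the raw input.

    chunks: iterable of bytes as they arrive from the network.
    out: writable binary file object receiving the decompressed data.
    Returns the SHA-256 hex digest of the compressed bytes as downloaded.
    """
    digest = hashlib.sha256()
    decompressor = lzma.LZMADecompressor()
    for chunk in chunks:
        # The digest covers the bytes exactly as downloaded.
        digest.update(chunk)
        if not decompressor.eof:
            # Decompression happens in the same pass, so no compressed
            # copy needs to be written to disk first.
            out.write(decompressor.decompress(chunk))
    return digest.hexdigest()
```

Because each chunk is hashed and decompressed as it arrives, the extra CPU work overlaps with time otherwise spent waiting on the network.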

I/O

Note I want to explicitly exclude the I/O for the repo metadata in dnf, the libsolv caches, rpmdb, and the dnf history. These aren't in scope here. I've run tests with small and large numbers of packages to try to work out what the constant cost is each time.

  1. Download+install: Should see about 1/3 fewer bytes written for payload, owing to no compressed archive ever being written to disk.
  2. Download only: Data written during download is about 2x the old value, assuming 2x compression, because we're writing the uncompressed data instead of compressed data.
  3. Install only: Involves reflink + ftruncate per file (which could end up forking < pagesize data per file). This should be on the order of a few hundred bytes per file, but overall should be close to negligible.
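
The "about 1/3" and "about 2x" figures follow from simple arithmetic under the stated 2x compression assumption. The sketch below spells that out; expected_payload_writes and total are hypothetical helper names, and the model deliberately ignores the small per-file overhead mentioned in item 3.

```python
def expected_payload_writes(compressed_size, compression_ratio=2.0):
    """Model payload bytes written per phase, before and after CoW.

    Assumes the uncompressed payload is compression_ratio times the
    compressed size, and treats reflink installs as writing no payload data.
    """
    uncompressed = compressed_size * compression_ratio
    # Traditional: the compressed rpm is written at download time, then the
    # decompressed files are written at install time.
    traditional = {"download": compressed_size, "install": uncompressed}
    # CoW: the decompressed data is written once at download time, then
    # reflinked into place at install time.
    cow = {"download": uncompressed, "install": 0.0}
    return traditional, cow


def total(writes):
    """Sum the bytes written across phases."""
    return sum(writes.values())
```

With a compressed size of C, the traditional path writes C + 2C = 3C and the CoW path writes 2C, so download+install writes 1/3 less, while the download phase alone writes 2C instead of C.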

The results I've got so far confirm all the hypotheses except one: total bytes written in items 1 and 3 is much too high, and the wall clock/cpu time is too high too. I will continue to investigate this. It could be an error in my test methodology, or it could be a bug.

Here's the code I'm using to test: https://github.com/facebookincubator/dnf-plugin-cow/blob/main/tests/perf.sh

Here's some stats from running it in a Fedora 33 VM:

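The first run below shows the original behavior; the second run (starting at the second tests/perf.sh invocation) shows the behavior with CoW.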
[root@localhost dnf-plugin-cow]# tests/perf.sh split
Running as unit: run-u937.service
warning: /var/tmp/dnfcowperf.uqmuXa/var/cache/dnf/updates-0e22a1f5a0a34771/packages/fedora-release-33-3.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 9570ff31: NOKEY
Importing GPG key 0x9570FF31:
 Userid     : "Fedora (33) <fedora-33-primary@fedoraproject.org>"
 Fingerprint: 963A 2BEB 0200 9608 FE67 EA42 49FD 7749 9570 FF31
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-33-x86_64

real    0m22.974s
user    0m2.595s
sys     0m1.087s
Download usage:
8:0 rbytes=268.43MiB rios=4438.00 wios=1030.00 wbytes=45.48MiB

real    0m5.777s
user    0m3.835s
sys     0m0.989s
Install usage:
8:0 rbytes=281.07MiB rios=4760.00 wios=3010.00 wbytes=122.96MiB

real    0m0.373s
user    0m0.037s
sys     0m0.115s
remove jq using rpm:
8:0 rbytes=31.06MiB rios=683.00 wios=754.00 wbytes=24.16MiB
Preparing...                          ################################# [100%]
Updating / installing...
   1:jq-1.6-5.fc33                    ################################# [100%]

real    0m0.473s
user    0m0.079s
sys     0m0.188s
reinstall jq using rpm:
8:0 rbytes=34.01MiB rios=842.00 wios=297.00 wbytes=10.12MiB

real    0m0.754s
user    0m0.291s
sys     0m0.197s
remove jq:
8:0 rbytes=63.25MiB rios=1613.00 wios=443.00 wbytes=14.12MiB

real    0m2.799s
user    0m1.782s
sys     0m0.514s
reinstall jq using dnf:
8:0 rbytes=268.88MiB rios=4453.00 wios=1005.00 wbytes=43.54MiB
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 2min 18.576s
CPU time consumed: 53.376s
IO bytes read: 868.0K
IO bytes written: 0B
[root@localhost dnf-plugin-cow]# tests/perf.sh split
Running as unit: run-u896.service
warning: /var/tmp/dnfcowperf.FUsLRb/var/cache/dnf/updates-0e22a1f5a0a34771/packages/fedora-release-33-3.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 9570ff31: NOKEY
Importing GPG key 0x9570FF31:
 Userid     : "Fedora (33) <fedora-33-primary@fedoraproject.org>"
 Fingerprint: 963A 2BEB 0200 9608 FE67 EA42 49FD 7749 9570 FF31
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-33-x86_64

real    0m14.710s
user    0m3.583s
sys     0m1.656s
Download usage:
8:0 rbytes=272.01MiB rios=4508.00 wios=1282.00 wbytes=85.25MiB

real    0m5.377s
user    0m3.723s
sys     0m0.985s
Install usage:
8:0 rbytes=312.30MiB rios=4914.00 wios=1905.00 wbytes=75.88MiB

real    0m0.241s
user    0m0.031s
sys     0m0.092s
remove jq using rpm:
8:0 rbytes=30.73MiB rios=679.00 wios=465.00 wbytes=13.23MiB
Preparing...                          ################################# [100%]
Updating / installing...
   1:jq-1.6-5.fc33                    ################################# [100%]

real    0m0.392s
user    0m0.067s
sys     0m0.152s
reinstall jq using rpm:
8:0 rbytes=37.41MiB rios=857.00 wios=283.00 wbytes=8.61MiB

real    0m0.754s
user    0m0.307s
sys     0m0.195s
remove jq:
8:0 rbytes=66.81MiB rios=1643.00 wios=387.00 wbytes=10.43MiB

real    0m2.775s
user    0m1.786s
sys     0m0.480s
reinstall jq using dnf:
8:0 rbytes=272.62MiB rios=4441.00 wios=791.00 wbytes=34.50MiB
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 2min 4.078s
CPU time consumed: 58.115s
IO bytes read: 820.0K
IO bytes written: 0B
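
To compare the two runs above, the "Download usage" and "Install usage" lines can be parsed and their wbytes values set side by side. This is a small helper sketch: parse_io_line and written_mib are hypothetical names, the four input lines are copied from the output above, and the byte counters are treated as MiB because that is the unit shown in these samples.

```python
import re

# Lines copied from the two runs above (original first, then with CoW).
ORIGINAL = {
    "download": "8:0 rbytes=268.43MiB rios=4438.00 wios=1030.00 wbytes=45.48MiB",
    "install": "8:0 rbytes=281.07MiB rios=4760.00 wios=3010.00 wbytes=122.96MiB",
}
WITH_COW = {
    "download": "8:0 rbytes=272.01MiB rios=4508.00 wios=1282.00 wbytes=85.25MiB",
    "install": "8:0 rbytes=312.30MiB rios=4914.00 wios=1905.00 wbytes=75.88MiB",
}


def parse_io_line(line):
    """Turn 'key=value' fields into floats, dropping the MiB suffix."""
    return {key: float(value) for key, value in re.findall(r"(\w+)=([\d.]+)", line)}


def written_mib(run):
    """Return the wbytes value (in MiB) for each phase of a run."""
    return {phase: parse_io_line(line)["wbytes"] for phase, line in run.items()}


if __name__ == "__main__":
    for name, run in (("original", ORIGINAL), ("with CoW", WITH_COW)):
        writes = written_mib(run)
        print(name, writes, "total:", round(sum(writes.values()), 2))
```

On these samples the download phase writes a little under twice as much with CoW, in line with the 2x expectation, while the install phase still writes 75.88 MiB rather than a negligible amount, which is the discrepancy described above.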

This is ongoing work. See #1922920

Terminology

  • Copy on Write (CoW) is a broad description of any technology that reduces or eliminates data duplication by sharing the data behind the scenes until one of the references makes changes. This has been a cornerstone technology in memory management in Unix systems. Here we are using it to specifically reference Copy on Write as supported in modern filesystems, e.g. btrfs, xfs and potentially others.
  • Reflink is the verb for duplicating stored data on a filesystem. See ioctl_ficlonerange(2) for the specific call we use on Linux
  • Extent (based RPMs) refers to how payload file data is stored within an RPM. Normal RPMs simply contain a compressed CPIO archive. Extent based RPMs contain the raw data uncompressed, which can be referenced with reflink.

Feedback

Why not just integrate this in libdnf and add a knob in dnf.conf?

This is being looked into. It’s a good idea. It’s definitely more complex as there are fewer examples of the API in use, and no promise of API stability, but it should be manageable.

Where is the repo?

In review, should be out soon: see the dependencies link. This might also be superseded by the libdnf approach if that works out.

Does this work with XFS?

A simple test has confirmed this should work, but it's not been validated extensively.

How does this relate to ostree?

As the change owners understand it, ostree shares some concepts, like organizing file content by digest. The main difference is in how the data is shared. CoW uses reflinking which is file range based, and ostree uses hard links.

The hardlinking approach in rpm-ostree depends on either a completely read-only system, or the use of a layered filesystem like overlayfs. This is attractive for some workloads like containers (e.g. FCOS and Silverblue), but requires a fundamentally different management method. CoW aims to bring most of the benefits into the main Fedora distribution, transparently, save for the requirement of a CoW filesystem.

What happens if the fs doesn't support reflinking?

See Notes, 5

What about deltarpm?

See Notes, 2

What happens if one rsyncs the cache between machines?

There was feedback on the mailing list that some users may share or copy one host's dnf cache directory to another. There are endianness and page size dependencies, immaterial within a single host, that could add complexity here. It’s worth noting that copying any program’s cache directory from one host to another is fragile and, to the author’s understanding, is not something explicitly supported in dnf. That said, for homogeneous hardware types, sharing contents should work fine today.

What about verification?

See Notes, 1. Somewhat related, https://github.com/rpm-software-management/rpm/pull/1470#issuecomment-752335847 highlighted that transcoding in this version involves trusting decompression libraries and the mirror the rpm files are downloaded from.

Benefit to Fedora

Faster package installs and upgrades

Scope

  • Proposal owners:
    • Merge changes to rpm, librepo to enable capabilities
    • Add dnf-plugin-cow to available packages
    • Test days
    • Aid with documentation
  • Other developers:
    • rpm, librepo: review PRs as needed
  • Release engineering: https://pagure.io/releng/issue/9914
  • Policies and guidelines: N/A
  • Trademark approval: N/A

Upgrade/compatibility impact

None: RPM with CoW is not enabled by default.

Upgrades with keepcache in dnf.conf will be able to use existing packages, but it will not convert them. This only happens at download time.

If a system is configured to keep packages in the cache (keepcache in dnf.conf) and dnf-plugin-cow is removed, then the cached packages will be unusable. To resolve this, run dnf clean packages.

How to test

Enable RPM with CoW with

$ sudo dnf install python3-dnf-plugin-cow
...
$ sudo dnf install hello
...
$ hello
Hello, world!

There should be no end user visible changes, except timing.

User experience

No anticipated user visible changes in this change proposal. This makes the feature available, but does not enable it by default.

Dependencies

  1. A copy-on-write filesystem; this Change is primarily targeting btrfs, but RPM with CoW should work with XFS as well (untested)
  2. Most package install paths and the dnf package cache on the same filesystem / subvolume.
  3. rpm with Copy on Write patch set: https://github.com/rpm-software-management/rpm/pull/1470
  4. librepo with transcoding support: https://github.com/rpm-software-management/librepo/pull/222
  5. dnf-plugin-cow (a new package): https://github.com/facebookincubator/dnf-plugin-cow/
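
Dependency 2 can be checked roughly by comparing the device IDs that the kernel reports for the package cache and for an install path. This is only a heuristic sketch: same_filesystem is a hypothetical helper, a matching st_dev does not by itself prove that reflinking will succeed (the filesystem must also support it), and which paths to compare is left to the reader.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True when both paths report the same device ID (st_dev).

    A differing st_dev suggests the paths live on different filesystems
    (or subvolumes reported as separate devices), in which case installs
    would be expected to fall back to plain copying.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, comparing the dnf cache directory with /usr and then with /boot would be expected to give different answers on a system where /boot is a separate partition.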

Contingency plan

  • Contingency mechanism: will not include PR patches if not merged upstream, skip dnf-plugin-cow
  • Contingency deadline: Final freeze
  • Blocks release? No
  • Blocks product? No

Documentation

Documentation will be available at https://github.com/facebookincubator/dnf-plugin-cow in the coming weeks

Release Notes

RPM with CoW is not enabled by default. To enable it:

$ sudo dnf install python3-dnf-plugin-cow