From Fedora Project Wiki

Revision as of 15:57, 8 May 2024 by Amoloney (talk | contribs) (adding tracker bug)

Reproducible Package Builds

Summary

A post-build cleanup step is integrated into the RPM build process to remove common causes of build irreproducibility, making most Fedora packages reproducible.

Owner

  • Email: dcavalca@fedoraproject.org
  • Email: neil at shrug.pw
  • Email: mhroncok at redhat.com
  • Email: zbyszek at in.waw.pl

Current status

Detailed Description

As of 2023 there is an active effort to implement reproducible builds in Fedora. Reproducible builds will allow our users to verify independently that the RPMs have not been tampered with (either maliciously or through a hardware or software fault): anyone can rebuild a package independently and confirm that they get identical binaries when building with the same versions of the compiler and other tools. This Change moves us in this direction by removing the common sources of irreproducibility.

add-determinism is a Rust program which, as its name suggests, adds determinism to the files it is given as input. It attempts to standardize the metadata contained in binary or source files so that the result is consistent, clamping to $SOURCE_DATE_EPOCH in all instances. add-determinism is the "Fedora version" of strip-nondeterminism from the Debian project. Since strip-nondeterminism is written in Perl, it is undesirable for use in Fedora, as we don't want to pull Perl into the buildroot for every package.
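To make "clamping" concrete, here is a minimal Python sketch. It is not the add-determinism implementation: the tool works on metadata embedded inside files, while this sketch uses filesystem modification times as a simple stand-in. The function names `clamp_mtime` and `clamp_tree` are hypothetical. The idea is that a timestamp newer than the reference epoch is lowered to it, and an older one is left alone.

```python
import os


def clamp_mtime(path, source_date_epoch):
    """Lower the file's mtime to source_date_epoch if it is newer.

    Returns True when the timestamp was changed, False otherwise.
    Hypothetical helper for illustration only.
    """
    st = os.stat(path)
    if st.st_mtime <= source_date_epoch:
        # Older timestamps are already deterministic; leave them untouched.
        return False
    os.utime(path, (st.st_atime, source_date_epoch))
    return True


def clamp_tree(root, source_date_epoch):
    """Clamp every regular file under root; return the paths that changed."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow symlinks
            if clamp_mtime(path, source_date_epoch):
                changed.append(path)
    return changed
```

In a build, the reference value would typically come from the environment, for example `int(os.environ["SOURCE_DATE_EPOCH"])`.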

It's worth noting that this Change does not intend to impose any specific reproducibility requirements on Fedora packages. Once this Change is implemented and we have been through a mass rebuild and can verify that the common causes of irreproducibility have indeed been removed, we can consider further steps. But that will be at least one release later.

This change adds a small amount of time to the processing of RPMs at the end of a build. Packages containing many or very large files will therefore take longer to process, but the effect is not expected to be noticeable. add-determinism takes steps to ensure that it does not interfere with other buildroot post-processors such as mangle-shebangs, python-hardlink, and python-bytecompile. If it does not understand an input file, or if any other problem occurs, it defaults to making no modifications.

add-determinism uses the Python marshalparser module for pyc files and links to libpython3.xx.so. This functionality will be made optional, so that the dependencies are only pulled in when python3 is already installed in the buildroot.

A mechanism to opt out will be provided, either to disable the postprocessing step completely or to disable specific "handlers" (i.e. implementations of cleanup for specific file types, for example static archives). See macros.build-reproducibility.

Related Changes

Feedback

Benefit to Fedora

Adding determinism (i.e., removing non-determinism) gives the Fedora community confidence that, given the same source code, build environment, build instructions, and metadata from the build artifacts, any party can recreate copies of the artifacts that are identical except for the signatures and some parts of the metadata.

Reproducibility of builds leads to packages of higher quality. It turns out that quite often those irreproducible bits are caused by an error or sloppiness in the code. In particular, any dependence on architecture in noarch packages is almost always unwanted and/or a bug. Test builds that check reproducibility will expose such instances.

Reproducibility of builds makes it easier to develop packages: when a small change is made and a package is rebuilt (in the same environment), then with a reproducible package, the only difference is directly caused by the change. If the package is different every time it is rebuilt, making a comparison is much harder.
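Such a comparison can be sketched in a few lines of Python. This is only an illustration, assuming that the output of two builds has already been unpacked into two directories; the function names `tree_digests` and `differing_files` are hypothetical and not part of any Fedora tooling. Each file is hashed, and the relative paths whose contents differ (or which exist on one side only) are reported.

```python
import hashlib
import os


def tree_digests(root):
    """Map each file's path relative to root to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, "rb") as f:
                digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests


def differing_files(root_a, root_b):
    """Return the sorted relative paths that differ between the two trees.

    A path is reported when its contents differ or when it is present
    in only one of the trees.
    """
    a = tree_digests(root_a)
    b = tree_digests(root_b)
    return sorted(rel for rel in a.keys() | b.keys() if a.get(rel) != b.get(rel))
```

With a reproducible package, `differing_files` on two rebuilds of the same source returns an empty list, and after a small change it lists only the files that the change affects.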

Build reproducibility for noarch subpackages solves the problem where package builds on different architectures are different, causing mock to reject the whole build. In particular, this issue occurs for pyc files. This will now be solved without requiring opt-in from individual packages.


Scope

  • Proposal Owners:
    • Integrate add-determinism as a BuildRoot Policy script
    • Add a dependency on marshalparser to python3 (probably conditionalized on rpm-build)
  • Other Developers:
    • Test their packages with the additional phase, report problems
    • Potentially integrate changes to packages to enable reproducibility
  • Release Engineering: Ideally we want this to happen before the mass rebuild, but that is not strictly required.
  • Policies and Guidelines: Fedora Packaging Guidelines should be updated to include information on the add-determinism BuildRoot Policy. User documentation should be amended to explain how to verify reproducibility for a given package, which packages are known to be non-reproducible, and how to opt out.
  • Trademark approval: N/A (not needed for this Change)
  • Alignment with Community Initiatives: All software and requests are consistent with the decision process and similar across other groups in Fedora. The Fedora Reproducibility Working Group began at Flock 2023 in Cork.

Upgrade/compatibility impact

No impact is expected.

How To Test

To test on the level of individual files:

  • install add-determinism
  • call SOURCE_DATE_EPOCH=… add-determinism -v ./path/to/file
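To check whether a file was modified by such a run, its digest can be compared before and after. The following Python sketch shows one way to do that; the helper name `file_changed_by` is hypothetical, and the command to run is passed in by the caller rather than assumed.

```python
import hashlib
import os
import subprocess


def sha256_of(path):
    """Return the SHA-256 hex digest of the file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def file_changed_by(command, path, source_date_epoch):
    """Run command on path with SOURCE_DATE_EPOCH set; report whether the file changed.

    command is a list of arguments; path is appended as the last argument.
    Hypothetical helper for illustration only.
    """
    env = dict(os.environ, SOURCE_DATE_EPOCH=str(source_date_epoch))
    before = sha256_of(path)
    subprocess.run([*command, path], env=env, check=True)
    return sha256_of(path) != before
```

For example, `file_changed_by(["add-determinism", "-v"], "./path/to/file", epoch)` would report whether the tool rewrote the file, with `epoch` standing in for whatever reference timestamp is chosen.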

To test package builds:

(This can be done on a normal system or in a mock chroot.)

User Experience

No impact is expected.

Dependencies

Contingency Plan

  • Contingency mechanism:
    • In case of major problems, disable the change in redhat-rpm-config.
    • In case of problems with specific packages, opt-out by setting a macro.
  • Contingency deadline: No limit really.
  • Blocks release? No.

Documentation

Release Notes

Fedora package builds are now more deterministic, bringing the distribution closer to the goal of achieving fully reproducible builds for all of its packages.