From Fedora Project Wiki
m (Add python-rpm-macros PR link)
 
(17 intermediate revisions by 2 users not shown)
Line 2: Line 2:


= Reproducible builds: Clamp build mtimes to $SOURCE_DATE_EPOCH <!-- The name of your change proposal --> =
{{Change_Proposal_Banner}}


== Summary ==
<!-- A sentence or two summarizing what this change is and what it will do. This information is used for the overall changeset summary page for each release. Note that motivation for the change should be in the Benefit to Fedora section below, and this part should answer the question "What?" rather than "Why?". -->
The `%clamp_mtime_to_source_date_epoch` RPM macro will be set to `1`. When an RPM package is built, mtimes of packaged files will be clamped to `$SOURCE_DATE_EPOCH` which is already set to the date of the latest `%changelog` entry. As a result, more RPM packages will be reproducible: The actual modification time of files that are e.g. modified in the `%prep` section will not be reflected in the RPM package.
The `%clamp_mtime_to_source_date_epoch` RPM macro will be set to `1`. When an RPM package is built, mtimes of packaged files will be clamped to `$SOURCE_DATE_EPOCH` which is already set to the date of the latest `%changelog` entry. As a result, more RPM packages will be reproducible: The actual modification time of files that are e.g. modified in the `%prep` section or built in the `%build` section will not be reflected in the resulting RPM packages. Files in RPM packages will have mtimes that are independent of the time of the actual build.


== Owner ==
Line 22: Line 20:


== Current status ==
[[Category:ChangePageIncomplete]]
[[Category:ChangeAcceptedF38]]
<!-- When your change proposal page is completed and ready for review and announcement -->
<!-- remove Category:ChangePageIncomplete and change it to Category:ChangeReadyForWrangler -->
Line 29: Line 27:


<!-- Select proper category, default is Self Contained Change -->
[[Category:SelfContainedChange]]
<!-- [[Category:SelfContainedChange]] -->
<!-- [[Category:SystemWideChange]] -->
[[Category:SystemWideChange]]


* Targeted release: [https://docs.fedoraproject.org/en-US/releases/f38/ Fedora Linux 38]
Line 40: Line 38:
ON_QA -> change is fully code complete
ON_QA -> change is fully code complete
-->
-->
* FESCo issue: <will be assigned by the Wrangler>
* [https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/MWKWFO52KTOGVGOEUDZT7YBOON2G5A2K/ devel thread]
* Tracker bug: <will be assigned by the Wrangler>
* FESCo issue: [https://pagure.io/fesco/issue/2899 #2899]
* Release notes tracker: <will be assigned by the Wrangler>
* Tracker bug: [https://bugzilla.redhat.com/show_bug.cgi?id=2149310 #2149310]
* Release notes tracker: [https://pagure.io/fedora-docs/release-notes/issue/928 #928]


== Detailed Description ==
Line 51: Line 50:
Patching is not necessary to make this happen. When a source file is compiled into a binary file, the modification datetime is also set to the datetime of the build. In practice, the modification datetime of many files packaged in RPM packages is dependent on when the package was actually built.


To eliminate this problem, we propose to clamp build mtimes to `$SOURCE_DATE_EPOCH`. RPM build in Fedora already sets the `$SOURCE_DATE_EPOCH` environment variable based on the latest `%changelog` entry because the `%source_date_epoch_from_changelog` macro is set to `1`. We will also set the `%clamp_mtime_to_source_date_epoch` macro to `1`. As a result, when files are packaged to the RPM package, their modification datetimes are clamped to `$SOURCE_DATE_EPOCH` (to the latest changelog entry datetime). Clamping means that all files which would have a modification datetime higher than `$SOURCE_DATE_EPOCH` will have the modification datetime changed to `$SOURCE_DATE_EPOCH`; files with mtime lower (or equal) to `$SOURCE_DATE_EPOCH` will retain the original mtimes.
To eliminate this problem, we propose to clamp build mtimes to `$SOURCE_DATE_EPOCH`. RPM build in Fedora already sets the `$SOURCE_DATE_EPOCH` environment variable based on the latest `%changelog` entry because the `%source_date_epoch_from_changelog` macro is set to `1`. We will also set the `%clamp_mtime_to_source_date_epoch` macro to `1`. As a result, when files are packaged to the RPM package, their modification datetimes will be clamped to `$SOURCE_DATE_EPOCH` (to the latest changelog entry datetime). Clamping means that all files which would otherwise have a modification datetime higher than `$SOURCE_DATE_EPOCH` will have the modification datetime changed to `$SOURCE_DATE_EPOCH`; files with mtime lower (or equal) to `$SOURCE_DATE_EPOCH` will retain the original mtimes.


This functionality is already implemented in RPM. We will enable it by setting `%clamp_mtime_to_source_date_epoch` to `1`.
Line 62: Line 61:


When Python bytecode cache (a `.pyc` file) is built, the mtime of the corresponding Python source file (`.py`) is included in it for invalidation purposes. Since the `.pyc` file is created before RPM clamps the mtime of the `.py` file, the mtime stored in the `.pyc` file might be higher than the corresponding mtime of the `.py` file.
With the previous example, if `skynet` is written in Python:
# `skynet.py` is modified in `%prep` and hence has mtime set to the time of the build
# `skynet.pyc` is generated in `%install` and the mtime of `skynet.py` is saved in it
# RPM clamps the mtime of `skynet.py`
# `skynet.pyc` is considered invalid by Python on runtime, as the stored and actual mtime of `skynet.py` don't match


To solve this, we will modify Python to clamp the stored mtime to `$SOURCE_DATE_EPOCH` as well (when building RPM packages). Upstream Python chooses to invalidate bytecode cache based on hashes instead of mtimes when `$SOURCE_DATE_EPOCH` is set, but that could cause performance issues for big files, so Fedora's Python already deviates from upstream behavior when building RPM packages. To avoid accidentally breaking the behavior when `%clamp_mtime_to_source_date_epoch` is not set to `1`, RPM macros and buildroot policy scripts for creating the Python bytecode cache will be modified to unset `$SOURCE_DATE_EPOCH` when `%clamp_mtime_to_source_date_epoch` is not set to `1`.
Line 112: Line 117:
== Scope ==
* Proposal owners:
** Propose a PR for {{package|redhat-rpm-config}} (set `%clamp_mtime_to_source_date_epoch` to `1`)
** Propose a PR for {{package|redhat-rpm-config}} (set `%clamp_mtime_to_source_date_epoch` to `1`, possibly only when `%source_date_epoch_from_changelog` is set)
** Propose a PR for {{package|python-rpm-macros}} (unset `$SOURCE_DATE_EPOCH` while creating `.pyc` files iff `%clamp_mtime_to_source_date_epoch` is not `1`)
*** https://src.fedoraproject.org/rpms/python-rpm-macros/pull-request/154
** Propose a PR for [https://src.fedoraproject.org/rpms/python3.11/blob/b2d80045f9/f/00328-pyc-timestamp-invalidation-mode.patch the Python's bytecode invalidation mode patch] for all Python versions that have it
** Backport (the new portion of) the patch to older Pythons ({{package|python2.7}}, {{package|python3.6}} and PyPys)
Line 141: Line 147:


<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
 
Nothing anticipated.


== How To Test ==
Line 163: Line 169:


Other packages can test by building their packages and verifying they still work as expected and no packaged files have higher mtimes than the last `%changelog` entry.
To verify if this change has landed, run: `rpm --eval '%clamp_mtime_to_source_date_epoch'` on Fedora 38. The result should be `1`.


== User Experience ==
Line 175: Line 183:
  - Green has been scientifically proven to be the most relaxing color. The move to a default background color of green with green text will result in Fedora users being the most relaxed users of any operating system.
-->
-->
Users of Fedora Linux on their machines should not be impacted at all. Users who build RPM packages atop Fedora will be impacted by this change the same way Fedora is.


== Dependencies ==
Line 180: Line 189:


<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
 
* RPM needs to support this (it already does)
* RPM needs to set `$SOURCE_DATE_EPOCH` (it already does)


== Contingency Plan ==


<!-- If you cannot complete your feature by the final development freeze, what is the backup plan?  This might be as simple as "Revert the shipped configuration".  Or it might not (e.g. rebuilding a number of dependent packages).  If you feature is not completed in time we want to assure others that other parts of Fedora will not be in jeopardy.  -->
* Contingency mechanism: (What to do?  Who will do it?) N/A (not a System Wide Change)  <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
* Contingency mechanism: The change owners or {{package|redhat-rpm-config}} maintainers or proven packagers will revert the change in {{package|redhat-rpm-config}}. That should be enough to undo anything as the changes in Python should be dependent on that. If not enough, revert everything. <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<!-- When is the last time the contingency mechanism can be put in place?  This will typically be the beta freeze. -->
* Contingency deadline: N/A (not a System Wide Change) <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
* Contingency deadline: Ideally, we should do this before the Mass Rebuild. Technically, we can land it any time before the Beta Freeze, but it would not change all the packages, which is a bit messy. <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
<!-- Does finishing this feature block the release, or can we ship with the feature in incomplete state? -->
* Blocks release? N/A (not a System Wide Change), Yes/No <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
* Blocks release? No <!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
 


== Documentation ==
Line 196: Line 205:


<!-- REQUIRED FOR SYSTEM WIDE CHANGES -->
N/A (not a System Wide Change)
This page is the documentation.


== Release Notes ==

Latest revision as of 10:15, 12 December 2022


Reproducible builds: Clamp build mtimes to $SOURCE_DATE_EPOCH

Summary

The %clamp_mtime_to_source_date_epoch RPM macro will be set to 1. When an RPM package is built, mtimes of packaged files will be clamped to $SOURCE_DATE_EPOCH which is already set to the date of the latest %changelog entry. As a result, more RPM packages will be reproducible: The actual modification time of files that are e.g. modified in the %prep section or built in the %build section will not be reflected in the resulting RPM packages. Files in RPM packages will have mtimes that are independent of the time of the actual build.

Owner

Current status

Detailed Description

This change exists to make RPM package builds more reproducible. A common problem that prevents build reproducibility is the mtime (modification times) of the packaged files.

Suppose we package an RPM package of software called skynet in version 1.0. Upstream released this version at datetime A. A Fedora packager creates the RPM package at datetime B. Unfortunately, the packager needs to patch the sources in the RPM %prep section. When the build runs at datetime C, the modification datetime of the patched file is set to C. When the build runs again in an otherwise identical environment at datetime D, the modification datetime of the patched file is set to D. As a result, the build is not bit-by-bit reproducible, because the datetime of the build is saved in the resulting package. Patching is not necessary to make this happen. When a source file is compiled into a binary file, the modification datetime is also set to the datetime of the build. In practice, the modification datetime of many files packaged in RPM packages is dependent on when the package was actually built.

To eliminate this problem, we propose to clamp build mtimes to $SOURCE_DATE_EPOCH. RPM build in Fedora already sets the $SOURCE_DATE_EPOCH environment variable based on the latest %changelog entry because the %source_date_epoch_from_changelog macro is set to 1. We will also set the %clamp_mtime_to_source_date_epoch macro to 1. As a result, when files are packaged into the RPM package, their modification datetimes will be clamped to $SOURCE_DATE_EPOCH (that is, to the datetime of the latest changelog entry). Clamping means that all files which would otherwise have a modification datetime higher than $SOURCE_DATE_EPOCH will have the modification datetime changed to $SOURCE_DATE_EPOCH; files with an mtime lower than (or equal to) $SOURCE_DATE_EPOCH will retain their original mtimes.

This functionality is already implemented in RPM. We will enable it by setting %clamp_mtime_to_source_date_epoch to 1.

Non-goal

We do not aim to make all Fedora packages reproducible (at least not as part of this change proposal). We just eliminate one problem that we consider the biggest blocker for reproducible builds.

Python bytecode

When Python bytecode cache (a .pyc file) is built, the mtime of the corresponding Python source file (.py) is included in it for invalidation purposes. Since the .pyc file is created before RPM clamps the mtime of the .py file, the mtime stored in the .pyc file might be higher than the corresponding mtime of the .py file.

With the previous example, if skynet is written in Python:

  1. skynet.py is modified in %prep and hence has mtime set to the time of the build
  2. skynet.pyc is generated in %install and the mtime of skynet.py is saved in it
  3. RPM clamps the mtime of skynet.py
  4. skynet.pyc is considered invalid by Python at runtime, as the stored and actual mtime of skynet.py don't match

To solve this, we will modify Python to clamp the stored mtime to $SOURCE_DATE_EPOCH as well (when building RPM packages). Upstream Python chooses to invalidate bytecode cache based on hashes instead of mtimes when $SOURCE_DATE_EPOCH is set, but that could cause performance issues for big files, so Fedora's Python already deviates from upstream behavior when building RPM packages. To avoid accidentally breaking the behavior when %clamp_mtime_to_source_date_epoch is not set to 1, RPM macros and buildroot policy scripts for creating the Python bytecode cache will be modified to unset $SOURCE_DATE_EPOCH when %clamp_mtime_to_source_date_epoch is not set to 1.
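
The mismatch from the four steps above can be reproduced with stock Python. The sketch below is an illustration under stated assumptions, not Fedora's patch: it assumes the Python 3.7+ timestamp-based .pyc header layout (magic, flags, source mtime, source size, four bytes each), it compares only the mtime and ignores the size, and the helper names stored_source_mtime, pyc_matches_source and demo are hypothetical.

```python
import os
import py_compile
import struct
import tempfile


def stored_source_mtime(pyc_path):
    """Read the source mtime recorded in a timestamp-based .pyc header."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Bytes 8..11 hold the source mtime as a little-endian 32-bit integer.
    return struct.unpack("<I", header[8:12])[0]


def pyc_matches_source(pyc_path, source_path):
    """Simplified validity check: stored mtime equals the source mtime."""
    return stored_source_mtime(pyc_path) == int(os.stat(source_path).st_mtime)


def demo(build_time, source_date_epoch):
    """Return (valid before clamping, valid after clamping) for a toy module."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "skynet.py")
        pyc = os.path.join(tmp, "skynet.pyc")
        with open(src, "w") as f:
            f.write("ANSWER = 42\n")
        # Step 1: the source file was touched during the build.
        os.utime(src, (build_time, build_time))
        # Step 2: the bytecode is generated and records the source mtime.
        py_compile.compile(
            src,
            cfile=pyc,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        before = pyc_matches_source(pyc, src)
        # Step 3: the source mtime is clamped after the bytecode exists.
        clamped = min(build_time, source_date_epoch)
        os.utime(src, (clamped, clamped))
        # Step 4: compare the stored and the actual mtime again.
        after = pyc_matches_source(pyc, src)
        return before, after
```

When build_time is later than source_date_epoch, the cache is valid before the clamp and stale after it. Storing min(mtime, $SOURCE_DATE_EPOCH) in step 2, as proposed here, would keep the two values equal after step 3.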

This behavior might be proposed upstream if it turns out to be superior to the current upstream choice, unless we redesign the bytecode-source relationship entirely instead.

Opting out

Packages broken by this new behavior can unset %clamp_mtime_to_source_date_epoch, but packagers are encouraged to fix the underlying problem instead.

Feedback

Enabling this RPM feature was proposed as a pull request to redhat-rpm-config in April 2021. It received good feedback with the exception of the following:

  • it was said the change needs to be coordinated with the Python maintainers
  • it was said the change should be done via a change process for better coordination and exposure

We believe that by proposing this via the change process and planning for the changes needed in Python, both issues are addressed.

Benefit to Fedora

We believe that many RPM packages will become reproducible and others will be more reproducible than before. The benefits of reproducible builds are better explained at https://reproducible-builds.org/

Scope

  • Other developers:
    • Test their packages with the new behavior, report problems, and opt out if really needed.
  • Release engineering: N/A (not needed for this Change)
  • Policies and guidelines: N/A (not needed for this Change)
  • Trademark approval: N/A (not needed for this Change)
  • Alignment with Objectives: N/A (not needed for this Change)

Upgrade/compatibility impact

Nothing anticipated.

How To Test

The change owners plan to perform a mass rebuild in Copr to see if this breaks anything significantly. If it actually works as anticipated, they also plan to run some reproducibility tests and hopefully produce some statistics before and after this change.

Other packages can test by building their packages and verifying they still work as expected and no packaged files have higher mtimes than the last %changelog entry.

To verify that this change has landed, run rpm --eval '%clamp_mtime_to_source_date_epoch' on Fedora 38. The result should be 1.

User Experience

Users of Fedora Linux on their machines should not be impacted at all. Users who build RPM packages atop Fedora will be impacted by this change the same way Fedora is.

Dependencies

  • RPM needs to support this (it already does)
  • RPM needs to set $SOURCE_DATE_EPOCH (it already does)

Contingency Plan

  • Contingency mechanism: The change owners, redhat-rpm-config maintainers, or proven packagers will revert the change in redhat-rpm-config. That should be enough to undo everything, as the changes in Python should be dependent on that. If that is not enough, revert everything.
  • Contingency deadline: Ideally, we should do this before the Mass Rebuild. Technically, we can land it any time before the Beta Freeze, but it would not change all the packages, which is a bit messy.
  • Blocks release? No

Documentation

This page is the documentation.

Release Notes