From Fedora Project Wiki

Revision as of 12:35, 7 September 2020


Python: Optional Bytecode Cache

Summary

The Python standard library bytecode cache files (e.g. /usr/lib64/python3.9/.../__pycache__/*.pyc) will be moved from the python3-libs package to three new optional subpackages (split by optimization level). The non-optimized bytecode cache will be recommended by python3-libs and installed by default, but it will be removable. The bytecode cache for optimization levels 1 and 2 will not be recommended (and hence will not be installed by default) but will be installable. The default SELinux policy will be adapted not to audit AVC denials when Python tries to create the bytecode cache at runtime. This will save 8.89 MiB of disk space on default installations, or 17.12 MiB on minimal installations (by opting out of the recommended subpackage with the non-optimized bytecode cache). When all three new packages are installed, the size will increase slightly over the status quo (by 4.5 MiB).

Owner

Current status

  • Targeted release: Fedora 34
  • Last updated: 2020-09-07
  • FESCo issue: <will be assigned by the Wrangler>
  • Tracker bug: <will be assigned by the Wrangler>
  • Release notes tracker: <will be assigned by the Wrangler>

Detailed Description

What is the Python bytecode cache

When Python code is interpreted, it is compiled to Python bytecode. When a pure Python module is imported for the first time, the compiled bytecode is serialized and cached to a .pyc file located in the __pycache__ directory next to the .py source. Subsequent imports use the cache directly, until it is invalidated (for example when the .py source is edited and its mtime stamp is bumped) -- at that point, the cache is updated. This behavior is explained in detail in PEP 3147. The invalidation is described in PEP 552.
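The invalidation data lives in the first 16 bytes of each .pyc file. The following is a minimal sketch, assuming the header layout from PEP 552 (magic number, flags, then either the source mtime and size or a source hash). The helper names read_pyc_header and demo are hypothetical and only serve this illustration.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def read_pyc_header(pyc_path):
    """Decode the 16-byte .pyc header described in PEP 552."""
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    magic = header[:4]
    (flags,) = struct.unpack("<I", header[4:8])
    if flags & 0b01:
        # Hash-based cache: the remaining 8 bytes are a hash of the source.
        return {
            "magic": magic,
            "hash_based": True,
            "check_source": bool(flags & 0b10),
            "source_hash": header[8:16],
        }
    # Timestamp-based cache: source mtime and source size, 4 bytes each.
    mtime, size = struct.unpack("<II", header[8:16])
    return {"magic": magic, "hash_based": False, "mtime": mtime, "size": size}


def demo():
    """Compile a throwaway module and return its header plus the source stat."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")
        with open(src, "w") as f:
            f.write("VALUE = 42\n")
        # Request timestamp invalidation explicitly so the result does not
        # depend on the SOURCE_DATE_EPOCH environment variable.
        pyc = py_compile.compile(
            src,
            doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        return read_pyc_header(pyc), os.stat(src)


header, source_stat = demo()
print("magic matches interpreter:", header["magic"] == importlib.util.MAGIC_NUMBER)
print("cached mtime:", header["mtime"], "source mtime:", int(source_stat.st_mtime))
```

When the mtime or size recorded in the header no longer matches the source file, the import system treats the cache as stale and recompiles.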

Python can operate at three optimization levels: 0, 1 and 2. The default level is 0. Invoking Python with the -O command line option sets the level to 1, and -OO sets it to 2. The bytecode cache for each optimization level is saved under a different filename, as described in PEP 488.

As an example, a Python module located at /path/to/basename.py will have bytecode cache files for CPython 3.9 stored as:

  • /path/to/__pycache__/basename.cpython-39.pyc for the non-optimized bytecode
  • /path/to/__pycache__/basename.cpython-39.opt-1.pyc for optimization level 1
  • /path/to/__pycache__/basename.cpython-39.opt-2.pyc for optimization level 2
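The naming scheme above can be reproduced with the standard library's importlib.util.cache_from_source. This is a small sketch; the helper name cache_paths is hypothetical, and the cache tag in the output (cpython-39 in the list above) follows whichever interpreter runs the code.

```python
import importlib.util
import os
import sys


def cache_paths(source_path):
    """Return the bytecode cache path for optimization levels 0, 1 and 2."""
    # An empty string selects the non-optimized name (no ".opt-N" suffix).
    return {
        level: importlib.util.cache_from_source(source_path, optimization=opt)
        for level, opt in ((0, ""), (1, 1), (2, 2))
    }


print("cache tag:", sys.implementation.cache_tag)
for level, path in cache_paths("/path/to/basename.py").items():
    print(level, os.path.basename(path), "in", os.path.dirname(path))
```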

Python bytecode cache in RPM packages (status quo)

Pure Python modules shipped in RPM packages (notably those shipped through the python3-libs package) are located at paths not writable by regular users, under /usr/lib(64)/python3.9/, so the bytecode cache has to live in those non-writable locations as well. To work around this problem, the bytecode cache is pre-compiled when RPM packages are built, and python3-libs ships and owns the sources as well as the bytecode cache:

$ rpm -ql python3-libs
...
/usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-1.pyc
/usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-2.pyc
/usr/lib64/python3.9/__pycache__/ast.cpython-39.pyc
...
/usr/lib64/python3.9/ast.py
...

As a result, the package is quite big: it essentially ships every pure Python module four times (the source plus three bytecode cache files).
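To get a rough feel for this on a given system, one could tally file sizes by kind under the standard library directory. The sketch below is illustrative only: the function name size_by_kind and the choice to skip site-packages are assumptions, and it sums apparent sizes per file name, so files that share an inode are counted once for each name.

```python
import os
import sysconfig
from collections import defaultdict


def size_by_kind(root, skip=("site-packages",)):
    """Sum file sizes under root for sources and per-level bytecode caches."""
    totals = defaultdict(int)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune directories that belong to other packages (an assumption).
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            if name.endswith(".py"):
                kind = "source"
            elif name.endswith(".opt-1.pyc"):
                kind = "opt-1"
            elif name.endswith(".opt-2.pyc"):
                kind = "opt-2"
            elif name.endswith(".pyc"):
                kind = "opt-0"
            else:
                continue
            totals[kind] += os.lstat(os.path.join(dirpath, name)).st_size
    return dict(totals)


for kind, size in sorted(size_by_kind(sysconfig.get_path("stdlib")).items()):
    print(f"{kind:>7}: {size / 2**20:.2f} MiB")
```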

Depending on the module content, its bytecode cache files might be identical across optimization levels. In such cases, the files are hardlinked to reduce the bloat:

$ ls -1i /usr/lib64/python3.9/collections/__pycache__/abc.*pyc
8634 /usr/lib64/python3.9/collections/__pycache__/abc.cpython-39.opt-1.pyc
8634 /usr/lib64/python3.9/collections/__pycache__/abc.cpython-39.opt-2.pyc
8634 /usr/lib64/python3.9/collections/__pycache__/abc.cpython-39.pyc

This is, however, not possible for all the modules from python3-libs:

$ ls -1i /usr/lib64/python3.9/__pycache__/ast.*pyc
8438 /usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-1.pyc
8440 /usr/lib64/python3.9/__pycache__/ast.cpython-39.opt-2.pyc
8441 /usr/lib64/python3.9/__pycache__/ast.cpython-39.pyc
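The files differ whenever an optimization level actually changes the compiled code: level 1 drops assert statements and level 2 additionally drops docstrings. The sketch below compiles a throwaway module at all three levels and compares content digests. The names compile_all_levels and demo are hypothetical, and this is not necessarily how the RPM build detects identical files.

```python
import hashlib
import os
import py_compile
import tempfile


def compile_all_levels(source_path):
    """Compile source_path at levels 0, 1 and 2; return a digest per level."""
    digests = {}
    for level in (0, 1, 2):
        pyc = py_compile.compile(
            source_path,
            doraise=True,
            optimize=level,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        with open(pyc, "rb") as f:
            digests[level] = hashlib.sha256(f.read()).hexdigest()
    return digests


def demo():
    """Build a module with a docstring and an assert, then compare levels."""
    source = (
        '"""Module docstring (removed at level 2)."""\n'
        "\n"
        "def check(x):\n"
        "    assert x > 0  # removed at levels 1 and 2\n"
        "    return x\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "documented.py")
        with open(src, "w") as f:
            f.write(source)
        return compile_all_levels(src)


digests = demo()
print("level 0 differs from level 1:", digests[0] != digests[1])
print("level 1 differs from level 2:", digests[1] != digests[2])
```

A module without docstrings, assert statements or __debug__ checks gives the optimizer nothing to remove, which is the situation in which the cache files can end up identical and be hardlinked.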

Feedback

Benefit to Fedora

Scope

  • Proposal owners:
  • Other developers: N/A (not a System Wide Change)
  • Policies and guidelines: N/A (not a System Wide Change)
  • Trademark approval: N/A (not needed for this Change)
  • Alignment with Objectives:

Upgrade/compatibility impact

N/A (not a System Wide Change)

How To Test

N/A (not a System Wide Change)

User Experience

Dependencies

N/A (not a System Wide Change)

Contingency Plan

  • Contingency mechanism: (What to do? Who will do it?) N/A (not a System Wide Change)
  • Contingency deadline: N/A (not a System Wide Change)
  • Blocks release? N/A (not a System Wide Change), Yes/No
  • Blocks product? product

Documentation

N/A (not a System Wide Change)

Release Notes