From Fedora Project Wiki

Revision as of 17:48, 29 January 2010 by Toshio (fix typo)

Warning: This page is a draft only
It is still under construction and content may change. Do not rely on the information on this page.

A parallel-installable Python 3 stack has been added to Fedora 13. This requires us to update the python guidelines with information on how to package both python2 and python3 modules. This draft is meant to encompass a sane way to package Python 3 modules, generalize our python packaging rules to support more than one python runtime, and update the python2 guidelines since some things haven't been necessary since Fedora 3.

See the feature page: https://fedoraproject.org/wiki/Features/Python3F13
and also this thread: https://www.redhat.com/archives/fedora-devel-list/2009-October/msg00054.html

Note: Except where noted, this page will replace the existing Python packaging guidelines.

Addon Packages (python3 modules)

An rpm with a python prefix or suffix means a python2 rpm so we need a different prefix to denote python3 packages. For this, we use python3. We have two constraints that the python2 packages don't operate under:

  1. We need to be clear about these modules being for python3, so, unlike the python2 guidelines, there is no exception for packages that already have "py" in their names.
  2. Consumers of the packages need to be able to find them even if they don't know whether they're using the python2 or python3 version.

So all python3 modules MUST have python3 in their name. Other than that, the name must follow the same format as the python2 package. Some examples:

Fedora python 2 package | Upstream name | Proposed python 3 package name
python-lxml             | lxml          | python3-lxml
pygtk2                  | pygtk         | python3-pygtk
gstreamer-python        | gst-python    | gstreamer-python3
gnome-python2           | gnome-python  | gnome-python3
rpm-python              | (part of rpm) | rpm-python3
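
The pattern in the table above can be approximated in a few lines of Python. This is only a sketch of one heuristic that happens to reproduce the five examples; the function name python3_name and the rule "fall back to python3- plus the upstream name" are assumptions made for illustration, not part of the guidelines.

```python
import re

# Matches a standalone "python" or "python2" token in a package name,
# e.g. the prefix of "python-lxml" or the suffix of "gnome-python2".
_PY_TOKEN = re.compile(r"(?<![A-Za-z0-9])python2?(?![A-Za-z0-9])")


def python3_name(fedora_py2_name, upstream_name):
    """Guess a python3 package name (hypothetical helper, heuristic only)."""
    if _PY_TOKEN.search(fedora_py2_name):
        # Keep the existing format, swapping the python token for python3.
        return _PY_TOKEN.sub("python3", fedora_py2_name, count=1)
    # Assumption: names with no python token (such as pygtk2) get a
    # python3- prefix in front of the upstream name.
    return "python3-" + upstream_name


if __name__ == "__main__":
    for py2, upstream in [("python-lxml", "lxml"), ("pygtk2", "pygtk"),
                          ("gnome-python2", "gnome-python")]:
        print(py2, "->", python3_name(py2, upstream))
```

A real package name should still be reviewed by hand; the heuristic says nothing about packages whose python 3 name has to differ for other reasons.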


Multiple Python Runtimes

In Fedora we have multiple python runtimes, one for each supported major release.

Each runtime corresponds to a binary of the form /usr/bin/python$MAJOR.$MINOR

One of these python runtimes is the "system runtime". It can be identified by the destination of the symlink /usr/bin/python. Currently this is /usr/bin/python2.6

Note: Currently /usr/bin/python is actually a duplicate copy of the ELF file, rather than a symlink. This shouldn't cause any problems for packagers of python modules but we see this as [a bug] that needs fixing.

All python runtimes have a virtual provide for python(abi) = $MAJOR.$MINOR. For example, the python 3.1 runtime rpm has:

 $ rpm -q --provides python3 |grep -i abi
 python(abi) = 3.1

python modules using these runtimes should have a corresponding "Requires" line on the python runtime that they are used with. This is done automatically for files below /usr/lib[^/]*/python${PYVER}
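
The mapping from an installed path to a python(abi) requirement can be sketched in Python as follows. This is an illustration of the rule described above and not the contents of /usr/lib/rpm/pythondeps.sh; the function name abi_requirement is hypothetical.

```python
import re

# Files below /usr/lib[^/]*/python${PYVER} belong to the runtime ${PYVER}.
_PY_LIB_DIR = re.compile(r"^/usr/lib[^/]*/python(\d+\.\d+)/")


def abi_requirement(installed_path):
    """Return the python(abi) dependency string for a path, or None.

    Hypothetical helper: it only mirrors the path rule in the text.
    """
    match = _PY_LIB_DIR.match(installed_path)
    if match is None:
        # Not below a versioned python library directory, so no
        # automatic requirement would be derived from this path.
        return None
    return "python(abi) = " + match.group(1)


if __name__ == "__main__":
    print(abi_requirement("/usr/lib64/python3.1/site-packages/lxml/etree.so"))
    print(abi_requirement("/usr/share/foobar/helper.py"))
```

Files outside the versioned library directories, such as private modules under %{_datadir}, yield no requirement under this rule.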

Warning: The script /usr/lib/rpm/pythondeps.sh is what automatically emits "Requires" lines for files below /usr/lib[^/]*/python${PYVER}. The script needs reworking for python3. I've rewritten the script, but it isn't yet in our F13 rpm-build rpm. This is being tracked as [bug 532118].

Note: For Runtime Packagers
Unlike the Requires lines, the "Provides" for each runtime are manually entered into the specfile for each runtime. In theory /usr/lib/rpm/pythondeps.sh would also automatically generate "Provides" lines for the runtime, but in practice rpmbuild only invokes it for files in the rpm payload identified as "python" by the file utility, and the runtime is an ELF binary, not a python script, hence it isn't passed. It's simplest to manually supply the Provides line, rather than change these innards of rpmbuild. See [bug 532118].

BuildRequires

To build a package containing python2 files, you need to have

BuildRequires: python2-devel

Similarly, when building a package which ships python3 files, you need

BuildRequires: python3-devel

A package that has both python2 and python3 files will need to BuildRequire both.

Macros

In Fedora 12 and earlier and RHEL 5 and earlier, python2 packages that install python modules need to define the python_sitelib or python_sitearch macros, which give the directory that modules are installed in. This is not needed in Fedora 13 and later, or with python3 modules, as the macros are defined by rpm and the python3-devel package. To define them conditionally you can use this:

%if ! (0%{?fedora} > 12 || 0%{?rhel} > 5)
%{!?python_sitelib: %global python_sitelib %(%{__python} -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())")}
%{!?python_sitearch: %global python_sitearch %(%{__python} -c "from distutils.sysconfig import get_python_lib; print(get_python_lib(1))")}
%endif

Note that the %{!? [...]} construct alone would work without the check for fedora and rhel versions, but the explicit conditional documents when the entire stanza can be removed from the spec file.
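
To see what these macro definitions evaluate to, a rough Python equivalent of the two one-liners is sketched here. It uses the sysconfig module instead of distutils, because distutils is no longer shipped with Python 3.12 and later; that substitution and the function name site_dirs are assumptions made for this sketch, not something the guidelines prescribe.

```python
import sysconfig


def site_dirs():
    """Return (sitelib, sitearch) for the interpreter running this code.

    "purelib" plays the role of get_python_lib() and "platlib" the role
    of get_python_lib(1) in the macro definitions.
    """
    return sysconfig.get_path("purelib"), sysconfig.get_path("platlib")


if __name__ == "__main__":
    sitelib, sitearch = site_dirs()
    print("sitelib: ", sitelib)
    print("sitearch:", sitearch)
```

Running it under each installed interpreter shows why the values differ per runtime and, on multilib systems, per architecture.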

In Fedora 13 and greater, the following macros are defined for you:

Macro            | Normal Definition
__python         | /usr/bin/python
__python3        | /usr/bin/python3
python_sitelib   | /usr/lib/python2.X/site-packages, where pure python2 modules are installed
python_sitearch  | /usr/lib64/python2.X/site-packages on x86_64, /usr/lib/python2.X/site-packages on x86, where python2 extension modules that are compiled C are installed
python3_sitelib  | /usr/lib/python3.X/site-packages, where pure python3 modules are installed
python3_sitearch | /usr/lib64/python3.X/site-packages on x86_64, /usr/lib/python3.X/site-packages on x86, where python3 extension modules that are compiled C are installed

During %install or when listing %files you can use the python_sitearch and python_sitelib macros to specify where the installed modules are to be found. For instance:

%files
%defattr(-,root,root,-)
# A pure python2 module
%{python_sitelib}/foomodule/
# A compiled python2 extension module
%{python_sitearch}/barmodule/
# A compiled python3 extension module
%{python3_sitearch}/bazmodule/

Using the macros has several benefits.

  1. It ensures that the packages are installed correctly on multilib architectures.
  2. Using these macros instead of hardcoding the directory in the specfile ensures your spec remains compatible with the installed python version even if the directory structure changes radically (for instance, if python_sitelib moves into %{_datadir})

Byte Compiling

Warning: brp-python-bytecompile status
The script that does automatic byte compilation is currently broken in two ways. 1) It embeds the wrong path in the .pyc files, which can show up in tracebacks in some circumstances. 2) On SyntaxError, the script stops, which means that not all the files may be byte compiled (or byte compiled into both .pyc and .pyo files). [bug]

When byte compiling a .py file, python embeds a magic number in the byte compiled file that corresponds to the runtime. Files in %{python_sitelib} and %{python_sitearch} must correspond to the runtime for which they were built. For instance, a pure python module compiled for the 3.1 runtime needs to be below %{_usr}/lib/python3.1/site-packages.

Normally, this is done for you by the brp-python-bytecompile script. This script runs after the %install section of the spec file has been processed and byte-compiles any .py files that it finds (this recompilation puts the proper filesystem paths into the modules; otherwise tracebacks would include the %{buildroot} in them). The script determines which interpreter to byte compile the module with by following these steps:

  1. What directory is the module installed in? If it's below /usr/lib/pythonX.Y, then pythonX.Y is used to byte compile the module. If pythonX.Y is not installed, the rpm build process exits with an error, so remember to BuildRequire the proper python package.
  2. Otherwise, the interpreter defined in %{__python} is used to compile the modules. This defaults to the latest python2 version on Fedora. If you need to compile the module for python3, set it to /usr/bin/python3 instead, like this:
    %global __python %{__python3}
    

    This step is useful when you have a python3 application that's installing a private module into its own directory. For instance, if the foobar application installs a module for use only by the command line application in %{_datadir}/foobar. Since these files are not in one of the python3 library paths (like /usr/lib/python3.1) you have to set %{__python} manually to tell brp-python-bytecompile what python interpreter to byte compile for.
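
The two steps above can be sketched as a small decision function. This is an illustration of the described behaviour and not the brp-python-bytecompile script itself; the function name choose_interpreter is hypothetical, and matching lib64 as well as lib is an assumption.

```python
import re

# Assumption: versioned library directories may be /usr/lib or /usr/lib64.
_VERSIONED_DIR = re.compile(r"^/usr/lib[^/]*/(python\d+\.\d+)/")


def choose_interpreter(installed_path, default_python="/usr/bin/python"):
    """Pick the interpreter used to byte compile one installed .py file.

    default_python stands in for the value of the __python macro.
    """
    match = _VERSIONED_DIR.match(installed_path)
    if match:
        # Step 1: the directory name selects the matching runtime.
        return "/usr/bin/" + match.group(1)
    # Step 2: anything else falls back to the __python setting.
    return default_python


if __name__ == "__main__":
    print(choose_interpreter("/usr/lib/python3.1/site-packages/foo.py"))
    print(choose_interpreter("/usr/share/foobar/private.py"))
    print(choose_interpreter("/usr/share/foobar/private.py",
                             default_python="/usr/bin/python3"))
```

The last call corresponds to a spec file that redefines __python for a python3 application with a private module directory.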

These settings are enough to properly byte compile any package that only builds python modules (in %{python_sitelib} or %{python_sitearch}) or builds for only a single python interpreter. However, if the application you're packaging needs to build with both python2 and python3 and install into a private module directory (perhaps because it provides one utility written in python2 and a second utility written in python3) then you need to do this manually. Here's a sample spec file snippet that shows what to do:

# Turn off the brp-python-bytecompile script
%global __os_install_post %(echo '%{__os_install_post}' | sed -e 's!/usr/lib[^[:space:]]*/brp-python-bytecompile!!g')
# Buildrequire both python2 and python3
BuildRequires: python2-devel python3-devel
[...]

%install
# Installs a python2 private module into %{buildroot}%{_datadir}/mypackage/foo
# and installs a python3 private module into %{buildroot}%{_datadir}/mypackage/bar
make install DESTDIR=%{buildroot}

# Manually invoke the python byte compile macro for each path that needs byte
# compilation.
%{py_byte_compile} %{__python} %{buildroot}%{_datadir}/mypackage/foo
%{py_byte_compile} %{__python3} %{buildroot}%{_datadir}/mypackage/bar
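
What a manual byte compilation step has to achieve can be sketched with the standard compileall module. This is an illustration only and not the definition of the py_byte_compile macro; the function name byte_compile_tree is hypothetical. The script would have to be run by the interpreter whose bytecode is wanted, since each runtime writes its own magic number.

```python
import compileall
import os


def byte_compile_tree(buildroot, installed_dir):
    """Byte compile every .py file that was installed under buildroot.

    installed_dir is the final path on the target system, for example
    /usr/share/mypackage/bar. Passing it as ddir records that path in
    the compiled files instead of the temporary buildroot location.
    """
    real_dir = os.path.join(buildroot, installed_dir.lstrip("/"))
    # compile_dir returns a true value only if every file compiled.
    return bool(compileall.compile_dir(real_dir, ddir=installed_dir, quiet=1))


if __name__ == "__main__":
    import sys
    # Example: python3 thisscript.py /path/to/buildroot /usr/share/mypackage/bar
    ok = byte_compile_tree(sys.argv[1], sys.argv[2])
    sys.exit(0 if ok else 1)
```

Exiting non-zero on failure makes a syntax error in one file stop the build visibly rather than leaving some files uncompiled.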

Common SRPM vs split SRPMs

Many times when you package a python module you will want to create a module for python2 and a module for python3. There are two ways of doing this: either from a single SRPM or from multiple SRPMs. The rule for choosing the method is simple: if the python2 and python3 modules are distributed as a single tarball (often a single directory of source where the /usr/bin/2to3 program is used to transform the code from python2 to python3 at build time) then you must package them as subpackages of a single SRPM. If they come in multiple tarballs then package them from multiple SRPMs.

Split/separate SRPMs: a src.rpm for python- and another for python3-

Given package python-foo in packaging CVS, there would be a separate python3-foo for the python 3 version. There would be no expectation that the two would need to upgrade in lock-step. (The two SRPMS could have different maintainers within Fedora: the packager of a python 2 module might not yet have any interest in python 3)

Example: python3-setuptools https://bugzilla.redhat.com/show_bug.cgi?id=531648 (simple adaptation of python-setuptools, apparently without needing an invocation of 2to3)

Advantages:

  • if the python-foo maintainer doesn't care about python 3, he/she doesn't need to
  • the two specfiles can evolve separately; if 2 and 3 need to have different versions, they can

Disadvantages:

  • the two specfiles have to be maintained separately
  • when upstream release e.g. security fixes, they have to be tracked in two places

Single shared SRPM emitting both python- and python3- subpackages

Method

  • Use the -n syntax to emit a python3-foo subpackage from a python-foo build.
  • Towards the end of the %prep phase, copy the code to a parallel subdirectory, and invoke 2to3 --write --nobackups . upon it
Note: Use "--write --nobackups" when invoking 2to3
You need the "--write" option to make 2to3 actually change the files, and "--nobackups" to avoid leaving foo.py.bak droppings, which otherwise would likely make it into the final package payload.
Tip: Run 2to3 on the correct directory
If your specfile runs 2to3 on the code, make sure you are running it on the full tree. A common mistake here for distutils packages has been to run it on the directory below setup.py, missing the setup.py file, leading to errors when python3 tries to execute setup.py.
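
Both pieces of advice can be captured in a small guard, sketched here in Python. It only assembles the command line and refuses a directory that lacks setup.py; the function name two_to_three_command and the setup.py check are assumptions made for illustration, and the check only makes sense for distutils-style packages.

```python
import os


def two_to_three_command(source_tree):
    """Build the 2to3 command line for a whole source tree.

    Raises ValueError when source_tree does not contain setup.py, which
    usually means the conversion was pointed one directory too deep.
    """
    if not os.path.isfile(os.path.join(source_tree, "setup.py")):
        raise ValueError("no setup.py in %s: wrong directory?" % source_tree)
    # --write changes the files in place, --nobackups avoids *.bak files.
    return ["2to3", "--write", "--nobackups", source_tree]


if __name__ == "__main__":
    import sys
    print(" ".join(two_to_three_command(sys.argv[1])))
```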

Examples:

Advantages:

  • single src.rpm and build; avoid having to update multiple packages when things change.

Disadvantages:

  • The Fedora maintainer needs to care about python 3. By adding python 3 to the mix, we're giving them extra work.
  • 2 and 3 versions are in lockstep. Requires upstream to care about Python 3 as well (or about Python 2, for that matter).
  • Bugzilla components are set up by source RPM, so they would have a single shared bugzilla component. This could be confusing to end-users, as it would be more difficult to figure out e.g. that a bug with python3-foo needs to be filed against python-foo. There's a similar problem with checking out package sources from CVS, though this is less serious as it doesn't affect end-users so much.

When should we have two split SRPMs vs one shared, and vice versa?

The easy case is when upstream release separate tarballs for the python 2 and python 3 versions of code. In that case, it makes sense to follow upstream and have separate specfiles, separate source rpms, etc.

The more difficult case is when the python module is emitted as part of the build of a larger module.

One case is for an extension module giving python bindings for a library built within the larger rpm. Some examples:

I believe the ideal here is to patch the code so that it will build against both python versions, then take a copy of the sources during the %prep phase, and configure one subdirectory to build against python 2, another to build against python 3.

Guidelines for adding python3 subpackages to an existing package

Provide a with_python3 conditional

All parts of the build relating to python3 should be conditionalized, to make it easy to turn off the python3 build when tracking down problems.

You should add this fragment to the top of the spec file:

%if 0%{?fedora} > 12
%global with_python3 1
%endif

Rationale: we should consistently use "with_python3". The conditionals also make it easy to use the same spec for RHEL and for branches other than devel.

Once python 3 support has been added to a package, you must leave it enabled. End users could be using the python3 subpackage that is being built and turning the subpackage build on and off will cause the package to unexpectedly disappear from the repos. You should only turn off with_python3 as a debugging measure within scratch builds, or for releases that do not support python 3.

All usage of this macro should look like this:

%if 0%{?with_python3}
...
%endif # with_python3

This way the code will be disabled if the macro is not defined, and it is easy to visually match if/endif pairs

Separate python 2 and python 3 build directories

The python 2 and python 3 build should be as independent as possible.

You should use the %{py3dir} macro to specify the location of the python 3 build directory, so that the python 2 sources (e.g. %{_builddir}/Foo-1.0/) are entirely independent from the python 3 sources (e.g. %{_builddir}/python3-foo-1.0-1.fc13/).

The %{py3dir} macro is defined for you in python3-devel (from 3.1.1-19.fc13 onwards) in /etc/rpm/macros.python3 as:

%py3dir %{_builddir}/python3-%{name}-%{version}-%{release}

so you should not define it yourself.

The %prep phase

The %prep phase of the build should prepare an entirely distinct source tree for the python3 build in the py3dir.

A recommended way to do this is to add this to the %prep section:

%prep

%if 0%{?with_python3}
rm -rf %{py3dir}
cp -a . %{py3dir}
%endif # with_python3

Make sure that you are copying the correct code. The above code assumes that you are at the top of the sources directory (typically the "Foo-1.0" directory within the build directory). If the %prep has changed directory you will need to change back to the tarball location.
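
The effect of the rm and cp pair can be mirrored in Python to show why the removal comes first. This is a sketch under the assumption that shutil.copytree with symlinks=True is a close enough stand-in for cp -a; the function name make_py3_tree is hypothetical.

```python
import os
import shutil


def make_py3_tree(source_dir, py3dir):
    """Create a fresh copy of source_dir at py3dir.

    Removing py3dir first matters: leftovers from an earlier build
    would otherwise be mixed into the python 3 source tree.
    """
    shutil.rmtree(py3dir, ignore_errors=True)
    # symlinks=True copies symlinks as symlinks, as cp -a does.
    shutil.copytree(source_dir, py3dir, symlinks=True)
    return py3dir


if __name__ == "__main__":
    import sys
    print(make_py3_tree(sys.argv[1], sys.argv[2]))
```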

If your package needs patches, apply the patches relevant to both python 2 and python 3 before creating the python 3 source tree, structuring your %prep like this:

%setup

# Apply patches relevant to both python 2 and python 3:
%patch0
%patch1
...

# Create source tree for python3 build (remove any stale copy first):
%if 0%{?with_python3}
rm -rf %{py3dir}
cp -a . %{py3dir}
%endif # with_python3

Note: Avoid version specific patches
Since you have both a python2 and a python3 directory you might be tempted to patch each one separately. Resist! Upstream for your package has chosen to distribute a single source tree that builds for both python2 and python3. For your patches to get into upstream, you need to write patches that work with both as well.

rpmbuild resets the directory at the end of each phase, so you don't need to restore the directory at the end of %prep.

Other phases

For each of the %build, %check and %install phases, you should copy the existing code, wrapping it with a pushd/popd of %{py3dir}, and convert all macro references:

  • from %{__python} to %{__python3},
  • %{python_sitelib} to %{python3_sitelib} and
  • %{python_sitearch} to %{python3_sitearch}.

For example, this %build section:

CFLAGS="$RPM_OPT_FLAGS" %{__python} setup.py build

should become:

# Python 2:
CFLAGS="$RPM_OPT_FLAGS" %{__python} setup.py build

# Python 3:
%if 0%{?with_python3}
pushd %{py3dir}
CFLAGS="$RPM_OPT_FLAGS" %{__python3} setup.py build
popd
%endif # with_python3

so that the python 2 and python 3 versions of the code line up vertically, making it easier to see differences. The usage of pushd/popd commands will ensure that the directories are logged.

Rationale: it's not easily possible to turn this into a loop (FIXME: is it?) due to the macro differences, so we must unroll the loop and repeat ourselves.

Avoiding collisions between the python 2 and python 3 stacks

The python 2 and python 3 stacks are intended to be fully-installable in parallel. When generalizing the package for both python 2 and python 3, it is important to ensure that two different built packages do not attempt to place different payloads into the same path.

Executables in /usr/bin

The problem

Many existing python packages install executables into /usr/bin.

For example, if a setup.py shared between the python 2 and python 3 builds has a console_scripts entry, both builds will generate the same files in /usr/bin/, and these will collide.

For example python-coverage has a setup.py that contains:

    entry_points = {
        'console_scripts': [
            'coverage = coverage:main',
            ]
        },

which thus generates a /usr/bin/coverage executable (this is a python script that runs another python script whilst generating code-coverage information on the latter).

Similarly for the 'scripts' clause; see e.g. python-pygments: Pygments-1.1.1/setup.py has:

    scripts = ['pygmentize'],

which generates a /usr/bin/pygmentize (this is a python script that leverages the pygments syntax-highlighting module, giving a simple command-line interface for generating syntax-highlighted files)

Guidelines

If the executables provide the same functionality independent of whether they are run on top of Python 2 or Python 3, then only one version of the executable should be packaged. Currently it will be the python 2 implementation, but once the Python 3 implementation is proven to work, the executable can be retired from the python 2 build and enabled in the python 3 package. Be sure to test the new implementation. FOR DISCUSSION: how do we do the transition period?

Examples of this:

  • /usr/bin/pygmentize ought to generate the same output regardless of whether it's implemented via Python 2 or Python 3, so only one version needs to be shipped.

If the executables provide different functionality for Python 2 and Python 3, then both versions should be packaged.

Examples of this:

  • /usr/bin/coverage runs a python script, augmenting the interpreter with code-coverage information. Given that the interpreter itself is the thing being worked with, it's reasonable to package both versions of the executable.
  • /usr/bin/bpython augments the interpreter with a "curses" interface. Again, it's reasonable to package both versions of this.
  • /usr/bin/easy_install installs a module into one of the Python runtimes: we need a version for each runtime.

As an exception, for the rpms that are part of a python runtime itself, we plan to package both versions of the executables, so that e.g. both the python 2 and python 3 versions of 2to3 are packaged.

Naming

Many executables already contain a "-MAJOR.MINOR" suffix, for example /usr/bin/easy_install-3.1. These obviously can be used as-is, as they won't conflict.

For other executables, the general rule is:

  • if only one executable is to be shipped, then it owns its own slot
  • if executables are to be shipped for both python 2 and python 3, then the python 3 version of the executable gains a python3- prefix. For example, the python 2 version of "coverage" remains /usr/bin/coverage and the python 3 version is /usr/bin/python3-coverage.
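
The naming rules above can be summarised in a short Python sketch. It is an illustration only; the function name python3_executable_name and the regular expression used to detect a "-MAJOR.MINOR" suffix are assumptions.

```python
import re

# Names such as easy_install-3.1 already carry the runtime version.
_VERSION_SUFFIX = re.compile(r"-\d+\.\d+$")


def python3_executable_name(name, shipped_for_both):
    """Return the file name the python 3 build should install in /usr/bin.

    shipped_for_both says whether python 2 also ships this executable.
    """
    if _VERSION_SUFFIX.search(name):
        # Versioned names cannot conflict, so they are used as-is.
        return name
    if not shipped_for_both:
        # Only one implementation is shipped: it owns the plain name.
        return name
    return "python3-" + name


if __name__ == "__main__":
    print(python3_executable_name("coverage", True))
    print(python3_executable_name("easy_install-3.1", True))
    print(python3_executable_name("pygmentize", False))
```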

See this thread for a discussion of this.

Best Practices

Recommended best-practices for keeping python 2 and python 3 in sync:

  • when packaging a module for python 3, you should approach the python 2 package owners.
  • if the python 2 and python 3 modules have separate maintainers, each maintainer should request a watchbugzilla and watchcommit on the other's package
  • complete any python 2 Merge Review before doing a python 3 version
  • add link to the python 2 Merge Review/Package Review to the python 3 Package Review
  • if you need to run 2to3 to fix code, use the /usr/bin/2to3 from the python-tools rpm, rather than /usr/bin/python3-2to3 from the python3-tools rpm (rationale: 2to3 is the standard upstream name for this tool).
  • if 2to3 runs into a problem, please file a bug. Please try to isolate a minimal test case that reproduces the problem when doing so.
  • remember to test the built RPMs and verify that they actually work!
  • if you are requesting that a package switch from Python 2 to Python 3 for its Python implementation, please provide supporting material (e.g. a list of tests performed, and their outcome). Simply getting a package to build against Python 3 is no guarantee that the package's functionality still works.

Anti best Practices

Warning: You shouldn't rely on INSTALLED_FILES, as that will not list directories, which will need to be specified in the %files section as well. Using globs in the %files section is safer.

TODO

These items need to be addressed before the Guidelines can be brought to the Packaging Committee

  • Should get bug 532118 addressed so the Requires: python(abi) is automatically extracted. Then we can remove the warning from the "Multiple Python Runtimes" section. If it can't be done we'll need to tell people to manually specify "python(abi) = %{py3ver}" and define py3ver in the python3-devel package.
  • Must get bug 558997 fixed so that byte compilation does not stop midway through if there's a SyntaxError
  • [done] brp-python-bytecompile updated
  • [done] py_byte_compile macro to do manual byte compilation is in python3-devel as of python3-devel-3.1.1-21.fc13
  • Approve the Naming Guidelines rewrite
  • [done] it may be possible to install a common py3dir definition within /etc/rpm/macros.python3; we are researching whether this is technically feasible. If it is, then that macro should simply be used.

Notes

  • we considered a with_python2 macro, but for now it's simpler to omit it
  • we considered supplying: __python2, python2_sitelib and python2_sitearch in addition to the existing macros and gradually shift to the former, but for now it's simplest to keep using the existing macros
  • re /usr/bin paths, we considered that the python 2 version could gain a python2- prefix, and have the main path becoming a symlink to the python2- version, but that's not necessary for now; we can look at that in the future
  • we will probably eventually need a retirement process for backing out python3 subpackages; I don't see that as a blocker for these guidelines (dmalcolm)
  • Dave Malcolm has written a tool to aid in the creation of split SRPM python3 packages. It generates a python3-foo.spec from a python-foo.spec; see http://dmalcolm.fedorapeople.org/python3-packaging/rpm2to3.py However, that tool was written in November 2009 and is not up-to-date with the latest version of these guidelines. (He does not regard finishing the tool as a blocker for having these guidelines approved).