At Flock 2013 in Charleston, SC we met to discuss various ways in which the Python Guidelines should be updated in light of the changes happening to upstream packaging standards, tools, and the increasing push to use python3. These are the notes from that discussion.
- 1 Wheel: the new upstream distribution format
- 2 Shebang lines
- 3 Parallel Python2 and Python3 stack
- 4 Naming of python modules and subpackages
- 5 pypy
- 6 Tangent: SCL - Collections
Wheel: the new upstream distribution format
Wheels have more metadata so it becomes more feasible to automatically generate spec files from the upstream metadata. In Fedora we'd use wheels like this:
- Use the tarball from pypi, not the wheel.
- In %prep, unpack the tarball
- In %build create a wheel with something like pip wheel --no-deps $(pwd).
- This may create a .whl file or an unpacked wheel. Either one can be used in the next step
- In %install, use something like pip install wheel --installdir to install the wheel. It gets installed onto the system in different FHS compliant dirs:
- These dirs are encoded in a pip (or python3 stdlib) config file.
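As a rough illustration of where such directories come from, the sketch below asks the Python 3 standard library for its default install scheme. The notes do not say which config file pip consults, so treating the `sysconfig` module as the source is an assumption, and `install_dirs` is a hypothetical helper name.

```python
import sysconfig


def install_dirs():
    """Return the directories of the interpreter's default install scheme."""
    paths = sysconfig.get_paths()
    # purelib: pure-Python modules, platlib: compiled extension modules,
    # scripts: executables, data: everything else.
    return {key: paths[key] for key in ("purelib", "platlib", "scripts", "data")}


for key, directory in install_dirs().items():
    print(key, directory)
```

On a distribution-built interpreter these values reflect whatever the distribution patched in, which is why the output differs between Fedora's python and an upstream build.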
Installing wheels creates a "metadata store" (distinfo directory) so we would want to install using the wheel package that we build so that this directory is fully installed. This way pip knows about everything that's installed via system packages.
- setup.py install => will only play nice with the distinfo data in certain cases. So most of the time we want to convert to wheel building.
- If the package can't be built as a wheel then whether distinfo is created depends on how setup.py builds:
  - If setuptools is used in setup.py, distinfo is created when a special command line flag is used.
  - If setuptools is not used, distinfo likely will not be created.
- pip always uses setuptools to install (even if distutils is used in the setup.py) so it will always create distinfo metadata.
- With pip wheel we can use a single directory. No need to copy to a second directory anymore.
- pip wheel (build) will clean the build artifacts automatically.
- We will no longer need egginfo files and dirs (if distinfo is installed)
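To make the "metadata store" idea concrete, here is a minimal sketch of reading installed metadata from Python. It uses `importlib.metadata`, which was added to the standard library well after these notes were written (Python 3.8) and reads both distinfo and egginfo directories. The helper names are hypothetical.

```python
from importlib import metadata


def installed_distributions():
    """Map each installed distribution name to its version string."""
    found = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            # Skip directories whose metadata cannot be parsed.
            continue
        if name:
            found[name] = version
    return found


def recorded_files(name):
    """List the files a distribution recorded at install time, if any."""
    dist = metadata.distribution(name)  # raises PackageNotFoundError if absent
    return [str(path) for path in (dist.files or [])]


print(len(installed_distributions()), "distributions have metadata installed")
```

A system package that ships its distinfo directory shows up in this listing, which is the visibility the notes want pip to have.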
pip-1.5 due out by end of year (?Not sure why this was important... it brought a new feature but I don't remember what that was?)
Upgrading to Metadata 2.0 will be an automatic thing if we build and install from wheels. METADATA-2.0 will be able to answer "This package installs these python modules". The timeframe for this is pip-1.6 which is due out the middle of next year. (Hopefully f22).
pyp2rpm from slavek may be able to use Metadata 2.0 to generate nearly complete spec files.
Should we depend on both pip and setuptools explicitly?
In the guidelines, BuildRequire both, because upstream pip may make this an optional feature and we may or may not put that requirement into the pip package.
Metadata 2.0 for non-wheels
For automake and other ways of creating packages, we want to install a distinfo directory. Currently, the upstream may be generating and installing egg-info. If so, this could just be updated to provide distinfo instead. If the upstream doesn't provide egg-info now, we aren't losing anything by not generating distinfo (ie: things that didn't work before because they lacked metadata will simply continue not to work).
It might be nice to get generation of the metadata into upstream automake itself but someone would have to commit to doing that. We probably don't need to get generation of wheels into upstream automake because wheels are a distribution format, not an install format.
Shebang lines
We agreed that we want to convert shebang lines to /usr/bin/python2 and /usr/bin/python3 (and drop usage of /usr/bin/python).
FPC ticket has been opened already -- hashing out an implementation on that ticket. Something that may help is checking that the shebang line on pip itself is /usr/bin/python2... if we change that to /usr/bin/python2 it should affect everything that it installs (Need to check this)
- May need to use some pip command line option to have scripts installed the setup.py script target install (?not sure what this note was meant to mean?)
Parallel Python2 and Python3 stack
Notes to packagers who need to port
Packagers can help upstreams port their code to python3. Here are some hints to help them:
from __future__ import unicode_literals is almost certainly a bad thing for several reasons:
- Some things should be the native string type. Attribute names on objects, for instance.
- If you are in the frame of mind that you are reading python2 code, then you may be surprised when a bare literal string returns unicode. The from __future__ import unicode_literals occurs at the top of the file while the strings themselves are spread throughout. When you get a traceback and go to look at the code you will almost certainly jump down to the line the traceback is on and may well miss the unicode_literals line at the top.
Some programs and command line switches help migrate:
- python-unicodenazi package provides a module that will help catch mixing byte str and unicode string. These mixtures are almost certainly illegal in python3.
- python2 -b -- turns off automatic conversion of byte str and unicode string so that you get a warning or an error when you mix bytes and unicode strings.
- python-modernize -- attempts to convert your code to a subset of python2 that runs on python3.
- 2to3 -- (when run in non-overwrite mode, it will simply tell you what things need to be changed).
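The kind of mixture these tools look for can be shown in a few lines. This is a minimal sketch meant to be run under python3. The function names are made up for illustration, and decoding as UTF-8 is an assumption about the data.

```python
def concat(left, right):
    """Concatenate two values that are expected to share a string type."""
    return left + right


def safe_concat(left, right, encoding="utf-8"):
    """Decode bytes at the boundary so that only text is combined."""
    if isinstance(left, bytes):
        left = left.decode(encoding)
    if isinstance(right, bytes):
        right = right.decode(encoding)
    return left + right


try:
    concat(b"/usr/bin/", "python3")  # python2 accepts this, python3 does not
except TypeError as exc:
    print("python3 refuses to mix bytes and text:", exc)

print(safe_concat(b"/usr/bin/", "python3"))
```

Porting usually means picking one place, typically where data enters the program, to decode bytes so that the rest of the code handles text only.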
Python3 by default
We decided on the mailing lists to switch over when PEP394 changes its recommendation. 2015 is the earliest that upstream is likely to change this and it may be later depending on what the ecosystem of python2 and python3 looks like at that time.
To get ready for that eventuality, we need to change shebang lines from /usr/bin/python to /usr/bin/python2. Since we are moving to pip as the means to install scripts, we should audit the shebang lines after the pip migration and change any that the pip conversion did not take care of.
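An audit like that could be scripted. The sketch below rewrites an unversioned shebang line and leaves lines that already name python2 or python3 alone. `fix_shebang` is a hypothetical helper, not an existing Fedora tool. It also rewrites the /usr/bin/env python form that these notes discuss as a candidate for the same treatment, which is an assumption about the final policy.

```python
import re

# Matches "#!/usr/bin/python" and "#!/usr/bin/env python", with optional
# arguments, but not interpreters that already carry a version suffix.
_UNVERSIONED = re.compile(r"^#!\s*/usr/bin/(?:env\s+)?python(?=\s|$)(.*)$")


def fix_shebang(first_line, interpreter="/usr/bin/python2"):
    """Return first_line with an unversioned python shebang made explicit."""
    match = _UNVERSIONED.match(first_line.rstrip("\n"))
    if match is None:
        return first_line  # not a python shebang, or already versioned
    newline = "\n" if first_line.endswith("\n") else ""
    return "#!" + interpreter + match.group(1) + newline


print(fix_shebang("#!/usr/bin/python -tt\n"), end="")
print(fix_shebang("#!/usr/bin/python3\n"), end="")
```

Only the first line of a script should be passed in; a real audit would also need to decide what to do with scripts that upstream intends to run under either interpreter.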
We also discussed whether to convert scripts from /usr/bin/env python to /usr/bin/pythonX. In the past, there was perceived cost as this would deviate from upstream. Now, however, we will have to maintain patches to convert to using "python2" rather than "python" so we could consider banning /usr/bin/env as well. env is not good in the shebang line for several reasons:
- Will always ignore virtualenv. So scripts run in a virtualenv that use /usr/bin/env will use the system python instead of the virtualenv's python.
- If a sysadmin installs another python interpreter on the path (for instance, in /usr/local) for their use on their systems, that python interpreter may also end up being used by scripts which use /usr/bin/env to find the interpreter. This might break rpm installed scripts.
- python3.4 will bundle a version of pip as get_pip which users of upstream releases can use to bootstrap an updated pip package from pypi. In Fedora we can have python-libs Require: python-pip and use a symlink or something to replace the bundled version
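The concern about an extra interpreter on the path comes down to search order. The sketch below approximates the lookup that /usr/bin/env performs by using `shutil.which` on two temporary directories that stand in for /usr/local/bin and /usr/bin. It assumes a POSIX system, and the placeholder files are not real interpreters.

```python
import os
import shutil
import stat
import tempfile


def make_fake_python(directory):
    """Create an executable placeholder named 'python' in directory."""
    path = os.path.join(directory, "python")
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def demo():
    """Return (chosen, local, system) for a two-directory search path."""
    with tempfile.TemporaryDirectory() as local_bin, \
            tempfile.TemporaryDirectory() as system_bin:
        local_python = make_fake_python(local_bin)
        system_python = make_fake_python(system_bin)
        # The directory listed first wins, whatever the script author intended.
        search_path = os.pathsep.join([local_bin, system_bin])
        chosen = shutil.which("python", path=search_path)
        return chosen, local_python, system_python


chosen, local_python, system_python = demo()
print("locally installed interpreter chosen:", chosen == local_python)
```

A shebang that names /usr/bin/python2 directly skips this search, so an rpm installed script keeps running under the interpreter it was packaged for.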
Naming of python modules and subpackages
We have three potential package names: python-MODULE, python2-MODULE, and python3-MODULE.
These can be real toplevel packages (directly made from an srpm name) or a subpackage. There are several reasons that separate packages are better than subpackages:
- It allows the packager to tell when to abandon the python2 version. If they orphan the python2 version and no one picks it up, then it is no longer important enough to anyone to use. With subpackages, the maintainer would remove the python2 version from their spec file. Then they'd get a bug report asking them to put it back in if someone was still using it (or people would stop using Fedora because it was no longer providing the python2 modules that they needed).
- It allows the python2 and python3 packages to develop independently. With subpackages, a bug in one version of the package prevents the build from succeeding in either. This can stop package updates to either version even though the issue only exists in one or the other.
- The spec file is cleaner in that there are no conditionals for disabling python2 or python3 builds
Separate packages have the following drawbacks:
- A packager that cares about both python2 and python3 has to review and build two separate packages.
- We suspect that with two packages, many python modules will only be built for python2 because no one will care about building the python3 version and it's extra work.
On first discussing this, we came up with the following plan:
- New packages -- Two separate packages
- Old packages -- grandfathered in but if the reasons make sense to the packager then you can split them into separate packages
After further discussion, and deciding to put more weight on wanting to have python3 packages built, we decided that we'd stay closer to the current guidelines, proposing slight guideline changes so that the rationale for subpackages vs dual packages is clearer and the two approaches are on a more equal footing.
We decided that even though spec files would get uglier it would make sense to have python-MODULE packages with python2-MODULE and python3-MODULE subpackages. Packages which had separate srpms for these would simply have separately named python2-MODULE and python3-MODULE toplevel packages. The result of this is that users of bugzilla may have a problem in their python2-MODULE install and have to look up both python2-MODULE and python-MODULE in order to find what component to file the bug under. This may cause extra work but it won't be outright confusing (ie: no python3-MODULE user will need to file under python2-MODULE or vice versa).
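The bugzilla lookup described above is mechanical enough to sketch. The function below lists the component names a user would need to check for an installed package. `bugzilla_candidates` is a hypothetical helper and "six" is only an example module name.

```python
def bugzilla_candidates(binary_package):
    """Component names worth searching for an installed python package."""
    for prefix in ("python2-", "python3-"):
        if binary_package.startswith(prefix):
            module = binary_package[len(prefix):]
            # Either a toplevel package of the same name, or a subpackage
            # built from the python-MODULE srpm.
            return [binary_package, "python-" + module]
    return [binary_package]


print(bugzilla_candidates("python2-six"))
```

A python3-MODULE user is never pointed at a python2-MODULE component, which matches the "extra work but not confusing" trade-off in the notes.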
For the subpackages, we can add with_python2 conditionals to make building python2 optional on some Fedora versions. There are currently no Fedora or RHEL versions that would disable python2 module building.
pypy
We wondered whether, and how, we should package modules for pypy. Problems with pypy:
- Realistically, if you're using C dependencies you shouldn't be using pypy (pypy doesn't do ref counting natively so it has to be emulated for the CAPI. This can cause problems, as bugs in the extension's refcounting may surface in the emulation where they would stay hidden in CPython.)
- Some of platlib will work using the emulated CAPI.
- The byte compiled files will differ
- At the source level you could share purelib
- python3.2 added a separate directory (__pycache__) with interpreter-tagged file names for the byte compiled files, but this won't help with python2
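The separate byte-code directory mentioned in the last point can be inspected from python3. This is a small sketch using the standard library's `importlib.util.cache_from_source`. The file name pkg/mod.py is made up, and the tag shown depends on the interpreter running the code.

```python
import importlib.util
import os
import sys


def bytecode_name(source_path):
    """Return the file name this interpreter uses for cached byte code."""
    cached = importlib.util.cache_from_source(source_path, optimization="")
    return os.path.basename(cached)


# The tag names the implementation and version, so byte code written by
# different interpreters can sit side by side for one shared source file.
print(sys.implementation.cache_tag)
print(bytecode_name(os.path.join("pkg", "mod.py")))
```

Because the tag is part of the file name, two python3 implementations sharing a purelib directory would not overwrite each other's byte code. python2 has no equivalent, which is the limitation noted above.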
After some tired discussion (this was at the end of the day and end of the discussion) we decided it would be worthwhile to try this:
- Could be worth a try to have it use the system site-packages that python has.
- pypy would reach the system site-packages through a symlink inside the pypy tree. We would release note it as:
This is a technical preview -- many things may not work and we reserve the right for this to go away in the future. The implementation of how pypy gets access to site-packages may well change in the future.
We also tried to decide whether we only wanted to build up a pypy module stack or if we also wanted to allow applications we ship to use pypy. At first we thought that it might be better not to rely on pypy. But someone brought up the skeinforge package. skeinforge runs 4x faster when it uses pypy than when it uses cpython (skeinforge slices 3d models for 3d printers to print), so there is a desire to be able to use it.
We tentatively decided that packages should be able to use pypy at maintainer discretion. May need more thought on this to limit it in some way for now (esp. because we may change how pypy site-packages works).
Tangent: SCL - Collections
- Use it to create a parallel stack.
What is the advantage over virtualenv?
With virtualenv, to find out what's on your system you have to consult both rpm and pip. SCL can tell you useful information with a single system. If you build SCLs from an existing rpm then you may know more about what rpms are installed. Otherwise you just have a blob but even the blob has useful information:
- You do have knowledge of what files are on the filesystem in the rpm database so that allows rpm -qf to work
- virtualenv doesn't integrate with people's current tools to deal with rpms (createrepo, yum, etc)
- Better that you have one-one relationship between what's in SCL and system packages (No bundling)