Static Analysis of Python Reference Counts

Summary

I've written a static analysis tool that can detect reference-counting errors made in Python extension modules written in C. We'll run the tool on all such code in Fedora 17 and make an effort to fix as many problems as time allows.

Owner

Name: Dave Malcolm

Email: dmalcolm@redhat.com

Current status

Targeted release: Fedora 17
Last updated: 2012-01-23
Percentage of completion: 30%

The code works, and has found real bugs, but still contains bugs itself. It's only been run on a small subset of the Python code in Fedora.

Major TODO items remaining:

there's a gcc-4.7 incompatibility that will need a couple of days to fix
automate running it on all code
go through the results, fixing the bugs in the checker itself, and reporting/fixing the real bugs that it finds.

Detailed Description

This is the continuation of the "Static Analysis of CPython Extensions" Fedora 16 feature.

Python makes it relatively easy to write wrapper code for C and C++ libraries, acting as a "glue" from which programs can be created.

Unfortunately, such wrapper code must manually manage the reference-counts of objects, and mistakes here can lead to /usr/bin/python leaking memory or segfaulting. There's also plenty of code out there that doesn't check for errors.
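To make the failure mode concrete, here is a minimal sketch, written in Python rather than C so that it runs without a compiler. It uses ctypes.pythonapi to call the C-API functions Py_IncRef and Py_DecRef by hand, imitating what a wrapper that forgets a Py_DECREF does to an object's reference count. The helper name leak_one_reference is made up for illustration, and the sketch assumes it is run under CPython.

```python
import ctypes
import sys

# Py_IncRef / Py_DecRef are the function forms of the Py_INCREF / Py_DECREF
# macros that C extension code normally uses.
_incref = ctypes.pythonapi.Py_IncRef
_incref.argtypes = [ctypes.py_object]
_incref.restype = None

_decref = ctypes.pythonapi.Py_DecRef
_decref.argtypes = [ctypes.py_object]
_decref.restype = None


def leak_one_reference(obj):
    """Hypothetical helper: take a reference and never release it."""
    _incref(obj)


payload = ["some", "data"]
before = sys.getrefcount(payload)

leak_one_reference(payload)
leaked = sys.getrefcount(payload) - before    # one reference is unaccounted for

_decref(payload)                              # the release the buggy code forgot
restored = sys.getrefcount(payload) - before  # back in balance
```

An unmatched increment like this keeps the object alive for the life of the process, which is the memory-leak case. The opposite mistake, an unmatched decrement, can free an object that is still in use, which is where the segfaults come from.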

In Fedora 16, we shipped an initial version of a static analysis tool I've written (gcc-with-cpychecker), implementing some basic checks.

The latest version of the checker can now detect reference-counting bugs, along with code paths that don't properly handle errors from the Python extension API, and I've already used it to patch some significant memory leaks.

My hope was to integrate this with Fedora's packaging, so that all C extension modules packaged for Python 2 and Python 3 can be guaranteed free of such errors (by adding hooks to the python-devel and python3-devel packages). Unfortunately, the signal:noise ratio can't be made good enough for that in time for Fedora 17.

My plan is to automate running it on all of the C extension modules in Fedora 17, and to analyze the results. Initially bugs would be filed against the tool itself (gcc-python-plugin), and I would then triage them. Genuine bugs would be reassigned to the appropriate components, and I'd try to fix the high-value ones, sending fixes upstream. False positives would remain as bugs in the checker itself, and I'd work on fixing them. This is a large task, however, and I'm likely to need help from package owners and other Python developers.

This will also benefit PyPy. PyPy has its own implementation of the CPython extension API, and certain bugs in extension code can lead to more severe symptoms with PyPy than with CPython. Specifically, some reference-counting bugs that are harmless on CPython can lead to segfaults on PyPy. So by fixing these kinds of bugs, we also help PyPy.

Benefit to Fedora

Fedora is already a great environment for doing Python development. Having a good-quality static analysis tool integrated into Fedora's build system for Python extension modules will make Fedora even more compelling for Python developers. (Naturally the tool will be Free Software, and thus usable on other platforms, but we'll have it first.)

The presence of the tool should also make it easier to fix certain awkward bugs, and make it easier to support secondary CPU architectures.

Scope

This involves:

writing the tool
ensuring that it works well on historical bugs (examples of real bugs that are now fixed)
tuning it to achieve a good signal:noise ratio:
- testing it on everything in Fedora:
  - analyzing the issues that it reports
  - fixing bugs in the tool
  - fixing bugs in the software-under-test
  - generating a test suite for the tool
integrating it into the Python 2 and Python 3 builds of Fedora RPMs (python-devel and python3-devel)
ensuring that it does not substantially increase the time it takes to build the software-under-test
- the selftest suite for the tool will need a performance component; we also need to be careful how we integrate it into Fedora's build system

The bugs I intend for the tool to detect are:

ob_refcnt errors: missing Py_INCREF/Py_DECREF etc
tp_traverse errors (which can mess up the garbage collector); missing it altogether, or omitting fields
errors in PyArg_ParseTuple and friends (these often lead to flaws on big-endian 64-bit architectures)
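
The big-endian problem in the last item can be sketched without a compiler. Suppose extension code passes the address of a 64-bit C long to PyArg_ParseTuple with the "i" format code, which stores a 32-bit int. The sketch below models that store with Python's struct module. The function name store_int_into_long is invented for illustration, and the 8-byte long is an assumption matching typical 64-bit Linux ABIs.

```python
import struct


def store_int_into_long(value, byte_order):
    """Model writing a 4-byte int at the start of a zeroed 8-byte long.

    byte_order is "<" (little-endian) or ">" (big-endian), as in struct.
    Returns the value the C code would then read back from the long.
    """
    storage = bytearray(8)                                 # the zero-initialised long
    struct.pack_into(byte_order + "i", storage, 0, value)  # what "i" writes
    (seen,) = struct.unpack(byte_order + "q", storage)     # what the caller reads
    return seen


little = store_int_into_long(42, "<")  # low-order bytes come first: reads as 42
big = store_int_into_long(42, ">")     # the int lands in the high-order half
```

On the little-endian model the caller reads back 42, so the bug goes unnoticed; on the big-endian model it reads 42 shifted left by 32 bits. The little-endian result is only right because the storage started out zeroed, so the same mismatch can misbehave there too if the variable is uninitialised.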

There are two approaches to integrating it:

"all in": turning it on by default, by adding the relevant compilation flags to sysconfig/distutils: -fplugin=python2 -fplugin-arg-python2-script=PATH_TO_/cpychecker.py so that all compilation using python-devel and python3-devel uses it, and providing flags to turn it off for when it's problematic.

"gcc-with-cpychecker": package it, leaving it optional, providing a /usr/bin/gcc-with-cpychecker wrapper script, to be invoked in place of gcc, so that people can opt in to using it.

In both cases, I plan to run all of the C Python extension code in Fedora 17 through it.
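
Both integration approaches come down to adding the same two plugin options to the compiler command line. As a rough sketch (not the actual gcc-with-cpychecker script), a wrapper could build its argument list as follows. The function name checker_command and its parameters are assumptions made for illustration; the location of cpychecker.py is left to the caller because it is not fixed above.

```python
def checker_command(gcc_args, script_path, plugin="python2"):
    """Return a gcc argv with the checker plugin options added.

    gcc_args:    the arguments the caller would have passed to gcc
    script_path: path to cpychecker.py (hypothetical parameter)
    plugin:      plugin name used in the -fplugin options
    """
    plugin_flags = [
        "-fplugin=" + plugin,
        "-fplugin-arg-{0}-script={1}".format(plugin, script_path),
    ]
    # Put the plugin options first so the caller's own arguments follow unchanged.
    return ["gcc"] + plugin_flags + list(gcc_args)


# Example invocation with a placeholder script location.
argv = checker_command(["-c", "example.c"], "/some/dir/cpychecker.py")
```

A real wrapper would then hand this list to something like subprocess.call or os.execvp. The "all in" approach would instead add the same two options to the default compilation flags.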

How To Test

Exactly how to test will depend on which of the two approaches we go with (see "Scope" above).

Try to compile C Python extension code.

I'll provide an example of buggy extension code within the documentation part of the package, to make it easy to verify that GCC detects the bugs.

User Experience

Non-technical end-users of Fedora should see no difference (other than a more robust operating system).

Python users/developers should see additional warnings/errors when building Python extension modules that contain bugs. The exact experience will depend on how sure we can be that an issue is a real problem; we don't want to impact people's ability to do automated buildouts from PyPI.

For examples of the output from the checker, see: http://dmalcolm.livejournal.com/6560.html

Dependencies

I'm planning to do this via a GCC plugin that embeds Python, so that I can write the checker in Python itself.

FWIW I also investigated a few other approaches to doing this:

as a patch to LLVM's static analysis tool (packaged as part of llvm.src.rpm)
using sparse
using CIL (see e.g. the work we did to detect errors in libvirt)
using Coccinelle, like my experiment on PyArg_ParseTuple from November 2009
using a Python library to parse C, e.g. pycparser or pyclibrary

Contingency Plan

There can be various levels of fallback:

the ability to set a flag in an rpm specfile that turns off testing for this rpm build
the ability to set a variable in the environment to suppress testing (perhaps this is the other way around: the extra tests are only run when a value is set)
(worst case) fully removing the testing hooks from python-devel and python3-devel if the feature proves problematic and is impeding getting the release out of the door.

I'm not yet sure what the structures of opt-in/opt-out and per-test/per-file/per-build should be.

Documentation

Upstream documentation: http://readthedocs.org/docs/gcc-python-plugin/en/latest/cpychecker.html

Release Notes

Fedora now ships with a gcc-with-cpychecker variant of GCC, which adds additional compile-time checks to Python extension modules written in C, detecting various common problems (e.g. reference counting mistakes). This variant is itself written in Python.

Comments and Discussion

See Talk:Features/StaticAnalysisOfCPythonExtensions
